Cortical Coupling Reflects Bayesian Belief Updating in the Deployment of Spatial Attention

The deployment of visuospatial attention and the programming of saccades are governed by the inferred likelihood of events. In the present study, we combined computational modeling of psychophysical data with fMRI to characterize the computational and neural mechanisms underlying this flexible attentional control. Sixteen healthy human subjects performed a modified version of Posner's location-cueing paradigm in which the percentage of cue validity varied in time and the targets required saccadic responses. Trialwise estimates of the certainty (precision) of the prediction that the target would appear at the cued location were derived from a hierarchical Bayesian model fitted to individual trialwise saccadic response speeds. Trial-specific model parameters then entered analyses of fMRI data as parametric regressors. Moreover, dynamic causal modeling (DCM) was performed to identify the most likely functional architecture of the attentional reorienting network and its modulation by (Bayes-optimal) precision-dependent attention. While the frontal eye fields (FEFs), intraparietal sulcus, and temporoparietal junction (TPJ) of both hemispheres showed higher activity on invalid relative to valid trials, reorienting responses in right FEF, TPJ, and the putamen were significantly modulated by precision-dependent attention. Our DCM results suggested that the precision of predictability underlies the attentional modulation of the coupling of TPJ with FEF and the putamen. Our results shed new light on the computational architecture and neuronal network dynamics underlying the context-sensitive deployment of visuospatial attention. SIGNIFICANCE STATEMENT Spatial attention and its neural correlates in the human brain have been studied extensively with the help of fMRI and cueing paradigms in which the location of targets is pre-cued on a trial-by-trial basis. 
One aspect that has so far been neglected is how the brain forms attentional expectancies when no a priori probability information is available and the probabilities must instead be inferred from observations. This study elucidates the computational and neural mechanisms under which probabilistic inference governs attentional deployment. Our results show that Bayesian belief updating explains changes in cortical connectivity, in that directional influences from the temporoparietal junction on the frontal eye fields and the putamen were modulated by (Bayes-optimal) updates.


Introduction
Saccades enable us to explore our visual environment efficiently and to focus on informative cues by foveal sampling with the highest visual acuity. Eye movements are crucially related to covert attention shifts (Rizzolatti et al., 1987; Awh et al., 2006), which precede saccades to select salient visual targets and adjust oculomotor programming. Therefore, eye movements and covert attention shifts share a common neuroanatomy (Corbetta et al., 1998; Nobre et al., 2000; Beauchamp et al., 2001; Fairhall et al., 2009).
Prior beliefs about the location of behaviorally relevant stimuli such as imminent saccade targets facilitate stimulus detection and increase response speed (RS). Prior beliefs can be induced by spatial cues, which indicate the probability of a target's location (percentage of cue validity, %CV) on a trial-by-trial basis (Posner, 1980). Sensory events that violate prior beliefs (invalid trials on which the target appears at the uncued location) elicit a response in a ventral frontoparietal network (comprising the temporoparietal junction, TPJ), but also in dorsal regions such as the frontal eye fields (FEFs) and intraparietal sulcus (IPS) (Corbetta et al., 2008; Corbetta and Shulman, 2011). In addition, behavioral performance and BOLD responses within these networks are modulated by the precision or predictability (%CV) (Vossel et al., 2006). The TPJ exhibits higher reorienting-related activity when the cue predicts the target location with a high probability (Vossel et al., 2012). The predictability of reorienting also affects activity in the striatum, insula, and frontal cortex (Shulman et al., 2009). In summary, attentional effects, both neuronally and behaviorally, are highly sensitive to probabilistic context; namely, the predictability or precision of prediction errors. However, the formal (computational) nature of these effects, and how they are implemented physiologically, has not yet been resolved. Therefore, this study investigated the origin of these probability-dependent effects in terms of their computations and connectivity patterns within cortical networks.
In contrast to the classic location-cueing paradigm, in which subjects are explicitly informed about %CV, the likelihood of events in real life needs to be continuously inferred on the basis of recent observations. There is now considerable evidence that this inference process can be plausibly represented by hierarchical Bayesian models that provide a principled/normative description of how beliefs are updated optimally in the presence of new data (Behrens et al., 2007; Nassar et al., 2010; Payzan-LeNestour and Bossaerts, 2011; Iglesias et al., 2013). In the particular context of saccades, using an adapted version of Posner's location-cueing paradigm with unpredictable changes in %CV, RSs were found to covary with trialwise quantities derived from such a model (Vossel et al., 2014a). Although this suggests that the effects of predictability in the location-cueing paradigm follow Bayesian principles, it remains to be established whether (and in which brain areas) these computations are implemented neuronally. Moreover, it remains to be determined whether a Bayesian model outperforms simpler models of brain responses.
In the present study, we combined computational modeling (Vossel et al., 2014a, 2014b) with fMRI to investigate which neurophysiological processes may implement Bayesian inference for the deployment of attention and the programming of saccadic eye movements. We used a hierarchical Bayesian learning model (Mathys et al., 2014) to quantify the subjects' trialwise beliefs formally and used brain responses to identify the neuronal correlates of these beliefs. Accordingly, trialwise parameters from our computational model of behavioral responses were used as parametric explanatory variables in a general linear model (GLM) of our fMRI data. We then tested for modulation of neuronal activity in the areas of the frontoparietal attention networks and/or the striatum. The striatum is implicated in the learning of stimulus associations, actions, and rewards (Liljeholm and O'Doherty, 2012) and might contribute to attentional reorienting when reorienting is unexpected (Shulman et al., 2009). The parameters from the Bayesian model of behavior were also used as modulatory variables in analyses of effective connectivity to determine how Bayesian inference is mediated by connectivity within cortical or corticostriatal networks.

Materials and Methods
Subjects. Eighteen healthy volunteers gave written informed consent to participate in the current study. Two subjects had to be excluded from further analysis due to technical difficulties with eye tracking in the MRI environment. Therefore, data from 16 subjects were analyzed (8 males, 8 females; age range, 19-31 years; mean age 24.9 years). All subjects were right-handed and had normal or corrected-to-normal vision. The study was approved by the local ethics committee (University College London).
Stimuli and experimental paradigm. Stimuli were presented using a video projection screen mounted at the back of the magnet bore. Participants viewed the screen via a mirror system attached to the head coil. A location-cueing paradigm with central predictive cueing was used (Posner, 1980). On each trial, two peripherally located boxes were shown (2° wide and 8.4° eccentric in each visual field; Fig. 1A) that could contain target stimuli. A central diamond (0.7° eccentric in each visual field) was placed between them, serving as a fixation point. Cues were signaled by a 200 ms period of increasing brightness of one side of the diamond, creating an arrowhead pointing to one of the peripheral boxes. After an 800 ms stimulus onset asynchrony, the target, a vertical circular sinusoidal grating, appeared for 200 ms in one of the boxes.
Subjects were instructed to maintain central fixation during the cue period and to make a saccade to the target stimulus as fast as possible. They were encouraged to blink and refixate the central fixation dot after the saccade. On a separate day before the fMRI experiment, each subject completed two short practice sessions (one session with 100 trials with constant 80% predictive validity (%CV) and one session with 121 trials with changes in %CV). The fMRI experiment comprised 612 trials with blockwise changes in %CV that were unknown to the subjects. Each block with constant %CV contained an equal number of left and right targets, counterbalanced across valid and invalid trials. In addition, to optimize the statistical efficiency of our design, 192 null events (in which only the fixation dot and the two peripheral boxes were shown) were randomly intermixed with the experimental trials. %CV changed every 32-36 trials, switching randomly among levels of 88%, 69%, and 50% (Fig. 1B). Subjects were told in advance that there would be changes in %CV over the course of the experiment, but were not informed about the levels of these probabilities or when they would change. Each subject was presented with the same sequence of trials. This is a standard procedure in computational studies of trial-by-trial learning (Behrens et al., 2007;Daunizeau et al., 2010b;Iglesias et al., 2013) because the parameters of the learning process depend on the exact sequence of trials. This dependency will diminish asymptotically with an increasing number of trials. However, for the relatively short sequences (of a few hundred trials at best) that are feasible within a standard experiment, different trial sequences per participant could increase the variability of parameter estimates over and above the intrinsic interindividual differences per se. 
We therefore decided to keep the trial sequence constant to ensure that differences in model parameters can be attributed to subject-specific rather than task-specific factors. During the experiment, the subjects had four short rests of 50 s each in which the word "pause" was shown on the display. In total, the fMRI session lasted for 39 min.
Eye movement data recording and analysis. Eye movements were recorded from the right eye with an EyeLink 1000 MR-compatible eye tracker (SR Research) with a sampling rate of 1000 Hz. A 9- or 5-point eye-tracker calibration and validation were performed at the start of the experiment. The validation error was <1° of visual angle.
Eye movement data were analyzed with MATLAB (The MathWorks) and ILAB (Gitelman, 2002). Blinks were filtered out and pupil coordinates within a time window of 20 ms around the blink were removed. Trials with >20% missing data were discarded from the analyses. After target appearance, only the first saccade was analyzed. Saccades were identified when the eye velocity exceeded 30°/s (Fischer et al., 1993; Stampe, 1993). Moreover, the saccade amplitude needed to subtend at least 2/3 of the distance between the fixation point and the target location. Saccadic response time (RT) was defined as the latency between target onset and saccade onset. Saccades whose starting position was not within 1° of the fixation point and saccades with a latency <90 ms (i.e., anticipated responses) were discarded. Our analyses focused on inverse RTs, that is, response speeds (RSs), because, in contrast to RTs, RSs are distributed normally (cf. Carpenter and Williams, 1995; Brodersen et al., 2008).
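The saccade criteria above can be sketched in code. The following is a minimal illustration only, not the MATLAB/ILAB pipeline used in the study. It assumes a one-dimensional horizontal gaze trace in degrees relative to fixation, timestamps in milliseconds, and RS expressed in 1/s; the function name `first_saccade_speed` and its arguments are hypothetical.

```python
def first_saccade_speed(t_ms, x_deg, target_onset_ms, target_ecc_deg=8.4,
                        vel_thresh=30.0, min_latency_ms=90.0,
                        fix_radius_deg=1.0):
    """Return the response speed (1/RT, in 1/s) of the first saccade after
    target onset, or None if the trial fails one of the criteria."""
    n = len(t_ms)

    def velocity(i):
        # Absolute eye velocity (deg/s) between sample i-1 and sample i.
        dt_s = (t_ms[i] - t_ms[i - 1]) / 1000.0
        return abs(x_deg[i] - x_deg[i - 1]) / dt_s

    # Saccade onset: first post-target sample pair exceeding the threshold.
    onset = None
    for i in range(1, n):
        if t_ms[i - 1] < target_onset_ms:
            continue
        if velocity(i) > vel_thresh:
            onset = i - 1
            break
    if onset is None:
        return None  # no saccade detected

    # Saccade end: last sample before velocity drops below the threshold.
    end = onset + 1
    while end + 1 < n and velocity(end + 1) > vel_thresh:
        end += 1

    latency_ms = t_ms[onset] - target_onset_ms
    amplitude = abs(x_deg[end] - x_deg[onset])

    if abs(x_deg[onset]) > fix_radius_deg:
        return None  # did not start from the fixation zone
    if latency_ms < min_latency_ms:
        return None  # anticipated response
    if amplitude < (2.0 / 3.0) * target_ecc_deg:
        return None  # saccade too small to count as a target saccade
    return 1000.0 / latency_ms
```

A two-dimensional trace would need the Euclidean velocity instead of the absolute horizontal difference; the thresholds are the ones stated in the text.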
In a first analysis, mean RSs were analyzed as a function of validity (valid/invalid) and true %CV (50%/69%/88%) using a within-subject ANOVA. Results from this analysis are reported at a significance level of p < 0.05 after Greenhouse-Geisser correction. This analysis was used to test for significant effects of predictability, which we then sought to model in terms of trialwise Bayesian belief updating.
Single-trial RSs were used to estimate parameters from a hierarchical Bayesian learning scheme. Herein, we will refer to this Bayesian hierarchical model as the perceptual model because this model provides a mapping from experimental causes to observations (Daunizeau et al., 2010a, 2010b; Vossel et al., 2014a; Fig. 2). Please note that, in our case, the observations do not represent the physical visual inputs (i.e., left and right cues and targets), but rather are defined at a higher level of abstraction in terms of the cue-target relationship (i.e., targets appearing at the cued or uncued location, respectively). However, as demonstrated in previous work (see derivation in the supplementary material to Iglesias et al., 2013), this formulation is formally identical to separately modeling two belief trajectories (for the two possible outcomes). The present formulation has the advantage of requiring a single belief trajectory only, allowing for a more compact model.
In contrast, the response model describes the mapping from the subjects' beliefs, as derived from the perceptual model, to their responses as observed by the experimenter (i.e., saccadic RS; Fig. 2).
The perceptual model comprises three states denoted by $x$ (Fig. 2). The state $x_1^{(t)}$ represents the observation/environmental state of each trial, which, in the present paradigm, consisted of either a validly or invalidly cued saccade target (with $x_1^{(t)} = 1$ for valid and $x_1^{(t)} = 0$ for invalid trials).
The probability distribution of $x_1^{(t)} = 1$ is a Bernoulli distribution governed by a sigmoidal transformation of the next higher state $x_2^{(t)}$, which in turn changes over time as a Gaussian random walk. The volatility of $x_2^{(t)}$ (i.e., how fast $x_2^{(t)}$ changes after new observations) is determined by two quantities: $x_3^{(t)}$ (the state of the next upper level of the hierarchy) and a subject-specific parameter $\omega$. The third state $x_3^{(t)}$ also changes as a Gaussian random walk, with the dispersion of the random walk being determined by a second subject-specific parameter $\vartheta$. The values of the subject-specific parameters $\omega$ and $\vartheta$ were estimated from the individual RS data (see below).
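To make the generative structure concrete, the following sketch samples trial outcomes from a three-level model of this kind. It is an illustration under stated assumptions rather than the study's implementation: the step variance of the second level is taken to be exp(x3 + omega) (i.e., a coupling constant fixed to 1), both random walks start at 0, and the parameter values in the usage note are arbitrary. The names `simulate_trials`, `omega`, and `theta` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_trials(n_trials, omega, theta, seed=0):
    """Sample (x1, x2, x3) per trial from the assumed generative model.

    x3: Gaussian random walk with step variance theta.
    x2: Gaussian random walk with step variance exp(x3 + omega) (assumed form).
    x1: Bernoulli outcome (1 = valid, 0 = invalid) with p = sigmoid(x2).
    """
    rng = random.Random(seed)
    x2, x3 = 0.0, 0.0  # assumed starting values
    trials = []
    for _ in range(n_trials):
        x3 += rng.gauss(0.0, math.sqrt(theta))
        x2 += rng.gauss(0.0, math.sqrt(math.exp(x3 + omega)))
        x1 = 1 if rng.random() < sigmoid(x2) else 0
        trials.append((x1, x2, x3))
    return trials
```

For example, `simulate_trials(612, omega=-4.0, theta=0.01)` would produce a sequence of the same length as the fMRI experiment; in the study itself the trial sequence was fixed by design rather than sampled.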
To infer the probabilistic representations of the subject from environmental states, the perceptual model needs to be inverted; this yields the posterior densities of the three hidden states $x^{(t)}$. In the following, the sufficient statistics of the subject's posterior belief will be denoted by $\mu^{(t)}$ (mean) and $\sigma^{(t)}$ (variance) or $\pi^{(t)} = 1/\sigma^{(t)}$ (precision). We use the hat symbol (^) to denote predictions before the observation of $x_1^{(t)}$ on a given trial t. As described in detail elsewhere, variational model inversion under a mean field approximation yields simple analytical update equations in which belief updating rests on precision-weighted prediction errors. These update equations provide approximately Bayes-optimal rules for the trial-by-trial updating of the beliefs. In this experiment, they provide us with the subject's estimate of the probability that the target appears at the cued location on a particular trial (note that this is an individualized approximate Bayes optimality in reference to the subject-specific values for the parameters $\omega$ and $\vartheta$).
Figure 1. Experimental design and behavioral results. A, Exemplary illustration of a trial with a right, validly cued, target. In invalid trials, the target appeared at the location opposite to that cued. The subjects were instructed to maintain central fixation during the cue period and to make a saccade to the target stimulus as fast as possible. B, Trial-by-trial changes in precision-dependent attention $\alpha(\hat{\pi}_1^{(t)})$, which reflects the precision of the subject's belief that the target will appear at the cued location (black line), in relation to the experimentally manipulated cue validity (%CV, shaded areas). For this graph, $\alpha(\hat{\pi}_1^{(t)})$ was calculated on the basis of the average parameter estimates over all subjects. C, Mean RS for valid and invalid trials as a function of true (unknown) percentage of cue validity %CV. Error bars indicate SEM. D, Observed and predicted saccadic RSs as a function of precision-dependent attention $\alpha(\hat{\pi}_1^{(t)})$ derived from the hierarchical Bayesian learning scheme. For this graph, $\alpha(\hat{\pi}_1^{(t)})$ and predicted RS were calculated on the basis of group average values of the model parameters. Error bars indicate SEM.
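The idea of precision-weighted prediction errors can be illustrated with a single-trial update. The sketch below follows the commonly published form of the update equations for a three-level binary hierarchical Gaussian filter; it is given here as an assumption, since the equations themselves are not reproduced in this text. The coupling constant `kappa` is fixed to 1 by default, and the function name `hgf_binary_update` is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hgf_binary_update(u, mu2, sa2, mu3, sa3, omega, theta, kappa=1.0):
    """One trial of belief updating (assumed three-level binary HGF form).

    u        : observed outcome (1 = valid, 0 = invalid)
    mu2, sa2 : posterior mean and variance at level 2 from the previous trial
    mu3, sa3 : posterior mean and variance at level 3 from the previous trial
    """
    # Predictions made before the outcome is observed.
    mu1hat = sigmoid(mu2)                     # predicted P(valid)
    pi1hat = 1.0 / (mu1hat * (1.0 - mu1hat))  # precision of that prediction
    v2 = math.exp(kappa * mu3 + omega)        # expected step variance, level 2
    pi2hat = 1.0 / (sa2 + v2)
    pi3hat = 1.0 / (sa3 + theta)

    # Level 2: the prediction error is weighted by the inverse precision.
    delta1 = u - mu1hat
    pi2 = pi2hat + 1.0 / pi1hat
    mu2_new = mu2 + delta1 / pi2

    # Level 3: volatility prediction error and its precision weighting.
    w2 = v2 * pi2hat
    delta2 = (1.0 / pi2 + (mu2_new - mu2) ** 2) * pi2hat - 1.0
    pi3 = pi3hat + 0.5 * kappa ** 2 * w2 * (w2 + (2.0 * w2 - 1.0) * delta2)
    mu3_new = mu3 + 0.5 * kappa * w2 * delta2 / pi3

    return {"mu1hat": mu1hat, "pi1hat": pi1hat,
            "mu2": mu2_new, "sa2": 1.0 / pi2,
            "mu3": mu3_new, "sa3": 1.0 / pi3}
```

Iterating this function over the trial sequence, feeding each trial's output back in as the next trial's input, yields the trialwise trajectories of `mu1hat` and `pi1hat` on which the response model operates.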
A response model was used to map the subject's posterior beliefs to observed responses (Fig. 2). In our previous work (Vossel et al., 2014a), we compared three alternative response models and found that the most plausible model (the model with the highest evidence) was based upon the trialwise precision of the prediction at the first level of the perceptual model, $\hat{\pi}_1^{(t)}$. In this model, the precision $\hat{\pi}_1^{(t)}$ determines the amount of attentional resources allocated to the cued location, $\alpha(\hat{\pi}_1^{(t)})$, which varies between 0 and 1. Trialwise RSs can then be described as a linear function of $\alpha(\hat{\pi}_1^{(t)})$ as follows:

$$RS^{(t)} = \begin{cases} \zeta_{1v} + \zeta_2\,\alpha(\hat{\pi}_1^{(t)}) & \text{for } x_1^{(t)} = 1 \text{ (valid trial)} \\ \zeta_{1i} + \zeta_2\,\bigl(1 - \alpha(\hat{\pi}_1^{(t)})\bigr) & \text{for } x_1^{(t)} = 0 \text{ (invalid trial)} \end{cases}$$

Here, $\hat{\pi}_1^{(t)} = 1/\bigl(\hat{\mu}_1^{(t)}(1 - \hat{\mu}_1^{(t)})\bigr)$ quantifies the precision of the prediction at the first level of the model before the observation of the target in trial t; that is, the precision of the prediction that the target will appear at the cued location. In our specific case, $\hat{\pi}_1^{(t)}$ has a minimal value of 4 when $\hat{\mu}_1^{(t)} = 0.5$ (both target locations are equally likely) and approaches infinity as $\hat{\mu}_1^{(t)}$ approaches 1. The most parsimonious way to meet the constraints of the response model (namely that the amount of attentional resources $\alpha^{(t)}$ varies between 0 and 1 and amounts to 0.5 for $\hat{\mu}_1^{(t)} = 0.5$) is to equate $\alpha^{(t)}$ with a logistic function $s$ of $\hat{\pi}_1^{(t)}$ minus its minimum, $\alpha(\hat{\pi}_1^{(t)}) = s(\hat{\pi}_1^{(t)} - 4)$. Because the cue becomes a counterindication of outcome location when $\mu_2^{(t-1)}$ falls below 0 (or equivalently, when $\hat{\mu}_1^{(t)}$ drops below 0.5), a suitable definition of $\alpha$ for the whole range of $\hat{\mu}_1^{(t)}$ is:

$$\alpha(\hat{\pi}_1^{(t)}) = \begin{cases} s(\hat{\pi}_1^{(t)} - 4) & \text{for } \hat{\mu}_1^{(t)} > 0.5 \\ 1 - s(\hat{\pi}_1^{(t)} - 4) & \text{for } \hat{\mu}_1^{(t)} \le 0.5 \end{cases}$$

In the response model as outlined above, $\alpha(\hat{\pi}_1^{(t)})$ determines trialwise RS according to a linear function. The linear relationship depends on the subject-specific response model parameters, $\zeta$, which are estimated from the data. While $\zeta_{1v}$ and $\zeta_{1i}$ determine the constants of the linear equation (i.e., the overall levels of RSs), $\zeta_2$ parametrizes the slope of the affine function (i.e., the strength of the increase in RS with increased precision-dependent attention $\alpha(\hat{\pi}_1^{(t)})$).
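The mapping from predicted cue validity to attention and predicted RS can be sketched as follows. This is an illustration only: it assumes that attention is the logistic function of the first-level precision minus 4, mirrored when the predicted probability falls below 0.5, and that the invalid-trial branch uses one minus attention with the same slope. The function and parameter names (`attention`, `predicted_rs`, `zeta1v`, `zeta1i`, `zeta2`) are hypothetical labels for the quantities described above.

```python
import math


def attention(mu1hat):
    """Attention to the cued location for a predicted P(valid) in (0, 1)."""
    pi1hat = 1.0 / (mu1hat * (1.0 - mu1hat))      # equals 4 at mu1hat = 0.5
    a = 1.0 / (1.0 + math.exp(-(pi1hat - 4.0)))   # logistic of (precision - 4)
    # Below 0.5 the cue counterindicates the target location (assumed mirror).
    return a if mu1hat >= 0.5 else 1.0 - a


def predicted_rs(mu1hat, valid, zeta1v, zeta1i, zeta2):
    """Predicted response speed as a linear function of attention."""
    a = attention(mu1hat)
    if valid:
        return zeta1v + zeta2 * a
    return zeta1i + zeta2 * (1.0 - a)
```

With a positive slope, this sketch predicts faster responses on valid trials and slower responses on invalid trials as the predicted cue validity moves from 0.5 toward 1, which is the qualitative pattern the response model is meant to capture.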
The perceptual model parameters $\omega$ and $\vartheta$, as well as the response model parameters $\zeta_{1v}$, $\zeta_{1i}$, and $\zeta_2$, were estimated from the trialwise RS measures using variational Bayes as implemented in the HGF toolbox (http://www.translationalneuromodeling.org/tapas/). Variational Bayes optimizes the (negative) free-energy F as a lower bound on the log evidence, such that maximizing F minimizes the Kullback-Leibler divergence between exact and approximate posterior distributions or, equivalently, the surprise about the inputs encountered (for details, see Friston et al., 2007).
Log-model evidence values can also be used to compare alternative models (Kass and Raftery, 1995). The relative differences between log-evidence values of different models (summed over individual subjects in a fixed-effects approach; Stephan et al., 2009) can be expressed as posterior probabilities of the model given the observed data. Here, we compared the hierarchical Bayesian model with a standard Rescorla-Wagner learning model (Rescorla and Wagner, 1972) in which the update of the probability estimate is the product of a fixed learning rate and a prediction error (i.e., the difference between the observed and predicted outcome). The learning rate was estimated from RSs by assuming a linear relationship with the estimated cue probability (analogous to the response model of the Bayesian model outlined above). Furthermore, we compared the Bayesian and Rescorla-Wagner models with a model that assumed that RSs were explained by the true %CV levels.
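Both ingredients of this comparison are simple enough to sketch. The snippet below shows a Rescorla-Wagner update with a fixed learning rate and the conversion of summed log evidences into posterior model probabilities, assuming a flat prior over models. The function names, the starting estimate of 0.5, and any numbers passed in are illustrative assumptions, not values from the study.

```python
import math


def rescorla_wagner(outcomes, learning_rate, p0=0.5):
    """Return the cue-validity estimate held before each trial.

    The estimate is updated by learning_rate * (outcome - estimate).
    """
    p = p0
    predictions = []
    for outcome in outcomes:
        predictions.append(p)
        p += learning_rate * (outcome - p)  # fixed-rate prediction-error update
    return predictions


def posterior_model_probabilities(log_evidence_by_model):
    """Fixed-effects comparison: sum log evidences over subjects per model,
    then normalize under a flat model prior (softmax of the sums)."""
    totals = {m: sum(v) for m, v in log_evidence_by_model.items()}
    largest = max(totals.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {m: math.exp(t - largest) for m, t in totals.items()}
    norm = sum(weights.values())
    return {m: w / norm for m, w in weights.items()}
```

Because log evidences are summed over subjects, even modest per-subject differences accumulate, which is why fixed-effects posterior probabilities often approach 1 for the best model.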
MRI data acquisition. T2*-weighted echoplanar (EPI) images with BOLD contrast (matrix size 64 × 64, voxel size 3 × 3 × 3 mm³) were obtained using a 3 T MRI System (Trio; Siemens). Before the functional scans, a B0 field map was acquired using a double-echo FLASH sequence for distortion correction of the acquired EPI images (Weiskopf et al., 2006). Field maps were estimated from the phase difference between the images acquired at the short and long TE using the FieldMap toolbox (Hutton et al., 2002). Additional high-resolution anatomical images (voxel size 1 × 1 × 1 mm³) were acquired using a T1-weighted 3D MDEFT sequence (Deichmann et al., 2004).
A total of 825 EPI volumes, each consisting of 40 axial slices, were acquired sequentially (repetition time 2.8 s, echo time 30 ms). The first five volumes were discarded to allow for T1 equilibration effects. The data were preprocessed and analyzed with Statistical Parametric Mapping software SPM8 (Wellcome Department of Imaging Neuroscience, London; Friston et al., 1995; http://www.fil.ion.ucl.ac.uk/spm). Images were bias corrected. To correct for interscan movement, the images were spatially realigned to the first of the remaining 820 volumes and subsequently re-realigned to the mean of all images. The mean EPI image for each subject was then spatially normalized to the MNI single subject template using the "unified segmentation" function in SPM8. The ensuing deformation was subsequently applied to the individual EPI volumes and the T1 scan, which was coregistered to the mean of the realigned EPIs. All images were thereby transformed into standard stereotaxic space and resampled into 2 × 2 × 2 mm³ voxels. The normalized images were spatially smoothed using an 8 mm full-width half-maximum Gaussian kernel.
Figure 2. … $\alpha(\hat{\pi}_1^{(t)})$ that depends on the precision of the prediction at the first level of the perceptual model. Circles represent constants; diamonds represent quantities that change over time (trials); hexagons, like diamonds, represent quantities that change in time but that additionally depend on their previous state in time in a Markovian fashion.
During scanning, peripheral measurements of subject pulse and breathing were acquired, together with scanner slice synchronization pulses using the Spike2 data acquisition system (Cambridge Electronic Design). The cardiac pulse signal was measured using an MRI-compatible pulse oximeter attached to the subject's finger. The respiratory signal (thoracic movement) was monitored using a pneumatic belt around the abdomen, close to the diaphragm. A physiological noise model was used to account for artifacts related to cardiac and respiratory phase and changes in respiratory volume (Hutton et al., 2011). Models for cardiac and respiratory phase and their aliased harmonics were based on RETROICOR (Glover et al., 2000). A Fourier series basis set, extending to the third harmonic, was used to model the physiological fluctuations. Additional terms were included to model changes in respiratory volume (Birn et al., 2006, 2008) and heart rate (Chang and Glover, 2009). This resulted in a total of 14 regressors, which were included as confounds in the first-level analysis for each subject (see below).
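One decomposition consistent with the stated total of 14 regressors is sketched below: a sine and cosine term for each of three harmonics of cardiac phase (6 columns), the same for respiratory phase (6 columns), plus one respiratory-volume and one heart-rate term. This breakdown is an assumption made for illustration, and the function names and inputs (per-scan phases in radians and per-scan volume and rate values) are hypothetical.

```python
import math


def fourier_terms(phase, order=3):
    """Cosine and sine of each harmonic of a phase value (radians)."""
    row = []
    for k in range(1, order + 1):
        row.append(math.cos(k * phase))
        row.append(math.sin(k * phase))
    return row


def physio_confounds(cardiac_phase, resp_phase, resp_volume, heart_rate,
                     order=3):
    """Build one confound row per scan.

    Columns: 2*order cardiac terms, 2*order respiratory terms,
    then respiratory volume and heart rate (assumed layout).
    """
    rows = []
    for pc, pr, rv, hr in zip(cardiac_phase, resp_phase,
                              resp_volume, heart_rate):
        rows.append(fourier_terms(pc, order) + fourier_terms(pr, order)
                    + [rv, hr])
    return rows
```

With `order=3`, each row has 6 + 6 + 2 = 14 entries, matching the number of physiological confounds reported above.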
Statistical analysis of imaging data. Data were analyzed using a random-effects general linear model (GLM). Four regressors of interest were defined at the single-subject level (valid and invalid trials for left and right targets, respectively). For each of these regressors, two parametric modulators were defined. The first parametric modulator was the (subject-specific) attentional weight (precision) for the cued location, $\alpha(\hat{\pi}_1^{(t)})$. The second parametric modulator was the (subject-specific) volatility estimate ($\mu_3^{(t)}$), orthogonalized with respect to $\alpha(\hat{\pi}_1^{(t)})$ (cf. den …). Error trials (anticipated responses and incorrect/missing responses) were modeled separately. Events were time locked to the onset of the target and the resulting stimulus functions were convolved with a canonical hemodynamic response function (and its first and second derivative). The four rest periods, six movement parameters of the (rigid body) realignment, and the physiological regressors (see above) were included in the design matrix as additional regressors. Data were high-pass filtered at 1/128 Hz. For each subject, 12 condition-specific contrast images were created (one for each of the four trial types and each of the three regressors: the main regressor and the two parametric modulators).
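Orthogonalizing one parametric modulator with respect to another amounts to removing the part of the second that is linearly predictable from the first. The following sketch shows this for two trialwise vectors after mean-centering; it is a generic Gram-Schmidt step offered as an illustration, not the SPM routine itself, and the function names are hypothetical.

```python
def mean_center(values):
    """Subtract the mean from every element."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def orthogonalize(second, first):
    """Return the mean-centered `second` modulator with its projection onto
    the mean-centered `first` modulator removed."""
    f = mean_center(first)
    s = mean_center(second)
    ff = sum(a * a for a in f)
    if ff == 0.0:
        return s  # first modulator is constant; nothing to remove
    beta = sum(a * b for a, b in zip(f, s)) / ff
    return [b - beta * a for a, b in zip(f, s)]
```

After this step, any variance shared between the two modulators is attributed to the first one (here, precision-dependent attention), so effects of the second (volatility) reflect only what the first cannot explain.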
For the main hemodynamic response function (HRF) regressor and the two parametric regressors precision $\alpha(\hat{\pi}_1^{(t)})$ and volatility $\mu_3^{(t)}$, the respective contrast images were analyzed according to 2 (validity: valid/invalid) × 2 (hemifield of target presentation: left/right) within-subject second-level ANOVAs. We focused on the analysis of main effects of validity, as well as its interaction with hemifield, by using planned t-contrasts. Moreover, to characterize the saccade network in the present study, we tested for a positive effect of the HRF regressor across all four conditions (valid and invalid left and right trials) in relation to the implicit baseline. In the analyses of the parametric regressors, we also tested for positive or negative effects across all four conditions. All contrasts were thresholded at p < 0.05 familywise error whole-brain corrected at the cluster level (with a voxel-level cutoff of p < 0.001).
Bayesian model selection of alternative GLMs. To determine whether the observed responses in the areas revealed by the ANOVA on the $\alpha(\hat{\pi}_1^{(t)})$ regressor were best explained by precision-dependent attention as derived from the hierarchical Bayesian model, we calculated log-evidence maps in these regions for the first-level GLM with precision-dependent attention $\alpha(\hat{\pi}_1^{(t)})$ for each subject. Moreover, log-evidence maps were calculated for GLMs with probability estimates derived from a Rescorla-Wagner model and from a model with the true probabilities as parametric regressors. The log-evidence maps from these three alternative models in the clusters of interest were compared at the second level using Bayesian model selection to evaluate group-level posterior probabilities.
DCM. To investigate effective connectivity and compare different models of functional architecture, DCM was performed using SPM12 (r6225).

Time series extraction.
DCMs were fitted to distributed BOLD time series from individual subjects. Subject-specific time series were extracted from specific ROIs that were selected on the basis of the group GLM analysis. Time series were extracted from the nearest local maximum within a radius of 8 mm from the group maximum in right FEF, TPJ, and the putamen. The first eigenvariate was then computed across all voxels within 6 mm of the subject-specific maximum. The resulting time series were adjusted for effects of no interest (e.g., rest periods, error trials) and physiological confounds so that the analyses focused on BOLD responses reflecting the effects of valid and invalid cueing and the modulation by precision (effects of the volatility regressor were also excluded).
Specification of DCMs and the model space. On the basis of our GLM results, we specified bilinear deterministic DCMs . DCMs are defined in terms of fixed (endogenous) connections between brain areas and input-specific changes in the strength of these connections (i.e., modulatory or bilinear effects). In the present analysis, we focused on the connectivity between those three regions in which the response to invalidly versus validly cued targets was modulated by ␣͑ 1 ͑t͒ ͒ according to our GLM results (i.e., areas FEF, TPJ, and putamen in the right hemisphere). In all of the models we compared, we assumed full endogenous connectivity among the three areas (see Fig. 5). The target stimuli (collapsed over left and right validly and invalidly cued targets) were used as driving inputs. Driving input was assumed to enter the putamen because it receives visual input from the superior colliculi and thalamus (Redgrave and Gurney, 2006). Furthermore, we explored whether additional driving inputs into FEF and/or TPJ, respectively, improved the model. For simplicity, we did not include lower-level visual areas in our DCMs, focusing instead on the subgraph modeling of the attentional network engaged by our paradigm. Because the ranking of FEF and TPJ within the visual hierarchy is unclear, we specified three model families to determine whether the sensory input drives: (1) putamen and FEF; (2) putamen and TPJ; or (3) putamen, FEF and TPJ. For each of these three families, we specified models with different modulatory (bilinear) effects (see Fig. 5). Specifically, we tested whether the source of the precision-dependent processing was located in the putamen, FEF, or TPJ and how self-connections or efferent connections between the source region and connected regions changed as a function of ␣͑ 1 ͑t͒ ͒ on valid and invalid trials. 
Systematic combinations of the source of precision and its modulation of efferent connections resulted in 12 models for each visual input family (see Fig. 5). Because $\alpha(\hat{\pi}_1^{(t)})$ reflects the amount of precision-dependent attentional resources directed to the cued location, we expected differential modulatory effects for valid and invalid trials: namely, negative modulation on valid trials and positive modulation on invalid trials (compare the results for the contrast of parametric regressors in the GLM analysis). Note that the DCM parameters for fixed and bilinear connections have zero prior means so that the directions of the modulations are not a priori included in the models.
Model selection and parameter inference. We used a two-step fixed-effects approach to Bayesian model selection to determine which model best explained our observed responses in putamen, FEF, and TPJ. This analysis assumes that all subjects engage the same network for attentional deployment, but with different connection strengths. In a first step, we used family-level inference to determine whether models with visual input into putamen and FEF, putamen and TPJ, or all three areas best explained the observed data. Second, the models of the winning family were compared to identify the most plausible DCM of precision-related effects. The parameters of the winning DCM were summarized by Bayesian parameter averaging, which computes a joint posterior density for the entire group by combining the individual posterior densities (Neumann and Lohmann, 2003; Garrido et al., 2007).
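The two summary steps described here can be illustrated in simplified form. The sketch below assumes independent scalar Gaussian posteriors per subject for Bayesian parameter averaging (group precision as the sum of subject precisions, group mean as the precision-weighted mean) and, for family-level inference, flat model priors with families of equal size, so that a family's posterior probability is the sum over its member models. These are simplifying assumptions, and all function names and inputs are hypothetical.

```python
import math


def bayesian_parameter_average(posteriors):
    """Combine per-subject (mean, variance) pairs for a single parameter.

    Returns the group posterior (mean, variance) under the assumption of
    independent scalar Gaussian posteriors.
    """
    precisions = [1.0 / var for _, var in posteriors]
    group_precision = sum(precisions)
    group_mean = sum(p * mean for p, (mean, _) in zip(precisions, posteriors))
    group_mean /= group_precision
    return group_mean, 1.0 / group_precision


def family_posteriors(summed_log_evidence, family_of):
    """Family-level posterior probabilities from group log evidences.

    summed_log_evidence: dict model -> log evidence summed over subjects.
    family_of:           dict model -> family label.
    Assumes flat model priors and equally sized families.
    """
    largest = max(summed_log_evidence.values())
    weights = {m: math.exp(le - largest)
               for m, le in summed_log_evidence.items()}
    norm = sum(weights.values())
    families = {}
    for model, w in weights.items():
        label = family_of[model]
        families[label] = families.get(label, 0.0) + w / norm
    return families
```

In the precision-weighted average, subjects whose connection estimates are more certain contribute more to the group estimate, and the group variance shrinks as more subjects are combined.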

Behavioral data
The percentage of anticipated responses, incorrect or missing saccades, and saccades not starting from the fixation zone amounted to 1.6% (± 0.5 SEM), 2.3% (± 0.4), and 4.4% (± 1.2), respectively. These trials were excluded from further analysis of behavioral data and were modeled separately in the analysis of the fMRI data.
The comparison of the relative log-model evidences between the hierarchical Bayesian model, a Rescorla-Wagner model, and a model informed by true %CV values revealed that the Bayesian model was clearly superior to the alternative models in explaining variations in RS (posterior probability of the Bayesian model = 1.0).

fMRI data
Whole-brain SPM analysis. In a first step, contrast images of the main HRF regressor were analyzed according to a 2 (validity: valid/invalid) × 2 (hemifield: left/right) factorial design (Table 1). Moreover, those brain areas that were generally more active in the experimental task than in the implicit baseline were identified. This contrast disclosed activation in bilateral precentral gyrus/FEF, left IPS, bilateral cerebellum, bilateral V1 and V5, as well as in the bilateral putamen.
The network that was activated by invalid versus valid cues is shown in Figure 3 and comprised bilateral precentral gyrus/FEF (extending into ventral frontal cortex), IPS (extending into the inferior parietal lobe and into the postcentral gyrus in the left hemisphere), and middle temporal gyrus (see Table 1 for a complete list of activated regions and MNI coordinates). Figure 3B shows the overlay between the activity related to eye movements per se (as tested using the contrast of all trials vs baseline above, depicted in blue) and the effects of validity in the FEF. No significant effects were obtained when testing for interaction effects of validity with the hemifield of target presentation.
The ANOVA on the precision-dependent attention regressor ␣͑ 1 ͑t͒ ͒ did not reveal positive or negative effects of ␣͑ 1 ͑t͒ ͒ across all trials (Table 2). However, we observed a significant main effect of validity: precision-dependent attentional reorienting effects were expressed in the right FEF (x ϭ 42, y ϭ 4, z ϭ 42, 314 voxels, Z ϭ 3.9), right TPJ (x ϭ 46, y ϭ Ϫ46, z ϭ 6, 214 voxels, Z ϭ 3.75) and the right anterior putamen (x ϭ 22, y ϭ 16, z ϭ 4, 173 voxels, Z ϭ 4.38) (see red activations in Fig.  3C). In these areas, the sensitivity to ␣͑ 1 ͑t͒ ͒was significantly higher for invalid than for valid trials, with positive slopes for invalid and negative slopes for valid trials (cf. bar charts in Fig. 3C). In other words, higher confidence or a more precise prediction that the target would appear at the cued location decreased BOLD amplitudes on valid trials (when the prediction was fulfilled) and increased BOLD amplitudes on invalid trials (when the target appeared at the uncued location and the prediction was violated). Again, there was no interaction effect with the hemifield of target appearance. The analyses of parametric effects with the trial-specific volatility estimates ( 3 ͑t͒ ) did not reveal any significant effects.

Bayesian model selection of alternative GLMs
In analogy to the model comparison of the behavioral data, we tested whether brain responses in FEF, TPJ, and the putamen were best explained by the hierarchical Bayesian model, or whether simpler models such as the Rescorla-Wagner model or a model based on true %CV would provide better explanations. Voxelwise log-evidence maps were calculated for the three alternative models in the three ROIs in each subject and compared at the second level. In all three regions, the posterior probabilities were clearly higher for the Bayesian model than for the other two models (Fig. 4) and there were no voxels in which responses were better explained by the Rescorla-Wagner or true %CV model.
[Figure 5 caption, partially recovered: In a first step, the three model families were compared to reveal the most likely sources of driving inputs. Subsequently, the optimal model within the winning family was determined. PUT, Putamen.]

DCM
DCM was used to shed light on the context-sensitive interactions between those areas that our SPM analysis found to exhibit precision-dependent attention effects (i.e., right putamen, FEF, and TPJ; Fig. 5). First, family-level Bayesian model selection was used to reveal the most likely configuration of driving inputs. Models with visual input into putamen and TPJ were clearly superior to models with input into putamen and FEF or into all three regions (posterior probability = 1.0). Among the 12 models of the winning visual input family, model 10 was superior to the other 11 models (posterior probability = 0.99). Figure 6 depicts the results of the Bayesian parameter averaging across subjects for the connections of this model and their attentional modulation by precision.
Inspection of the modulatory parameters on valid and invalid trials showed that higher values of precision-dependent attention $\alpha(\hat{\pi}_1^{(t)})$ decreased connectivity from TPJ to FEF on valid trials. In contrast, connectivity from TPJ to FEF strongly increased with higher $\alpha(\hat{\pi}_1^{(t)})$ on invalid trials. The modulations of the TPJ→putamen connection showed the same pattern, but the posterior probabilities just failed to reach the 90% threshold. However, the parameters were higher for invalid than for valid trials for both the TPJ→FEF and TPJ→PUT connections (Fig. 6).
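The group-level summary by Bayesian parameter averaging and the comparison against a posterior probability threshold can be sketched as follows. This is a minimal Python illustration, assuming Gaussian subject-level posteriors and, optionally, a Gaussian prior shared by all subjects; the function names and toy numbers are hypothetical, and this section does not state whether the 90% criterion in the study was computed in exactly this way.

```python
import math

import numpy as np


def bayesian_parameter_average(means, covariances,
                               prior_mean=None, prior_cov=None):
    """Combine Gaussian subject posteriors into one group posterior.

    Precisions (inverse covariances) add up across subjects and the
    group mean is the precision-weighted average. If a shared prior is
    given, it is counted only once by subtracting it (n - 1) times.
    """
    means = [np.atleast_1d(np.asarray(m, dtype=float)) for m in means]
    precisions = [np.linalg.inv(np.atleast_2d(np.asarray(c, dtype=float)))
                  for c in covariances]
    n_subjects = len(means)
    group_precision = sum(precisions)
    weighted_means = sum(p @ m for p, m in zip(precisions, means))
    if prior_cov is not None:
        prior_precision = np.linalg.inv(
            np.atleast_2d(np.asarray(prior_cov, dtype=float)))
        if prior_mean is None:
            prior_mean = np.zeros_like(means[0])  # zero prior mean
        prior_mean = np.atleast_1d(np.asarray(prior_mean, dtype=float))
        group_precision = group_precision - (n_subjects - 1) * prior_precision
        weighted_means = weighted_means - (n_subjects - 1) * (
            prior_precision @ prior_mean)
    group_cov = np.linalg.inv(group_precision)
    return group_cov @ weighted_means, group_cov


def prob_above_zero(mean, variance):
    """P(parameter > 0) for a Gaussian posterior N(mean, variance)."""
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * variance)))


# Toy example (hypothetical): one modulatory parameter, two subjects.
toy_mean, toy_cov = bayesian_parameter_average(
    means=[[0.0], [4.0]], covariances=[[[1.0]], [[3.0]]])
toy_prob = prob_above_zero(toy_mean[0], toy_cov[0, 0])
exceeds_threshold = toy_prob > 0.90
```

A negative modulation would be assessed analogously with `1 - prob_above_zero(...)`. Subjects with more precise estimates contribute more to the group mean, which is why the averaged parameters can differ from a simple arithmetic mean across subjects.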

Discussion
In this study, we combined computational modeling of psychophysical data with fMRI to investigate how probabilistic inference guides the deployment of visuospatial attention and explains the activity of attentional networks. Replicating previous results, RS was affected by probabilistic context (%CV) and could plausibly be explained by Bayesian updates of precision that govern the deployment of attention. Neuronal activity was enhanced on invalid compared with valid trials in a bilateral frontoparietal network. In the right FEF, TPJ, and putamen, reorienting-related responses were modulated by the precision of (or confidence in) the belief that the target will appear at the cued location. DCM suggested that precision-dependent attention differentially modulated connectivity between TPJ and FEF, as well as between TPJ and the putamen, on valid and invalid trials. The trial-by-trial connection strength from TPJ to FEF changed with precision-dependent attention, depending upon the outcome: connection strength decreased with precision on valid trials and increased on invalid trials. These context-sensitive coupling changes may be interpreted as a reflection of the optimal deployment of attentional resources mediated by (Bayes-optimal) precision updates.
Contrasting invalid and valid trials revealed brain areas of two well-described attention networks (i.e., dorsal and ventral frontoparietal regions). The TPJ, for which a right-hemispheric lateralization has been proposed (but see Geng and Vossel, 2013), was activated in the left hemisphere in this contrast between all invalid and valid trials for the main HRF regressor. However, the right TPJ showed a parametric reorienting response that was modulated by precision-dependent attention. This finding is consistent with previous results according to which right TPJ activity is modulated by the explicit knowledge of %CV (Vossel et al., 2006; Vossel et al., 2012). In addition, the right putamen and right FEF showed parametric precision-dependent reorienting responses. More specifically, these regions showed decreased activity on valid trials and this deactivation grew with the precision of the prediction. Conversely, activity was increased on invalid trials and this activation increased with precision.
The effect in this right-hemispheric network did not depend on the hemifield of target presentation (or saccade direction, respectively). From the perspective of monkey single-unit recordings, one could have expected contralateral activation in the FEF. However, human imaging studies often fail to report this lateralization, which might be too subtle to be picked up with fMRI (Neggers et al., 2012). Interestingly, fMRI studies using anti-saccade tasks have observed a preferential involvement of right frontal cortex structures in antisaccades compared with prosaccades (regardless of saccade direction; McDowell et al., 2002;Desouza et al., 2003) and gray matter volume within right FEF is negatively correlated with anti-saccade errors (Ettinger et al., 2005). These findings highlight the different specialization of left and right FEF, which has also been demonstrated by concurrent transcranial magnetic stimulation (TMS)/fMRI studies (Ruff et al., 2009). Interestingly, Ronconi et al. (2014) showed that TMS of the right (but not left) FEF disrupts updates of attentional focus (zooming) in response to cues with different spatial precision. This dynamic zooming may also play a role in the precision-dependent FEF response in the present study.
The involvement of the putamen in the present study could be related to two factors. First, the putamen has been implicated in the processing of stimulus contingencies and probabilistic inference and the coding of prediction errors (Liljeholm and O'Doherty, 2012). With regard to attentional reorienting, it has been observed that basal ganglia activation is affected by the degree to which reorienting can be expected (Shulman et al., 2009) and this is consistent with the present findings. Second, functional and structural imaging data have highlighted the role of the putamen in the control of saccadic eye movements and have challenged the designation of the nucleus caudatus and putamen as "oculomotor" versus "skeletomotor" striatum, respectively (Neggers et al., 2012). In the study by Neggers et al. (2012), the putamen was consistently involved in three different saccade tasks and was principally connected with FEF subregions, as revealed by DTI fiber tracking. In the present task as well, the mere execution of saccades activated the putamen (Table 1). Given its involvement in both probabilistic inference per se and eye movement control, the results of the present study motivate future research on the generality and specificity of the putamen in Bayesian inference in different attentional and motor-intentional cognitive systems.
Comparison of different model families revealed that visual stimulation evoked by the target stimuli was primarily conveyed to the TPJ and the putamen. Moreover, model comparison within this winning model family identified the TPJ as a likely source of precision-dependent effects on putamen and FEF. High values of precision (high certainty about the imminent saccade target location) decreased the modulatory influences from the TPJ to the FEF on valid trials, whereas the reverse effect (increased connectivity) was observed for invalid trials. A similar but weaker pattern was observed for the modulatory influences from the TPJ to the putamen. The TPJ is the key node of the ventral attention system of the human brain, whereas the FEF is part of the dorsal system. Previous DCM of the architecture of the dorsal and ventral system has highlighted the role of ventral to dorsal modulatory influences during attentional reorienting (Vossel et al., 2012). In this previous study, invalid cueing enhanced connectivity between visual areas and the right TPJ, as well as between the right TPJ and the IPS. Our present finding is consistent with this earlier result, because visual stimulation drove right TPJ activity and ventral to dorsal pathways (TPJ to FEF) were differentially modulated on invalid and valid trials. These findings do not necessarily imply that the right TPJ sends an early signal that triggers attentional reorienting. This idea was initially proposed by Corbetta and Shulman (2002), but more recent work provides evidence against an early reorienting signal in the right TPJ (Geng and Vossel, 2013;Macaluso and Doricchi, 2013;DiQuattro et al., 2014;Han and Marois, 2014). Instead, the TPJ might be involved in the updating of internal models (Geng and Vossel, 2013;Han and Marois, 2014) and the detection of mismatches between expected and actual stimuli (Doricchi et al., 2010). 
Our present finding can be plausibly interpreted along these lines: depending on the confirmation or violation of the model predictions that the target appears at the cued location, the TPJ suppressed or boosted activity in the FEF (and, to a lesser degree, also in the putamen), respectively. Given that the right FEF is involved in the top-down controlled allocation of attentional resources and the scaling of the attentional focus (Ronconi et al., 2014), the TPJ inputs could be regarded as update signals to these regions after the observation of new outcomes.
From a hierarchical message-passing perspective, one might expect that precision effects are reflected by changes in postsynaptic gain. DCM studies of other tasks using EEG and MEG have supported this notion. In contrast to DCM for M/EEG, where changes in gain are represented by changes of postsynaptic response amplitude (colloquially referred to as "intrinsic connectivity"), DCM for fMRI only allows for a more phenomenological representation of gain via the modulation of self-connections. In the present study, models with modulations of self-connections were inferior to models with modulations of interregional connections. Given that DCM for M/EEG relies upon more detailed neurobiological models with different cell populations and a more veridical representation of gain, our future work will use EEG or MEG to study the precision-dependent effects observed in this study.
Although the effects were weaker, precision-dependent attention also affected the connection from the TPJ to the putamen. It has been proposed that the striatum receives a phasic dopaminergic input via retino-tecto-nigro-striatal projections (Redgrave and Gurney, 2006). Given its short latency (70-100 ms), this dopamine signaling is presumably based on preattentive/presaccadic sensory processing and may underlie the learning of contingencies. One could speculate that these signals provide the basis for an early reorienting response to unexpected events. However, such an effect was not observable in our DCM results (i.e., we did not find modulations of efferent connections from the putamen). Instead, the DCM results were dominated by TPJ signaling. Nevertheless, this does not preclude the existence of both early and late modulations, and neuromodulatory (e.g., dopaminergic) influences are likely to be related to precision-dependent gain in cortical systems.
The test for general effects of the precision-dependent attention regressor across valid and invalid trials did not reveal any significant effects (nor did the test of the volatility regressor). The design of our experimental task, in which the target followed the cue with a fixed 800 ms stimulus onset asynchrony, did not allow for a separate characterization of cue- and target-related BOLD responses. The finding of significant validity effects of precision-dependent attention, in the absence of a general condition-unspecific effect, may indicate that the signal was dominated by the target-related response with differential parametric effects (i.e., a negative modulation for valid and a positive modulation for invalid trials). M/EEG will be helpful in investigating preparatory and target-related responses and their modulation by precision-dependent attention in the future.