Abstract
Contextual cues are predictive and provide behaviorally relevant information; they are not the main objective of the current task but can make behavior more efficient. Using fMRI, we investigated the brain networks involved in representing contextual information and translating it into an attentional control signal. Human subjects performed a visual search task for a low-contrast target accompanied by a single non-target that was either perceptually similar or more salient (i.e., higher contrast). Shorter reaction times (RTs) and higher accuracy were found on salient trials, suggesting that the salient item was rapidly identified as a non-target and immediately acted as a spatial “anti-cue” to reorient attention to the target. The relative saliency of the non-target determined BOLD responses in the left temporoparietal junction (TPJ) and inferior frontal gyrus (IFG). IFG correlated with RT specifically on salient non-target trials. In contrast, bilateral dorsal frontoparietal regions [including the frontal eye fields (FEFs)] were correlated with RT in all conditions. Effective connectivity analyses using dynamic causal modeling found an excitatory pathway from TPJ to IFG to FEF, suggesting that this was the pathway by which the contextual cue was translated into an attentional control signal that facilitated behavior. Additionally, the connection from FEF to TPJ was negatively modulated during target-similar trials, consistent with the inhibition of TPJ by dorsal attentional control regions during top-down serial visual search. We conclude that left TPJ and IFG form a sensory-driven network that integrates contextual knowledge with ongoing sensory information to provide an attentional control signal to FEF.
Introduction
Perceptually salient information can frequently act as an attentional cue for less obvious objects. For example, a flashing construction sign on the road may indicate that nearby cars are about to merge; seeing an animal startle may indicate the presence of a predator. The more perceptually salient stimuli in these examples are behaviorally relevant because they direct attention to more critical, but harder to detect, objects. Although stimuli may be behaviorally relevant for a variety of reasons, the salient stimuli in these situations are relevant only because they are predictive cues that enhance behavior. Such contingencies are relatively common in daily life, but there is little understanding of how knowledge associated with a sensory stimulus is translated into an attentional control signal.
Existing studies that have examined the brain networks underlying detection of behaviorally relevant stimuli have generally defined “relevance” by target features, for example, targets in unexpected locations (Arrington et al., 2000; Kincade et al., 2005; Vossel et al., 2006; Indovina and Macaluso, 2007; Doricchi et al., 2010), target-colored distracters (Serences et al., 2005; Hu et al., 2009), or target-relevant cues (Shulman et al., 2009; Geng and Mangun, 2011). These studies have identified a right-lateralized ventral frontoparietal network, including the temporoparietal junction (TPJ) and inferior frontal gyrus (IFG) (Corbetta and Shulman, 2002; Fox et al., 2006). Stimulus-driven activity in this network is hypothesized to reorient attention through connections with dorsal frontoparietal attentional control regions (Shulman et al., 2003; Corbetta et al., 2008; Geng and Mangun, 2011), which modulate sensory cortex directly (Kastner and Ungerleider, 2000; Moore and Armstrong, 2003; Ruff et al., 2006; Bressler et al., 2008).
Although right TPJ and IFG are clearly involved in stimulus-driven attentional control, there is increasing evidence that left TPJ also encodes aspects of behavioral “relevance.” In addition to frequent (but less emphasized) coactivation with right TPJ, left TPJ has been hypothesized to orient attention toward stimuli that match a target “template” (Doricchi et al., 2010), relative target saliency (Weidner et al., 2009), or episodic memories (particularly verbal ones) (Cabeza et al., 2008; Ciaramelli et al., 2008; Hutchinson et al., 2009; Ravizza et al., 2011). These hypothesized roles for left TPJ share the theme of encoding non-visuospatial, task-relevant features. Thus, there may be a homologous left-lateralized ventral network that is stimulus driven and reorients attention based on contextual knowledge but differs from the right-lateralized network in the class of information by which relevance is defined.
The critical question for the current study was whether detection of a sensory feature that provides task-relevant information, but is not the target itself, would 1) activate a left-lateralized ventral network and 2) reorient attention via connections with dorsal control regions. We measured BOLD responses in a visual search task in which a salient object could be rapidly identified as a non-target and immediately redirect attention toward the target. Our results provide the first clear demonstration that a left-lateralized TPJ–IFG network is involved in sensory-driven attentional control and initiates attentional orienting via connectivity with frontal eye fields (FEFs).
Materials and Methods
Participants.
Twenty-one healthy adults (mean age, 23.8 ± 5.1 years; range, 18–39 years; seven males; 20 right-handed) participated for payment. All gave written informed consent in accordance with the local ethics clearance as approved by the National Institutes of Health. All had normal or corrected-to-normal vision. Handedness was determined by a shortened version of the Edinburgh handedness inventory (Oldfield, 1971).
Task design.
Each trial began with a cross (0.19° visual angle) blinking once. The blink occurred 500 ms before the onset of a search display and signaled the beginning of the trial. The visual search display consisted of two “t”-like stimuli that were visible for 200 ms (Fig. 1). One object was always the target and the other a non-target. We use the term “non-target” to emphasize the fact that this second object could provide some task-relevant information and is therefore not a “distracter” per se. A variable fixation interval ranging from 1900 to 5900 ms followed the search display. Targets were upright or inverted “t” stimuli, and non-targets were identical but rotated 90° to the left or right from vertical. Targets were always low contrast (Michelson contrast ratio of 0.45; foreground, 35.5 cd/m2; background, 93.5 cd/m2), whereas non-targets were identical to the target on 50% of trials and appeared at a higher contrast on the other 50% of trials (Michelson contrast ratio of 0.91; foreground, 7.1 cd/m2; background, 160.3 cd/m2). Subjects were told in advance that the non-target would sometimes be high contrast but that the target would never be high contrast. Thus, the high-contrast non-target could be used as an anti-cue for the target. We refer to trials with the high-contrast non-target as “salient” and trials with low-contrast non-targets as being target “similar.” There is ample evidence that high-contrast stimuli capture attention relative to their low-contrast counterparts (Mansfield, 1973; Tartaglione et al., 1975; Ling and Carrasco, 2006; Lee et al., 2007; Proulx and Egeth, 2008; Geng and Diquattro, 2010; Mazaheri et al., 2011). The horizontal distance of the nearest edge of the stimuli to fixation was ±2.95° of visual angle, and the vertical distance was −0.85° of visual angle. The stimuli themselves subtended 0.85° visual angle at fixation. The target was equally likely to appear in the left and right visual fields.
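The contrast ratios quoted for the two stimulus types can be checked against the listed luminances. The sketch below assumes the standard Michelson definition, (Lmax − Lmin) / (Lmax + Lmin), applied to the foreground and background luminances of each stimulus; the function name is illustrative only. Under that assumption the values come out close to the reported ratios.

```python
def michelson_contrast(luminance_a, luminance_b):
    """Michelson contrast of two luminances (cd/m2): (Lmax - Lmin) / (Lmax + Lmin)."""
    l_max = max(luminance_a, luminance_b)
    l_min = min(luminance_a, luminance_b)
    return (l_max - l_min) / (l_max + l_min)


# Foreground and background luminances as listed in the task description.
low_contrast = michelson_contrast(35.5, 93.5)    # target and similar non-target
high_contrast = michelson_contrast(7.1, 160.3)   # salient non-target

print(f"low-contrast stimulus:  {low_contrast:.3f}")
print(f"high-contrast stimulus: {high_contrast:.3f}")
```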
The background throughout the experiment was an intermediate gray (77.8 cd/m2).
Trial procedure: the beginning of each trial was indicated by a blink of the fixation cross. After 500 ms, a search display consisting of one target and one non-target appeared. The non-target was similar (i.e., identical in contrast to the target) on 50% of trials and salient (i.e., higher contrast) on 50% of trials.
Participants were instructed to find the upright or inverted “t” and report its orientation while maintaining their gaze on the fixation cross. Results from a similar paradigm that did allow eye movements have been reported previously (Geng and Diquattro, 2010; Mazaheri et al., 2011). A target and non-target appeared on every trial. Manual button presses were used to indicate the orientation of the target “t”: an upright “t” was indicated with the right index finger and inverted “t” with the right middle finger. The participant's compliance with maintaining fixation was confirmed using an Applied Science Laboratories Eyetrac 6 sampling at 60 Hz (Fig. 2).
Eye tracking.
Eye position data for 16 participants (data from five were unusable because of excessive noise) were normalized to the mean value from the prestimulus cue period (i.e., all data points from the fixation blink to the onset of the search display) to account for drift. Any data points found beyond the spatial constraints of the experiment (greater than ±10° of visual angle from fixation) were removed because they were likely caused by artifacts unrelated to experimental conditions (e.g., blinks, loss of pupil). Eye-position data were analyzed separately for the period when the visual search display was visible (200 ms) and the subsequent 200 ms. Inclusion of eye data was also used as a between-subjects factor in analyses of behavioral and brain data to test whether subjects without usable eye-tracking data performed differently from those with usable data (note that all subjects believed that their eyes were being tracked). There were no differences (see below, Results).
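The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the original analysis code: it assumes that "normalized to the mean" means subtracting the prestimulus mean from each sample, that positions are horizontal gaze in degrees of visual angle, and that the function names and example values are hypothetical.

```python
from statistics import mean

ARTIFACT_LIMIT_DEG = 10.0  # samples beyond +/-10 deg are treated as artifacts


def clean_trial(baseline_samples, search_samples):
    """Drift-correct one trial and drop artifact samples.

    baseline_samples: gaze positions from fixation blink to search onset.
    search_samples: gaze positions from the analysis window of interest.
    """
    baseline = mean(baseline_samples)  # assumed: subtractive drift correction
    corrected = [s - baseline for s in search_samples]
    return [s for s in corrected if abs(s) <= ARTIFACT_LIMIT_DEG]


def fraction_within(samples, window_deg=1.0):
    """Proportion of samples within +/-window_deg of fixation."""
    if not samples:
        return float("nan")
    return sum(abs(s) <= window_deg for s in samples) / len(samples)


# Hypothetical trial: slight drift in the baseline, one blink-like artifact.
cleaned = clean_trial([0.4, 0.6, 0.5], [0.5, 0.7, 1.8, 25.0])
proportion_at_fixation = fraction_within(cleaned)
```

Applying `fraction_within` to the pooled, cleaned samples is one way to obtain a summary such as the proportion of samples within ±1° of fixation.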
Imaging data.
MRI data were acquired from a 3 T Siemens Trio scanner equipped with an eight-channel phased array head coil. A T2*-weighted echo planar imaging (EPI) sequence was used to acquire volumes of 31 slices of 3 mm thickness (3.4 × 3.4 mm in-plane resolution) with a distance factor of 10%, every 1750 ms. Slices were oriented to achieve whole-brain coverage. One hundred ninety-nine volumes were collected in each session for five runs. Image data were analyzed using SPM8 (Wellcome Department of Imaging Neuroscience, London, UK; Friston et al., 1995). Images were realigned and unwarped to correct for interactions between movement and field inhomogeneities (Andersson et al., 2001), normalized to the MNI EPI template available in SPM8, and resampled to a resolution of 2 × 2 × 2 mm. The data were additionally smoothed with a three-dimensional 9 mm full-width half-maximum Gaussian kernel. High-resolution T1-weighted structural images were acquired using an MPRAGE sequence, coregistered with each subject's EPI images, and normalized to the MNI template brain. Results are displayed on an average structural image created from normalized T1-weighted images from our participants.
The data were modeled for each voxel using a general linear model (GLM) that included regressors obtained by convolving each event-related unit impulse (“stick function”) with a canonical hemodynamic response function. The main GLM included four experimental conditions given by crossing target location (left, right) and non-target salience (similar, salient). In addition to the four conditions of interest (given by the 2 × 2 factorial design), errors, scan session, and realignment parameters associated with movement artifacts were modeled separately as variables of non-interest. Condition-specific effects estimated by the GLM were entered into a group-level analysis as contrast images, which were then entered into independent one-sample t tests. A second GLM was identical to the first but with the addition of RT entered as a trial-specific parametric regressor for each condition. RTs for each condition were scaled by Euclidean normalization and then mean-centered. The four RT regressors were created by scaling the expected conditional BOLD response by trial RT; the resultant parameter estimates represent the degree to which BOLD activation in response to a particular stimulus condition is scaled by trial RT. We refer to this GLM as the “RT model.”
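The construction of the RT regressors can be illustrated with a short sketch. It is written under several assumptions that are not taken from the original analysis: the hemodynamic response is approximated by a generic double-gamma function (shape parameters 6 and 16, undershoot ratio 1/6) sampled once per scan, onsets are given in whole scans, and all names and example values are hypothetical. SPM8 builds these regressors internally at a finer temporal resolution.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(tr, duration=32.0):
    """Assumed double-gamma response sampled every `tr` seconds, scaled to sum to 1."""
    n = int(duration / tr) + 1
    h = [gamma_pdf(i * tr, 6) - gamma_pdf(i * tr, 16) / 6.0 for i in range(n)]
    total = sum(h)
    return [v / total for v in h]


def rt_modulator(rts):
    """Scale RTs by their Euclidean norm, then mean-center them."""
    norm = math.sqrt(sum(r * r for r in rts))
    scaled = [r / norm for r in rts]
    center = sum(scaled) / len(scaled)
    return [s - center for s in scaled]


def convolve(signal, kernel):
    """Causal convolution truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k
    return out


def build_regressors(onset_scans, rts, n_scans, tr):
    """Return (condition regressor, RT-modulated regressor) for one condition."""
    hrf = double_gamma_hrf(tr)
    weights = rt_modulator(rts)
    sticks = [0.0] * n_scans
    modulated = [0.0] * n_scans
    for onset, w in zip(onset_scans, weights):
        sticks[onset] += 1.0     # unit impulse ("stick function")
        modulated[onset] += w    # same impulse scaled by the centered RT
    return convolve(sticks, hrf), convolve(modulated, hrf)


# Hypothetical condition with three trials in a 199-scan run (TR = 1.75 s).
main_reg, rt_reg = build_regressors([2, 10, 20], [600.0, 700.0, 800.0], 199, 1.75)
```

Because the modulator is mean-centered, the RT regressor captures trial-to-trial deviations around the mean response of its condition rather than the mean response itself.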
Regions of interest (used for dynamic causal modeling).
Each ROI was a spherical volume with a radius of 3 mm (i.e., 19 voxels). Mean-adjusted data (i.e., first eigenvariate of the time series) from each participant were extracted from all voxels within left FEF, TPJ, and IFG. Selection of ROIs was based on results from group random-effects analyses from the contrast of salient minus similar trials. FEF coordinates were based on the group random-effects analyses of the conjunction of all conditions in the RT correlation model (p < 0.05, corrected; cluster size, 379 voxels). Although this conjunction analysis produced significant activation in a number of regions, the BOLD was expected to be highly correlated between regions (Fox et al., 2005); FEF was chosen as the representative “dorsal network” node because of its direct role in top-down attentional selection (Moore and Armstrong, 2003; O'Shea et al., 2004; Ruff et al., 2006; Buschman and Miller, 2007).
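The voxel count of the spherical ROI follows from the resampled voxel size. The sketch below assumes a 2 × 2 × 2 mm grid (the resampled resolution stated under Imaging data), a sphere centered on a voxel center, and inclusion of every voxel whose center lies within the radius; the function name is illustrative.

```python
from itertools import product


def voxels_in_sphere(radius_mm, voxel_mm):
    """Grid offsets (in voxels) whose centers lie within radius_mm of the origin."""
    reach = int(radius_mm // voxel_mm)
    offsets = range(-reach, reach + 1)
    return [
        (i, j, k)
        for i, j, k in product(offsets, repeat=3)
        if (i * i + j * j + k * k) * voxel_mm ** 2 <= radius_mm ** 2
    ]


roi = voxels_in_sphere(radius_mm=3.0, voxel_mm=2.0)
print(len(roi))  # center voxel + 6 face neighbors + 12 edge neighbors
```

Under these assumptions the sphere contains the center voxel, its six face neighbors (2 mm away), and its twelve edge neighbors (about 2.83 mm away), whereas the eight corner neighbors (about 3.46 mm away) fall outside the radius.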
The ROI center within each individual subject was determined by the local maximum (p < 0.05, uncorrected) closest to peak coordinates from the corresponding group random-effects analysis, within the appropriate anatomical landmark. The group coordinates were used for those without clear individual clusters that met the aforementioned criteria (two for TPJ, eight for IFG). The mean ± SD x,y,z coordinates for individual ROIs were as follows: TPJ, −48 ± 5.6, −52 ± 7.2, 33 ± 4.7; IFG, −49 ± 3.7, 32 ± 4.3, 8 ± 7.0; FEF, −27 ± 4.2, −2 ± 2.7, 53 ± 4.2. The mean values were <2 mm from the peak coordinate calculated from the group results (see below). Using group statistics and anatomical criteria to define ROIs provided individual specificity while still permitting generalization to the population (Ikkai and Curtis, 2007; Stephan et al., 2007a; Geng and Mangun, 2009).
Dynamic causal modeling.
Effective connectivity analyses were conducted using Dynamic Causal Modeling (DCM10) as implemented in SPM8. DCM was used to investigate the connectivity profile of three regions (TPJ, IFG, and FEF) in response to the demands of our task (see above for ROI selection procedure). DCM models effective connectivity between regions by treating the brain as an input–state–output system (Friston et al., 2003; Penny et al., 2004a,b; Stephan et al., 2004, 2007a). The inputs are composed of the experimentally manipulated stimulus events, and the state variables represent the underlying neuronal activity. The outputs are the regional BOLD responses predicted by a biophysical forward model of the hemodynamic response given the state variables. The state of each region is dependent on that of other regions in the model, and this dependency is reflected in the connectivity parameters. The state variables are adjusted in each model to maximize the match between the estimated and observed BOLD responses. DCM models have three types of parameters: (1) driving inputs that describe the response of each region to the experimental stimuli, (2) intrinsic parameters that represent the baseline effective connectivity between regions across the experiment, and (3) modulatory parameters that describe changes in connectivity between regions as a function of the experimental conditions. DCM has been used to successfully model fMRI data in a number of domains (Mechelli et al., 2003; Smith et al., 2006; Stephan et al., 2007b; Leff et al., 2008; Lewis and Noppeney, 2010; Noppeney et al., 2010).
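For reference, the three parameter types map onto the bilinear state equation commonly given for DCM (Friston et al., 2003). The notation below is the conventional one and is an addition for illustration; it is not reproduced from the present analyses.

```latex
\dot{z} = \Bigl( A + \sum_{j=1}^{m} u_j \, B^{(j)} \Bigr) z + C u
```

Here $z$ is the vector of neuronal states (one per region: TPJ, IFG, and FEF), $u$ is the vector of $m$ experimental inputs, $A$ holds the intrinsic connections, $B^{(j)}$ holds the modulation of those connections by input $u_j$, and $C$ holds the driving inputs to each region.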
There were two goals of the DCM analyses. The first was to identify the pattern of intrinsic connectivity between left TPJ, IFG, and FEF. The intrinsic parameters are of interest because they reflect the “default” effective connectivity structure between regions in the current experimental context (i.e., performing visual search knowing that saliency was anti-correlated with the target). The second goal was to determine how the appearance of the non-target on each trial (i.e., similar or salient) modulated the intrinsic connectivity between regions.
We formulated a corpus of models using the three left hemisphere ROIs (FEF, IFG, and TPJ) identified from the group analyses. All models shared a common intrinsic connectivity structure such that all regions were reciprocally interconnected (resulting in six total connections). Models differed in the modulatory and input parameter specifications. To be as inclusive as possible in representing alternative models, we included all possible models with two or fewer modulatory parameters with the constraint that a single connection could only contain one modulatory parameter at a time. This resulted in a set of 72 models composed of the following: six models with one modulatory parameter corresponding to trials with a salient non-target (the modulatory parameter was assigned to a different connection for each model); six models with one similar non-target modulatory parameter; 15 models with two salient modulatory parameters located on every pairwise combination of the six connections; 15 models with two similar modulatory parameters; and 30 models with one salient and one similar parameter organized on every pairwise combination of the six connections. In addition, because we had no strong a priori definition for where driving inputs should be applied (i.e., where stimulus information should enter into the model), each of the 72 models was replicated with inputs to each of the three ROIs. This resulted in a final set of 216 models.
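The size of the model space can be verified by enumeration. The sketch below only counts model specifications as described above; the data structures and names are illustrative and do not correspond to SPM's DCM specification format.

```python
from itertools import combinations, permutations

REGIONS = ("TPJ", "IFG", "FEF")
# Six directed connections among three reciprocally connected regions.
CONNECTIONS = list(permutations(REGIONS, 2))


def modulatory_layouts():
    """All layouts with one or two modulatory parameters, at most one per connection."""
    layouts = []
    for condition in ("salient", "similar"):
        # One parameter of this condition: 6 layouts.
        for conn in CONNECTIONS:
            layouts.append({conn: condition})
        # Two parameters of the same condition: 15 unordered pairs.
        for conn_a, conn_b in combinations(CONNECTIONS, 2):
            layouts.append({conn_a: condition, conn_b: condition})
    # One salient and one similar parameter on different connections: 30 ordered pairs.
    for salient_conn, similar_conn in permutations(CONNECTIONS, 2):
        layouts.append({salient_conn: "salient", similar_conn: "similar"})
    return layouts


layouts = modulatory_layouts()
# Each layout is replicated with the driving input entering each region.
models = [(input_region, layout) for input_region in REGIONS for layout in layouts]

print(len(layouts))  # 6 + 15 + 6 + 15 + 30 = 72
print(len(models))   # 72 * 3 = 216
```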
To best understand the connectivity structure between TPJ, IFG, and FEF in our experiment, we used a combination of family-level inference and Bayesian model averaging within families (Liu et al., 2010; Penny et al., 2010; Stephan et al., 2010). Family-level inference is a process by which groups of individual models are compared based on a shared characteristic. We used family inference in a two-step procedure that, first, partitioned the complete model set based on the region of the driving input and then, second, partitioned models within the most likely group based on whether connection parameters were modulated by salient, similar, or both trial types. Family inference avoids “dilution” effects in model selection procedures with similar models and therefore isolates the critical characteristics of interest to compare (Penny et al., 2010). After identifying the family with the highest posterior exceedance probability, we then used model averaging to summarize the likely parameter values for that model family. Model averaging is weighted toward models with greater posterior probabilities. All procedures were conducted using random effects analyses to account for variability between individuals. The final step was to use classical statistics to determine the probability of the model results under the null hypothesis (i.e., that the connection strengths are zero).
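The logic of family-level inference followed by model averaging can be illustrated with a deliberately simplified sketch. It is a fixed-effects illustration that assumes a flat prior over models and uses hypothetical log evidences and parameter estimates; it does not reproduce the random-effects procedures in SPM8 that were used for the analyses reported here.

```python
import math


def posterior_model_probs(log_evidences):
    """Posterior model probabilities from log evidences, assuming a flat model prior."""
    peak = max(log_evidences.values())
    weights = {m: math.exp(v - peak) for m, v in log_evidences.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def family_probs(model_probs, family_of):
    """Family posterior = sum of the posteriors of its member models."""
    out = {}
    for model, p in model_probs.items():
        family = family_of[model]
        out[family] = out.get(family, 0.0) + p
    return out


def average_parameter(model_probs, estimates, family_of, family):
    """Average one parameter over a family, weighted by posterior model probability."""
    members = [m for m in model_probs if family_of[m] == family]
    total = sum(model_probs[m] for m in members)
    return sum(model_probs[m] / total * estimates[m] for m in members)


# Hypothetical values for three models grouped by driving-input region.
log_ev = {"m1": -100.0, "m2": -102.0, "m3": -101.0}
family_of = {"m1": "TPJ", "m2": "TPJ", "m3": "IFG"}
estimates = {"m1": 0.4, "m2": 0.2, "m3": 0.1}  # one connection strength per model

probs = posterior_model_probs(log_ev)
families = family_probs(probs, family_of)
averaged = average_parameter(probs, estimates, family_of, "TPJ")
```

Summing over models within a family is what protects the comparison from dilution: many similar models share the evidence for their common feature instead of competing with one another.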
Results
Eye tracking
Samples of horizontal eye position during the 200 ms search period on each trial were divided into our four experimental conditions (Fig. 2). A total of 89% of all trial samples fell within ±1° of fixation, indicating participants' ability to maintain fixation during the critical stage of the task. Eye data within ±1° from fixation were entered into a repeated-measures ANOVA with the factors target location and non-target salience to test for any differences between conditions. There were no significant effects of salience (F(1,15) = 1.0, p = 0.33) or target location (F(1,15) = 4.5, p = 0.052), nor an interaction between the two (F(1,15) = 0.80, p = 0.38). The analysis of target location was nearly significant, but all these samples were within 1° of visual angle from the center of the fixation cross and therefore the differences could not be attributable to stimulus-evoked saccades to the target. Furthermore, analysis of the immediately subsequent 200 ms resulted in no significant effects of salience (F(1,15) = 0.37, p = 0.55), target location (F(1,15) = 1.0, p = 0.32), nor their interaction (F(1,15) = 0.01, p = 0.91). The eye-position data demonstrate that subjects were able to maintain fixation during the search task across all conditions and that there were no differences in the number of fixations outside of a 1° radius from the center of the fixation cross (Fig. 2).
Horizontal eye-position samples in each experimental condition (left similar, black; right similar, dark gray; left salient, light gray; right salient, white). Subjects were able to maintain fixation in all conditions: 89% of all samples fell within ±1° of visual angle from fixation, and no differences were found between conditions. Bins are 0.25° of visual angle.
Behavior
Overall accuracy was high (mean of 91%), demonstrating that participants were able to perform the task well. RT and accuracy data were entered into a 2 × 2 repeated-measures ANOVA defined by the target location (left or right) and the salience of the non-target (similar or salient). Accuracy in the salient condition (mean of 93%) was significantly higher (F(1,20) = 15, p < 0.001) than in the similar condition (mean of 89%; Fig. 3A). RT analyses were limited to correct trials that fell within 2 SDs of each subject's correct RT mean. Consistent with the pattern found in the accuracy data, participants responded significantly faster (F(1,20) = 46.5, p < 0.001) in the salient condition (mean of 702 ms) compared with the similar condition (mean of 753 ms; Fig. 3B). This demonstrates that there was no speed–accuracy tradeoff and that the presence of the salient non-target enhanced performance. This confirms that the salient non-target was used to guide attention to the target location (see also Geng and Diquattro, 2010).
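The RT trimming rule can be sketched as follows. The example assumes that the mean and SD are computed over all correct trials of a subject and that the sample SD is used; neither detail is specified above, and the function name and RT values are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(correct_rts, n_sd=2.0):
    """Keep correct-trial RTs within n_sd standard deviations of the subject's mean."""
    center = mean(correct_rts)
    spread = stdev(correct_rts)  # assumed: sample standard deviation
    return [rt for rt in correct_rts if abs(rt - center) <= n_sd * spread]


# Hypothetical correct-trial RTs (ms) for one subject, with one slow outlier.
subject_rts = [650, 700, 720, 690, 710, 680, 705, 1500]
kept = trim_rts(subject_rts)
```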
Accuracy (A) and RT (B) of target discrimination in each experimental condition (LSim, left similar; RSim, right similar; LSal, left salient; RSal, right salient). Responses were significantly more accurate and shorter when the salient feature was present, regardless of target side. Error bars are SEM.
In addition to the main effect of saliency, we found a significant main effect of target location (F(1,20) = 5.3, p < 0.05) and a significant interaction between target location and non-target salience (F(1,20) = 11.7, p < 0.01). The interaction was attributable to shorter RTs for targets on the left versus right when the non-target was similar (t(20) = −2.6, p < 0.05; left mean of 753 ms, right mean of 702 ms) but no difference when the non-target was salient (t(20) = −1.9, p = 0.076; left mean of 690 ms, right mean of 716 ms). Shorter RTs for left-sided targets in the similar condition are consistent with a general attentional bias for the left visual field (Bowers and Heilman, 1980). Most importantly for the present purpose, however, the left-sided bias was eliminated when the non-target was salient, suggesting that attentional orienting in response to the salient non-target overrode the bias toward the left and put behavior at ceiling. Accuracy data showed neither an effect of side (F(1,20) = 0.003, p = 0.96) nor an interaction between side and salience (F(1,20) = 3.06, p = 0.1).
Imaging data
The main goal of this study was to understand the brain regions involved in the detection of an informative salient feature and translation of that signal into an anti-cue for orienting attention efficiently. Thus, our primary interest was to identify brain regions that were selectively activated by the salient non-target. To be as inclusive as possible, we first conducted a whole-brain group analysis of the contrast salient minus similar, with a threshold set at voxel-wise significance of p < 0.001, uncorrected. This contrast resulted in only two significant regions: the left TPJ (peak MNI, −46, −50, 34; cluster size, 87) and the left IFG (peak MNI, −50, 30, 8; cluster size, 5). These were the only two significant clusters within the whole-brain analysis.
To test for additional effects of target side found in the behavioral RT data, we conducted a repeated-measures ANOVA of mean β values taken from TPJ and IFG ROIs; there were no significant differences between target locations (TPJ, F(1,20) = 0.022, p = 0.883; IFG, F(1,20) = 0.216, p = 0.647) and no significant interactions between target location and non-target salience (TPJ, F(1,20) = 1.1, p = 0.308; IFG, F(1,20) = 0.037, p = 0.848) (Fig. 4). Activation in TPJ and IFG was significantly different in response to the salient non-target, regardless of its spatial location.
Left TPJ and IFG activations revealed by a contrast of salient minus similar non-target trials (left) and β values from each region as a function of the four experimental conditions. Results are illustrated at a statistical threshold of p < 0.005, uncorrected. Error bars are SEM. LSim, Left similar; RSim, right similar; LSal, left salient; RSal, right salient.
Examination of the β values also suggested that the difference in activation between conditions was attributable to relatively greater deactivation in the similar compared with the salient condition (Fig. 4). This suggests that the stimulus-evoked response to the salient stimulus was more in line with the overall activation of IFG and TPJ throughout the experiment; in contrast, the similar non-target elicited stimulus-evoked deactivation of TPJ and IFG. This result is consistent with the interpretation that the baseline functional state of these regions was to process the salient stimulus and that the appearance of the similar non-target led frontoparietal regions to inhibit these regions to engage in top-down visual search (see also DCM below).
Notably, the significant TPJ and IFG clusters were in the left hemisphere and not in the right, as might be expected for stimulus-driven attentional reorienting. To ensure that there were no right-sided activations opposite our left hemisphere results, we extracted β values from right TPJ and IFG using ROIs created from our whole-brain results on the left. Results from those flipped ROIs did not show greater activation in response to the salient non-target compared with the similar non-target, even using ROI-guided conventional statistics (right TPJ, F(1,20) = 1.62, p = 0.22; right IFG, F(1,20) = 1.03, p = 0.32; Fig. 5), suggesting that our results were lateralized to the left hemisphere.
Right TPJ and IFG β values showing no effect of condition on activation (for ROI identification, see Results). Note that the y-axis scale is identical to Figure 4. Error bars are SEM. LSim, Left similar; RSim, right similar; LSal, left salient; RSal, right salient.
To quantify the relationship between the location of our left TPJ and IFG clusters and commonly reported locations of the right-lateralized “ventral attentional network” (for review, see Corbetta et al., 2008), we compared our results with those from 10 studies that involved shifts of attention in response to behaviorally relevant targets or features (Arrington et al., 2000; Downar et al., 2002; Shulman et al., 2003, 2007, 2010; Kincade et al., 2005; Serences et al., 2005; Vossel et al., 2009; Doricchi et al., 2010; Geng and Mangun, 2011). First, we calculated the distance between our results and TPJ coordinates from all 10 studies and IFG coordinates from nine studies (Shulman et al., 2007 did not report right IFG activity coordinates). Coordinates reported in the Talairach system were transformed to MNI (using the “tal2mni” function by M. Brett, http://imaging.mrc-cbu.cam.ac.uk/imaging/MniTalairach) and then flipped to the left hemisphere. The mean of the reported TPJ locations was centered at x,y,z coordinates −54, −50, 21 (ranges: x = −46 to −66; y = −44 to −65; and z = 11 to 30), and IFG at −46, 19, 16 (ranges: x = −23 to −66; y = 4 to 44; and z = −2 to 36). The average distance of our current peak coordinate was well within the range of existing values (TPJ: x = 7, y = 1, z = 9 mm; IFG: x = 4, y = 11, z = 8 mm).
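The flipping and distance steps can be sketched as follows. The study coordinates in the example are hypothetical placeholders rather than values from the cited studies, the "average distance" is assumed to be the mean absolute per-axis offset between the peak and each study's coordinate, and the Talairach-to-MNI conversion is omitted. Only the left TPJ peak is taken from the results reported here.

```python
def flip_to_left(coord):
    """Mirror an (x, y, z) coordinate into the left hemisphere (negative x)."""
    x, y, z = coord
    return (-abs(x), y, z)


def mean_coordinate(coords):
    """Per-axis mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


def mean_abs_offset(peak, coords):
    """Per-axis mean absolute difference between a peak and each coordinate."""
    n = len(coords)
    return tuple(sum(abs(c[axis] - peak[axis]) for c in coords) / n for axis in range(3))


# Hypothetical right-hemisphere TPJ peaks (MNI) standing in for published values.
right_tpj = [(54, -48, 20), (58, -52, 24), (50, -50, 18)]
flipped = [flip_to_left(c) for c in right_tpj]

left_tpj_peak = (-46, -50, 34)  # peak from the salient minus similar contrast
center = mean_coordinate(flipped)
offset = mean_abs_offset(left_tpj_peak, flipped)
```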
We next conducted a small-volume-corrected analysis of the salient minus similar contrast using a spherical volume that encompassed all the reported points centered at the group means (resulting radius: TPJ, 15 mm; IFG, 25 mm). As would be expected based on the relative proximity of our results to the flipped right-lateralized results, the original TPJ peak was present in this mask and was significant at an FWE corrected threshold of p < 0.01. The IFG cluster was also present but at uncorrected p < 0.001. These results demonstrate that left TPJ and IFG form a left-lateralized network that controls attention by detecting and translating contextual knowledge of the salient non-target into an attentional control signal that aided visual search performance. This suggests that there is a left hemisphere homolog of the right ventral attentional network that operates independently to control attention based on contextual knowledge of stimulus features. We further tested this hypothesis next by (1) examining the correlation between brain activity and trial RT and (2) conducting connectivity analyses between left TPJ, IFG, and FEF.
RT regression
The previous whole-brain analysis identified left TPJ and IFG as being selectively activated by the salient compared with similar non-target. We next wanted to determine whether activation in either of those regions was directly related to task performance (i.e., RT) on salient trials. A region that is only correlated with RT on salient, but not similar, trials would fit the profile of a control area that used contextual knowledge to more rapidly orient attention to the target. In contrast, regions involved with attentional orienting or target discrimination more generally (i.e., regardless of the current experimental context) would be equally activated by all conditions because of the common need to search for and discriminate the target that occurred on every trial.
Trial RT was entered as a parametric regressor for each of the existing four conditions in a second individual-subject GLM (see Materials and Methods). The primary question of interest was whether left TPJ and IFG (identified in the previous whole-brain analysis) correlated with RT in the salient condition. To test this hypothesis, we examined results from a contrast between the RT regressors for the salient minus similar condition, masked by the results from the previous whole-brain analysis. This revealed a significant activation in left IFG (peak MNI coordinates, −50, 28, 10; voxel cluster size, 6; FWE corrected p < 0.05; Fig. 6A). The results from this contrast were attributable to significant positive parameter estimates for the RT × salient trials (t(20) = 3.28, p < 0.005) and nonsignificant (from 0) values in the similar condition (t(20) = 1.35, p > 0.1). IFG activity was sustained throughout the salient trials, suggesting that it continuously provided input to an attentional control signal that determined the speed with which the target could be located. A similar positive trial-by-trial correlation was also seen for dorsal frontoparietal regions but now non-selectively for either condition (see below).
A, Overlapping left IFG activations from the contrast of (salient minus similar) in model 1 (green) and the contrast of trial RT × (salient minus similar) from the RT regression model (magenta). Results are illustrated at a statistical threshold of p < 0.005, uncorrected. B, Dorsal network activations in bilateral IPS, FEF, and supplementary eye fields (SEF) activated by the conjunction contrast of each experimental condition scaled by RT. Higher t-scores reflect a larger positive correlation with RT. C, β values for left FEF, as a representative region of the dorsal frontoparietal network, showed no significant differences in response to the four conditions. β values calculated from the same ROI used in the DCM analyses. LSim, Left similar; RSim, right similar; LSal, left salient; RSal, right salient.
For completeness, we also examined the results without a mask to identify other regions that covaried with salient trial RT. This produced one cluster of activation in right IFG (peak MNI, 58, 32, 10; voxel cluster size, 8) in addition to the previous left IFG result (peak MNI, −48, 28, 10; voxel cluster size, 51). The additional activation of the right IFG here may have been attributable to inhibitory control mechanisms involved specifically in response generation (Hopfinger et al., 2000; Leung and Cai, 2007; Hampshire et al., 2009; Aron, 2011). Similar to left IFG, the positive correlation between right IFG activation and RT was attributable to significant positive RT β values in the salient, but not the similar, condition (salient: t(20) = 2.68, p < 0.05; similar: t(20) = 1.02, p > 0.3). There were no significant clusters in the opposite whole-brain contrast of similar minus salient RT β values, indicating that no regions were selectively more correlated with RT in the similar compared with the salient condition.
IFG was selectively involved in attentional control only on trials in which contextual knowledge of the salient stimulus was used to facilitate behavior. However, this does not suggest that IFG is responsible for initiating the actual shift of attention. Instead, we hypothesized that IFG is an input region into dorsal frontoparietal regions that are known to control spatial attention. To localize these frontoparietal attentional control regions, we created a group-level conjunction SPM of all the condition × RT interaction parameters. Consistent with expectations, the following regions were significant at a threshold of p < 0.05, FWE corrected: bilateral IPS (peak MNI, −22, −64, 50 and 24, −62, 44), bilateral FEF (peak MNI, −28, −2, 54 and 32, 0, 50), and bilateral supplementary eye fields (peak MNI, −6, 20, 46 and 10, 16, 48) (Fig. 6B). These regions are known to control voluntary shifts of spatial attention and were involved in visual search for the target stimulus in this experiment, regardless of condition. The fact that the dorsal frontoparietal regions were not differentially activated in the two conditions suggests that the need for attentional shifting was similar in both conditions; this is not unexpected because the location of the target was unpredictable (i.e., required visual search). For example, the β values extracted from left FEF (as a representative region from the dorsal frontoparietal network used in the DCM analysis) showed that there were no significant main effects (location, F(1,20) = 0.259, p = 0.62; salience, F(1,20) = 0.661, p = 0.43), nor an interaction (F(1,20) = 1.4, p = 0.25; Fig. 6C).
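The F(1,20) tests on the left FEF β values can be understood through a standard equivalence: in a 2 × 2 fully within-subject design, each one-degree-of-freedom effect is the squared one-sample t statistic of a per-subject contrast score. The Python sketch below illustrates this with simulated β values. The function names are hypothetical, the 21 simulated subjects are chosen only so that the degrees of freedom match those reported, and none of the numbers are data from the study.

```python
import random
from math import sqrt
from statistics import mean, stdev


def within_subject_f(scores):
    """F(1, n-1) for a 1-df within-subject effect: the squared one-sample
    t statistic of the per-subject contrast scores against zero."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))
    return t * t, (1, n - 1)


def two_by_two_effects(betas):
    """betas: one dict per subject with keys LSim, RSim, LSal, RSal."""
    location = [(b["LSim"] + b["LSal"]) - (b["RSim"] + b["RSal"]) for b in betas]
    salience = [(b["LSal"] + b["RSal"]) - (b["LSim"] + b["RSim"]) for b in betas]
    interaction = [(b["LSal"] - b["RSal"]) - (b["LSim"] - b["RSim"]) for b in betas]
    return {
        "location": within_subject_f(location),
        "salience": within_subject_f(salience),
        "interaction": within_subject_f(interaction),
    }


# Simulated beta values for 21 hypothetical subjects (not study data).
rng = random.Random(0)
conditions = ("LSim", "RSim", "LSal", "RSal")
betas = [{c: rng.gauss(1.0, 0.5) for c in conditions} for _ in range(21)]
effects = two_by_two_effects(betas)
```

Each entry of `effects` holds an F value with its degrees of freedom, (1, 20) here, which would then be referred to the F distribution to obtain a p value.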
Notably, TPJ was not significantly activated in these analyses, suggesting that TPJ was not directly related to the speed of performance [i.e., the salient and similar RT regressor values for our TPJ ROI were not significant (salient, t(20) = 1.71, p > 0.1; similar, t(20) = 0.32, p > 0.7)]. This suggests that TPJ was not directly involved in controlling shifts of attention to detect and discriminate the target but rather may have provided information regarding the contextual relevance of the current stimulus to IFG, which then provided the dorsal network with an orienting signal. We explored this hypothesis using analyses of effective connectivity (see below).
DCM results
The previous results demonstrated that left TPJ and IFG played a critical role in representing the contextual relevance of stimulus information and that IFG translated that knowledge into an attentional control signal. We hypothesized that the attentional control signal was input to dorsal frontoparietal regions, which are known to execute shifts of attention, in a manner analogous to that proposed for the right hemispheric ventral network by Corbetta et al. (2008). To test the hypothesis, we used dynamic causal modeling analyses to determine the connectivity structure between left TPJ, IFG, and FEF. FEF was chosen as the representative “dorsal network” region in these models because of the substantial evidence for FEF in attentional control (Moore and Armstrong, 2003; Moore et al., 2003; O'Shea et al., 2004; Ruff et al., 2006, 2008; Buschman and Miller, 2007; Bressler et al., 2008).
DCM was used to test (1) the task-specific organization of intrinsic information flow between left TPJ, IFG, and FEF and (2) the condition-specific modulatory effects on the strength of those intrinsic connections. The intrinsic connections in DCM models reflect the overall connectivity between two regions within an experiment and modulatory parameters reflect changes to the intrinsic connection strength as a function of trial condition (see Materials and Methods). Intrinsic connections therefore reflect knowledge that is constant over the experiment (e.g., that saliency is anti-correlated with the target), whereas modulatory parameters reflect updates to that connectivity structure based on stimulus-driven trial conditions.
A detailed description of the model space and selection procedure can be found in Materials and Methods. In brief, all models included TPJ, IFG, and FEF, and all regions were fully interconnected. To be comprehensive in testing alternative hypotheses, we included all possible models with two or fewer modulatory parameters such that only one modulatory parameter could be associated with a single connection at a time. Each model was replicated three times to account for stimulus information being input to each of the three model regions. This resulted in a final set of 216 models. Selection of the best representative model of our data occurred through Bayesian family inference to find the commonalities between models with the greatest posterior probability. Family inference was done in two steps to identify (1) the input region and (2) the organization of modulatory parameters. Models within the family with the greatest likelihood were then averaged using Bayesian model averaging procedures (see Materials and Methods).
The first part of model selection was to determine the input region. The 216 original models were divided into three families based on the location of the driving inputs to TPJ, IFG, or FEF. The family with the greatest evidence (i.e., exceedance probability) had driving inputs to FEF (TPJ, 0.06; IFG, 0.13; FEF, 0.81; Fig. 7A). This was consistent with FEF receiving input before regions within the ventral network (Corbetta et al., 2008). The next step was to subdivide models with FEF driving inputs into three families based on modulation by similar trials alone (21 models), salient trials alone (21 models), or a combination of similar and salient information (30 models). This second round of model selection resulted in the greatest evidence for the combination models (exceedance probability: similar, 0.4; salient, 0.03; combination, 0.55; Fig. 7B). The most likely models came from the combination family, but there was nearly the same evidence for the similar models (Fig. 7B). The fact that the combination models had more evidence indicated some individual variability in the use of saliency (Penny et al., 2010). Nevertheless, the parameters that are common across subjects are those that remain significant after model averaging; the final model parameters reflect commonalities between the similar and combination families, consistent with their similar levels of evidence.
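The reported model counts can be checked by enumerating the model space directly. The Python sketch below is one reconstruction that reproduces the totals of 216 models overall and 21, 21, and 30 models in the three modulation families. It assumes that every model carries one or two modulatory parameters (i.e., that the model with no modulation is not part of the space), an assumption made here only because it matches the reported totals. All variable and function names are hypothetical.

```python
from itertools import combinations, permutations, product

REGIONS = ("TPJ", "IFG", "FEF")
# Fully interconnected: every ordered pair of distinct regions.
CONNECTIONS = list(permutations(REGIONS, 2))  # 6 directed connections
CONDITIONS = ("similar", "salient")


def modulation_patterns():
    """All assignments of one or two modulators, at most one per connection.

    Each pattern is a frozenset of (connection, condition) pairs.
    Assumption: the model without any modulation is excluded.
    """
    patterns = []
    for n_mod in (1, 2):
        for conns in combinations(CONNECTIONS, n_mod):
            for conds in product(CONDITIONS, repeat=n_mod):
                patterns.append(frozenset(zip(conns, conds)))
    return patterns


def family(pattern):
    """Label a pattern by the condition(s) that modulate its connections."""
    conds = {cond for _, cond in pattern}
    return "combination" if len(conds) == 2 else next(iter(conds))


patterns = modulation_patterns()
# Each modulation pattern is replicated once per driving-input region.
models = [(input_region, p) for input_region in REGIONS for p in patterns]

per_family = {}
for p in patterns:
    per_family[family(p)] = per_family.get(family(p), 0) + 1

print(len(models))  # 216
print(per_family)   # {'similar': 21, 'salient': 21, 'combination': 30}
```

The 72 modulation patterns per input region split into 6 single-connection plus 15 two-connection models for each condition alone, and 15 connection pairs × 2 condition assignments for the combination family.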
Figure 7. A, Exceedance probabilities from the Bayesian model selection (BMS) procedure of the original 216 DCM models divided into families based on the driving inputs to TPJ, IFG, or FEF. The FEF family of models had the greatest evidence. B, Exceedance probabilities for the 72 FEF input models divided based on the presence of modulation by the similar condition only, salient condition only, or a combination of similar and salient conditions. Evidence favored the models with both similar and salient modulation but was similar to those with only similar modulation (see Results). C, Structure of the Bayesian model average (BMA) from the family of models with both similar and salient modulation. Intrinsic parameter values are noted next to the connection in black and the modulatory parameter in a circle. Significant parameters are indicated by an asterisk (see Results).
The “average model” was constructed by the weighted average of parameters from models within the combination family (based on the exceedance probability of each). Statistical significance of each parameter value was determined using classical random-effects analyses (Fig. 7C). Consistent with our hypothesis that TPJ and IFG form a network that communicates with FEF, positive intrinsic connections were found on TPJ → IFG and IFG → FEF (t(20) = 3.57 and 5.5, respectively, both at p < 0.01, Bonferroni corrected for multiple comparisons of six intrinsic parameter values). There was one significant negative connection from FEF → IFG (t(20) = 6.7, p < 0.01, with Bonferroni correction). All the remaining intrinsic connections were negative but significant only at uncorrected thresholds (t(20) > 2.3, p < 0.05, uncorrected). This suggests that the functional connectivity between regions was organized (in this experiment) to facilitate communication from TPJ to IFG and then to FEF.
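The averaging step can be pictured as a weighted mean of each connection parameter across the models in the winning family, followed by a correction of the significance threshold for the six intrinsic parameters tested. The Python sketch below shows only that arithmetic. The function names, parameter values, and weights are hypothetical, and the averaging actually used is the Bayesian model averaging procedure described in Materials and Methods.

```python
def weighted_average(param_by_model, weights):
    """Weighted mean of one parameter across models.

    Weights are normalised here, so they need not sum to one.
    """
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, param_by_model)) / total


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical values of one connection parameter in three models for one
# subject, with hypothetical model weights (not values from the study).
tpj_to_ifg = [0.30, 0.10, 0.20]
model_weights = [0.5, 0.3, 0.2]
averaged = weighted_average(tpj_to_ifg, model_weights)

# Six intrinsic connections tested at a corrected level of 0.01.
per_test_alpha = bonferroni_alpha(0.01, 6)
```

In a random-effects analysis, one such averaged value per subject would be entered into a one-sample t test against zero for each connection.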
There was only one significant modulatory parameter: the FEF → TPJ connection was made more negative in the similar condition (t(20) = −2.77, p < 0.05; Fig. 7C). This decrease in connection strength on similar trials was consistent with the idea that the dorsal system actively inhibits the stimulus-driven ventral network when purely top-down voluntary attention is engaged in visual search (Shulman et al., 2007). Interestingly, in this study, it appears that the intrinsic connectivity was set to facilitate stimulus-driven attentional orienting based on detection of saliency as an anti-cue; the absence of this cue then triggered downmodulation of TPJ by FEF.
Within the context of our experiment (in which knowledge about a perceptually salient non-target facilitated behavior), connectivity between the attentional networks was configured such that the ventral network provided input to the dorsal network by default; however, when stimulus-driven information was not useful (i.e., in the similar condition), the dorsal network dominated control of attentional selection processes and suppressed activity in the ventral network, primarily via TPJ. Thus, connectivity between regions was “set” to exploit information provided by the salient object, the flip side of which is that similar trials then initiated a modulatory change to inhibit these processes. These results are consistent with the pattern of β values seen in TPJ and IFG (see above) in which the similar trials produced deactivations and the salient trials produced activation closer to the “baseline,” which reflects the context of expectations given the experimental design.
Discussion
In the current experiment, we investigated the brain networks involved in detecting the presence of a salient feature that was anti-correlated with the target and translating it into an attentional control signal. The perceptually salient item was not itself the target but provided contextual information that could be used to reorient attention toward the target more rapidly. Subjects exploited their knowledge of the salient stimulus to increase behavioral efficiency: responses were faster and more accurate when the non-target was perceptually salient compared with when it was similar to the target. Our fMRI results suggest that left TPJ and IFG represented the contextual relevance of the salient item (i.e., as a non-target) and translated that into an attentional control signal (i.e., serving as a spatial “anti-cue”) through connectivity with FEF.
Intrinsic settings to process the contextual relevance of salient stimuli are modulated by stimulus-evoked responses on similar trials
Left TPJ and IFG were the only two regions within a whole-brain analysis that showed a difference in activation for the perceptually salient compared with similar non-target. It is unlikely that these results were produced by a generalized alerting signal because salient stimuli that do not carry useful contextual information interfere with (rather than facilitate) target processing and are associated with activation in IPS and FEF, not TPJ (Indovina and Macaluso, 2007; Corbetta et al., 2008; Geng and Mangun, 2009; Hu et al., 2009). Furthermore, the specific pattern of activation in IFG and TPJ was better described by “deactivation” in the similar compared with the salient condition (Fig. 4); this suggests that there were stimulus-evoked changes in response to the similar non-target trials and that activation in the salient condition more likely reflected the ongoing functional state of those regions. This interpretation suggests that the “baseline” activation for these regions was set by the current experimental context to facilitate processing of the salient stimulus; this was most clearly seen in the positive correlation between IFG activation (within the salient condition only) and behavioral RTs. Activation was therefore only inhibited when the salient object was absent and purely top-down serial visual search was required.
The conclusion that similar trials elicited a change toward deactivation was supported by the DCM analyses in which the intrinsic pathway of information flow from left TPJ → IFG → FEF was modulated on similar trials. Importantly, the positive intrinsic connection values reflected the overall effective connectivity between regions within this particular experiment. The values do not have a broader meaning and should not be understood as implying “default” functional or anatomical connectivity between regions. The results are limited to the current task, and as such suggest that the excitatory functional coupling from TPJ → IFG → FEF was set by knowledge that bottom-up saliency could be used to direct attention. The value of these connection strengths would be expected to be different under another experimental design: the intrinsic connectivity between the ventral and dorsal networks may be flexibly set according to current task demands.
When there was no informative sensory stimulus (i.e., on similar trials), then the connection strength between FEF and ventral regions became more negative. This suggested that FEF inhibited the ventral regions when saliency information was absent. This finding is consistent with the hypothesis that dorsal regions “filter” information to the ventral network and actively inhibit TPJ to reduce stimulus-driven attentional orienting (Shulman et al., 2003, 2007; Wei et al., 2009). Similarly, dorsal and ventral stream “default” activity is anti-correlated (Fox et al., 2005), suggesting that these networks may work in a push–pull manner. The current findings support these ideas and go further to suggest that this inhibition can be modulatory (rather than stationary) when it is more advantageous to set intrinsic connections to facilitate ventral to dorsal communication. In such a task context, the regions within the dorsal frontoparietal network selectively suppress ventral network activity only when the stimulus display lacked the expected task-relevant signal (i.e., on similar trials).
Temporally late ventral activity
An additional feature of the final DCM model was the substantial evidence in favor of inputs entering into the model through FEF. This is also consistent with the idea that FEF and IPS “filter” the signals that reach the ventral network (Shulman et al., 2007; Corbetta et al., 2008). Post hoc analyses of our BOLD time-series data are consistent with this notion and show earlier peaks in FEF than in TPJ and IFG (at 3 s in FEF and 5–6 s in TPJ and IFG). Although it is not possible to directly compare BOLD time series from two different regions, our data also suggest that stimulus information reaches the dorsal network before the ventral network. This is consistent with the idea that the dorsal network orients attention and the role of the ventral network is to update control signals sent to the dorsal network (according to contextual knowledge) to facilitate behavior. In this sense, the response in left TPJ and IFG can be understood as supplying a “reactive” control signal in response to the appearance of the search stimuli (Braver et al., 2009).
Previous studies of the right versus left TPJ
Existing studies of the TPJ and IFG in attentional control have generally reported a right-hemispheric network (Arrington et al., 2000; Kincade et al., 2005; Fox et al., 2006; Vossel et al., 2006; Indovina and Macaluso, 2007; Corbetta et al., 2008; Hu et al., 2009; Chambers and Heinen, 2010; Doricchi et al., 2010; Geng and Mangun, 2011). The right-lateralized network is consistent with neuropsychological findings that damage to the right TPJ leads to spatial neglect, a deficit in attending to contralesional information (Kinsbourne, 1977; Mesulam, 1999; Mort et al., 2003; Behrmann et al., 2004). Despite this, several recent studies have proposed independent hypotheses for a left TPJ role in attentional orienting. For example, Doricchi et al. (2010) hypothesized that left TPJ detects stimuli that “match” a target template (as opposed to right TPJ that detects stimuli that “mismatch”). They further hypothesize that left TPJ activation is often not reported because it is subtracted out in studies of attentional selection in which two conditions with targets, or target features, are contrasted. Weidner et al. (2009) have hypothesized that left TPJ integrates top-down attentional set with bottom-up saliency, particularly when attentional orienting is driven by nonspatial features (Coull et al., 2000; Mevorach et al., 2006; Hodsoll et al., 2009). Finally, Ravizza et al. (2011) recently hypothesized that left TPJ is involved in attentional processing of verbal information. The distinction between verbal and visuospatial content is also suggested in the neuropsychological literature: left and right TPJ patients present with working memory deficits in the verbal versus visuospatial domains, respectively (Vallar and Shallice, 1990; Mort et al., 2003). Although quite dissimilar on the surface, these studies have in common the idea that left TPJ is involved in representing nonspatial features that match expectations for what is currently task relevant.
Our current results are consistent with the idea that left TPJ is involved in the contextual orienting of attention based on expected nonspatial features when there is no other previous spatial information to guide attention. This is in contrast to previous studies that have found saliency to activate right TPJ when it served as an unexpected, but informative, spatiotemporal cue (Geng and Mangun, 2011). Thus, left TPJ is not sensitive to saliency per se in guiding attention but rather to nonspatial features that have contextual relevance (of which our salient stimulus is one example). Our RT correlation and DCM results additionally suggest that translation of feature-based relevance into an attentional signal occurred through excitatory connections between left IFG (which was downstream from TPJ) and FEF. This result provides empirical support (although mirrored onto the left hemisphere) for the theorized pathway of communication by which ventral regions inform dorsal frontoparietal regions of where to shift attention (Corbetta et al., 2008). Our data further suggest that these network dynamics exist not only in the right hemisphere but also in the left. The DCM results highlight the critical importance of considering how the task constraints and stimulus domain may shape the intrinsic and modulatory connectivity between cooperative regions.
Conclusions
In the current study, we investigated the neural systems that facilitated visual search performance when a salient perceptual feature could be used to identify an object as a non-target and redirect attention to the less salient target. The behavioral results suggested that performance enhancement by salient non-targets was attributable to the rapid integration of a prepotent bottom-up signal with top-down contextual knowledge (Geng and Diquattro, 2010; Mazaheri et al., 2011). Here we reported that the contextual integration of information occurred in left TPJ and IFG and was translated into an attentional control signal through input to FEF. The results suggest that the integration of bottom-up feature information with top-down attentional selection was achieved through a left-lateralized ventral network that decoded the task relevance of a salient perceptual feature, which was then used to control the orientation of spatial attention.
Footnotes
This work was funded by the University of California at Davis. We thank Pia Rotshtein, Uta Noppeney, and Risa Sawaki for discussions and comments on a previous version of this manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Joy J. Geng, Center for Mind and Brain, 267 Cousteau Place, Davis, CA 95618. jgeng@ucdavis.edu