Abstract
Multiple studies have documented an inverse relationship between the number of to-be-attended or remembered items in a display (“set size”) and task performance. The neural source of this decline in cognitive performance is currently under debate. Here, we used a combination of fMRI and a forward encoding model of orientation selectivity to generate population tuning functions for each of two stimuli while human observers attended either one or both items. We observed (1) clear population tuning functions for the attended item(s) that peaked at the stimulus orientation and decreased monotonically as the angular distance from this orientation increased, (2) a set-size-dependent decline in the relative precision of orientation-specific population responses, such that attending two items yielded a decline in selectivity of the population tuning function for each item, and (3) that the magnitude of the loss of precision in population tuning functions predicted individual differences in the behavioral cost of attending an additional item. These findings demonstrate that attending multiple items degrades the precision of perceptual representations for the target items and provide a straightforward account for the associated impairments in visually guided behavior.
Introduction
Multiple studies have documented an inverse relationship between the number of to-be-attended or remembered items in a display (“set size”) and task performance (Palmer, 1990; Palmer et al., 1993; Geisler and Chou, 1995; Zhang and Luck, 2008). Two different explanations of these effects have been offered. “Perceptual coding models” posit that perceptual quality is constrained by the distribution of a finite neural resource that must be spread more thinly when set size increases (Posner et al., 1980; Bundesen, 1990; Palmer et al., 2011). In contrast, “decision integration models” posit that the internal noise associated with each perceptual representation is independent of set size (Palmer, 1995; Palmer and McLean, 1995; Eckstein et al., 2000; Verghese, 2001) and that declines in performance reflect increasing decision noise.
Recently, Pestilli et al. (2011) attempted to discriminate between these models by measuring blood oxygenation level-dependent (BOLD) responses in visual areas during a contrast-discrimination task. Although behavioral performance was better—and BOLD responses were larger—during attend-one relative to attend-four trials, quantitative modeling indicated that attention-related changes in behavior could not be solely explained by changes in the quality of stimulus representations in visual cortex (e.g., changes in the mean or variance of BOLD responses across trials). Instead, they favored a decision integration model in which neural responses are subjected to weighted pooling and a “max” rule is applied to identify the target. Two points motivate a reexamination of this question. First, the contrast-discrimination task may be conducive to grouping strategies that could minimize set size effects. Second, Pestilli et al. (2011) used a univariate analytic approach that is not always sensitive to qualitative changes in neural activity (Harrison and Tong, 2009; Serences et al., 2009a; Serences and Saproo, 2012).
Thus, we reexamined these alternative accounts by examining set-size-dependent changes in the relative precision of orientation-specific population responses measured via fMRI. We used a forward encoding model of orientation selectivity to reconstruct orientation-specific response profiles—channel tuning functions (CTFs)—while observers attended one or two peripheral targets (Brouwer and Heeger, 2011; Serences and Saproo, 2012) and performed a challenging visual discrimination task that discouraged perceptual grouping. According to perceptual coding models, CTF precision should decline as set size rises from one to two because two attended representations would each receive a smaller proportion of a finite resource. Alternatively, decision integration models predict no qualitative difference in CTF precision because the internal noise of each representation is independent of set size and load-dependent effects reflect noise in decision-related processes.
To preview our findings, CTF precision was reliably lower when more items were attended. Moreover, the reduction in the precision of the population response profiles was strongly correlated with associated reductions in behavioral performance, showing that reductions in the precision of population responses provide a compelling explanation of declining behavioral performance. Thus, although decision processes may also be challenged by increases in set size, attending additional items can also have consequences for the quality of perceptual codes.
Materials and Methods
Subjects.
Fourteen neurologically intact human subjects (six females) were tested in a single 2.5 h session. All subjects self-reported normal or corrected-to-normal visual acuity and gave both written and oral informed consent. Before scanning, each subject completed a single 1 h behavioral training session. Compensation for participation was $10/h for behavioral training and $25/h for scanning.
Stimuli and apparatus.
Visual stimuli were generated using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) implemented in MATLAB (MathWorks). Stimuli were rendered in white on a black background via a rear projection system. Button-press responses were made on an fMRI-compatible response box using the index and middle fingers of the right hand. Subjects were instructed to hold fixation on a small central diamond (subtending 0.18° from a viewing distance of 58 cm) throughout each scan (eye position data were continuously collected to ensure compliance with these instructions; see below, Eye tracking).
Experimental task.
The sequence of events on each trial is presented in Figure 1. Each trial began with the presentation of two full-contrast square-wave grating stimuli (radius of 4°, 2.67 cycles/°) in the upper left and right quadrants of the display (horizontal and vertical eccentricity of ±7° and +5°, respectively). To elicit a strong, stimulus-specific BOLD response, we presented full-contrast stimuli on a black background, thus introducing both a contrast onset and a luminance increment for each stimulus presentation. Each stimulus was randomly assigned to one of two orientation values from distributions spanning 5–162.5° or 15–172.5° (for the left and right stimuli, respectively; both orientation distributions increased in steps of 22.5°) for a total of eight unique orientations in each distribution; the distributions were offset from one another to reduce the likelihood of perceptual grouping of identical or similarly oriented gratings. The two gratings flickered at a rate of 3 Hz (166.7 ms on, 166.7 ms off) for a total of 5 s, and the spatial phase of both stimuli was randomized on each cycle. A fixation cue was highlighted on the left, right, or both sides (red or green, counterbalanced across subjects; Fig. 1) to instruct observers whether to attend the left, right, or both gratings to detect brief clockwise or counterclockwise displacements in orientation. Subjects were instructed to press one of two buttons to distinguish whether a clockwise or counterclockwise displacement was detected. Each trial included two target events that were randomly distributed among cued items across time. During set size 1 trials, when only one stimulus was attended, no rotations occurred on the irrelevant stimulus; during set size 2 trials, when both stimuli were attended, targets were distributed randomly so that rotations occurred only in the left, only in the right, or across both stimuli (balanced across trials). 
Each trial was followed by a randomly jittered intertrial interval (5, 6, 7, or 8 s).
Stimulus orientation and set size (1 or 2) were balanced within each scan (in which “scan” refers to a continuous, 370-s-long block of 32 trials). Orientation differences between the two stimuli were fully counterbalanced across a total of eight consecutive scans, resulting in 64 total possible orientation pairings for each condition. For set size 1 trials (16 trials per block), there were an equal number of attend left and attend right trials; for set size 2 trials (16 trials per block), each trial was an attend-both trial.
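The stimulus arithmetic described above can be checked with a short sketch. It assumes only the values stated in the text (starting orientations of 5° and 15°, steps of 22.5°, a 3 Hz flicker for 5 s); the function and variable names are illustrative and do not come from the original stimulus code.

```python
from itertools import product

def orientation_set(start_deg, step_deg=22.5, count=8):
    """Return `count` orientations starting at `start_deg` in equal steps."""
    return [start_deg + step_deg * i for i in range(count)]

def circular_distance(a_deg, b_deg, period=180.0):
    """Smallest angular difference between two orientations (180 deg period)."""
    d = abs(a_deg - b_deg) % period
    return min(d, period - d)

left = orientation_set(5.0)     # 5, 27.5, ..., 162.5 deg
right = orientation_set(15.0)   # 15, 37.5, ..., 172.5 deg

# Every left orientation paired with every right orientation.
pairings = list(product(left, right))

# The offset between the two sets keeps the gratings from ever matching.
min_offset = min(circular_distance(l, r) for l, r in pairings)

# One flicker cycle is 166.7 ms on plus 166.7 ms off (3 Hz), shown for 5 s.
cycles_per_trial = round(5.0 * 3)

print(len(pairings), min_offset, cycles_per_trial)
```

Under these assumptions the two sets yield 64 left/right pairings, and the two gratings never differ by less than 10°, which is the property that the offset between distributions was meant to guarantee.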
Staircase procedure.
To ensure that the task was sufficiently challenging, we adjusted task difficulty by computing orientation rotation thresholds (defined as the minimum rotation magnitude yielding 75% correct performance) for each subject in a separate behavioral testing session (completed 1–2 d before the scanning session). The staircase procedure was identical to the experimental task with the only exception being that thresholds were independently modulated for set sizes 1 and 2. Rotation thresholds for set size 1 trials were used to set the target disparities for both set sizes during the scan session. This enabled a clear behavioral measure of the cost of attending an additional item.
Eye tracking.
To assess compliance with fixation instructions, eye position data were obtained via an ASL 5000 infrared tracking system (Applied Sciences Laboratory). Recording was performed at 60 Hz, and data were filtered for blinks and corrected for linear drift offline. For each trial, we identified all stable fixations (defined as a 200 ms epoch during which eye position did not deviate >0.25°) during a period extending the entire trial epoch. The x and y positions of fixations made during the trial were saved for offline analyses.
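A minimal sketch of the fixation criterion follows. The text specifies a 200 ms epoch and a 0.25° limit but not the reference point for the deviation; the sketch assumes deviation is measured from the centroid of the candidate window, and the function names are hypothetical.

```python
import math

def _stable(xs, ys, max_dev):
    """True if every sample lies within max_dev of the window centroid."""
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    return all(math.hypot(x - cx, y - cy) <= max_dev for x, y in zip(xs, ys))

def stable_fixations(xs, ys, rate_hz=60.0, min_dur_s=0.2, max_dev_deg=0.25):
    """Return (start, end) sample-index pairs of stable fixations.

    Assumption: a fixation is a run of at least min_dur_s in which no sample
    deviates more than max_dev_deg from the centroid of that run.
    """
    win = int(round(rate_hz * min_dur_s))   # 12 samples at 60 Hz
    found, i, n = [], 0, len(xs)
    while i + win <= n:
        j = i + win
        if not _stable(xs[i:j], ys[i:j], max_dev_deg):
            i += 1
            continue
        # Extend the fixation for as long as it remains stable.
        while j < n and _stable(xs[i:j + 1], ys[i:j + 1], max_dev_deg):
            j += 1
        found.append((i, j))
        i = j
    return found

# Synthetic trace: 0.5 s at fixation, then 0.5 s displaced by 2 deg.
trace_x = [0.0] * 30 + [2.0] * 30
trace_y = [0.0] * 60
fixations = stable_fixations(trace_x, trace_y)
print(fixations)
```

The synthetic trace is split into two fixations at the point of displacement; real data would first need the blink filtering and drift correction described above.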
fMRI data acquisition and analysis.
fMRI data were collected using a 3 T Siemens Allegra system at the Robert and Beverly Lewis Center for Neuroimaging at the University of Oregon. Anatomical images were acquired using a spoiled-gradient-recalled T1-weighted sequence that yielded images with a 1 mm³ resolution. Whole-brain echo-planar images (EPIs) were acquired in 33 transverse slices (3 mm² in-plane resolution, 2000 ms repetition time, 30 ms echo time, 90° flip angle, 64 × 64 matrix, 192 mm field of view, 3.5 mm slice thickness, no gap). EPIs were slice-time corrected, motion corrected (both within and between scans), and high-pass filtered (three cycles per run). Image preprocessing and data analysis were performed using BrainVoyagerQX (version 1.9; Brain Innovation) and custom time series analysis routines written in MATLAB (version 2010a; MathWorks).
Retinotopic mapping.
Retinotopic mapping data were acquired using a rotating checkerboard wedge flickering at 8 Hz and subtending 45° of polar angle (Engel et al., 1994; Sereno et al., 1995; following standard procedures described in detail previously). Each participant completed a single scan lasting 480 s. This procedure was used to identify retinotopic boundaries of primary visual cortex (V1). To aid in the visualization of these regions, data were projected onto a computationally inflated representation of each observer's gray/white matter boundary.
Functional localizer and voxel selection.
Each participant also completed one to two scans (15 trials) of a functional localizer task to identify voxels within V1 that were responsive to the spatial position occupied by the oriented grating stimuli of the primary experiment. Two full-contrast, phase-reversing (10 Hz) checkerboard stimuli (4° radius) that exactly matched the size and spatial position of the oriented grating stimuli of the main task were presented for a total of 10 s; participants were required to detect brief (50 ms) reductions in stimulus contrast that occurred simultaneously in both stimuli at unpredictable intervals on each trial. Each trial was followed by a 10 s fixation interval. Visually responsive regions of V1 were identified using a general linear model (GLM) with a single regressor—denoting stimulus presence (i.e., on vs off)—that was constructed by convolving a boxcar model of the stimulus sequence with a gamma function to account for the assumed shape of the hemodynamic response (Boynton et al., 1996). Voxels that showed a stronger response during epochs of stimulation (relative to fixation; thresholded at p < 0.05 using the false discovery rate algorithm provided in BrainVoyagerQX 1.9; the false discovery rate algorithm is a procedure used to prevent multiple comparison problems while avoiding type II errors prevalent in Bonferroni's correction approaches; see Genovese et al., 2002) were used to define functional regions of interest (ROIs) in V1.
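The localizer regressor can be sketched as a boxcar convolved with a gamma function. The gamma form follows the shape described by Boynton et al. (1996), but the parameter values used here (n = 3, tau = 1.25 s, delay = 2 s) are illustrative assumptions, not values reported in this study, and the helper names are hypothetical.

```python
import math

def gamma_hrf(t, n=3, tau=1.25, delay=2.0):
    """Gamma-shaped hemodynamic impulse response (assumed parameters)."""
    s = (t - delay) / tau
    if s <= 0:
        return 0.0
    return s ** (n - 1) * math.exp(-s) / (tau * math.factorial(n - 1))

def boxcar(n_vols, tr, on_s, off_s):
    """1.0 during stimulation, 0.0 during fixation, one value per volume."""
    period = on_s + off_s
    return [1.0 if (v * tr) % period < on_s else 0.0 for v in range(n_vols)]

def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for i in range(len(signal)):
        for j in range(min(i + 1, len(kernel))):
            out[i] += signal[i - j] * kernel[j]
    return out

tr = 2.0                                       # repetition time in seconds
kernel = [gamma_hrf(k * tr) for k in range(12)]
box = boxcar(n_vols=50, tr=tr, on_s=10.0, off_s=10.0)
regressor = convolve(box, kernel)              # predictor for the GLM
print([round(v, 3) for v in regressor[:10]])
```

With these parameters the predicted response lags the stimulus, peaking shortly after the 10 s stimulation epoch ends, which is the delay the convolution is meant to capture.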
Multivoxel pattern analysis.
The purpose of multivoxel pattern analysis (MVPA) and the forward encoding model is to estimate the influence of attending multiple items on orientation selectivity in V1. Here, we assumed that each voxel in V1 represents a large population of orientation-selective cells, in which a BOLD response within a given voxel reflects the pooled activity across a distribution of cells with dissimilar orientation preferences. Therefore, any observed orientation bias within a given voxel may reflect large-scale feature maps (Sasaki et al., 2006; Freeman et al., 2011) or random anisotropies in the distribution of orientation-selective columns within a voxel (Kamitani and Tong, 2005; Swisher et al., 2010). Thus, the BOLD response measured from many of the voxels in V1 exhibits a robust orientation preference (Haynes and Rees, 2005; Kamitani and Tong, 2005; Serences et al., 2009b; Brouwer and Heeger 2011; Freeman et al., 2011).
To examine whether patterns of activation in V1 contained information about the attended orientation, we first normalized the raw time series from each voxel on a scan-by-scan basis using a z transform. Next, mean BOLD responses were calculated across a time period extending from 4 to 8 s after the onset of the stimulus on each trial, resulting in a single BOLD response value for each trial, and these amplitude estimates were sorted into one of eight orientation bins. Data from all but one scan were then used to construct a “training” dataset that was used to train a linear discriminant function to discriminate between different stimulus conditions separately for each cortical hemisphere. The training set included data from both set sizes. The trained classifier was then used to decode the orientation of the sample stimulus on each trial within the remaining scan (the “test” set). Data could be assigned one of eight orientation labels, so chance discrimination accuracy was 12.5%. Mean responses from each label were removed to spatially normalize the activation patterns. This analysis was iterated using a hold-one-out cross-validation procedure until data from every scan had served as the test set. This analysis was completed for each stimulus for contralateral (e.g., left hemifield stimulus classified in right cortical hemisphere) and ipsilateral (e.g., left hemifield stimulus classified in left cortical hemisphere) hemispheres. 
For example, to estimate set size 1 classification accuracy in contralateral ROIs, we aggregated single-trial classification accuracies from (1) left V1 during attend right trials (eight trials per block) and (2) right V1 during attend left trials (eight trials per block); conversely, to estimate set size 2 classification accuracy in contralateral ROIs, we obtained two classification values for each set size 2 trial (i.e., classification in right hemisphere for left stimulus and classification in left hemisphere for right stimulus) and aggregated these values across all set size 2 trials (16 trials per block). Classification accuracies were then averaged across each test set, yielding a single classification accuracy value for each set size and cortical hemisphere (contralateral, ipsilateral).
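The hold-one-scan-out logic can be illustrated with a small sketch. Note that it substitutes a nearest-centroid rule for the linear discriminant function used in the study, purely to keep the example dependency-free; the synthetic voxel patterns and all names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two voxel patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def leave_one_scan_out(patterns, labels, scans):
    """Cross-validated accuracy, holding out one scan at a time.

    Stand-in classifier: nearest class centroid (not the study's LDA).
    """
    correct = 0
    for held_out in sorted(set(scans)):
        train = [i for i, s in enumerate(scans) if s != held_out]
        test = [i for i, s in enumerate(scans) if s == held_out]
        centroids = {}
        for lab in sorted(set(labels[i] for i in train)):
            rows = [patterns[i] for i in train if labels[i] == lab]
            centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
        for i in test:
            pred = min(centroids,
                       key=lambda lab: sq_dist(patterns[i], centroids[lab]))
            correct += int(pred == labels[i])
    return correct / len(labels)

# Synthetic data: 8 scans x 8 orientation bins, 16 voxels, mild noise.
random.seed(1)
n_scans, n_bins, n_vox = 8, 8, 16
patterns, labels, scans = [], [], []
for scan in range(n_scans):
    for lab in range(n_bins):
        template = [1.0 if v % n_bins == lab else 0.0 for v in range(n_vox)]
        patterns.append([t + random.gauss(0.0, 0.1) for t in template])
        labels.append(lab)
        scans.append(scan)

accuracy = leave_one_scan_out(patterns, labels, scans)
chance = 1.0 / n_bins          # 12.5% with eight orientation labels
print(accuracy, chance)
```

Because the test scan never contributes to the centroids, accuracy above the 12.5% chance level reflects information that generalizes across scans.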
Forward encoding model.
To characterize orientation-specific responses in visual cortex, we used a forward encoding model of orientation selectivity. Our approach was similar to one described by Brouwer and Heeger (2009, 2011), and we therefore adopt their terminology and conventions. Briefly, this model assumes that each fMRI voxel in visual cortex samples from a large number of orientation-selective neurons and that the response of any given voxel is proportional to the summed responses of all neurons in that voxel. Thus, one can characterize the orientation selectivity of a given voxel as a weighted sum of N orientation channels, each with an idealized tuning curve. We modeled the response of each voxel using a basis set of eight half sinusoids (one per sample angle) raised to the fifth power. These functions were chosen to approximate single-unit tuning profiles in V1, where the 1/√2 half-bandwidth of orientation-selective cells has been estimated at ∼20° (although there is a considerable amount of variability in this estimate; Schiller et al., 1976; Swindale, 1998; Ringach et al., 2002a,b; Gur et al., 2005).
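As a check on the basis set, the sketch below builds eight half-sinusoidal channels raised to the fifth power and computes the half-bandwidth at 1/√2 of the peak. Channel centers at multiples of 22.5° and the function names are assumptions made for illustration; the study's centers follow the stimulus orientations.

```python
import math

def channel_response(stim_deg, center_deg, power=5):
    """Half-sinusoid raised to `power`, circular over 180 deg of orientation."""
    d = (stim_deg - center_deg + 90.0) % 180.0 - 90.0   # wrap to [-90, 90)
    return abs(math.cos(math.radians(d))) ** power

def half_bandwidth_deg(power=5, level=1.0 / math.sqrt(2.0)):
    """Offset from the peak at which the response falls to `level`."""
    return math.degrees(math.acos(level ** (1.0 / power)))

centers = [22.5 * i for i in range(8)]                   # assumed centers
basis = [[channel_response(s, c) for s in centers] for c in centers]

print(round(half_bandwidth_deg(), 1))
```

The computed half-bandwidth is roughly 21°, in line with the ∼20° single-unit estimate cited above, and each channel responds maximally at its own center and negligibly at the orthogonal orientation.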
To estimate orientation selectivity in V1 across variations in the distribution of attention across a variable number of items, we first normalized and sorted the data into training and test sets (for details, see above, Multivoxel pattern analysis). Thus, each training set had 224 observations for subjects who underwent eight runs in the scanner (7 runs in training set × 32 trials). Critically, every run sampled from each orientation equally. Similarly, the test set for these subjects had 32 observations. This analysis was completed for each ROI (left V1, right V1) and each visual field (left, right) across all trials, regardless of target location or set size. This ensured that the estimated weight profiles were unbiased across conditions.
In the first phase of the analysis, data from the training set were used to estimate weights on the hypothetical orientation channels separately for each voxel. Using the terminology of Brouwer and Heeger (2009, 2011), let m be the number of voxels in an ROI, k be the number of hypothetical orientation channels, and n1 and n2 be the number of observations in the training and test sets, respectively. The channel weights (W, m × k) can be derived via least-squares estimation:
$$W = D_1 C_1^{T} \left(C_1 C_1^{T}\right)^{-1}$$
where D1 (m × n1) is the training set, and C1 (k × n1) is the basis set discussed above. In the second phase of the analysis, channel responses (C2, k × n2) were estimated given the weight matrix and responses in the test set (D2, m × n2):
$$C_2 = \left(W^{T} W\right)^{-1} W^{T} D_2$$
The steps of the forward encoding procedure outlined so far were collapsed across each stimulus, because parsing each condition and performing the forward encoding procedure independently for each stimulus condition would have biased estimation of channel weights. Thus, there were an equal number of trials from both set sizes, including the unattended stimulus, in the training set. Following the procedure above, each estimated channel response function was then sorted based on set size, and the columns in C2 were then circularly shifted so that the channel aligned with the stimulus presented on each trial was positioned in the center of orientation space, thereby aligning the estimated channel responses to a common center (i.e., 0°). Channel response functions from both stimuli in set size 2 trials were averaged together to form a single channel response function for set size 2.
For example, when estimating contralateral channel response functions, we averaged together left hemisphere channel responses to the right stimulus and right hemisphere channel responses to the left stimulus; similarly, when estimating ipsilateral channel response functions, we averaged together left hemisphere channel responses to the left stimulus and right hemisphere channel responses to the right stimulus. This analysis was repeated iteratively until all scans had served as the test set and the results for each set size were averaged. With respect to a given orientation value, this procedure was repeated across both contralateral and ipsilateral ROIs.
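The two estimation phases and the recentering step can be sketched on noise-free synthetic data. The sketch assumes the standard least-squares forms used by Brouwer and Heeger, W = D1 C1ᵀ (C1 C1ᵀ)⁻¹ and C2 = (Wᵀ W)⁻¹ Wᵀ D2; the small matrix helpers, the simulated voxels, and all names are illustrative rather than taken from the study's MATLAB routines.

```python
import math
import random

def T(A):
    """Matrix transpose."""
    return [list(r) for r in zip(*A)]

def mm(A, B):
    """Matrix product of two list-of-lists matrices."""
    Bt = T(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inv(A):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, r)) + [float(i == j) for j in range(n)]
         for i, r in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]

def basis_matrix(k=8, power=5):
    """k x k matrix: channel (row) response to each orientation (column)."""
    step = 180.0 / k
    def resp(s, c):
        d = ((s - c) * step + 90.0) % 180.0 - 90.0
        return abs(math.cos(math.radians(d))) ** power
    return [[resp(s, c) for s in range(k)] for c in range(k)]

def train_weights(D1, C1):
    """Phase 1: W = D1 C1' (C1 C1')^-1, giving an m x k weight matrix."""
    C1t = T(C1)
    return mm(mm(D1, C1t), inv(mm(C1, C1t)))

def estimate_channels(W, D2):
    """Phase 2: C2 = (W' W)^-1 W' D2, giving k x n2 channel responses."""
    Wt = T(W)
    return mm(mm(inv(mm(Wt, W)), Wt), D2)

def recenter(column, stim_idx):
    """Circularly shift so the stimulus channel sits at the middle index."""
    k = len(column)
    return [column[(i - k // 2 + stim_idx) % k] for i in range(k)]

random.seed(0)
k, m = 8, 20                                   # channels, simulated voxels
B = basis_matrix(k)
W_true = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(m)]
train_stim = [s for s in range(k) for _ in range(4)]
C1 = [[B[c][s] for s in train_stim] for c in range(k)]
D1 = mm(W_true, C1)                            # noise-free training data
test_stim = [2, 5]
C2_true = [[B[c][s] for s in test_stim] for c in range(k)]
D2 = mm(W_true, C2_true)

W_hat = train_weights(D1, C1)
C2_hat = estimate_channels(W_hat, D2)
columns = T(C2_hat)
peaks = [max(range(k), key=col.__getitem__) for col in columns]
centered = [recenter(col, s) for col, s in zip(columns, test_stim)]
max_err = max(abs(a - b) for ra, rb in zip(C2_hat, C2_true)
              for a, b in zip(ra, rb))
print(peaks, max_err)
```

In this idealized case the estimated channel responses peak at the channel matching each test orientation, and after recentering every profile peaks at the middle index, which is what allows profiles from different trials to be averaged.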
The purpose of this analysis was to reconstruct a separate orientation-selective population response profile for each stimulus, although there were always two stimuli presented. To this end, each possible orientation value for the left stimulus was paired with all possible orientation values of the right stimulus (across all the scans run for each observer). This allowed us to yoke the analysis to a given stimulus while the channel activity associated with the other stimulus was fully counterbalanced (see above, Experimental task). As the results will show, this procedure successfully allowed us to extract orderly tuning functions for each stimulus presented, although the channel activity for any single trial was influenced by two stimuli.
Note that this procedure makes no assumption regarding the shape (e.g., Gaussian) of voxel-based tuning functions at the level of a single voxel. Instead, this analysis models the response of each voxel as a combination of responses across eight hypothetical orientation-selective channels. Using this approach, any response profile may be observed at the single-voxel level. Then, in the testing stage of the analysis, the known weights across all voxels (computed during training) are used to constrain the estimate of the response in each channel given a vector of responses across voxels on each trial in the test set. Thus, this approach is generally more flexible than other approaches that assume a Gaussian distribution of orientation responses within each voxel. Note also that the Gaussian-like shape of the CTF (see below, Evaluating population-level feature-selective tuning functions in V1) is influenced by the use of a Gaussian-like basis set, because the basis functions overlap and are thus partially correlated. However, similar results are obtained if we use a basis set of orthogonal delta functions (“stick” functions) that are not correlated.
Mutual information analysis.
During the MVPA and forward model analyses, we used an information-theoretic quantity known as mutual information (MI) to rank-order voxels within each visual area based on how well their responses in the training set discriminated between different stimulus orientations [following an approach similar to that described by Saproo and Serences, 2010, but using the kernel-density estimation software developed by Alex Ihler (University of California, Irvine, Irvine, CA), freely available at http://www.ics.uci.edu/~ihler/code/kde.html, to estimate probability distributions]. Briefly, MI indexes information contained in the response distribution of a voxel about stimulus orientation without making a priori assumptions about the shape of the response distribution. This method ensured that noisy voxels or voxels that did not show orientation selectivity, based on only data from the training set, were excluded from the MVPA and forward encoding modeling. Functional ROIs in V1 were defined using the top 75% of voxels with the highest MI values.
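The voxel-ranking idea can be sketched as follows. For simplicity the sketch estimates probabilities with a coarse histogram rather than the kernel-density estimator used in the study, so it illustrates the ranking logic only; the bin count, toy responses, and function names are assumptions.

```python
import math
from collections import Counter

def mutual_information(responses, labels, n_bins=4):
    """MI (bits) between binned voxel responses and orientation labels.

    Histogram estimate, swapped in for the study's kernel-density approach.
    """
    lo, hi = min(responses), max(responses)
    width = (hi - lo) / n_bins or 1.0          # guard against flat responses
    bins = [min(int((r - lo) / width), n_bins - 1) for r in responses]
    n = len(responses)
    p_b, p_l = Counter(bins), Counter(labels)
    joint = Counter(zip(bins, labels))
    return sum((c / n) * math.log2(c * n / (p_b[b] * p_l[l]))
               for (b, l), c in joint.items())

def top_fraction(scores, fraction=0.75):
    """Indices of the highest-scoring `fraction` of voxels."""
    keep = max(1, int(round(fraction * len(scores))))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])

labels = [0, 0, 1, 1, 2, 2, 3, 3]              # toy orientation labels
voxels = [
    [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0],  # fully informative
    [1.0] * 8,                                 # flat, uninformative
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],  # varies, but not with label
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],  # partially informative
]
scores = [mutual_information(v, labels) for v in voxels]
selected = top_fraction(scores, 0.75)
print(scores, selected)
```

As in the analysis described above, the scores should be computed from training data only so that voxel selection does not leak information from the held-out scan.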
Model predictions.
As shown in Figure 2, perceptual coding and decision integration models make very different predictions regarding the nature of the population response of orientation-selective cells in visual cortex. On the one hand, perceptual coding models posit that the simultaneous encoding of visual items requires the distribution of a finite resource, in which the perceptual quality of the stimulus degrades as more items must be encoded at one time (Posner et al., 1980; Fisher, 1984; Bundesen, 1990; Zhang and Luck, 2008; Palmer et al., 2011). According to these models, the number of simultaneously attended items should modulate the quality of the population response, such that a less precise neural representation should be observed when distributing attention across a greater number of items (Fig. 2B). Thus, these models assume that observed set size effects result from a degradation of orientation-selective responses. On the other hand, decision integration models posit that perceptual noise is identical across large variations in set size such that the quality of the stimulus at encoding is independent of set size (Palmer, 1995; Palmer and McLean, 1995; Eckstein et al., 2000; Verghese, 2001; Wilken and Ma, 2004). According to this class of models, the quality of the population response is unaffected by the number of items being attended at any given time; rather, noise arising from multiple competing representations during the decision stage influences the ability to accurately apprehend noisy internal representations. Under this view, the probability of a single nontarget representation randomly reaching criterion threshold increases as the number of noisy internal representations increases, resulting in an observed set size effect attributable to decision-related noise. Thus, these models assume no influence of set size on population responses in early visual cortex (Fig. 2A).
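The decision-noise argument can be made concrete with a one-line calculation. The sketch assumes that each monitored representation independently exceeds the criterion by chance with the same probability; the independence assumption, the example probability, and the function name are illustrative and are not parameters estimated in this study.

```python
def p_any_false_alarm(p_single, set_size):
    """Chance that at least one of `set_size` independent noisy
    representations exceeds the decision criterion."""
    return 1.0 - (1.0 - p_single) ** set_size

# Hypothetical per-item probability of 0.10.
for n_items in (1, 2, 4):
    print(n_items, round(p_any_false_alarm(0.10, n_items), 4))
```

Under this simplification, errors become more likely as set size grows even though the noise on each individual representation is unchanged, which is the core claim of the decision integration account.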
Results
Task performance: behavioral training
In the staircasing procedure, we independently estimated change detection thresholds during the monitoring of one or two flickering gratings. The target event was a change in the orientation of one of the attended gratings. If a subject correctly responded with the direction of orientation change, the magnitude of angular deviation for that condition (set size 1 or 2) decreased; conversely, if a subject responded incorrectly with the direction of orientation change, the magnitude of angular deviation for that condition increased. Thus, we recorded changes in angular deviation thresholds between set sizes 1 and 2.
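One way to implement such an update rule is sketched below. The text states only the direction of each adjustment; the weighted step sizes (an upward step three times the downward step, which converges on 75% correct), the floor, and the function name are assumptions chosen for illustration.

```python
def staircase_update(level_deg, correct, step_down=0.5, target=0.75, floor=0.1):
    """Return the next rotation magnitude (deg) for one set-size condition.

    Assumed weighted up/down rule: at equilibrium
    target * step_down == (1 - target) * step_up,
    so step_up = 3 * step_down when target is 0.75.
    """
    step_up = step_down * target / (1.0 - target)
    nxt = level_deg - step_down if correct else level_deg + step_up
    return max(floor, nxt)

# Separate staircases would be kept for set size 1 and set size 2.
level = 10.0
for response_correct in (True, True, False, True):
    level = staircase_update(level, response_correct)
print(level)
```

Because each set size has its own staircase, the difference between the two converged levels gives the threshold cost of attending a second item.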
To estimate discrimination thresholds, we calculated the average angular deviation observed from 10 trials after the start of the experiment (Fig. 3A, black dots represent individual subject estimates, and error bars reflect within-subject error). Consistent with previous studies, we found an effect of set size on detection thresholds (t(13) = 3.89, p < 0.01), in which attending two items required a larger angular deviation to reach the same level of performance as when attending one item.
Task performance: MRI
In the primary task, the magnitude of angular deviations across conditions was yoked to set size 1 detection thresholds estimated during the staircasing procedure. This method revealed a significant effect of set size on detection accuracy (t(13) = 9.47, p < 0.001; Fig. 3B). Thus, this procedure was sensitive to the known performance costs of attending additional items.
Event-related BOLD response
Previous work has reported an effect of attention on overall BOLD response magnitudes in V1 (Kastner et al., 1998; Gandhi et al., 1999; Kastner and Ungerleider, 2000). This observation allows us to set up predictions for the two competing models. According to the perceptual coding model, which posits that set size effects result from a degraded response in early visual areas, we should observe a larger BOLD response within a given ROI when attending one item compared with attending two because fewer attentional resources are allocated to each item with increasing set size. According to the decision integration model, which posits that set size effects arise from competition between multiple noisy representations at the decision stage, we should observe no effect of set size on BOLD activity within a given ROI because unlimited processing capacity allows each item to receive the same amount of resources independent of set size. In line with the predictions of the perceptual coding model, we observed a significant effect of set size (F(1,13) = 8.21, p < 0.05) on the evoked BOLD response (Fig. 4) such that the BOLD response was reduced when monitoring two items compared with monitoring one item. This finding mirrors that of Pestilli et al. (2011), who found that attending one item evoked a larger BOLD amplitude than attending four. Furthermore, because the hemodynamic response requires several seconds to develop, the hemodynamic response functions indicate no difference between set sizes until peak amplitude is reached, at which point the set size effect is evident; accordingly, a significant interaction between set size and time was observed (F(10,4) = 5.47, p < 0.01). As a result of this difference around the peak, all subsequent analyses focused on response patterns measured 4–8 s after stimulus onset, which demonstrated a similar set size effect (t(13) = 6.44, p < 0.001).
The BOLD response evoked by the unattended stimulus was also reduced relative to set size 1 (t(13) = 3.28, p < 0.01) and set size 2 (t(13) = 2.35, p < 0.05) trials. Critically, although an effect of set size was observed in the hemodynamic response function epoch, this effect did not predict set size effects in behavior (R² = 0.03, t(13) = 0.60, p = 0.28). Here again, this finding echoes that of Pestilli et al. (2011), who found that the observed changes in BOLD amplitude were insufficient to explain the observed declines in behavior as set size increased. However, as our subsequent analyses will suggest, the null result we obtained may reflect a limitation in the sensitivity of univariate analyses to qualitative changes in target-related sensory codes.
Multivoxel pattern classification
Next, we examined whether orientation information could be decoded from the pattern of BOLD-related responses across V1. If the quality of a given representation degrades with the addition of multiple attended items as predicted by the perceptual coding model, then we should observe a decline in decoding accuracy for that representation in V1. The results of the MVPA analysis on both condition (set sizes 1 and 2) and hemisphere (contralateral, ipsilateral) are depicted in Figure 5. Classification accuracy declined reliably as set size increased (F(1,13) = 8.96, p < 0.01) but was equivalent between contralateral and ipsilateral hemispheres (p = 0.98). This latter result corroborates past work showing the spatially global (i.e., bilateral) character of stimulus-specific activity even when a lateralized stimulus is encoded (Serences and Boynton, 2007; Ester et al., 2009). In other words, previous work has shown that, during both encoding and storage in working memory, lateralized stimuli are represented in both the left and right hemispheres (Jehee et al., 2011). Thus, in a task like ours with stimuli presented in both the right and left visual fields, we expect to see both a contralateral and an ipsilateral representation of each stimulus, such that the neural representations of these stimuli are interleaved across both sides of visual cortex (and in cortical regions responsive to locations outside of the retinotopic positions associated with the stimuli). All conditions were significantly above chance (p < 0.01). Critically, left- and right-field stimulus orientations were counterbalanced across trials. Thus, this finding cannot be explained by covariations in stimulus orientation across hemifields. These MVPA results are consistent with the hypothesis that the precision of perceptual representations declines as the number of to-be-attended items increases because the reliability of stimulus decoding concurrently declines. 
Supporting this inference, we observed a link between declines in stimulus decoding (subtracting classification accuracy for set size 2 from set size 1) and behavioral accuracy, in which larger set-size-dependent declines in decoding accuracy predicted similar declines in behavioral accuracy (R² = 0.31, p < 0.05).
However, although MVPA is a valuable tool for inferring the attended feature value across a population of weakly selective voxels, this method is limited in its ability to reveal how the sensory population response was influenced by the requirement to attend an additional item. Specifically, we can only conclude from this analysis that the patterns of activation associated with each stimulus became less linearly separable as set size increased, whereas the underlying mechanism leading to the poorer classification accuracy is not specified. For example, worse classification accuracy might emerge as a result of decreases in the overall amplitude of the orientation-selective population response profile or a decrease in the selectivity of the population response profile; both alternatives can explain the observed link between declines in decoding accuracy and behavior. Thus, we used a forward encoding model to provide a clearer understanding of how population-level feature-selective tuning functions are affected when attention is distributed across multiple items.
Evaluating population-level feature-selective tuning functions in V1
So far, we have demonstrated reliable set size effects in behavioral- and BOLD-related responses during a perceptual monitoring task. To better understand the source of these set size effects, we now turn to the results of the forward encoding model, which estimates population-level orientation-selective responses based on patterns of activation measured in V1 during our experimental task. These response patterns were estimated using a forward encoding model of orientation selectivity (Brouwer and Heeger, 2009; Serences and Saproo, 2012; see forward encoding model) for both contralateral and ipsilateral hemispheres. As reported in the multivoxel pattern classification above, stimulus-specific information was reliably decoded in both contralateral and ipsilateral hemispheres. Therefore, we implemented a similar approach by examining orientation-selective population responses in both contralateral and ipsilateral ROIs. Because the MVPA was suggestive of a spatially global, stimulus-specific representation (Serences and Boynton, 2007; Ester et al., 2009; Jehee et al., 2011), we anticipated a clear population tuning function centered over the channel preferring the attended orientation for both set sizes and hemispheres (i.e., contralateral and ipsilateral). As shown in Figure 6, we observed a graded orientation-selective profile of channel activations that peaked in the channel corresponding to the orientation attended during the monitoring task in both contralateral (Fig. 6A) and ipsilateral (Fig. 6B) hemispheres.
In the contralateral hemisphere (Fig. 6A), there is an apparent attenuation of on-channel responses (channels at or near the attended feature value, e.g., 0° and ±22.5°) and an increase in off-channel responses (channels farther from the attended feature value, e.g., ±67.5° and 90°) when monitoring more than one grating. This observed pattern was supported by an ANOVA, which revealed a significant condition (set size 1, 2) × channel interaction (F(2.0,26.1) = 4.52, p < 0.05; Greenhouse–Geisser corrected). Given that the effective allocation of attention simultaneously maximizes neuronal firing rates of on-channel cells and minimizes neuronal firing rates of off-channel cells (Treue and Martinez-Trujillo, 1999; Martinez-Trujillo and Treue, 2004), the observed pattern of results suggests that distributing attention across multiple items leads to an overall loss of selectivity in the population response in regions of visual cortex contralateral to the attended stimulus. We also modeled patterns of activation in ROIs ipsilateral to the attended stimulus (Fig. 6B) and observed no condition × channel interaction (p = 0.67). However, direct comparison of contralateral and ipsilateral responses revealed a main effect of hemisphere on channel responses (F(1,13) = 11.82, p < 0.01), such that channel responses were smaller in the ipsilateral relative to the contralateral hemisphere.
The critical result here is the loss of selectivity in the population response profile contralateral to the attended stimulus in the set size 2 condition. We considered whether this effect could be explained by contamination from the channel activity associated with the ipsilateral stimulus in the set size 2 condition rather than by a true change in the selectivity of the contralateral population response. That is, we considered whether—given that the response to the ipsilateral stimulus was amplified by attention in the set size 2 condition relative to the set size 1 condition—the activity associated with the ipsilateral stimulus could have yielded an artifactual change in the shape of the tuning function for the contralateral stimulus. However, one problem for this explanation is that the amplitude of channel activity at the target value (i.e., on-channel activity) in the set size 2 condition was reliably lower than in the set size 1 condition, consistent with past observations that attenuated feature-based selection yields relative declines in on-channel responses and increases in off-channel responses (Treue and Martinez-Trujillo, 1999; Martinez-Trujillo and Treue, 2004; Scolari et al., 2012). In contrast, the putative contamination by the ipsilateral stimulus considered here should have yielded a uniform increase in activity across all channels because the orientation of the ipsilateral stimulus was fully counterbalanced with respect to the contralateral stimulus. Thus, although the presence of the ipsilateral stimulus could have led to increases in off-channel activity, this alternative explanation cannot account for the observation that on-channel and off-channel activity showed opposite effects of attending an additional item. 
In addition, we report below that these changes in on-channel and off-channel activity for the contralateral stimulus predicted the behavioral cost of attending an additional item; this correlation is also inconsistent with an artifactual source of the observed changes in the CTF. Thus, contamination from the ipsilateral stimulus is unlikely to explain the observed reduction in the selectivity of the CTF in the set size 2 condition.
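The counterbalancing argument above can be illustrated with a short simulation. The sketch below assumes eight orientation channels spaced 22.5° apart and an idealized squared-cosine tuning profile; both are illustrative assumptions rather than the basis functions or fitted responses of the forward encoding model used in this study, and the gain value is arbitrary. It averages the response evoked by a contaminating stimulus over all equally likely orientations and shows that the resulting contribution is the same for every channel.

```python
from math import cos, pi

# Assumed channel centers in degrees (0 to 157.5 in 22.5 degree steps).
CHANNELS = [i * 22.5 for i in range(8)]


def tuning(channel_deg, stimulus_deg):
    """Idealized tuning curve, periodic over 180 degrees (an assumption)."""
    return cos(pi * (channel_deg - stimulus_deg) / 180.0) ** 2


def mean_contamination(gain):
    """Average response per channel to a stimulus whose orientation is
    fully counterbalanced, i.e., equally likely to take every value."""
    profile = []
    for channel in CHANNELS:
        responses = [gain * tuning(channel, s) for s in CHANNELS]
        profile.append(sum(responses) / len(responses))
    return profile


def add_contamination(target_profile, contamination):
    """Observed profile if contamination simply adds to the target response."""
    return [t + c for t, c in zip(target_profile, contamination)]


# Target stimulus at 0 degrees plus counterbalanced contamination.
target = [tuning(channel, 0.0) for channel in CHANNELS]
contamination = mean_contamination(gain=0.3)
observed = add_contamination(target, contamination)

# The contamination is flat, so peak-minus-trough selectivity is unchanged.
print(max(contamination) - min(contamination))
print((max(observed) - min(observed)) - (max(target) - min(target)))
```

Under these assumptions, additive contamination shifts every channel upward by the same amount. It can therefore raise off-channel responses but cannot lower on-channel responses, which is why it does not reproduce the opposite-signed effects described above.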
Another explanation for the observed pattern of results is that subjects simply fixated the to-be-attended stimulus on set size 1 trials. To evaluate this possibility, we examined the eye-tracking data recorded during the experimental task. Specifically, we computed the mean x, y position of saccades (defined here as stable fixations lasting >200 ms outside a 0.50° region centered on fixation) as a function of target location (left vs right visual hemifield) during set size 1 trials. We failed to observe a difference in horizontal eye position between set size 1 conditions in which subjects had to attend the item on the left or right (t(5) = 0.57, p = 0.61), and neither the attend left (t(5) = 1.40, p = 0.26) nor the attend right (t(5) = 0.04, p = 0.97) condition differed significantly from the set size 2 condition in which both items were attended. Thus, our results cannot be explained by different patterns of eye positions across manipulations of set size.
Reductions in orientation selectivity predict behavioral performance
A key question is whether or not the observed changes in the profile of channel responses were linked with the observed changes in behavioral performance. Thus, we examined whether the set-size-dependent difference in each channel response predicted the observed cost in behavioral accuracy. First, because channel response functions were symmetric (Fig. 6), we collapsed channel response functions across orientation channels equally distant from the target orientation channel (e.g., −45° and 45° orientation channels). Then, we calculated the difference in channel responses between set sizes for every orientation channel (Fig. 7). This was estimated by subtracting the channel responses in set size 2 from the channel responses in set size 1. Thus, a positive channel difference reflects a larger channel response in set size 1 than in set size 2 for any given channel, and a negative channel difference reflects a larger channel response in set size 2 than in set size 1. As can be seen in Figure 7A (and Fig. 6A), there was an overall reduction in on-channel responses and an overall increase in off-channel responses when attending two items compared with when attention was focused on a single item. This analysis suggests that distributing attention to multiple items effectively reduces orientation selectivity during perception by suppressing on-channel responses and enhancing off-channel responses in set size 2 relative to set size 1 trials.
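The two steps described above (collapsing across channels equidistant from the target channel, then subtracting set size 2 from set size 1) can be sketched as follows. The channel offsets, function names, and response values are hypothetical and serve only to illustrate the sign convention; averaging the two members of each symmetric pair is an assumption about how the collapsing was done.

```python
def collapse_by_distance(offsets_deg, responses):
    """Average responses of channels equally distant from the target channel.

    offsets_deg: channel offset from the target orientation, in degrees.
    Returns a dict mapping absolute offset to the mean response.
    """
    if len(offsets_deg) != len(responses):
        raise ValueError("need one response per channel offset")
    groups = {}
    for offset, response in zip(offsets_deg, responses):
        groups.setdefault(abs(offset), []).append(response)
    return {dist: sum(vals) / len(vals) for dist, vals in sorted(groups.items())}


def channel_difference(offsets_deg, responses_ss1, responses_ss2):
    """Set size 1 minus set size 2, per collapsed channel.

    Positive values: larger response at set size 1.
    Negative values: larger response at set size 2.
    """
    collapsed_1 = collapse_by_distance(offsets_deg, responses_ss1)
    collapsed_2 = collapse_by_distance(offsets_deg, responses_ss2)
    return {dist: collapsed_1[dist] - collapsed_2[dist] for dist in collapsed_1}


# Hypothetical channel responses (illustration only, not study data).
offsets = [-67.5, -45.0, -22.5, 0.0, 22.5, 45.0, 67.5, 90.0]
ss1 = [0.1, 0.2, 0.4, 0.8, 0.4, 0.2, 0.1, 0.0]
ss2 = [0.2, 0.3, 0.4, 0.6, 0.4, 0.3, 0.2, 0.1]

for distance, diff in channel_difference(offsets, ss1, ss2).items():
    print(f"{distance:5.1f} deg: {diff:+.2f}")
```

In this made-up example the difference is positive at the target channel and negative at the most distant channels, which is the qualitative pattern described for Figure 7A.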
We then used this channel response difference measure to determine whether observed costs in orientation selectivity predict individual differences in the behavioral cost of attending an additional item. If distributing attention across multiple items effectively reduces the quality of orientation-selective representations measured in V1, then we should observe an orderly relationship between the shape of channel difference functions and the relative cost of attending multiple items. Specifically, individuals who evoke a larger gradient in channel response differences as a function of orientation channel should demonstrate a larger cost in behavioral accuracy when attending two items relative to one item because feature selectivity declines as the overall difference between on-channel and off-channel responses in channel difference functions increases. Consider the consequence of reducing on-channel responses or increasing off-channel responses on the quality of the information available during perception. On the one hand, by reducing on-channel population responses when attention must be allocated to more than one item, the strength of the signal derived from neural populations tuned toward the attended stimulus decreases and the overall activity level in on-channel cells is more similar to the baseline level of activity. On the other hand, by increasing off-channel responses when attention must be allocated to more than one item, the activity level of neural populations tuned away from the attended stimulus feature increases relative to baseline and the overall activity level in off-channel cells is more similar to the activity level of neural populations tuned toward the attended stimulus feature. Together, decreases in on-channel responses (i.e., a positive channel difference value) and increases in off-channel responses (i.e., a negative channel difference value) are indicative of declines in feature selectivity.
To quantify the magnitude of declines in feature selectivity, we calculated a channel modulation value by computing the slope of channel response differences as a function of orientation channel.
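A minimal sketch of such a slope computation is given below. It assumes that the slope is obtained by an ordinary least-squares line fit of the channel difference values against channel distance from the target orientation; the fitting procedure, function name, and example values are assumptions made for illustration and are not taken from the study.

```python
def channel_modulation(distances_deg, differences):
    """Least-squares slope of channel differences against channel distance.

    distances_deg: distance of each collapsed channel from the target (deg).
    differences: set size 1 minus set size 2 response for each channel.
    """
    n = len(distances_deg)
    if n != len(differences) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(distances_deg) / n
    mean_y = sum(differences) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_deg)
    if sxx == 0:
        raise ValueError("channel distances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(distances_deg, differences))
    return sxy / sxx


# Hypothetical difference function (illustration only, not study data):
# positive on-channel differences, negative off-channel differences.
distances = [0.0, 22.5, 45.0, 67.5, 90.0]
differences = [0.2, 0.1, 0.0, -0.1, -0.2]
print(channel_modulation(distances, differences))
```

With this convention, a difference function that falls from positive on-channel values to negative off-channel values yields a negative slope, and a steeper fall yields a more negative channel modulation value.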
In line with the hypothesis that set-size-dependent declines in feature selectivity drive the concurrent declines in behavioral performance, we observed a strong link between channel modulation and accuracy costs such that individuals who evoked a larger decline in feature selectivity—defined as a more negative channel modulation value—were more likely to demonstrate a larger reduction in accuracy when attending multiple items. Importantly, this relationship was only present in cortical ROIs contralateral to the attended stimulus (Fig. 7C; R2 = 0.45, p < 0.01), whereas no relationship was observed in ipsilateral ROIs (Fig. 7D; R2 = 0.06, p = 0.41). Thus, in line with perceptual coding models, observed set size effects in behavioral performance are attributable to a reduction in the selectivity of orientation-specific neural populations in V1 during perception.
Discussion
There is active debate regarding the processing locus of declines in performance with concurrent increases in set size. Perceptual coding models posit that the quality of stimulus representations is constrained by the distribution of a finite neural resource (Posner et al., 1980; Fisher, 1984; Bundesen, 1990; Palmer et al., 2011), such that increasing set size reduces the resources available for each item, yielding declines in representational quality. Conversely, decision integration models posit that the internal noise represented in sensory cortices is independent of set size and that reported set size effects arise from competing noisy representations during post-perceptual decision stages of information processing (Palmer, 1995; Palmer and McLean, 1995; Eckstein et al., 2000; Verghese, 2001).
In the current work, we estimated population-level, orientation-selective responses in V1 during a perceptual monitoring task requiring the distribution of attention to one or two visual stimuli. We observed a loss of orientation selectivity when more than one item was attended. Specifically, there was a relative increase and decrease of off-channel and on-channel responses, respectively, which broadened the distribution of orientation-selective responses. Thus, these findings provide a mechanistic explanation of why decoding results from the MVPA showed a decline in accuracy with set size. Moreover, we observed that the degree to which channel responses were affected by set size predicted costs in behavioral accuracy for each observer in the study. Thus, our findings show a strong relationship between the declining precision in the orientation population response for attended items and the known behavioral costs of attending additional items.
The main conclusion of the current work must be reconciled with recent work demonstrating support for decision integration models (Pestilli et al., 2011). These authors measured univariate patterns of BOLD-related activity in striate and extrastriate visual areas during a contrast-discrimination task requiring the allocation of focal or distributed attention. In this task, subjects were presented with four gratings of variable contrast in each visual quadrant and were required to detect a change in the contrast of a single grating between two presentations of the same display. The shape of the BOLD contrast response functions in the focal and distributed attention conditions failed to support perceptual coding models because (1) behavior could not be explained via response enhancement during focal attention because no change in the slope of BOLD contrast response function was observed and (2) behavior could not be explained via noise reduction during focal attention because a quantitative model constrained by the psychophysical data was used to show that neural responses would need to undergo an unreasonably high reduction in noise. The authors then demonstrated that behavior was well described by a weighted pooling across responses to each stimulus, consistent with a decision integration account of efficient selection.
However, the empirical pattern observed by Pestilli et al. (2011) could arguably be task specific in that their contrast-discrimination task encouraged sensory pooling as an optimal strategy. Specifically, observers were required to indicate which frame contained a higher contrast target. This task may have been conducive to a pooling strategy in which a single representation reflecting average contrast was computed and compared across frames, and then observers selected the frame with the higher average contrast. In contrast, in the current work, observers were required to maintain individuated representations of multiple stimuli throughout the entire trial to detect an unpredictable change in stimulus orientation. Therefore, we argue that, in situations that require the precise encoding of multiple individuated representations, changes in behavioral performance can be well accounted for via changes in the precision of feature-selective sensory responses.
Multiple studies have demonstrated a relative enhancement of on-channel and suppression of off-channel neural response properties when attending to a single stimulus (Treue and Martinez-Trujillo, 1999; Martinez-Trujillo and Treue, 2004; Scolari et al., 2012), suggesting that attention increases the selectivity for attended features by increasing the responses of neurons preferring the attended feature value while decreasing responses of neurons tuned to the orthogonal feature value. Within the scope of these studies, our results suggest that attending multiple items encumbers a neural resource that drives the distinction between neural populations tuned either toward or away from the attended feature value. These findings are in line with perceptual coding models that predict that attending multiple items degrades the quality of perceptual representations. Thus, although our findings do not exclude the possibility of increased decision noise with larger set sizes (Palmer, 1995; Palmer and McLean, 1995; Eckstein et al., 2000; Verghese, 2001), our findings demonstrate that attending more items has a direct impact on the quality of perceptual codes and that these changes in perceptual quality can account for the associated declines in visually guided behavior.
Our results suggest that the selectivity of population responses in sensory cortical regions declines with increases in the number of attended items. Increases in perceptual noise during stimulus encoding would effectively degrade the fidelity of information transmission and processing in cortical units (Pouget et al., 2003), which would lead to lower-quality population responses and compromised stimulus decoding. One possible explanation of this decline in the precision of population responses comes from a class of computational models in which multiple items are represented by oscillatory phase coding mechanisms (Lisman and Idiart, 1995; Singer and Gray, 1995; Roelfsema et al., 1996; Jensen and Colgin, 2007; Canolty and Knight, 2010). According to these phase coding models, subpopulations of feature-selective cells code the details of a single representation through synchronized neural activity (Gray et al., 1989; Engel et al., 1991; Fries et al., 2001; for review, see Fries, 2005; Siegel et al., 2008). To individuate each representation, the phase of each subpopulation of feature-selective cells is decoupled and maintained as asynchronous neural assemblies (Gray et al., 1989; Engel et al., 1991; Roelfsema et al., 1996; Eckhorn et al., 1998). Under this view, declines in the integrity of population responses may result from sensory units firing out of phase with their respective neural assembly and in phase with a different neural assembly. Thus, the probability of a given sensory unit firing in phase with a separate neural assembly would increase with the number of to-be-attended representations, leading to a decline in the precision of population responses.
A similar brand of phase coding models has been proposed for the maintenance of visual working memory representations (Lisman and Idiart, 1995; Raffone and Wolters, 2001), suggesting that similar declines in perceptual and mnemonic precision should be observed with increases in item load. Indeed, multiple studies have demonstrated an inverse relationship between mnemonic precision in visual working memory and set size (Wilken and Ma, 2004; Zhang and Luck, 2008; Anderson et al., 2011; Anderson and Awh, 2012). Thus, it is possible that a similar oscillatory coding scheme explains how items are individuated during both perception and memory and that declines in mnemonic precision with increasing set size are also attributable to increased dispersion in the population responses for items stored in memory (for a recent demonstration of declines in decoding accuracy as a function of set size, see Emrich et al., 2013).
The forward encoding model used here provides a powerful complement to standard MVPA methods by generating an estimate of the feature-selective population response profile based on a set of a priori, physiologically plausible assumptions. For instance, although the MVPA in the current study revealed equivalent classification accuracies for contralateral and ipsilateral hemispheres in both set size conditions, confirming past observations of spatially global mnemonic and perceptual codes (Serences and Boynton, 2007; Ester et al., 2009), the forward encoding model revealed a reduction in the amplitude of channel responses in ipsilateral relative to contralateral cortex. Thus, the forward encoding model revealed a basic distinction between population responses in contralateral and ipsilateral hemispheres that was invisible to the MVPA.
To conclude, we investigated the neural locus of declines in behavioral performance across concurrent increases in set size during a perceptual monitoring task. We measured orientation-selective population response profiles in human V1 by using a combination of fMRI and a forward encoding model of orientation selectivity. In line with the central tenet of perceptual coding models (Posner et al., 1980; Palmer et al., 2011), we observed a quantifiable attenuation in the selectivity of population responses that predicted the degree to which accuracy declined as a function of set size. This finding lends confidence to the claim of perceptual coding models that the quality of encoded representations is constrained by the distribution of a finite neural resource and that attending multiple items requires a finer distribution of this limited resource. Thus, the current work provides a novel insight into the neural source of the decline in behavioral performance when multiple items must be simultaneously attended.
Footnotes
This work was supported by National Institute of Mental Health Grant R01-MH087214 (E.A.).
The authors declare no competing financial interests.
Correspondence should be addressed to either David E. Anderson or Edward Awh, Department of Psychology, 1227 University of Oregon, Eugene, OR 97403, dendersn{at}gmail.com or awh{at}uoregon.edu