Abstract
In a neural population driven by a simple grating stimulus, different subpopulations are maximally informative about changes to the grating's orientation and contrast. In theory, observers should attend to the optimal subpopulation when switching between orientation and contrast discrimination tasks. Here we used source-imaged, steady-state visual evoked potentials and visual psychophysics to determine whether this is the case. Observers fixated centrally while static targets were presented bilaterally along with a cue indicating task type (contrast or orientation modulation detection) and task location (left or right). Changes in neuronal activity were measured by quantifying frequency-tagged responses from flickering “reporter” gratings surrounding the targets. To determine the orientation tuning of attentionally modulated neurons, we measured responses for three different probe-reporter angles: 0, 20, and 45°. We estimated frequency-tagged cortical activity using a minimum norm inverse procedure combined with realistic MR-derived head models and retinotopically mapped visual areas. Estimates of neural activity from regions of interest centered on V1 showed that attention to a spatial location clearly increased the amplitude of the neural response in that location. More importantly, the pattern of modulation depended on the task. For orientation discrimination, attentional modulation showed a sharp peak in the population tuned 20° from the target orientation, whereas for contrast discrimination the enhancement was more broadly tuned. Similar tuning functions for orientation and contrast discrimination were obtained from psychophysical adaptation studies. These findings indicate that humans attend selectively to the most informative neural population and that these populations change depending on the nature of the task.
Introduction
Population coding theory (Seung and Sompolinsky, 1993) predicts that observers should base their judgments on the output of the most informative population of sensory neurons. However, even when the stimulus is constant, the identity of the most informative neural population can change depending on the visual feature to be discriminated. Therefore, to perform optimally on any particular task, observers must either modulate these early population responses directly, amplifying informative neuronal responses and down-weighting irrelevant ones, or else pool from them in a selective manner at some higher level.
Here, we asked whether responses from populations of orientation-tuned neurons in humans changed depending on whether the task was orientation or contrast discrimination. For fine discrimination, the most informative neurons are those with the greatest sensitivity in the region of the attribute to be discriminated. This sensitivity depends on the underlying shape of the response function, which is often modeled as a Gaussian for orientation and a sigmoid for contrast. In the case of orientation discrimination, a small orientation change around a vertical target causes a greater differential response in populations with preferred orientations away from the test orientation, compared with a population that matches the test orientation (Fig. 1A). Thus, for orientation discrimination, we expect attention to select responses from off-peak populations. The exact off-peak population that is most sensitive depends on the bandwidth of the underlying tuning curves and on the spatial frequency of the stimulus (Phillips and Wilson, 1984; Snowden, 1992).
For contrast discrimination, the population that matches the test orientation has the greatest differential response to a contrast change while populations tuned to nearby orientations have a smaller differential response to the same contrast change (Fig. 1B; Sclar and Freeman, 1982; Albrecht et al., 1984). However, theory predicts that the optimal population depends on the ratio of the differential response to its SD (Seung and Sompolinsky, 1993; Scolari and Serences, 2009). If the associated noise is Poisson, with variance proportional to the mean, then populations that match the test orientation may be slightly more informative than those tuned to other orientations. Thus, for contrast discrimination, we expect a broad peak at the orientation that matches the target.
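The argument in the two preceding paragraphs can be made concrete with a small numerical sketch. It assumes a Gaussian orientation tuning curve with a hypothetical bandwidth (`SIGMA_DEG = 20`, chosen only for illustration and not a value measured here), a contrast change that scales the tuning curve multiplicatively, and Poisson-like noise whose SD is the square root of the mean response. The function names are ours.

```python
import math

SIGMA_DEG = 20.0  # hypothetical tuning bandwidth (SD of the Gaussian), illustration only


def tuning(offset_deg, sigma=SIGMA_DEG):
    """Normalized Gaussian response of a population whose preferred
    orientation lies offset_deg away from the target orientation."""
    return math.exp(-offset_deg ** 2 / (2.0 * sigma ** 2))


def orientation_sensitivity(offset_deg, sigma=SIGMA_DEG):
    """Magnitude of the response change per degree of target rotation
    (absolute slope of the tuning curve at the target orientation)."""
    return abs(offset_deg) / sigma ** 2 * tuning(offset_deg, sigma)


def contrast_sensitivity(offset_deg, sigma=SIGMA_DEG, poisson=False):
    """Relative response change for a small contrast step, assuming contrast
    scales the tuning curve multiplicatively. With poisson=True the change is
    divided by the SD of the noise, taken as sqrt(mean response)."""
    gain = tuning(offset_deg, sigma)
    return gain / math.sqrt(gain) if poisson else gain


offsets = range(0, 91)  # preferred-orientation offsets from the target, in degrees
best_for_orientation = max(offsets, key=orientation_sensitivity)
best_for_contrast = max(offsets, key=contrast_sensitivity)
```

Under these assumptions the slope is largest for the population offset by one bandwidth from the target, whereas the contrast measure peaks at the target orientation and falls off more gradually once it is divided by the noise SD.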
We used source-imaged electroencephalography (EEG) to determine whether early visual areas such as V1 showed a difference in the modulation of these orientation-tuned populations, depending on the task. Specifically, we used frequency tagging to separate neural modulations driven by two spatially separated stimuli and to dissociate the effect of attention to location from attention to features in two tasks: contrast discrimination and orientation discrimination.
We found clear evidence of task-dependent changes in neural population responses in primary visual cortex. Remarkably, these changes matched both theoretical predictions for optimal orientation discrimination and our own psychophysical data collected using stimuli identical to those in the imaging study. These are the first direct measurements of task-dependent adaptive changes in human neural populations and they are strong evidence that selection for visual attributes occurs at an anatomically early stage in the cortical visual pathway.
Materials and Methods
Observers
A total of 14 observers, nine male and five female, were recruited to our experiments. One of these observers was unable to maintain steady fixation, so her data were discarded. All the remaining observers participated in the steady-state visual evoked potential (SSVEP) experiments with reporter annuli tilted 0 and 20° with respect to the vertical target. Eleven of these 13 observers also participated in the SSVEP experiments with the annulus tilted at 45°. All observers had normal or corrected-to-normal visual acuity, gave informed written consent to participate as paid volunteers, and were tested individually in a dark room. The human subjects review committee of Smith-Kettlewell Eye Research Institute approved the study.
Psychophysics
Display.
Stimuli were displayed on a 19 inch Electron Blue II CRT monitor (Lacie) that subtended 21° horizontally and 17° vertically at a viewing distance of 1 m with a frame rate of 100 Hz. Luminance calibration was performed using a photometer and monitor gamma tables were adjusted to ensure response linearity and a constant mean luminance of 32.5 cd/m².
Experiment.
Observers made two-alternative spatial forced-choice judgments on two vertical targets [2° diameter circular-windowed grating, 2 cycles per degree (c/deg), 50% contrast] located 5° to the left and right of fixation. Each session began with 1 min of adaptation to a full-field grating undergoing counterphase flicker (90% contrast, 2 c/deg, 5 Hz) and adaptation was “topped up” for 10 s between trials using the same stimulus (Fig. 2A). Targets were presented for 200 ms following an interval of 300 ms after the offset of the adaptor. The orientation of the adaptor grating was varied between 0 and 40° (clockwise tilt with respect to the vertical target) across blocks of trials and was kept fixed within a block. In separate blocks, observers (n = 4) performed either contrast increment detection or clockwise orientation change detection. The method of constant stimuli was used to measure proportion correct at five values of contrast increment, or clockwise orientation change with respect to the 50% contrast, vertical target. The values of the increment were adjusted for each observer to span their initial estimate of threshold. The step sizes ranged from 1 to 2% for contrast increment, and from 0.5 to 1° for clockwise orientation change.
Data.
For each adaptor, proportion correct data as a function of orientation (contrast) increment were fit with a Weibull function. Threshold orientation (contrast) was estimated as the value corresponding to 81.6% correct performance. The threshold elevation data shown in Figure 7 were normalized by the threshold for the 0° adaptor for each observer and averaged across observers. The error bars indicate the SD of these normalized thresholds across observers.
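The 81.6% criterion is the value a two-alternative forced-choice Weibull function takes when the stimulus increment equals its scale parameter. The sketch below illustrates this under the assumption that the fitted function had the common form p(x) = 1 − 0.5·exp(−(x/α)^β) with a 50% guess rate. The exact parameterization and fitting routine are not stated in the text, so the coarse grid-search fit, the function names, and the example data are all hypothetical.

```python
import math


def weibull_2afc(x, alpha, beta):
    """Proportion correct for increment x, with a 50% guess rate (assumed form)."""
    return 1.0 - 0.5 * math.exp(-((x / alpha) ** beta))


def fit_weibull(increments, prop_correct, alphas, betas):
    """Least-squares grid search over candidate (alpha, beta) pairs.
    A stand-in for whatever fitting routine was actually used."""
    def sse(params):
        a, b = params
        return sum((weibull_2afc(x, a, b) - p) ** 2
                   for x, p in zip(increments, prop_correct))
    return min(((a, b) for a in alphas for b in betas), key=sse)


# Hypothetical data: five orientation increments (deg) generated from a known curve.
increments = [0.5, 1.0, 1.5, 2.0, 2.5]
observed = [weibull_2afc(x, 1.5, 2.0) for x in increments]

alpha_grid = [0.5 + 0.1 * i for i in range(31)]  # 0.5 to 3.5 deg
beta_grid = [1.0 + 0.5 * i for i in range(7)]    # 1.0 to 4.0
alpha_hat, beta_hat = fit_weibull(increments, observed, alpha_grid, beta_grid)

# At x = alpha the function equals 1 - 0.5 / e, which is the threshold criterion.
criterion = weibull_2afc(alpha_hat, alpha_hat, beta_hat)
```

With this parameterization the fitted scale parameter `alpha_hat` is itself the threshold, and thresholds for each adaptor can then be divided by the 0° adaptor threshold to give the normalized elevation.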
EEG
Display.
EEG stimuli were displayed on a 19 inch Electron Blue IV CRT monitor (Lacie) that subtended 20.5° horizontally and 15.3° vertically at a viewing distance of 1 m. Stimuli were generated and displayed using an in-house EEG stimulus display system (PowerDiva) that ensures submillisecond-level temporal accuracy. Both temporal and luminance calibrations were performed using a calibrated photocell and monitor gamma tables were adjusted to ensure response linearity and a constant mean luminance of 49 cd/m². The display monitor had a frame rate of 120 Hz.
SSVEP stimulus parameters.
The target gratings were identical to the ones used in the psychophysical experiments in size, contrast, and spatial frequency. They were presented 1.5° below and 4.5° to the left and right of fixation. The static target grating was surrounded by a reporter annulus with the same spatial frequency as the target that flickered on and off at 15 and 20 Hz on the left and right, respectively (Fig. 2B). The contrasts of the reporter gratings were set at 75 and 83%, respectively, so that they appeared perceptually matched in contrast and generated EEG responses of approximately equal amplitude. The reporter annulus was separated by a gap of 0.25° from the static target and had an inner diameter of 2.5° and an outer diameter of 5°.
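Both tagging frequencies divide the 120 Hz frame rate of the EEG display evenly, so each flicker cycle spans a whole number of video frames. The short sketch below checks this arithmetic. The equal split between on and off frames is an assumption, since the duty cycle is not stated, and the function names are ours.

```python
FRAME_RATE_HZ = 120  # frame rate of the EEG display monitor


def frames_per_cycle(flicker_hz, frame_rate_hz=FRAME_RATE_HZ):
    """Number of video frames in one on-off cycle; raises if not an integer."""
    frames, remainder = divmod(frame_rate_hz, flicker_hz)
    if remainder:
        raise ValueError(f"{flicker_hz} Hz does not divide {frame_rate_hz} Hz evenly")
    return frames


def on_off_sequence(flicker_hz, n_cycles=1):
    """Per-frame on (1) / off (0) states, assuming a 50% duty cycle."""
    frames = frames_per_cycle(flicker_hz)
    on = frames // 2
    return ([1] * on + [0] * (frames - on)) * n_cycles


left_cycle = on_off_sequence(15)   # reporter in the left visual field
right_cycle = on_off_sequence(20)  # reporter in the right visual field
```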
Each trial lasted 2 s and started with the appearance of the cue at the fixation point indicating the task (contrast or orientation discrimination) and the location (left or right) of the increment. The target and grating stimuli came on 600 ms before the start of the trial to eliminate onset transients. A modulation (either of contrast or of orientation) was present on 50% of the trials. When present, the increment came on 1 s after the start of the trial and lasted 200 ms. At the end of the trial, the observer used one of two keys (j or k) to indicate the presence or absence of the increment. Both modulation types were adjusted for each observer to maintain 85–90% correct detection (Fig. 3). On average, observers required a 17% increase in contrast and ∼2.5° clockwise tilt in orientation to perform at this level. We ran three blocks of trials each with a fixed reporter orientation of 0, 20, or 45°. For each reporter orientation, there were eight conditions associated with attend feature (contrast/orientation), attend location (left/right), and increment (present/absent). These eight were (1) contrast–left–present; (2) contrast–right–present; (3) contrast–left–absent; (4) contrast–right–absent; (5) orientation–left–present; (6) orientation–left–absent; (7) orientation–right–present; and (8) orientation–right–absent. To ensure that the measured responses were due to differential attention allocation and not due to stimulus differences in increment-present trials, we analyzed only the four increment-absent conditions, in which the stimuli were physically identical.
Eye-movement monitoring
We monitored eye movements using the horizontal electro-oculogram (HEOG), measured as the voltage difference between sensors placed at the left and right outer canthi of participants' eyes. For each surround orientation, we estimated the instantaneous deviation of mean eye position for each observer and normalized it by the SD of his/her eye position. This z-score value was averaged across all observers for that condition. We also ran separate experiments to calibrate the deviation of the HEOG signal for planned eye movements to targets at an eccentricity of 0.5, 1, 2, and 4° both to the left and right of fixation. These eye movements caused HEOG amplitude to increase approximately linearly: saccades of 0.5 and 4° resulted in z-scores of 0.3 and 1.5, respectively. From these calibration data, we determined that the largest z-score across all of our conditions (0.3) corresponded to saccades of <0.5° in amplitude. Thus eye movement artifacts did not contribute significantly to the pattern of evoked responses.
EEG signal acquisition and source imaging procedure
The EEG data were collected with 128-sensor HydroCell Sensor Nets (Electrical Geodesics) (Fig. 2C). The analog signals were sampled at 600 Hz and were bandpass filtered from 0.1 to 200 Hz. Following each experimental session, the 3D locations of all electrodes and three major fiducials (nasion, left and right periauricular points) were digitized using a 3Space Fastrack 3-D digitizer (Polhemus). For all observers, the 3D digitized locations were used to coregister the electrodes to their T1-weighted anatomical MRI scans.
Raw data were evaluated off line according to a sample-by-sample thresholding procedure. Noisy sensors were replaced by the average of the six nearest spatial neighbors. Once noisy sensors were substituted, the EEG was rereferenced to the common average of all the sensors. Additionally, EEG epochs that contained a large percentage of data samples exceeding threshold (25–50 μV) were excluded on a sensor-by-sensor basis.
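The two sensor-level steps described above, neighbor substitution followed by average rereferencing, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy sensor coordinates, the use of Euclidean distance to find neighbors, and the choice to skip other noisy sensors when picking neighbors are all assumptions.

```python
import math


def nearest_neighbors(index, positions, k=6, exclude=()):
    """Indices of the k sensors closest to sensor `index`, skipping excluded ones."""
    candidates = [i for i in range(len(positions)) if i != index and i not in exclude]
    candidates.sort(key=lambda i: math.dist(positions[index], positions[i]))
    return candidates[:k]


def replace_noisy_sensors(sample, positions, noisy, k=6):
    """Replace each noisy sensor with the mean of its k nearest clean neighbors.
    `sample` holds one voltage per sensor at a single time point."""
    cleaned = list(sample)
    for idx in noisy:
        neighbors = nearest_neighbors(idx, positions, k, exclude=noisy)
        cleaned[idx] = sum(sample[n] for n in neighbors) / len(neighbors)
    return cleaned


def average_reference(sample):
    """Subtract the mean across sensors so the values sum to zero."""
    mean = sum(sample) / len(sample)
    return [v - mean for v in sample]


# Toy example: 8 sensors on a line, sensor 3 is noisy.
positions = [(float(i), 0.0, 0.0) for i in range(8)]
voltages = [1.0, 2.0, 3.0, 500.0, 5.0, 6.0, 7.0, 8.0]
cleaned = replace_noisy_sensors(voltages, positions, noisy={3})
rereferenced = average_reference(cleaned)
```

Substitution is done before rereferencing so that the outlying sensor does not contaminate the common average.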
Structural MRI and fMRI
Structural MRI and fMRI scanning was conducted on a 3 T Tim Trio scanner (Siemens) using a 12-channel head coil. We acquired a T1-weighted MRI dataset (3-D MP-RAGE sequence, 0.8 × 0.8 × 0.8 mm³) and a 3D T2-weighted dataset (spin echo sequence at 1 × 1 × 1 mm³ resolution) for tissue segmentation and registration with the functional scans. For fMRI, we used a single-shot, gradient-echo EPI sequence (TR/TE, 2000/28 ms; flip angle, 80°; 126 volumes per run) with a voxel size of 1.7 × 1.7 × 2 mm³ (128 × 128 acquisition matrix; 220 mm FOV; bandwidth, 1860 Hz/pixel; echo spacing, 0.71 ms). We acquired 30 slices without gaps, positioned in the transverse-to-coronal plane approximately parallel to the corpus callosum and covering the whole cerebrum. Once per session, a 2D spin echo T1-weighted volume was acquired with the same slice specifications as the functional series to facilitate registration of the fMRI data to the anatomical scan.
The FreeSurfer software package (http://surfer.nmr.mgh.harvard.edu) was used to perform gray and white matter segmentation and a midgray cortical surface extraction. This cortical surface had 20,484 isotropically spaced vertices and was used both as a source constraint and for defining the visual areas. The FreeSurfer package extracts both gray/white and gray/CSF boundaries, but these surfaces can have different surface orientations. In particular, the gray/white boundary has sharp gyri (the curvature changes rapidly) and smooth sulci (slowly changing surface curvature), while the gray/CSF boundary is the inverse, with smooth gyri and sharp sulci. To avoid these discontinuities, we generated a surface partway between these two boundaries that has gyri and sulci with approximately equal curvature.
Individual boundary element method conductivity models were derived from the T1-weighted and T2-weighted MRI scans of each observer. The FSL toolbox (http://www.fmrib.ox.ac.uk/fsl/) was also used to segment contiguous volume regions for the scalp, outer skull, and inner skull and to convert these MRI volumes into inner skull, outer skull, and scalp surfaces (Smith, 2002, 2004).
Visual area definition
The general procedures for these functional scans (e.g., head stabilization, visual display system) are standard and have been described in detail previously (Brewer et al., 2005). Retinotopic visual field mapping defined regions of interest (ROI) corresponding to visual cortical areas V1, V2v, V2d, V3v, V3d, V3a, and hV4 in each hemisphere (Sereno et al., 1995; Engel et al., 1997; Tootell et al., 1997; Press et al., 2001; Wade et al., 2002). ROIs corresponding to hMT+ were identified using low-contrast motion stimuli similar to those described by Huk and Heeger (2002).
Cortically constrained inverse
An L2 minimum norm inverse was computed with sources constrained to the location and orientation of the cortical surface (Hämäläinen et al., 1993). In addition, we modified the source covariance matrix in two ways to decrease the tendency of the minimum norm procedure to place sources outside the visual areas. These constraints involved (1) increasing the variance allowed within the visual areas by a factor of two relative to other vertices and (2) enforcing a local smoothness constraint within an area using the first-order and second-order neighborhoods on the mesh with a weighting function equal to 0.5 for the first order and 0.25 for the second. The smoothness constraint therefore respects areal boundaries, unlike other smoothing methods, such as low-resolution electromagnetic tomography (LORETA), that apply the same smoothing rule throughout cortex (Pascual-Marqui et al., 1994).
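For reference, a cortically constrained minimum norm estimate with a modified source covariance is commonly written as below. This is the textbook form rather than a statement of the exact implementation used here: the symbols (forward matrix L, source covariance R, sensor noise covariance C, regularization parameter λ) and the regularization scheme are assumptions, since the text does not specify them.

```latex
\hat{\mathbf{j}} = \mathbf{R}\,\mathbf{L}^{\top}
\left(\mathbf{L}\,\mathbf{R}\,\mathbf{L}^{\top} + \lambda^{2}\,\mathbf{C}\right)^{-1}\mathbf{v}
```

Here v is the vector of sensor voltages and ĵ is the estimated current density at the cortical vertices. In this notation, the two constraints described above would plausibly act on R: doubling the variance (diagonal) entries for vertices inside the visual areas, and adding weights of 0.5 and 0.25 between first-order and second-order mesh neighbors that belong to the same area.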
ROI-based analysis of the SSVEP
A discrete Fourier transform was used to estimate the average response magnitude associated with each functionally defined ROI for the first-harmonic component of the steady-state frequencies 15 and 20 Hz. As can be seen in Figure 4B, which plots the V1 sources for one observer, the strongest response is at the first harmonic of the input frequencies, consistent with the on–off temporal modulation. To take into account the different noise levels for each of our observers (Vialatte et al., 2010), we computed the signal-to-noise ratio (SNR) by dividing peak amplitudes by the associated noise, which is defined for a given frequency f by the average amplitude of the two neighbor frequencies (i.e., f − δf and f + δf where δf gives the frequency resolution of the Fourier analysis). For this observer, attention clearly enhances the response to the 15 Hz stimulus. Figure 4C compares the response to the 15 Hz component when the observer attends left and attends right. The frequency-tagged first harmonic component is greater when the observer attends left, to the location of the 15 Hz component.
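The SNR definition above can be sketched in a few lines. The example uses the 600 Hz sampling rate reported for the EEG and assumes a 2 s analysis epoch, which gives a frequency resolution of 0.5 Hz; the epoch length used in the actual analysis is not stated, and the synthetic signal and function names are hypothetical.

```python
import cmath
import math


def dft_amplitude(signal, bin_index):
    """Single-sided amplitude of one DFT bin, computed by direct summation."""
    n = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * bin_index * i / n)
                for i, x in enumerate(signal))
    return 2.0 * abs(total) / n


def snr_at(signal, sample_rate_hz, freq_hz):
    """Amplitude at freq_hz divided by the mean amplitude of the two adjacent
    bins (f - df and f + df, where df = sample_rate / number of samples)."""
    df = sample_rate_hz / len(signal)
    k = round(freq_hz / df)
    noise = (dft_amplitude(signal, k - 1) + dft_amplitude(signal, k + 1)) / 2.0
    return dft_amplitude(signal, k) / noise


# Synthetic 2 s epoch sampled at 600 Hz: a unit-amplitude 15 Hz response plus
# weaker activity in the two neighboring 0.5 Hz bins (hypothetical values).
FS, DURATION_S = 600, 2
t = [i / FS for i in range(FS * DURATION_S)]
epoch = [math.sin(2 * math.pi * 15.0 * s)
         + 0.1 * math.sin(2 * math.pi * 14.5 * s)
         + 0.1 * math.cos(2 * math.pi * 15.5 * s)
         for s in t]
snr_15hz = snr_at(epoch, FS, 15.0)
```

Because the neighboring bins each carry one tenth of the amplitude at 15 Hz, the ratio for this synthetic epoch comes out close to 10. An SNR near 1 would mean the tagged frequency is indistinguishable from the surrounding noise.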
Figure 5 shows the average (n = 13) Fourier amplitude of the evoked responses in cortical area V1 for the attend-contrast condition in the presence of a vertical annulus. The responses to frequencies f1 (15 Hz) and f2 (20 Hz) are shown in the upper and lower rows respectively. The columns show responses in the left and right hemispheres. It is clear that the annuli produce significant responses only in the contralateral hemisphere: the 15 Hz annulus in the left visual field produces significant responses only in the right hemisphere and vice versa for the 20 Hz annulus in the right visual field. We analyzed the responses to both the 15 and 20 Hz stimuli but plot only the 15 Hz responses in the Results section because the frequency-tagged responses to the 20 Hz stimuli not only had lower amplitude as expected with a higher temporal frequency, but also had much poorer SNR. The average SNR in area V1 for the 20 Hz condition is 2.5 (Fig. 5), but this includes three observers with SNR significantly <1. The average SNR in area hV4 for the 20 Hz condition is only 1.5 and includes five observers with SNR not statistically different from 1. This is not due to differences in task difficulty: the proportion correct response to stimuli on the right with the 20 Hz surround and on the left with the 15 Hz surround were statistically indistinguishable. These values were 89.3 ± 4.5% and 87.8 ± 5.1%, respectively.
The attentional modulation of the cortical sources is defined in Equation 1:

$$\mathrm{AMI} = \frac{\mathrm{SNR}_{\mathrm{attended}} - \mathrm{SNR}_{\mathrm{unattended}}}{\mathrm{SNR}_{\mathrm{attended}} + \mathrm{SNR}_{\mathrm{unattended}}} \qquad (1)$$

When examining the effect of spatial attention, we compared the SNR of the frequency-tagged response when the observer attended to the target with the SNR at that frequency when the observer attended away (attended to the target on the opposite side). Thus we were able to determine the modulation due to spatial attention for each of the attend-orientation and the attend-contrast tasks.
The above measure combines both spatial attention and feature attention (orientation/contrast). To isolate the effect of feature attention, we subtracted the modulation in the contrast task from that in the orientation task. More specifically the modulation index for feature attention (FAMI) is expressed in Equations 2 and 3:
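A minimal sketch of these indices follows. It takes the attentional modulation index to be the attended-minus-unattended difference in SNR divided by their sum, which is how the index is described in the Results, and takes the feature index to be the orientation-task modulation minus the contrast-task modulation, as stated above. The function names and example SNR values are hypothetical, and the published equations may differ in detail.

```python
def attention_modulation_index(snr_attended, snr_unattended):
    """(attended - unattended) / (attended + unattended); ranges from -1 to 1
    for non-negative SNRs."""
    return (snr_attended - snr_unattended) / (snr_attended + snr_unattended)


def feature_attention_index(ami_orientation, ami_contrast):
    """Orientation-task modulation minus contrast-task modulation, so that the
    shared spatial-attention component cancels."""
    return ami_orientation - ami_contrast


# Hypothetical SNRs for one observer and one reporter orientation.
ami_ori = attention_modulation_index(snr_attended=6.0, snr_unattended=4.0)
ami_con = attention_modulation_index(snr_attended=5.5, snr_unattended=4.5)
fami = feature_attention_index(ami_ori, ami_con)
```

A positive value of `fami` would indicate more attentional modulation in the orientation task than in the contrast task for that reporter orientation.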
Statistical analysis of SNRs
We performed one-tailed paired t tests comparing responses to the 20° reporter annulus with responses to the 0 and 45° annuli to determine whether the population responding to the 20° tilt indeed had the greatest attentional modulation, as predicted by population coding theory and our initial psychophysical experiments.
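The test statistic for such a paired comparison can be sketched with the standard library alone. The data below are hypothetical and the function name is ours. For a one-tailed test, the p value is the upper-tail probability of this statistic under a t distribution with n − 1 degrees of freedom; that lookup is omitted here because it needs a statistics library.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, degrees of freedom) for paired samples a and b.
    A positive t means a exceeds b on average."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical modulation indices for four observers.
ami_20deg = [0.30, 0.25, 0.40, 0.35]
ami_0deg = [0.10, 0.15, 0.10, 0.25]
t_stat, dof = paired_t(ami_20deg, ami_0deg)
```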
Cross talk
We estimated the theoretical cross talk among visual areas in our EEG study using the calculation described by Cottereau et al. (2011) and Lauritzen et al. (2010). Cross talk refers to the neural activity generated in other areas that is attributed to a particular ROI, due to the smoothing of the electric field by the head volume. In brief, for each observer, we used the same forward and inverse methods described above to simulate the cross talk by placing sources in one ROI and estimating their contribution to other ROIs. The global cross-talk matrix averaged across all the observers who participated in our EEG experiments is shown in Figure 6 for the three ROIs we consider (V1, hV4, V3a); the cross-talk magnitude shown in the matrix is proportional to activity originating in the ROI where the cross talk is being estimated.
Values at row i and column j represent the relative contribution of area j to the cortical current density estimate in area i. The normalization is obtained by dividing by the amplitude obtained in area i when only area i was activated in the simulation set. For example, when we estimated the activity in V1, the absolute amplitudes obtained when hV4 and V3a were simulated independently (i.e., the second and third columns of the first row of the cross-talk matrix) were respectively 16 and 23% of the amplitude in V1 when only V1 was activated (i.e., first row, first column). An ideal estimation of the cortical current densities would lead to zero cross talk (an identity matrix). In our study, hV4 and V3a received on average <30% cross talk from other areas. This means that our estimates of activity in each ROI are not influenced significantly by our other ROIs. These cross-talk estimates are worst-case scenarios since they assume that each area contributes independent, additive noise. In practice, noise from remote areas will contain harmonics that are not perfectly in phase and significant noise cancellation will therefore occur. Our cross-talk matrix indicates that activity in the three ROIs is largely, but not completely, due to activity generated in the corresponding visual area.
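The row-wise normalization described above can be sketched as follows. The unnormalized amplitudes are hypothetical; only the first row is chosen so that it reproduces the 16 and 23% figures quoted for V1, and the function name is ours.

```python
def normalize_cross_talk(raw):
    """Divide every row i by its diagonal entry raw[i][i], so entry (i, j)
    becomes the contribution of area j relative to area i's own response."""
    return [[value / row[i] for value in row] for i, row in enumerate(raw)]


rois = ["V1", "hV4", "V3a"]

# Hypothetical unnormalized amplitudes: raw[i][j] is the amplitude estimated in
# ROI i when only ROI j is active in the simulation. Only the first row is
# chosen to reproduce the 16% and 23% figures quoted for V1.
raw = [
    [2.00, 0.32, 0.46],
    [0.30, 1.50, 0.15],
    [0.24, 0.12, 1.20],
]
cross_talk = normalize_cross_talk(raw)
v1_row = dict(zip(rois, cross_talk[0]))
```

After normalization the diagonal is 1 by construction, and an ideal inverse would leave every off-diagonal entry at zero.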
Although we have estimated the probable contribution of cross talk among our chosen ROIs (V1, hV4, and V3a within the dotted lines), one could argue that errors in our estimates may also come from neighboring cortical regions. In this context, V1 is completely surrounded by areas V2 and V3, yet these areas have only a marginal influence on it (11 and 15%, respectively; Fig. 6, first row, fourth and fifth columns). From our simulations, it is also apparent that there is significant cross talk between areas in the same foveal cluster, such as V2 and V3 (the last two rows of the matrix). For this reason, we excluded these two ROIs from further analysis and focused on V1, V3A, and hV4. These areas are more widely separated and their estimated activities are therefore more reliable.
Results
Psychophysics
In our psychophysical experiments, we used adaptation as a way of reducing the sensitivity of well defined populations of orientation-tuned neurons. Figure 7 plots orientation and contrast discrimination thresholds for each adaptor orientation with respect to the threshold for the vertical (0°) adaptor. Threshold elevation following adaptation to a particular orientation serves as an index of the contribution of the populations most sensitive to the adapted orientation. The bars represent threshold elevation averaged across our four observers for the contrast and orientation tasks, respectively. Contrast thresholds were relatively invariant with adaptor orientation, except for a small increment at the target orientation. On the other hand, orientation thresholds clearly peaked at ∼20° from the target orientation. The orientation discrimination results mirror the finding from classic psychophysical studies that orientation discrimination is most impaired following adaptation to stimuli tilted away from the target orientation (Regan and Beverley, 1985; Navalpakkam and Itti, 2007; Scolari and Serences, 2009). The novel contribution of this study is the comparison of orientation and contrast discrimination following adaptation to gratings of various orientations.
The psychophysical data are consistent with theoretical predictions for how populations tuned to different orientations contribute to orientation discrimination, given our current understanding of orientation tuning curves (Fig. 1). The differential response to a small change in orientation is greater for populations tilted away from the target orientation, and for our stimuli, it appears as if populations tilted 20° away mediate orientation discrimination. The angle at which an off-channel population is most sensitive depends critically on the bandwidth of the population selective for the spatial frequency of the target, and tends to increase as bandwidth broadens. As lower spatial frequency mechanisms have larger bandwidths (Phillips and Wilson, 1984; Snowden, 1992), it is not surprising that the off-channel peak occurs at 20° with our 2 c/deg stimulus, compared with the off-channel peak at 12–15° observed with an 8 c/deg stimulus (Wilson and Regan, 1984).
For contrast discrimination, the populations with a broad range of orientation preference around the target orientation appear to contribute similarly, with perhaps a small peak at the target orientation. This result suggests that the variable mediating contrast discrimination performance, the differential response normalized by the SD of the associated noise, changes more slowly than predicted by Itti et al. (2000).
We then asked whether attention to fine orientation differences modulated the same populations implicated in the perceptual decisions. Specifically, we tested whether performing an orientation discrimination judgment amplified the responses of the V1 neural population tuned 20° away from the target orientation. To probe the contribution of different populations, we surrounded the test gratings with reporter annuli that had the same orientation as the target (0° or vertical), as well as annuli tilted 20 and 45° clockwise with respect to the target.
EEG
Topography
Figure 8 shows the scalp topography of the SSVEPs for the surround orientations of 0, 20, and 45°, respectively. The SNR of the 15 Hz response to the stimulus in the left visual field is shown, averaged over the 11 subjects who participated in sessions with all three annulus orientations. Note that the topographic maps in response to this stimulus are largely contralateral. If observers had instead looked directly at the stimulus (rather than at the fixation point), the topographic maps would have shown high activity in the representation of the bilateral medial posterior areas, rather than in contralateral visual areas. The observed contralateral pattern of activity combined with the extremely small magnitude of the horizontal EOG suggests that artifacts due to eye movements were minimal.
The sessions were blocked by relative annulus orientation, as labeled in Figure 8,A–C. Within each of these, the first and second rows show the responses for orientation and contrast discrimination, respectively. The first and second columns plot the responses in the attended and unattended conditions, while the third column shows the difference due to attention. Attending to the target grating at the center of the annulus increased the response evoked by the surrounding annulus in some conditions as shown here and reported in Kim and Verghese (2012). We cannot compare the absolute values of the SNR in the attended and unattended conditions across reporter orientations, as sessions were blocked by reporter orientation, but we can compare the modulation due to attention across reporter orientation (third column). Figure 8 shows that for the orientation discrimination task the biggest difference in SNR between the attended and unattended conditions occurs for a reporter orientation of 20° and not at the target orientation (0°), consistent with the threshold elevation observed in our psychophysical experiments. For the contrast discrimination task, the relative difference due to attention shows a significantly different pattern. The SSVEP response appears to be of similar magnitude for reporter orientations of 0 and 20°, and somewhat less for a reporter orientation of 45°. These relative differences are not due to differences in task difficulty, as the accuracy was similar for orientation and contrast discrimination across all annulus orientations (Fig. 3).
It could be argued that our results suffer from potential center-surround effects of the flickering annuli on the central target, or vice versa. While there may indeed have been suppression between center and surround that depended on the relative orientation difference between them (Petrov et al., 2005; Webb et al., 2005), suppression cannot account for the difference in results that we see in the attend-orientation and attend-contrast conditions. Recall that in a session with a fixed annulus orientation, the SSVEP response is from increment-absent trials to identical stimuli: the central target was static, and the surrounding annulus flickered. The only difference between these conditions was the task instruction: the cue to attend to orientation or to contrast.
Cortical sources
Scalp topographies give only an approximate measure of the cortical response. Our source-imaging procedure, however, can extract well localized neuronal data from individual visual area clusters. We plot cortical current density in two ways. Figure 9 shows the SNR difference between the attended and unattended condition, while Figure 10A plots the attentional modulation index, which is the differential response due to attention normalized by the sum of the responses in the attended and unattended conditions. Each of these plots has its advantages. Figure 9 relates more directly to the scalp topography as it plots SNR difference in each ROI, while Figure 10A plots the attentional modulation index used in single unit studies (McAdams and Maunsell, 1999). Moreover, the normalization of the differential response in the attentional modulation index equalizes the contribution from all observers, whereas the simple difference in SNR shown in the scalp topography weights the data of those observers with the largest differences.
Cortical current density averaged across sources in contralateral V1 (Fig. 9) shows a pattern of results similar to that for the scalp topography (Fig. 8). Attention to a spatial location preferentially increased the amplitude of the neural response to some annulus orientations. More importantly, the pattern of modulation depended on the task and on the orientation of the reporter annulus. For orientation discrimination, the estimated cortical activity in area V1 showed a sharp peak in attentional modulation in the 20° offset condition, whereas for contrast discrimination no such peak was evident. Similar tuning functions for orientation and contrast discrimination, respectively, were obtained from the psychophysical adaptation studies (Fig. 7). These findings indicate that humans attend selectively to the most informative neural population and that attention changes the responsivity of discrete neural populations depending on the nature of the task. Furthermore, attentional selection can be detected at an anatomically early stage: at the level of the V1 cluster (Fig. 4B).
For comparison, we also estimated current density in other cortical clusters centered on ventral (hV4) and dorsal (V3a) visual areas (Tootell et al., 1997; Wade et al., 2002). Area V3a is part of a separate, dorsal visual area cluster. Area hV4, while technically part of the V1 cluster, is dominated by responses from the adjacent ventral surface “VO” cluster. In general, these areas have a relatively low level of cross talk with V1, compared with such areas as V2 and V3, which lie closer to striate cortex (Lauritzen et al., 2010).
Both the differential response and the attentional modulation index from area hV4 showed a similar pattern to that in V1 with a sharp peak at 20° for orientation discrimination and broad tuning for contrast discrimination (Figs. 9, 10A). By comparison, responses from the dorsal cluster, while robust, showed no clear peaks in either tuning function. The differential response in area V3a (Fig. 9) for the contrast discrimination task mirrors the trend observed in the scalp topography showing significant attentional enhancement for the 0 and 20° surround, although this pattern is not so clear in the attentional modulation index plots. The close correspondence between scalp topography and the differential cortical source response in V3a is likely because V3a is a compact, dorsolateral area that generates a strong radial electric field, allowing it to be well imaged by EEG.
As mentioned in the Materials and Methods section, the 20 Hz responses had poor SNR. Three of the 11 observers had SNR values <1 for the 20 Hz condition, implying that they had no significant driven response at this frequency. If we restrict our analysis to observers with SNR >1 and combine the 15 and 20 Hz data, then the attentional modulation index of the 20° surround is still significantly greater than that of the 0° surround, but only in the orientation condition (paired t test, p = 0.02). This is consistent with the effects we measured in the 15 Hz responses alone. None of the other paired comparisons reached significance for the combined data. The responses from area hV4 are weaker than those in area V1, presumably because hV4 is a small region partially located on the ventral surface of the visual cortex and oriented away from our electrode array. In the cortical source data from hV4, the average SNR is 1.5, and 5 of 11 observers have an SNR not significantly >1. Because of the poor SNR of the 20 Hz responses, Figures 9 and 10 plot only the 15 Hz responses.
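The restricted comparison described above can be sketched as follows, assuming observers are excluded when their 20 Hz SNR is at or below 1 and the remaining index values are compared with a paired t statistic. All numbers and variable names are hypothetical and are not data from this study. The sketch does not combine 15 and 20 Hz data, and it omits the p value, which would require the t distribution from a statistics library.

```python
import math
import statistics


def paired_t(x, y):
    """Return (t, degrees_of_freedom) for a paired t test."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-observer values, NOT data from the study.
snr_20hz = [2.1, 0.8, 1.6, 0.9, 1.4]
ami_20deg = [0.12, 0.30, 0.09, -0.20, 0.15]  # index, 20 deg surround
ami_0deg = [0.05, 0.25, 0.04, 0.10, 0.06]    # index, 0 deg surround

# Keep only observers with a driven response (SNR > 1).
keep = [i for i, snr in enumerate(snr_20hz) if snr > 1.0]

t_stat, dof = paired_t(
    [ami_20deg[i] for i in keep],
    [ami_0deg[i] for i in keep],
)
```

Applying the exclusion before pairing keeps each observer's two conditions matched, which the paired test requires.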
The finding that ventral regions mirror the task-specific enhancement of populations tilted 20° away from the target orientation is consistent with existing literature showing that neurons in the ventral stream (e.g., hV4) show robust effects of both spatial and featural attention (David et al., 2008). The nonspecific enhancement seen in area V3a for both tasks serves as a control, showing that there is cortical specificity in the pattern of attentional modulation and suggesting that this dorsal-stream area has a limited role in orientation discrimination.
The design of our task combines spatial and feature attention. The cue at the start of the trial indicates both the feature to be discriminated (orientation/contrast) and the location of the increment (left/right) with 100% validity. To measure neuronal modulation due to feature-based attention, we compared the difference in SSVEP response amplitude when observers performed orientation versus contrast discrimination as defined in Equations 2 and 3. Figure 10B shows the effect of feature attention both when observers attend toward the 15 Hz stimulus and attend away. When attention is directed toward the stimulus, it is clear that attention to orientation modulates populations tuned to tilted orientations in area V1 in a fundamentally different way than attention to contrast. There is little modulation when attention is directed away, which at first seems to be at odds with studies showing that enhancement due to feature attention occurs across the entire visual field (Treue and Martínez-Trujillo, 1999; Saenz et al., 2002; Cohen and Maunsell, 2011). A more careful examination of these studies shows that this global effect of feature attention is seen under two experimental scenarios with designs different from that used in our study. First, global effects of feature attention are observed when there is competition between feature values within a dimension, such as when one direction of motion has to be selected in a display with two overlapping motion directions, or when one color has to be selected from a display with two intermixed colors (Saenz et al., 2002; Zhang and Luck, 2009). Second, global effects of feature attention are also seen when the task has a spatial cue that is invalid, such as feature changes that occur at both the cued and uncued location as in Cohen and Maunsell (2011). 
As our task did not require us to select one feature value over the other, or to detect a feature change at an uncued location, it is not surprising that our feature attention effects are local and are absent at the spatially unattended location.
Discussion
Our study used psychophysics and high-density EEG combined with cortical source localization to examine the populations that mediate orientation and contrast discrimination. In a psychophysical task, we showed that perceptual discrimination of fine orientation is mediated by the most informative populations: those with preferred orientations tilted away from the target orientation (Seung and Sompolinsky, 1993; Itti et al., 2000; Jazayeri and Movshon, 2006). These results replicate psychophysical and single unit studies showing that off-channel populations contribute preferentially to the fine discrimination of orientation and motion direction (Regan and Beverley, 1985; Vogels and Orban, 1990; Waugh et al., 1993; Hol and Treue, 2001; Schoups et al., 2001; Baldassi and Verghese, 2005; Purushothaman and Bradley, 2005; Jazayeri and Movshon, 2007; Scolari and Serences, 2009). For contrast discrimination, our psychophysical data implicate populations with a broad range of preferred orientations centered around the target orientation.
The SSVEP study allowed us to determine whether these same populations were modulated when observers attended to either the orientation or to the contrast of a physically identical stimulus. Our cortical source data show a remarkable parallel to the psychophysical results. Depending on the task, attention selectively enhances the populations that are most informative, and importantly, these are the same populations selected for read out in the psychophysical task (Purushothaman and Bradley, 2005; Jazayeri and Movshon, 2007).
Measuring human population responses
Measuring functionally resolved neural population responses in humans is technically challenging. Noninvasive neuroimaging techniques such as positron emission tomography and fMRI provide only an indirect measurement of neural activity via its effects on local metabolism and blood flow (Logothetis, 2008). The link between neural activity and the neuroimaging signal is particularly critical when studying attention because attentionally driven hemodynamic changes may not reflect instantaneous changes in local neural activity in a linear manner (Buracas and Boynton, 2007; Maier et al., 2008; Sirotin and Das, 2009; Kleinschmidt and Müller, 2010; Bouvier and Engel, 2011). More direct measurements of neural activity can be obtained from electromagnetic signals, such as those from electroencephalography. However, it is difficult to relate these measurements to activity in individual cortical regions because the spatial relationship between the far-field signal and the current density in cortex is complex.
Here, we used a combination of neuroimaging techniques to measure task-modulated neural activity in human V1. We used anatomical MRI and fMRI to create an electrical model of each subject's head and identify functional visual areas within each cortex. We then used source-imaged EEG to compute the mean stimulus-driven electrical activity within striate cortex. Finally, we used frequency tagging to separate neural modulations driven by two spatially separated stimuli and to dissociate the effect of attention to location from attention to features in two tasks: contrast discrimination and orientation discrimination.
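The principle of frequency tagging can be illustrated with a minimal sketch. It assumes an idealized, noise-free recording that is the sum of two sinusoids, one at 15 Hz and one at 20 Hz, and recovers each amplitude with a single-bin discrete Fourier transform. The sampling rate, duration, amplitudes, and function name are hypothetical. This is not the source-imaging pipeline used in the study.

```python
import cmath
import math


def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
        for k, x in enumerate(signal)
    )
    return 2 * abs(acc) / n


# Hypothetical recording parameters, chosen so that both tag frequencies
# complete a whole number of cycles within the analysis window.
sample_rate_hz = 600
n_samples = 1200  # 2 s of data

# Hypothetical response amplitudes for the two tagged stimuli.
amp_stimulus_a = 1.5  # tagged at 15 Hz
amp_stimulus_b = 0.5  # tagged at 20 Hz

signal = [
    amp_stimulus_a * math.sin(2 * math.pi * 15.0 * k / sample_rate_hz)
    + amp_stimulus_b * math.sin(2 * math.pi * 20.0 * k / sample_rate_hz)
    for k in range(n_samples)
]

amp_15 = amplitude_at(signal, sample_rate_hz, 15.0)
amp_20 = amplitude_at(signal, sample_rate_hz, 20.0)
```

Because the two tags fall in different frequency bins, the response to each stimulus can be read out separately even though both contribute to the same recorded signal.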
V1 is modulated by feature attention
While others have shown the effect of feature attention in human V1 (Liu et al., 2007; Jehee et al., 2011), the data presented here are the first measurements of how attention to different features of the same stimulus modulates different populations in the V1 cluster, depending on the attended feature. They are consistent with previous single-cell studies in nonhuman primates demonstrating that V1 neurons underlie orientation discrimination (Vogels and Orban, 1990; Schoups et al., 2001) and, remarkably, show that attention can select and amplify the responses of a neural population that is most sensitive to the orientation difference to be discriminated in precisely the way predicted by psychophysics. A very recent study using fMRI (Scolari et al., 2012) also shows that neural populations in human V1 are modulated differently, depending on whether the task is orientation or contrast discrimination. In that study, the orientation and contrast discrimination tasks were done in separate blocks and the task was to determine whether two stimuli matched in orientation or contrast.
As the responses in our study are recorded from identical stimuli (increment-absent trials in the contrast and orientation discrimination task), the differential activity across populations depending on the task cue must be driven, at least initially, by top-down mechanisms, consistent with the predicted role of task instruction on early visual responses (Yu and Dayan, 2004; Tsotsos, 2011). Our finding that attention can act very early in the visual pathway is also consistent with recent fMRI studies showing robust attentional modulation as early as the LGN as well as SSVEP studies showing that signals in opponent chromatic pathways respond differently (or not at all) to attention compared with signals in achromatic pathways (Schneider and Kastner, 2009; Di Russo and Spinelli, 1999a,b; Di Russo et al., 2001; Wang and Wade, 2011), a sign that modulation occurs before these signals are combined two synapses downstream in the primary visual cortex (Sincich and Horton, 2005).
Resolution of cortical source localization
We deliberately chose cortical areas dominated by distinct foveal clusters (Wandell et al., 2005) and >2 cm apart in Euclidean space to minimize cross talk. The L2 minimum norm reconstruction approach we used in our study is related to similar techniques that have localization errors of <10 mm for the EEG (Baillet et al., 2001; Bai et al., 2007). Thus the resolution of the inverse should be sufficient to resolve responses in the areas we have chosen. The fact that the relative activation across areas depends on both the task and the specific cortical area strengthens our claim that we have identified independent sources. While the technique of source imaging combined with high-density EEG and individual head models and retinotopic ROIs is relatively new (Appelbaum et al., 2006), several recent studies have shown that it can be used to measure robust effects of both attention and center–surround interactions (Lauritzen et al., 2010; Xiao and Wade, 2010; Wang and Wade, 2011). In addition, recent work from our group has shown that cross talk between visual area clusters is not a significant source of error in these types of measurement (Lauritzen et al., 2010; Cottereau et al., 2011). The magnitude of worst-case cross talk between neighboring visual areas within the same cluster (e.g., between V2 and V3) is generally larger (Lauritzen et al., 2010), but, especially in the case of the V1 visual cluster, cortical current densities extracted from retinotopic area V1 are still dominated by the response of neurons in striate cortex. Therefore, we believe it is reasonable to suggest that the attentional modulation that we measure originates at the start of the cortical visual processing hierarchy.
Conclusion
High-density EEG combined with cortical source localization shows that attention modulates the neural population response at the earliest cortical levels differentially depending on the task. Furthermore, the pattern of modulation of population responses mirrors behavioral sensitivity in these two tasks, and follows the prediction for optimal weighting of neural populations for fine discrimination of orientation and contrast differences.
Footnotes
This work was supported by National Science Foundation (NSF) Grant BCS-0963914 to P.V., and by National Institutes of Health Grant R01-EY018157-02 and NSF Grant BCS-0719973 to A.R.W.
The authors declare no competing financial interests.
Correspondence should be addressed to Preeti Verghese, PhD, Smith Kettlewell Eye Research Institute, 2318 Fillmore Street, San Francisco, CA 94115. preeti{at}ski.org