Abstract
Previous studies of feature-selective attention have focused on situations in which attention is directed to one of two spatially superimposed stimuli of equal salience. While such overlapping stimuli should maximize stimulus interactions, it is still unknown how bottom-up biases favoring one or the other stimulus influence the efficiency of feature-selective attention. We examined the integration of bottom-up contrast and top-down feature-selection biases on stimulus processing. Two fully overlapping random dot kinematograms (RDKs) of light and dark dots were presented on a gray background of intermediate luminance. On each trial, human participants attended one RDK to detect brief coherent motion targets, while ignoring any events in the unattended RDK. Concurrently, through changes in background luminance, stimulus contrast could be set to five different levels: the stimuli could either be equal, or one of the two stimuli could have twice or four times the contrast of the other stimulus. This manipulation introduced a bottom-up bias toward the stimulus with the higher contrast while keeping the difference between the stimuli constant. Stimulus processing was measured by means of steady-state visual evoked potentials (SSVEPs). SSVEP amplitudes generally increased with higher contrast of the driving stimulus. At earlier levels of processing, attention increased the slope of this linear relation, i.e., attention multiplicatively enhanced SSVEP amplitudes. However, at later levels of processing, attention had an additive effect. These effects of attention can be attributed to the differential integration of gain enhancement and inhibitory stimulus competition at different levels of the visual processing hierarchy.
Introduction
The visual system needs to be selective because the resources lying at its disposal are limited. Situations in which attention needs to select a faint stimulus from spatially close or even overlapping distracting stimuli of greater saliency pose a profound challenge, whereas the opposite case in which a very salient stimulus must be selected should only pose small demands on attention. Studies of feature-selective attention approximate such situations by commonly using closely intermingled stimuli that only differ in the feature of interest (Saenz et al., 2002; Andersen and Müller, 2010). The extent to which visual stimuli compete for neuronal representation has been shown to increase with spatial proximity (Kastner et al., 2001; Fuchs et al., 2008), hence such displays with superimposed stimuli should maximize the influence of stimulus interactions on attentional selection.
However, most research on feature-selective attention has refrained from directly manipulating stimulus properties that would bias processing toward one or the other stimulus. In fact, the vast majority of studies relied on superimposed stimuli equated for properties such as luminance, contrast and size. Yet a stimulus display that included systematic variations in contrast would provide results that are highly relevant from a theoretical standpoint. There are two separate main accounts of the neuronal mechanisms of selective attention, but their interrelation remains unclear. The feature-similarity gain model describes effects of feature-selection but leaves out stimulus competition (Treue and Martinez-Trujillo, 1999; Martinez-Trujillo and Treue, 2004; Maunsell and Treue, 2006). On the other hand, the biased competition model is concerned with competition between stimuli within the same receptive field but has mainly focused on spatial or object-based attention (Desimone and Duncan, 1995; Duncan et al., 1997; Reynolds et al., 1999). Hence, although feature attention is most prominent in situations that maximize stimulus competition, the two frameworks have mainly been considered separately in both experiment and theory (notable exceptions: Boynton, 2005; Reynolds and Heeger, 2009; Andersen and Müller, 2010).
To experimentally bridge the gap between the two accounts, we manipulated voluntary feature-selective attention to one of two superimposed random dot kinematograms (RDKs) that differed in both luminance polarity and luminance contrast. The dark dots flickered at 10 Hz and the light dots flickered at 12 Hz, thereby driving distinguishable steady-state visual evoked potentials (SSVEPs), an oscillatory brain response of the same frequency as the driving stimulus. SSVEP amplitudes are enhanced by attention (for review, see Andersen et al., 2011a), thus this technique of “frequency-tagging” allows one to concurrently assess the amount of processing resources allocated to each stimulus. Through changes in background luminance, processing was biased in favor of one or the other stimulus while keeping the brightness of the dots constant. A multiplicative enhancement by a constant factor across all levels of these bottom-up biases would be most consistent with the feature-similarity gain model. On the other hand, the biased competition account would predict that attention has the biggest relative effect when competition is strongest, which is the case for low contrast stimuli.
Materials and Methods
Participants.
The study included 16 participants (11 female, 3 left-handed, aged 19–34 years, mean age 24.4 years) with normal or corrected-to-normal vision. Individual written informed consent was obtained and the study conformed to the ethical guidelines of the University of Leipzig.
Stimuli and procedure.
Each trial started with the presentation of a gray fixation cross for 1000 ms. Subsequently, a stationary dot pattern consisting of 120 dark dots or 120 light dots was presented for 700 ms, indicating which dots to attend on that trial. After a 200–400 ms period with only the fixation cross on screen, participants were presented with two superimposed, flickering RDKs which stayed on the screen for 8500 ms. The stimulus was followed by a final fixation interval which lasted 500 ms (Fig. 1).
The 120 light dots (80 cd/m²) flickered at 10 Hz and the 120 dark dots (40 cd/m²) flickered at 12 Hz. The luminance of the background was manipulated in five steps (48, 54, 60, 66, and 72 cd/m²) to create experimental conditions that differed in the relative contrast of light and dark dots. Relative contrast R was defined as the ratio of the Weber contrasts of the two stimuli for each background luminance $L_B$, $R = \frac{|L_{S1} - L_B| / L_B}{|L_{S2} - L_B| / L_B}$, where $L_{S1}$ and $L_{S2}$ are the luminances of the RDK stimuli. The resulting values for R were 0.25, 0.53, 1.00, 1.86 and 4.00 for both light and dark dots, i.e., the contrast of one stimulus could be one quarter, approximately half, equal, approximately two times or four times as high as the contrast of the other stimulus. This resulted in a total of 10 experimental conditions, as participants could attend light or dark dots at five different levels of contrast ratio (i.e., 2 attentional × 5 stimulus conditions). Trials from different conditions were presented in randomized order and the appropriate background luminance was presented throughout each trial, i.e., starting from 1000 ms before cue onset (Fig. 1B).
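The contrast ratios listed above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the Weber contrast of each dot type is taken as the unsigned quantity |L_S − L_B| / L_B; the function and variable names are illustrative and not taken from the study's own code.

```python
# Luminances in cd/m^2, as stated in the text.
LIGHT_DOTS = 80.0
DARK_DOTS = 40.0
BACKGROUNDS = [48.0, 54.0, 60.0, 66.0, 72.0]


def weber_contrast(stimulus, background):
    """Unsigned Weber contrast |L_S - L_B| / L_B (assumed definition)."""
    return abs(stimulus - background) / background


def contrast_ratio(stimulus, other, background):
    """Weber contrast of one stimulus divided by that of the other."""
    return weber_contrast(stimulus, background) / weber_contrast(other, background)


# Ratio from the point of view of each dot type, for every background level.
light_over_dark = [contrast_ratio(LIGHT_DOTS, DARK_DOTS, b) for b in BACKGROUNDS]
dark_over_light = [contrast_ratio(DARK_DOTS, LIGHT_DOTS, b) for b in BACKGROUNDS]

for b, r_light, r_dark in zip(BACKGROUNDS, light_over_dark, dark_over_light):
    print(f"background {b:.0f} cd/m^2: light/dark = {r_light:.3f}, dark/light = {r_dark:.3f}")
```

Because the background luminance cancels in the ratio, R depends only on the luminance differences (32:8, 26:14, 20:20, 14:26 and 8:32). Under this definition the second-lowest ratio evaluates to 14/26 ≈ 0.538, which the text lists as 0.53.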
Although RDK contrast is usually expressed in terms of the SD of luminance of all the elements in the RDK area (Moulden et al., 1990), we based our contrast ratios on Weber contrasts of individual dots. We did so because, in the context of our experiment, bright and dark dots operate as distinct features that are either attended or ignored; it is thus more meaningful to relate their respective SSVEPs to their relative saliency than to the overall contrast of the RDK.
All dots moved in random, independent directions (0% coherence) except for brief intervals (400 ms) of 50% coherent motion, which could go in any of the four cardinal directions (up, down, left or right). These intervals of coherent motion could occur randomly in either the attended (targets) or unattended (distractors) RDK. From 500 ms after RDK onset, a total of one to five targets and distractors could appear during each single trial and their onsets were separated by at least 700 ms. Participants were instructed to press a button whenever they detected a target while ignoring distractors. Responding hand was changed halfway through each recording session. Responses occurring within an interval from 250 to 900 ms after onset of a target or distractor were counted as hits or false alarms, respectively.
The experiment consisted of 300 trials distributed over 10 blocks of 30 trials each. For each of the 10 experimental conditions, a total of 30 targets and 30 distractors were presented. Two or more training blocks of 15 trials each were performed to achieve stable performance before the start of the EEG recording.
The experiment was run on a 19-inch CRT monitor set to a resolution of 640 × 480 pixels and a refresh rate of 120 Hz. At a viewing distance of 80 cm, each RDK formed a circle with a diameter corresponding to 12.94° of visual angle. Each dot subtended 0.29° and changed its position in a random direction by 0.05° per frame of screen refresh. To prevent systematic overlapping of dark and light dots, which might induce a depth cue, the dots were drawn in random order. The presentation of stimuli and collection of responses was controlled in Matlab (The MathWorks) using Cogent Graphics (John Romaya, Laboratory of Neurobiology at the Wellcome Department of Imaging Neuroscience).
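The display parameters above imply a few derived quantities that are easy to verify. The sketch below is a back-of-the-envelope check, assuming that each flicker cycle spans a whole number of refresh frames and that every dot is displaced on every frame; the variable names are illustrative.

```python
REFRESH_HZ = 120            # monitor refresh rate from the text
FLICKER_HZ = (10, 12)       # the two tagging frequencies
STEP_DEG_PER_FRAME = 0.05   # dot displacement per refresh frame

# Number of refresh frames available for one full flicker cycle.
frames_per_cycle = {f: REFRESH_HZ // f for f in FLICKER_HZ}

# Both frequencies can be rendered without rounding only if they divide the refresh rate.
divides_evenly = all(REFRESH_HZ % f == 0 for f in FLICKER_HZ)

# Dot speed in degrees of visual angle per second (assumes movement on every frame).
dot_speed_deg_per_s = STEP_DEG_PER_FRAME * REFRESH_HZ

print(frames_per_cycle, divides_evenly, round(dot_speed_deg_per_s, 3))
```

Both tagging frequencies divide the 120 Hz refresh rate exactly (12 and 10 frames per cycle), so neither flicker needs to be approximated.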
SSVEP recordings and data processing.
Participants were seated in a comfortable chair in an electrically shielded chamber. Brain electrical activity was recorded at a sampling rate of 256 Hz from 64 Ag/AgCl electrodes mounted in an elastic cap using an ActiveTwo amplifier (BioSemi). Lateral eye movements were monitored with a bipolar outer canthus montage (horizontal electrooculogram). Vertical eye movements and blinks were monitored with a bipolar montage positioned below and above the right eye (vertical electrooculogram).
EEG data were processed using the EEGLab toolbox (Delorme and Makeig, 2004) in combination with custom-made procedures in Matlab (The MathWorks). A period of 500 ms after stimulus onset was discarded to exclude the evoked response to stimulation onset and to allow the SSVEP sufficient time to build up. Eight epochs of 1000 ms duration were extracted from each 8.5 s stimulus train.
All epochs with target or distractor onsets occurring either within the epoch or later than 250 ms after onset of the previous epoch were excluded from the SSVEP analysis. This ensured that the analyzed data were not contaminated by activity related to coherent motion or manual responses and left a total of 160 one-second epochs for each condition. All epochs were detrended (removal of mean and linear trends). Epochs with eye movements or blinks were rejected from further analysis, and all remaining artifacts were corrected or rejected by means of an automated procedure (SCADS, statistical correction of artifacts in dense array studies; Junghöfer et al., 2000). The total rejection rate was 9.2% of all epochs and did not differ between conditions. Subsequently, all epochs within the same condition were averaged for each participant and subjected to a scalp current density (SCD) transformation (Pernier et al., 1988; Perrin et al., 1989). Compared with scalp potentials, SCDs are independent of the choice of reference, afford higher spatial resolution and show better correspondence with underlying cortical generators (Tenke and Kayser, 2005).
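The epoch selection rule described above can be sketched as follows. This is an illustrative reading of the rule, assuming half-open epoch windows and event onsets given in milliseconds relative to RDK onset; the helper names are hypothetical and the actual analysis was performed in Matlab.

```python
EPOCH_MS = 1000             # epoch duration
FIRST_EPOCH_START_MS = 500  # first 500 ms after stimulus onset are discarded
N_EPOCHS = 8                # eight epochs per 8.5 s stimulus train
GUARD_MS = 250              # onsets later than this in the previous epoch also exclude


def epoch_starts():
    """Start times (ms after RDK onset) of the eight consecutive epochs."""
    return [FIRST_EPOCH_START_MS + i * EPOCH_MS for i in range(N_EPOCHS)]


def is_contaminated(epoch_start, event_onsets):
    """True if a target or distractor onset falls within the epoch, or later
    than 250 ms after the onset of the previous epoch."""
    earliest = epoch_start - EPOCH_MS + GUARD_MS  # exclusive lower bound
    latest = epoch_start + EPOCH_MS               # exclusive upper bound
    return any(earliest < onset < latest for onset in event_onsets)


def clean_epochs(event_onsets):
    """Start times of the epochs that survive the exclusion rule."""
    return [s for s in epoch_starts() if not is_contaminated(s, event_onsets)]


# Hypothetical trial with coherent motion events at 1200 ms and 4700 ms.
print(clean_epochs([1200, 4700]))
```

In this hypothetical trial, the event at 1200 ms removes the epochs starting at 500 and 1500 ms, and the event at 4700 ms removes the epoch starting at 4500 ms, leaving five of the eight epochs.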
The SCD-transformed averaged 1000 ms epochs were Fourier-transformed and SSVEP amplitudes were quantified as the absolute value of the complex Fourier-coefficients at the two stimulation frequencies (10 and 12 Hz).
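The amplitude measure can be illustrated with a direct evaluation of the Fourier coefficient at a single frequency. The sketch below uses a synthetic signal rather than recorded data, and the 2/N scaling is an assumption chosen so that a unit-amplitude sine yields an amplitude of 1; the text only specifies that the absolute value of the complex coefficient was used.

```python
import cmath
import math

SAMPLING_RATE_HZ = 256  # sampling rate from the text; a 1000 ms epoch has 256 samples


def fourier_coefficient(signal, frequency_hz, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Complex Fourier coefficient of the signal at one frequency."""
    return sum(
        x * cmath.exp(-2j * math.pi * frequency_hz * k / sampling_rate_hz)
        for k, x in enumerate(signal)
    )


def ssvep_amplitude(signal, frequency_hz, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Absolute value of the coefficient, scaled by 2/N (assumed normalization)."""
    coefficient = fourier_coefficient(signal, frequency_hz, sampling_rate_hz)
    return 2.0 * abs(coefficient) / len(signal)


# Synthetic 1 s epoch: 10 Hz component of amplitude 1.5 plus 12 Hz component of amplitude 0.5.
epoch = [
    1.5 * math.sin(2 * math.pi * 10 * k / SAMPLING_RATE_HZ)
    + 0.5 * math.sin(2 * math.pi * 12 * k / SAMPLING_RATE_HZ + 0.7)
    for k in range(SAMPLING_RATE_HZ)
]

amp_10 = ssvep_amplitude(epoch, 10)
amp_12 = ssvep_amplitude(epoch, 12)
print(round(amp_10, 3), round(amp_12, 3))
```

With 1000 ms epochs the frequency resolution is 1 Hz, so both stimulation frequencies contain a whole number of cycles per epoch and the two components can be separated without leakage between them.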
Behavioral data analysis.
To quantify participants' ability to discriminate coherent motion targets and distractors, observer sensitivity (d′) was calculated for each contrast condition and stimulus as the difference of the inverse Gaussian transformed hit and false alarm rates for that stimulus. Response bias C was calculated correspondingly. Reaction times, d′ and C were averaged over light and dark dots, since both types of dots showed analogous patterns of effects. The results were subjected to repeated-measures ANOVA with the factor contrast ratio that had five levels (0.25, 0.53, 1.00, 1.86 and 4.00) using the Greenhouse-Geisser correction for non-sphericity.
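The sensitivity measure can be sketched with the Python standard library. The formula for d′ follows the description above; the formula for C is the conventional signal detection definition and is an assumption here, since the text only states that C was calculated correspondingly. The hit and false alarm counts are hypothetical.

```python
from statistics import NormalDist


def z(rate):
    """Inverse of the standard normal cumulative distribution function."""
    return NormalDist().inv_cdf(rate)


def d_prime(hit_rate, false_alarm_rate):
    """Observer sensitivity: z(hit rate) - z(false alarm rate)."""
    return z(hit_rate) - z(false_alarm_rate)


def criterion(hit_rate, false_alarm_rate):
    """Response bias C = -(z(H) + z(FA)) / 2 (conventional definition, assumed)."""
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))


# Hypothetical counts: 27 hits on 30 targets, 3 false alarms on 30 distractors.
hit_rate = 27 / 30
false_alarm_rate = 3 / 30
print(round(d_prime(hit_rate, false_alarm_rate), 3),
      round(criterion(hit_rate, false_alarm_rate), 3))
```

Rates of exactly 0 or 1 have no finite z value and would require a correction that is not described in the text, so this sketch does not handle them.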
EEG data analysis.
SSVEP amplitudes were maximal over occipital and parietal electrodes for both stimulation frequencies (Fig. 2A). Based on an examination of amplitude and phase of SSVEPs at all electrodes, we defined three clusters of neighboring electrodes for further analysis: central occipital (O1, Oz, O2, Iz, POz), left parieto-occipital (P3, P5, P7, PO3, PO7) and right parieto-occipital (P4, P6, P8, PO4, PO8). The definition of these clusters was based on three criteria: high signal strength of all included electrodes (Fig. 2A), similar signal phase for all electrodes within a cluster and unequal signal phase between electrodes of neighboring clusters (Fig. 2B,C). SSVEP amplitudes were averaged over the five electrodes in each of these clusters, normalized to a mean of 1.0 (Andersen et al., 2011b) and subsequently collapsed across frequencies (i.e., dark and light dots) for conditions with equal contrast ratio. The resulting normalized amplitudes were subjected to a regression analysis over log contrast ratios for each participant and each attentional condition separately. The resulting values for slope and offset (offset corresponds to the amplitude for equal contrast (r = 1) since log 1 = 0) of the regression lines were used to calculate the point at which each line crosses zero amplitude on the horizontal (contrast ratio) axis. Slope and intercept were then compared statistically between attentional conditions (attended vs ignored) by Wilcoxon signed-rank tests.
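The regression step can be sketched as an ordinary least-squares fit over log contrast ratios. The amplitudes below are made-up illustrative values, not data from the study, and the natural logarithm is an assumption, since the text does not state the base; the x-intercept expressed as a contrast ratio does not depend on this choice as long as the base is used consistently.

```python
import math

CONTRAST_RATIOS = [0.25, 0.53, 1.00, 1.86, 4.00]
LOG_RATIOS = [math.log(r) for r in CONTRAST_RATIOS]


def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, offset)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def x_intercept_ratio(slope, offset):
    """Contrast ratio at which the fitted line reaches zero amplitude."""
    return math.exp(-offset / slope)


# Hypothetical amplitudes: an unattended line, and an attended line that is
# the unattended line multiplied by a gain of 1.5.
unattended = [1.0 + 0.4 * x for x in LOG_RATIOS]
attended = [1.5 * a for a in unattended]

slope_u, offset_u = fit_line(LOG_RATIOS, unattended)
slope_a, offset_a = fit_line(LOG_RATIOS, attended)
print(round(slope_u, 3), round(slope_a, 3))
print(round(x_intercept_ratio(slope_u, offset_u), 4),
      round(x_intercept_ratio(slope_a, offset_a), 4))
```

In this constructed example the attended slope is 1.5 times the unattended slope while both lines cross zero at the same contrast ratio, which is the signature of a purely multiplicative effect that the slope and intercept comparison is designed to detect.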
Results
Behavioral data
Participants were able to discriminate targets from distractors rather well, as indicated by an average observer sensitivity d′ of 2.64 ± 0.13 (95% confidence interval for mean over all conditions and subjects). Higher contrast ratio of a stimulus led to responses that were more frequent (response bias C: F(4,15) = 10.066, p < 0.0005) and faster (reaction time: F(4,15) = 42.368, p < 10^−10). A post hoc t test revealed that response bias C did not differ significantly between the two highest levels of contrast ratio (t(15) = 1.168, p > 0.1), indicating that increasing the contrast ratio beyond 2 did not cause more responses to a stimulus. Observer sensitivity d′ showed a shallow V-shaped dependency upon contrast ratio (F(4,15) = 6.157, p < 0.001), with the lowest sensitivity at equal contrast (Fig. 3A). At equal contrast, discrimination of coherent motion targets from distractors could only be based on luminance polarity of the moving dots. Unequal contrast, however, would have caused the perceived strength of coherent motion to differ between targets and distractors (Banton and Levi, 1993). This would have facilitated their discrimination, causing the observed pattern. Note that SSVEP amplitudes were calculated from epochs without coherent motion and thus are not directly affected by possible differences in perceived strength of coherent motion. An indirect effect is, however, still conceivable. If a possible difference in perceived strength of coherent motion made the equal contrast conditions unduly harder, then participants might have increased effort to make up for some of this effect. If this were the case, one would expect increased attention effects for contrast ratio 1. However, neither SSVEP amplitudes nor reaction times (Fig. 3A,B) show any signs of increased attentional deployment for contrast ratio 1 conditions. Thus, if at all present, any such effect would necessarily be of very minor magnitude.
SSVEP amplitudes
Figure 3B displays the regression for SSVEP amplitudes averaged over all subjects. The statistical analysis was done by fitting regression lines separately for attended and unattended conditions for each single subject and submitting the regression parameters (Fig. 3C) to a robust nonparametric Wilcoxon signed-rank test. SSVEP amplitudes increased linearly with higher log contrast ratio for both attended and unattended stimuli at all three electrode clusters (all t(15) = 0, all p < 0.0005). For the central occipital cluster, attention increased the median slope of this linear contrast-amplitude relation by 43% (t(15) = 7, p < 0.005). The intercept with the x-axis, which was extrapolated to a value of ∼1:12, did not differ with attention (t(15) = 55, p > 0.1). Hence, attention and contrast ratio multiplicatively modulated signal gain in early visual areas reflected in the central occipital cluster. This differed from the pattern of attention effects observed at the two lateral parieto-occipital clusters: here, the slope of the linear contrast-amplitude relation was unaffected by attention (left: t(15) = 61, right: t(15) = 49, both p > 0.1) while the offset was increased by attention (left: t(15) = 3, right: t(15) = 2, both p < 0.001). Hence, attention additively enhanced SSVEP amplitudes at both lateral parieto-occipital clusters.
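The reasoning that links the regression parameters to multiplicative versus additive effects can be made explicit with a short derivation. The symbols below (a, b, g, c and x_0) are introduced here for illustration only and do not appear in the original analysis.

```latex
% Linear contrast-amplitude relation for the unattended stimulus,
% with x = \log R, offset a and slope b:
A_{\mathrm{unatt}}(x) = a + b\,x, \qquad x_0 = -\frac{a}{b}
\quad \text{(x-intercept)}

% Multiplicative attention effect with gain g > 1:
A_{\mathrm{att}}(x) = g\,(a + b\,x) = g\,a + g\,b\,x
\;\Rightarrow\; \text{slope } g\,b, \quad
\text{x-intercept } -\frac{g\,a}{g\,b} = x_0 \ \text{(unchanged)}

% Additive attention effect with constant c > 0:
A_{\mathrm{att}}(x) = (a + c) + b\,x
\;\Rightarrow\; \text{slope } b \ \text{(unchanged)}, \quad
\text{offset } a + c
```

A steeper slope with an unchanged x-intercept, as at the central occipital cluster, therefore points to a multiplicative effect, whereas an unchanged slope with a higher offset, as at the lateral parieto-occipital clusters, points to an additive effect.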
Note that the assignment of electrodes to clusters was based on the mean over all subjects. Due to topographical differences between subjects, this assignment was not optimal for some electrodes in some subjects. To test the robustness of our results, we repeated the above regression analysis excluding all electrodes at the borders of neighboring clusters (PO3/4, O1/2, POz). This analysis yielded virtually identical results to the main analysis.
Discussion
We examined the influence of bottom-up biases on the magnitude of feature-selective attentional enhancement of neural markers of stimulus processing. By changing the background luminance in five steps, we systematically varied the contrasts of superimposed light and dark dots and thereby created conditions of different or equal relative luminance contrast. This manipulation effectively modulated behavioral measures: stimuli with higher relative contrast elicited responses that were faster and more frequent. Selective stimulus processing, as assessed by SSVEP amplitudes, was multiplicatively enhanced by attention at central occipital electrodes, while the effect at more lateral parieto-occipital sites was additive. Previous source localizations of SSVEPs and SSVEP attention effects (Di Russo et al., 2007; Andersen and Müller, 2010) found the early visual areas V1–V3 and motion-sensitive MT to be the main generators of scalp-recorded SSVEPs. These visual areas lie directly beneath the central occipital and lateral parieto-occipital electrode clusters, respectively.
Our pattern of results is best explained by an attentional gain mechanism working together with competitive inhibition between stimuli. A pure gain mechanism, as assumed by the feature-similarity gain model, should lead to multiplicative attentional enhancement, as we observed for SSVEP amplitudes at central occipital electrodes. On the other hand, according to the biased competition model, attention biases the competition between stimuli (Reynolds et al., 1999), leading to the largest attentional effects when competition is highest. Correspondingly, attention effects are larger in single cells when two stimuli are presented within the cell's receptive field (Luck et al., 1997; Reynolds et al., 1999) or in later visual areas in population measures of activity (Kastner et al., 1998). The latter effect is due to the fact that larger receptive fields in these areas lead to a larger proportion of neurons with multiple stimuli in their receptive fields. This is consistent with the pattern of effects observed at lateral parieto-occipital electrodes in the present experiment, where attention had an equally large effect for all levels of contrast ratio. SSVEP amplitudes increased linearly with log contrast ratio, hence the largest relative enhancement indeed occurred for low contrast stimuli which were subject to strong competition from superimposed high contrast stimuli.
A previous study investigating cued shifts of feature-selective attention using a very similar stimulus display found that attentional selection was the result of both enhancement of the attended as well as suppression of the unattended stimulus (Andersen and Müller, 2010). Enhancement preceded suppression by 130 ms, consistent with the idea that gain enhancement of the attended stimulus biases competition between the overlapping stimuli, which in turn leads to suppression of the unattended stimulus. The long delay between the onsets of enhancement and suppression led to the suggestion that suppression observed in the early visual areas (∼V1–V3) resulted mainly from feedback from downstream visual areas with larger receptive fields such as V4 (∼4°) or MT (∼8°; Yoshor et al., 2007). If the results of the present experiment are considered in this light, one might assume that competition between attended and unattended dots occurs in particular in those visual areas reflected in the lateral parieto-occipital electrodes. The output of this competition would then be fed back to the early areas reflected in the occipital cluster, leading to the observed qualitative differences in attention effects. Evidence for such a backward progression of attentional effects was found in a recent study, which reported earlier and larger effects of attention on neuronal firing rates in V4 than in V1, with intermediate values for V2 (Buffalo et al., 2010). Although our putative explanation is consistent with previous observations (Andersen and Müller, 2010; Buffalo et al., 2010), it needs to be tested more directly in future studies.
We observed proportional scaling of stimulus processing with attention at central occipital electrodes. By comparison the pattern observed at lateral parieto-occipital sites (Fig. 3B) presents a stronger modulation for low contrast ratios and a weaker modulation for higher contrast ratios, which better reflects the behavioral demands of the task. Note that, despite this, top-down attention had a smaller effect on SSVEP amplitudes than the contrast ratio manipulation at all three electrode clusters. Hence, the attended stimulus could neurally “lose” competition at both earlier and later stages of the visual processing hierarchy although it still effectively controlled behavioral responses. This is inconsistent with the “integrated competition hypothesis,” which assumes that the same stimulus wins competition across all levels of the processing hierarchy (Duncan et al., 1997). Attentional modulation can be insufficient to make a faint stimulus dominate visual processing, but the stimulus can still control behavioral responses.
The present study differs from previous studies in several ways. First, compared with earlier studies of feature-selective attention (Treue and Martinez-Trujillo, 1999; Saenz et al., 2002; Martinez-Trujillo and Treue, 2004; Andersen and Müller, 2010; Andersen et al., 2011b), we systematically manipulated stimulus contrast to investigate the interplay of top-down and bottom-up biases on stimulus processing. Second, compared with previous studies on competitive stimulus interactions (Kastner et al., 1998, 2001; Reynolds et al., 1999), we manipulated feature rather than spatial attention and through the use of frequency-tagging, we were able to concurrently assess the allocation of processing resources to both the attended and the unattended stimulus. In sum, our approach thus signifies an important step toward understanding how the mechanisms described by the feature-similarity gain and biased competition models work together to achieve stimulus selection. However, an important limitation of the present study is that while it allows us to concurrently assess attentional modulation of both stimuli, it does not allow us to separate how much of that modulation is due to enhancement when the stimulus is attended or suppression when the same stimulus is unattended. This important question remains to be answered in the future and might also be instrumental to test the consistency of our results with recent normalization models of attention (Lee and Maunsell, 2009; Reynolds and Heeger, 2009).
In conclusion, we manipulated background luminance to bias the processing of two overlapping stimuli toward one or the other, which allowed us to assess the influence of such bottom-up biases on top-down feature-selective attention. The feature-similarity gain and biased competition models, which are both mainly based on single cell recordings of monkeys, make opposing predictions on which physical conditions should show the biggest effects of attention. Consistent with a gain enhancement mechanism, as assumed by the feature-similarity gain model, we found multiplicative enhancement of stimulus processing with attention at early levels of the processing hierarchy, while a later stage of processing revealed an additive enhancement more consistent with biased competition. This result suggests that feature-selection is not the product of a unitary mechanism but results from the interaction of gain enhancement and competitive stimulus interactions.
Footnotes
This work was supported by Deutsche Forschungsgemeinschaft (AN 841/1-1, MU 972/11-2). We thank Renate Zahn and Christopher Gundlach for help with data collection, Andreas Widmann for making available routines for SCD transformation, and Jürgen Kayser for helpful suggestions on the application of SCDs.
Correspondence should be addressed to Søren K. Andersen, Department of Neurosciences, University of California at San Diego, 9500 Gilman Drive #0608, La Jolla, CA 92093. soren@sdepl.ucsd.edu