Abstract
When a target stimulus is embedded in a high-contrast surround, the target appears reduced in contrast and is harder to detect, and neural responses in visual cortex are suppressed. We used functional magnetic resonance imaging (fMRI) and psychophysics to compare these physiological and perceptual effects quantitatively. Observers performed a contrast discrimination task on a contrast-reversing sinusoidal target grating, which was either presented in isolation or embedded in a high-contrast surround. While observers performed the task, we also measured fMRI responses as a function of target contrast, both with and without the surround. The surround substantially increased psychophysical thresholds and reduced fMRI responses. We compared the two data sets under the assumption that a fixed response difference is required for correct discrimination: the psychophysics accounted for 96.5% of the variance in the measured V1 responses. Suppression in visual areas V2 and V3 was stronger, too strong to agree with the psychophysics. The good quantitative agreement between psychophysical thresholds and V1 responses suggests that V1 is a plausible candidate for mediating surround masking.
Introduction
Surround suppression has been studied extensively in both physiology and psychophysics. Physiologists typically measure the response to a target placed in the receptive field of a neuron, and then test how the response is modulated by the presence of high-contrast stimuli placed outside the receptive field of the neuron. The general finding for neurons in primary visual cortex (V1) is that responses are reduced in the presence of the surrounding stimuli (Hubel and Wiesel, 1968; Maffei and Fiorentini, 1976; Gulyas et al., 1987; Orban et al., 1987; Knierim and van Essen, 1992; DeAngelis et al., 1994; Kastner et al., 1997; Levitt and Lund, 1997; Sengpiel et al., 1998; Cavanaugh et al., 2002a,b). The effect is orientation specific: when the stimulus in the surround has a different orientation from the target, the suppressive effect is reduced (Knierim and van Essen, 1992; DeAngelis et al., 1994; Kastner et al., 1997; Cavanaugh et al., 2002b), or even reversed (Sillito et al., 1995). Surround suppression has been studied most extensively with single-unit recordings in V1 (see references above), but possibly analogous effects have been reported in some extrastriate areas as well (Allman et al., 1985; Desimone et al., 1985; Tanaka et al., 1987; Schein and Desimone, 1990; Raiguel et al., 1995; Xiao et al., 1995; Xiao et al., 1997; Bradley and Andersen, 1998).
Inhibition between stimuli presented in neighboring visual field locations has also been demonstrated using functional magnetic resonance imaging (fMRI) (Kastner et al., 1998) and magnetoencephalography (Ohtani et al., 2002). The lateral extent of these inhibitory effects scales with the receptive field sizes in different visual cortical areas (Kastner et al., 2001). Other fMRI studies have shown that responses are stronger when different elements in the visual field have different orientations, compared with when all elements are aligned (Karni et al., 1999), consistent with the orientation specificity of inhibition observed electrophysiologically.
Perceptually, one finds that the contrast of a given pattern appears weaker when it is surrounded by a high-contrast pattern (Chubb et al., 1989; Cannon and Fullenkamp, 1991; Snowden and Hammett, 1998; Xing and Heeger, 2000). In addition, contrast detection of a target is often impaired when high-contrast masks are placed in its vicinity (Polat and Sagi, 1993; Wilkinson et al., 1997; Zenger-Landolt and Koch, 2001). Both these effects are strongest for iso-oriented surrounds, and decrease with increasing orientation difference between target and surround; thus, these effects likely result from orientation-specific inhibitory interactions (Cannon and Fullenkamp, 1991; Snowden and Hammett, 1998; Xing and Heeger, 2001).
The apparent similarity between the physiological and behavioral effects provides circumstantial evidence linking the two. To establish a tight link between physiology and behavior, we have studied surround suppression for the same stimulus conditions and task using both fMRI and psychophysics. This approach has been used previously to show that simple contrast discrimination performance is consistent with the responses in visual cortex obtained with fMRI (Boynton et al., 1999). Here, we show that the psychophysical surround effects on contrast discrimination of a target can be quantitatively accounted for by response suppression in V1.
Materials and Methods
Observers and experimental sessions. One male and two female observers participated in the experiment. All had corrected-to-normal vision and were practiced psychophysical observers. Each observer participated in 10 fMRI sessions: one to define retinotopic areas, one to define the cortical representation of the stimulus annulus, six to measure the contrast-response functions with and without the surround, and two for the control experiment. Each observer also completed at least three additional psychophysics sessions outside the scanner, both with and without the surround.
Apparatus and experimental setup. MRI was performed using a 3T General Electric scanner with a custom-designed dual surface coil. Stimuli were presented on a flat panel monitor (NEC, Itasca, IL; MultiSync LCD 2000; size, 20 inches; resolution, 480 × 640) placed within a Faraday box with a conducting glass front, positioned near the subjects' feet. Subjects lay on their backs in the scanner and viewed the display through binoculars. The virtual distance of the display, when viewed through the binoculars, was 51 cm. The subjects' head position was stabilized by a bite bar. Observers indicated their responses in the psychophysical task via an MRI-compatible keypad (Resonance Technologies, Northridge, CA).
Subjects viewed the stimuli while time series of MRI volumes were acquired (every 1.5 sec) using a T2*-sensitive, spiral-trajectory, gradient-echo pulse sequence (Glover and Lai, 1998; Glover, 1999): echo time (TE), 30 msec; repetition time (TR), 750 msec (two interleaves); flip angle, 55°; field of view (FOV), 220 mm; effective inplane pixel size, 3.2 × 3.2 mm; 4 mm slice thickness; 12 slices. Slices had an oblique orientation perpendicular to the calcarine sulcus with the most caudal slice tangent to the occipital pole. The slices covered most of the occipital lobe.
Each scanning session began by acquiring a set of anatomical images using a T1-weighted SPGR pulse sequence in the same slices as the functional images (FOV, 220 mm; TR, 68 msec; TE, 15 msec; echo train length, 2). These inplane anatomical images were aligned to a high-resolution anatomical volume of each subject's brain so that all MR images (across multiple scanning sessions) from a given subject were coregistered with an accuracy of ∼1 mm (Nestares and Heeger, 2000).
Additional psychophysical data were collected in separate sessions, outside the scanner. Viewing conditions were closely matched to those in the scanner: stimuli were displayed on a flat-panel monitor of identical make, and observers viewed the display from the same distance of 51 cm in an otherwise dark environment. Psychophysical thresholds tended to be slightly higher in the scanner than in the psychophysics room (by a factor of 1.08 on average; p = 0.15). The absence of feedback, the mixing of different pedestal contrasts within the scan, distracting scanner noise, and general discomfort while lying in the scanner may all have contributed to this small difference.
Stimulus and task. The stimulus was a contrast-reversing (4 Hz), sinusoidal grating (1.1 cycles/degree), presented for 750 msec (Fig. 1). Within this grating, we defined an annular target region, which extended from 4.5 to 7.8° eccentricity, and a surround region, which covered the remaining region within a 16.4° circle (i.e., the surround included the areas both inside and outside of the target annulus). We chose an annulus rather than a central disk because it is difficult to identify the boundaries between the cortical visual areas in the representation of the very center of the visual field. The target annulus was further divided into eight segments. The eight segments of the target and the surround regions were separated by antialiased black lines (Fig. 1). Observers fixated a high-contrast square at the center of the display while attending (without moving their eyes) to the eight segments of the target annulus. The observers' task was to determine whether the contrast of one segment was lower than the contrast of the other seven segments, or whether all eight had the same contrast. Observers practiced the task until their performance reached asymptotic levels.
Procedure. Contrast discrimination thresholds were estimated using a staircase procedure. The contrast of the low-contrast segment (the pedestal) was fixed, and the contrast of the other seven segments (or of all eight segments when no low-contrast segment was present) was adjusted so that observers performed the task at 79% accuracy. The contrast difference was increased after every incorrect response and decreased after three consecutive correct responses (Levitt, 1971).
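The 79% target follows from the three-down/one-up rule (Levitt, 1971). As a sketch, assume that trials are independent and that the observer is correct with probability $p$ at the current contrast difference. The staircase then hovers around the contrast difference where a run ending in a step down is as likely as a run ending in a step up. A step down requires three consecutive correct responses (probability $p^3$); a step up follows an error before the third correct response (probability $1-p^3$):

```latex
\begin{aligned}
\Pr(\text{step down}) &= p^{3}, \qquad \Pr(\text{step up}) = 1 - p^{3},\\
p^{3} &= 1 - p^{3} \;\Rightarrow\; p = 2^{-1/3} \approx 0.794 .
\end{aligned}
```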
In the purely psychophysical experiment, each session consisted of 13 blocks, in which different pedestal contrasts were tested. Each block consisted of 60 trials, and the geometric mean of the reversal contrasts served as the threshold estimate. Auditory feedback (correct/incorrect) was provided.
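The staircase and threshold estimate described above can be simulated in a few lines. This is a sketch under stated assumptions. The simulated observer (a Weibull psychometric function with a 50% guess rate and hypothetical `alpha` and `beta`), the starting value, and the step factor of 10^0.1 are our own choices, not parameters reported for the experiment. The 60-trial length and the geometric mean of reversal contrasts follow the text.

```python
import math
import random

def simulated_observer(delta, rng, alpha=0.05, beta=3.0, guess=0.5):
    """Hypothetical observer: returns True (correct) with Weibull probability."""
    p_correct = guess + (1.0 - guess) * (1.0 - math.exp(-((delta / alpha) ** beta)))
    return rng.random() < p_correct

def staircase_threshold(n_trials=60, start=0.2, step_factor=10 ** 0.1, seed=0):
    """Three-down/one-up staircase on the contrast difference (Levitt, 1971).

    Returns (threshold, reversals), where threshold is the geometric mean of
    the contrast differences at which the staircase reversed direction.
    """
    rng = random.Random(seed)
    delta, n_correct, last_direction = start, 0, None
    reversals = []
    for _ in range(n_trials):
        if simulated_observer(delta, rng):
            n_correct += 1
            if n_correct < 3:
                continue                  # wait for three consecutive correct responses
            n_correct, direction = 0, -1  # step down: make the task harder
        else:
            n_correct, direction = 0, +1  # step up after any error
        if last_direction is not None and direction != last_direction:
            reversals.append(delta)       # staircase changed direction here
        last_direction = direction
        delta = delta / step_factor if direction < 0 else delta * step_factor
    if not reversals:
        raise RuntimeError("no reversals; increase n_trials")
    log_mean = sum(math.log(r) for r in reversals) / len(reversals)
    return math.exp(log_mean), reversals
```

Calling `staircase_threshold(seed=1)` repeats the simulation with a different random sequence. With these assumed settings, the estimate lands near the observer's 79%-correct point, which for this psychometric function is close to `alpha`.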
In the scanner, conditions varied according to a block-alternation design, with a block duration of 9 sec. Each functional MRI scan contained 14 block alternations and lasted 4.2 min. No auditory feedback was provided. In the main experiment, each block contained five trials, each consisting of a 750 msec stimulus display followed by a 1050 msec response period. For one observer (BZL), the response period was reduced to 750 msec, and each block, thus, contained six instead of five trials. The surround contrast was the same in all trials of any given scan, but the pedestal contrast was varied systematically (Fig. 2). The stimuli in block A always had a pedestal contrast of 0%, whereas the pedestal contrast of block B varied between scans and was 10, 20, 40, or 80%.
In the control experiment, each trial lasted 2.25 sec, i.e., each of the 9 sec blocks contained four trials. Each trial consisted of two stimulus intervals, only the first of which was task relevant. Both intervals lasted 750 msec, with a 375 msec interstimulus interval (Fig. 3). The target stimulus always appeared in the first interval, with pedestal contrast set to 0% in block A and 60% in block B. There were three conditions, which differed with respect to the surround presentation. In the simultaneous-surround condition, a surround of 100% contrast was presented during the first interval (together with the target), whereas in the lagging-surround condition, the surround was presented in the second interval (lagging behind the target stimulus). In the no-surround condition, the surround was not shown in either interval.
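The timing figures for the main and control experiments are mutually consistent, as a quick arithmetic check shows. The constant and function names are our own. The ratio of stimulus-on time per block for observer BZL relative to the other observers is the factor of 1.2 used later to rescale her fMRI responses.

```python
BLOCK_MS = 9000       # block duration (9 sec)
N_ALTERNATIONS = 14   # A/B block alternations per scan
STIM_MS = 750         # stimulus duration per trial

def trials_per_block(trial_ms, block_ms=BLOCK_MS):
    """Number of trials that fit into one block; the trial must divide it evenly."""
    if block_ms % trial_ms:
        raise ValueError(f"{trial_ms} ms trials do not divide a {block_ms} ms block")
    return block_ms // trial_ms

# One A block plus one B block per alternation
scan_minutes = N_ALTERNATIONS * 2 * BLOCK_MS / 60000
main_trials = trials_per_block(STIM_MS + 1050)   # main experiment
bzl_trials = trials_per_block(STIM_MS + 750)     # observer BZL's shorter response period
control_trials = trials_per_block(2250)          # control experiment
# Stimulus-on time per block for BZL relative to the other observers
duty_ratio = (bzl_trials * STIM_MS) / (main_trials * STIM_MS)
```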
In all of these experiments, three design features were introduced specifically to control attention. First, we dynamically adjusted task difficulty with a staircase procedure, so that trials in different conditions (different surround contrasts, different pedestal contrasts) were all at the same level of difficulty. Second, observers did not know in which of the eight target segments the low-contrast grating would appear and were, thus, forced to attend to the whole annulus. This ensured that attention would be spatially homogeneous across the annular region of interest. Finally, we chose contrast decrement detection as a task (rather than contrast increment detection) because it has been shown that attention is necessary for detecting decrements, but not for detecting increments (Braun, 1994). In other words, we chose a task in which a lapse of attention would impair performance, thus forcing (or at least encouraging) observers to pay close attention in all conditions, and we equated all of the stimulus conditions in terms of performance accuracy as a proxy for controlling attention.
Data analysis. Data from the first cycle of block alternation were discarded to allow the hemodynamic response to reach steady state and to allow subjects to practice the task. The fMRI time series were preprocessed by (1) high-pass filtering the time series at each voxel to compensate for the slow signal drift typical in fMRI signals (Smith et al., 1999) and (2) dividing the time series of each voxel by its mean intensity. The resulting time series were averaged across the gray matter that corresponded to the V1 (likewise V2 or V3) representation of the target annulus (see below for how we defined these gray matter regions).
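A minimal sketch of these preprocessing steps follows. It assumes NumPy and one volume every 1.5 sec, so an 18 sec A/B cycle spans 12 volumes. The filter itself is not specified here, so a low-order polynomial detrend stands in for the high-pass filter. `preprocess_voxel`, `roi_mean`, and `FRAMES_PER_CYCLE` are hypothetical names.

```python
import numpy as np

FRAMES_PER_CYCLE = 12  # 18 sec A/B cycle sampled every 1.5 sec (assumption)

def preprocess_voxel(ts, order=2):
    """Remove slow drift and express one voxel's time series as % of its mean.

    Assumption: a polynomial fit of the given order stands in for the high-pass filter.
    """
    ts = np.asarray(ts, dtype=float)
    t = np.arange(ts.size)
    drift = np.polyval(np.polyfit(t, ts, order), t)  # slow trend, including the mean
    return 100.0 * (ts - drift) / ts.mean()           # modulation in % of mean intensity

def roi_mean(voxel_ts, discard=FRAMES_PER_CYCLE):
    """Drop the first cycle, preprocess each voxel (row), and average across the ROI."""
    return np.mean([preprocess_voxel(v[discard:]) for v in voxel_ts], axis=0)
```

A polynomial detrend was chosen only because it is short and removes drift without disturbing a response that repeats many times within the scan. A filter with an explicit cutoff below the block-alternation frequency would serve the same purpose.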
We then fit a sine wave to the mean time series, the frequency of which was determined by the block-alternation frequency and the phase of which was determined by separate reference scan measurements. The amplitude of this sine wave served as an estimate for the magnitude of response modulation in each scan. This response amplitude was positive when the blood oxygenation level-dependent (BOLD) signal evoked during block B (with the higher target contrast) was larger than that during block A (with target pedestal contrast of 0%). The response amplitudes were averaged across the six repeated scans (from separate scanning sessions) for each observer. To compensate for the increased trial number per block of observer BZL in the main experiment (six instead of five stimulus presentations per block, corresponding to a factor of 1.2 higher duty cycle), we rescaled her data by dividing her fMRI responses by 1.2.
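Because the frequency and phase of the sine wave are fixed in advance, the amplitude is the only quantity to estimate. It can be computed as a least-squares projection onto a reference sinusoid. This sketch assumes NumPy, a cosine reference, and an integer number of cycles per scan. The function names are hypothetical. `duty_ratio=1.2` mirrors the rescaling applied to observer BZL's data.

```python
import numpy as np

def response_amplitude(ts, frames_per_cycle, phase):
    """Least-squares amplitude of a sinusoid with fixed frequency and phase.

    phase (radians) is assumed to come from a separate reference scan; a
    positive amplitude means block B evoked more signal than block A.
    """
    ts = np.asarray(ts, dtype=float)
    t = np.arange(ts.size)
    ref = np.cos(2 * np.pi * t / frames_per_cycle - phase)
    return float(ts @ ref / (ref @ ref))

def mean_amplitude(scans, frames_per_cycle, phase, duty_ratio=1.0):
    """Average amplitude across repeated scans, divided by the duty-cycle ratio."""
    amps = [response_amplitude(s, frames_per_cycle, phase) for s in scans]
    return float(np.mean(amps)) / duty_ratio
```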
We computed a suppression index to qualitatively compare the fMRI responses across the three visual areas. The suppression index was computed by expressing the mean response (averaged across all contrast levels) in the presence of the surround as a percentage of the mean response without surround.
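For concreteness, the suppression index reduces to a ratio of means. The function name is ours, and the inputs would be the response amplitudes at the tested contrast levels for one visual area.

```python
from statistics import fmean

def suppression_index(with_surround, without_surround):
    """Mean response with surround as a percentage of the mean response without it.

    Both arguments hold response amplitudes (% BOLD) at the same contrast levels.
    100% means no suppression; lower values mean stronger suppression.
    """
    return 100.0 * fmean(with_surround) / fmean(without_surround)
```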
Defining the visual areas. Visual areas V1, V2, and V3 were defined by measuring the polar-angle component of the cortical retinotopic map (Engel et al., 1994; Sereno et al., 1995; DeYoe et al., 1996; Engel et al., 1997). To visualize the retinotopic maps, we rendered the fMRI data on a computationally flattened representation (flat map) of each subject's brain using software developed at Stanford University (Teo et al., 1997; Wandell et al., 2000).
We used a block-alternation design to localize the subregion of each visual area that responded to the target annulus. In block A, a checkerboard flickered in the target annulus, whereas in block B, the checkerboard flickered everywhere else (Fig. 4). Prolonged presentation of the flickering surround can lead to perceptual filling-in, so that the empty annulus is no longer perceptually distinct from the surround. To avoid this, we interrupted stimulus presentation every 3 sec with a 500 msec blank stimulus (Fig. 4). fMRI time series were preprocessed (see above) and averaged across nine or 10 repeated scans. We then fit a sine wave of the block-alternation frequency to the data and computed the correlation between the sine wave and the time series. A voxel was included in our region of interest if the correlation exceeded our criterion (r > 0.6) and the sine wave was in phase with the annulus presentation (taking hemodynamic delay into account). The sizes of the resulting visual area subregions are listed in Table 1. Using a correlation threshold of 0.4 instead of 0.6 yielded comparable results.
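The voxel-selection rule can be sketched as follows, assuming NumPy and an integer number of cycles per time series. Each voxel's best-fitting sinusoid at the block-alternation frequency is found by projection, and its correlation with the data is computed. The fitted phase is then compared with an expected phase. The phase tolerance `max_phase_err` (here π/4) and the value of `expected_phase` are assumptions. The text states only that the sine wave had to be in phase with the annulus presentation after allowing for hemodynamic delay.

```python
import numpy as np

def select_roi(voxel_ts, frames_per_cycle, expected_phase,
               r_min=0.6, max_phase_err=np.pi / 4):
    """Boolean mask of voxels (rows) that respond to the target annulus."""
    voxel_ts = np.asarray(voxel_ts, dtype=float)
    t = np.arange(voxel_ts.shape[1])
    w = 2 * np.pi * t / frames_per_cycle
    c, s = np.cos(w), np.sin(w)  # orthogonal over an integer number of cycles
    mask = np.zeros(voxel_ts.shape[0], dtype=bool)
    for i, ts in enumerate(voxel_ts):
        ts = ts - ts.mean()
        a, b = (ts @ c) / (c @ c), (ts @ s) / (s @ s)
        fit = a * c + b * s                  # equals A * cos(w - phase)
        r = np.corrcoef(ts, fit)[0, 1]       # correlation of data with the fitted sine
        phase = np.arctan2(b, a)
        err = np.angle(np.exp(1j * (phase - expected_phase)))  # wrapped to (-pi, pi]
        mask[i] = r > r_min and abs(err) < max_phase_err
    return mask
```

Checking the phase separately from the correlation is what excludes voxels that respond strongly but in antiphase, for example voxels driven by the surround during block B.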
Modeling. As is common in sensory psychophysics (Nachmias and Sansbury, 1974), we assumed that observers can discriminate patterns of different contrasts when neural responses differ by a certain fixed amount. For a monotonically increasing contrast-response function $r(x)$, the threshold $x_{\mathrm{th}}$ at pedestal contrast $x_0$ is defined by $r(x_{\mathrm{th}}) = r(x_0) + 1$. A frequently used contrast-response function to account for contrast discrimination thresholds (adapted from Foley, 1994) is given by Equation 1. However, this simple description fails to account for the behavior at higher pedestal contrasts in the surround condition, in which thresholds increase less than predicted with increasing contrast, or even decrease somewhat (Zenger-Landolt and Koch, 2001). To accommodate this behavior, we defined $x$ as a function of contrast $c$ (Equation 2). The best-fitting parameters were estimated by a multidimensional simplex algorithm (Press et al., 1992). We used a relatively large number of free parameters (seven per curve) to allow for a good and unbiased fit of the psychophysical data. If we had chosen a simpler model (for example, Eq. 1), we would have obtained systematic errors in the fit of the psychophysical data, presumably leading to systematic errors in our prediction for the fMRI data. The seven parameters were fit to only the psychophysical data; they were not adjusted further to improve the correspondence between the psychophysics and fMRI (which depended on only one free parameter; see below). Furthermore, the inferred contrast-response functions depended only on the shape of the curve fitted through the psychophysical data and not on the parameterization that was used to describe this curve. Any fit that has a similar shape, irrespective of how many parameters are used to describe the curve, would lead to a very similar conclusion.
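To make the fixed-increment rule concrete, this sketch inverts a contrast-response function numerically. For each pedestal, it searches by bisection for the increment that raises the response by exactly 1 JND. Equations 1 and 2 are not reproduced here, so `response` is an assumed generic form, $k\,c^{p}/(c^{q}+s^{q})$ with $p > q$, which accelerates at low contrast and compresses at high contrast. Its parameter values are illustrative choices of ours, not the fitted model.

```python
def response(c, k=100.0, p=2.4, q=2.0, s=0.03):
    """Hypothetical contrast-response function in JND units (not Eq. 1 or 2)."""
    return k * c ** p / (c ** q + s ** q)

def threshold(pedestal, jnd=1.0, tol=1e-9):
    """Increment dc such that response(pedestal + dc) = response(pedestal) + jnd.

    Returns the increment (threshold contrast minus pedestal), found by bisection;
    this relies on response() increasing monotonically with contrast.
    """
    target = response(pedestal) + jnd
    lo, hi = 0.0, 1.0
    while response(pedestal + hi) < target:  # widen the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if response(pedestal + mid) < target:
            lo = mid
        else:
            hi = mid
    return hi
```

With these hypothetical parameters, `threshold(0.008)` is smaller than `threshold(0.0)`, and `threshold(0.5)` is larger again. This is the dipper shape: thresholds are lowest where the response function is steepest.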
To predict fMRI data from the psychophysical data, we first calculated the mean displayed target contrast (pedestal + increment) during the fMRI experiments for each of the 10 conditions (pedestal contrasts of 0, 10, 20, 40, and 80%; with and without surround). Each of these values was then entered into Equations 1 and 2 to compute r(c), using the parameters estimated from the psychophysical data. To predict the fMRI signal modulation in the block-design experiment, we subtracted the responses predicted for block A from those predicted for block B. A single free parameter was then estimated to relate the psychophysical predictions to the fMRI data: a scale factor specifying the fMRI response amplitude that corresponds to the psychophysical threshold. This parameter was estimated separately for each visual area. In V1, the estimated scale factor was 1 just-noticeable difference (JND) = 0.047% BOLD signal change. To evaluate how well the psychophysical data predicted the fMRI data, we computed the correlation between the (psychophysics-based) prediction and the actual fMRI data.
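A sketch of the one-parameter comparison follows. Predicted block B minus block A responses (in JND units) are scaled by a single factor, fitted by least squares through the origin. Agreement is then summarized by the squared correlation between prediction and data. The least-squares criterion, the reading of "variance accounted for" as the squared correlation, and the function names are all assumptions. Pure Python is used so the block has no dependencies.

```python
def predicted_modulation(r, contrast_a, contrast_b):
    """Predicted block-B minus block-A response (JND units) for a response function r."""
    return r(contrast_b) - r(contrast_a)

def fit_scale(pred_jnd, bold):
    """Least-squares scale factor (% BOLD per JND), fitted through the origin."""
    return sum(p * y for p, y in zip(pred_jnd, bold)) / sum(p * p for p in pred_jnd)

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

For example, `fit_scale([1.0, 2.0], [0.047, 0.094])` recovers a factor of about 0.047% BOLD per JND. Note that a pure scale factor leaves the correlation unchanged, so the single free parameter affects the units of the prediction, not the variance it accounts for.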
In this analysis, we assumed that the psychophysically inferred responses, the neural firing rates, and the fMRI responses were proportional to one another. We consider the implications of these assumed proportionalities in turn, beginning with the relationship between psychophysics and neural activity. We assumed that a fixed response increment corresponded to a fixed level of performance accuracy. Because perceptual discriminability depends on the signal-to-noise ratio, this is equivalent to assuming that performance is limited by additive noise. The noise in individual cortical neurons does not conform to this assumption because noise variance has been reported to increase with mean firing rates (Dean, 1981; Softky and Koch, 1993; Geisler and Albrecht, 1997; Shadlen and Newsome, 1998). In contrast, computational models suggest that performance does not simply reflect the noise in single sensory neurons; signals are pooled across many weakly correlated neurons so that only the correlated component of the noise survives and successive stages of processing contribute additional noise (Shadlen et al., 1996). Additional research is required to identify and characterize the different noise sources and their interactions. In the meantime, there is no compelling rationale for rejecting the additive noise assumption.
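The equivalence between a fixed response increment and a fixed level of accuracy under additive noise can be written out explicitly. As an illustration, let $\sigma$ be the standard deviation of the (pooled) response noise and $d'_{\mathrm{crit}}$ the discriminability corresponding to the criterion accuracy:

```latex
\begin{aligned}
d'(c, \Delta c) &= \frac{r(c + \Delta c) - r(c)}{\sigma},\\
d'(c, \Delta c_{\mathrm{th}}) = d'_{\mathrm{crit}},\ \sigma\ \text{constant}
  \;&\Rightarrow\; r(c + \Delta c_{\mathrm{th}}) - r(c) = d'_{\mathrm{crit}}\,\sigma \equiv 1\ \text{JND}.
\end{aligned}
```

If $\sigma$ instead grew with the mean response, as reported for single neurons, the increment required for a fixed $d'$ would grow with response level, and the fixed-increment rule would no longer hold.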
Next, we consider the assumed proportionality between neural firing rates and the measured fMRI responses. The central assumption guiding inferences about neural activity from fMRI data has been that the fMRI signal is approximately proportional to a measure of local average firing rate, averaged over a spatial extent of several millimeters and over a time period of several seconds (Boynton et al., 1996; Heeger et al., 2000; Rees et al., 2000). Although it is known that the fMRI signal is triggered by oxygen depletion because of the metabolic demands of increased neural activity, the details of this process are only partially understood (Heeger and Ress, 2002). Accumulating evidence suggests that the fMRI signal may not be directly tied to the spiking activity that is typically measured with single-unit electrophysiology. It is widely believed that increased blood flow follows from increased synaptic activity, not average spiking activity (Fox et al., 1988; Magistretti and Pellerin, 1999; Mathiesen et al., 2000; Logothetis et al., 2001). The interpretation of fMRI data depends crucially, therefore, on the extent to which the output from a cortical area might be decoupled from the intracortical activity within that area. In our experiments, we have largely circumvented these concerns by using visual contrast as our primary independent variable. In early visual areas, the input firing rates, intracortical activity, and output firing rates all increase monotonically with stimulus contrast. Hence, the synaptic activity and multi-unit firing rates should be highly correlated with one another and with the fMRI responses. It is worth pointing out, however, that although surround suppression is known to reduce firing rates, it may well increase inhibitory synaptic activity. Even so, in our study the surround produced a clear reduction in fMRI responses (see Results).
Results
Psychophysical contrast discrimination thresholds
We measured contrast discrimination thresholds for three observers. Observers viewed contrast-reversing gratings consisting of an annular target region embedded in a surround region (Fig. 1). Their task was to decide whether one of the eight target segments had a lower contrast than the other seven, or whether all eight segments had the same contrast. This task forced observers to attend to the whole target annulus (see Materials and Methods). The low-contrast segment had contrast c, and the high-contrast segments had contrast c+Δc. The base contrast c is commonly referred to as the pedestal contrast and was fixed across trials within a block. Contrast discrimination thresholds (the smallest increment Δc that observers can reliably detect) were measured for a series of pedestal contrasts c. Surround contrast was either 0 or 100%.
In the absence of a surround, contrast discrimination data follow a dipper function (Figs. 5, 6A, filled symbols). This classical finding (Nachmias and Sansbury, 1974; Legge and Foley, 1980; Wilson, 1980) means that our ability to discriminate contrast is best around a non-zero contrast value, which is typically close to the detection threshold. The presence of the surround impaired contrast discrimination performance, especially at low pedestal contrasts (Figs. 5, 6A, open symbols). Consistent with previous reports (Zenger-Landolt and Koch, 2001), the threshold elevation induced by the surround became smaller at higher pedestal contrasts, so that there was little or no difference in the thresholds at the highest pedestal contrasts.
We used the psychophysical data to infer nonlinear contrast response functions, assuming that a fixed response difference (Fig. 6B, 1 JND) is required for correct discrimination (Nachmias and Sansbury, 1974). At the steep part of the contrast response function, only a small contrast difference suffices to produce the required response difference, and, therefore, thresholds are small. Larger contrast differences, however, are required at the more shallow regions of the contrast response function, and thresholds are, thus, comparatively large. The fit of the psychophysical data was achieved by simple curve-fitting (see Materials and Methods for details), although we point out that the inferred contrast-response functions depended only on the shape of the curve fitted through the psychophysical data and not on the parameterization that was used to describe this curve. Because the psychophysical data were comparable across the three observers (Fig. 5), we used the average psychophysical thresholds (Fig. 6A) to compute the predicted contrast response functions (Fig. 6B).
fMRI responses
The fMRI experiment was designed to measure contrast response functions for the target stimulus, both in the presence and in the absence of a surround. The same three observers performed the psychophysical task while lying in the scanner. As one would expect, responses increased with increasing target contrast, both with and without the surround (Figs. 7, 8). Responses were suppressed by the presence of the surround (Figs. 7, 8, compare dark bars with light bars).
The suppression was progressively stronger in V2 and V3 than in V1 (Fig. 8), consistent with previous reports (Kastner et al., 2001). We computed a suppression index to compare the suppression across the three visual areas (see Materials and Methods); lower values of the index indicate stronger suppression. The suppression index was 51% in V1, meaning that the responses in the presence of the surround were, on average, about half as large as they were without the surround. The suppression index was 25% in V2, meaning that the responses in the presence of the surround were about ¼ as large as they were without the surround, and the index was −1% in V3, meaning that there was no significant response to the target in the presence of the surround.
Comparing the psychophysics and fMRI responses
We found a good agreement between the psychophysical data and fMRI data in V1 (Fig. 9). Only one free parameter was adjusted to achieve the fit. This free parameter is the scaling factor that relates BOLD signal changes to the inferred psychophysical response. In performing this comparison, we assumed: (1) that observers based their psychophysical judgments on the pooled activity across the entire V1 representation of the target annulus, and (2) that a fixed response difference is required to achieve the criterion level (79% correct) of behavioral performance accuracy (see Materials and Methods). The prediction from psychophysics accounted for 96.5% of the variance in the measured fMRI responses.
The agreement between the psychophysics and fMRI data was not as good in V2 or V3. The prediction from the psychophysics accounted for only 78.9% of the variance in the V2 responses and only 62.6% of the variance in the V3 responses. The mismatch arose because there was too much suppression from the surround in V2 and V3: the psychophysics predicted a suppression index of 52%, nearly identical to the 51% observed in V1 but much higher than the values observed in V2 (25%) and V3 (−1%).
In pilot experiments, we used a contrast matching protocol to measure the apparent contrast of the target with and without surround (Chubb et al., 1989; Cannon and Fullenkamp, 1991; Snowden and Hammett, 1998; Xing and Heeger, 2000). The observed suppression in apparent contrast because of the surround did not match the suppression inferred from contrast discrimination or the suppression in the fMRI responses. However, the mismatch could have been a result of confounds in the contrast matching experimental protocol. For example, although we asked observers to judge the contrast of the whole annulus, the task did not really enforce an even distribution of attention, unlike the contrast discrimination task. Perhaps related to this, observers often reported that contrasts in the different segments appeared quite different, making it difficult for them to render consistent judgments. Therefore, more careful experiments will be required to clarify whether apparent contrast is correlated with V1 activity or not.
Hemodynamic control
Although we suggest that the response decrease in the presence of the surround reflects a suppression of neural responses, we also considered the alternative that the apparent surround suppression is a hemodynamic confound. One such confound has been called “hemodynamic stealing.” When the surround is strong, it produces a very large BOLD signal, requiring a high level of blood flow in the cortical region corresponding to the surround. To satisfy this demand, oxygenated blood may be diverted from the less active target region, thereby reducing the BOLD response to the target. Indeed, it has been observed that a BOLD increase in one brain region can be accompanied by a sustained negative BOLD signal in neighboring brain regions (Tootell et al., 1998; Smith et al., 2000; Raichle et al., 2001; Harel et al., 2002; Logothetis, 2002; Shmuel et al., 2002). Although this negative BOLD signal may reflect a decrease in neural response below spontaneous baseline activity in those regions (Tootell et al., 1998; Smith et al., 2000; Shmuel et al., 2002), it has also been suggested that negative BOLD is the result of hemodynamic stealing (Woolsey et al., 1996; Harel et al., 2002; Shmuel et al., 2002). Hemodynamic stealing would lead to a reduction in BOLD signal that is uncorrelated with neural activity. In the present study, we observed a decrease in the BOLD response to the target in the presence of a high-contrast surround stimulus. Unlike in the studies cited above, the BOLD reduction we observed was a reduction in stimulus-induced activity, not a reduction below baseline. Nevertheless, we considered the possibility that the reduction in the BOLD signal might reflect hemodynamic stealing (induced by the highly active surround region) rather than neural suppression.
To distinguish between neural and hemodynamic effects, we exploited the difference in their time scales, neural suppression being much faster. Specifically, we introduced a condition in which the surround stimulus appeared with a lag, 375 msec after the target disappeared. This delay is long enough to abolish the psychophysical masking effect of the surround, so neural surround suppression presumably does not occur. Thus, if our results were due to neural suppression, the lagging-surround condition would give a BOLD response similar to that in the no-surround condition. If our results were instead due to hemodynamic effects, we would expect a different outcome in this control experiment. Because the hemodynamics operate on a much slower time scale (several seconds), our relatively short lag would be irrelevant, and the lagging-surround condition would, thus, be comparable with the simultaneous-surround condition.
The results from the control experiment clearly favored the neural-suppression interpretation. In different scans, we tested three surround conditions: no surround, lagging surround, and simultaneous surround. The V1 responses in the lagging-surround condition were very similar to the no-surround condition and significantly larger than in the simultaneous surround condition (Fig. 10). The results in V2 and V3 do show differences between the no-surround and lagging-surround conditions (although the differences are not statistically significant). These differences indicate that in these areas there may have been a hemodynamic effect that contributed to the overall suppression observed in the main experiment, but the measured hemodynamic effect is too small to account for the mismatch between the psychophysics and fMRI responses.
Discussion
Using fMRI and psychophysics, we have studied both the perceptual and physiological processes that occur when a target is embedded in a surround. The psychophysical data showed that the surround impairs contrast discrimination, especially at low pedestal contrasts. The fMRI data showed that responses to the target were diminished in the presence of the surround. Assuming that a fixed response difference is required for correct discrimination, we obtained a good fit (with only one free parameter) between the behavioral data and the fMRI data. Consistent with previous reports (Kastner et al., 2001), the suppression from the surround was stronger in extrastriate visual areas than in V1. Indeed, the suppression was too strong in V2 and V3 to agree with the psychophysics. We performed a control experiment to demonstrate that these results cannot be attributed to a hemodynamic confound.
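The linking assumption (a fixed response difference δ is needed for correct discrimination) can be written as R(c + Δc) − R(c) = δ, so the threshold Δc at each pedestal c follows from the contrast-response function R, with δ as the single free parameter. The sketch below is a minimal illustration under stated assumptions: R is a hypothetical Naka-Rushton function, and the surround is modeled, purely for illustration, as a rightward shift of c50. Neither the functional form nor the parameter values are taken from the study's fits.

```python
import math


def naka_rushton(c, r_max, c50, n, baseline=0.0):
    """Hypothetical contrast-response function R(c) (Naka-Rushton form)."""
    return baseline + r_max * c ** n / (c ** n + c50 ** n)


def discrimination_threshold(pedestal, delta, params, c_max=1.0, tol=1e-9):
    """Smallest increment dc such that R(pedestal + dc) - R(pedestal) = delta.

    delta (> 0) is the single free parameter of the linking assumption.
    R must increase monotonically; returns math.inf if even c_max falls short.
    """
    target = naka_rushton(pedestal, **params) + delta
    lo, hi = pedestal, c_max
    if naka_rushton(hi, **params) < target:
        return math.inf
    # Bisection on contrast, keeping R(lo) < target <= R(hi).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if naka_rushton(mid, **params) < target:
            lo = mid
        else:
            hi = mid
    return hi - pedestal


# Hypothetical parameters: surround modeled as a c50 shift (illustration only).
NO_SURROUND = {"r_max": 1.0, "c50": 0.1, "n": 2.0}
WITH_SURROUND = {"r_max": 1.0, "c50": 0.3, "n": 2.0}
DELTA = 0.02

if __name__ == "__main__":
    for pedestal in (0.0, 0.01, 0.05, 0.1, 0.2, 0.4):
        t0 = discrimination_threshold(pedestal, DELTA, NO_SURROUND)
        t1 = discrimination_threshold(pedestal, DELTA, WITH_SURROUND)
        print(f"pedestal {pedestal:4.2f}: no surround {t0:.4f}, surround {t1:.4f}")
```

Fitting δ then amounts to choosing the one scale factor that best aligns the predicted and measured thresholds. Bisection is used because it only requires R to be monotonic.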
Two general models have been proposed for how a surround mask can affect target processing: (1) the mask may degrade the target signal, or (2) it may impair the read-out of this signal. We observed a considerable reduction in V1 activity in the presence of the surround mask, demonstrating that the mask affected the V1 representation of the target, and not just its readout. By contrast, read-out impairment was demonstrated in an elegant psychophysical study (He et al., 1996), in which peripheral grating patches were presented close to each other. The presence of neighboring patches made it impossible for the observers to determine the orientation of a target patch. Nevertheless, target presentation led to orientation-specific adaptation, implying that the orientation information was represented in visual cortex. The authors argued that attention limited the observers' ability to read out this information.
Several studies have shown that the attentional state of the observer can have dramatic effects on fMRI signals as early as primary visual cortex (Brefczynski and DeYoe, 1999; Gandhi et al., 1999; Martinez et al., 1999; Somers et al., 1999; Huk et al., 2001). When attempting to measure sensory signals, it is, therefore, critical to control attention. This may be particularly important when studying lateral inhibition, because there is converging evidence from electrophysiology (Reynolds et al., 1999), fMRI (Kastner et al., 1998), and psychophysics (Zenger et al., 2000) that inhibitory lateral interactions are modulated by attention. Because attention was carefully controlled in our experiments (see Materials and Methods), we believe that our fMRI measurements reflect sensory processing signals.
How does the surround mask degrade the target signal? Again, one can distinguish two types of effects: direct masking effects, in which the mask stimulates the receptive fields of the target neurons, and indirect masking effects, in which the mask stimulates other neurons that then interact with the target neurons. Our V1 data were most likely dominated by indirect masking effects. Physiological estimates of receptive field sizes in V1 depend on the method used to measure them (Kapadia et al., 1999; Sceniak et al., 1999; Cavanaugh et al., 2002a) and vary between 0.5° (Smith et al., 2001) and 1° (Cavanaugh et al., 2002a) in diameter at the eccentricity of our target annulus. The width of our target annulus was 3.3° of visual angle, corresponding to ∼8.4 mm of cortical distance (Horton and Hoyt, 1991). We restricted the analysis of the data to the gray matter subregions of each subject's V1 that contained neurons whose receptive fields were centered in the target annulus (see Materials and Methods). Because the target annulus was large compared with the V1 receptive fields, most of the neurons included in these subregions did not receive any direct input from the surround stimulus. Physiological data suggest that surround effects in V1 extend over a distance of about three times the receptive field size (Maffei and Fiorentini, 1976; Li and Li, 1994; Angelucci et al., 2002; Cavanaugh et al., 2002a). Therefore, neurons with receptive fields centered in our annulus were likely to have received considerable surround modulation. The conjecture that our results are predominantly due to indirect masking is further supported by our psychophysical data. In the presence of the surround, the threshold data (Fig. 6A) do not follow the characteristic dipper function found for superimposed masking (Legge and Foley, 1980; Foley, 1994). Rather, they decrease at high pedestal contrasts, resembling the results from previous studies of surround masking (Zenger-Landolt and Koch, 2001).
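How many millimeters of V1 a 3.3° band occupies depends on its eccentricity, because cortical magnification falls with eccentricity. The sketch below assumes that the ∼8.4 mm figure comes from the Horton and Hoyt (1991) magnification estimate M(E) = 17.3/(E + 0.75) mm/deg; integrating M over the annulus gives the cortical distance. The inverse helper `inner_edge_for` is hypothetical and only shows which inner eccentricity this formula would imply for 3.3° and 8.4 mm (about 4.5°); the actual annulus geometry is given in Materials and Methods and is not restated here.

```python
import math

# Horton & Hoyt (1991) estimate of human V1 linear magnification:
# M(E) = 17.3 / (E + 0.75) mm per degree, with E the eccentricity in degrees.
K_MM = 17.3
E2_DEG = 0.75


def magnification(ecc_deg):
    """Linear cortical magnification (mm of V1 per degree) at ecc_deg."""
    return K_MM / (ecc_deg + E2_DEG)


def cortical_distance(ecc_inner, ecc_outer):
    """mm of V1 between two eccentricities, i.e. the integral of M(E) dE."""
    return K_MM * math.log((ecc_outer + E2_DEG) / (ecc_inner + E2_DEG))


def inner_edge_for(width_deg, distance_mm):
    """Hypothetical helper: inner eccentricity at which a band width_deg wide
    spans distance_mm of V1 under this formula (inverts cortical_distance)."""
    return width_deg / (math.exp(distance_mm / K_MM) - 1.0) - E2_DEG


if __name__ == "__main__":
    inner = inner_edge_for(3.3, 8.4)
    print(f"implied inner edge: {inner:.2f} deg")
    mid_ecc = inner + 3.3 / 2
    for rf_deg in (0.5, 1.0):
        # V1 receptive field diameters from the text, converted to mm of cortex
        print(f"{rf_deg} deg RF ~ {rf_deg * magnification(mid_ecc):.2f} mm of V1")
```

Expressed in cortical millimeters, each V1 receptive field covers only a small fraction of the band of cortex representing the annulus, which is the geometric basis of the indirect-masking argument.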
The progressive increase in suppression in V2 and V3 might simply reflect the progressive increase in receptive field sizes in those cortical areas. At corresponding eccentricities, the receptive fields in V2 are ∼1-3° in diameter (Gattass et al., 1981; Foster et al., 1985; Kastner et al., 2001), and they are ∼2-5° in diameter in V3 (Felleman and Van Essen, 1987; Gattass et al., 1988). Hence, although our V2 and V3 subregions were selected to include receptive fields centered within the target annulus, many of these receptive fields extended beyond the annulus into the surround. The responses of those V2/V3 neurons may have been saturated or suppressed by direct masking effects in the presence of the high-contrast surround stimulus, which fell within their classical receptive fields. This is unlikely to have occurred in V1, in which the receptive field sizes were small relative to the width of the target annulus (see above). Regardless, the critical issue is whether or not the psychophysical data were consistent with the measured cortical activity in each visual area's representation of the target annulus.
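A rough geometric sketch of the direct-versus-indirect argument, using the receptive field diameters quoted for V1, V2, and V3 and the 3.3° annulus width. It assumes a 1-D radial approximation (receptive field centers uniformly spread across the annulus width, no annulus curvature, no receptive field scatter, one size per field), and it reads "about three times the receptive field size" as a modulatory field three times the receptive field diameter. These simplifications and the function name are assumptions for illustration.

```python
def fraction_reaching_surround(rf_diameter_deg, annulus_width_deg=3.3,
                               extent_factor=1.0):
    """Fraction of RF centers whose (scaled) field crosses an annulus edge.

    1-D radial approximation: centers are uniform across the annulus width,
    and a field of diameter extent_factor * rf_diameter_deg reaches the
    surround if its center lies within half that diameter of either edge,
    giving a fraction of diameter / width (capped at 1).
    extent_factor=1 -> classical RF (direct masking);
    extent_factor=3 -> assumed extent of surround modulation (indirect masking).
    """
    field = extent_factor * rf_diameter_deg
    return min(1.0, field / annulus_width_deg)


# Receptive field diameter ranges quoted in the text (deg).
RF_RANGES = {"V1": (0.5, 1.0), "V2": (1.0, 3.0), "V3": (2.0, 5.0)}

if __name__ == "__main__":
    for area, (small, large) in RF_RANGES.items():
        lo = fraction_reaching_surround(small)
        hi = fraction_reaching_surround(large)
        print(f"{area}: classical RF overlaps surround for {lo:.0%}-{hi:.0%} of centers")
    mod = fraction_reaching_surround(1.0, extent_factor=3.0)
    print(f"V1 (1 deg RF, 3x modulatory extent): {mod:.0%} of centers modulated")
```

Even this crude geometry reproduces the qualitative pattern in the text: few V1 receptive fields overlap the surround directly, while most V1 neurons fall within reach of surround modulation and many V2/V3 receptive fields overlap the surround outright.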
Given the larger receptive fields in extrastriate areas, it is conceivable that the cortical activity in these extrastriate areas might be more predictive of the psychophysics if the target annulus were chosen to be wider and, hence, better matched to the receptive field sizes. This could be readily tested by systematically varying the target size. If this were the case, then it would imply that there was nothing special about the V1 activity in our experiment other than a fortuitous choice of the stimulus size. However, it is widely believed that extrastriate neurons perform further processing, that is, that their responses are different from those of V1 neurons even after compensating for receptive field size.
Our results, demonstrating a good fit between the behavioral data and V1 activity and a poor fit between the behavioral data and activity in extrastriate visual areas, raise the question of how the V1 activity is read out to guide behavior. It is widely believed that visual cortex is organized hierarchically, so that neural signals from V1 must pass through (and be processed further by) neurons in extrastriate areas before those signals can be used to drive behavior. Whether this is a strict feedforward hierarchy or a highly interactive (feedforward/feedback) system, neural signals that correspond to the subjects' perceptual reports ought to be evident beyond V1. We have not measured activity in all of the extrastriate visual areas, so it is possible that there would be a better match elsewhere (e.g., in V4). This seems unlikely, however, given the previous reports of progressively stronger suppressive effects in later visual areas (Kastner et al., 2001). A second possibility is that a subpopulation of extrastriate neurons might veridically carry the V1 signals, even though the majority of extrastriate neurons do not. A third possibility is that the perceived contrasts of the stimuli are represented explicitly (e.g., as neural firing rates) only in V1, and that a differential signal corresponding to the contrast difference (when present) is computed in extrastriate cortex and used to drive the motor responses.
In summary, our study demonstrates a striking quantitative agreement between human performance and activity in primary visual cortex, suggesting that V1 is a plausible candidate for mediating lateral masking phenomena observed behaviorally.
Footnotes
This work was supported by a grant from the National Eye Institute (R01-EY11794).
Correspondence should be addressed to Prof. David J. Heeger, Department of Psychology and Center for Neural Science, New York University, 6 Washington Place, Eighth Floor, New York, NY 10003. E-mail: david.heeger{at}nyu.edu.
Copyright © 2003 Society for Neuroscience 0270-6474/03/236884-10$15.00/0