Neurological case studies and qualitative measurements suggest that regions within human extrastriate cortex are specialized for different perceptual functions, including color. However, there are few quantitative measurements of human extrastriate color specializations. We studied the chromatic and temporal responses in several different clusters of human visual field maps using functional magnetic resonance imaging. Contrast response functions were measured for luminance [(L + M)-cone], red-green [(L - M)-cone] and blue-yellow (S-cone) modulations at various temporal frequencies. In primary visual cortex (V1), temporal responsivities to luminance and red-green modulations are approximately constant up to 10 Hz, but responsivities to blue-yellow modulations decrease significantly. In ventral occipital cortex (VO), all colors elicit strong responses, and, for each color, low temporal frequency modulations are more effective than high temporal frequency modulations. Hence, VO represents the full range of color information but does not respond well to rapid modulations. Conversely, in human motion-selective cortex (MT+) and V3A, blue-yellow modulations elicit very weak responses, whereas luminance and red-green high temporal frequency modulations are equally or more effective than low temporal frequency modulations. Hence, these dorsal occipital regions respond well to rapid modulations, but not all color information is represented. Similar to human motion perception, MT+ and V3A respond powerfully to all temporal frequencies but only to some colors. Similar to human color perception, VO responds powerfully to all colors but only to relatively low temporal frequencies.
Color vision begins with signals initiated in the L-, M- and S-cones. These signals are transformed by retinal and thalamic circuitry specialized for encoding different aspects of the signal, such as chromatic and temporal information (Rodieck, 1998; Dacey, 2000; Reid and Shapley, 2002). The signals from these circuits are, in turn, communicated to a variety of distinct cortical targets in which additional specializations for chromatic and temporal processing take place (Meadows, 1974; Zeki et al., 1991; Wade et al., 2002).
What measurements would suggest that a cortical region is specialized for color processing? For some insights about how to approach this problem, we might consider the best understood example of computational specialization for motion processing in macaque middle temporal area (MT). Its columnar organization of direction-selective units is well suited for representing information about all motion directions, and its strong responses to high temporal frequency signals make possible accurate computation of motion (Albright, 1984; Priebe et al., 2003). There is consistency between the motion selectivity of MT neurons and behavior (Britten et al., 1992). Conversely, MT receives weak S-cone signals (Seidemann et al., 1999; Wandell et al., 1999), and its receptive fields are relatively large (Albright and Desimone, 1987), making the circuitry poorly suited for detailed pattern vision or color. These observations support the theory that MT is specialized for motion processing.
The case for a cortical specialization for color, particularly in human, is based on fewer and more qualitative measurements. Neurological studies of cerebral achromatopsia and qualitative neuroimaging experiments suggest that ventral occipital cortex (VO) regions play a critical role (Zeki, 1990; Zeki et al., 1991; Rizzo et al., 1993), but more quantitative measurements are necessary to establish firmly a specialization role for color. For example, it is important to demonstrate that a cortical specialization for color contains strong and independent signals from all cone classes.
This criterion alone is not sufficient to establish a cortical specialization. For example, we expect that both the primary visual cortex (V1) and VO satisfy this criterion, just as both V1 and MT contain direction-selective neurons (Wandell, 1995). To establish a computational specialization for color, it is necessary to consider the interaction of color with additional stimulus properties, such as temporal frequency. Color appearance has very sluggish temporal dynamics (Kaiser and Boynton, 1996). Hence, just as MT specializes for dynamic information at the cost of coarse color representation, we might expect that a region specialized for color has poor temporal dynamics.
To clarify the cortical specializations for color, we quantitatively measured the responses to chromatic and temporal signals using functional magnetic resonance imaging (fMRI). We specifically compared the chromatic and temporal tuning in a few key locations: V1, human motion-selective cortex (MT+), and a candidate color specialization in VO. When considering chromatic and temporal responses together, these regions differ from one another in ways that suggest their perceptual functions.
Materials and Methods
Stimuli were designed and controlled using Matlab (MathWorks, Natick, MA) and the psychophysics toolbox software (Brainard, 1997) installed on a Macintosh PowerBook G4 laptop (Apple Computers, Cupertino, CA) with an integrated 10-bit precision graphics card (ATI Radeon 8500) and a flat-panel liquid crystal display (LCD) (Multisync LCD2080UX; NEC-Mitsubishi, Itasca, IL). The display, which spanned 8° horizontally and 6° vertically, had an 800 × 600 sampling grid, 60 Hz temporal refresh rate, and 80 cd/m² mean luminance level. All experiments were done at a viewing distance of 280 cm in rooms that were otherwise dark.
Color calibration. Display spectral properties were characterized using standard methods (Brainard, 1989; Wandell, 1995). Spectral and gamma curves were measured at 4 nm intervals in the 380-770 nm range using a photospectrometer (PR-650; Photoresearch, Chatsworth, CA). Stimulus cone contrasts were calculated using the 2° Stockman fundamentals (Stockman et al., 1993), which incorporate the macular pigment absorption effects. Because our stimuli were within 3° eccentricities, no correction for macular pigment was necessary.
Temporal calibration. The LCD temporal response was calibrated with a system comprising a photodiode, an operational amplifier, and a digital oscilloscope. We measured the harmonic amplitudes of photodiode responses to square-wave flickering at a variety of frequencies presented separately in the red, blue, and green channels. We found that there was only a small attenuation across frequencies up to 15 Hz and that the attenuation was independent of stimulus contrast. This attenuation profile is in agreement with rising and falling times (25 ms) of the LCD estimated by an independent calibration system (Norcia et al., 2005). There was also no significant drift of mean luminance level with contrast level or temporal frequency up to 15 Hz. Thus, our stimuli at high temporal frequency had no known contamination of low-frequency components. We compensated for the small attenuation at high temporal frequency (0.05 log10 units between 1.5 and 7.5 Hz) in calculation of stimulus contrasts.
fMRI measurements were performed on a 3 T General Electric scanner with a custom-designed surface coil (Nova Medical, Wilmington, MA). Subjects were supine in the scanner bore with the coil placed near the visual cortex. Head movement was minimized by padding and tape. Subjects viewed the LCD through an angled front surface mirror placed close to the eyes. The LCD was inside a front-transparent shielded box in the scanner room at the rear end of the scanner bore.
Stimulus design. Stimuli were presented in a block design. Each block lasted 12 s. The control block was a neutral gray background. The stimulus block was a sequence of presentations of a flickering contrast pattern. The pattern was a radial sinusoidal grating (two cycles, 0.8 cycles per degree) (Fig. 1 A) arranged so that the grating edge was at a point of zero contrast. During each stimulus block, the pattern was presented five times. In each presentation, the grating pattern flickered (2 s), followed by the gray background (0.4 s). The flickering was a sinusoidal temporal modulation of contrast with the start, end, and mean all equal to the gray background (Fig. 1 B). To make the local spatiotemporal contrast approximately uniform, the spatial phase of the sinusoidal grating was randomly jittered between sine and cosine phase across presentations.
We define the stimulus contrast as the vector length (square root of the sum of squares) of individual cone contrasts, which were controlled based on color calibration measurements. To increase the precision of contrast control, a spatial dithering method was used. Low-contrast stimuli were presented using only one-quarter of the pixels (every other pixel in each row and column). This method provides finer contrast resolution over the low-contrast range. Under our viewing conditions, the dithering introduces spatial frequencies above 50 cycles per degree and thus does not change the visible spatial frequency contents significantly. This procedure was calibrated and used for contrasts below 3% [(L + M)-cone], 1% [(L - M)-cone] and 6% (S-cone).
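As a minimal illustration of the contrast definition above, the sketch below computes the vector length of the three cone contrasts and checks the spatial frequency introduced by dithering. The function name and the example cone contrasts are hypothetical and are not taken from the stimulus code used in the study; the dithering estimate assumes the 800-pixel, 8° display width given earlier.

```python
import math

def stimulus_contrast(l_contrast, m_contrast, s_contrast):
    """Vector length (root sum of squares) of the three cone contrasts."""
    return math.sqrt(l_contrast ** 2 + m_contrast ** 2 + s_contrast ** 2)

# Hypothetical (L + M) modulation: equal L- and M-cone contrast, no S-cone contrast.
luminance = stimulus_contrast(0.05, 0.05, 0.0)
# Hypothetical (L - M) modulation: opposite-sign L- and M-cone contrasts.
red_green = stimulus_contrast(0.01, -0.01, 0.0)

# Dithering: 800 pixels span 8 degrees; lighting every other pixel in a row
# gives a pattern with a 2-pixel period.
pixels_per_degree = 800 / 8
dither_cycles_per_degree = pixels_per_degree / 2
```

Under these assumptions the dithering pattern sits at roughly 50 cycles per degree, in line with the statement that it does not change the visible spatial frequency content significantly.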
To control attention during control and stimulus blocks, subjects performed a simple task that consisted of a sequence of fixation mark presentations. Throughout the scan, a small (0.1° radius), high-contrast fixation mark was presented (2 s), followed by a blank interval (0.4 s). The spatial pattern of the fixation mark varied twice during the 2 s, and the probability of the first and third patterns being identical was 0.25. During the 0.4 s blank interval, which coincided with the blank interval in stimulus blocks, the subject pressed a button to indicate whether the first and third fixation mark patterns were identical. The fixation mark was small and spatially segregated from the grating pattern, so that the task directed attention away from the grating. The attention task was the same in both stimulus and control blocks and thus had no effect on the measured responses to the radial grating.
Experimental sessions. Each fMRI session began by acquiring a set of T1-weighted anatomical images prescribed in 20 slices covering the visual cortex (field of view, 220 mm; repetition time, 8.9 ms; echo time, 2 ms; flip angle, 15°; slice thickness, 3 mm). Blood oxygen level-dependent (BOLD) signals were acquired using a self-navigated, interleaved spiral-trajectory pulse sequence (repetition time, 1.2 s; 2 interleaves; echo time, 30 ms; flip angle, 55°) (Glover and Lai, 1998). The effective interframe sampling interval and voxel size were 2.4 s and 2.5 × 2.5 × 3 mm, respectively. Slices were coronal, axial, or perpendicular to the calcarine sulcus; slice orientation had no notable effect on results.
Each functional scan (132 s) comprised five cycles that alternated between a 12 s stimulus block and a 12 s control block. In addition, a control block was added at the start of each scan to allow the magnetic field to reach steady state; data from this block were discarded. Conditions of the grating patterns were the same throughout each scan but pseudorandomized across all scans: sequential scans used stimuli with different chromatic modulations, and the temporal frequency and contrast were randomly chosen.
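The timing figures quoted in the stimulus and scan descriptions can be cross-checked with a few lines of arithmetic. This is only a consistency sketch; the variable names are illustrative.

```python
BLOCK_S = 12.0           # duration of one stimulus or control block
CYCLES = 5               # stimulus/control cycles per scan
FRAME_INTERVAL_S = 2.4   # effective interframe sampling interval

# One discarded control block plus five stimulus/control cycles.
scan_duration_s = BLOCK_S + CYCLES * 2 * BLOCK_S
# Samples that remain after the initial control block is discarded.
retained_samples = round(CYCLES * 2 * BLOCK_S / FRAME_INTERVAL_S)
samples_per_cycle = retained_samples // CYCLES
# Each presentation is 2 s of flicker followed by a 0.4 s blank.
presentations_per_block = round(BLOCK_S / (2.0 + 0.4))
```

The result is a 132 s scan, 50 retained samples (10 per cycle), and five presentations per stimulus block, matching the values used in the analysis.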
Four human subjects (AAB, JL, LG, MS) participated in all fMRI measurements; subjects LG and MS were naive to the purpose of the study. All subjects had normal color vision and a corrected acuity of 20/20 or better. Each subject completed a total of at least 120 scans taken at two temporal frequencies (1.5 and 7.5 Hz), three color modulation directions [(L + M)-, (L - M)-, and S-cone] and a variety of contrast levels. Additional measurements were taken for some subjects at other temporal frequencies (3, 5.5, 6.6, and 10 Hz) or color modulation directions (L + M + S, L - 1.5M, L - 0.5M).
We measured retinotopic maps of the central visual field in separate fMRI sessions. Expanding ring and rotating wedge stimuli were displayed to measure eccentricity and angular maps, respectively. Stimuli were high-contrast black-white dartboard patterns [spatial frequency, five cycles per degree (radial), 12 cycles per 2π radian (angular); temporal frequency, 2 Hz]. The ring covered 25% of the 0-3° eccentricity range and changed 0.3° in mean eccentricity every 2.4 s. The wedge angle was 45° and rotated 36° every 2.4 s. For each type of stimulus, a full display cycle comprised 24 s, and the data included at least 25 cycles across repeated scans.
We analyzed fMRI data with our custom software mrVista (http://white.stanford.edu/software/). Because of differences in the absolute amplitude of BOLD response between subjects, results for each subject were analyzed and presented separately.
Anatomical pre-processing. We acquired several whole-brain T1-weighted anatomical data sets. For each subject, we used FSL software (http://www.fmrib.ox.ac.uk/fsl/) to average and resample the data into a 1 × 1 × 1 mm resolution three-dimensional anatomical volume that was corrected for inhomogeneity, linearly transformed (with no rescaling or distortion), and aligned with the Talairach reference brain.
The gray matter and white matter were segmented from the anatomical volume (Teo et al., 1997). The white matter in the occipital lobe was manually edited to minimize segmentation errors (Dougherty et al., 2003). The white-gray boundary was rendered as a smoothed three-dimensional surface (Fig. 2). We restricted our data analysis to the gray matter.
Each fMRI session included both functional and anatomical data (20 images) in the same measurement planes. The anatomical images were aligned with the whole-brain anatomical volume through a semi-automated alignment algorithm (Nestares and Heeger, 2000). Hence, functional data sets obtained in different sessions were spatially coregistered and combined.
BOLD response analysis. Data in each fMRI session were analyzed voxel-by-voxel with no spatial smoothing. The acquired BOLD signal of each voxel was divided by its mean to derive a time series of percentage modulation. Low-frequency baseline drifts of each voxel were calculated by convolving this 50-sample time series with a 20-sample triangular kernel; these drifts were subtracted from the time series. Head movements and other motion artifacts were examined. Most scans had minimal head motion (less than one voxel). Less than 10% of the scans had notable motion artifacts; these scans were discarded. No motion correction algorithm was applied.
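The percentage modulation and drift removal steps can be sketched as follows. This is an illustrative reconstruction, not the mrVista implementation: the exact kernel weights and the edge handling (renormalizing by the weights that overlap the data) are assumptions, and all function names are hypothetical.

```python
import math

def percent_modulation(signal):
    """Divide by the mean and express as percentage modulation around zero."""
    mean = sum(signal) / len(signal)
    return [100.0 * (v / mean - 1.0) for v in signal]

def triangular_kernel(n):
    """Symmetric triangular weights of length n, normalized to sum to 1."""
    center = (n - 1) / 2.0
    weights = [(n + 1) / 2.0 - abs(i - center) for i in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(series, kernel):
    """Same-length convolution; near the edges only overlapping weights are used."""
    n, offset = len(series), len(kernel) // 2
    out = []
    for i in range(n):
        acc, wsum = 0.0, 0.0
        for j, w in enumerate(kernel):
            idx = i + j - offset
            if 0 <= idx < n:
                acc += w * series[idx]
                wsum += w
        out.append(acc / wsum)
    return out

def detrend(signal, kernel_length=20):
    """Percentage modulation with the smoothed baseline subtracted."""
    modulation = percent_modulation(signal)
    baseline = smooth(modulation, triangular_kernel(kernel_length))
    return [v - b for v, b in zip(modulation, baseline)]

# Hypothetical voxel: mean level 100 with a 10-sample-period response.
voxel = [100.0 + math.sin(2 * math.pi * t / 10) for t in range(50)]
cleaned = detrend(voxel)
```

With these assumed weights, the kernel averages out a component with a 10-sample period away from the edges, so the baseline estimate leaves the stimulus-frequency response largely intact there.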
The BOLD responses were summarized using several variables derived from the 50-sample time series of each voxel. Phase and coherence of the harmonic component at the fundamental stimulus frequency (five cycles per scan) were computed from the Fourier transform of the time series (Brewer et al., 2002). All of the chromatic stimuli had the same fixed phase at the fundamental frequency. We found no systematic differences of BOLD response phases across the contrast, temporal frequency, and chromatic conditions (Boynton et al., 1996). Given the invariance, an average response phase was selected for each subject to characterize the fixed phase delay between the stimulus and the BOLD response. This phase differed across subjects (e.g., AAB, 35°; JL, 10°), presumably because of differences in the time course of hemodynamic responses.
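A minimal sketch of the phase and coherence computation is given below. Two points are assumptions rather than details stated in the text: coherence is taken as the amplitude at the stimulus frequency divided by the root sum of squared amplitudes over all nonzero frequencies, and the phase is reported as the delay of a sine-wave response. The function names are hypothetical.

```python
import cmath
import math

def dft_coefficient(series, k):
    """Discrete Fourier coefficient of the series at harmonic k."""
    n = len(series)
    return sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(series))

def delay_and_coherence(series, cycles=5):
    """Phase delay (degrees) and coherence at the stimulus frequency.

    The delay phi is defined so that A*sin(2*pi*cycles*t/n - phi) describes
    the response (an assumed sign convention).
    """
    n = len(series)
    fundamental = dft_coefficient(series, cycles)
    power = sum(abs(dft_coefficient(series, k)) ** 2 for k in range(1, n // 2 + 1))
    coherence = abs(fundamental) / math.sqrt(power)
    delay_deg = (-math.degrees(cmath.phase(fundamental)) - 90.0) % 360.0
    return delay_deg, coherence

# Hypothetical noise-free response delayed by 35 degrees.
series = [0.5 * math.sin(2 * math.pi * 5 * t / 50 - math.radians(35)) for t in range(50)]
delay_deg, coherence = delay_and_coherence(series)
```

For a noise-free sinusoid at the stimulus frequency the coherence is 1; noise at other frequencies lowers it toward 0.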
To characterize the BOLD response amplitude, the time series was subdivided into five 10-sample time series, each corresponding to a cycle of a stimulus block followed by a control block. Each time series was independently fit by a sine wave of one cycle (the fundamental stimulus frequency) with the phase as the selected averaged response phase. The sine-wave amplitude summarized the BOLD response for one stimulus cycle. This procedure treats each cycle as an independent measurement and allows estimation of the cycle-to-cycle amplitude variance.
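The per-cycle amplitude estimate can be sketched as a least-squares fit of a fixed-phase sine to each 10-sample segment. With the phase fixed, the only free parameter is the amplitude, which has a closed-form solution. The function name and the example data are hypothetical, and the phase convention (delay of a sine wave) is an assumption.

```python
import math

def cycle_amplitudes(series, phase_deg, samples_per_cycle=10):
    """Least-squares amplitude of a fixed-phase, one-cycle sine for each cycle."""
    phi = math.radians(phase_deg)
    reference = [math.sin(2 * math.pi * t / samples_per_cycle - phi)
                 for t in range(samples_per_cycle)]
    norm = sum(r * r for r in reference)
    amplitudes = []
    for start in range(0, len(series), samples_per_cycle):
        segment = series[start:start + samples_per_cycle]
        # Projection onto the reference sine gives the best-fitting amplitude.
        amplitudes.append(sum(y * r for y, r in zip(segment, reference)) / norm)
    return amplitudes

# Hypothetical 50-sample series: 0.6% response with a 35 degree delay.
series = [0.6 * math.sin(2 * math.pi * t / 10 - math.radians(35)) for t in range(50)]
amps = cycle_amplitudes(series, phase_deg=35.0)
mean_amp = sum(amps) / len(amps)
# Cycle-to-cycle variance (unbiased estimate across the five cycles).
variance = sum((a - mean_amp) ** 2 for a in amps) / (len(amps) - 1)
```

Because the reference sine sums to zero over a full cycle, a constant offset within a segment does not bias the amplitude estimate.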
Visual field maps and regions of interest. Visual field maps in each subject were measured using traveling wave stimuli (Engel et al., 1997b; Wandell, 1999). Figure 2 shows the pseudocolor eccentricity and angular maps on a smoothed three-dimensional cortical surface (subject JL, left hemisphere). The eccentricity and angular maps were quite similar across all subjects.
There is a large semicircular eccentricity representation near the occipital pole. The most central representation (red-yellow) is located around the occipital pole, and an increasingly peripheral representation extends onto the ventral (Fig. 2 B) and dorsal surfaces (Fig. 2 E). This eccentricity representation can be divided by angular measurements into several distinct visual field maps: V1, V2, V3, and hV4 (Wandell, 1999; Wade et al., 2002). In each hemisphere V1 includes a hemifield representation that ranges from the lower (Fig. 2 F, orange) to upper vertical meridian (Fig. 2C, cyan); hence, V1 contains a complete visual field map. Similar visual field maps are also found for V2, V3, and hV4. Because these maps share a common semicircular eccentricity representation, they are described as a cluster (Wandell et al., 2005).
There is an additional cluster of visual field maps in the VO. This VO cluster is based on a semicircular eccentricity representation found in the fusiform gyrus (lateral to collateral sulcus) (Fig. 2 B). The most central VO eccentricity representation (red-yellow) is well separated from the most central V1/V2/V3/hV4 eccentricity representation (Wade et al., 2002). We identified securely one visual field map, VO-1, within this cluster by combining the eccentricity and angular measurements. The reasons for organizing this part of cortex into (hV4, VO-1), rather than the alternate organization into (V4v, V8), are described elsewhere (Wandell et al., 2005) (A. A. Brewer, J. Liu, A. R. Wade, B. A. Wandell, unpublished observations). The eccentricity map within VO-1 spans a hemifield and runs from relatively anterior (central) to posteromedial (peripheral) along the fusiform gyrus. The map represents increasingly peripheral visual field as it extends medially across the collateral sulcus and approaches the peripheral representation of V3 ventral (Fig. 2 B). The angular map within VO-1 runs from the fusiform gyrus (lower vertical meridian) toward the collateral sulcus (upper vertical) in a medioanterior direction, in which the ventral border of VO-1 abuts hV4 and the medioanterior border abuts V3 ventral (Fig. 2C). Such complete visual field maps of VO-1 are found in all subjects.
Clusters of visual field maps are also found on the dorsal and lateral surfaces of the occipital lobe. The posterior portion of the intraparietal sulcus (at the transverse occipital sulcus) on the dorsal surface contains a semicircular eccentricity representation (Fig. 2 E). Its most central eccentricity representation (green) is well separated from and relatively more peripheral than the most central eccentricity representations of V1/V2/V3/hV4 and VO (Press et al., 2001). Based on angular maps, this eccentricity representation is shared by V3A and V3B, each of which represents a full hemifield (Smith et al., 1998; Press et al., 2001; Wandell et al., 2005). Similarly, there is another visual field map representation at the superior limb of the inferior temporal sulcus. This representation overlaps with the MT+ cluster, which has motion-selective responses (Huk et al., 2002). A review of the locations and properties of human visual field maps may be found elsewhere (Wandell et al., 2005).
We defined regions of interest (ROIs) within the visual field maps. V1, V2, V3, hV4, VO-1, V3A, and MT+ ROIs were securely identified for all subjects. The surface areas and Talairach coordinates of some ROIs are shown in Table 1 (Dougherty et al., 2003). All ROIs contain eccentricity representations up to 3°. As expected, these ROIs in general respond well to our chromatic flicker stimuli that cover 3° eccentricities.
BOLD-equivalent stimulus contrast. Figure 3 illustrates that the V1 BOLD response increases in amplitude with stimulus contrast. The relationship between BOLD response and stimulus contrast is described by a BOLD-contrast response function. This function combines two distinct parts: the neural-contrast response function, which is what we aim to understand, and the coupling between the neural and BOLD signal, sometimes called the hemodynamic response efficiency (Logothetis and Wandell, 2004). The neural-contrast response function is often characterized as a sigmoid function. The hemodynamic response efficiency function is unknown but is well modeled as a linear transformation over modest ranges of contrast and BOLD amplitude (Boynton et al., 1996, 1999; Logothetis and Wandell, 2004). Hence, the BOLD-contrast response function can be modeled as follows:
$$R = M \frac{c^{p}}{c^{p} + s^{p}}$$
where R is the BOLD response amplitude, c is the stimulus contrast, and M, p, and s are the saturation amplitude, slope, and semi-saturation contrast parameters, respectively. These parameters depend on the stimulus and other factors and are generally quite reliable because the fitting is based on multiple BOLD amplitude estimates (>15 per contrast level) measured across a range of contrast levels (more than five per color and frequency condition).
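The sketch below illustrates how such a function could be evaluated and fit. It assumes a hyperbolic-ratio (Naka-Rushton) form, R = M c^p / (c^p + s^p), which matches the description of M, p, and s as saturation amplitude, slope, and semi-saturation contrast but is an assumption here. The coarse grid search stands in for whatever optimizer was actually used, and the data are synthetic.

```python
def bold_response(c, M, p, s):
    """Assumed hyperbolic-ratio form: R = M * c^p / (c^p + s^p)."""
    return M * c ** p / (c ** p + s ** p)

def fit_grid(contrasts, responses, M_grid, p_grid, s_grid):
    """Coarse least-squares grid search over (M, p, s)."""
    best = None
    for M in M_grid:
        for p in p_grid:
            for s in s_grid:
                sse = sum((bold_response(c, M, p, s) - r) ** 2
                          for c, r in zip(contrasts, responses))
                if best is None or sse < best[0]:
                    best = (sse, M, p, s)
    return best[1:]

# Synthetic, noise-free data from hypothetical parameters (contrast in percent).
contrasts = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
true_params = (1.2, 2.0, 8.0)
responses = [bold_response(c, *true_params) for c in contrasts]

M_grid = [0.1 * k for k in range(5, 21)]
p_grid = [0.5 * k for k in range(2, 9)]
s_grid = [float(k) for k in range(2, 21)]
fitted = fit_grid(contrasts, responses, M_grid, p_grid, s_grid)
```

In this form the response at c = s is half the saturation amplitude M, which is what "semi-saturation contrast" refers to.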
Differences in the hemodynamic response efficiency functions across cortex make it impossible to directly compare the BOLD amplitudes across ROIs. However, if we assume that the hemodynamic response efficiency within each ROI is independent of the stimulus condition, then two stimuli that elicit equal BOLD responses in an ROI also elicit equal neural responses. Hence, by finding the stimuli that elicit equal BOLD responses, we can compare the neural responsivities across different stimulus conditions in each ROI.
The first step of this stimulus-referred procedure is to estimate contrast response functions in an ROI for two stimulus conditions, for example, 1.5 and 7.5 Hz stimuli with the same color condition. By inverting these functions, for any given 1.5 Hz stimulus contrast, we can estimate the 7.5 Hz stimulus contrast that elicits the same BOLD response. These two stimulus contrasts are called the BOLD-equivalent stimulus contrasts (BESCs). The cortical ROI is relatively less responsive to the stimulus condition with the higher BESC value.
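The inversion step can be sketched as follows, again assuming a hyperbolic-ratio contrast response function R = M c^p / (c^p + s^p). The parameter values for the two temporal frequencies are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def bold_response(c, M, p, s):
    """Assumed hyperbolic-ratio contrast response function."""
    return M * c ** p / (c ** p + s ** p)

def contrast_for_response(R, M, p, s):
    """Inverse of bold_response; defined only for 0 < R < M."""
    return s * (R / (M - R)) ** (1.0 / p)

def besc_pair(bold_level, params_low_hz, params_high_hz):
    """Contrasts at the two frequencies that elicit the same BOLD level."""
    return (contrast_for_response(bold_level, *params_low_hz),
            contrast_for_response(bold_level, *params_high_hz))

params_1p5_hz = (1.0, 2.0, 8.0)    # hypothetical (M, p, s) at 1.5 Hz
params_7p5_hz = (1.0, 2.0, 16.0)   # hypothetical (M, p, s) at 7.5 Hz

# Sampling several BOLD levels traces out a BESC curve.
bold_levels = (0.2, 0.4, 0.6)
besc_curve = [besc_pair(level, params_1p5_hz, params_7p5_hz) for level in bold_levels]
```

In this example the two functions differ only in semi-saturation contrast, so the 7.5 Hz contrast is twice the 1.5 Hz contrast at every BOLD level; such a region would count as less responsive at 7.5 Hz.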
By resampling the data (with replacement) and fitting the contrast response functions many times, we can obtain an estimate of the statistical distribution of stimulus contrast values that elicit any particular BOLD response (Efron and Tibshirani, 1994). We can calculate the statistical reliability of differences between two BESCs by applying Wilcoxon's rank-sum test to the two corresponding resampling distributions.
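A compact sketch of the resampling and rank-sum comparison is shown below. To stay short, it bootstraps the mean of a set of amplitude estimates instead of refitting a full contrast response function, which is a simplification. The rank-sum test uses the normal approximation without a tie correction to the variance, and the data and function names are hypothetical.

```python
import math
import random

def bootstrap(values, statistic, n_resamples=1000, seed=0):
    """Distribution of a statistic over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    return [statistic([values[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_resamples)]

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test (normal approximation); returns (z, two-sided p)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):          # tied values share the average rank
            ranks[k] = (i + j) / 2.0 + 1.0
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    return z, math.erfc(abs(z) / math.sqrt(2.0))

def mean(values):
    return sum(values) / len(values)

# Hypothetical amplitude estimates for two stimulus conditions.
amps_a = [0.52, 0.61, 0.58, 0.49, 0.66, 0.55]
amps_b = [0.31, 0.42, 0.38, 0.29, 0.35, 0.40]
dist_a = bootstrap(amps_a, mean, seed=1)
dist_b = bootstrap(amps_b, mean, seed=2)
z, p = rank_sum_test(dist_a, dist_b)
```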
We measured psychophysical detection thresholds in separate experiments using essentially the same stimulus design as in fMRI sessions, with either the same LCD display or a 10-bit calibrated cathode ray tube (CRT) display (MultiSync FP2141B; NEC-Mitsubishi) to present the near-threshold low-contrast stimuli. Each trial comprised two intervals (2 s each) separated by a blank interval (0.4 s), cued by auditory tones and fixation marks. The flickering stimulus was presented in one of the two intervals, whereas the other interval was the gray background. Subjects indicated which interval appeared to have the flickering stimulus. Each session measured a single temporal frequency and color modulation condition using a set of three randomly interleaved staircases. Each staircase contained 30 trials, during which the contrast was decreased after two correct responses and increased after one incorrect response.
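The staircase rule can be sketched as follows. The multiplicative step size and the starting contrast are assumptions, since the text does not specify them, and the `respond` callback is a hypothetical stand-in for the subject's response on each trial.

```python
def run_staircase(respond, start_contrast, step_factor=1.25, n_trials=30):
    """Two-down, one-up staircase.

    `respond(contrast)` returns True for a correct response. Contrast is
    lowered after two consecutive correct responses and raised after each
    incorrect response. Returns a list of (contrast, correct) per trial.
    """
    contrast = start_contrast
    streak = 0
    history = []
    for _ in range(n_trials):
        correct = respond(contrast)
        history.append((contrast, correct))
        if correct:
            streak += 1
            if streak == 2:
                contrast /= step_factor
                streak = 0
        else:
            contrast *= step_factor
            streak = 0
    return history

# Two extreme hypothetical observers, used only to show the update rule.
always_right = run_staircase(lambda c: True, start_contrast=4.0)
always_wrong = run_staircase(lambda c: False, start_contrast=4.0)
```

A two-down, one-up rule of this kind tends to settle where the probability of two consecutive correct responses is 0.5, that is, near 70.7% correct.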
The measured percentages of correct responses were fit with Weibull functions of the stimulus contrast. The thresholds were estimated as the contrasts giving 71% correct response rates. The variance of each threshold was estimated using a bootstrapping procedure (Efron and Tibshirani, 1994). The estimated thresholds with the LCD display were very similar to thresholds with the CRT display, but the variances were larger. Hence, we present only the results from measurements with the CRT display.
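Reading a threshold off a fitted Weibull function can be sketched as below. The parameterization (a 0.5 guess rate for the two-interval task, scale `alpha`, and shape `beta`) is a common choice but an assumption here, and the parameter values are hypothetical.

```python
import math

def weibull_2afc(contrast, alpha, beta, guess=0.5):
    """Probability correct: rises from the guess rate toward 1 with contrast."""
    return 1.0 - (1.0 - guess) * math.exp(-((contrast / alpha) ** beta))

def threshold(alpha, beta, criterion=0.71, guess=0.5):
    """Contrast at which the Weibull function reaches the criterion level."""
    return alpha * (-math.log((1.0 - criterion) / (1.0 - guess))) ** (1.0 / beta)

# Hypothetical fitted parameters (contrast in percent).
alpha, beta = 2.0, 3.0
threshold_contrast = threshold(alpha, beta)
```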
V1 and VO-1
Figure 4A shows BOLD-contrast response functions in V1 (subject JL), for the 1.5 Hz (open symbols and solid curves) and 7.5 Hz (filled symbols and dashed curves) stimuli. The three panels show measurements using (L + M)-cone, (L - M)-cone, or S-cone stimuli. Previous reports found that V1 is most responsive to (L - M)-cone, less responsive to (L + M)-cone, and least responsive to S-cone stimuli per unit cone contrast (Kleinschmidt et al., 1996; Engel et al., 1997a). We confirm these responsivity differences. For example, the 1.5 Hz stimulus contrasts required to elicit 0.6% V1 BOLD responses are lowest (0.87 ± 0.07%) for (L - M)-cone, higher (7.8 ± 0.8%) for (L + M)-cone, and highest (11.7 ± 1.0%) for S-cone stimuli.
As the temporal frequency increases from 1.5 to 7.5 Hz, V1 contrast response functions for S-cone stimuli change such that there is a loss of responsivity to high-frequency S-cone signals. Notably, this loss is more significant at low BOLD response amplitudes. For example, to elicit 0.4% V1 BOLD responses, the 1.5 Hz S-cone contrast (6.5 ± 0.8%) is significantly lower than the 7.5 Hz S-cone contrast (15.5 ± 0.9%), but, to elicit 0.8% V1 BOLD responses, the 1.5 Hz S-cone contrast (19.0 ± 1.7%) is quite close to the 7.5 Hz S-cone contrast (22.2 ± 1.5%). This resembles a single-unit study in which the loss of responsivity of a single V1 neuron to high-frequency isoluminant modulations is more significant at low spike rates (Hawken et al., 2001). In comparison, as frequency increases from 1.5 to 7.5 Hz, the V1 contrast response functions change only slightly or not at all for (L + M)-cone and (L - M)-cone stimuli (Engel et al., 1997a). Measurements at additional frequencies (3, 5.5, 6.6, and 10 Hz; results not shown) further confirmed these relationships with temporal frequency.
Figure 4B shows BOLD-contrast response functions in VO-1, which is part of the VO cluster in the ventral surface (see Materials and Methods). VO-1 is as responsive to (L + M)-cone stimuli as V1. For example, the 1.5 Hz stimulus contrasts required to elicit 0.6% VO-1 BOLD responses are lowest (0.74 ± 0.04%) for (L - M)-cone, higher (5.3 ± 1.0%) for (L + M)-cone, and highest (12.7 ± 1.0%) for S-cone stimuli, similar to what we found in V1. At 7.5 Hz, these contrasts are still lowest (1.43 ± 0.18%) for (L - M)-cone, higher (14.2 ± 2.9%) for (L + M)-cone, and highest (25.0 ± 1.3%) for S-cone stimuli (see Fig. 8).
Whereas the relative color responsivity in VO-1 resembles that of V1, the temporal frequency responsivity in VO-1 significantly differs from V1. As the temporal frequency increases from 1.5 to 7.5 Hz, all VO-1 contrast response functions show a loss of responsivity. To elicit equal VO-1 BOLD responses (e.g., 0.6%), the 7.5 Hz stimulus contrasts (14.2 ± 2.9%, L + M; 1.43 ± 0.18%, L - M; 25.0 ± 1.3%, S) are all significantly higher than the otherwise matched 1.5 Hz stimulus contrasts (5.3 ± 1.0%, L + M; 0.74 ± 0.04%, L - M; 12.7 ± 1.0%, S). Hence, VO-1 has lower responsivity to 7.5 than to 1.5 Hz stimuli for all three color modulation directions. Measurements at additional frequencies and color directions (L + M + S, L - 1.5M, and L - 0.5M; results not shown) further confirmed the reduced VO-1 responsivity to high temporal frequency stimuli.
Figure 5 shows BOLD-contrast response functions in V1 and VO-1 of another subject (AAB) in the same format as Figure 4. Although these functions differ somewhat between the two subjects, probably because of the hemodynamic response efficiency differences, the relationships with color and temporal frequency are the same for both subjects. All of the relationships described above are statistically reliable at p ≪ 10⁻⁴ (Wilcoxon's test; see Materials and Methods).
We reached the same conclusions in V1 and VO-1 of all four subjects. The response pattern in VO-1 is also quite similar to that in other regions of the VO cluster. We also studied the contrast response functions of cortical regions within V2, V3, and hV4 (see supplemental material, available at www.jneurosci.org). The cortical responses to high-frequency signals appear progressively attenuated compared with the responses to low-frequency signals as one measures in the order of V1, V2, V3, hV4, and VO-1.
V3A and MT+
Figure 6 shows BOLD-contrast response functions in V3A (subject JL) and MT+ (subject AAB). These regions respond well to (L + M)-cone stimuli, and the BOLD responses saturate at stimulus contrasts as low as 10%. However, for S-cone stimuli, V3A and MT+ have very weak responses, and a contrast response curve could not be reliably fit to the data. These regions are approximately equally responsive to (L + M)-cone and (L - M)-cone stimuli per unit cone contrast, indicating that the (L - M)-cone signals are relatively weak compared with V1 and VO. For example, to elicit 0.4% BOLD responses in MT+ (subject AAB), the (L + M)-cone and (L - M)-cone contrasts are (4.3 ± 1.2%, 3.0 ± 0.4%) at 1.5 Hz and (1.5 ± 0.3%, 1.2 ± 0.2%) at 7.5 Hz. Hence, in both V3A and MT+, the responses to (L - M)-cone and S-cone stimuli are weaker than in V1. However, the (L - M)-cone and (L + M)-cone stimuli are similarly effective per unit contrast at evoking V3A and MT+ responses (see Fig. 8).
As temporal frequency increases from 1.5 to 7.5 Hz, some V3A and MT+ contrast response functions for (L + M)-cone and (L - M)-cone stimuli change significantly. For example, to elicit equal BOLD responses (e.g., 0.4%) in MT+ (subject AAB), the cone contrasts required at 1.5 Hz (4.3 ± 1.2%, L + M; 3.0 ± 0.4%, L - M) are significantly higher than at 7.5 Hz (1.5 ± 0.3%, L + M; 1.2 ± 0.2%, L - M). Similarly, the (L - M)-cone signals drive V3A (subject JL) more effectively at 7.5 than at 1.5 Hz, whereas the (L + M)-cone signals are approximately equally effective at 7.5 as at 1.5 Hz. Hence, both V3A and MT+ respond well to high temporal frequency modulations. The representation of high temporal frequency luminance information in V3A and MT+ is at least as strong as in V1 and significantly stronger than in VO. All of the relationships described above are statistically reliable at p ≪ 10⁻⁴ (Wilcoxon's test).
In these experiments, the BOLD response amplitudes in V3A and MT+ are lower than in V1 and VO. We did not obtain good responses in both V3A and MT+ for all subjects; however, we did obtain good responses in at least one of these regions for each subject (V3A in three subjects and MT+ in two subjects). The similarity of V3A and MT+ across subjects is notable. The relatively low response amplitudes in V3A and MT+ are probably attributable to the 3° stimulus size. Such small foveal stimuli are helpful in analyzing VO and for understanding color vision. Conversely, these stimuli are not optimized to elicit responses in V3A and MT+.
Summary of temporal responsivity differences
In this section, we summarize the temporal responsivity properties of V1, VO-1, and V3A/MT+ using the BESC values. This stimulus-referred procedure specifies two stimulus contrast levels that elicit the same BOLD response within a particular region of interest. For example, we read from the L + M panel of Figure 4A that, to elicit 0.6% V1 BOLD responses, the (L + M)-cone stimulus contrasts are 7.8 and 6.1% at 1.5 and 7.5 Hz, respectively. This pair of BESC values corresponds to a single point on the V1 BESC curve in the L + M panel of Figure 7A. Measuring the BESC value pairs at different BOLD response levels produces the BESC curve.
Figure 7 shows a set of BESC curves in four subjects, each row corresponding to one subject. The three columns show measurements using (L + M)-cone, (L - M)-cone, and S-cone stimuli. The curves show measurements in V1 (solid), VO-1 (dashed), and V3A/MT+ (dotted). The BESC curves are shown over the range of BOLD-contrast responses above the BOLD response noise levels and below saturation. The contrast response curves over this range, and therefore the BESC curves, are quite stable as measured by bootstrapping. When a cortical region is consistently more responsive to 1.5 than to 7.5 Hz, the BESC curve falls above the diagonal (no shading); otherwise, the curve falls below the diagonal (in the shaded zone).
In the (L + M) column, the V1 curves (solid) are close to the diagonal except for one subject (LG). The same pattern holds for the (L - M) column. This indicates that V1 is equally sensitive at the two test frequencies for stimuli initiated by the L- and M-cones. V1 curves with S-cone stimuli (right column) fall above the diagonal, indicating that V1 is more sensitive to low (1.5 Hz) than to high (7.5 Hz) temporal frequency S-cone signals.
The VO-1 curves (dashed) are significantly above the diagonal in all columns, indicating that, for all colors, VO-1 is more sensitive to low (1.5 Hz) than to high (7.5 Hz) temporal frequency stimuli. Conversely, V3A/MT+ curves (dotted) are below or close to the diagonal, indicating that, for L- and M-cone stimuli, V3A/MT+ is as sensitive or more sensitive to high than to low temporal frequencies. V3A/MT+ curves are not shown for S-cone stimuli because the responses are too weak.
Finally, the VO-1 curve is above the V1 curve and the V3A/MT+ curve is below or close to the V1 curve for every subject and color direction. That is, for a low temporal frequency contrast, the BOLD-equivalent high temporal frequency contrasts are the lowest for V3A/MT+, higher for V1, and the highest for VO-1. This indicates that the responses to high-frequency signals are relatively amplified in V3A/MT+ compared with V1 and relatively attenuated in VO-1 compared with V1.
Psychophysical detection thresholds to these stimuli are summarized in Table 2. The psychophysical temporal sensitivity does not match the BOLD temporal responsivity. For example, the psychophysical thresholds increase sharply from 1.5 to 7.5 Hz for both (L - M)-cone and S-cone stimuli but not for (L + M)-cone stimuli (Engel et al., 1997a). None of the three cortical regions shows this pattern. The lack of correspondence may be because the stimulus contrasts in BOLD measurements never reached down to the psychophysical threshold levels. Alternatively, the regions of interest or the BOLD responses may not correspond to the neural substrate that limits psychophysical sensitivity.
Summary of color responsivity differences
Figure 8 summarizes the color responsivity properties of V1, VO-1, and V3A/MT+ using BESC curves. Figure 8A compares (L - M)-cone and (L + M)-cone signals in all four subjects and three cortical regions: V1 (solid), VO-1 (dashed), and V3A/MT+ (dotted). When a cortical region is consistently more responsive to (L - M)-cone than to (L + M)-cone stimuli per unit cone contrast, the BESC curve falls below the diagonal (in the shaded zone). The V1 and VO-1 curves fall similarly and significantly below the diagonal for all subjects. In comparison, the V3A/MT+ curves are close to the diagonal, indicating that these regions are equally responsive to (L - M)-cone and (L + M)-cone stimuli per unit cone contrast.
Figure 8B compares S- and (L + M)-cone signals in the same format as Figure 8A. When a cortical region is consistently more responsive to (L + M)-cone than to S-cone stimuli, the BESC curve falls above the diagonal (no shading). The V1 and VO-1 curves fall similarly and significantly above the diagonal for all subjects.
VO and color perception
Our measurements support the hypothesis that VO has a critical role in human color perception. This hypothesis was originally suggested by neurological cases in which ventral cortical lesions produce a specific loss of color vision (Meadows, 1974; Zeki, 1990) and further supported by “color-exchange” experiments in which adding an isoluminant pattern to a monochrome luminance pattern produces a powerful response in VO (Zeki et al., 1991; Wade et al., 2002).
These observations are sometimes misinterpreted as implying that luminance is not represented in VO. Instead, we suggest that candidate cortical regions for color computation must respond well to signals in all color dimensions (luminance and chromatic) to contain the full range of color information. As demonstrated in this paper, V1 and VO satisfy this fundamental criterion. Both V1 and VO are most responsive to (L - M)-cone, less responsive to (L + M)-cone, and least responsive to S-cone modulations per unit cone contrast (Kleinschmidt et al., 1996; Engel et al., 1997a).
Although VO and V1 both represent the full range of color, the temporal responsivities in VO differ from those in V1. A fundamental property of human color perception is that lights flickering at high frequency lose their color appearance (Kaiser and Boynton, 1996). This perceptual property suggests that the cortical sites that represent color appearance respond weakly to high temporal frequencies compared with other cortical sites. We find that VO, but not V1, satisfies this criterion. As the (L + M)-cone, (L - M)-cone, and S-cone signals are communicated from V1 to VO, they are all temporally filtered such that the high temporal frequency responses in VO are attenuated compared with those in V1. Moreover, the relative temporal responsivities between low and high temporal frequencies become approximately matched in VO.
These empirical results reveal two principles concerning color computations. First, color perception is based on comparing signals in multiple pathways. We showed that VO is a candidate cortical region for color computation because it receives strong luminance and chromatic signals. We speculate that these signals are delivered on independent cortical pathways, such as those present in V1 (Chatterjee and Callaway, 2003). Second, the comparison is optimized when luminance and chromatic signals have matched temporal dynamics. This matching might be achieved by neural mechanisms with a sufficiently long integration time. We speculate that neural circuitry specialized for color, such as VO, cannot operate faster than the slowest of luminance and chromatic signals (e.g., S-cone signals). In summary, VO has independent input signals with matched temporal dynamics; such properties are beneficial for a computational process specialized for comparing chromatic signals.
The measurements in MT+ and V3A suggest that these regions, despite being well separated on the cortical surface, are part of a common functional network. Compared with other parts of visual cortex, both MT+ and V3A have weak responsivities to (L - M)-cone and S-cone stimuli. Previous work showed that MT+ and V3A have common (strong) responses to achromatic motion stimuli (Tootell et al., 1997; Smith et al., 1998). The common responsivities to color and flicker demonstrated in this paper indicate that MT+ and V3A respond similarly to stimuli beyond the realm of motion and reinforce the hypothesis that they are part of a specialized cortical network.
The properties of this MT+/V3A network are profoundly different from those of VO. First, this network does not respond well to all chromatic modulations. MT+ and V3A respond poorly to S-cones even at visual field locations that produce powerful V1 responses. Because there is no need to compare color signals for motion estimation, these computations can be performed using a color-univariant spatiotemporal signal; that is, a signal that combines luminance and chromatic inputs (Adelson and Bergen, 1985; Dougherty et al., 1999). Such a signal might be implemented by combining the magnocellular pathway (large weight) with parvocellular and koniocellular pathways (small weights) (Maunsell et al., 1990; Sawatari and Callaway, 1996; Wandell et al., 1999). Second, motion computations perform best when the spatiotemporal signal has precise temporal information. As the L- and M-cone signals are communicated from V1 to MT+ and V3A, high temporal frequency responses are not attenuated. The best temporal precision is achieved by neural mechanisms with sufficiently short integration times. In summary, MT+ and V3A have a color-univariant signal with the fastest possible temporal dynamics; such properties are beneficial for a computational process specialized for motion estimation.
There are two important considerations in interpreting BOLD responses. First, for a number of reasons, such as hemodynamics, equal neuronal responses may not produce equal BOLD responses across cortical locations (Logothetis and Wandell, 2004). Comparing the BOLD responses in two locations to a single stimulus cannot prove that the neuronal responses differ between those locations. Second, measurements using a single stimulus contrast level may obscure responsivity differences (Hadjikhani et al., 1998; Singh et al., 2000). The stimulus-referred procedure used here (BESC) avoids these problems, allowing meaningful comparisons between cortical regions and between stimulus conditions.
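One way to see why a stimulus-referred measure is insensitive to region-specific hemodynamic scaling is the following sketch. It rests on an assumption that is ours and not tested here: within a region, the BOLD response is a strictly increasing function of the neuronal response, and that function is the same for the two stimulus conditions being equated. The neuronal response functions, the compressive transform, and all names below are hypothetical.

```python
import numpy as np

def neural_low(c):
    """Hypothetical neuronal response to a low temporal frequency stimulus."""
    return c / (c + 10.0)

def neural_high(c):
    """Hypothetical neuronal response to a high temporal frequency stimulus."""
    return c / (c + 20.0)

def identity(r):
    """BOLD signal proportional to the neuronal response."""
    return r

def compressive(r):
    """Hypothetical region-specific, strictly increasing hemodynamic transform."""
    return 3.0 * np.sqrt(r)

def matched_high_contrast(c_low, transform, grid):
    """High-frequency contrast whose BOLD response equals that of c_low."""
    target = transform(neural_low(c_low))
    responses = transform(neural_high(grid))  # increasing in contrast
    return float(np.interp(target, responses, grid))

grid = np.linspace(0.1, 100.0, 100_000)  # candidate contrasts (%)

c_identity = matched_high_contrast(5.0, identity, grid)
c_compressive = matched_high_contrast(5.0, compressive, grid)

# Both transforms give the same equivalent contrast, because equating the
# transformed responses is the same as equating the neuronal responses.
print(round(c_identity, 3), round(c_compressive, 3))
```

Under these assumptions the equivalent contrast depends only on the neuronal response functions, whereas the raw BOLD amplitudes under the two transforms differ.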
The stimulus-referred procedure, which has a long history in visual neuroscience (Wandell, 1995; Kaiser and Boynton, 1996), can be used to compare measurements obtained by vastly different methods. For example, one can compare spike-rate equivalent stimulus contrast values with BESC values. Notably, by using the stimulus-referred procedure, neuroimaging, electrophysiological, and behavioral experiments can provide converging evidence for neural models (Dougherty et al., 1999; Seidemann et al., 1999; Wandell et al., 1999).
In this paper, however, behavioral measurements of detection sensitivity do not match BESC values in any single cortical region we studied. We speculate that color appearance computations, rather than sensitivity limitations, are measured by VO responses. To test this hypothesis, one needs to measure quantitatively the variation in color appearance with increasing temporal frequency. We are unaware of such quantitative measurements beyond the demonstration of a general loss of color saturation with increasing temporal frequency [heterochromatic flicker photometry (Kaiser and Boynton, 1996)]. Thus, future behavioral work could further clarify the precise color computations carried out in VO.
The chromatic and temporal responses in VO differ distinctly from those in MT+ and V3A. The VO regions respond well to all three dimensions of chromatic information and respond more powerfully to low than to high temporal frequency signals. In contrast, MT+ and V3A respond well to high temporal frequency signals and respond weakly to near-isoluminance modulations. These findings suggest that VO specializes in color computation, whereas MT+ and V3A belong to the network that specializes in estimation of temporal information such as motion. These specializations are at the cost of coarse VO representations of temporal information and coarse MT+ and V3A representations of color information.
More generally, local neuronal specializations for one visual feature (e.g., color) are accompanied by selective loss of representation of other features (e.g., temporal dynamics). It is perhaps impossible to design local neuronal circuitry that simultaneously optimizes color and temporal signals. The optimization for chromatic and temporal features requires different circuitry, which in the human brain is localized in widely separated cortical locations. The value of the neuronal optimization appears to outweigh the challenge of coordinating signals between these locations.
This work was supported by National Eye Institute Grant RO1 EY03164. We thank Alyssa Brewer, Robert Dougherty, Satoshi Nakadomari, Anthony Norcia, David Ress, and Alex Wade for their help and comments on this manuscript.
Correspondence should be addressed to Junjie Liu, Wandell Laboratory, Stanford University, Jordan Hall, Building 420, Stanford, CA 94305.
Copyright © 2005 Society for Neuroscience 0270-6474/05/253459-10$15.00/0