Abstract
We recorded the responses of direction-selective simple and complex cells in the primary visual cortex (V1) of anesthetized, paralyzed macaque monkeys. When studied with sine-wave gratings, almost all simple cells in V1 had responses that were separable for spatial and temporal frequency: the preferred temporal frequency did not change and preferred speed decreased as a function of the spatial frequency of the grating. As in previous recordings from the middle temporal visual area (MT), approximately one-quarter of V1 complex cells had separable responses to spatial and temporal frequency, and one-quarter were “speed tuned” in the sense that preferred speed did not change as a function of spatial frequency. Half fell between these two extremes. Reducing the contrast of the gratings caused the population of V1 complex cells to become more separable in their tuning for spatial and temporal frequency. Contrast dependence is explained by the contrast gain of the neurons, which was relatively higher for gratings that were either both of high or both of low temporal and spatial frequency. For stimuli that comprised two spatially superimposed sine-wave gratings, the preferred speeds and tuning bandwidths of V1 neurons could be predicted from the sum of the responses to the component gratings presented alone, unlike neurons in MT that showed nonlinear interactions. We conclude that spatiotemporal modulation of contrast gain creates speed tuning from separable inputs in V1 complex cells. Speed tuning in MT could be primarily inherited from V1, but processing that occurs after V1 and possibly within MT computes selective combinations of speed-tuned signals of special relevance for downstream perceptual and motor mechanisms.
Introduction
The speed of a moving object is not represented directly in the input to vision but must be computed by comparing the spatial location of the object at different times. Because their spatial receptive fields are punctate, the responses of photoreceptors can depend only on the local temporal variation of light intensity: when tested with sinusoidal gratings, their sensitivity will be “separable” in the sense that it depends on the product of two functions, one of spatial and one of temporal frequency. As a result, preferred speed, given by the ratio of temporal and spatial frequency, varies with stimulus spatial frequency. At the other extreme, some cortical neurons represent object speed directly by maintaining the same speed preference as spatial frequency changes (Perrone and Thiele, 2001; Priebe et al., 2003). Thus, a useful signature of the computation of object speed is how the responses of neurons to grating motion at different speeds depend on the spatial frequency of the stimulus (Tolhurst and Movshon, 1975; Holub and Morton-Gibson, 1981; Baker, 1990; Levitt et al., 1994; McLean and Palmer, 1994).
We can think of the creation of speed-tuned responses as analogous to the creation of orientation selectivity in primary visual cortex (V1) simple cells: for orientation, non-oriented inputs from the LGN combine in a way that allows simple cells to code for spatial orientation (Hubel and Wiesel, 1962); for speed, individual nonselective inputs (i.e., spatiotemporally separable cells) combine so that the target neuron selectively codes for a biologically important feature of the input, speed (Heeger et al., 1996; Simoncelli and Heeger, 1998). To understand the latter transformation, one must determine the responses of neurons at different levels of the visual motion system. In retinal ganglion cells and LGN neurons, spatiotemporal frequency tuning is not separable, but the deviations are of the wrong kind to represent speed (Enroth-Cugell et al., 1983; Hicks et al., 1983; Derrington and Lennie, 1984). Recordings from cat area 17 suggest that tuning in V1 at contrast threshold is approximately separable in spatial and temporal frequency (Tolhurst and Movshon, 1975), as do some measurements from macaque (Foster et al., 1985). Some neurons in V2 and the middle temporal visual area (MT) show speed tuning that is invariant with spatial frequency (Foster et al., 1985; Levitt et al., 1994; Perrone and Thiele, 2001); others show separable tuning for spatial and temporal frequency, although half fall between these two extremes (Priebe et al., 2003). Because these data come from many laboratories using different conditions and species, it is difficult to draw any strong conclusions about how speed-tuned responses are created, especially without knowing the spatiotemporal behavior of the directionally selective cells in V1, which provide input to downstream motion processing areas (Movshon and Newsome, 1996).
Here, we investigate the representation of speed in V1. Directionally selective simple cells in V1 show separable tuning for spatial and temporal frequency, whereas directionally selective complex cells show the same degree of speed tuning found in area MT neurons. Thus, the nonseparable speed tuning shown by MT neurons is probably inherited from V1. Our results suggest a hypothesis for how a representation of target speed is created by transforming visual signals in multiple small steps across several levels of visual processing. As in other aspects of motion processing (Movshon et al., 1985; Britten et al., 1992), all but the last of these steps seem to be completed by the time visual motion signals reach the level of complex cells in V1.
Materials and Methods
Experiments were performed at University of California, San Francisco and New York University (hereafter, W and E) using very similar procedures. We made extracellular single-unit microelectrode recordings in the primary visual cortex of three anesthetized, paralyzed macaques (Macaca fascicularis) in recording sessions that lasted between 84 and 120 h. The W and E methods were identical to those described by Priebe et al. (2003) and Cavanaugh et al. (2002), respectively, and will not be presented in detail here. All experiments followed protocols that had received previous approval by the Institutional Animal Care and Use Committee at the relevant institution.
Experimental design.
For each single unit, we chose the preferred eye and covered the other with an opaque occluder. Receptive fields were mapped with hand-held stimuli; all of the neurons had receptive field centers within 12° of the fovea. We positioned a mirror to center the receptive field on a video monitor and conducted the remaining experiments under computer control using displays and computer software described previously [program W (Priebe et al., 2003); program E (Cavanaugh et al., 2002)]. Experiments consisted of a sequence of brief presentations of moving stimuli (duration of 1–3 s) with an intertrial interval of 600–1000 ms. In the intertrial interval, the screen either was blank (E) or presented a stationary version of the stimulus form to be used in the upcoming trial (W). Stimuli consisted of either sine-wave gratings or textures of random dots; in each case, we started by presenting stimuli in different-sized apertures to choose the aperture that produced the largest response. All grating stimuli were surrounded by a gray background of the mean luminance of the grating (W, 60 cd/m²; E, 33 cd/m²).
For sine-wave gratings, we determined the preferred direction based on the responses to the motion of a 32%-contrast sine-wave grating in 16 directions. We then assessed preferences for spatial and temporal frequency by measuring the response to gratings that moved in the preferred direction for many combinations of spatial frequencies and temporal frequencies. The stimulus frequencies were varied over the full range that activated the neuron under study, in half-octave (E) or full-octave (W) unit steps. For many neurons, gratings were presented at two contrasts, 32 and 8%, in randomly interleaved trials. For 25 neurons (16 complex, 9 simple cells), we also tested responses with stimuli that contained two spatially overlapping gratings. Dual-grating stimuli were created by displaying each of the component gratings individually in temporally alternated frames. The refresh rate of the monitor was 100 Hz, so that the refresh rate to display the dual-grating stimulus was 50 Hz. Dual-grating experiments contained interleaved presentations of single sine-wave gratings, achieving the same refresh rate by alternating the grating temporally with blank (gray) screens of the same mean luminance. Equal values of contrast were obtained by displaying the components of the dual-grating stimulus at half the contrast used for the single sine-wave gratings. The duration of sine-wave grating presentations was adjusted so that we always showed an integral number of cycles of temporal frequency for each component of the stimulus.
For a subset of neurons, we measured the speed tuning for random dots using displays that have been described previously for W experiments (Priebe et al., 2002). In E experiments, similar displays of bright dots on a gray background were used. We determined the approximate best speed by ear before running experiments to identify quantitatively the preferred direction of each neuron for the random dot textures. Finally, we evaluated the speed tuning by moving the random dot texture in the preferred direction at speeds ranging from 1 to 128°/s.
Data analysis.
We analyzed responses by aligning all of the spike trains elicited by identical trajectories of grating motion on the onset of motion and accumulating poststimulus time histograms with a bin width of 1 ms. Background responses were eliminated by subtracting the mean firing rate when the screen was blank from the responses to moving stimuli. We then quantified the results of each experiment by measuring the mean firing rate for complex cells and the amplitude of modulation of firing rate for simple cells during selected analysis intervals from the background-corrected histograms. To provide the data for fitting, we measured the firing rate from each cell on a trial-by-trial basis. The set of single-trial firing rates then were used, along with the Gauss-Newton algorithm in Matlab (function “nlinfit”; MathWorks, Natick, MA), to fit the parameters of the equations (see below). Confidence intervals for parameter estimates were computed from the Jacobian matrix and the residuals using the Matlab function “nlparci.” Specific analyses are described at the relevant places in Results.
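The two response measures described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, spike times are assumed to be in seconds, and the modulation amplitude is assumed to be the Fourier component of the spike train at the grating drift frequency, computed over an analysis window that spans an integral number of cycles.

```python
import math

def mean_rate(spike_times, t_start, t_end):
    """Mean firing rate (spikes/s) inside the analysis window."""
    n_spikes = sum(1 for t in spike_times if t_start <= t < t_end)
    return n_spikes / (t_end - t_start)

def modulation_amplitude(spike_times, t_start, t_end, drift_hz):
    """Amplitude (spikes/s) of firing-rate modulation at the drift frequency.

    Assumes the window covers an integral number of stimulus cycles.
    """
    window = [t - t_start for t in spike_times if t_start <= t < t_end]
    re = sum(math.cos(2.0 * math.pi * drift_hz * t) for t in window)
    im = sum(math.sin(2.0 * math.pi * drift_hz * t) for t in window)
    return 2.0 * math.hypot(re, im) / (t_end - t_start)

def background_corrected(rate, blank_rate):
    """Subtract the mean rate measured with a blank screen."""
    return rate - blank_rate
```

For a spike train locked to one phase of a 4 Hz grating the modulation amplitude is twice the mean rate, whereas an evenly spaced train gives a modulation amplitude near zero.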
We estimated the direction selectivity of each neuron from the responses to gratings of 32% contrast at the preferred spatial and temporal frequency of the neuron under study. Direction selectivity was quantified using the direction index (DI; Eq. 1), where Rp and Rn are response amplitudes for grating motion in the preferred and opposite directions, respectively. To fit plots relating neuronal responses to stimulus speed, we used Equation 2, where A is the peak response of the neuron, ps is the preferred speed, σ is the tuning width, and ζ is skew. To characterize data relating neuronal responses to spatial and temporal frequency, we fitted the full spatiotemporal tuning surface of each neuron with a variant of a two-dimensional Gaussian function in which the preference for temporal frequency could depend on stimulus spatial frequency (Eqs. 3 and 4), where A is the peak response of the neuron, sf0 is the preferred spatial frequency averaged across temporal frequencies, tfp(sf) is the preferred temporal frequency for a particular spatial frequency, and ζ is the skew of the temporal frequency tuning curve. The dependence of the preferred temporal frequency (and therefore preferred speed) on spatial frequency is captured by the parameter ξ, which is the exponent of a power-law relationship between preferred temporal frequency and stimulus spatial frequency.
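One set of expressions consistent with the symbol definitions in this paragraph is sketched here. These forms are assumptions modeled on the skewed log-Gaussian family used for MT by Priebe et al. (2003); they are not a transcription of the authors' equations. In particular, the form of DI, the use of base-2 logarithms, and the symbols σ_sf, σ_tf, and tf0 (preferred temporal frequency at sf0) are introduced here for illustration only.

```latex
\begin{align}
DI &= 1 - \frac{R_n}{R_p} \tag{1} \\
R(s) &= A \exp\!\left( \frac{-\left[\log_2(s/p_s)\right]^2}
        {2\left[\sigma + \zeta \log_2(s/p_s)\right]^2} \right) \tag{2} \\
R(sf, tf) &= A \exp\!\left( \frac{-\left[\log_2 sf - \log_2 sf_0\right]^2}{2\sigma_{sf}^2} \right)
        \exp\!\left( \frac{-\left[\log_2 tf - \log_2 tf_p(sf)\right]^2}
        {2\left[\sigma_{tf} + \zeta\left(\log_2 tf - \log_2 tf_p(sf)\right)\right]^2} \right) \tag{3} \\
\log_2 tf_p(sf) &= \xi \left(\log_2 sf - \log_2 sf_0\right) + \log_2 tf_0 \tag{4}
\end{align}
```

Under this parameterization, ξ = 0 makes tfp independent of spatial frequency, and ξ = 1 makes tfp proportional to spatial frequency, which matches the description of ξ as a power-law exponent.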
For each neuron and value of stimulus contrast, we created standard contour plots to represent the spatiotemporal response surfaces by computing the locations in the spatial and temporal frequency space in which response amplitude crossed a specified value.
Histology.
At the end of the experiment, monkeys were anesthetized deeply with sodium pentobarbital and perfused through the aorta with PBS, followed by 10% Formalin. Tissue was processed as frozen sections, and electrode penetrations were reconstructed to allow localization of the recorded units according to layer (Lisberger and Movshon, 1999). Direction-selective complex cells were recorded mostly in layers III and IV, whereas simple cells were found in layers IV–VI.
Results
Tuning for spatiotemporal frequency and speed
The speed of a grating stimulus is given by its temporal frequency divided by its spatial frequency. A neuron therefore cannot be tuned for each of the three parameters in a way that is independent of the values of the other two. If the preferred speed is the same at all spatial frequencies, then the temporal frequency tuning must vary as a function of spatial frequency: we say that the neuron is “speed tuned.” If the preferred temporal frequency does not vary as a function of spatial frequency, then the preferred speed must vary with spatial frequency: we say that the neuron has “separable tuning” for spatial and temporal frequency.
Figure 1 shows the predictions made by these two extreme models for a neuron stimulated with moving gratings at a range of spatial and temporal frequencies. We visualize spatiotemporal tuning surfaces as contour plots of the responses to gratings of different spatial and temporal frequencies (Fig. 1B,D).
If a neuron has separable tuning, then equal-response loci on its spatiotemporal tuning surface are ellipses whose primary axes are vertical and horizontal, because the surface is the product of independent functions along the horizontal and vertical axes (Fig. 1B). Separable tuning causes preferred speed to decrease as a function of spatial frequency, as shown by plotting response as a function of stimulus speed for each individual spatial frequency (Fig. 1A). For a neuron with separable frequency tuning, the relationship between spatial frequency and preferred speed obeys a power law with an exponent of −1.
If a neuron is speed tuned, then it has a tilted spatiotemporal tuning surface like that in Figure 1D. When the speed tuning is plotted for different spatial frequencies, the speed that elicits the peak response does not change (Fig. 1C). In extrastriate area MT, approximately one-quarter of the neurons had tuning like that in Figure 1A, one-quarter resembled Figure 1C, and the remaining 50% had intermediate properties (Priebe et al., 2003). In the present paper, we describe a similar analysis of the spatiotemporal tuning of directionally selective neurons in the primary visual cortex.
Spatiotemporal separability of V1 neurons
After isolating each neuron in V1, we presented gratings of different orientation and direction to determine the motion selectivity of a neuron. We classified cells as simple or complex using the relative modulation measure described by Skottun et al. (1991). Neuronal response was taken as the amplitude of response modulation at the drift frequency for simple cells and as the baseline-corrected mean firing rate for complex cells. We studied only neurons whose direction index (DI in Eq. 1) exceeded 0.5. In fact, almost all V1 neurons that satisfied this criterion had values of DI near 1, providing a population with directionality similar to that of MT neurons.
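The two selection steps in this paragraph can be sketched as below. The sketch rests on two assumptions that the text does not spell out: that the relative modulation criterion is the usual one from Skottun et al. (1991), in which a ratio of modulated to mean response above 1 marks a simple cell, and that DI takes the form 1 − Rn/Rp. Function names are hypothetical.

```python
def classify_cell(f1, f0):
    """Label a cell from its relative modulation F1/F0.

    f1: response amplitude at the drift frequency (spikes/s)
    f0: baseline-corrected mean response (spikes/s)
    Assumes the conventional cutoff of 1 between complex and simple cells.
    """
    if f0 <= 0:
        raise ValueError("mean response must be positive to form F1/F0")
    return "simple" if f1 / f0 > 1.0 else "complex"

def direction_index(r_pref, r_null):
    """Direction index, assuming the form DI = 1 - Rn/Rp."""
    return 1.0 - r_null / r_pref

def is_direction_selective(r_pref, r_null, criterion=0.5):
    """Apply the inclusion criterion stated in the text (DI > 0.5)."""
    return direction_index(r_pref, r_null) > criterion
```

With this form of DI, a neuron that is silent for motion opposite to the preferred direction has DI = 1, and a neuron that is suppressed below baseline has DI > 1.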
Figure 2 shows the responses of two representative V1 neurons to gratings drifting in the preferred direction at a range of spatial and temporal frequencies. Figure 2, A, B, E, and F, shows responses as a function of the speed of the grating (as in Fig. 1A,C), in each case for three different spatial frequencies at two different contrasts. For the simple cell whose data are shown in Figure 2, A and B, speed tuning shifted substantially as a function of spatial frequency: preferred speed shifted from 40 to 8°/s as spatial frequency increased from 0.5 cycle/° (circles) to 2 cycles/° (squares). For the complex cell of Figure 2, E and F, preferred speed did not shift appreciably as spatial frequency changed. The effect of spatial frequency on speed tuning was slightly different at low and high contrasts, a point to which we return below.
To quantify the degree to which preferred speed depended on the spatial frequency of the stimulus, we fitted the full set of spatiotemporal responses of each neuron (represented by the contour plots in Fig. 2C,D,G,H) with Equations 3 and 4. The fits generated the continuous curves in Figure 2, A, B, E, and F. In our fitting procedure, ξ provides an index of the relationship between preferred speed and spatial frequency. When ξ is 0, the preferred temporal frequency is independent of spatial frequency, as in the model response of Figure 1A, and preferred speed changes with spatial frequency. When ξ is 1, the preferred temporal frequency is proportional to spatial frequency, and preferred speed is constant across spatial frequency, as in the model response of Figure 1C. For the rest of the paper, we will use the value of ξ obtained from Equations 3 and 4, referring to it as the “speed-tuning index.” Note that ξ is equal to Q + 1 from our previous analysis of MT neurons (Priebe et al., 2003).
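The two limiting values of ξ can be checked numerically with a short sketch. It assumes the power law is anchored at the preferred spatial frequency sf0, where the preferred temporal frequency takes a value called tf0 here (a name introduced for illustration).

```python
def preferred_tf(sf, sf0, tf0, xi):
    """Preferred temporal frequency (Hz) under a power law with exponent xi."""
    return tf0 * (sf / sf0) ** xi

def preferred_speed(sf, sf0, tf0, xi):
    """Preferred speed (deg/s) = preferred temporal / spatial frequency."""
    return preferred_tf(sf, sf0, tf0, xi) / sf

# Separable tuning (xi = 0): preferred speed falls as 1/sf.
separable = [preferred_speed(sf, 1.0, 8.0, 0.0) for sf in (0.5, 1.0, 2.0)]

# Speed tuning (xi = 1): preferred speed is the same at every sf.
speed_tuned = [preferred_speed(sf, 1.0, 8.0, 1.0) for sf in (0.5, 1.0, 2.0)]
```

In general the preferred speed varies as sf raised to the power ξ − 1, so the exponent of −1 quoted for separable tuning corresponds to ξ = 0.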
For gratings of high contrast (32%), the distribution of speed-tuning index across our sample of direction-selective simple cells (Fig. 3A, top histogram) is centered near 0, indicating that these neurons have approximately separable tuning for spatial and temporal frequency, as found in cat by Tolhurst and Movshon (1975). For the direction-selective complex cells in V1, the speed-tuning index for gratings of high contrast showed a continuous distribution that ranged from separable to speed tuned (Fig. 3A, middle histogram). The mean values of ξ across our samples of simple and complex cells were 0.08 and 0.44 and were significantly different from each other (t test, p < 0.01). The mean value of ξ for complex cells in V1 was not significantly different from the mean of 0.48 for MT neurons [Fig. 3A, bottom histogram (data from Priebe et al., 2003)] (t test, p = 0.32).
In 16 of the 22 simple cells, ξ had 95% confidence intervals that overlapped 0, whereas the confidence intervals overlapped 1 in only one simple cell. Thus, the hypothesis of separable tuning for temporal and spatial frequency could not be rejected for most simple cells, whereas the speed-tuning hypothesis was supported for only one. A few simple cells had values of ξ that were statistically different from both 0 and 1, indicating behavior intermediate between perfectly separable tuning and perfect speed tuning. In 8 of the 33 complex cells, ξ had 95% confidence intervals that overlapped 0 (indicating separable tuning), whereas the confidence intervals overlapped 1 (indicating speed tuned responses) in 9 of 33 complex cells. Almost half of the complex cells (n = 16) had values of ξ that were statistically different from both 0 and 1, indicating behavior intermediate between perfectly separable tuning and perfect speed tuning.
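The sorting rule used in this paragraph can be written compactly. The sketch below is an illustration with hypothetical names; the handling of an interval wide enough to contain both 0 and 1 is an assumption, because the text does not mention that case.

```python
def classify_tuning(ci_low, ci_high):
    """Sort a neuron by where the 95% confidence interval on xi falls."""
    includes_zero = ci_low <= 0.0 <= ci_high
    includes_one = ci_low <= 1.0 <= ci_high
    if includes_zero and includes_one:
        return "indeterminate"  # assumption: case not discussed in the text
    if includes_zero:
        return "separable"
    if includes_one:
        return "speed tuned"
    return "intermediate"
```

For example, an interval of (0.3, 0.7) excludes both 0 and 1 and is therefore labeled intermediate.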
For each neuron, we also checked the relationship between preferred speed and spatial frequency by fitting curves relating response to stimulus speed for each individual spatial frequency using Equation 2. We then made a separate estimate of ξ by fitting a power function to the relationship between preferred temporal frequency and spatial frequency. Across our sample of V1 neurons, we found good agreement between the estimates of ξ obtained by fitting Equations 3 and 4 to the full dataset together and those obtained by fitting speed-tuning curves for each individual spatial frequency (r = 0.92). We also obtained very similar values of ξ when our equations lacked the skew parameter (r = 0.88), although the fits using skewed temporal frequency functions described more of the variance in the responses of V1 neurons. Finally, we obtained very similar values of ξ when we measured the response of simple cells according to their mean firing rate or the modulation of firing rate.
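The second, per-frequency estimate of ξ amounts to fitting a power function to a handful of (spatial frequency, preferred temporal frequency) pairs. A minimal sketch follows, assuming an ordinary least-squares line in log-log coordinates stands in for whatever fitting routine was actually used; the function name is hypothetical.

```python
import math

def fit_power_law_exponent(spatial_freqs, preferred_tfs):
    """Slope of log2(preferred tf) against log2(sf), an estimate of xi."""
    xs = [math.log2(sf) for sf in spatial_freqs]
    ys = [math.log2(tf) for tf in preferred_tfs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic check: preferred tf grows as sf ** 0.5, so the slope is 0.5.
sfs = [0.5, 1.0, 2.0, 4.0]
xi_hat = fit_power_law_exponent(sfs, [4.0 * sf ** 0.5 for sf in sfs])
```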
Effect of contrast on spatiotemporal tuning of V1 neurons
Reducing the contrast of the gratings from 32 to 8% affected both the response amplitudes and the spatiotemporal tuning of V1 neurons, as reported in previous studies of V1 (Albrecht, 1995; Carandini et al., 1997) and MT (Priebe et al., 2003). In our sample of V1 neurons, the peak responses for low-contrast gratings were reduced to 42% of those for high-contrast gratings. Plotting the speed-tuning index at low and high contrast for V1 neurons shows that reducing contrast lowered the value of ξ in 11 of 17 simple cells (Fig. 3B, top graph, gray symbols) and 18 of 23 complex cells (black symbols). The effect of contrast on the value of ξ was individually significant in only 57% of the complex cells, but the mean value of ξ underwent a statistically significant reduction from 0.42 for high-contrast gratings to 0.10 for low-contrast gratings (paired t test, p < 0.01). The effect of contrast on the value of ξ for simple cells was not statistically significant (mean changed from 0.05 to −0.03; t test, p = 0.28). Reductions in contrast had the same effect on V1 complex cells (Fig. 3B, top graph, black symbols) as in MT neurons [Fig. 3B, bottom graph (data from Priebe et al., 2003)]. Thus, the responses of both V1 complex cells and MT neurons are well described by separable tuning when grating contrast is low and shift toward speed tuning when grating contrast is high.
Our analysis has emphasized the effect of contrast on the interaction of sensitivities to spatial and temporal frequency. When we analyzed our data along either of these axes alone, the effects of contrast on preferred spatial and temporal frequency agreed with data from previous studies (Holub and Morton-Gibson, 1981; Albrecht, 1995; Carandini et al., 1997). At the preferred temporal frequency, reducing contrast caused the preferred spatial frequency of V1 neurons to decrease by an average of 12%. At the preferred spatial frequency, reducing contrast caused the preferred temporal frequency to decrease by an average of 14%. When the spatial frequency of the stimulus was one octave above the preferred spatial frequency, reducing contrast caused the preferred temporal frequency to decrease by an average of 23% across our sample of 25 neurons.
Contrast gain in different quadrants of spatiotemporal tuning surfaces
We showed in the previous sections that changing the contrast of moving gratings changes the shape of spatiotemporal response surfaces of V1 neurons. One can imagine how this might occur if the contrast response function were different for different combinations of spatial and temporal frequency. For example, a separable spatiotemporal response field at low contrast would become a speed-tuned field at high contrast if the contrast response function were steepest in the northeast and southwest corners of the field, for combinations of high spatial and temporal frequencies and low spatial and temporal frequencies. The same result would occur if the contrast response function were less steep, or saturated at lower contrasts, for high–low and low–high combinations of spatial and temporal frequency (in the northwest and southeast corners of the response field). In Figures 4 and 5, we evaluate this possibility and ask which of the possible changes in contrast gain contribute most to the increase in speed selectivity we observe at high contrast.
For the three example complex cells in Figure 4, we created spatiotemporal response surfaces for stimuli of high contrast (top row) and low contrast (middle row). We then estimated contrast “gain” by plotting the ratio of the contour plots for each neuron, yielding the graphs in the bottom row of Figure 4. If the shape of a spatiotemporal tuning surface were invariant across contrast (i.e., the contrast response function were uniform across the surface), then the ratio contour plots in the bottom row of Figure 4 would be uniform. If the effect of contrast depended only on temporal frequency, as in the retina and V1 simple cells (Carandini et al., 1997), then the ratio contour plots would show horizontal bands. In fact, the ratio contour plots are as we outlined in the previous paragraph. In all three neurons, the largest effects of contrast (black and darker gray zones) occurred in the northeast and southwest portions of the contour plots, in which spatial and temporal frequency were either both high or both low. For two of the three, the strongest effect was in the northeast, in which spatial and temporal frequency were high. The region near the preferred spatial and temporal frequencies of the neurons, in the middle of the plots, always showed the smallest effect of contrast (white and light gray zones), perhaps because response saturation limited the effect of contrast on the responses to gratings near the preferred spatial and temporal frequencies.
We quantified the contrast gain for all 22 complex cells in our sample. First, we found the preferred spatial and temporal frequency for low-contrast sine-wave gratings and divided the response field into four quadrants centered on the optimal spatiotemporal frequency, as shown by the horizontal and vertical dashed lines in the ratio contour plots of Figure 4. Then, we computed the mean contrast gain in each quadrant, normalized the values by the mean contrast gain across the entire response field, and summarized the distributions across our full sample of complex cells in the histograms of Figure 5. In the northeast and southwest quadrants, representing high–high and low–low combinations of spatial and temporal frequency, the normalized contrast gain tended to be greater than 1: the geometric means (arrows) were 1.47 and 1.08. In the southeast and northwest quadrants, representing high–low and low–high combinations of spatial and temporal frequency, the normalized contrast gain tended to be somewhat less than 1: the geometric means were 0.71 and 0.88.
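The quadrant analysis can be sketched as follows. Several details are assumptions made for illustration: contrast gain is taken as the ratio of the high-contrast to the low-contrast response at each grid point, grid points that fall on the dividing lines are left out of the quadrants, and responses are stored in dictionaries keyed by (spatial frequency, temporal frequency). The function names are hypothetical.

```python
import math

def quadrant_gains(high, low, sf_pref, tf_pref):
    """Mean normalized contrast gain in each quadrant of the response field.

    high, low: dicts mapping (sf, tf) -> response at high / low contrast.
    sf_pref, tf_pref: preferred frequencies measured at low contrast.
    """
    gains = {k: high[k] / low[k] for k in high if low.get(k, 0.0) > 0.0}
    overall = sum(gains.values()) / len(gains)  # mean over the whole field
    quads = {"NE": [], "NW": [], "SE": [], "SW": []}
    for (sf, tf), gain in gains.items():
        if sf == sf_pref or tf == tf_pref:
            continue  # assumption: points on the dividing lines are skipped
        key = ("N" if tf > tf_pref else "S") + ("E" if sf > sf_pref else "W")
        quads[key].append(gain / overall)
    return {k: sum(v) / len(v) for k, v in quads.items() if v}

def geometric_mean(values):
    """Geometric mean, as used to summarize gains across neurons."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Toy 3 x 3 field in which contrast has its largest effect in the NE corner.
grid = [(sf, tf) for sf in (1.0, 2.0, 4.0) for tf in (1.0, 2.0, 4.0)]
low = {k: 1.0 for k in grid}
high = dict(low)
high[(4.0, 4.0)] = 4.0
high[(1.0, 1.0)] = 2.0
result = quadrant_gains(high, low, sf_pref=2.0, tf_pref=2.0)
```

In the toy field the northeast and southwest quadrants come out above 1 and the other two below 1, the same qualitative pattern reported for the population.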
We performed this analysis only for complex cells because the degree of tilt of the spatiotemporal response surfaces, and therefore the speed tuning, of simple cells was not sensitive to the contrast of the grating.
Some spatiotemporal variation of contrast gain in V1 complex cells is certain given that increases in contrast changed the spatiotemporal responses from separable toward speed tuned. Figures 4 and 5 show that the most important component of this effect is enhanced contrast gain for stimuli whose spatial and temporal frequencies are higher than optimal. We have chosen to describe the effect this way because it suggests a plausible mechanism for achieving speed-tuned responses at high contrast, a point we will return to in Discussion.
Linear combination of responses to multiple spatial frequencies
In the preceding sections, we reported good agreement between the responses of V1 complex cells and MT neurons to single sine-wave gratings of low and high contrast. In our previous study of MT neurons (Priebe et al., 2003), we also found evidence for a nonlinearity in combining responses across spatial frequencies. We now report the results of two experiments that explore the responses of neurons to stimuli of more than one spatial frequency, to ask whether the nonlinearity found in MT also is present in the responses of V1 complex cells.
In the first set of experiments, we constructed stimuli consisting of two gratings of different spatial frequencies. The two gratings moved at the same speed, and different stimuli presented different speeds that varied over the full range that elicited responses from the neuron under study. Recall that, in many V1 complex cells, preferred speed depends on spatial frequency and is larger or smaller when tested with single sine-wave gratings of spatial frequencies below or above the preferred spatial frequency. For each neuron, we selected two pairs of spatial frequencies that would, had they interacted linearly, yield quite different preferred speeds. Choice of spatial frequency pairs represented a compromise. They had to be within the part of the spatiotemporal response field that evoked reasonable sized responses, but, at the same time, the two pairs had to be far enough above or below the preferred spatial frequency so that the preferred speeds differed as much as possible.
The spatiotemporal surface in Figure 6A shows the responses of a complex cell whose value of ξ was 0.23 when grating contrast was 32%. As a result, the preferred speed depended strongly on the spatial frequency of the stimulus. The preferred speeds were ∼5 and 3°/s when the spatial frequency of a single grating was 0.75 and 1.5 cycles/° (Fig. 6B, top graph); preferred speeds were 2 and 0.7°/s when spatial frequency was 3 and 6 cycles/° (Fig. 6C, top graph). Thus, a linear interaction predicts that dual-grating stimuli composed of 0.75 and 1.5 cycles/° gratings should have a preferred speed between 3 and 5°/s, whereas stimuli composed of 3 and 6 cycles/° gratings should have a lower preferred speed between 0.7 and 2°/s.
The prediction of the linear interaction is formalized by the filled symbols in the bottom graphs of Figure 6, B and C, which were obtained by summing the responses to each of the component gratings singly for each speed of each dual-grating stimulus. Comparison of these predictions with the data shows that the speed-tuning curve of the example V1 complex cell for dual-grating stimuli (open symbols) had approximately the shape predicted by summing the responses to each grating singly (filled symbols), with the same preferred speeds and peak responses of nearly the same amplitude. The same result was obtained across a sample of 16 V1 complex cells, as illustrated in the scatter plot of Figure 6D, in which points relating the actual to predicted preferred speed plotted very close to the unity line; each cell contributes two points to this plot, one for frequency combinations below optimum spatial frequency (filled symbols) and one for combinations above (open symbols).
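The linear prediction described here can be sketched in a few lines. The response values are made up for illustration, the function names are hypothetical, and the peak of the sampled curve is used as a stand-in for the preferred speed, which in the analysis itself comes from a fitted tuning function.

```python
def linear_prediction(resp_a, resp_b):
    """Predicted dual-grating response: sum of the single-grating responses
    measured at the same speeds."""
    return [a + b for a, b in zip(resp_a, resp_b)]

def peak_speed(speeds, responses):
    """Speed at which the sampled response is largest."""
    best = max(range(len(responses)), key=lambda i: responses[i])
    return speeds[best]

speeds = [1.0, 2.0, 4.0, 8.0]       # deg/s, shared by both components
low_sf = [1.0, 4.0, 12.0, 3.0]      # responses to the lower-sf grating alone
high_sf = [2.0, 10.0, 6.0, 1.0]     # responses to the higher-sf grating alone
predicted = linear_prediction(low_sf, high_sf)
predicted_peak = peak_speed(speeds, predicted)
```

The predicted peak falls within the range spanned by the peaks of the two component curves, which is the pattern the dual-grating data are tested against.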
In area MT, Priebe et al. (2003) reported a different result for the same experiment and analysis. As found in V1, the actual preferred speeds for dual-grating stimuli agreed well with the prediction based on linear summation when the two spatial frequencies in the dual-grating stimulus were higher than the preferred spatial frequency of the neuron (Fig. 6E, open symbols). In MT but not V1, however, there was consistent disagreement between actual preferred speeds and those predicted by linear summation of the responses to the component gratings when the two spatial frequencies in the dual-grating stimulus were lower than the preferred spatial frequency of the neuron (Fig. 6E, filled symbols): for most MT neurons, the actual preferred speed for the dual-grating stimulus was lower than that predicted by summing the responses to the two gratings presented singly. As explained in our previous publication (Priebe et al., 2003), this shift caused the preferred speeds for dual-grating stimuli to depend less strongly on the spatial frequency content of the stimuli than did those for single-grating stimuli.
In the second set of experiments, we compared the speed-tuning functions of V1 complex cells for moving random dot stimuli, which comprise multiple spatial frequencies, with those predicted by combining the responses to sine-wave gratings of different spatial and temporal frequencies. In MT neurons, a narrowing of the speed-tuning function for random dot stimuli provided additional evidence for a nonlinear combination of responses to different spatial frequencies (Priebe et al., 2003). Our finding in V1 of a linear interaction between spatial frequencies in the dual-grating experiments suggests that the width of the speed-tuning function for moving random dot stimuli should agree with that predicted by addition of the responses to single sine-wave gratings.
To estimate the preferred speed and tuning width for sine-wave gratings, we summed the responses to high-contrast gratings along each iso-speed line in the full spatiotemporal tuning surface, yielding a curve that relates predicted response to stimulus speed (Fig. 7A, gray symbols and curve). To measure the preferred speed and tuning width for dot textures, we measured the mean sustained firing rate for speeds from 1 to 128°/s in the preferred direction of the neuron under study (Fig. 7A, black symbols and curve). Both speed-tuning functions were fitted with Equation 2 to quantify preferred speed and tuning width.
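As a rough sketch of the iso-speed summation, the following assumes that the spatiotemporal surface was sampled on a grid with octave spacing, so that every combination of spatial and temporal frequency falls exactly on one iso-speed line (speed = temporal frequency / spatial frequency). The function name `iso_speed_sum`, the grid, and the response values are all hypothetical; with other sampling schemes the responses would have to be binned by speed instead.

```python
from collections import defaultdict


def iso_speed_sum(surface, sfs, tfs):
    """Sum responses along iso-speed lines of a spatiotemporal surface.

    surface[i][j] is the response to a grating of spatial frequency
    sfs[i] (cycles/deg) and temporal frequency tfs[j] (Hz).
    Returns a dict mapping speed (deg/s) to summed response.
    """
    totals = defaultdict(float)
    for i, sf in enumerate(sfs):
        for j, tf in enumerate(tfs):
            totals[tf / sf] += surface[i][j]
    return dict(sorted(totals.items()))


# Hypothetical 3 x 3 surface on an octave-spaced grid (made-up numbers).
sfs = [0.5, 1.0, 2.0]
tfs = [1.0, 2.0, 4.0]
surface = [
    [1.0, 2.0, 3.0],  # sf = 0.5
    [4.0, 5.0, 6.0],  # sf = 1.0
    [7.0, 8.0, 9.0],  # sf = 2.0
]
curve = iso_speed_sum(surface, sfs, tfs)
for speed, total in curve.items():
    print(speed, total)
```

Note that iso-speed lines near the middle of the grid contain more samples than those at the extremes, which is why summing and averaging along the lines can yield different predicted curves.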
In V1 direction-selective neurons, the tuning bandwidth derived from the spatiotemporal response surfaces agreed well with that for moving dot textures: the majority of neurons plot close to the unity line (16 of 20), although a few outliers (4 of 20) plot well below the line (Fig. 7B). For the MT neurons recorded by Priebe et al. (2003), the same analysis revealed that the tuning widths predicted from the responses to sine-wave gratings were consistently different from those obtained with random dot stimuli. The majority of MT neurons plot below the unity line (Fig. 7C), indicating that the tuning width for dot textures was consistently narrower than predicted from the responses to sine-wave gratings. Note that the data plotted in Figure 7C differ in detail from those in our previous publication on MT (Priebe et al., 2003). In the previous paper, we predicted the preferred speed by averaging (rather than summing) responses along the iso-speed lines in the spatiotemporal response surface, obtaining the same average result: compared with the actual tuning, the predicted tuning of MT neurons averaged 1.40 versus 1.38 times broader for the summation and averaging analyses, respectively. Thus, the presence of multiple spatial frequencies in the dot textures tended to reduce the width of the speed-tuning function in most MT neurons, but only infrequently in V1 complex cells. In the foregoing analysis, we attempted to predict the responses to dot textures from the responses to sine-wave gratings of 32% contrast. We obtained the same results for 12 neurons on which we had enough data to predict the responses to moving dot textures from the responses to gratings of 8% contrast.
The foregoing comparison of predicted and actual tuning functions depends on the assumption that neural firing would reflect a linear addition of different inputs to a given neuron, an assumption that is surely contradicted by the nonlinear contrast response functions of visual neurons. However, the nonlinearity should cause errors primarily in the predicted amplitude of responses, not in bandwidth or speed tuning. Therefore, we think that we have made valid comparisons between actual and predicted tuning parameters, and we have chosen not to make potentially suspect comparisons between predicted and actual response amplitudes. In addition, we have been careful to use the same stimuli and apply the same analysis procedures to data from V1 and MT, enabling a fair comparison of the two areas even if the nonlinear contrast response functions, or any other minor issues, introduce small errors into the comparisons.
Comparison of tuning parameters in V1 and MT
The data accumulated in this and our previous paper (Priebe et al., 2003) allow us to make a direct comparison of the responses of neurons in V1 and MT to the same set of stimuli, namely sine-wave gratings of 32% contrast. As shown in Figure 8, the preferred speeds were similar in the present set of V1 simple and complex cells (Fig. 8A, open and filled bars) and in our previous set of MT neurons (Fig. 8B). The mean preferred speed found in MT (7.52°/s, geometric mean) is faster than that found in V1 (4.47°/s, geometric mean; unpaired t test, p < 0.02), but the ranges of preferred speeds found in the two areas largely overlap (V1, 0.3–43°/s; MT, 0.4–80°/s). The distributions of spatial frequency bandwidth also overlapped, but spatial frequency bandwidth was narrower in V1 than in MT (2.19 and 2.49 octaves, respectively; unpaired t test, p < 0.01). The distributions of temporal resolution in V1 and MT were not significantly different (p = 0.12). Finally, the tuning bandwidths of our sample of V1 complex cells (1.2–4.0) agreed with those reported by Movshon and Newsome (1996) for complex cells in V1 that were identified as projecting to area MT.
Discussion
The representation of speed at different levels of visual processing
Primates can reliably sense object speed (Gegenfurtner et al., 2003). However, peripheral visual neurons are not explicitly tuned for speed, and their sensitivity to sine-wave gratings is said to be separable in the sense that it depends on the product of two functions, one of spatial and one of temporal frequency. Neurons with separable tuning will respond selectively to certain ranges of speed, but they are not speed tuned in that they do not respond invariantly to the speed of moving stimuli (Lennie and Movshon, 2005). Instead, their speed tuning depends on the spatial structure of the moving stimulus. This raises the question of how target speed can be reflected accurately in perceptual and motor behavior. Are visual signals transformed into veridical representations of speed as they proceed through the levels of visual motion processing? If so, how? The present paper allows us to approach these questions by completing a description of the spatiotemporal tuning surfaces across the levels of visual processing in the geniculostriate pathways leading up to area MT.
Figure 9 summarizes our current knowledge of spatiotemporal tuning in the pathways from the retina to MT. Photoreceptors are broadly but separably tuned for spatial and temporal frequency (Fig. 9A). In retinal ganglion cells and LGN neurons, the spatiotemporal frequency surface takes a nonseparable but anti-speed-tuned form (Fig. 9B) (Enroth-Cugell et al., 1983; Hicks et al., 1983; Derrington and Lennie, 1984; Frishman et al., 1987). Here, we have shown that spatiotemporal responses are separable in direction-selective V1 simple cells (compare Figs. 9C, 2A–D), without any sign of special selectivity for speed (Foster et al., 1985; Baker, 1990; McLean and Palmer, 1994; Hawken et al., 1996). In direction-selective V1 complex cells, however, spatiotemporal tuning surfaces are usually “tilted” to the right, and a sizeable minority of cells is speed tuned in the sense that they have (statistically) the same preferred speed for gratings of all spatial frequencies that evoke responses. Figure 9D diagrams the tuning surface for an imaginary MT neuron or V1 complex cell that lies halfway between perfect speed-tuning and perfectly separable responses.
Comparison of the data in the present paper with the data of Priebe et al. (2003) reveals that the spatiotemporal tuning surfaces of MT cells show the same degree and distribution of tilt as V1 complex cells, with a similar-sized minority tuned for speed. Thus, the blend of speed-tuned versus separable responses in MT (and V2) (compare with Levitt et al., 1994) could be inherited from complex cells in V1. Although we have not established that the complex cells recorded in our study project to MT directly, it is known that MT neurons receive direct input from direction-selective complex cells from V1 (Movshon and Newsome, 1996). We also analyzed the spatial and temporal frequency tuning of a few V1 neurons identified by Movshon and Newsome as MT projection neurons and found a similar degree of spatial and temporal frequency dependence as in our database of neurons (supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
Possible mechanisms for transforming spatiotemporal tuning from retina to MT
Knowing how spatial frequency, temporal frequency, and speed are represented at each level of the visual motion processing pathways (Fig. 9) is an essential first step that allows us to form hypotheses about how signals are transformed at each stage as signals pass from the retina to MT (De Valois and De Valois, 1990). For example, we can think of the transformations between the photoreceptors and LGN neurons primarily as a series of spatial filters, in which the conversion to center-surround spatial organization attenuates responses to low spatial frequencies (Enroth-Cugell and Robson, 1966; Enroth-Cugell et al., 1983; Shapley and Lennie, 1985). Additional high-pass filtering of the outputs of the LGN attenuates responses at low spatial frequencies still further to create separable bandpass tuning in V1 simple cells at the same time as direction-selectivity emerges (Derrington and Lennie, 1984; Hawken et al., 1996; O’Keefe et al., 1998). Separable spatiotemporal tuning is also evident in V1 complex cells at low contrast, but, at high contrast, the tuning surface tilts to approximate speed tuning. MT cells behave similarly (Priebe et al., 2003).
The effect of contrast on the spatiotemporal tuning surfaces suggests a mechanism for transforming the separable responses of V1 simple cells into the speed-tuned responses of V1 complex cells and MT neurons: spatiotemporal frequency-selective modulation of contrast gain. Contrast gain is highest in the northeast corner of the spatiotemporal response space of V1 complex cells, and responses increase more steeply as a function of contrast in this region than in the rest of the space. To a lesser degree, the same trend is evident in the southwest corner. Larger increases in response amplitude in the northeast and southwest corners cause the spatiotemporal response field to tilt at high contrast, as we observed in many direction-selective V1 complex cells. The effect would also be enhanced if contrast gain were reduced in the northwest and southeast corners of the spatiotemporal response field. Our data suggest this also, but the most potent effect is seen in the northeast, for stimuli of high spatial and temporal frequency.
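The following toy model illustrates how a frequency-dependent contrast gain could tilt an otherwise separable surface. It is a sketch under our own assumptions, not the model fitted to the data: the functional form, the parameters `k` and `sigma`, and the function names are all hypothetical. Frequencies are expressed as log2 offsets `x` (spatial) and `y` (temporal) from the cell's preferred values, so the northeast and southwest corners are the regions where `x * y > 0`. The gain term grows more steeply with contrast in those corners and less steeply in the northwest and southeast corners.

```python
import math


def response(x, y, contrast, k=0.8, sigma=1.0):
    """Toy response (an assumption for illustration only).

    x, y: log2 offsets of spatial and temporal frequency from the
    preferred values. The separable term is a product of Gaussians;
    the gain term rises with contrast faster where x * y > 0.
    """
    separable = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
    gain = contrast * math.exp(k * contrast * x * y)
    return separable * gain


def preferred_tf_offset(x, contrast, step=0.01, span=3.0):
    """Log2 temporal frequency offset giving the largest response at x."""
    n = int(round(span / step))
    ys = [i * step for i in range(-n, n + 1)]
    return max(ys, key=lambda y: response(x, y, contrast))


def tilt(contrast):
    """Slope of preferred log2 temporal frequency against log2 spatial
    frequency: 0 for a separable surface, 1 for perfect speed tuning."""
    high = preferred_tf_offset(1.0, contrast)
    low = preferred_tf_offset(-1.0, contrast)
    return (high - low) / 2.0


for c in (0.1, 0.5, 1.0):
    print(c, round(tilt(c), 2))
```

In this toy form the tilt is approximately `k * contrast * sigma ** 2`, so the surface is nearly separable at low contrast and approaches speed tuning as contrast rises, which is qualitatively the pattern described in the text.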
Differences in visual motion processing in V1 and MT
V1 complex cells and MT cells represent speed similarly when the stimuli are single sine-wave gratings. We conclude, therefore, that MT neurons inherit their spatiotemporal response properties from earlier levels of neural processing, almost certainly from V1. The inheritance could be direct or indirect: there are some inputs to MT from the LGN (Sincich et al., 2004), but it seems implausible that a major transformation in MT would reconstruct direction and speed selectivity from the nonselective LGN inputs. Furthermore, almost all of the inputs to MT arise from either V1 or other nonstriate areas that themselves receive abundant inputs from V1 (Felleman and Van Essen, 1991). Although the issue is not settled, recent evidence suggests that the inputs from other extrastriate areas either are relatively unimportant for the visual response of MT or depend themselves on input from V1 (Collins et al., 2003). However, our conclusion that MT inherits its spatiotemporal responses from V1 leaves open a key question: if speed tuning per se arises at an earlier level of the visual system, what additional processing is done beyond V1, in either MT or other visual areas that provide inputs to MT?
Natural stimuli contain multiple spatial frequencies, and we therefore explored how V1 and MT cells process stimuli containing multiple frequencies. Two observations suggest that a nonlinear interaction among different spatial frequencies occurs after V1, possibly in MT. (1) When the stimulus comprised two gratings that were either both below or both above the optimal spatial frequency of the neuron, the preferred speeds of MT neurons were much more similar than predicted from simple summation of the responses to the individual gratings (Priebe et al., 2003). Because the same nonlinearity was not evident in the responses of V1 complex cells, we suggest that this effect reflects processing in MT. (2) When the stimulus was a moving texture of random dots, which contains many spatial frequencies, the tuning bandwidths of MT neurons were consistently narrower than predicted by summing the responses to the relevant sine-wave gratings. Again, the same nonlinear combination of responses across spatial frequencies was not evident in the responses of V1 complex cells, in which the tuning bandwidth for dots was predicted well by summing the responses across spatial frequencies. We conclude that the tilted spatiotemporal response surfaces in MT are enhanced by a nonlinear combination of inputs sensitive to the conjoint presence of multiple spatial components of a common speed. Because V1 provides the largest input to MT, we think the nonlinear processing occurs in MT, although it could occur in numerous nonprimary visual areas that provide inputs to MT.
The nonlinear combination of inputs reflected in the responses of MT neurons suggests a rationale for creating tilted spatiotemporal receptive fields at all. A population of cells with separable tuning functions can represent speed, but, if each of those cells computes a selective combination of the inputs that represent the components of a moving object, then the resulting population response will be more accurate and more robust to noise in the visual environment.
In summary, many aspects of motion processing are evident in the responses of V1 neurons, but not all MT neuron response properties can be inherited from V1. In particular, MT neurons have much larger receptive fields than those found in V1, suggesting that MT neurons integrate the responses of many V1 neurons with different spatial receptive fields. Perhaps in association with this spatial integration, processing within MT creates nonlinear interactions between responses to different combinations of spatial and temporal frequencies. These interactions may allow MT to refine the representation of motion for real-world objects to create outputs that are specialized for the needs of downstream motor and perceptual systems.
Footnotes
- This work was supported by the Howard Hughes Medical Institute and National Institutes of Health Grants EY02017 (J.A.M.), EY03878 (S.G.L.), and EY014499 (N.J.P.). We are grateful to Wyeth Bair, Carlos Cassanello, Anne Churchland, Mark Churchland, Adam Kohn, Leslie Osborne, Nicole Rust, and David Schoppik for their help in data collection. Karen MacLeod and Elizabeth Montgomery provided assistance with animal preparation and maintenance. Scott Ruffner wrote the stimulus presentation software used in the experiments at University of California, San Francisco. We thank Jessica Hanover for helpful discussions and comments.
- Correspondence should be addressed to Nicholas Priebe at his present address: Department of Neurobiology and Physiology, Northwestern University, 2145 North Sheridan Drive, Evanston, IL 60208. Email: nico{at}northwestern.edu