The Journal of Neuroscience, January 18, 2006, 26(3):893-907; doi:10.1523/JNEUROSCI.3226-05.2006
Behavioral/Systems/Cognitive
Spatiotemporal Structure of Nonlinear Subunits in Macaque Visual Cortex
Christopher C. Pack,1
Bevil R. Conway,2
Richard T. Born,2 and
Margaret S. Livingstone2
1Montreal Neurological Institute, McGill University School of Medicine, Montreal, Quebec, Canada, H3A 2B4, and 2Department of Neurobiology, Harvard Medical School, Boston, Massachusetts 02115
Abstract
The primate visual system is arranged hierarchically, starting from the retina and continuing through a series of extrastriate visual areas. Selectivity for motion is first found in individual neurons in the primary visual cortex (V1), in which many simple cells respond selectively to the direction and speed of moving stimuli. Beyond simple cells, most studies of direction selectivity have focused on either V1 complex cells or neurons in the middle temporal area (MT/V5). To understand how visual information is transferred along this pathway, we have studied all three types of neurons, using a reverse correlation procedure to obtain high spatial and temporal resolution maps of activity for different motion stimuli. Most complex and MT cells showed strong second-order interactions, indicating that they were tuned for particular displacements of an apparent motion stimulus. The spatiotemporal structure of these interactions showed a high degree of similarity between the populations of V1 complex cells and MT cells, in terms of the spatiotemporal limits and preferences for motion and their two-dimensional spatial structure. Much of the structure in the V1 and MT second-order kernels could be accounted for on the basis of the first-order responses of V1 simple cells, under the assumption of a Reichardt or motion-energy type of computation.
Key words: cortex; MT; striate cortex; vision; visual; computation
Introduction
Neurons in the primary visual cortex (V1) encode many aspects of the visual input, including shape, color, depth, motion, and spatial position. In contrast, neurons in the many extrastriate areas are less tuned for stimulus position and more tuned for other visual features. In particular, neurons in the middle temporal area (MT or V5) are tuned for the direction of stimulus motion, primarily as a consequence of a strong projection from direction-selective V1 neurons (Maunsell and van Essen, 1983; Movshon and Newsome, 1996). As a result, comparisons of motion selectivity between V1 and MT are useful for understanding how the visual cortex, and perhaps the cortex as a whole, processes information.
Neurons in MT have receptive field diameters that are approximately 10 times those found in V1 (Gattass and Gross, 1981), raising the possibility that MT neurons can measure motion over a greater spatial range than V1 neurons (Mikami et al., 1986). Alternatively, MT neurons may simply sum the outputs of V1 neurons that have different receptive field locations but common direction preferences. In this case, the spatial aspects of motion selectivity would be determined by V1, and one would not necessarily expect to find substantial differences between the two areas.
One general approach to understanding the processing of visual information is to measure the input-output relationships of individual neurons. For any given stimulus, the input corresponds to the time-varying distribution of luminances reaching the retina, and the output corresponds to the spiking activity of the neuron under study. The luminance distribution comprises the first-order statistics of the input: it can be captured by a composite of individual measurements made at specific points in space and time. In contrast, motion is a second-order feature of the input: it requires a conjunction of measurements at two points separated by some distance in space and time. The ratio of the separations in two dimensions of space and one dimension of time corresponds to the stimulus velocity, and such a second-order measurement is a common feature of motion-processing models (Adelson and Bergen, 1985; van Santen and Sperling, 1985).
In this paper, we report measurements of the second-order kernels of MT neurons and V1 complex cells and compare them with first-order maps from V1 simple cells. The kernels capture the input-output relationship between motion stimuli and spiking activity, for a range of stimulus velocities. Quantification of the kernels suggests a high degree of similarity between the processing of motion in V1 and MT, in terms of both the preferences and limits of spatiotemporal selectivity. In MT, the same basic kernel structure is repeated across individual spatial receptive fields, in a manner that is predictable from a simple summation of V1 complex cell activity. Moreover, many of the spatiotemporal aspects of the observed second-order responses can be predicted from the first-order properties of V1 simple cells, as has been shown previously for V1 complex cells (Rybicki et al., 1972; Movshon et al., 1978b; Baker and Cynader, 1986; Emerson et al., 1987; Livingstone and Conway, 2003).
Materials and Methods
Experimental procedures
Electrophysiology. Monkeys were prepared for chronic recording from V1 and MT, as described previously (Livingstone, 1998; Born et al., 2000). All procedures were approved by the Harvard Medical Area Standing Committee on Animals.
We recorded from single units in V1 and MT of five alert rhesus macaque monkeys while they performed a simple fixation task. The monkeys fixated a small spot and were rewarded for keeping their gaze within a fixation window that was 2° in diameter. Each single unit was isolated by spike height and waveform.
Visual stimuli. Neurons were screened with a bar stimulus. The bar luminance was 39 cd/m2 on a gray background (20 cd/m2). Once a neuron was isolated properly, we obtained a direction-tuning curve and, in some cases, a speed-tuning curve. Cells that responded at least twice as strongly in one direction as in the opposite direction were considered direction selective. This criterion included every MT cell that could be well isolated but excluded approximately 75% of V1 neurons (De Valois et al., 1982; Foster et al., 1985). The direction-tuning curve was calculated from 5 to 15 1-s sweeps of the stimulus in each of 20 directions spaced evenly around the circle. For MT neurons, tuning curves were obtained with random-dot fields sized to the classical receptive field of the neuron under study. The size of the individual dots was equal to that of the spots used in the noise mapping experiments (side length, 0.20-0.25°). For V1 cells, direction tuning was assessed with either a dot field or a long bar of optimal length and contrast polarity that was swept in a direction perpendicular to its orientation. Direction-tuning curves were obtained by averaging the firing rate over the period of stimulus presentation, which was 1000 ms. The mean bandwidth of direction tuning (measured by fitting a circular Gaussian to the direction-tuning curves) was 94.1 ± 39.5° (SD) in MT. In V1, the mean bandwidth was 99.4 ± 51.5° for cells measured with random-dot fields (n = 54) and 65.9 ± 29.8° for cells measured with long bars (n = 59). Thus, the bandwidth depended in part on the stimulus used to measure direction selectivity (Albright, 1984).
The sparse noise stimuli were identical to those used previously (Livingstone et al., 2001). All cells were studied with two-dimensional white-noise stimuli, consisting of pairs of spots flashed at 60 Hz. We used white and black stimuli that were 19 cd/m2 above and below the mean background gray luminance of 20 cd/m2. The dots changed position at random from frame to frame, within a square stimulus range that was at least 2.0 x 2.0°. When black and white stimuli overlapped, the resultant stimulus was the same gray as the background. To achieve maximum spatial resolution, we used the smallest black and white spots that elicited reliable spiking activity from the neuron under study. For MT neurons, this was almost always 0.25 x 0.25°, whereas in V1 neurons, we were usually able to use spots that were 0.20 x 0.20°.
Analysis
Measurement of kernels. The correlation analysis used here was identical to the analysis used in previous studies (Livingstone et al., 2001). Briefly, a computer recorded the evoked spike train (1 ms resolution), each stimulus position, and the monkey's eye position (4 ms resolution). For each map, between 5000 and 50,000 spikes were collected over a 20-60 min period. The kernels were then calculated by correlating the spike train with the stimulus sequence. Before the kernel computation, the positions of the stimuli on each frame were corrected to account for small drifts in the monkeys' eye position.
In this paper, we present our results in terms of the "forward correlation" between the stimulus and the spike train. This was computed by smoothing the spike train with a 17 ms boxcar filter and counting the average number of spikes that followed each stimulus at a given correlation delay. Computed in this manner, the forward correlation is equivalent (up to a scaling factor) to the more standard "reverse correlation" procedure (Marmarelis and Marmarelis, 1978; Baker, 2001; Ringach et al., 2003). The advantage of the forward correlation is that it can be interpreted as the average number of spikes per stimulus presentation, which makes it analogous to the standard poststimulus histogram. Second-order kernels were computed for MT cells and complex V1 cells, and first-order kernels were computed for V1 simple cells.
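The forward correlation can be illustrated with a minimal sketch. It assumes a spike train binned at 1 ms and a list of stimulus onset times in milliseconds; the function name, the centering of the 17 ms window on the correlation delay, and the use of a raw spike count rather than a rate are illustrative assumptions, not details given in the text.

```python
def forward_correlation(spikes, onsets, delay, width=17):
    """Average spike count in a `width`-ms window centered `delay` ms
    after each stimulus onset (spikes are per-ms counts).

    Sampling a boxcar-smoothed spike train at onset + delay is the same
    as summing the raw train over the boxcar window, which is done here.
    """
    half = width // 2
    counts = []
    for t0 in onsets:
        center = t0 + delay
        lo, hi = center - half, center + half + 1
        if lo < 0 or hi > len(spikes):
            continue  # skip windows that run off the recording
        counts.append(sum(spikes[lo:hi]))
    return sum(counts) / len(counts) if counts else 0.0


# Toy record: one spike 50 ms after each of two stimulus onsets.
spikes = [0] * 200
spikes[60] = 1
spikes[160] = 1
onsets = [10, 110]
at_50 = forward_correlation(spikes, onsets, delay=50)
at_20 = forward_correlation(spikes, onsets, delay=20)
```

Repeating the call over a range of delays yields a time course in units of spikes per stimulus presentation, analogous to a poststimulus histogram.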
First-order kernels. The first-order portion of the neuronal response to a spatiotemporal stimulus s(x,y,t) is given by the following convolution integral:

$$R_1(t) = \iiint h_1(x,y,\tau)\, s(x,y,t-\tau)\, dx\, dy\, d\tau$$

where τ is the delay between the stimulus and the response. When the stimulus is random, the kernel h1(x,y,τ) can be obtained by cross-correlating the response R1(t) with the stimulus sequence s(x,y,t − τ) (Marmarelis and Marmarelis, 1978), where τ is the chosen correlation delay. In our analysis, this was accomplished by computing separate correlations for the white and black stimulus sequences. For a linear system, stimuli that consist of positive and negative deviations from mean luminance evoke opposite responses, so the correlations with the white and black stimuli represent measurements of h1(x,y,τ) and −h1(x,y,τ). The final estimate of the kernel h1(x,y,τ) is obtained by subtracting the correlation with the black stimulus from the correlation with the white stimulus and dividing by two (Emerson et al., 1987).
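The white-minus-black estimate can be sketched as below. The example assumes each correlation map is a nested list of spikes per presentation at one correlation delay; the function name and the toy values are hypothetical.

```python
def first_order_kernel(white_map, black_map):
    """Estimate h1 at one delay as (white correlation - black correlation) / 2.

    For a linear cell the two maps measure +h1 and -h1 on top of a common
    baseline, so the subtraction removes the baseline and halving restores
    the kernel amplitude.
    """
    return [
        [(w - b) / 2.0 for w, b in zip(w_row, b_row)]
        for w_row, b_row in zip(white_map, black_map)
    ]


# Toy linear cell: baseline 0.25 spikes/presentation, true kernel [0.125, 0.0].
baseline = 0.25
true_h1 = [[0.125, 0.0]]
white = [[baseline + h for h in row] for row in true_h1]
black = [[baseline - h for h in row] for row in true_h1]
estimate = first_order_kernel(white, black)
```

In this toy case the baseline firing common to both maps cancels and the estimate equals the assumed kernel.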
Second-order kernels. The second-order kernel specifies the response of the neuron to two spot stimuli located at spatial positions (x1,y1) and (x2,y2) and occurring τ1 and τ2 time units before time t. The stimulus-dependent portion of the second-order response to an arbitrary stimulus s(x,y,t) is given by the following convolution integral (Marmarelis and Marmarelis, 1978):

$$R_2(t) = \int\!\cdots\!\int h_2(x_1,y_1,x_2,y_2,\tau_1,\tau_2)\, s(x_1,y_1,t-\tau_1)\, s(x_2,y_2,t-\tau_2)\, dx_1\, dy_1\, dx_2\, dy_2\, d\tau_1\, d\tau_2$$

where h2 is the second-order kernel. The integration limits have been suppressed for clarity here and in subsequent equations.
As with the first-order kernels, the second-order kernels were estimated by cross-correlation. However, in this case, the relevant stimulus quantity is the contrast polarity of the product of pairs of stimuli separated in space and time. Thus, the displacement of a white spot on one frame to a black spot on the second frame constituted a negative stimulus, and a white-to-white sequence constituted a positive stimulus. In practice, the kernel was estimated by correlating the spike train with the four possible sequences that occurred between any two frames (white-white, white-black, black-black, and black-white). The complete kernels were then calculated by summing the same-contrast individual maps (white-to-white and black-to-black) and subtracting the opposite-contrast maps (white-to-black and black-to-white), as described by Livingstone et al. (2001). The resulting kernel is similar to a second-order Wiener-like calculation (Emerson et al., 1987). An important advantage of this procedure is that the subtraction cancels any terms attributable to responses to individual white or black stimuli. This is important, because direction-selective complex cells and MT cells are both driven strongly by flashed stimuli (Mikami et al., 1986), which do not by themselves contain information about velocity. Our second-order measurement is essentially uncontaminated by such on-diagonal and lower-order influences (Emerson et al., 1987).
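The cancellation of single-stimulus terms can be checked with a small sketch. It assumes four displacement maps stored as nested lists, one per contrast sequence; the function name, the toy response values, and the unnormalized sum are assumptions made for illustration.

```python
def second_order_map(ww, bb, wb, bw):
    """Sum the same-contrast maps and subtract the opposite-contrast maps."""
    rows, cols = len(ww), len(ww[0])
    return [
        [ww[i][j] + bb[i][j] - wb[i][j] - bw[i][j] for j in range(cols)]
        for i in range(rows)
    ]


# Toy cell: each spot adds its own flash response (r_white or r_black),
# plus a pairwise interaction q whose sign follows the contrast product.
r_white, r_black, q = 0.5, 0.25, 0.125
ww = [[r_white + r_white + q]]
bb = [[r_black + r_black + q]]
wb = [[r_white + r_black - q]]
bw = [[r_black + r_white - q]]
kernel = second_order_map(ww, bb, wb, bw)

# With no pairwise interaction, the flash responses cancel exactly.
flash_only = second_order_map(
    [[2 * r_white]], [[2 * r_black]],
    [[r_white + r_black]], [[r_black + r_white]],
)
```

Only the interaction term survives (here as 4q), which is why strong responses to individual flashes do not contaminate the map.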
The result of the second-order kernel computation was a six-dimensional function h2(x1,y1,x2,y2,τ1,τ2) describing the response of the cell to all stimulus displacements. Previous work has shown that such kernels change very little with absolute spatial position in V1 (Movshon et al., 1978b; Baker and Cynader, 1986; Emerson et al., 1987; Livingstone and Conway, 2003) and MT (Livingstone et al., 2001). Consequently, we focused on the responses as a function of the relative positions of the stimuli. This enabled us to simplify the kernel by substituting x2 = x1 + Δx and y2 = y1 + Δy and integrating over space to get the following:

$$h_2(\Delta x,\Delta y,\tau_1,\tau_2) = \iint h_2(x_1,y_1,x_1+\Delta x,y_1+\Delta y,\tau_1,\tau_2)\, dx_1\, dy_1$$

Averaging over space in this manner improves the quality of the kernels substantially, without sacrificing information about their structure (Emerson et al., 1987).
We further simplify the terminology by defining the temporal separation between stimuli as Δτ, so that τ2 = τ1 + Δτ. This gives us the interaction function (Gaska et al., 1994), as follows:

$$I(\Delta x,\Delta y,\Delta\tau,\tau_2) = h_2(\Delta x,\Delta y,\tau_2-\Delta\tau,\tau_2)$$

The interaction function I describes the second-order kernel as a function of the separation between two stimuli in space (Δx, Δy) and time (Δτ). The time course of the impulse response is measured from the time of occurrence of the second stimulus (τ2).
In this paper, we describe the second-order kernels in terms of displacement maps, each of which corresponds to the response of a neuron to all stimulus displacements (Δx, Δy) at a fixed interstimulus interval (Δτ) and correlation delay (τ2). These maps generally contain both positive and negative regions, which we will refer to as "facilitatory" and "suppressive," respectively. These terms make sense only in the context of the kernel formulation and are not meant to imply specific synaptic mechanisms. In this way, the analysis is quite similar to the measurement of the first-order kernels of V1 simple cells described above (Movshon et al., 1978a; Jones and Palmer, 1987), in which one relates spiking activity to the sign of contrast of a single stimulus. In those studies, the response to a dark stimulus was subtracted from the response to a bright stimulus to estimate the "inhibition" attributable to the dark stimulus. In the case of the second-order kernel, we are interested in the sign of contrast of the product of two stimuli, so we subtract opposite-contrast sequences from same-contrast sequences.
Figure 1a shows an interaction function for a single MT cell. Each panel contains a pseudocolor displacement map depicting the response of the neuron for a particular stimulus displacement (Δx, Δy), with the position of the second spot in any two-spot stimulus sequence being at the origin (0,0). Each column corresponds to successive time points (τ2) after the displacement of the stimulus (i.e., response latency), and each row corresponds to a different temporal separation (Δτ) between stimuli. In all maps, red indicates facilitation and blue indicates suppression. Thus, the red and blue regions in the maps demonstrate that the response of the cell was facilitated by a rightward apparent motion sequence and suppressed by leftward motion. The scale bar on the left of the figure indicates that the depth of this modulation was 0.12 spikes per stimulus presentation. Thus, on average, for every 100 presentations of the noise stimulus, the most effective stimulus led to 12 more spikes for same-contrast sequences than for opposite-contrast sequences. The response was strongest for small temporal separations and peaked at a poststimulus time of approximately 65 ms.
To quantify the time course of the responses shown in Figure 1a, we computed the variance of each displacement map at 1 ms intervals. By using the variance, we take into account both positive (facilitatory) and negative (suppressive) deviations from the baseline activity, which is near zero. For each row in Figure 1a, the variance as a function of time is as follows:

$$V(\tau_2) = \frac{1}{N}\sum_{\Delta x,\Delta y}\left[I(\Delta x,\Delta y,\Delta\tau,\tau_2) - I_m\right]^2$$

where N is the number of pixels in each map, and Im is the mean of all the pixels. Figure 1b shows the time course of the response for the cell in Figure 1a, for Δτ values of one frame (solid line), two frames (dashed line), and three frames (dotted line). When Δτ is equal to one frame (approximately 17 ms), there is a strong response that begins at approximately 45 ms and peaks at 64 ms. Larger values of Δτ lead to smaller responses. This cell was typical in that the response was strongest at the shortest temporal separation Δτ, and the spatial structure was stable across the correlation delay τ2.
For correlation delays (τ2) of <30 ms, the responses of neurons in V1 and MT are unrelated to the stimulus sequence, so we can use this portion of the data to estimate the noise in the recording. The noise was therefore estimated as the mean of V during the first 30 ms of the response, and subsequent responses were considered statistically significant if they exceeded this level by 3 SDs. This significance threshold is shown as the horizontal dashed line in Figure 1b. The individual pixels that exceeded baseline by ±3 SDs are outlined by the dotted contours in Figure 1a.
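The variance time course and its noise threshold can be sketched as follows. This illustration assumes one displacement map per millisecond of correlation delay, starting at a delay of 0 ms, and takes the "3 SDs" to be the sample standard deviation of V across the baseline window; both of those choices, the 1/N normalization, and all names are assumptions.

```python
from statistics import mean, stdev


def map_variance(disp_map):
    """Variance of all pixels in one displacement map (1/N normalization)."""
    pixels = [v for row in disp_map for v in row]
    m = mean(pixels)
    return sum((v - m) ** 2 for v in pixels) / len(pixels)


def significance_threshold(variances, baseline_ms=30):
    """Baseline mean of V plus 3 SDs, from the first `baseline_ms` samples."""
    base = variances[:baseline_ms]
    return mean(base) + 3 * stdev(base)


def significant_delays(variances, baseline_ms=30):
    """Delays (ms) after the baseline window at which V exceeds threshold."""
    thr = significance_threshold(variances, baseline_ms)
    return [t for t, v in enumerate(variances)
            if t >= baseline_ms and v > thr]


# Toy time course: noisy baseline for 30 ms, then one clear response.
variances = [1.0, 1.2] * 15 + [1.1, 5.0, 1.1]
responsive = significant_delays(variances)
unit_map_variance = map_variance([[1.0, -1.0], [1.0, -1.0]])
```

Because the variance pools squared deviations, facilitatory and suppressive regions both raise V, so a map with strong structure of either sign stands out against the baseline.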

Figure 1. a, Example interaction function I(Δx, Δy, Δτ, τ2). Each square contains a displacement map, which codes the response as a function of the spatial separation between two stimuli (Δx, Δy). The position of the second stimulus in any two-spot sequence is the origin of each map. Positive (facilitatory) responses are coded in orange, and negative (suppressive) responses are coded in blue. The response of the cell in the figure was therefore facilitated by a rightward apparent motion sequence and suppressed by a leftward sequence. Each row indicates a different temporal separation between stimuli (Δτ), and each column indicates a different correlation delay (τ2). The displacement map outlined in thick black is the peak map, which is defined as the map that has the highest variance. Pixels that differ from baseline by ±3 SDs (see Materials and Methods) are outlined by the dotted contour. b, Variance as a function of time for the neuron depicted in a. The solid curve indicates the variance in each displacement map as a function of delay (τ2), when Δτ = 17 ms (1 monitor refresh). The dashed curve shows the same time course for the case in which Δτ = 33 ms (2 refreshes), and the dotted curve corresponds to Δτ = 50 ms (3 refreshes). The vertical dashed line indicates the peak response, which corresponds to the highlighted map in a. The horizontal dashed line is the threshold above which a response is considered significantly above the baseline noise. The vertical arrow indicates the interval over which the baseline noise is measured.
Curve fitting
For speed-tuning curves, data were averaged over a 1000 ms stimulus presentation. The preferred speed was determined as the peak of a log-Gaussian fit to the tuning curve. The map profiles were fit to Gabor functions of the following form:

$$G(x) = a + b\,\exp\!\left(-\frac{(x-x_0)^2}{2\sigma^2}\right)\cos\!\left(2\pi f (x-x_0) + \phi\right)$$

where a is the floor, b is the amplitude, x0 is the center, σ is the envelope width, f is the frequency, and φ is the phase. Function fits were optimized via a least-squares criterion using the Levenberg-Marquardt algorithm in Matlab (MathWorks, Natick, MA).
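For readers who want to experiment with such fits, the sketch below evaluates a Gabor profile and the least-squares objective that an optimizer such as Levenberg-Marquardt would minimize. It assumes a common parameterization (Gaussian envelope times cosine carrier, both centered on x0); the exact form used in the study may differ, and the optimizer itself is not implemented here.

```python
import math


def gabor(x, a, b, x0, sigma, f, phi):
    """Gabor profile: floor a plus amplitude b times envelope times carrier."""
    envelope = math.exp(-((x - x0) ** 2) / (2.0 * sigma ** 2))
    carrier = math.cos(2.0 * math.pi * f * (x - x0) + phi)
    return a + b * envelope * carrier


def sum_squared_error(params, xs, ys):
    """Least-squares objective for a candidate parameter tuple."""
    return sum((gabor(x, *params) - y) ** 2 for x, y in zip(xs, ys))


# Toy profile sampled every 0.125 deg across a 2 deg cross-section.
true_params = (0.0, 1.0, 0.0, 0.5, 1.0, 0.0)   # a, b, x0, sigma, f, phi
xs = [-1.0 + 0.125 * i for i in range(17)]
ys = [gabor(x, *true_params) for x in xs]
error_at_truth = sum_squared_error(true_params, xs, ys)
error_wrong_freq = sum_squared_error((0.0, 1.0, 0.0, 0.5, 2.0, 0.0), xs, ys)
```

The objective is zero at the generating parameters and grows when the frequency is wrong, which is the quantity a fitting routine would drive down.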
Permutation test
To test the significance of the singular values obtained from the Δs/Δτ maps, we performed a permutation test. The order of stimulus frames was shuffled, and the reverse correlation procedure was repeated to generate a new map. The singular value decomposition (SVD) was then performed on this map, and the entire procedure was repeated 100 times. To be considered significantly above the noise, a singular value obtained from the data had to be 2 SDs above the mean of the maps obtained during the shuffling procedure.
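The logic of the shuffling procedure can be sketched generically. In this illustration the map computation and SVD are replaced by a hypothetical stand-in statistic (`toy_statistic`), because the point is only the shuffle-and-threshold structure; all names, the fixed random seed, and the use of the sample standard deviation are assumptions.

```python
import random
from statistics import mean, stdev


def permutation_threshold(frames, spikes, statistic,
                          n_shuffles=100, n_sd=2.0, seed=0):
    """Mean + n_sd SDs of the statistic over shuffled frame orders."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_shuffles):
        shuffled = frames[:]          # copy, then shuffle frame order
        rng.shuffle(shuffled)
        null.append(statistic(shuffled, spikes))
    return mean(null) + n_sd * stdev(null)


def is_significant(frames, spikes, statistic, **kwargs):
    """True if the unshuffled statistic exceeds the shuffle threshold."""
    observed = statistic(frames, spikes)
    return observed > permutation_threshold(frames, spikes, statistic, **kwargs)


def toy_statistic(frames, spikes):
    """Hypothetical stand-in for a singular value: |stimulus-spike sum|."""
    return abs(sum(f * s for f, s in zip(frames, spikes)))


# Toy data: the cell fires only on "white" (+1) frames.
frames = [1, -1] * 50
spikes = [1 if f > 0 else 0 for f in frames]
driven = is_significant(frames, spikes, toy_statistic)

# Degenerate control: identical frames, so shuffling changes nothing.
flat = is_significant([1] * 10, [1] * 10, toy_statistic)
```

Shuffling the frame order preserves the stimulus and spike statistics while destroying their temporal relationship, so the shuffled values estimate what the statistic looks like under noise alone.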
Tilt direction index
The degree of tilt in each spatiotemporal map was computed from the discrete Fourier transform of the map. Taking the peak of the transform as the optimal spatial and temporal frequencies (Fs and Ft), the tilt direction index (TDI) was computed as (Rp − Rn)/(Rp + Rn), where Rp and Rn are the response amplitudes at (Fs, Ft) and (−Fs, Ft) (Anzai et al., 2001).
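A minimal sketch of the tilt direction index is given below. It assumes the map is stored as `st_map[t][x]` and that the optimal frequencies are supplied as DFT indices rather than located automatically; pairing (Fs, Ft) with (−Fs, Ft) as the two opposite tilt directions is an assumed sign convention.

```python
import cmath
import math


def dft_amplitude(st_map, ks, kt):
    """Amplitude of the 2D DFT of st_map[t][x] at indices (ks, kt)."""
    n_t, n_x = len(st_map), len(st_map[0])
    total = 0j
    for t in range(n_t):
        for x in range(n_x):
            phase = -2j * math.pi * (ks * x / n_x + kt * t / n_t)
            total += st_map[t][x] * cmath.exp(phase)
    return abs(total)


def tilt_direction_index(st_map, ks, kt):
    """(Rp - Rn) / (Rp + Rn) from the two opposite-tilt components."""
    r_p = dft_amplitude(st_map, ks, kt)
    r_n = dft_amplitude(st_map, -ks, kt)
    return (r_p - r_n) / (r_p + r_n)


n = 8
# Tilted (space-time inseparable) map: a grating drifting in one direction.
tilted = [[math.cos(2 * math.pi * (x + t) / n) for x in range(n)]
          for t in range(n)]
# Untilted (separable) map: a counterphase, standing grating.
standing = [[math.cos(2 * math.pi * x / n) * math.cos(2 * math.pi * t / n)
             for x in range(n)] for t in range(n)]
tdi_tilted = tilt_direction_index(tilted, 1, 1)
tdi_standing = tilt_direction_index(standing, 1, 1)
```

A fully tilted map places all of its energy in one component and gives an index of magnitude 1, whereas a separable map splits its energy equally and gives an index of 0.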
Biphasic index
For each neuron, we computed the facilitatory time course P(τ2; Δx, Δy, Δτ), using the values of Δx, Δy, and Δτ that yielded the peak of the interaction function I(Δx, Δy, Δτ, τ2). Only responses that were significantly above baseline were considered. Some examples of P appear as the red lines in the bottom row of Figure 12a. Similarly, the optimal suppressive response is defined as N(τ2; Δx, Δy, Δτ), where Δx, Δy, and Δτ are the points at which the minimum response is obtained. These are shown as the blue traces in the bottom row of Figure 12a.
As is evident from Figure 12a, the facilitatory and suppressive time courses were generally mirror images of each other, so they were combined in the calculation of the biphasic index. We did this by subtracting the suppressive trace from the facilitatory trace, which is equivalent to flipping the blue traces in Figure 12a about the x-axis and adding them to the red traces. We then integrated the total negative deviations from baseline over time to get a measure of the extent to which the cell reversed direction and normalized this number by the total positive deviations, again obtained by integrating over time. The biphasic index (BI) was thus defined as follows:

$$\mathrm{BI} = \frac{\int \left[N(\tau_2) - P(\tau_2)\right]^{+} d\tau_2}{\int \left[P(\tau_2) - N(\tau_2)\right]^{+} d\tau_2}$$

where [...]+ is the rectification operation. Thus, a BI of 0 indicates that the response of a neuron never showed suppression in the preferred direction [−P(τ2)] or facilitation in the null direction [+N(τ2)]. A BI near 1 indicates that the neuron fired as many spikes for motion in the null direction as in the preferred direction.
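The biphasic index can be computed directly from the two time courses. The sketch below assumes that P and N are sampled at the same uniformly spaced delays, so that the integrals reduce to sums and the sampling interval cancels in the ratio; the function name and toy traces are hypothetical.

```python
def biphasic_index(p_trace, n_trace):
    """Ratio of rectified negative to rectified positive parts of P - N."""
    diff = [p - n for p, n in zip(p_trace, n_trace)]
    negative = sum(max(-d, 0.0) for d in diff)   # time spent reversed
    positive = sum(max(d, 0.0) for d in diff)    # time in preferred sign
    if positive == 0:
        raise ValueError("no positive deviation; index is undefined")
    return negative / positive


# Monophasic cell: facilitation and suppression never change sign.
bi_mono = biphasic_index([0.0, 1.0, 2.0, 1.0, 0.0],
                         [0.0, -1.0, -2.0, -1.0, 0.0])

# Fully biphasic cell: both traces reverse sign with equal area.
bi_full = biphasic_index([1.0, 2.0, -1.0, -2.0],
                         [-1.0, -2.0, 1.0, 2.0])
```

The two toy cases reproduce the limits described in the text: an index of 0 for a response that never reverses and an index of 1 for a response that reverses completely.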
Reichardt/motion energy models
Many models suggest that direction selectivity involves a multiplication or squaring of luminance signals, after linear filtering (Reichardt, 1961; Adelson and Bergen, 1985; van Santen and Sperling, 1985). Both computations can be reduced to a second-order kernel of the type we have computed (Courellis and Marmarelis, 1992), in which the outputs of linear filters are combined in a nonlinear manner and summed over space. Using the notation introduced previously, this is equivalent to the following:

$$I(\Delta x,\Delta y,\Delta\tau,\tau_2) = k \iint h_1(x,y,\tau_2-\Delta\tau)\, h_1(x+\Delta x,y+\Delta y,\tau_2)\, dx\, dy$$

where k is a scaling factor. This is equivalent to the spatial autocorrelation of the linear filter h1 (Emerson et al., 1992; Baker, 2001) evaluated for pairs of time points separated by Δτ (see Fig. 13).
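The spatial correlation of a linear filter across two delays can be sketched in one spatial dimension. The example assumes two slices of h1 taken at delays separated by the interstimulus interval; reducing the map to one dimension, the slice names, and the pairing of the shorter delay with position x are simplifying assumptions.

```python
def predicted_interaction(h1_tau1, h1_tau2, k=1.0):
    """k * sum_x h1_tau1[x] * h1_tau2[x + dx] for every displacement dx.

    Returns a dict mapping dx (in pixels) to the predicted interaction.
    Products that would fall outside the sampled filter are skipped.
    """
    n = len(h1_tau1)
    out = {}
    for dx in range(-(n - 1), n):
        total = 0.0
        for x in range(n):
            if 0 <= x + dx < n:
                total += h1_tau1[x] * h1_tau2[x + dx]
        out[dx] = k * total
    return out


def preferred_displacement(interaction):
    """Displacement with the largest predicted (facilitatory) interaction."""
    return max(interaction, key=interaction.get)


# Space-time tilted filter: the responsive position shifts between delays.
tilted = predicted_interaction([0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0])
# Space-time separable filter: same spatial profile at both delays.
separable = predicted_interaction([0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])
shift_tilted = preferred_displacement(tilted)
shift_separable = preferred_displacement(separable)
```

In this toy case a filter whose spatial profile shifts over time predicts a nonzero preferred displacement, whereas a separable filter predicts a peak at zero displacement and hence no direction preference.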

Figure 2. a, The peak displacement map from the neuron in Figure 1a. The dashed line connects the peak of the facilitatory region to the origin of the map and is used to determine the spatial profile of the response in b. The dashed circle delineates the region over which profiles are measured. The dashed ovals show elliptical Gaussian fits to the facilitatory and suppressive response regions. b, The spatial profile of the map shown in a. Each dot corresponds to the response at one point along the profile. The solid blue curve is the Gabor fit to these points. The dashed vertical line indicates the point at which the spatial displacement (Δs) is 0. Different features of the Gabor fit are shown as dotted vertical lines, which indicate, from left to right, the zero-crossing for facilitation, the peak for facilitation, the peak for suppression, and the zero-crossing for suppression.
In our experiments, the linear filters were estimated as described above from the responses of V1 simple cells. The autocorrelation was then computed between slices through h1(x,y,τ) at two delays separated by 16 ms (see Fig. 13), to facilitate comparison with second-order kernels computed with Δτ equal to one frame. The peak of this function was then compared with second-order kernels measured in complex cells and MT cells.
Kernel predictions
Predictions based on the first- and second-order kernels were generated by calculating
and
, where T is the total time over which the response was averaged.
Results
We computed second-order kernels for 131 V1 complex cells and 166 MT cells recorded from five awake, fixating macaque monkeys. Eighteen of the V1 cells and 21 of the MT cells did not have responses that were significantly above the noise and so were discarded from additional analysis. The remaining cells all had second-order kernels with clear structure.
Our analysis allowed us to express the behavior of the neurons in terms of a two-spot apparent motion sequence. Each sequence involved the displacement of a spot over a distance (Δx, Δy) in some time interval (Δτ). The response of the neuron had a time course that we measure from the time of occurrence of the second stimulus (τ2). The full response is therefore expressed in terms of the interaction function I(Δx, Δy, Δτ, τ2). In the following sections, we describe the characteristics of various two-dimensional slices through this interaction function.
Spatial subunit structure: I(Δx, Δy)
We first examined the spatial structure of the interaction functions by taking all of the responses to spatial displacements (Δx, Δy) at the optimal Δτ and τ2. Figure 2a shows such a displacement map for the MT cell shown in Figure 1a. The peak response occurred at a latency of τ2 = 64 ms and Δτ = 17 ms (one frame), at which point the map exhibited strong facilitatory and suppressive regions in the Δx, Δy space. The red regions indicate facilitatory interactions, in which the response to a single spot was facilitated when the immediately preceding spot was to its left and slightly up. This means that the cell preferred rightward motion. Similarly, the blue regions indicate suppression for leftward apparent motion. In keeping with previous terminology (Movshon et al., 1978a; Emerson et al., 1987), we will refer to the structure of each displacement map as a subunit.
Taking the peak displacement map allows us to examine the spatial structure of the subunits while ignoring the temporal aspects of the kernels. In subsequent sections, we show that the maps are generally separable in space and time, so this procedure captures most of the spatial features of the subunits. The peak map in Figure 2a was typical in that it showed slightly elongated regions of facilitation and suppression, arranged perpendicular to the axis of elongation (Pack et al., 2003a). The dashed line in the figure connects the origin of the map with the peak of the facilitatory region. Plotting the value of the map at each point along this line yields the cross-section shown in Figure 2b. The cross-section was well fit by a one-dimensional Gabor function (blue line), with an R2 of 0.97.
All of the cross-sections were 2° in length, as indicated by the circle in Figure 2a. For all of the neurons in the V1 and MT populations, we fit these profiles with Gabor functions (mean R2 = 0.97). The Gabor function captures key aspects of the displacement map, including the preferences and limits of direction selectivity. One of the main goals of this work was to compare these features between V1 and MT.
Dependence on eccentricity
In comparing the spatial aspects of direction selectivity between V1 and MT, it is important to consider the influence of retinal eccentricity (Mikami et al., 1986). For V1, the receptive fields were typically within 5° of the fovea, although a few had eccentricities >20°. These two clusters of eccentricities corresponded to recordings from the operculum and from the roof of the calcarine sulcus, which were often encountered along the same penetration. For the MT population, eccentricities were sampled more evenly. In all cases, the noise stimulus was centered on the center of the receptive field under study.
Across both cortical regions, there was a consistent relationship between retinal eccentricity and the frequency of the best-fitting Gabor function: more eccentric cells had broader tuning for stimulus displacement and, hence, lower Gabor frequencies. This relationship is plotted for the population of V1 and MT cells in Figure 3. Although there is a great deal of scatter at any given eccentricity, there is clearly a negative slope in the data, and there is no obvious difference between V1 and MT. The relationship between eccentricity and Gabor frequency was highly significant for MT (linear regression, p < 0.01) and for the combined population from both areas (p < 0.0001). The regression could not reliably be performed on V1 because of the non-normality of the distribution of eccentricities, but the trend appears to be similar. Note that this does not imply that spatial frequency preferences for drifting gratings are identical between V1 and MT [in fact, MT as a population prefers lower spatial frequencies than V1 (Priebe et al., 2003)]. Rather, our stimuli provided a coarse measure of spatial frequency in displacement space and may have missed some of the responses to high spatial frequencies that are known to be present in V1 (Foster et al., 1985).

Figure 3. Correlations between aspects of the best-fitting Gabor functions and the retinal eccentricities of the neurons. Open circles indicate V1 neurons, and filled circles indicate MT neurons. The dashed lines show the regression lines. a, Correlation between retinal eccentricity and Gabor frequency. b, Correlation between retinal eccentricity and Gabor envelope width.

Figure 4. a, Displacement maps computed at different eccentricities within the same MT receptive field. The dashed squares show the sizes and positions of random-dot patches used to measure speed tuning at the corresponding receptive field locations. b, Spatial profiles for the foveal (red) and peripheral (blue) displacement maps shown in a. Each dot corresponds to the response at a point along a line through the peak of each map. The solid curves are the best-fitting Gabor functions. c, Speed-tuning curves collected with random-dot fields centered on the foveal (red) and peripheral (blue) receptive field locations. Error bars indicate SD of the mean. d, Shift in frequency of the best-fitting Gabor as a function of change in retinal eccentricity for 37 displacement maps computed within the receptive fields of 17 MT neurons. Each dot corresponds to the difference in frequency for a given change in retinal eccentricity, and the dashed line is the result of a linear regression.
Somewhat surprisingly, the tendency for the subunits to become coarser at greater eccentricities was not accompanied by an increase in the size of the spatial envelope of the Gabor function fits (Fig. 3b). Although there is a slight upward trend in the data, particularly for V1 neurons, the relationship for the population as a whole did not reach significance (linear regression, p > 0.1). Based on previous work in V1 (Ringach, 2002), such a correlation would have been expected. However, the difference may be attributable to our inability to sample longer-range interactions that may have been present at large eccentricities. For example, we cannot determine whether the more eccentric profile in Figure 4b remains at zero or contains another subregion beyond 1°. Overall, these results suggest that the effect of stimulus displacement on the responses of V1 and MT neurons is determined in part by the retinal eccentricity of the subunit. Cells at greater eccentricities tolerate larger stimulus jumps, just as V1 neurons at larger eccentricities prefer lower spatial frequencies (De Valois et al., 1982) and higher velocities (Orban et al., 1986).

Figure 5. a, Measurements of Dopt, which is the spatial displacement yielding the maximal facilitatory or suppressive response for each neuron. The maximum is defined from the peak (or trough) of the Gabor fit to the spatial profile, as shown in Figure 2b. The population of V1 cells is shown on the left, and MT cells are on the right. Facilitation is on top, and suppression is on the bottom. b, The x-axis shows, for 94 MT neurons, the peak of a log-Gaussian fit to speed-tuning curves like those shown in Figure 4c. The y-axis shows Dopt for facilitation (filled circles) and suppression (open circles).
The receptive field sizes of most MT neurons were much greater than the longest apparent motion sequence that elicited a response. Consequently, we were in some cases able to study the relationship between eccentricity and subunit structure within the same MT receptive field, by performing multiple noise mappings at different spatial positions. Figure 4a shows an example of one cell for which additional noise maps were obtained at two different positions, each displaced symmetrically from the center by 3° (the center map has been omitted for clarity). Although the two maps clearly indicate a preference for the same motion direction (down and to the left), the map taken at the greater retinal eccentricity is coarser. As such, it reflects responses to a broader range of stimulus displacements, and its absolute peak response is shifted toward larger displacements. This can be seen clearly in the cross-sections of the two maps, along with their Gabor function fits (Fig. 4b). As in the population data in Figure 3, the decrease in Gabor frequency is not accompanied by a change in the size of the spatial envelope. Both profiles decay to zero at approximately the same point, and the extra bumps visible in the profile of the more foveal subunit suggest that the frequency is changing in a manner that is essentially independent of the envelope.
For this cell, we obtained separate speed-tuning curves for the two subunits, using random-dot fields centered on the same positions as the noise maps. The dot fields were larger than the stimuli used to generate the maps, but they did not overlap spatially. The resulting curves, shown in Figure 4c, indicate that speed tuning was very similar at the two locations, although the cell responded more strongly to slower speeds at the more foveal location. Similar results of speed tuning at different spatial locations were obtained with five other MT neurons, suggesting that there may be modest changes in preferred speed across individual MT receptive fields (Treue and Andersen, 1996).
We measured the subunit structure at multiple eccentricities within 17 MT receptive fields. For each cell, we first obtained a map at the center of the receptive field and then obtained one or more maps at other positions within the receptive field. Thus, we had 17 center maps and 37 maps from the receptive field peripheries. For each map, we calculated a corresponding Gabor frequency, defined as Fc for the center maps and Fp(n) for each of the n peripheral maps obtained from a given cell. The value of n ranged from 1 to 4. A simple way to quantify the effect of eccentricity on Gabor frequency is to compute, for each peripheral map, the log ratio ΔF = log(Fc/Fp(n)). This captures the change in Gabor frequency, which can be related to the difference in retinal eccentricities at which the two maps were obtained: ΔE = (Ec − Ep). The values of (ΔE, ΔF) are plotted in Figure 4d.
If the Gabor frequency were constant across the entire receptive field, the frequency difference (y-axis) would be zero, regardless of the eccentricity difference (x-axis). In contrast to this prediction, a linear regression indicates a highly significant correlation between ΔE and ΔF (p < 0.0001), with a slope of 0.014. As in the between-cell data shown in Figure 3a, greater eccentricities are associated with lower frequencies. This means that, even within a single MT receptive field, the range of dot displacements to which a cell responds changes in a predictable manner. Note that the lower signal-to-noise ratio near the edges of the receptive field cannot explain this result, because it would introduce similar effects at all eccentricities, leading to a V-shaped plot in Figure 4d.
The result in Figure 4d means that, on average, a 1° change in eccentricity within an MT receptive field is accompanied by a shift in the subunit frequency of 0.014 octaves. By comparison, the slope of the regression line for comparisons made across cells (Fig. 3a) was 0.016. In other words, one encounters a similar change in subunit structure across eccentricities, whether moving across V1 receptive fields, across MT receptive fields, or within MT receptive fields.
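The within-cell analysis can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the log ratio is taken in base 2 (so that the slope is in octaves per degree), leaves the sign convention of the eccentricity difference to the caller, and uses hypothetical function names.

```python
import numpy as np

def frequency_shift(fc, fp):
    """Log ratio of center to peripheral Gabor frequency (base 2 assumed, i.e. octaves)."""
    return np.log2(np.asarray(fc, dtype=float) / np.asarray(fp, dtype=float))

def regress_shift_on_eccentricity(delta_e, delta_f):
    """Least-squares line delta_f = slope * delta_e + intercept."""
    slope, intercept = np.polyfit(delta_e, delta_f, 1)
    return slope, intercept

# Hypothetical values for illustration only (not data from the study).
delta_e = np.array([-3.0, -1.0, 2.0, 4.0])   # eccentricity differences, deg
fc = np.ones(4)                              # center frequencies, cycles/deg
fp = fc / 2.0 ** (0.014 * delta_e)           # peripheral frequencies built to give a known slope
slope, intercept = regress_shift_on_eccentricity(delta_e, frequency_shift(fc, fp))
print(slope, intercept)
```

If the Gabor frequency did not depend on eccentricity, the fitted slope would be near zero for any set of eccentricity differences.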
Preferred spatial displacement
The cross-sections shown in Figures 2 and 4 capture the preferences and limits of neuronal responses to two-spot apparent motion. As such, it is useful to compare these quantities between V1 and MT to determine to what extent the behavior of MT neurons can be accounted for on the basis of their inputs. For each cell, we computed the preferred dot displacement Dopt as the peak of the Gabor fit to the cross-section through I(Δx, Δy). This is shown as the vertical dotted line through the peak in Figure 2b. We also computed the optimal spatial displacement for suppression from the Gabor function (Fig. 2b, vertical line through the trough). Histograms of these values are shown in Figure 5a for both V1 (left) and MT (right). The distributions are clearly quite similar, with the means for facilitation being 0.26° (0.11° SD) for V1 and 0.31° (0.13° SD) for MT. This difference was marginally significant (p < 0.06, t test), but the substantial overlap in the two populations is consistent with a simple explanation in terms of a selective projection from V1 to MT. Similarly, the mean values of Dopt for suppression were 0.33° (0.14° SD) in V1 and 0.26° (0.14° SD) in MT. This difference did not reach significance (p > 0.2, t test).
Figure 5b shows the relationship between preferred speed for random-dot fields and Dopt for facilitation and suppression in 94 MT cells. Although no correlation is apparent for suppression, a significant correlation exists between Dopt for facilitation and preferred speed (Spearman's rank correlation, p < 0.05). The correlation is weak (R² = 0.31), and Dopt clearly underestimates preferred speed for speeds greater than 20°/s. This latter finding can be further appreciated by inspection of Figure 5a (right), which shows very few neurons that prefer values of Dopt beyond 0.5°, which corresponds to a speed of 30°/s.
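The correspondence between displacement and speed is simple arithmetic. The sketch below assumes a 60 Hz display refresh (one frame of roughly 16.7 ms, consistent with two refreshes spanning 33 ms) and a jump that occurs over a single refresh; the function name is hypothetical.

```python
REFRESH_HZ = 60.0  # assumed display refresh rate (one frame is about 16.7 ms)

def displacement_to_speed(displacement_deg, frames=1, refresh_hz=REFRESH_HZ):
    """Speed (deg/s) implied by a jump of displacement_deg spread over `frames` refreshes."""
    return displacement_deg * refresh_hz / frames

print(displacement_to_speed(0.5))  # 30.0
```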

Figure 6. Measurements of Dmax, which is the maximal spatial displacement to which a neuron is sensitive. This is defined as the zero-crossing of the Gabor fit to the spatial profile shown in Figure 2b. Population histograms for Dmax are shown for V1 (left) and MT (right), for facilitation (top) and suppression (bottom).
Maximum spatial displacement
To estimate the limits of speed tuning, we also measured the maximum spatial displacement Dmax. This was defined as the first zero-crossing of the Gabor function (Baker and Cynader, 1986) in the preferred direction of each neuron (Fig. 2b, leftmost vertical line).
As with measurements of Dopt, the distributions of Dmax were similar for V1 and MT (Fig. 6). For facilitation, the mean value for V1 was 0.59° (0.28° SD), whereas the mean value for MT was 0.67° (0.31° SD). This difference was again marginally significant (p < 0.05), but the distributions were mostly overlapping. The distributions of Dmax for suppression were nearly identical between V1 (0.49 ± 0.17°) and MT (0.49 ± 0.22°). Together with the measurements of Dopt, these results suggest that there is no systematic difference between the subunit structures found in V1 complex cells and in MT cells. Similar results were obtained with random-dot field stimuli (Churchland et al., 2005).
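To make the two measures concrete, the following sketch reads Dopt and Dmax for facilitation off a fitted Gabor profile. It is an illustration under stated assumptions: the Gabor parameterization (Gaussian envelope times a sine carrier), the numerical grid, and the choice to take the zero-crossing beyond the peak are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def gabor(x, amplitude, sigma, freq, phase):
    """Gaussian envelope times a sinusoidal carrier (assumed parameterization)."""
    envelope = np.exp(-x**2 / (2.0 * sigma**2))
    return amplitude * envelope * np.sin(2.0 * np.pi * freq * x + phase)

def dopt_and_dmax(amplitude, sigma, freq, phase, x_max=2.0, step=0.001):
    """Return (Dopt, Dmax) in degrees along the preferred direction (x >= 0)."""
    x = np.arange(0.0, x_max + step, step)
    g = gabor(x, amplitude, sigma, freq, phase)
    i_peak = int(np.argmax(g))            # displacement giving maximal facilitation
    d_opt = x[i_peak]
    # First sample beyond the peak at which the fit is no longer positive.
    beyond = np.nonzero(g[i_peak:] <= 0.0)[0]
    d_max = x[i_peak + beyond[0]] if beyond.size else np.nan
    return d_opt, d_max

# Hypothetical fit: 1 cycle/deg carrier, 0.4 deg envelope.
d_opt, d_max = dopt_and_dmax(amplitude=1.0, sigma=0.4, freq=1.0, phase=0.0)
print(d_opt, d_max)
```

The corresponding values for suppression could be obtained the same way by locating the trough (`np.argmin`) instead of the peak.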
Subunit aspect ratio
In addition to the overall sizes of the subunits, we can examine their two-dimensional shape. Nearly all of the subunits consisted of one facilitatory and one suppressive subregion. From inspection of Figure 2a, it is clear that both the facilitatory and suppressive regions are elliptical in shape, with the axis of elongation being perpendicular to the preferred-null direction axis (Pack et al., 2003a). Because our probe stimuli are not oriented, the elongation of the subunits must reflect the orientation selectivity of the inputs of each neuron.
To study the elongation of the V1 and MT subunits, we fit each displacement map with an elliptical Gaussian function (Fig. 2a, ellipses). For the orientation of the facilitatory region, we truncated the maps at the 1/e contour. Suppressive regions were studied in the same way, after first inverting the positive and negative portions of the map. Gaussian fits were generally excellent: only 12 MT neurons and 9 V1 neurons were rejected for having R² values <0.9.
Figure 7 shows the distributions of subunit aspect ratios found in V1 and MT for facilitation and suppression. The geometric means of the aspect ratios for facilitation were 2.2 in V1 and 2.1 in MT (p > 0.2, t test). For suppression, the corresponding values were 2.1 and 2.0 (p > 0.3, t test). Thus, it appears that the subunits are modestly elongated, suggesting a limited degree of orientation selectivity in the inputs to both V1 and MT cells, although it is possible that our procedure slightly underestimated aspect ratio (Jacobson et al., 1993).

Figure 7. Measurements of the subunit aspect ratio. Aspect ratio was defined as the ratio of length to width of the best-fitting elliptical Gaussian for facilitatory (top) and suppressive (bottom) fields in V1 (left) and MT (right).
Spatiotemporal interactions: I(Δs, τ)
One possible reason for the limited range of spatial displacements indicated in Figure 5 is that peak displacement maps ignore crucial aspects of the interaction function I(Δx, Δy, τ,
2). For example, the neurons might respond to larger spatial displacements at larger values of τ. Such a change in preferred spatial displacement with increased temporal separation between stimuli might render MT neurons sensitive to the ratio Δs/τ, where s is the magnitude of a two-dimensional spatial displacement. This type of invariant velocity selectivity is often assumed to be one of the primary functional differences between direction selectivity in V1 and MT.
We tested the velocity sensitivity of neurons in V1 and MT by computing profiles like those in Figure 2b at values of τ ranging from 16 to 150 ms. To do this, we first computed the profile for the peak displacement map to establish the preferred-null axis (Fig. 2a, thick dashed line). We then used the same axis to obtain profiles at the other values of τ and stacked the profiles to obtain the Δs-τ maps. Using the same axis for each value of τ allowed us to measure changes in preferred spatial displacement regardless of changes in preferred direction, which were generally negligible (Perge et al., 2004). For most neurons, we did not measure simultaneous interactions (τ = 0), although it would be of some theoretical interest to do so (Jacobson et al., 1993; Baker, 2001; Livingstone et al., 2001; Livingstone and Conway, 2003).
Figure 8 shows example Δs-τ maps for three V1 cells and three MT cells. The y-axis indicates the time τ between stimuli, and the x-axis indicates the spatial displacement Δs. The colors indicate facilitation and suppression, as in the maps in Figures 1 and 2. Thus, a horizontal row is a color-coded version of the cross-sections shown in Figures 2b and 4b. The slant evident in some of the maps suggests that the preferred spatial displacements for these cells change as a function of the temporal separation between stimuli.
As τ is increased, the V1 cell in the bottom left panel of Figure 8 shows an increase in its responses to large stimulus displacements. This is evident in the slant of the reddish regions on the left of the map, and it is exactly what one would expect from a cell that was tuned to the ratio Δs/τ. In contrast, the cell in the top left panel of Figure 8 shows a reversal in the spatial profile of its suppressive responses, so that the overall effect of increasing τ appears to be a phase shift in the response profile. This latter result is similar to the predictions of motion energy models (Adelson and Bergen, 1985), which do not encode velocity per se, but rather a limited range of spatiotemporal displacements. We will first consider the slant present in the Δs-τ maps in V1 and MT and then examine specific predictions of a simple model of velocity tuning.
Separability of I(Δs, τ) maps
One way to examine the slant in the (Δs, τ) space is to examine separability of responses to spatial and temporal displacements. For a neuron with responses that are separable in space and time, the Δs-τ maps in Figure 8a can be described as the product of a spatial profile (like those in Fig. 2b) and a temporal profile. The temporal profile may be biphasic (as the one in the top left panel of Fig. 8a appears to be), but the separability of the response implies that responses to velocity will depend on individual values of Δs and τ rather than their ratio. In contrast, a neuron with an inseparable response map will have a response to spatial displacements that varies systematically with the temporal interval between stimuli and so could be tuned to the ratio Δs/τ.
We tested the separability of V1 and MT responses by performing a singular value decomposition (SVD) on the Δs-τ map of each neuron. The SVD calculates a series of orthogonal maps (known as singular vectors), each capturing less of the variance than the preceding one. A completely separable map would be described by one singular vector, whereas inseparable maps would require more singular vectors. The extent to which a singular vector contributes to the map is described by a scalar known as the singular value. The statistical significance of each singular value was tested with a permutation test (p < 0.05), in which the maps were recalculated with the order of the stimulus frames shuffled (see Materials and Methods). The permutation test led us to discard 31 V1 neurons and 20 MT neurons, because none of their singular values reached significance.

Figure 9. Measures of separability in V1 (left) and MT (right). The top row shows population histograms of the separability index, calculated from a singular value decomposition of maps like those shown in Figure 8. The bottom row shows the tilt direction index, calculated from the discrete Fourier transform of the maps (see Materials and Methods).
Using the singular values for each map, we can describe separability in a continuous manner by calculating a separability index (SI) (Mazer et al., 2002; Grunewald and Skoumbourdis, 2004):
$$\mathrm{SI} = \frac{\lambda_1^2}{\sum_n \lambda_n^2}$$
where λn is the nth singular value. The SI measures the extent to which the first singular vector is sufficient to account for the variability in each map, with an SI of 1 meaning complete separability and an SI near 0 indicating inseparability. The SIs for the V1 neurons shown in Figure 8 were 0.64, 0.75, and 0.88, from left to right. For the MT neurons in Figure 8, the SIs were 0.58, 0.70, and 0.91. Across the populations, the distributions of SIs for V1 and MT are shown in Figure 9. For V1, the mean SI was 0.71 (0.15 SD), whereas in MT, the mean was 0.70 (0.10 SD). These means were not significantly different (t test, p > 0.2). If we consider only singular values that were significantly above the noise (permutation test, p < 0.05), then the mean SI becomes 0.80 (0.22 SD) for V1 and 0.86 (0.18 SD) for MT. Similar results on the separability of V1 and MT neurons have been obtained with sinusoidal grating stimuli (Foster et al., 1985; Priebe et al., 2003).
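A separability index of this kind can be computed directly from the singular values of a map. The sketch below assumes the commonly used definition, in which the squared first singular value is divided by the sum of all squared singular values; the exact formula should be checked against the cited papers, and the function name is hypothetical.

```python
import numpy as np

def separability_index(space_time_map):
    """Fraction of the map's power captured by the first singular vector pair.

    Assumed definition: SI = s[0]**2 / sum(s**2), with s the singular values.
    Undefined (division by zero) for an all-zero map.
    """
    s = np.linalg.svd(np.asarray(space_time_map, dtype=float), compute_uv=False)
    power = s**2
    return power[0] / power.sum()

# A map built as the outer product of a temporal and a spatial profile is separable.
temporal = np.array([1.0, 2.0, 3.0])
spatial = np.array([1.0, 0.0, -1.0])
print(separability_index(np.outer(temporal, spatial)))  # very close to 1
```

Under this definition a map with two equally weighted, orthogonal components gives an SI of 0.5, and noisier maps with many comparable singular values give lower values still.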
Orientation in I(Δs, τ) maps
A second way to examine the separability of the neurons is to examine the orientation of the Δs-τ maps. Neurons that change their preferred spatial displacement as a function of the interstimulus interval (τ) should show a slant in their Δs-τ maps. The degree of this slant can be computed with a tilt direction index (TDI), which describes the amount of slant from 0 to 1, with 0 indicating no slant and 1 indicating that the map is completely described by one direction of tilt (Anzai et al., 2001; Baker, 2001) (see Materials and Methods). Thus, the TDI should be inversely related to measures of separability, which was indeed the case in both V1 (p < 0.01) and MT (p < 0.002). This suggests that much of the inseparability found in the data was attributable to spatiotemporal slant.
For the V1 cells shown in Figure 8, the TDIs were, from left to right, 0.86, 0.52, and 0.10. The corresponding values for the MT cells were 0.85, 0.51, and 0.04. The distribution of TDIs is shown at the bottom of Figure 9 for V1 (left) and MT (right). The mean TDI for the population of V1 cells was 0.42 (0.22 SD) and that for MT cells was 0.39 (0.21 SD), which did not differ significantly (t test, p > 0.16).
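The exact TDI formula is given in Materials and Methods and is not reproduced in this section. The sketch below shows one plausible formulation, stated here as an assumption: take the amplitude at the peak of the two-dimensional discrete Fourier transform, compare it with the amplitude at the mirror-image frequency (same temporal frequency, opposite spatial frequency, i.e. the opposite tilt), and form a contrast ratio. The function name is hypothetical.

```python
import numpy as np

def tilt_direction_index(space_time_map):
    """Contrast between the DFT peak and its opposite-tilt mirror (assumed definition)."""
    m = np.asarray(space_time_map, dtype=float)
    amp = np.abs(np.fft.fft2(m))
    # Components with zero temporal or zero spatial frequency carry no tilt.
    amp[0, :] = 0.0
    amp[:, 0] = 0.0
    kt, ks = np.unravel_index(np.argmax(amp), amp.shape)
    peak = amp[kt, ks]
    mirror = amp[kt, (-ks) % m.shape[1]]   # same temporal, opposite spatial frequency
    total = peak + mirror
    return 0.0 if total == 0.0 else (peak - mirror) / total

# Hypothetical test patterns: rows are interstimulus intervals, columns are displacements.
t = np.arange(8)[:, None]
x = np.arange(16)[None, :]
slanted = np.cos(2 * np.pi * (t / 8 + 2 * x / 16))                 # energy at one tilt only
separable = np.sin(2 * np.pi * t / 8) * np.cos(2 * np.pi * 2 * x / 16)
print(tilt_direction_index(slanted), tilt_direction_index(separable))
```

With this formulation, a purely slanted pattern yields an index near 1, and a space-time separable pattern yields an index near 0 because its spectrum is symmetric across the two tilt directions.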
Modeling of velocity tuning
Neither of the previous two analyses tested any particular hypothesis about velocity tuning. Both simply looked for structure in the data that one might expect to find if the neurons were velocity tuned. This is a rather indirect way to examine velocity tuning, because a neuron such as the one shown in the top left panel of Figure 8 can show slant in Δs/τ space without being tuned to velocity. Indeed, there are many ways in which a neuron can exhibit inseparability without being tuned for velocity (Baker, 2001). We therefore took a third approach to velocity tuning, developing and testing specific models that predict particular spacetime structure for velocity-tuned neurons. These models can then be checked against the data to determine which accounts for more of the variance (Levitt et al., 1994; Priebe et al., 2003).
We performed such an analysis for our population of V1 and MT neurons. A separable model was derived from the spatial and temporal profiles (horizontal and vertical slices) through the peak of each map. These are shown for two example cells in Figure 10a. The left column shows the Δs/τ maps for two of the cells shown in Figure 8, with the dashed rectangles along the sides of the maps indicating the spatial and temporal profiles taken at the peaks of the maps. The separable prediction was then the outer product of these functions (middle column). The velocity-tuned prediction was computed by shifting each spatial profile by an appropriate amount to obtain a line of constant velocity (right column). To determine which model provided a better fit to the data, we computed the partial correlations between the models and the data (Levitt et al., 1994; Priebe et al., 2003).
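The model comparison can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the text says only that each spatial profile was shifted "by an appropriate amount", so the shift rule used here (slide the peak spatial profile so that its peak follows the line Δs = v·τ, with v taken from the map peak) is an assumption, as are the function names. The partial correlation is the standard first-order formula.

```python
import numpy as np

def separable_prediction(m):
    """Outer product of the temporal and spatial slices through the map peak."""
    it, js = np.unravel_index(np.argmax(m), m.shape)
    return np.outer(m[:, js], m[it, :])

def velocity_prediction(m, ds, tau):
    """Peak spatial profile shifted along a line of constant velocity (assumed rule)."""
    it, js = np.unravel_index(np.argmax(m), m.shape)
    v = ds[js] / tau[it]                       # velocity implied by the map peak
    rows = [np.interp(ds - v * (t - tau[it]), ds, m[it, :], left=0.0, right=0.0)
            for t in tau]
    return np.array(rows)

def corr(a, b):
    """Pearson correlation between two maps, treated as flat vectors."""
    return np.corrcoef(np.ravel(a), np.ravel(b))[0, 1]

def partial_correlation(data, model, other):
    """Correlation of data with `model` after removing what `other` explains."""
    r_m, r_o, r_mo = corr(data, model), corr(data, other), corr(model, other)
    return (r_m - r_o * r_mo) / np.sqrt((1.0 - r_o**2) * (1.0 - r_mo**2))

# Hypothetical velocity-tuned map: a Gaussian ridge along ds = 6 deg/s * tau.
ds = np.linspace(-1.0, 1.0, 41)                # spatial displacement, deg
tau = np.arange(1, 6) / 60.0                   # interstimulus interval, s (60 Hz assumed)
data = np.exp(-(ds[None, :] - 6.0 * tau[:, None])**2 / (2 * 0.15**2))
sep = separable_prediction(data)
vel = velocity_prediction(data, ds, tau)
print(partial_correlation(data, vel, sep))
```

For a synthetic map like this one, the velocity-tuned prediction reproduces the data almost exactly, so its partial correlation is close to 1. Real maps would also require a significance criterion on the two partial correlations, which is not sketched here.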
For the neuron in Figure 10a, the correlation between the velocity-tuned prediction and the data was 0.70, whereas the correlation for the separable prediction was 0.44. For the neuron in Figure 10b, the values were 0.29 and 0.83. Thus, the neuron in Figure 10a conforms to a simple model of velocity tuning, whereas the neuron in Figure 10b is tuned for a particular displacement in space and time.
Figure 11a (top row) shows the results for V1 and MT. In both areas, there was a preponderance of separable neurons. In V1, 9 of 113 neurons were significantly velocity tuned, whereas in MT, there were only 2 of 145 that were significantly velocity tuned. The rest were either significantly separable or could not be classified by this technique.
The lack of velocity tuning in MT is somewhat surprising given previous results with sinusoidal gratings (Perrone and Thiele, 2001; Priebe et al., 2003). These studies found that a subpopulation of MT neurons exhibited velocity tuning that was invariant over a range of spatial and temporal stimulus frequencies. One study (Priebe et al., 2003) found that velocity tuning increased with contrast, which may reconcile our findings with theirs. The stimuli in our experiment, although high in luminance contrast (99%), were extremely compact in space and time, so that the total contrast per stimulus was extremely low. Also, relative to receptive field size, the stimuli were much larger in V1 than in MT, which might explain why we find slightly more inseparability in V1.
We examined this issue by collecting additional maps from 30 MT neurons with the spot stimuli replaced with a pair of long bars (2.4°) oriented perpendicular to the preferred direction of each neuron. The stimulus sequence and analysis were otherwise identical, but the total contrast (power) delivered to the receptive field on each stimulus frame was greater by a factor of 100. Figure 11a (bottom row) shows a comparison of the nonparametric analysis for the 30 neurons. The panel on the left shows the results for the bar stimuli, which clearly indicate a shift for some neurons toward inseparability. This shift was significant in 10% (3 of 30) of the neurons (Fisher's rz transformation, p < 0.05). The panel on the right shows the results for the same neurons using the small spot stimuli. As in the larger sample, nearly all were significantly separable. Thus, it appears that MT neurons exhibit more velocity-tuned behavior at higher stimulus contrasts, as reported previously (Priebe et al., 2003).
The difference appears to be genuinely attributable to the higher contrast and not to the difference in the spatial structure of the spot and bar stimuli. When we tested 11 of these neurons with lower-contrast (12%) bars of the same length, the spacetime maps were once again separable. There was little difference between these maps and those obtained with spots (data not shown).
One of the primary reasons for the lack of spacetime interactions in our data was the strong preference of most neurons for short interstimulus intervals (τ). In fact, most of the neurons had very little response to spatial displacements of any magnitude when τ was more than two refreshes (33 ms) (Churchland and Lisberger, 2001). Consequently, the Δs-τ maps were often quite noisy. Noise tends to increase the TDI and decrease the separability of most neurons, so that our analyses probably overestimated the extent of the interactions. (The TDI increases because random noise has, on average, equal energy at all orientations, so that the position of the peak of the discrete Fourier transform is random. Consequently, the mean TDI for random noise is near 0.5. Similarly, random noise has a relatively flat distribution of singular vectors, leading to low separability.) This becomes clear when one examines the total responsiveness of the neurons as a function of τ [the value of the function V(
2; τ) described in Materials and Methods, at the peak of
2]. The normalized population averages shown in Figure 11b indicate that, on average, by τ = 50 ms, the responses had fallen to approximately one-third of their peak. In other words, V1 and MT do not exhibit invariant velocity tuning in Δs/τ space in large part because they simply do not respond to large values of τ.