Abstract
A recent study by Priebe et al. (2006) has shown that a small proportion (27%) of primate directionally selective, complex V1 neurons are tuned for the speed of image motion. In this study, I show that the weighted intersection mechanism (WIM) model, which was previously proposed to explain speed tuning in middle temporal neurons, can also explain the tuning found in complex V1 neurons. With the addition of a contrast gain mechanism, this model is able to replicate the effects of contrast on V1 speed tuning, a phenomenon that was recently discovered by Priebe et al. (2006). The WIM model simulations also indicate that V1 neuron spatiotemporal frequency response maps may be asymmetrical in shape and hence poorly characterized by the symmetrical two-dimensional Gaussian fitting function used by Priebe et al. (2006) to classify their cells. Therefore, the actual proportion of speed tuning among directional complex V1 cells may be higher than the 27% estimate suggested by these authors.
Introduction
How local visual motion is encoded by the nervous system has been a long-standing puzzle in the biological and visual sciences. One way of probing the motion processing capabilities of visual neurons is to test them with a range of spatial and temporal frequencies using moving sine-wave gratings. This technique can be used to map out the spatiotemporal frequency (STF) response profiles of neurons. This has been done with middle temporal (MT) neurons (Perrone and Thiele, 2001; Priebe et al., 2003), and recently Priebe et al. (2006) applied this technique to monkey V1 neurons. They found that some directionally selective, complex V1 neurons are “speed tuned.”
Previous studies of the motion sensitivities of visual neurons classified a cell as being speed tuned if it responded selectively to a particular speed when tested with a “broadband” stimulus, such as a moving bar or edge. Using this criterion, it was found that the majority of MT cells are speed tuned (Maunsell and Van Essen, 1983). When the study of neural motion processing was extended to the STF (Fourier) domain (Perrone and Thiele, 2001; Priebe et al., 2003), the definition of speed tuning was made more specific. Hence, for a neuron to be truly speed tuned, the spatial frequency (sf) and temporal frequency (tf) that stimulate it the most should be related by the following equation: tf = v × sf, where v is a constant (equal to the optimum grating speed). When plotted in the form of an STF tuning surface (see Fig. 2b), the output of a speed-tuned neuron has a peak of maximum activity that forms an oriented ridge with slope v. A line drawn through the peak regions passes through the origin when plotted on linear axes (Perrone and Thiele, 2001).
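As a small illustration of this definition, the following Python sketch computes the temporal frequency that a perfectly speed-tuned unit would prefer at each spatial frequency, using tf = v × sf. The function names and the 2°/s example speed are illustrative choices and are not taken from the original analysis.

```python
def preferred_tf(sf_cpd, speed_deg_per_s):
    """Temporal frequency (Hz) lying on the iso-speed line tf = v * sf."""
    return speed_deg_per_s * sf_cpd


def grating_speed(sf_cpd, tf_hz):
    """Speed (deg/s) of a drifting grating with the given sf (c/deg) and tf (Hz)."""
    return tf_hz / sf_cpd


if __name__ == "__main__":
    v = 2.0  # illustrative optimum speed in deg/s
    for sf in (1.0, 2.0, 4.0, 8.0):
        tf = preferred_tf(sf, v)
        # Every (sf, tf) pair on the ridge corresponds to the same grating speed.
        print(f"sf = {sf} c/deg -> preferred tf = {tf} Hz, speed = {grating_speed(sf, tf)} deg/s")
```

On linear axes these (sf, tf) pairs fall on a straight line through the origin with slope v, which is the oriented ridge described above.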
Priebe et al., (2006) examined the STF response maps of V1 neurons under a range of stimulus conditions and developed a “speed tuning index” (ξ), which was designed to quantify the degree of speed tuning present in their sample of V1 cells (see Materials and Methods). They discovered that a shift from high- to low-contrast stimuli often reduced the degree of speed tuning in a particular neuron. The STF response maps of the neurons underwent a systematic change; STF maps that were not speed tuned (separable) under low-contrast conditions became more speed tuned (inseparable) at higher contrast levels.
Priebe et al., (2006) suggested that the STF response map changes they observed in their V1 and MT neurons (Priebe et al., 2003) could be a result of some unspecified mechanism that altered the contrast gain in various parts of the response field of the neurons. We have previously proposed a model, the weighted intersection mechanism (WIM), that outlines how separable, “non-speed tuned” STF response fields can be transformed into inseparable speed-tuned fields (Perrone and Thiele, 2002; Perrone, 2004, 2005). In this study, I will show that this model is able to replicate key aspects of the V1 speed-tuning data of Priebe et al., (2006).
Materials and Methods
The WIM model.
The main details of the model were described previously (Perrone and Thiele, 2002; Perrone, 2004, 2005). The description here concentrates on the features of the WIM model used in the present simulations that differ from the published version. We have shown that the oriented STF response surfaces found in MT neurons can be generated from two V1 neurons, one with low-pass temporal frequency tuning (S) and another with bandpass temporal frequency tuning (T) (Perrone and Thiele, 2002). In the time domain, the S type has a unimodal temporal response profile that extends for the duration of the stimulus (sustained), and the T type has a biphasic profile with the response primarily at stimulus onset and offset (transient). The spatiotemporal energy (Adelson and Bergen, 1985; Watson and Ahumada, 1985) outputs from the S and T neurons are combined using the following equation: $\mathrm{WIM}(sf, tf) = \dfrac{\log(\phi T + S + \alpha)}{|\log \phi T - \log S| + \delta}$ (Eq. 1). The overall S and T responses are determined by the multiplicative combination of their separate temporal and spatial frequency sensitivity functions (see below, Spatial frequency tuning). The α and δ parameters are constants that control the overall tuning of the WIM sensor (Perrone and Thiele, 2002). The α parameter can be used to control the range of spatial frequencies that the sensor will respond to, and the δ parameter is used to set the gain and speed tuning bandwidth of the sensor. The optimum speed tuning for the sensor can be controlled using φ (Perrone, 2005). For Figure 1 simulations, the values of α, δ, and φ were 0.1, 0.7, and 1.0, respectively. The peak spatial frequency tuning of the S neuron was set at 2.77 cycles/degree (c/deg). For Figure 2 and 3 simulations, the peak spatial frequency tuning of the S neuron was set to 2.0 c/deg, and α, δ, and φ were set to 0.5, 2.4, and 0.5 (Fig. 2) and 1.0, 1.8, and 1.0 (Fig. 3), respectively.
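The combination rule can be sketched in Python as follows. This is a minimal sketch that assumes Equation 1 has the form WIM = log(φT + S + α)/(|log φT − log S| + δ), which is the form implied by the limiting expression log(φT + S + α)/δ given in Results. The use of the natural logarithm and the function name are assumptions, and the default parameter values are those listed for the Figure 1 simulations.

```python
import math


def wim_response(S, T, alpha=0.1, delta=0.7, phi=1.0):
    """Assumed form of Eq. 1; S and T are positive energy outputs.

    The natural logarithm is an assumption (the base is not stated in the text).
    """
    numerator = math.log(phi * T + S + alpha)
    # The denominator is smallest (equal to delta) when phi * T == S,
    # so the output is largest along the line where the two inputs intersect.
    denominator = abs(math.log(phi * T) - math.log(S)) + delta
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative energies only, not values from the simulations.
    print(wim_response(S=1.0, T=1.0))   # phi * T equals S
    print(wim_response(S=1.0, T=0.25))  # phi * T differs from S
```

With equal inputs the denominator collapses to δ, which is why δ sets the gain and the sharpness of the ridge in this sketch.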
Temporal frequency tuning.
In previous versions of the model, the low-pass S-neuron temporal frequency tuning function was based on a function derived by Watson (1986). A simpler function, based on a Gaussian, was used in the simulations reported in this study. In the frequency domain, the equation used was as follows: $\tilde{f}_{sust}(tf) = \exp(-0.5\,tf^{2}\sigma^{2})\cos(2\pi\,tf\,\theta) - \exp(-0.5\,tf^{2}\sigma^{2})\sin(2\pi\,tf\,\theta)\,i$, where tf is the temporal frequency measured in hertz and i is the imaginary unit.
The T-neuron temporal frequency tuning function is bandpass in shape and is given by the following equation: $\tilde{f}_{trans}(tf) = k\,\tilde{f}_{sust}(tf)\,tf\,i$. The magnitudes of both of these functions are good matches (Perrone, 2005) to the temporal frequency tuning functions often observed in V1 neurons (Foster et al., 1985; Hawken et al., 1996). For all of the simulations reported here, σ = 0.06, θ = 0.07, and k = 0.25.
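The two temporal frequency functions can be evaluated directly with complex arithmetic. The sketch below assumes the exponent in the sustained function reads −0.5·tf²·σ² and uses the parameter values quoted above. The function names are illustrative.

```python
import math

SIGMA = 0.06  # Gaussian width parameter
THETA = 0.07  # phase (delay) parameter
K = 0.25      # scaling constant for the transient function


def f_sust(tf):
    """Low-pass (sustained) temporal frequency function; returns a complex value."""
    envelope = math.exp(-0.5 * tf ** 2 * SIGMA ** 2)
    phase = 2.0 * math.pi * tf * THETA
    return envelope * (math.cos(phase) - 1j * math.sin(phase))


def f_trans(tf):
    """Bandpass (transient) function: the sustained function scaled by k * tf * i."""
    return K * f_sust(tf) * tf * 1j


if __name__ == "__main__":
    for tf in (0.25, 0.5, 1, 2, 4, 8, 16, 32):
        # Only the magnitudes enter the energy terms S and T.
        print(tf, abs(f_sust(tf)), abs(f_trans(tf)))
```

Because the cosine and sine terms only contribute phase, the magnitude of the sustained function is the Gaussian envelope alone, and the magnitude of the transient function is that envelope multiplied by k·tf.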
Spatial frequency tuning.
The spatial frequency tuning functions used in the WIM model are based on the difference of difference of Gaussians with separation function used by Hawken and Parker (1987) to fit their V1 spatial frequency tuning data (Perrone, 2004). The T-neuron spatial frequency tuning function, ũtrans(sf), differs in a special way from the S-neuron spatial frequency function, ũsust(sf), so that when they are combined with the S and T temporal frequency tuning functions, a WIM sensor is generated that has an oriented (inseparable) STF response surface (Perrone and Thiele, 2002). For all of the model simulations in this study, the sustained and transient spatiotemporal energy (S and T in Eq. 1) was determined from the combined magnitudes of the spatial and temporal frequency functions [i.e., S(sf, tf) = |ũsust(sf)| × |f̃sust(tf)| and T(sf, tf) = |ũtrans(sf)| × |f̃trans(tf)|].
Contrast sensitivity.
Previously published versions of the WIM model assumed that the contrast of the stimulus was 100%, and no mechanism was included to allow for any effects of contrast. For the simulations in this study, an additional component was added to the model so that the effect of stimulus contrast could be assessed. The gain of the S and T input neurons (see Fig. 2a) was controlled using a modified Naka-Rushton equation: gain = pc/(c + s), where c is the contrast of the grating, p is the peak response, and s is the semi-saturation constant (Thompson et al., 2006). For Figure 1–3 simulations, the p and s values used in the gain equation for the S neuron were 2.6 and 2.0, respectively. For the T neuron, p and s were set to 1.0 and 0.1, respectively. The s values are all in the range of the physiologically determined estimates for the average semi-saturation constants of parvocellular and magnocellular cells (Kaplan and Shapley, 1986) and for V1 neurons (Sclar et al., 1990). Similar contrast sensitivity functions to those shown in Figure 2a have been used successfully to model human perceptual effects of contrast on speed perception (Thompson et al., 2006).
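The gain equation is simple enough to check numerically. The sketch below applies gain = pc/(c + s) with the S- and T-neuron parameters quoted above. Expressing contrast as a fraction (0.32 for 32%) is an assumption, because the units of c are not stated here, and the function and constant names are illustrative.

```python
def contrast_gain(c, p, s):
    """Modified Naka-Rushton gain: p * c / (c + s)."""
    return p * c / (c + s)


# Parameter values quoted in the text for the two input neuron types.
S_PARAMS = {"p": 2.6, "s": 2.0}  # sustained input
T_PARAMS = {"p": 1.0, "s": 0.1}  # transient input

if __name__ == "__main__":
    for c in (0.04, 0.08, 0.32, 1.0):  # contrast as a fraction (assumption)
        g_s = contrast_gain(c, **S_PARAMS)
        g_t = contrast_gain(c, **T_PARAMS)
        # The T/S ratio changes with contrast, which shifts the balance
        # between the two inputs to the WIM sensor.
        print(f"c = {c}: S gain = {g_s:.3f}, T gain = {g_t:.3f}, T/S = {g_t / g_s:.3f}")
```

Under this assumption the T gain saturates at much lower contrast than the S gain, so the relative weight of the two inputs depends on stimulus contrast.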
Fitting functions and STF sampling.
The model STF response surfaces were fit using the same function adopted by Priebe et al. (2006; their Eq. 3). It is a modified two-dimensional Gaussian function in which the preference for temporal frequency can be made to depend on the stimulus spatial frequency. The WIM model outputs were fit with this function using the “nlinfit” function in MATLAB (MathWorks, Natick, MA). The main estimated parameter of the fitted function is ξ (the exponent of a power-law relationship between preferred temporal frequency and the stimulus spatial frequency). The value of ξ can range from 0 (no speed tuning) to 1.0 (“perfect” speed tuning). A neuron with no speed tuning has STF response surface contours with major axes that are aligned with the spatial- and temporal-frequency axes. When plotted on linear axes, the major axis of the peak STF response-surface contour is vertical, and this type of STF surface is commonly described as being “separable.” Under the rating system of Priebe et al. (2006), a neuron with perfect speed tuning (ξ = 1) has a tilted STF response map (inseparable), and each spatial frequency is tuned to the same speed (form invariance). This latter requirement is an important part of the definition of speed tuning, because a neuron can have an inseparable (tilted) STF response map and still not be speed tuned. For the simulations shown in Figure 1, the spatial frequency was sampled at 1, 2, 4, and 8 c/deg. The temporal frequency was sampled at 0.25, 0.5, 1, 2, 4, 8, 16, and 32 Hz. This was designed to match the Log2 sampling of frequency space used by Priebe et al. (2006). For Figure 2, b and c, the spatial frequency ranged from 0 to 4 c/deg in 0.25 c/deg steps. The temporal frequency ranged from 0 to 20 Hz in 0.25 Hz steps. For Figure 3a, the spatial frequency ranged from 0.25 to 8 c/deg in 0.25 c/deg steps, and the temporal frequency ranged from 0.25 to 32 Hz in 0.25 Hz steps.
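The sampling schemes listed above can be written out explicitly, which makes the difference in the number of test points per map easy to see. The helper names below are illustrative and the sketch covers only the frequency grids, not the fitting step.

```python
def log2_grid(start, stop):
    """Values from start to stop (inclusive) in octave (factor-of-2) steps."""
    values = []
    v = start
    while v <= stop:
        values.append(v)
        v *= 2
    return values


def linear_grid(start, stop, step):
    """Values from start to stop (inclusive) in equal linear steps."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


if __name__ == "__main__":
    # Figure 1: Log2 sampling.
    fig1_sf = log2_grid(1, 8)        # c/deg
    fig1_tf = log2_grid(0.25, 32)    # Hz
    # Figure 3a: linear sampling in 0.25 steps over the same upper limits.
    fig3_sf = linear_grid(0.25, 8, 0.25)
    fig3_tf = linear_grid(0.25, 32, 0.25)
    print(len(fig1_sf) * len(fig1_tf), "points in the Log2 grid")
    print(len(fig3_sf) * len(fig3_tf), "points in the linear grid")
```

The linear grid places far more of its samples at high frequencies than the Log2 grid does, which is relevant to how much of the outer part of a response map enters a fit.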
Contrast gain.
To simulate the contrast gain plots of Priebe et al., (2006), we followed their convention of taking the ratio (high contrast/low contrast) of the two response maps. The resulting contrast gain map is divided into quadrants with the origin corresponding to the peak of the low-contrast map. In Figure 1c, the origin lies at 2.0 c/deg and 2.0 Hz. Following Priebe et al., (2006), the mean contrast gain is found for each quadrant, and these values are normalized by the mean contrast gain across the entire response field.
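The quadrant analysis can be sketched as follows for response maps stored as nested lists indexed [tf][sf]. This is a minimal sketch of the procedure as described above. Excluding samples that lie exactly on the quadrant boundaries is an assumption, and the function name and toy maps are hypothetical.

```python
def quadrant_contrast_gain(high, low, sfs, tfs):
    """Normalized mean contrast gain per quadrant around the low-contrast peak.

    high, low: response maps indexed [tf_index][sf_index]; low must be nonzero.
    Samples lying on the quadrant boundaries are skipped (an assumption).
    """
    cells = [(i, j) for i in range(len(tfs)) for j in range(len(sfs))]
    ratio = {(i, j): high[i][j] / low[i][j] for i, j in cells}
    # Origin of the quadrants: location of the peak of the low-contrast map.
    pi, pj = max(cells, key=lambda ij: low[ij[0]][ij[1]])
    overall_mean = sum(ratio.values()) / len(ratio)
    groups = {"NE": [], "NW": [], "SE": [], "SW": []}
    for i, j in cells:
        if i == pi or j == pj:
            continue
        name = ("N" if tfs[i] > tfs[pi] else "S") + ("E" if sfs[j] > sfs[pj] else "W")
        groups[name].append(ratio[(i, j)])
    return {q: (sum(v) / len(v)) / overall_mean if v else float("nan")
            for q, v in groups.items()}


if __name__ == "__main__":
    # Toy 3 x 3 maps, made up purely to exercise the function.
    sfs, tfs = [1, 2, 4], [1, 2, 4]
    low = [[1, 1, 1], [1, 2, 1], [1, 1, 1]]
    high = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
    print(quadrant_contrast_gain(high, low, sfs, tfs))
```

Here north and east denote higher temporal and spatial frequency, respectively, relative to the low-contrast peak.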
Results
V1 STF response surfaces and the effect of contrast
The motion sensors in the WIM model have STF response surfaces that closely match the maps found by Priebe et al., (2006) for their directional, complex V1 neurons (Fig. 1a). The different types of STF response surfaces (separable to inseparable) apparent in the Priebe et al., (2006) data set were easily replicated by varying the contrast of the stimulus, the peak spatial frequency tuning of the WIM sensors, and/or their optimum speed tuning.
Priebe et al., (2006) found that when they reduced the contrast of their grating stimuli from 32 to 8%, the STF response surface for a particular neuron became less oriented (more separable) with a concomitant downward shift in the temporal frequency value of the peak. They also found that “contrast gain” contour plots that show the ratio of the two fields (32% contrast/8% contrast) tended to have peak values in the top right and bottom left part of the maps. The WIM model is able to replicate both of these results (Fig. 1). The values of ξ for Figure 1, a and b, response maps are 0.42 and 0.1, respectively, which is an exact match to the mean values found by Priebe et al., (2006) for their sample of complex V1 neurons.
For Figure 1c, the mean normalized contrast gain values (see Materials and Methods) for the northeast and southwest quadrants were 1.61 and 4.12, respectively. For the northwest and southeast quadrants, the values were 0.02 and 0.67, respectively. The result of larger values in the northeast and southwest quadrants is consistent with the trend found by Priebe et al., (2006) over their sample of complex V1 cells.
Origin of the contrast effects
The WIM model also offers an explanation for why Priebe et al., (2006) obtained their contrast effects. A reduction in contrast changes the relative magnitude of the S and T outputs (Fig. 2a), which alters the value of φ in Equation 1. This drives the sensor to a lower optimum speed (Perrone, 2005), and so the STF response surface for 8% contrast has a peak with a temporal frequency value that is shifted downward relative to the 32% contrast condition (Fig. 1a,b). This explains one of the trends (a downward shifted peak) noticed by Priebe et al., (2006) in their V1 data.
In addition, as the size of the S and T energy outputs drops (as a result of decreasing contrast), the WIM output tends toward log(φT + S + α)/δ. At high contrast levels, this only happens when φT = S (i.e., when the relationship tf = v × sf holds) (Fig. 2b). However, at low contrast levels, the WIM output tends toward log(φT + S + α)/δ for cases in which tf ≠ v × sf, because both S and T are very small and thus abs(logφT − logS) ≅ 0 (see Eq. 1). An STF response map for very low (4%) contrast inputs [WIM(sf,tf) ≅ log(φT + S + α)/δ] is shown in Figure 2c. It is approximately separable (Perrone and Thiele, 2002), and it accounts for the shift noted by Priebe et al. (2006) toward separable response maps for their low-contrast condition.
Detecting speed tuning in STF response maps
While performing the model simulations, we were often surprised at the low values of ξ that were generated for some STF response maps that we knew to be definitely speed tuned. To investigate this more fully, we tested the WIM sensors with a greater range of spatial and temporal frequencies. Figure 3a shows the STF response map for a WIM sensor tuned to 2°/s and tested at 32% contrast. It has been plotted on log axes using the convention adopted by Priebe et al., (2006). However, the spatial and temporal frequencies tested were in linear steps (0.25 c/deg, 0.25 Hz), and the shading has been removed to better visualize the structure of the maps.
There is a ridge of peak activity in the central part of the map that clearly lies along the 2°/s iso-speed line (Fig. 3a, dashed line). The best-fitting, two-dimensional Gaussian function map (see Materials and Methods) is shown in Figure 3b. The Priebe et al. (2006) index, ξ, for the Figure 3a map is only 0.12 (i.e., the sensor is non-speed tuned according to their criterion). The fitting function used by Priebe et al. (2006) is not isolating the speed-tuned central region of the STF response map of the sensor. The reason for this is that the fitting function is symmetrical in log–log space. The WIM sensor map is not symmetrical; there are regions away from the tilted central portion that are almost separable (Fig. 3a, top and bottom). To accommodate these regions, the fitting function ends up rotated counterclockwise and closer to the vertical.
The problem with the fits is also apparent in Figure 3c, in which the output of the WIM sensor is plotted against the speed of the moving grating for three different spatial frequencies (1, 2, and 8 c/deg). Each spatial frequency is tuned to the same speed (2°/s), and the peaks of the curves (Fig. 3c, solid lines) all line up at this speed. One would therefore expect this sensor to have a ξ of 1.0 (rather than 0.12). It is apparent from the misalignment between the model data curves and the best-fitting Gaussian function curves (Fig. 3c, dashed lines) that the asymmetry in the data and the peaked nature of the data curves are causing problems for the fitting procedure.
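The alignment of peaks described here can be checked without any curve fitting. The sketch below converts each row of an STF map (one spatial frequency) into a response-versus-speed curve using speed = tf/sf and tests whether all rows peak at the same speed. The function names and the toy responses are hypothetical and are not model output.

```python
def preferred_speed(sf, tfs, responses):
    """Speed (deg/s) at which the response is largest for one spatial frequency."""
    best_tf, _ = max(zip(tfs, responses), key=lambda pair: pair[1])
    return best_tf / sf


def peaks_align(rows, tfs, tolerance=1e-9):
    """True if every spatial frequency peaks at the same speed.

    rows maps spatial frequency (c/deg) to a list of responses over tfs (Hz).
    """
    speeds = [preferred_speed(sf, tfs, resp) for sf, resp in rows.items()]
    return max(speeds) - min(speeds) <= tolerance


if __name__ == "__main__":
    tfs = [1, 2, 4, 8, 16]
    # Made-up responses whose peaks lie on the 2 deg/s line (tf = 2 * sf).
    tuned = {1.0: [0.2, 0.9, 0.3, 0.1, 0.0],
             2.0: [0.1, 0.3, 0.9, 0.2, 0.1],
             8.0: [0.0, 0.0, 0.1, 0.2, 0.9]}
    print(peaks_align(tuned, tfs))
```

A check of this kind depends only on where the peaks fall, so it is not affected by asymmetry in the flanks of the curves.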
This analysis shows that the particular fitting function adopted by Priebe et al., (2006) is unsuited for some STF response surfaces. The tests also indicate that the value of the ξ index used by Priebe et al., (2006) is likely to be very sensitive to the range of the spatial and temporal frequencies sampled. The size of ξ is expected to be influenced by just how much of the asymmetrical part of the response map is included in the analysis. This prediction was verified experimentally by trying different sampling schemes for the test spatial and temporal frequencies. Log2 sampling over the same range (see Materials and Methods) increased ξ to 0.36 (from 0.12 for linear sampling). When linear sampling was used and the highest temporal frequency was decreased from 32 to 16 Hz, the value of ξ increased to 0.72. With less of the separable region at the top of the map (Fig. 3a), the fitting function showed less of a shift to the vertical.
All of these results indicate that the ξ speed tuning index adopted by Priebe et al., (2006) is very sensitive to the type of sampling scheme used when the STF response map is not two-dimensional Gaussian in form. We have already shown that MT neuron STF maps can be well fit using the WIM model (Perrone and Thiele, 2002), and so some MT maps are likely to have asymmetries similar to those apparent in Figure 3a. There is also a possibility that some V1 neuron STF maps have similar asymmetries. As long as that possibility exists, the statistical technique of Priebe et al., (2006) cannot be considered a reliable method for assessing the extent of speed tuning in V1 (or MT).
Discussion
Evidence for the WIM model
The discovery of speed-tuned neurons in V1 by Priebe et al., (2006) is a significant breakthrough and goes a long way toward revealing the transformations that occur at different stages of the visual system. The close match between the Priebe et al., (2006) data and the model output supports the idea that a WIM-like process (Perrone and Thiele, 2002) may be at work within V1 itself. The high- to low-contrast stimuli results of Priebe et al., (2006) (replicated in Fig. 1) suggest the involvement of at least two classes of neurons in the development of speed-tuned neurons.
The WIM model simulations show that the speed-tuned V1 neurons detected by Priebe et al., (2006) could arise from the combination of two separate classes of neurons (also within V1): nondirectional neurons with low-pass temporal frequency tuning and directional neurons with bandpass temporal frequency tuning. The simulations also suggest that the former class has contrast sensitivity functions that match those found in parvocellular-projecting ganglion cells (Kaplan and Shapley, 1986). The latter class (transient type) has a saturating contrast sensitivity function (Fig. 2a) similar to that found in magnocellular-projecting ganglion cells (Kaplan and Shapley, 1986).
These two neuron classes (both with separable STF tuning) could act together to form a separate category of speed-tuned neurons with inseparable STF tuning (Perrone and Thiele, 2002). We would expect all of the neurons that are formed from the combination of these two classes (using something like the WIM model rule given by Eq. 1) to be speed tuned. Our hypothesis that there is an interaction between parvo- and magno-type neurons within V1 is not new. For example, De Valois and Cottaris (1998) looked at V1 neuron properties in the space–time domain (as opposed to the frequency domain considered here) and mapped out the spatiotemporal receptive fields of V1 neurons using flashed bars. They demonstrated that spatiotemporal-oriented cells (directional cells equivalent to the T units in the WIM model) could be constructed from subunits with sustained (parvo-like) and transient (magno-like) temporal frequency tuning properties. In addition to the neurons considered by De Valois and Cottaris (1998), the WIM model makes use of another broad class of V1 neurons that are nondirectional (i.e., lacking spatiotemporal orientation) and that have parvo-like (low-pass) temporal frequency tuning. This latter class would be the equivalent to the S units of the model.
The amount of speed tuning in V1 and MT
The WIM model simulations reveal that it is difficult to correctly measure the STF response surfaces of neurons; inadequate sampling of the frequency space will often fail to reveal the true structure of the response maps, and measures such as the ξ statistic used by Priebe et al., (2006) do not accurately capture the actual underlying speed tuning. The speed-tuning curves of MT neurons obtained with moving bars are quite peaked with concave regions on either side of the peak (Maunsell and Van Essen, 1983; Lagae et al., 1993). The WIM model sensors were designed to have similar “peaky” speed-tuning curves [Perrone and Thiele (2002), their Fig. 5]. As shown in the Figure 3 simulations, logarithmic Gaussians do a poor job of fitting such peaked functions and tend to produce an underestimate of the actual amount of speed tuning.
This may account for some of the controversy that has arisen over the extent of speed tuning in primate MT (Perrone and Thiele, 2001; Priebe et al., 2003) and in the pigeon accessory optic system (Crowder et al., 2003; Winship et al., 2006). The simulations reported in this study raise the possibility that the estimates for the proportion of speed-tuned cells in V1 (Priebe et al., 2006) and MT (Priebe et al., 2003) are likely to be on the conservative side; speed tuning may be more prevalent in these areas than some of the current data suggest.
Footnotes
- I thank the two anonymous reviewers for their helpful suggestions and feedback regarding this manuscript.
- Correspondence should be addressed to John A. Perrone, Department of Psychology, The University of Waikato, Private Bag 3105, Hamilton 3240, New Zealand. jpnz@waikato.ac.nz