Abstract
Direction selective neurons in macaque primary visual cortex are narrowly tuned for orientation, and are thus afflicted by the aperture problem. At the next stage of motion processing, in the middle temporal (MT) area, some cells appear to solve this problem, responding to the pattern motion direction of plaids. Models have been proposed to account for this computation, but they do not replicate the diversity of responses observed in MT. We recorded from 386 cells in area MT of two male macaques, while presenting a wide range of random-line stimuli and their compositions into noise plaids. As we broadened the range of stimuli used to probe the cells, yielding ever more challenging conditions for extracting pattern motion, the diversity of the responses observed increased, and the fraction of cells that faithfully encoded pattern motion direction shrank. However, we show here that a pattern motion signal is present at the population level. We identified four mechanisms, one never proposed before, that together might account for the observed diversity in single-cell responses. Pattern motion is thus extracted in area MT, but it is encoded across the population, and not in a small subset of pattern neurons.
SIGNIFICANCE STATEMENT Some neurons in the middle temporal area of macaques solve the aperture problem, signaling the direction of motion of complex patterns. As the number of pattern types used to probe this mechanism is increased, fewer and fewer cells retain this capability. We show here that different cells fail in different ways, and that simply summing their responses averages away their failures, yielding a clear pattern motion signal. Similar encodings, which unequivocally violate the “neuron as a feature detector” hypothesis that has dominated sensory processing theories for the past 50 years, might apply throughout the brain.
Introduction
Some neurons in primary visual cortex (area V1), ∼20% in macaques (De Valois et al., 1982; Hawken et al., 1988; Hamilton et al., 1989; Prince et al., 2002), are selective for the direction of motion of visual stimuli. Such neurons are also narrowly tuned for stimulus orientation (Hubel and Wiesel, 1962) and are thus afflicted by the “aperture problem” (Fennema and Thompson, 1979; Marr and Ullman, 1981): They are only sensitive to the component of motion orthogonal to their preferred orientation, thus providing potentially misleading information about motion direction.
Some neurons (known as component cells) in the middle temporal (MT) area, an area that receives strong and direct projections from area V1 (Cragg, 1969; Maunsell and van Essen, 1983; Movshon and Newsome, 1996), respond to moving stimuli much like V1 cells. However, others (called pattern cells) are sensitive to the 2-D direction of motion of complex patterns (Movshon et al., 1985), thus appearing to have solved the aperture problem. Most MT cells do not unambiguously carry either signal.
The rules by which MT pattern cells might combine their V1 inputs have been extensively studied (Kawakami and Okamoto, 1996; Simoncelli and Heeger, 1998; Perrone, 2004; Rust et al., 2006; Nishimoto and Gallant, 2011; Quaia et al., 2016). Most experimental studies probed MT neurons with patterns obtained by summing two sinusoidal gratings drifting at the same speed in different directions (typically 120° apart), so-called Type I plaids. Recently, unikinetic plaids, obtained by summing a drifting and a static grating of different orientations, have also been used (Khawaja et al., 2013; Wallisch and Movshon, 2019). The direction of motion of a unikinetic plaid is parallel to the orientation of the static grating, a stimulus that when presented in isolation induces little response in most MT cells (Albright, 1984; Wallisch and Movshon, 2019). These studies revealed a large heterogeneity of responses, with only a few MT cells faithfully encoding pattern motion direction with both types of plaids. It has thus been suggested that a true pattern motion computation might be finalized only in the medial superior temporal (MST) area, the next stage of motion processing, where many cells indeed encode pattern motion direction for both Type I and unikinetic plaids (Khawaja et al., 2013).
The idea that an additional processing step is needed rests on the concept of neurons as feature detectors, under which cells that encode neither component nor pattern direction are a nuisance, an indicator of work in progress. An alternative view is that computations are distributed across populations of neurons, and fully revealed only at the aggregate (i.e., population) level.
To test this hypothesis, we recorded from 386 neurons in monkey area MT, while presenting a large variety of plaid stimuli. In addition to Type I and unikinetic plaids, we also presented plaids in which a drifting one-dimensional (1-D) noise pattern is paired with a 1-D noise pattern of different orientation that is created anew for each frame (i.e., it flickers in place). This stimulus, which we call a flicker plaid, induces reflexive eye movements (Quaia et al., 2016) that are identical to those seen when the flickering pattern is replaced by a static one having the same orientation (forming a unikinetic plaid). This artificial stimulus represents an even stronger challenge for models of pattern computation, as flicker plaids are not associated with a well-defined pattern motion direction, and yet the primate brain appears to assign one to them.
As expected, we found that single-unit responses are heterogeneous and do not lie on a simple continuum between “pattern” and “component” responses. However, we also found that area MT robustly encodes the pattern direction of all plaids at the population level, possibly accounting even for the reflexive eye movement responses to flicker plaids. It thus appears that the computation of pattern direction is completed in area MT, and downstream areas can read out this signal by linear pooling of MT responses.
Materials and Methods
Electrophysiology
We recorded extracellular spiking activity from 2 male rhesus monkeys (Macaca mulatta). Surgery under general anesthesia was performed in each monkey to implant a head post and a recording chamber over a craniotomy providing access to area MT on one side of the brain. During each recording session, the monkey was awake and had its head restrained by means of a head post. While the monkey passively fixated a central cross (for which it was periodically rewarded with drops of water or fruit juice), stimuli were presented in fast succession (500 ms presentations separated by 100 ms blank intervals with a mid-gray screen) and neural activity was recorded from area MT using multicontact linear electrodes (V-probes, Plexon, 24 contacts with a spacing of 50 µm). Analog electrical signals from each electrode were digitized at 40 kHz and stored to disk for offline analysis.
All procedures were performed in accordance with the U.S. Public Health Service Policy on the humane care and use of laboratory animals, and all protocols were approved by the National Eye Institute Animal Care and Use Committee.
Spike sorting
Only well-isolated units were used for the analyses reported here. This required careful spike sorting of the signals recorded from the multicontact electrodes. First, spikes were detected using a voltage threshold applied to bandpass filtered signals. Triggered waveforms were projected into spaces defined by features of the waveform, PCA components, or similarity to a template. As previously described in many studies, spike sorting yield and quality were substantially improved by treating sets of three or four adjacent contacts as n-trodes. Since fully automated spike sorting methods do not currently have the required accuracy, we used a custom-developed interactive method, in which cluster boundaries were estimated with a Gaussian mixture model and then verified or adjusted manually. Ambiguous cases were discussed by two users; and if a consensus was not reached, the putative unit in question was discarded.
Visual stimuli
The data presented here were part of a larger study, under which several types of visual stimuli were presented while recording from area MT. In this paper, we focus on the responses evoked in MT neurons by seven classes of stimuli. Four classes of stimuli have a single component.
1-D random line stimulus (1-D noise), drifting in one of 12 directions, spaced uniformly around the circle (i.e., 30° apart). Stimulus direction is assigned according to the following convention: 0° corresponds to a stimulus in which the lines are vertical and drift to the right. Angles increase in a counterclockwise direction (90° is up, 180° is to the left, and 270°/–90° is down). The drifting speed orthogonal to the orientation of the lines was 10°-15°/s. We graphically represent these stimuli with a solid arrow, pointing in the direction of motion. A 0° drifting 1-D noise pattern is thus represented by a horizontal arrow pointing to the right (Fig. 1).
Two-dimensional (2-D) random dot stimuli (2-D noise), drifting at the same speed and in the same directions as the 1-D noise patterns described above. We represent these stimuli using a dotted arrow, pointing in the direction of motion.
Static 1-D noise patterns, having one of 6 orientations, spaced every 30° from –150° to 180°. In a 0° static pattern, the lines are vertical, and thus have the same orientation as a 0° or 180° drifting 1-D noise pattern. We represent these stimuli using a solid line, parallel to their orientation (i.e., a vertical line for a 0° static pattern).
Flickering 1-D noise patterns (i.e., 1-D noise patterns generated anew on each frame), having the same orientations as the static patterns just described. Their graphical representation is a dashed line, parallel to the orientation of their lines.
We then have three classes of plaid stimuli. These are obtained by summing two components from the 1-D noise classes above (drifting, static, or flickering). We number the two components and plot the response to the plaid as a function of the direction of motion of the first component, which is always a drifting 1-D noise pattern. This direction is not the same as the pattern motion direction of the plaid. This departure from the convention used in previous studies to identify plaids based on their pattern motion direction is motivated by flicker plaids (described below) not having a well-defined pattern motion direction. Our three plaid classes are as follows:
Type I plaids, obtained by summing two 1-D noise patterns, having directions of motion 120° apart. The second component is rotated 120° clockwise relative to the first; thus, in a 0° Type I plaid, the first component is a 0° drifting 1-D noise pattern (i.e., its lines are vertical and drift to the right) and the second is a 240°/–120° drifting 1-D noise pattern (i.e., its lines are slanted from top left to bottom right, and drift down and to the left). Its pattern direction is –60° (Fig. 1; blue arrows indicate drifting direction of each component, and the orange arrow indicates pattern motion direction).
Unikinetic plaids, obtained by pairing a drifting 1-D noise pattern (whose direction determines the direction assigned to the plaid) with a static 1-D noise pattern that is rotated 45° clockwise (we identify these plaids as UP-45) or counterclockwise (UP45) relative to the drifting noise pattern. In a 0° UP45 plaid, the first component is a 0° drifting 1-D noise pattern (i.e., its lines are vertical and drift to the right) and the second one is a 45° static 1-D noise pattern (i.e., its lines slant from top left to bottom right). Its pattern motion direction is –45° (Fig. 1; the solid blue line is parallel to the orientation of the static component). A 0° UP-45 plaid has the same first component, but the second is a –45°/315° static 1-D noise pattern (i.e., its lines slant from top right to bottom left). Its pattern motion direction is 45°.
Flicker plaids, also presented in two configurations (FP45 and FP-45), which follow the same conventions as the unikinetic plaids, but in which the static component is replaced by a flickering component (indicated by a dashed blue line in Fig. 1). These stimuli cannot be interpreted as rigidly translating patterns, and thus do not have a pattern motion direction.
These last two types of plaids were the same as those we used in our eye movement studies. All stimuli were presented in a circular aperture, as if drifting behind it. The size and location of the aperture were fixed in a recording session but varied from session to session as a function of the location and size of the receptive field (RF) of the units being recorded. Unlike in classical single-unit recordings, the size of the stimuli used was thus not precisely matched to the classical RF size of each neuron, and the stimuli generally impinged on the RF surround, to an extent that varied from neuron to neuron.
Another important difference between our study and earlier single-unit recordings in MT is that, because we used multicontact recording arrays and recorded from up to 19 well-isolated cells simultaneously, we could not optimize the speed of the stimuli to match the preferred speed of each neuron. We thus simply selected for all stimuli a speed that was in the center of the previously reported distribution of preferred speeds for MT neurons (Wang and Movshon, 2016). Theoretical considerations (Kawakami and Okamoto, 1996; Simoncelli and Heeger, 1998) and previous studies (Okamoto et al., 1999; Kumano and Uka, 2013) indicate that, when a stimulus is too slow for an MT pattern cell, its direction tuning curve to a drifting 1-D noise pattern stimulus will be bimodal; similarly, when a stimulus is too fast for an MT component cell, a bimodal direction tuning curve will be observed for 2-D noise patterns. We found many such bimodal tuning curves in our data.
Data analysis and statistics
To analyze the tuning properties of our neuronal sample, we first computed for each neuron the single-trial spike count in a time window starting 50 ms after stimulus onset and ending 50 ms after stimulus offset. Tuning curves were then obtained by averaging the spike count over all the trials in which the same stimulus was presented. The 68% confidence interval (CI) of each mean spike count was computed using standard bootstrap techniques. The resting firing rate was estimated by counting all the spikes generated in the period from 50 ms before to 30 ms after the onset of a stimulus. These time windows were selected because the latency of individual cells ranged between 35 and 50 ms. Single-trial firing rate estimates were obtained by convolving each spike train with a Gaussian kernel (8 ms SD, unit area). Mean firing rates over time for a given stimulus were then computed by averaging these estimates across the relevant trials (obtaining the equivalent of a smoothed poststimulus time histogram).
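The spike-count window, the bootstrap CI, and the Gaussian smoothing described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, the half-open window boundaries are an assumption, and a percentile bootstrap is assumed because the text specifies only "standard bootstrap techniques".

```python
import numpy as np

def spike_count(spike_times_ms, onset_ms=0.0, duration_ms=500.0, delay_ms=50.0):
    """Count spikes from 50 ms after stimulus onset to 50 ms after offset.
    The half-open interval [start, stop) is an assumption."""
    t = np.asarray(spike_times_ms, dtype=float)
    start = onset_ms + delay_ms
    stop = onset_ms + duration_ms + delay_ms
    return int(np.sum((t >= start) & (t < stop)))

def bootstrap_ci68(counts, n_boot=2000, seed=0):
    """Percentile-bootstrap 68% CI of the mean spike count (assumed variant)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    # Resample trials with replacement, n_boot times.
    idx = rng.integers(0, counts.size, size=(n_boot, counts.size))
    means = counts[idx].mean(axis=1)
    lo, hi = np.percentile(means, [16.0, 84.0])
    return lo, hi

def smoothed_rate(spike_times_ms, t_ms, sd_ms=8.0):
    """Firing rate (spikes/s): one unit-area Gaussian (8 ms SD) per spike."""
    t = np.asarray(t_ms, dtype=float)[:, None]
    s = np.asarray(spike_times_ms, dtype=float)[None, :]
    kernel = np.exp(-0.5 * ((t - s) / sd_ms) ** 2) / (sd_ms * np.sqrt(2.0 * np.pi))
    return 1000.0 * kernel.sum(axis=1)  # convert from spikes/ms to spikes/s
```

Averaging the output of `smoothed_rate` across trials with the same stimulus gives the equivalent of a smoothed poststimulus time histogram.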
The preferred direction of each cell was estimated from the vector average of the tuning curve associated with 1-D or 2-D noise stimuli. 2-D patterns usually provide a more reliable estimate of the preferred direction of MT neurons (Wang and Movshon, 2016; Quaia et al., 2021), and by default we thus used these stimuli. For the (putatively component) cells in which the 2-D noise tuning curve was bimodal (see below for the rule used to identify them), we however used the 1-D noise tuning curve instead. While theoretically the two peaks of the bimodal tuning curve should straddle the preferred direction, and thus the vector average should still provide a reliable measure, we found that this was not always true (Quaia et al., 2021). There was a significant subset of cases in which the two peaks were quite asymmetric, and the direction based on 1-D noise responses provided a more reliable estimate (as determined from visual inspection of the responses to the various plaids). An example can be seen in Figure 2 (bottom row).
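As an illustration of the vector-average estimate, the sketch below treats each sampled direction as a unit vector weighted by the mean response and returns the direction of the resultant. The function name and the example tuning curve are hypothetical, and no baseline subtraction is applied (an assumption, since the text does not specify one).

```python
import numpy as np

def vector_average_direction(directions_deg, responses):
    """Direction (deg, in (-180, 180]) of the response-weighted vector sum."""
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    r = np.asarray(responses, dtype=float)
    x = np.sum(r * np.cos(theta))
    y = np.sum(r * np.sin(theta))
    return float(np.rad2deg(np.arctan2(y, x)))

# Hypothetical unimodal tuning curve sampled every 30 deg, peaking at 90 deg.
directions = np.arange(0, 360, 30)
responses = 5.0 + 20.0 * np.exp(2.0 * (np.cos(np.deg2rad(directions - 90.0)) - 1.0))
preferred = vector_average_direction(directions, responses)
```

With a symmetric unimodal curve the estimate coincides with the peak. With two asymmetric peaks it is pulled toward the larger one, which is the failure case described above.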
Several functions (polynomial, spline, Lowess, etc.) could be used to interpolate tuning curves. Given their circular periodicity, the simplest function is, however, a low-pass filtered version of a Fourier interpolator. Because we took 12 samples around the circle, the Fourier transform has seven components (mean response, F, 2F, 3F, 4F, 5F, and 6F, the last corresponding to the Nyquist frequency). To smooth it, we simply set to zero the amplitude of the two highest-frequency components (i.e., 5F and 6F, those with a period of 72° and 60°, respectively). Unlike other fitting/interpolation schemes, this one is deterministic (no free parameters).
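A low-pass Fourier interpolator of this kind can be sketched with NumPy's real FFT. The function name and the 1° output grid are illustrative assumptions. The harmonic cutoff (keep the mean and harmonics F to 4F) follows the description above.

```python
import numpy as np

def fourier_lowpass_interp(responses, n_out=360, max_harmonic=4):
    """Interpolate a circular tuning curve sampled at equally spaced directions,
    keeping only the mean and harmonics 1..max_harmonic (must be below Nyquist)."""
    r = np.asarray(responses, dtype=float)
    n = r.size
    coeffs = np.fft.rfft(r) / n            # 12 samples -> 7 coefficients (mean..6F)
    phi = 2.0 * np.pi * np.arange(n_out) / n_out
    out = np.full(n_out, coeffs[0].real)   # mean response
    for k in range(1, max_harmonic + 1):
        # Real part of coeffs[k] * exp(i k phi), doubled for the conjugate term.
        out += 2.0 * (coeffs[k].real * np.cos(k * phi) - coeffs[k].imag * np.sin(k * phi))
    return out
```

Because harmonics 5F and 6F are simply dropped, the interpolated curve does not pass exactly through the 12 sampled points unless those harmonics were already zero.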
Previous studies used two metrics to classify MT cells as either component or pattern: the pattern index (Movshon et al., 1985; Smith et al., 2005; Rust et al., 2006), which compares the direction tuning curves to 1-D patterns and Type I plaids, and the unikinetic rotation (Wallisch and Movshon, 2019), which compares the tuning curves to two unikinetic plaids. Unfortunately, both measures perform well (i.e., as designed) only if the speed of the stimulus is matched to the preferred speed of the cell. When the stimulus is not optimized for the cell, as inevitably happens with multicontact electrodes, these measures become unreliable (Quaia et al., 2021). Accordingly, we propose two alternative measures, which are robust to the effects of suboptimal stimulus speed. These measures are based on the Fourier decomposition of the direction tuning curves to Type I and unikinetic/flicker plaids, and in particular on the coefficients of the first three harmonics (we indicate the amplitude coefficients as A1, A2, and A3 and the phase coefficients as P1, P2, and P3; the latter are expressed in degrees, in the range (−180°, 180°]).
Based on the direction tuning curve to Type I plaids (TI), we define the bikinetic pattern index (bPI) as follows:
This index varies between −1 (putative component cell) and 1 (putative pattern cell). It is based on the observation that the direction tuning curve for Type I plaids should be unimodal in pattern cells, and bimodal in component cells. With closely sampled directions, one could directly estimate the degree to which a tuning curve is unimodal. However, with the coarse sampling usually used in neurophysiological experiments, some form of interpolation would be required. The alignment of the phase of different Fourier components, which the above formula estimates, represents a more principled (and, given the properties of the Fourier transform, in some sense optimal) approach to estimating the degree to which the tuning curve to Type I plaids is unimodal versus bimodal.
For unikinetic plaids, the unikinetic rotation (Wallisch and Movshon, 2019) estimates the relative rotation of the tuning curves to two unikinetic plaids. Also in this case, coarse direction sampling makes computing this rotation fraught, a problem that we once again addressed by relying on the coefficients of the Fourier components. In this case, the index we defined is as follows:
Like the unikinetic rotation, it estimates the relative rotation of the tuning curves to two unikinetic plaids. One might think that the first harmonic would be most sensitive to such rotation. Indeed, in going from a component to a pattern cell, the tuning curve to unikinetic plaids becomes broader, and the direction in which it broadens is different for UP45 and UP-45 plaids. However, such broadening is determined mostly by the relative phase of the second harmonic, which is close to 0° in component cells, and close to 180° in pattern cells. Empirically, we found that considering putative opponent pairs of such cells makes the index slightly more robust, especially when the index is applied to flicker plaids. Based on the direction tuning curves to unikinetic plaids (UP45 and UP-45), and their opponent versions, we then define the unikinetic pattern index (uPI) as follows:
Like the bPI, this index also conveniently varies between −1 (putative component cell) and 1 (putative pattern cell). We can define a similar index, indicated as fPI, for flicker plaids.
Finally, we define the following global pattern index (gPI), equal to the average of the two indices above: gPI = (bPI + uPI)/2.
The interested reader can find a thorough analysis and validation of these indices elsewhere (Quaia et al., 2021). Because, unlike the pattern index, our indices are not based on comparing actual responses to model responses, their value cannot be used to statistically evaluate whether a cell is a significantly better match to a pattern or to a component ideal. Our indices are thus like the indices that are routinely used to evaluate the ability of cells to signal orientation, motion direction, binocular disparity, etc.
To identify cells with a bimodal tuning curve to 1-D or 2-D noise (for purposes of properly inferring the preferred direction of a cell, see above), we computed analytically the first and second derivatives of the Fourier interpolator described above, and thus determined the location and magnitude of peaks and troughs of the tuning curve. We labeled a tuning curve as bimodal if it had two peaks, one on each side of the preferred direction of the cell, separated by between 60° and 150°, and with the smaller peak being at least 25% of the main peak. Our bPI and uPI indices are not dependent on this classification.
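The bimodality rule can be sketched as below. Several choices here are assumptions rather than the procedure used in the study: peaks are located numerically on a curve sampled every 1° (a stand-in for the analytic derivatives), only the two largest peaks are considered, and peak heights are compared as raw response values. All function names are hypothetical.

```python
import numpy as np

def circ_diff(a_deg, b_deg):
    """Signed angular difference a - b, wrapped to [-180, 180)."""
    return (np.asarray(a_deg, dtype=float) - b_deg + 180.0) % 360.0 - 180.0

def is_bimodal(curve, preferred_deg, min_sep=60.0, max_sep=150.0, min_ratio=0.25):
    """curve: responses at 0, 1, ..., 359 deg (e.g., an interpolated tuning curve)."""
    c = np.asarray(curve, dtype=float)
    # Local maxima on the circle (numerical substitute for analytic derivatives).
    peaks = np.flatnonzero((c > np.roll(c, 1)) & (c >= np.roll(c, -1)))
    if peaks.size < 2:
        return False
    # Assumption: only the two largest peaks are evaluated.
    top = peaks[np.argsort(c[peaks])[-2:]]
    offsets = circ_diff(top, preferred_deg)
    straddles = offsets[0] * offsets[1] < 0        # one peak on each side
    separation = abs(float(circ_diff(top[0], top[1])))
    ratio = c[top].min() / c[top].max()            # smaller peak vs. main peak
    return bool(straddles and min_sep <= separation <= max_sep and ratio >= min_ratio)
```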
Population responses
In addition to the single-unit measures described above, we computed the summed activity of subpopulations of neurons. The first step was to rotate the tuning curves of the cells in a subpopulation to align their preferred direction with the rightward direction (0° under our convention). We did this either with the raw data (in which case, given our directional sampling, the alignment could only be conducted with a margin of error of ±15°, well above the precision of our preferred direction measure, whose average 68% CI was equal to 3.11°), or by first interpolating the tuning curves with the Fourier method described above and then rotating them (in this case, with a margin of error of ±0.5°, below the precision of our preferred direction measure). The two methods yielded similar results, and the results presented in this paper are based on the first method.
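The raw-data alignment amounts to a circular shift of each tuning curve by the number of 30° bins closest to the preferred direction, which is why the residual error can reach ±15°. A minimal sketch, with a hypothetical function name and assuming samples ordered as 0°, 30°, ..., 330°:

```python
import numpy as np

def align_raw(responses, preferred_deg, step_deg=30.0):
    """Circularly shift a sampled tuning curve so that the bin nearest the
    preferred direction lands at 0 deg (residual error up to step_deg / 2)."""
    shift = int(np.round(preferred_deg / step_deg))
    return np.roll(np.asarray(responses, dtype=float), -shift)

# Hypothetical cell: peak response in the 120-deg bin, preferred direction 125 deg.
curve = np.zeros(12)
curve[4] = 1.0
aligned = align_raw(curve, 125.0)
```

The same shift would be applied to every tuning curve of a cell (1-D noise, 2-D noise, and each plaid class) so that all of them share one reference frame before pooling.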
Once we had the tuning curves for the neurons in a population so aligned, we then computed the population firing rate (
The rationale for using the square root operator is that, if neurons fire according to a Poisson process, as often assumed, taking the square root approximately equalizes the variance (Kihlberg et al., 1972). It would thus make sense for a downstream neuron to pool information this way, that is, by underweighting neurons that fire more (but are less reliable) relative to those that fire less (but are more reliable). Whether this occurs in the brain is not known. If all the neurons have identical discharge,
Pattern direction decoding model
We also simulated a linear decoding model based on the spike trains recorded from the 140 cells that we classified as pattern-like (based on the value of their global pattern index [gPI]). To provide a continuous readout over time, we first converted spikes into an estimate of the putative EPSP induced by the cell in a downstream neuron. This was obtained by convolving each spike train from our pattern cells with a causal kernel of the following form:
Next, we assumed that the readout mechanism is composed of 360 cells, one for each degree of pattern motion direction. Each readout cell computes a weighted sum of the incoming EPSPs from the pattern cells. The weights are imposed by assuming that each pattern cell projects to the readout units according to a Gaussian weight function, centered at the preferred direction of the cell θ and with a standard deviation (SD) that matches the bandwidth σ of the tuning curve to 2-D noise. The projection weights of each cell are constrained to sum to unity. The weights can then be collated into a projection matrix W, having one row for each readout unit and one column for each pattern cell (i.e., 360 rows by 140 columns).
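The construction of the projection matrix W can be sketched as follows. The function name is hypothetical, and wrapping the angular distance onto the circle before applying the Gaussian is an assumption (the text states only that the weight function is Gaussian, centered at θ, with SD σ).

```python
import numpy as np

def projection_matrix(preferred_deg, sigma_deg, n_readout=360):
    """W[i, j]: weight from pattern cell j to the readout unit tuned to direction i.
    Each column (one cell's projection weights) is normalized to sum to 1."""
    theta = np.asarray(preferred_deg, dtype=float)[None, :]         # (1, n_cells)
    sigma = np.asarray(sigma_deg, dtype=float)[None, :]             # (1, n_cells)
    readout = (np.arange(n_readout) * 360.0 / n_readout)[:, None]   # (n_readout, 1)
    d = (readout - theta + 180.0) % 360.0 - 180.0   # wrapped distance (assumption)
    w = np.exp(-0.5 * (d / sigma) ** 2)
    return w / w.sum(axis=0, keepdims=True)

# Three hypothetical pattern cells.
W = projection_matrix([0.0, 90.0, 350.0], [30.0, 40.0, 25.0])
```

With 140 pattern cells, the same call yields the 360 × 140 matrix described above.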
The activation over time of our readout units in one pseudo-trial is then
For each of the 12 directions sampled with the 2-D noise stimulus, we computed for each pattern cell the average postsynaptic potential over all trials recorded, collated them into a matrix P, and then computed a matrix A according to the above formula. We then summed the activation matrices for the 12 directions, and computed their average over time, obtaining a 360-D vector T. Under uniform distribution of preferred directions and firing rates, this vector should be flat, and we can then compute a vector C such that each element is equal to 1 over the corresponding element in T.
For the readout, we then first multiply each column of A element-wise by C, and then identify the index associated with the largest value. Obviously, this will result, on average, in a near-perfect readout of the direction of motion of 2-D noise stimuli, but our interest lies in how the direction of motion of plaids is decoded. This is a simpler readout scheme compared with those used in other studies (e.g., Jazayeri and Movshon, 2006; Graf et al., 2011; Berens et al., 2012; Yates et al., 2020): It makes no claims of optimality, it uses only positive projection weights, and it does not implicitly assume antagonistic (i.e., opponent) decoding. Importantly, it has no tunable (or trainable) parameters.
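The correction and readout steps can be sketched on synthetic data as follows. The activation is assumed here to be the matrix product A = W P, which is the natural reading of "weighted sum of the incoming EPSPs" but is not confirmed by the text. The simulated cells, their tuning widths, and all function names are hypothetical.

```python
import numpy as np

def correction_vector(W, P_by_direction):
    """C = 1 / T, with T the time average of the activations summed over the
    calibration (2-D noise) directions."""
    total = sum(W @ P for P in P_by_direction)   # (n_readout, n_time)
    T = total.mean(axis=1)
    return 1.0 / T

def decode_direction(W, P, C):
    """Index (deg) of the most active readout unit at each time point."""
    A = W @ P                                    # assumed form of the activation
    return np.argmax(A * C[:, None], axis=0)

def wrap(x):
    return (x + 180.0) % 360.0 - 180.0

# Synthetic population: 24 cells with preferred directions every 15 deg.
pref = np.arange(0.0, 360.0, 15.0)
phi = np.arange(360.0)[:, None]
W = np.exp(-0.5 * (wrap(phi - pref[None, :]) / 35.0) ** 2)
W /= W.sum(axis=0, keepdims=True)                # each cell's weights sum to 1

def fake_psp(stim_deg, n_time=50):
    """Constant-in-time PSPs from Gaussian direction tuning (illustration only)."""
    rate = np.exp(-0.5 * (wrap(pref - stim_deg) / 40.0) ** 2)
    return np.tile(rate[:, None], (1, n_time))

C = correction_vector(W, [fake_psp(s) for s in range(0, 360, 30)])
decoded = decode_direction(W, fake_psp(90.0), C)
```

In this idealized population T is nearly flat, so C has little effect. With recorded cells, whose preferred directions and firing rates are not uniformly distributed, C compensates for the resulting bias.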
Results
The mechanisms that support the combination of 1-D (or component) motion signals into the appropriate planar (or pattern) motion signal can be studied by recording from MT neurons while presenting plaid stimuli. That a trace of this operation first emerges in MT has been firmly established (Movshon et al., 1985; Movshon and Newsome, 1996); what is still debated is whether this operation is finalized in area MT, or whether instead a partial result is fed to downstream areas for further processing, as recently suggested (Khawaja et al., 2013; Wallisch and Movshon, 2019).
We addressed this question by recording with multicontact electrodes from the MT area of 2 macaque monkeys. After filtering out units that did not meet our criteria for isolation and direction-selectivity (direction index > 0.65), we report here the response properties of 386 well-isolated single units, which were recorded while presenting a wide range of visual patterns. Like others before us, we probed MT units using orientation narrowband (1-D) components and plaids; unlike others, we used 1-D components that are broadband in spatial frequency (1-D noise patterns), and plaids obtained by combining two such components. In addition to the classic Type I plaids, in which two 1-D components drift at the same speed in two directions 120° apart, we also used unikinetic plaids (indicated as UP in figures), in which one 1-D component drifts while another is static and oriented 45° away (in either direction, yielding two families of unikinetic plaids). Unikinetic plaids, although composed of sinusoidal gratings, have been used recently in two other MT studies (Khawaja et al., 2013; Wallisch and Movshon, 2019). We also used, for the first time in neurophysiological recordings, flicker plaids, in which one 1-D noise pattern drifts while another oriented 45° away (in either direction) is generated anew on each frame (i.e., it flickers in place). Such stimuli induce unexpected reflexive eye movements in both humans and monkeys (Quaia et al., 2016).
MT cell classification
MT cells have typically been classified as component-like or pattern-like by comparing their direction tuning curves for plaids with model predictions based on their direction tuning curve measured with a single drifting grating. A basic assumption of such comparisons is that the latter is unimodal. Unfortunately, this assumption is violated in pattern-like cells when the drifting speed of the stimulus is lower than the preferred speed of the cell (Okamoto et al., 1999; Kumano and Uka, 2013). When recording with single electrodes, this is usually avoided by optimizing the speed of the stimulus to each cell. With multicontact electrodes, responses of neurons having different preferred speeds to the same stimulus are recorded, and such bimodal tuning curves are routinely observed.
In Figure 2, we show, for five neurons (selected because they are representative of the diversity we observed), their direction tuning curves to some of the stimuli we presented. As explained in Materials and Methods, what we mean by stimulus direction in the case of plaids is not the pattern motion of the stimulus (i.e., the direction in which the plaid can be seen as moving rigidly), but rather the motion of the only drifting 1-D noise pattern for unikinetic and flicker plaids and of one of the two 1-D noise patterns (the one whose direction of motion is counterclockwise relative to the other) for Type I plaids. This convention, a departure from previous studies, is motivated by the lack of a properly defined pattern motion direction for flicker plaids. Since keeping track of all these directions and conventions is not easy, the configurations of the components of plaid stimuli for some stimulus directions are shown as visual aids in some figures.
As in previous studies, in MT we found cells that behave like prototypical component cells (Fig. 2, second row), and thus respond to the individual components in the plaids. Others instead strongly resemble ideal pattern cells (Fig. 2, fourth row), responding to the plaids as if they were a single object moving in one direction. Most neurons, however, are difficult to categorize, with direction tuning curves that do not match either ideal. Focusing on unikinetic and flicker plaids, we note that the direction tuning curves for some cells (Fig. 2, top two rows) match our expectations of cells responding only to the individual components. There are, however, also cells (Fig. 2, third row) that behave approximately like an ideal pattern cell would be expected to when presented with unikinetic plaids, but respond identically to the two flicker plaids, as expected from a component cell. Still other cells (Fig. 2, bottom two rows) respond similarly to unikinetic and flicker plaids. We also found cells with a double-peaked direction tuning curve for 2-D noise patterns (random dot patterns), which is expected (Kawakami and Okamoto, 1996; Simoncelli and Heeger, 1998) if the stimulus moves faster than the preferred speed of a component cell (Fig. 2, top row). We call these cells “component-fast.” Also present are cells with a double-peaked direction tuning curve for 1-D noise patterns, which as noted above is expected when the stimulus moves slower than the preferred speed of a pattern cell (Fig. 2, bottom row). We call these cells “pattern-slow.” Contrary to theoretical expectations, the two peaks are not always of equal height; often one peak is considerably higher than the other, making classic measures used to determine direction preference, such as the vector average/center of mass measure, unreliable. The interested reader can find additional examples in Quaia et al. (2021).
Because the measures that have been traditionally used to classify MT cells as component or pattern do not work reliably for “pattern-slow” cells (Quaia et al., 2021), we introduced two new measures (see Materials and Methods): one based on the responses of the cell to Type I plaids (bPI) and one based on the responses to unikinetic plaids (uPI). Unlike with the original pattern index, responses to 1-D stimuli are not considered in computing these measures.
A scatter plot, for all our cells, of these two measures is shown in Figure 3. The two measures are strongly correlated (green line, r = 0.61, p = 4.19 × 10^−41), with the second and fourth quadrants (where cells classified as component by one measure and pattern by the other lie) containing the fewest data points. We also computed a compound measure, called gPI, equal to the average of these two measures (and thus proportional to the projection of the data points on the identity axis, dashed black line). Based on this measure, we then classify the cells as component (blue dots) if gPI < −0.25 (blue shaded region), as pattern (orange dots) if gPI > 0.25 (orange shaded region), and mixed (gray dots) otherwise. These threshold values were chosen with the goal of having most points in the first quadrant classified as pattern cells, most points in the third quadrant as component cells, and most points in the second and fourth quadrants (i.e., those cells that behave more component-like with one type of plaids, but more pattern-like with the other) as mixed.
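The classification rule just described is simple enough to state as code. The function names are hypothetical, and the handling of values exactly at ±0.25 (assigned to "mixed") is an assumption.

```python
def global_pattern_index(bpi, upi):
    """gPI: average of the bikinetic and unikinetic pattern indices."""
    return 0.5 * (bpi + upi)

def classify_cell(bpi, upi, threshold=0.25):
    """Label a cell from its gPI using the +/-0.25 thresholds."""
    gpi = global_pattern_index(bpi, upi)
    if gpi > threshold:
        return "pattern"
    if gpi < -threshold:
        return "component"
    return "mixed"

# A hypothetical cell that is pattern-like with Type I plaids (bPI = 0.7) but
# component-like with unikinetic plaids (uPI = -0.6) has gPI = 0.05.
label = classify_cell(0.7, -0.6)
```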
How do pattern cells respond to flicker plaids? Inspection of Figure 2 indicates that their response varies a lot from cell to cell. To quantify this diversity, and to see how the response to flicker plaids compares to that induced by unikinetic plaids, we computed for flicker plaids a measure (fPI, flicker Pattern Index) equivalent to the uPI index shown above. In Figure 4 (left), we show a scatter plot of these two measures, with individual cells color-coded as in Figure 3. The two measures are weakly correlated (green line, r = 0.239, p = 2.02 × 10⁻⁶); note how many pattern cells (orange dots) fall in the fourth quadrant and are thus associated with a negative fPI. Such a weak correlation suggests considerable heterogeneity in the mechanisms that produce these responses. These measures were computed using a spike count window that encompasses the entire stimulus presentation duration (500 ms). In Figure 4 (right), we plot the same measure based on a shorter window, at the beginning of the response (50-100 ms from stimulus onset). In this early period, the correlation between the two measures is larger (green line, r = 0.431, p = 6.76 × 10⁻¹⁹), and significantly so (p = 0.0026, t test on Fisher-Z-transformed correlation values; [0.03-0.31] 95% nonparametric CI of the difference), indicating that responses to unikinetic and flicker plaids are initially similar, but diverge over time. Many more pattern cells fall in the first quadrant in Figure 4 (right), and thus have a positive fPI in this early interval. Many cells thus respond to flicker plaids as if they were unikinetic plaids (thus hallucinating a spurious pattern motion signal from stimuli that physically do not have one) soon after stimulus onset, but much less so later on, a point that we will come back to.
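For readers unfamiliar with comparing two correlation coefficients, the following sketch illustrates the Fisher Z transformation. It uses a common large-sample approximation that treats the two correlations as coming from independent samples of 386 cells; this is an assumption made for illustration and may not match the exact test described in Materials and Methods. The function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher Z transform of a correlation coefficient."""
    return math.atanh(r)

def compare_correlations(r1, r2, n1, n2):
    """Approximate z statistic and two-sided p-value for r1 versus r2.

    Assumes independent samples (a simplification for illustration).
    """
    z_diff = fisher_z(r1) - fisher_z(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = z_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p

# Early-window versus full-window correlations reported in the text.
z_stat, p_value = compare_correlations(0.431, 0.239, 386, 386)
print(z_stat, p_value)
```

Because the same cells contribute to both correlations, a test that accounts for this dependence (or a bootstrap over cells, as in the nonparametric CI reported above) would be the more rigorous choice.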
Population coding of pattern motion direction
Our results, obtained using spatially broadband stimuli, agree with previous reports from recordings using sinewave gratings and plaids. Based on those results, it was concluded that either few MT cells signal pattern motion direction (those in the top right corner in Fig. 3), and only those signals are propagated to later areas, or further processing is necessary, for example in the MST area. There is, however, a third possibility: The pattern motion direction signal might be more evident at the population level than at the level of individual neurons. This might seem counterintuitive, and it would not be expected if MT cells fell on a continuum from component-like to pattern-like. However, this might occur if different cells deviate from the pattern ideal in different ways. If such idiosyncrasies wash out at the population level, the population activity might reflect pattern motion much more faithfully than expected from the average indices across the same population.
To investigate this possibility, we grouped our cells into three classes (151 component cells, 95 mixed cells, and 140 pattern cells) based on the value of gPI, as described above. We then rotated the tuning curves of all neurons so that their preferred direction aligned with 0°, and then pooled the responses within each subpopulation (see Materials and Methods). This is thus an artificial population, in which all the cells prefer motion to the right (0° in our convention). It allows us to average out the idiosyncrasies in the directional tuning of each cell to see what their average directional tuning curve to each of our stimuli looks like. This sort of pooling is possible because all our cells were recorded while the same stimuli were presented; it would not be justified in the context of classic single-electrode recording methods, when different stimuli are presented to each cell (optimized to its preferences). In Figures 5 and 6, we plot the time evolution of each subpopulation firing rate, separately for each of the stimuli used. The number at the bottom right corner of each panel indicates the maximum firing rate for that panel (i.e., what the most saturated red in that panel corresponds to). Positive responses (above the spontaneous rate for the subpopulation) are shown in red, negative values in blue. Several trends are immediately obvious. First, the direction bandwidth for 1-D noise stimuli increases from component to pattern cells (Fig. 5, top row), as previously reported with sine waves, although as we noted earlier in our data this result is contaminated by the presence of pattern-slow cells. Second, the direction bandwidth for 2-D noise stimuli decreases from component to pattern cells (Fig. 5, second row), which again is expected from previous experimental and computational studies, but again in our case there is a confound, associated with the component-fast cells. 
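The alignment and pooling step can be sketched as follows. This is a simplified illustration, not the procedure detailed in Materials and Methods: it assumes that each tuning curve is sampled at evenly spaced directions, that the index of each cell's preferred direction is already known, and that pooling is a plain average of square-root transformed rates. All names and numbers are hypothetical.

```python
import math

def rotate_to_preferred(curve, preferred_index):
    """Circularly shift a tuning curve so the preferred direction sits at index 0."""
    return curve[preferred_index:] + curve[:preferred_index]

def pool_population(curves, preferred_indices):
    """Average the square-root transformed, rotated tuning curves across cells."""
    n_dirs = len(curves[0])
    pooled = [0.0] * n_dirs
    for curve, pref in zip(curves, preferred_indices):
        rotated = rotate_to_preferred(curve, pref)
        for i, rate in enumerate(rotated):
            pooled[i] += math.sqrt(rate) / len(curves)
    return pooled

# Two hypothetical cells sampled at 0, 90, 180, and 270 deg (not recorded data).
cell_a = [16.0, 4.0, 0.0, 4.0]  # prefers 0 deg (index 0)
cell_b = [1.0, 9.0, 1.0, 0.0]   # prefers 90 deg (index 1)
pooled = pool_population([cell_a, cell_b], [0, 1])
print(pooled)
```

The same shift would be applied to a cell's responses to every stimulus type, so that idiosyncrasies in tuning shape, rather than differences in preferred direction, are what gets averaged.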
As previously reported (Albright, 1984), responses to static stimuli (available only for 295 neurons in our study: 111 component cells, 73 mixed cells, and 112 pattern cells), are largely transient. In component cells, static stimuli parallel to the preferred drifting stimulus (i.e., 0° or 180°) generate the strongest response, but for pattern cells stronger responses are evoked by orthogonal static stimuli (i.e., ±90°, Fig. 5, third row). The same phenomenon is also observed for responses to flickering noise (Fig. 5, bottom row), which however produces much more sustained responses than static stimuli at 0° or 180° (whereas the responses to orthogonal flickering stimuli are transient, following a time course similar to that seen with static stimuli). Finally, the magnitude of the response to 2-D noise stimuli is similar across groups, whereas for 1-D noise it drops considerably from component to pattern cells. Orientation broadband stimuli are thus much better at driving all types of MT cells, an important advantage for studies of population activity.
In Figure 6, we show the time evolution of population responses to plaid stimuli. Component cells show two clear peaks in response to Type I stimuli, one at 0° and the other at 120°, the directions at which either of the two components is aligned with the preferred direction of the population. They also respond to unikinetic and flicker plaids, with primary peaks at 0° (drifting component aligned with the preferred direction of the population) and secondary peaks at ±135° (static or flickering component aligned with the preferred orientation of the population). The secondary peaks are more transient for unikinetic than for flicker plaids. Direction selective V1 cells would be expected to behave just like this. The aggregate response of the pattern group instead carries a clear pattern direction signal. There is a single peak at 60° for Type I plaids, and a single peak at ∼±45° in unikinetic plaids. The responses to flicker plaids are the most interesting. Initially, they mimic the response to unikinetic plaids, with a peak at ∼±45°, but then the tuning broadens considerably, so that later on a wide range of stimulus directions, centered at ∼0°, induce similar activations. Not surprisingly, the mixed group is somewhere in the middle.
The pattern subpopulation thus encodes pattern motion direction quite well. Another way of quantifying this is by computing our pattern indices for the population tuning curve, measured over the 50-550 ms and 50-100 ms poststimulus time intervals, as done before for individual cells. The values we obtained (bPI = 0.97, uPI = 0.76, fPI = 0.44, uPIearly = 0.92, fPIearly = 0.63, gPI = 0.87) are plotted in Figures 3 and 4 as black asterisks. These are all considerably higher than the average values of these parameters for the pattern cells included in the population (bPI = 0.69, uPI = 0.43, fPI = −0.05, uPIearly = 0.47, fPIearly = 0.12, gPI = 0.56): The population performs better than would be expected from the performance of individual cells.
To more clearly highlight the time evolution of plaid responses in pattern cells, we took time slices through these population activities (Fig. 7). To increase the signal-to-noise ratio, we mirrored the responses to UP-45 and FP-45 stimuli at ∼0° and averaged them with those to UP45 and FP45 stimuli. At all time points (different line colors, see legend), the response to Type I plaids clearly peaks at ∼60°. Similarly, with unikinetic plaids, the peak is always clearly and unequivocally at 45°. With flicker plaids, a single peak at ∼45° is clearly present at 75 ms, but subsequently two peaks of similar magnitude emerge: one at ∼45° and one at ∼−45°.
Linear readout from MT pattern cells
The previous analysis considered 140 pattern cells, all tuned to the same direction, obtained by rotating the tuning curves of the cells we recorded, and computed the average of their square-root transformed responses. If we were to assume that similar populations tile the entire range of preferred directions, it is obvious that a simple readout of such populations would be sufficient to reliably infer the direction of pattern motion. We could, however, also ask how a linear readout model applied directly to the single-trial output of our 140 pattern cells, without rotating their tuning curves or introducing additional simulated cells, would fare (see Pattern direction decoding model). Because we did not record from all cells simultaneously, we sampled, for each stimulus type and direction of motion, one trial for each neuron, creating a pseudo-simultaneous population activity, effectively discarding noise-correlations between neurons. This population of spike trains was fed to a simple linear decoder model, which produced an estimate over time of the pattern motion direction. Repeating this procedure 1000 times for each stimulus type and direction allowed us to compute the circular mean and SD of this estimate (again, over time).
In Figure 8, we show the result of this analysis. In each panel we plot, for one of the eight stimulus types, the difference between the estimated pattern motion direction and the direction of motion of the stimulus averaged over the 12 directions. The time trace is truncated at 300 ms because decoding performance has stabilized by that time. Perfect decoding is indicated by the dashed black line. The direction of motion of the 2-D noise stimulus is extracted quickly and highly reliably; this was, however, to some extent (the average, not the consistency) built into the model because we used the responses to these stimuli to compensate for the uneven sampling of motion directions in our sample of neurons (see Materials and Methods; the responses to other stimuli were not used to constrain the model in any way, and are thus predictions of the decoding model). The motion direction of the 1-D noise stimulus is also correctly identified on average, although not as reliably. This indicates that MT pattern cells are more adept at extracting the direction of motion of complex stimuli than that of the artificial 1-D stimuli often used in the laboratory. The directions of motion of UP90 plaids and Type I plaids are also extracted quickly and reliably. The linear readout of our pattern subpopulation was more reliable for UP90 than for 1-D noise, indicating that the addition of the orthogonal static grating reduced directional ambiguity.
For unikinetic plaids, the direction is initially correctly identified, but a small bias toward the direction of the moving 1-D stimulus (i.e., toward 0° with our conventions) emerges later on. For flicker plaids, the behavior is similar, but the bias is considerably larger, and the reliability is also lower.
The above decoding analysis considered only the cells that we classified as pattern-like, based on the gPI being >0.25. We could, however, also consider populations that either include fewer cells, limiting ourselves to the most pattern-like, or more cells, including more mixed or component-like cells. To explore the gamut of these possibilities, we thus repeated the above analysis for different values of the gPI threshold, ranging from −1 (i.e., all 386 cells are included) to 0.75 (only the 31 most pattern-like cells are included). To summarize the data, we report (Fig. 9) the average directional offset (left column) and its SD (right column) in two time windows: one early in the response (70-140 ms) and one late (230–300 ms). Each row corresponds to a different stimulus type, and we average the deviations (and SD) for the two types of unikinetic and flicker plaids. The direction of 1-D and 2-D noise stimuli can be decoded quite well on average with all populations, although the SD increases as the number of cells decreases. Even Type I plaids can be decoded rather well with all populations, which is not very surprising given that their pattern direction is equal to the average of the moving direction of their two components. Unikinetic and flicker plaids are, however, decoded well only when the population is limited to pattern-like cells. This should also not be surprising since, with these stimuli, the component cells ignore the static/flickering stimulus, only respond to the 1-D drifting component, and thus “vote” 0°. What is interesting is that our initial choice for the threshold (0.25) is in some sense optimal, as further reducing the population leads to not only a steep increase in SD, but also a decrease in accuracy. This reinforces our conclusion that pattern motion is encoded in a population response, and not in the output of a handful of pattern cells.
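The resampling logic behind this decoding analysis can be illustrated with a toy example. The sketch below draws one trial per cell to form a pseudo-simultaneous population, applies a population-vector readout, and summarizes 1000 repetitions with the circular mean and SD. The population-vector readout is a stand-in chosen for illustration, not the decoding model described in Materials and Methods, and all data and names are hypothetical.

```python
import cmath
import math
import random

def pseudo_population(trials_per_cell, rng):
    """Draw one single-trial spike count per cell (cells treated as independent)."""
    return [rng.choice(trials) for trials in trials_per_cell]

def population_vector_deg(counts, preferred_deg):
    """Linear readout: count-weighted sum of unit vectors; returns its angle in degrees."""
    total = sum(c * cmath.exp(1j * math.radians(p))
                for c, p in zip(counts, preferred_deg))
    return math.degrees(cmath.phase(total))

def circular_mean_sd(angles_deg):
    """Circular mean and circular SD (both in degrees) of a list of angles."""
    resultant = sum(cmath.exp(1j * math.radians(a)) for a in angles_deg) / len(angles_deg)
    length = min(abs(resultant), 1.0)  # guard against rounding slightly above 1
    mean = math.degrees(cmath.phase(resultant))
    sd = math.degrees(math.sqrt(-2.0 * math.log(length))) if length > 0 else math.inf
    return mean, sd

# Hypothetical data: 4 cells preferring 0, 90, 180, and 270 deg, three trials each,
# for a stimulus moving at 90 deg. These are not recorded data.
preferred = [0.0, 90.0, 180.0, 270.0]
trials = [[2, 3, 2], [9, 10, 11], [2, 3, 2], [0, 1, 0]]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
estimates = [population_vector_deg(pseudo_population(trials, rng), preferred)
             for _ in range(1000)]
mean_deg, sd_deg = circular_mean_sd(estimates)
print(mean_deg, sd_deg)
```

Sampling each cell's trial independently is what discards noise correlations; repeating the draw many times turns trial-to-trial variability into a distribution of direction estimates, whose spread plays the role of the SD plotted in Figure 9.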
Mechanisms of pattern motion computation
Over the years, several mechanisms have been proposed as being necessary or useful for generating pattern motion direction signals from direction-selective V1 neurons, the major bottom-up input to MT cells (Movshon and Newsome, 1996; Simoncelli and Heeger, 1998; Perrone, 2004; Rust et al., 2006).
First, it has been suggested that pattern cells should pool over directional inputs tuned to nearby directions, which would explain their broader bandwidth to 1-D sinusoidal stimuli (Wang and Movshon, 2016), a phenomenon we also observed with our 1-D noise patterns (Figure 5, although in our case this comparison is contaminated by the bimodal responses to 1-D stimuli in the pattern-slow cells) (see Quaia et al., 2021).
Second, it has been proposed that pattern cells have stronger opponency (Rust et al., 2006), meaning that they should be more strongly suppressed by motion in the direction opposite to their preferred one. Cells with strong opponency should be weakly activated by flickering stimuli (which can be seen as the sum of stimuli moving in opposite directions), because in such cells the response to the preferred direction of motion would be suppressed by motion energy in the opposite direction. In contrast, cells with little opponency should respond to a flickering stimulus more or less as they would to a 1-D stimulus drifting in their preferred direction (motion in the opposite direction would elicit neither enhancement nor suppression). We found that pattern cells indeed respond less strongly to flickering stimuli (compared with their response to 1-D drifting stimuli) than component cells. In Figure 10A, we plot the base 2 logarithm of the ratio between the response to flickering and drifting 1-D noise stimuli, at the preferred direction for each cell, as a function of the gPI. Cells are colored as in Figure 3; we clipped the ratios at ±4, to avoid having the correlation dominated by neurons (mostly pattern cells) that did not respond to flickering stimuli (points at −4). There is a significant negative correlation between the two (green line, r = −0.3, p = 1.95 × 10⁻⁹), with larger values of gPI associated with weaker responses to flickering stimuli, indicative of stronger opponency. The mean log ratio is −0.646 for component cells and −1.4 for pattern cells, a significant difference (Kolmogorov–Smirnov test, p = 2.09 × 10⁻¹⁰).
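The clipped log ratio used in Figure 10A can be sketched as follows. The handling of non-positive rates (mapped directly to the clipping limits) is an assumption made for this illustration, as are the function names and the example values.

```python
import math

def clipped_log2_ratio(numerator_rate, denominator_rate, limit=4.0):
    """Base 2 log of a response ratio, clipped to [-limit, +limit].

    Non-positive rates are mapped to the clipping limits (an assumption).
    """
    if numerator_rate <= 0.0:
        return -limit
    if denominator_rate <= 0.0:
        return limit
    return max(-limit, min(limit, math.log2(numerator_rate / denominator_rate)))

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical cells (not recorded data): gPI, flicker response, drifting response.
gpi = [-0.6, 0.0, 0.7]
flicker = [20.0, 10.0, 0.0]
drift = [20.0, 20.0, 20.0]
ratios = [clipped_log2_ratio(f, d) for f, d in zip(flicker, drift)]
r = pearson_r(gpi, ratios)
print(ratios, r)
```

Clipping keeps a cell with no response to the flickering stimulus at a finite value (−4 here), so that such cells contribute to the correlation without dominating it.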
A third proposed ingredient, especially useful for shaping the response to unikinetic plaids, is enhancement from static stimuli orthogonal to the preferred orientation of the cell. Because MT cells respond only weakly and transiently to static stimuli (Albright, 1984), it has been questioned (Wallisch and Movshon, 2019) whether such a signal could account for the extraction of pattern motion direction in unikinetic plaids. Looking directly at responses to static stimuli might, however, not be the best way of revealing this mechanism, since nonlinearities are expected to play a significant role in pattern motion computations. To directly probe this mechanism, we presented an additional stimulus in which a drifting 1-D noise pattern is paired with an orthogonal static 1-D noise pattern. We indicate this stimulus class as UP90. This is a stimulus that is usually perceived as transparent, and that might not be expected to strongly affect motion mechanisms. However, if orthogonal static stimuli excite pattern cells, we expect pattern cells (but not component cells) to show a stronger response to the UP90 stimulus than to drifting 1-D noise. In Figure 10B, we plot the base 2 logarithm of the ratio between the response to UP90 and drifting 1-D noise stimuli, at the preferred direction for each cell, as a function of the gPI. We clipped the ratios to ±4, to avoid having the correlation dominated by neurons (mostly pattern cells) that responded much more strongly to UP90 stimuli (points at 4). There is a significant positive correlation between the two (green line, r = 0.481, p = 9.94 × 10⁻²⁴), with larger values of gPI associated with stronger enhancement from the presence of an orthogonal static stimulus (the correlation is stronger with the uPI [r = 0.49, p = 10⁻²⁴] than with the bPI [r = 0.385, p = 4.56 × 10⁻¹⁵], indicating that this mechanism plays a particularly important role in cells that signal pattern motion with unikinetic plaids, an issue we return to later).
The mean log ratio is 0.138 for component cells and 0.93 for pattern cells, a highly significant difference (Kolmogorov–Smirnov test, p = 6.43 × 10⁻²⁴). These stronger neural responses to UP90 than to 1-D stimuli in pattern cells correlate with, and are presumably responsible for, the larger reflexive eye movements induced by UP90 stimuli (Quaia et al., 2015).
There is a fourth potential mechanism that, as far as we know, has not been proposed before: Inhibition from static stimuli parallel to the preferred orientation of the cell. To see why it might be useful, it is instructive to compare the population response of our group of pattern cells to 1-D stimuli and to unikinetic plaids. In Figure 11, we plot (top row) with thick lines the response of the subpopulation of cells (48 units) having unimodal tuning curves to 1-D stimuli (for the bimodality criterion used, see Materials and Methods), whereas thin lines are used for the entire subpopulation of pattern cells (140 units). We also plot (bottom row) the difference between the responses to unikinetic plaids and to 1-D stimuli, which highlights the directional signals that need to be added or subtracted from the tuning curve to 1-D patterns to obtain the tuning curves to unikinetic plaids. This difference is what must be somehow contributed by the static grating present in each unikinetic plaid. While the magnitude of the enhancement in correspondence with an orthogonal (horizontal in this example) static component is larger, the need for suppression by a parallel (vertical in this example) static component is considerable. The enhancement is only present when the cell is driven by the 1-D drifting component, again indicating the importance of opponency and nonlinearities.
Unfortunately, probing this mechanism directly by creating the equivalent of the UP90 stimulus, something that could be called UP0, is not readily feasible with noise stimuli. Looking at the time course of the difference in response between 1-D stimuli and unikinetic and flicker plaids can, however, give us some insights on how this mechanism might operate. In Figure 12 (top row), we show the evolution over time, averaged over the 140 pattern cells, of the responses to 1-D noise patterns and unikinetic and flicker plaids. We focus on the evolution at the stimulus orientations associated with the strongest enhancement and suppression in Figure 11 (i.e., ±45°). In the bottom row, as in Figure 11, we show the difference between responses to 1-D and UP/FP stimuli, thus highlighting the impact of the static/flickering pattern over time. With unikinetic plaids (left column), both enhancement and suppression exhibit an initial large transient followed by a lower sustained effect. As can also be seen in Figure 11, enhancement is stronger in magnitude than suppression. The effect for flicker plaids (right column) is even more interesting, as it potentially reveals both the presence of this static parallel suppression mechanism, and why pattern cells derive a (spurious) pattern signal from flicker plaids only initially. While the enhancement at 45° is similar to that observed for unikinetic plaids (although much reduced in strength, approximately by half), at −45° there is an initial weak transient suppression, but no sustained suppression. The weaker effect (both enhancement and suppression) of flickering compared with static stimuli can be easily explained by positing that, in both cases, this signal originates in nondirection selective V1 cells (i.e., cells that have a low-pass temporal frequency tuning): Flickering stimuli have their energy distributed at all temporal frequencies, and compared with static stimuli would thus be less effective at driving these cells.
We can further speculate that the signal delivered by these cells to MT is more effective at enhancing than at suppressing the motion signals in MT (two different subpopulations of V1 low-pass cells will carry the orthogonal enhancement and parallel suppression, and thus we only need to hypothesize that the excitatory connections are stronger than the inhibitory ones). Under these assumptions, a simple threshold nonlinearity on the suppressive signal might be sufficient to reproduce our results.
Discussion
By recording from a large number of MT cells while presenting a wide range of stimuli, we confirmed that only a few MT cells encode pattern motion direction. And if a broader set of stimuli were used, it may be that no single neuron does. However, we also demonstrated, for the first time, that when the summed response of a subpopulation of such cells is considered, an accurate and unequivocal pattern motion direction signal emerges (at least with the plaids tested so far).
Notably, we found (Fig. 7) that simply pooling across a sizable subpopulation of MT neurons yields a much clearer pattern direction signal than that provided on average by the cells in the subpopulation (Figs. 3 and 4). This implies that there is not a simple continuum from component to pattern cells in MT, as also suggested by others using different arguments (Wang and Movshon, 2016; Wallisch and Movshon, 2019). Instead, an individual cell can have some properties of a component cell and some of a pattern cell, or, from a different perspective, can fail to be a proper pattern cell along one of many dimensions. Thinking in terms of a popular model of pattern motion computation (Simoncelli and Heeger, 1998; Nishimoto and Gallant, 2011; Quaia et al., 2016), which envisions a pattern cell as filling a velocity plane in Fourier space, different cells might only partially fill the plane, but in different ways. Indices that attempt to summarize the degree to which a cell is pattern-like versus component-like, those used in previous studies as well as those we used here, will thus only be able to provide an incomplete picture. This is not so much a failure of MT neurons, but rather a failure of the neuron-as-a-feature-detector doctrine, which attempts to assign a definite functional role to each neuron. Instead, we showed that MT generates a pattern direction signal at the population level: The code is distributed. This suggests that the specialization (i.e., learning) for signaling pattern motion direction operates at the population, and not at the single-neuron, level. This might follow quite simply from an inability of the system to solve the credit assignment problem (which affects any form of learning) at the single-neuron level.
This result has an additional implication: The widespread presence of pattern signals in single neurons in area MST (Khawaja et al., 2013), which has direct reciprocal connections with area MT (Maunsell and Van Essen, 1983), need not be taken as evidence of further processing occurring downstream from MT: A simple summation of (selected) inputs from MT is sufficient.
We also identified a novel mechanism that plays a key role in shaping the response of individual MT cells to pattern motion: Suppression from a signal tuned to the same orientation as that preferred by the cell, but restricted to low temporal frequencies after an initial broadband transient, contributes significantly to the rotation of tuning curves with unikinetic and flicker plaids (Fig. 11). This is in addition to the previously proposed excitatory signal from static stimuli orthogonal to the preferred orientation of the cell for moving stimuli (Albright, 1984; Simoncelli and Heeger, 1998; Quaia et al., 2016).
Furthermore, we showed (Fig. 12) that quickly vanishing suppression from parallel flickering stimuli could account for the short life of the pattern motion response to flicker plaids (Fig. 8). The robust tracking eye movements we previously reported in response to these stimuli (Quaia et al., 2016) are at ultrashort latencies (50 ms in monkeys, 70 ms in humans) and are thus mediated by the earliest cortical responses. However, unlike unikinetic plaids, such stimuli induce only a weak appearance of motion parallel to the flickering lines, made difficult to judge by the fact that they do not cohere (Quaia et al., 2016). Given the neural data reported here, positing a longer temporal integration window for perception than for eye movements might suffice to account for both results with a single linear decoding model, although a quantitative comparison in the same subjects would strengthen this interpretation.
Nishimoto and Gallant (2011) inferred the spatiotemporal tuning of MT neurons from their responses to natural movies and found little evidence for the presence of either excitatory or inhibitory effects from static signals; they concluded that this supports the view that MT is highly specialized for motion. As we have argued here and elsewhere (Quaia et al., 2016), there is however strong psychophysical and physiological evidence that static signals strongly influence motion perception, tracking eye movements, and the responses of MT neurons (motion-from-form mechanisms). This discrepancy might be attributed to a limit, acknowledged by the authors, in the Nishimoto and Gallant (2011) study: They treated spatiotemporal energy anywhere in the RF of an MT neuron in the same way. However, it has been shown (Majaj et al., 2007) that signals for pattern motion identification are only integrated in MT if they co-occur at the level of V1 RF sized regions (which are 20-100 times smaller than MT RF sizes at the same eccentricity) (Born and Bradley, 2005; Wang and Movshon, 2016). The common presence in naturalistic movies of static signals not colocalized with motion signals, to which MT cells would indeed respond only very weakly, could thus have easily led to the inference that MT cells are not driven (or suppressed) by static signals. This is one of those cases in which artificial stimuli, like those used in this study, have advantages over naturalistic stimuli (Rust and Movshon, 2005).
The need for static and motion signals to be tightly colocalized, at a scale considerably smaller than that of MT RFs, implies that the integration of these signals, presumably arriving at MT via different routes, must occur at the level of MT subunits (i.e., possibly within dendritic tree compartments). Furthermore, motion signals must act in a gating manner over static signals, a highly nonlinear operation. This might account for the observation that a static signal in isolation only induces a weak, transient response in MT pattern cells (Fig. 5); and yet when coupled with a drifting pattern in a unikinetic plaid, it can exert a sustained effect (Fig. 12) (see also Wallisch and Movshon, 2019). While it is not difficult to imagine a scheme under which the observed signals can arise, dissecting this operation will require further investigation, and might reveal how signals from disparate sources are nonlinearly integrated in single cells, possibly leading to a significant revision of existing models of pattern motion computation at the single-cell level.
In conclusion, we have added to our understanding of which mechanisms mediate the emergence of pattern motion signals at the level of individual MT cells, and we have shown that simple summation of MT neurons' responses might be sufficient to explain a wide range of both perceptual and eye movement responses to Type I plaids, unikinetic plaids, and flicker plaids, even in cases where perceptual and eye movement responses differ from each other.
Footnotes
This work was supported by the National Eye Institute Intramural Research Program.
The authors declare no competing financial interests.
Correspondence should be addressed to Christian Quaia at quaiac@nei.nih.gov