Abstract
Natural scenes often contain multiple objects and surfaces. However, how neurons in the visual cortex represent multiple visual stimuli is not well understood. Previous studies have shown that, when multiple stimuli compete in one feature domain, the evoked neuronal response is biased toward the stimulus that has a stronger signal strength. We recorded from two male macaques to investigate how neurons in the middle temporal cortex (MT) represent multiple stimuli that compete in more than one feature domain. Visual stimuli were two random-dot patches moving in different directions. One stimulus had low luminance contrast and moved with high coherence, whereas the other had high contrast and moved with low coherence. We found that how MT neurons represent multiple stimuli depended on the spatial arrangement. When two stimuli were overlapping, MT responses were dominated by the stimulus component that had high contrast. When two stimuli were spatially separated within the receptive fields, the contrast dominance was abolished. We found the same results when using contrast to compete with motion speed. Our neural data and computer simulations using a V1-MT model suggest that the contrast dominance found with overlapping stimuli is due to normalization occurring at an input stage fed to MT, and MT neurons cannot overturn this bias based on their own feature selectivity. The interaction between spatially separated stimuli can largely be explained by normalization within MT. Our results revealed new rules on stimulus competition and highlighted the impact of hierarchical processing on representing multiple stimuli in the visual cortex.
SIGNIFICANCE STATEMENT Previous studies have shown that the neural representation of multiple visual stimuli can be accounted for by a divisive normalization model. By using multiple stimuli that compete in more than one feature domain, we found that luminance contrast has a dominant effect in determining competition between multiple stimuli when they are overlapping but not spatially separated. Our results revealed that neuronal responses to multiple stimuli in a given cortical area cannot be simply predicted by the population neural responses elicited in that area by the individual stimulus components. To understand the neural representation of multiple stimuli, rather than considering response normalization only within the area of interest, one must consider the computations including normalization occurring along the hierarchical visual pathway.
Introduction
In natural scenes, multiple visual stimuli are often present in a local spatial region. Although it is generally well understood how neurons in the visual cortex encode a single stimulus, how neurons encode multiple visual stimuli within their receptive fields (RFs) remains to be elucidated. Because visual perception depends critically on the integration and segregation of multiple visual stimuli (Braddick, 1993), understanding the neural representation of multiple stimuli is of significant importance.
The middle temporal (MT) cortex is an extrastriate brain area that is important for visual motion processing (Britten, 2003; Born and Bradley, 2005; Park and Tadin, 2019). Neurons in area MT receive feedforward inputs from direction-selective neurons in V1 (Movshon and Newsome, 1996) and have RFs ∼10 times larger in size than those of the neurons in the primary visual cortex (V1) at the same eccentricities (Gattass and Gross, 1981; Albright and Desimone, 1987). Previous studies have shown that neuronal responses in area MT elicited by multiple moving stimuli follow a sublinear summation of the responses elicited by the individual stimulus components (Snowden et al., 1991; Qian and Andersen, 1994; Ferrera and Lisberger, 1997; Recanzone et al., 1997; Britten and Heuer, 1999; Heuer and Britten, 2002; McDonald et al., 2014), consistent with a model of divisive normalization (Simoncelli and Heeger, 1998; Britten and Heuer, 1999; Carandini and Heeger, 2011).
Work in our laboratory has shown that the direction tuning curves of MT neurons to overlapping random-dot stimuli moving transparently in different directions can also be described as a weighted sum of the responses elicited by the individual stimulus components (Xiao et al., 2014; Xiao and Huang, 2015). When two stimulus components have different signal strengths in one feature domain, defined either by motion coherence or luminance contrast, MT neurons pool the stimulus component that has a stronger signal strength with greater weight (Xiao et al., 2014). The response bias in MT toward the stimulus component that has a stronger signal strength can be accounted for by a descriptive model of divisive normalization (Xiao et al., 2014), similar to the contrast normalization model used to describe neuronal responses in V1 (Carandini et al., 1997; Busse et al., 2009).
However, natural scenes contain multiple visual stimuli that often differ in more than one feature domain. For example, one stimulus may have a stronger signal strength in feature A but a weaker signal strength in feature B, whereas another stimulus may have a weaker signal strength in feature A but a stronger signal strength in feature B. In this case, it is unclear which stimulus has an overall stronger signal strength and, more generally, how visual stimuli with multiple competing features interact within neurons' RFs.
One possibility is that, for neurons in a given brain area, the overall signal strength of a visual stimulus is reflected in the evoked responses of a population of neurons in that area. Due to divisive normalization within that area, a neuron may weigh a visual stimulus more strongly if the population neural response elicited by that stimulus is greater than the population response elicited by a competing stimulus. Alternatively, how neurons in a given brain area weigh multiple competing stimuli may be the result of neural computations occurring in multiple stages along the hierarchical visual pathway and may not be explained by simply considering the population neural responses elicited by the individual stimulus components in the area of interest.
Here, we investigate the rule by which neurons in area MT encode multiple moving stimuli that compete in more than one feature domain. We found that MT responses to multiple stimuli changed drastically when the spatial arrangement of the visual stimuli was varied. Our results reveal how visual stimuli that differ in multiple feature domains interact within neurons' RFs and shed light on how the neuronal responses in a given cortical area are shaped by neural processing along the hierarchical visual pathway.
Materials and Methods
Two male adult rhesus monkeys (Macaca mulatta) were used in the neurophysiological experiments. Experimental protocols were approved by the Institutional Animal Care and Use Committee of the University of Wisconsin–Madison and conform to U.S. Department of Agriculture regulations and to the National Institutes of Health guidelines for the care and use of laboratory animals. Procedures for surgical preparation and electrophysiological recordings were routine and similar to those described previously (Xiao et al., 2015). A head post and a recording cylinder were implanted during sterile surgery with the animal under isoflurane anesthesia. For electrophysiological recordings from neurons in area MT, we took a vertical approach and used tungsten electrodes (1–3 MΩ; FHC). We identified area MT by its characteristically large portion of directionally selective neurons, small RFs relative to those of neighboring medial superior temporal cortex (area MST), its location at the posterior bank of the superior temporal sulcus, and visual topography of the RFs (Gattass and Gross, 1981). Electrical signals were amplified and single units were identified with a real-time template-matching system and an offline spike sorter (Plexon). Eye position was monitored using a video-based eye tracker (EyeLink, SR Research) at a rate of 1000 Hz.
Visual stimuli and experimental procedure.
Stimulus presentation and data acquisition were controlled by a real-time data acquisition program, “Maestro” (https://sites.google.com/a/srscicomp.com/maestro/home). Visual stimuli were presented on a 25-inch CRT monitor at a viewing distance of 63 cm. Monitor resolution was 1024 × 768 pixels, with a refresh rate of 100 Hz. Stimuli were generated by a Linux workstation using an OpenGL application that communicated with an experimental control computer. The luminance of the video monitor was measured with a photometer (LS-110, Minolta) and was gamma-corrected.
Visual stimuli were achromatic random-dot patches presented within a circular aperture with a diameter of 3°. Individual dots were 2 × 2 pixel squares extending 0.08° on each side, and each random-dot patch had a dot density of 2.7 dots/degree² (deg²). The dots had a luminance of either 79 or 22 cd/m² and were presented on a uniform background with a luminance of 10 cd/m², which gives rise to a Michelson contrast of either 77.5 or 37.5%. Random dots in each patch moved within the stationary aperture in a specified direction. The motion coherence of each random-dot patch was set to either 100 or 60%. To generate a random-dot patch moving at N% motion coherence (Newsome and Paré, 1988; Britten et al., 1992), N% of the dots, referred to as the "signal" dots, were selected to move coherently, whereas the rest of the dots, referred to as the "noise" dots, were repositioned randomly within the aperture. Random selections of the signal and noise dots occurred at each monitor frame. Therefore, a given dot would switch back and forth between being a signal dot and a noise dot. The lifetime of each dot was as long as the motion duration.
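The per-frame reselection of signal and noise dots can be summarized by the following minimal MATLAB sketch. All function and variable names are illustrative and are not taken from the Maestro stimulus code; dots that exit the aperture are simply wrapped to the opposite side, a simplification of the actual stimulus.

```matlab
function dots = updateDots(dots, cohPct, dirDeg, speedDegPerSec, frameRate, apertureRadius)
% One monitor frame of random-dot motion at cohPct% coherence.
% dots is an nDots x 2 matrix of (x, y) positions in degrees, centered on the aperture.
nDots    = size(dots, 1);
isSignal = rand(nDots, 1) < cohPct / 100;          % reselect signal vs. noise dots every frame
step     = speedDegPerSec / frameRate;             % displacement per frame (deg)
dots(isSignal, :) = dots(isSignal, :) + step * [cosd(dirDeg), sind(dirDeg)];
nNoise   = sum(~isSignal);
r  = apertureRadius * sqrt(rand(nNoise, 1));       % noise dots: random repositioning
th = 2 * pi * rand(nNoise, 1);                     % within the circular aperture
dots(~isSignal, :) = [r .* cos(th), r .* sin(th)];
out = sqrt(sum(dots .^ 2, 2)) > apertureRadius;    % wrap dots that left the aperture
dots(out, :) = -dots(out, :);
end
```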
In each experimental trial, the monkey maintained fixation within a 1° × 1° electronic window around a small fixation point. After a neuron was isolated, we first characterized its direction selectivity by interleaving trials of a 30° × 27° random-dot patch, moving in different directions at a step of 45° and at a speed of 10°/s. The direction selectivity and preferred direction (PD) were determined online using MATLAB (MathWorks). We then characterized the speed tuning of the neuron using a random-dot patch moving at different speeds (1, 2, 4, 8, 16, 32, or 64°/s) in the PD of the neuron. We fitted the speed tuning curve with a cubic spline and took the preferred speed (PS) of the neuron as the speed that evoked the highest firing rate in the fitted curve. Next, we used a series of 5° × 5° random-dot patches moving in the PD and at the PS of the neuron to map the RF of the neuron. The patch locations tiled the screen in 5° steps and were presented in randomized order. The RF map was interpolated at 0.5° intervals, and the location giving rise to the highest firing rate was taken as the center of the RF.
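A minimal MATLAB sketch of this preferred-speed estimate follows; the log-spaced interpolation axis and the example firing rates are our own illustrative choices, not values from the recorded data.

```matlab
speedsTested = [1 2 4 8 16 32 64];                      % tested speeds (deg/s)
meanRates    = [5 12 30 45 38 20 8];                    % example mean firing rates (spikes/s)
logSpeeds    = log2(speedsTested);                      % interpolate on a log-speed axis (assumption)
fineLog      = linspace(logSpeeds(1), logSpeeds(end), 200);
fittedRates  = spline(logSpeeds, meanRates, fineLog);   % cubic-spline fit of the speed tuning curve
[~, iMax]    = max(fittedRates);
preferredSpeed = 2 ^ fineLog(iMax);                     % speed evoking the highest fitted rate
```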
In the main experiments, the visual stimuli appeared after the monkey maintained fixation for 200 ms. To separate the neuronal responses to the stimulus motion from those due to the stimulus onset, the visual stimuli were first turned on and remained stationary for 200 ms before they started to move for 500 ms. The visual stimuli were then turned off. The monkeys maintained fixation for an additional 200 ms after the stimulus offset. In some stimulus trials, two random-dot patches that moved in different directions, referred to as two stimulus components, were presented simultaneously. The direction separation between two stimulus components was fixed at 90°. We varied the vector average (VA) direction of the bidirectional stimulus across the full 360° range to characterize the response tuning curve. The two stimulus components were either overlapping in one of two locations (site a or b) within the RF or spatially separated within the RF, one centered at site a and the other at site b, with at least a 1° gap between the borders of the two random-dot patches (illustrated in Fig. 1). In other trials, only one stimulus component was presented at either site a or site b, and the direction was varied to characterize the tuning curve to the stimulus component. For the majority of the experiments, the VA and component directions were varied in 15° steps. In a small set of experiments, the directions were varied in 30° steps. The trials presenting bidirectional stimuli and individual stimulus components were randomly interleaved.
In the first experiment, one random-dot patch, referred to as the “low-contrast and high-coherence” component, had a luminance contrast of 37.5% and a motion coherence of 100%. The other random-dot patch, referred to as the “high-contrast and low-coherence” component, had a luminance contrast of 77.5% and a motion coherence of 60%. Both stimulus components moved at the same speed, which was set at the PS of the neuron if it was <10°/s, or at 10°/s if the PS was ≥10°/s. Note that when a random-dot patch moved at 60% coherence in a given direction, the visual stimulus was different from a situation where 60% of the dots always moved coherently and the remaining 40% of dots always moved randomly. Because the random selection of signal and noise dots occurred at each monitor frame in our stimuli, a noise dot at one frame may turn into a signal dot in the next frame and move in the coherent direction. Perceptually, it is difficult to segregate the noise dots from the signal dots of the same stimulus component. The noise dots of the high-contrast and low-coherence component are not an independent entity and do not appear to interfere with the coherence of the low-contrast and high-coherence component perceptually.
In the second experiment, we set the motion coherence of both random-dot patches to 100% but used different speeds for the two stimulus components. One random-dot patch, referred to as the "low-contrast and faster-speed" component, had a luminance contrast of 37.5% and moved at 10°/s. The other random-dot patch, referred to as the "high-contrast and slower-speed" component, had a luminance contrast of 77.5% and moved at 2.5°/s.
Experimental design and statistical analysis.
Response firing rate was calculated during the period of 500 ms stimulus motion and averaged across repeated trials. We fitted the raw direction tuning curves for the bidirectional stimuli and the individual stimulus components using splines at a resolution of 1°. We then rotated the spline-fitted tuning curve to the bidirectional stimuli so that the VA direction of 0° was aligned with the PD of each neuron. In the first experiment, the responses of each neuron to the bidirectional stimuli and individual stimulus components were normalized by the maximum response to the low-contrast and high-coherence component. In the second experiment, the responses of each neuron were normalized by the maximum response to the faster speed component. We averaged the rotated and normalized tuning curves across neurons to obtain population-averaged tuning curves.
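The following MATLAB sketch outlines this alignment and normalization for a single neuron. The tuning curve values and PD are illustrative, and here the curve is normalized to its own maximum for brevity; in the analysis, the normalization used the maximum response to the low-contrast and high-coherence component (or the faster-speed component in the second experiment).

```matlab
dirs     = 0:15:345;                                  % sampled VA directions (deg)
resp     = 20 + 30 * cosd(dirs - 75);                 % example raw tuning curve with a PD near 75 deg
dirsFine = 0:359;
respFine = spline([dirs, 360], [resp, resp(1)], dirsFine);   % spline fit at 1-deg resolution
pd       = 75;                                        % this neuron's preferred direction (deg)
rotated  = circshift(respFine, -pd, 2);               % align the PD with the 0-deg VA direction
normed   = rotated / max(respFine);                   % normalization (see note above)
% The population-averaged tuning curve is then mean(normedAll, 1), where each row of
% normedAll holds one neuron's rotated and normalized curve.
```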
To quantify the relationship between the responses elicited by the bidirectional stimuli and those elicited by the individual stimulus components, we fitted the direction tuning curves using a summation plus nonlinear interaction (SNL) model (Eq. 1), which has been shown to provide a better fit of MT responses elicited by bidirectional stimuli than a linear weighted summation model (Xiao et al., 2014):

Rpred(θ1, θ2) = w1R1(θ1) + w2R2(θ2) + bR1(θ1)R2(θ2),  (1)

where Rpred is the response to the bidirectional stimuli predicted by the model; θ1 and θ2 are the two component directions; R1 and R2 are the measured component responses elicited by the two stimulus components when presented alone; w1 and w2 are the response weights for R1 and R2, respectively; and b is the coefficient of multiplicative interaction between the component responses. To determine whether the response elicited by the bidirectional stimuli showed a significant bias toward one of the two stimulus components, we compared the response weights w1 and w2 using either a paired t test or a Wilcoxon signed-rank test.
We also fitted the response tuning curves to the bidirectional stimuli using a few variants of a divisive normalization model (Carandini and Heeger, 2011; see Results). The model fits were obtained using the constrained minimization tool “fmincon” (MATLAB) to minimize the sum of squared error.
To evaluate the goodness of fit of a model for the response tuning curve to the bidirectional stimuli, we calculated the percentage of variance (PV) accounted for by the model as follows:

PV = (1 − SSE/SST) × 100%,  (2)

where SSE is the sum of squared errors between the model fit and the neuronal data, and SST is the sum of squared differences between the data and the mean of the data (Morgan et al., 2008).
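A minimal MATLAB sketch of this fitting and evaluation step is given below, using synthetic tuning curves and fmincon (which we used for the normalization-model fits); the starting values, bounds, and tuning curve shapes are illustrative.

```matlab
theta1 = 0:15:345;             theta2 = theta1 + 90;          % component directions (deg)
R1  = 15 + 25 * exp(2 * (cosd(theta1) - 1));                  % low-contrast, high-coherence component
R2  = 10 + 15 * exp(2 * (cosd(theta2) - 1));                  % high-contrast, low-coherence component
Rbi = 0.2 * R1 + 0.9 * R2 + 0.01 * R1 .* R2;                  % synthetic "measured" bidirectional response
snl = @(p) p(1) * R1 + p(2) * R2 + p(3) * R1 .* R2;           % Eq. 1 with p = [w1, w2, b]
sse = @(p) sum((Rbi - snl(p)) .^ 2);
pFit = fmincon(sse, [0.5, 0.5, 0], [], [], [], [], [-2 -2 -1], [2 2 1]);   % least-squares fit
PV   = 100 * (1 - sse(pFit) / sum((Rbi - mean(Rbi)) .^ 2));   % percentage of variance (Eq. 2)
```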
V1-MT model.
We adapted a computational model proposed by Simoncelli and Heeger (1998; http://www.cns.nyu.edu/∼lcv/MTmodel/) to reconstruct our visual stimuli and to simulate the neuronal response tuning to the bidirectional stimuli that were either overlapping or spatially separated. The model contained several consecutive stages, which can be interpreted as V1 simple, V1 complex, and MT (Simoncelli and Heeger, 1998; Rust et al., 2006). Based on the dimensions of the video monitor and the viewing distance in our neurophysiological experiments, 1° of visual angle corresponds to 21 pixels. The random-dot patch in our model simulations had a circular aperture with a diameter of 63 pixels (i.e., 3°) and the same dot density as used in our experiments. Each dot had a size of 2 × 2 pixels.
We set the RFs of model neurons using Gaussian convolutional filters (Table 1). We estimated the size of the RF for each neuron type by summing the lengths of the incorporated filters. For the spatially separated stimuli, we set the blank gap between the two stimulus components equal to the RF size of the V1 complex neuron (1.2°) to ensure that no V1 neuron would be driven by both stimulus components. We generated direction-selective neuron populations that approximately tiled a sphere in the frequency domain. We tuned the contrast response functions by adjusting C₅₀ values for V1 and MT neurons. These C₅₀ values were represented in the model as σ² in the normalization equation (Eq. 3), which was applied to both V1 complex cell and MT stages of the model (adapted from Simoncelli and Heeger, 1998; Rust et al., 2006):

R′n(t) = K⌊Rn(t)⌋ / (σ² + Σm wm⌊Rm(t)⌋),  (3)

where Rn(t) represents the linear filter response of the nth neuron; R′n(t) represents the normalized response of either the V1 complex cell or MT neuron; ⌊ ⌋ denotes half-wave rectification; K represents the strength of normalization, which was set as 1 − σ²; m represents the normalization pool of the nth neuron; w represents the Gaussian spatial weighting profile of the normalization pool, with an SD of SDnorm. The model parameters for V1 and MT stages are defined in Table 1. We fitted the model contrast response functions to neural data from V1 and MT as described by Sclar et al. (1990). Similarly, we tuned coherence responses by varying the spatial scale of the normalization pool (m), the weighting profile within the pool (w), and the size of the V1 linear RF. The MT coherence response function was fitted to data replotted from Figure 1C in Britten and Newsome (1998). We are not aware of published neural data on the V1 coherence response function. Therefore, the parameters for V1 model neurons were varied to simulate our MT responses to bidirectional stimuli without a constraint on the V1 coherence response function. The same model parameters were used for the overlapping and spatially separated conditions.
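The following MATLAB fragment sketches this normalization step for a single time step, with the spatial weighting applied as a one-dimensional Gaussian convolution; the array sizes and parameter values are illustrative rather than those listed in Table 1.

```matlab
nNeurons = 8;    nPixels = 64;
linResp  = max(randn(nNeurons, nPixels), 0);        % stand-in half-wave-rectified filter responses
sigma2   = 0.1;                                     % semisaturation constant (sets the C50)
K        = 1 - sigma2;                              % normalization strength, K = 1 - sigma^2
SDnorm   = 10;                                      % SD of the spatial pool (pixels)
w        = exp(-((1:nPixels) - nPixels/2) .^ 2 / (2 * SDnorm ^ 2));
w        = w / sum(w);                              % area-normalized Gaussian weighting profile
popSum   = sum(linResp, 1);                         % sum over the local neuron population
poolResp = conv(popSum, w, 'same');                 % Gaussian spatial weighting of the pool
normResp = K * linResp ./ (sigma2 + poolResp);      % Eq. 3, applied to every neuron in parallel
```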
We explored several variants of the model architecture. The model parameters were fitted after each architectural manipulation. The following changes enabled the model to better capture the trends of the stimulus competition found in our neural data. First, we used area-normalized Gaussian functions to set the weights for the spatial pooling and local population normalization. Second, multiple frequency scales for V1 simple cells were computed by tripling the SD of the underlying third-order derivative Gaussian, similar to the doubling suggested by Simoncelli and Heeger (1998)—this change was made after spectral analysis of stimuli showed that a wider range of scales was necessary to capture motion at lower coherence. Third, V1 afferent weights were not adjusted to zero mean, allowing MT neurons to have variable proportions of positive and negative inputs. Finally and importantly, rectification and static nonlinearity were applied to the MT stage after spatial pooling and before normalization, which is physiologically plausible and provides a better fit of our neural data.
Results
We asked the question of how neurons in extrastriate area MT represent multiple visual stimuli that compete in more than one feature domain. To address this question, we conducted neurophysiological experiments and computer simulations. We recorded electrophysiological data from isolated single neurons in area MT of two macaque monkeys while they performed a fixation task. Visual stimuli were two random-dot patches moving simultaneously in different directions within the RFs. In the first experiment, we used luminance contrast and motion coherence as two competing features. One stimulus had high contrast but moved with low coherence, whereas the other stimulus had low contrast but moved with high coherence (see Materials and Methods). We manipulated the spatial arrangement of the visual stimuli to investigate the contributions of earlier visual areas and area MT in mediating the competition between multiple stimuli. In a second experiment, we used luminance contrast and motion speed as two competing features. We first present the results from the neurophysiological experiments and then computer simulations.
Neurophysiological experiments
We measured the direction tuning curves of MT neurons in response to two stimuli that had competing visual features and moved simultaneously in different directions. Our dataset includes recordings from 76 MT neurons, 43 from monkey G and 33 from monkey B. We set the angular separation between the motion directions of two individual stimuli, referred to as the stimulus components, at 90° and varied the VA direction of the stimuli. In the first experiment, one stimulus component had a low contrast of 37.5% and moved at a high motion coherence of 100%. The other component had a high contrast of 77.5% and moved at a low coherence of 60%. Figure 1 shows the direction tuning curves of two representative neurons. The red curve shows the neuronal response elicited when both stimulus components were present, as a function of the VA direction of the two stimulus components. The green and blue curves show the neuronal responses elicited by the individual stimulus components when presented alone. The tuning curves of the component responses are arranged such that, at each VA direction, the data points on the green and blue curves correspond to the responses elicited by the individual stimulus components of that VA direction (note the color-coded abscissas for the component directions in Fig. 1A2).
For the two example neurons, the peak response of the direction tuning curve to the low-contrast and high-coherence component alone (shown in blue) was greater than that of the high-contrast and low-coherence component (shown in green; Fig. 1). This is expected because MT neurons are sensitive to motion coherence within a large coherence range (Britten et al., 1993), whereas their contrast response function saturates at a low luminance contrast (Sclar et al., 1990). Consequently, the average of the response tuning curves to the two stimulus components (shown in gray) was biased toward the low-contrast and high-coherence component. Surprisingly, we found that when the two stimulus components were overlapping, the neuronal responses elicited by the bidirectional stimuli were strongly biased toward the high-contrast and low-coherence component (Fig. 1A1,A2). This response bias was robust and occurred when we placed the overlapping stimuli at a different site within the RF (Fig. 1B1,B2).
Two overlapping visual stimuli could stimulate not only the RFs of single MT neurons but also the RFs of single V1 neurons. The response bias toward the high-contrast and low-coherence component may be caused by the neural processes within area MT or, alternatively, inherited from earlier visual areas, such as V1. To determine the contribution of earlier visual areas to the response bias, we placed two stimulus components at different locations within the RF of a given MT neuron. The two stimulus components were separated by a gap of at least 1° (illustrated in Fig. 1C1,C2). With this spatial arrangement, the RF of a single V1 neuron could only be stimulated by one of the two stimulus components, whereas the RF of an MT neuron could still be stimulated by both components. We found that the response tuning to the bidirectional stimuli changed drastically when stimulus components were spatially separated. MT responses elicited by the bidirectional stimuli no longer showed a bias toward the high-contrast and low-coherence component but approximately followed a scaled average of the component responses (Fig. 1C1,C2).
Figure 2 shows the tuning curves averaged across 70 MT neurons. The population-averaged response elicited by the low-contrast and high-coherence component moving in the PD of each neuron, aligned to 0°, was significantly greater than that elicited by the high-contrast and low-coherence component moving in the PD (one-tailed paired t test, p = 4.1 × 10⁻⁷). However, when the two stimuli were overlapping, the population response elicited by the bidirectional stimuli was almost completely biased toward the weaker high-contrast and low-coherence component, regardless of the spatial location within the RF (Fig. 2A,B). At a given VA direction, the bias toward the high-contrast and low-coherence component followed a "higher contrast takes all" pattern. For example, at a VA direction of 45°, where the low-contrast and high-coherence component moved in the PD (0°) and the high-contrast and low-coherence component moved in a 90° direction (indicated by a dotted line in Fig. 2A), the bidirectional response closely followed the much weaker response elicited by the high-contrast and low-coherence component. When the two stimulus components were spatially separated within the RF, the strong bias toward the high-contrast and low-coherence component was abolished (Fig. 2C). The population response to the bidirectional stimuli now showed approximately equal weighting of the responses elicited by the individual stimulus components.
The SNL model (see Eq. 1 in Materials and Methods) provided an excellent fit of the MT responses elicited by the bidirectional stimuli, illustrated by the black curves in Figure 1. Across our neuron population, the model fit accounted for 83% of the response variance on average (see Materials and Methods). Figure 3 compares the response weights for the two stimulus components obtained from the SNL model fits. In the overlapping condition, the mean response weight (w2) for the high-contrast and low-coherence component was significantly greater than weight w1 for the low-contrast and high-coherence component (one-tailed paired t test, p = 1.9 × 10⁻⁴⁵ for site a, p = 2.5 × 10⁻²⁸ for site b; Fig. 3A). Nearly all data points, each representing the result from one neuron, were below the unity line. The mean response weight for the high-contrast and low-coherence component was 0.97 (SD = 0.24), whereas the mean weight for the low-contrast and high-coherence component was 0.23 (SD = 0.25), indicating a dominant effect of the high-contrast and low-coherence component in determining the neuronal response to the bidirectional stimuli.
When the two stimulus components were spatially separated within the RF, the response weights changed significantly, becoming symmetrically distributed relative to the unity line (Fig. 3B). The spread of weights in the spatially separated condition was larger than that in the overlapping condition. The mean weight for the high-contrast and low-coherence component decreased to 0.66 (SD = 0.32), whereas the mean weight for the low-contrast and high-coherence component increased to 0.68 (SD = 0.43). The mean weights for the two components were no longer different (paired t test, p = 0.8), but both were significantly greater than the weight of 0.5 expected from response averaging (t test, p < 0.001).
To quantify the response bias toward an individual stimulus component, we calculated a bias index (BI) from the fitted response weights:

BI = (w2 − w1) / (w2 + w1),  (4)

where w1 and w2 are the response weights for the low-contrast and high-coherence component and the high-contrast and low-coherence component, respectively. A positive value of the index indicates a bias toward the high-contrast and low-coherence component. Figure 3C shows how this bias index changes with the spatial arrangement of the visual stimuli. In the overlapping condition, the mean BI is 0.71 (median = 0.70, SD = 0.23), which is significantly >0 (one-tailed t test, p = 7.5 × 10⁻³⁵). In the spatially separated condition, the mean BI is −0.05 (median = −0.01, SD = 0.95), which is not significantly different from 0 (p = 0.7). The mean BI obtained in the overlapping condition is significantly greater than that in the spatially separated condition (one-tailed paired t test, p = 4.7 × 10⁻⁹), indicating a change of the response bias when the spatial arrangement of the visual stimuli is altered.
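Assuming the normalized-difference form above, the index and its test can be computed in MATLAB as follows; the weight values are illustrative.

```matlab
w1 = [0.20; 0.31; 0.15; 0.42];                      % fitted weights, low-contrast component (example)
w2 = [0.95; 1.05; 0.88; 1.10];                      % fitted weights, high-contrast component (example)
BI = (w2 - w1) ./ (w2 + w1);                        % Eq. 4; positive = bias toward high contrast
[~, p] = ttest(BI, 0, 'Tail', 'right');             % one-tailed t test of BI > 0
```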
We previously found that the tuning curves of some MT neurons to overlapping bidirectional stimuli can show a directional "side bias" toward one of the two direction components (Xiao and Huang, 2015). A subgroup of neurons prefers the stimulus component on the clockwise side of the two motion directions, whereas another group prefers the component direction on the counterclockwise side. These response biases can occur even when both stimulus components have the same contrast and coherence. In the experiment shown in Figures 1, 2, and 3, the high-contrast and low-coherence component always moved in the direction on the counterclockwise side of the two component directions (Fig. 2A,B). Could the strong bias toward the high-contrast and low-coherence component in the overlapping condition be due to a biased neuron sample that happened to have a strong bias toward the direction component on the counterclockwise side? To address this concern, we arranged the direction components differently.
Figure 4, A and B, shows the average direction tuning curves of 15 MT neurons when the direction of the high-contrast and low-coherence component was placed on the counterclockwise side under the overlapping and spatially separated conditions, as in Figure 2. When the high-contrast and low-coherence component was placed on the clockwise side of the two component directions, the responses of the same 15 neurons to the bidirectional stimuli still showed a strong bias toward the high-contrast and low-coherence component under the overlapping condition (Fig. 4C) and showed approximately equal weighting of the two components under the spatially separated condition (Fig. 4D). Placing the high-contrast and low-coherence component on the clockwise or counterclockwise side of the two component directions had no effect on the response bias, as measured by the bias index under the overlapping and spatially separated conditions (Wilcoxon rank-sum test, p = 0.6).
To shed light on the neural mechanisms underlying the response bias, we examined the time course of the neuronal responses in the overlapping and spatially separated conditions. Figure 5 shows the peristimulus time histograms (PSTHs) calculated using a 10 ms time bin when either the high-contrast and low-coherence component or the low-contrast and high-coherence component moved in the PD. When stimuli were overlapping, as soon as MT neurons started to respond to the onset of the static stimuli (see Materials and Methods), the response elicited by the two components presented together (the red curves in Fig. 5A,B) already closely followed the response elicited by the high-contrast component alone (the green curves in Fig. 5A,B), even before the onset of the stimulus motion.
As soon as the neuronal response to stimulus motion started, the neuronal response to the bidirectional stimuli followed the response elicited by the high-contrast and low-coherence component throughout the motion period, regardless of whether the component moved in the PD and elicited a strong response (Fig. 5A) or 90° away from the PD and elicited a weak response (Fig. 5B). Because the strong bias toward the high-contrast and low-coherence component in the overlapping condition occurred at the very beginning of the response onset, it is unlikely that the bias was due to selective attention because attention modulation is delayed relative to the neural response onset (Wannig et al., 2007; Lee and Maunsell, 2010; Ni et al., 2012).
When stimuli were spatially separated, MT neurons also followed the high-contrast and low-coherence component in response to the onset of the static stimuli (Fig. 5C,D). After the motion onset, when the high-contrast and low-coherence component moved in the PD, the motion response elicited by the bidirectional stimuli initially followed the high-contrast and low-coherence component for ∼30 ms and was then “pulled down” by the non-PD component (Fig. 5C, arrow). When the high-contrast and low-coherence component moved in the non-PD, the motion response elicited by the bidirectional stimuli followed the high-contrast and low-coherence component for ∼10 ms after the onset of the motion response to the PD component and was then “pulled up” by the PD component (Fig. 5D, arrow). These results suggest that response normalization under the spatially separated condition takes ∼10–30 ms to occur.
When two stimulus components overlap, the random dots from each component constitute only half of the total number of dots of the two moving surfaces. Could the strong response bias toward the high-contrast and low-coherence component be due to a reduction of the motion coherence of the low-contrast and high-coherence component when the stimuli overlapped? We think this is an unlikely explanation because overlapping reduces the percentage of the signal dots relative to the total number of dots for both stimulus components. In addition, our stimuli moved in two directions separated by 90°. Human observers can reliably segregate the two stimulus components at this angle separation, and the low-contrast and high-coherence component still appears to move coherently. Overlapping changes neither the relative coherence levels nor the perceived coherence of the two stimulus components. When overlapping random-dot stimuli have the same luminance contrast but move at different motion coherences, macaque MT responses to the two stimulus components presented together are biased toward the high-coherence component (Xiao et al., 2014), indicating that stimulus overlapping does not prevent the response bias toward the high-coherence component given equal contrast.
To determine whether the dominance by the high-contrast component on MT responses elicited by overlapping stimuli occurs only when luminance contrast and motion coherence compete with each other, we conducted a second experiment using visual stimuli that differ in luminance contrast and motion speed. We previously found that when two overlapping random-dot patches moved in the same direction at different speeds, within a range of low to intermediate speeds, the responses of MT neurons elicited by the bispeed stimuli were biased toward the faster speed component (Huang et al., unpublished observations). Motivated by this finding, we used motion speed to compete with luminance contrast. As in the main experiment, the visual stimuli contained two random-dot patches moving in two directions separated by 90°, and we varied the VA direction to measure the direction tuning curves. One stimulus component had a high luminance contrast of 77.5% and moved at a slower speed of 2.5°/s. The other stimulus component had a low luminance contrast of 37.5% and moved at a faster speed of 10°/s. Both stimulus components moved at 100% coherence and were either overlapping or spatially separated within the RF of a given MT neuron, as in the first experiment. We also measured the direction tuning curves when the two stimulus components both had high luminance contrast (77.5%) and moved at 2.5°/s and 10°/s, respectively, at 100% coherence.
We recorded from 13 MT neurons using these visual stimuli. Figure 6 shows the population-averaged tuning curves. When both stimulus components had high contrast, the peak response elicited by the faster (10°/s) stimulus component moving in the PD (i.e., 0°) was greater than that elicited by the slower (2.5°/s) component moving in the PD. The component responses are shown in green and purple in Figure 6A. When the two stimulus components were overlapping, the tuning curve elicited by both stimulus components (shown in red) was biased toward the faster stimulus component, more than predicted by the average of the component responses (shown in gray; Fig. 6A). We fitted the direction tuning curves using the SNL model for each neuron (Eq. 1). The median response weight obtained by the model fit for the faster stimulus component (0.88) was significantly greater than the median weight (0.41) for the slower component (Wilcoxon signed-rank test, p = 7.3 × 10⁻⁴). This result extended our previous finding of the response bias toward the faster stimulus component for stimuli moving in the same direction (unpublished results) to stimuli moving in different directions.
When the overlapping stimuli moving at different speeds had different luminance contrasts, the responses elicited by both stimulus components showed a strong bias toward the high-contrast and slower-speed component, even though the peak response to this component alone was significantly weaker than that to the low-contrast and faster-speed component (Fig. 6B). We found the same result when the two stimulus components overlapped at a different site within the RF (Fig. 6C). Under the overlapping condition, the median response weight for the high-contrast and slower-speed component was 0.81, which was significantly greater than the median weight for the low-contrast and faster-speed component (0.17; Wilcoxon signed-rank test, p = 2.4 × 10⁻⁴). Separating the two stimulus components spatially within the RF abolished the bias toward the high-contrast and slower-speed component (Fig. 6D). As the spatial arrangement of the stimulus components changed from overlapping to spatially separated, the median bias index (Eq. 4) decreased significantly from 0.65 (SD = 0.21) to −0.08 (SD = 0.37; Wilcoxon signed-rank test, p = 0.0012). The values of the bias indices under the overlapping and spatially separated conditions were consistent with the bias indices calculated when visual stimuli competed between contrast and motion coherence. These results confirmed that luminance contrast has a dominant effect on MT responses elicited by overlapping stimuli, which is not unique to the competition between contrast and motion coherence. The spatial arrangement of visual stimuli can substantially change the competition between multiple stimuli within the RF.
Fitting response tuning curve using the normalization model
Previous studies have shown that neuronal responses elicited by multiple stimuli in many brain areas can be described by a divisive normalization model (Carandini and Heeger, 2011). We asked whether our results could also be accounted for by response normalization. We first fitted the data using the following equation:

Rpred(θ1, θ2) = c[S1ⁿR1(θ1) + S2ⁿR2(θ2)] / (S1ⁿ + S2ⁿ + σⁿ),  (5)

where R1 and R2 are the evoked direction tuning curves to the two stimulus components 1 and 2, respectively; θ1 and θ2 are the component directions; S1 and S2 represent the signal strengths of the low-contrast and high-coherence component and the high-contrast and low-coherence component, respectively (see definition below); Rpred is the model-predicted response elicited by both stimulus components presented simultaneously; and n, σ, and c are model parameters with the constraints of n ≥ 1 and c > 0. Equations of a similar form have been used previously to describe normalization involving contrast, in which case the signal strength is simply the luminance contrast (Carandini et al., 1997; Busse et al., 2009; Xiao et al., 2014; Bao and Tsao, 2018). Because our visual stimuli competed in more than one feature domain, it was not obvious which stimulus component had an overall stronger signal strength. Because the brain has to make an inference of the signal strength based on the elicited neural responses, we assumed that the signal strength of a stimulus component, in the "eye" of MT neurons, is reflected in the neural responses elicited by that stimulus component moving in a fixed direction summed across a population of MT neurons that have different PDs evenly spanning 360°. This summed population response is invariant to the direction of the stimulus component, which is suitable for representing signal strength. Equivalently, the summed population neural response in MT can be approximated by summing the responses of each neuron elicited by a stimulus component moving in different directions spanning 360° and averaging across neurons in our data sample. We calculated S1 and S2 based on the following equation:

Sk = (1/N) Σj=1..N Σi=1..M Rk,j(θi),

in which k is the index of the stimulus component. For the low-contrast component, k = 1; for the high-contrast component, k = 2; N is the total number of neurons in the population; Rk,j(θi) represents the raw firing rate of neuron j to motion direction θi of the stimulus component k; M is the number of the direction samples of a direction tuning curve. Because we spline-fitted the direction tuning curve using a step of 1°, M = 360. The values of S1 and S2 for the two stimulus components are shown in Table 2.
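A compact MATLAB sketch of the signal-strength estimate and the Eq. 5 fit is given below, using synthetic tuning curves; all firing rates, starting parameters, and bounds are illustrative.

```matlab
M = 360;  N = 5;                                       % direction samples and number of neurons
rates{1} = 20 + 5 * rand(N, M);                        % raw rates, low-contrast component (example)
rates{2} = 18 + 5 * rand(N, M);                        % raw rates, high-contrast component (example)
S1 = sum(rates{1}(:)) / N;                             % summed over directions, averaged over neurons
S2 = sum(rates{2}(:)) / N;
theta = 0:359;
R1  = 15 + 25 * exp(2 * (cosd(theta) - 1));            % component tuning curves (illustrative)
R2  = 10 + 15 * exp(2 * (cosd(theta - 90) - 1));
Rbi = 0.2 * R1 + 0.9 * R2;                             % stand-in measured bidirectional response
% Eq. 5 with p = [n, sigma, c]:
eq5 = @(p) p(3) * (S1 ^ p(1) * R1 + S2 ^ p(1) * R2) ./ (S1 ^ p(1) + S2 ^ p(1) + p(2) ^ p(1));
pFit = fmincon(@(p) sum((Rbi - eq5(p)) .^ 2), [1, 10, 1], [], [], [], [], [1, 1e-3, 1e-3], []);
```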
This normalization model (Eq. 5) failed to capture the response tuning to overlapping bidirectional stimuli, accounting for only 32% of the response variance (32% for site a, 32% for site b). The model performed better when stimuli were separated, accounting for 62% of the variance. We found similar results when using this model to fit the data from our second experiment, in which luminance contrast competed with motion speed. The model accounted for an average of 45% of the response variance (39% for site a, 50% for site b) when stimuli were overlapping, and 76% of the variance when stimuli were separated (Table 2).
It has been suggested that response normalization can be tuned, such that individual stimulus components contribute differently to normalization (Carandini et al., 1997; Rust et al., 2006; Ni et al., 2012). Therefore, we fitted our data using a tuned normalization equation:

Rpred(θ1, θ2) = c[S1ⁿR1(θ1) + S2ⁿR2(θ2)] / (S1ⁿ + αS2ⁿ + σⁿ),  (6)

where α is a positive parameter that scales the contribution of S2 with respect to S1 to normalization. We found that introducing tuned normalization did not improve the model performance at all when stimuli were overlapping, accounting for an average of 32% of the response variance (32% for site a, 31% for site b). When stimuli were separated, the tuned normalization model accounted for 64% of the variance. We found similar results when fitting the data collected when contrast competed with speed (Table 2).
The poor fit of the responses under the overlapping condition by the standard normalization model (Eq. 5) can be understood because MT neurons showed a very strong bias toward the high-contrast component, whereas S1 and S2 were similar, with S2 (i.e., MT responses elicited by the high-contrast component) being slightly smaller (Table 2). The tuned normalization was not able to improve the fit because, although it changed the relative contributions of the stimulus components to the normalization pool in the denominator, it kept the numerators in Equation 6 unchanged. Hence, the relative weights for the two stimulus components did not change. The failure of the normalization model fit using Equations 5 and 6 under the overlapping condition suggests that the assumption that the response of an MT neuron to multiple overlapping stimuli is governed by the population responses in area MT elicited by the individual stimulus components may be flawed.
To capture the strong bias toward the high-contrast component in the overlapping condition, a weighting parameter is needed in the numerator. Accordingly, we fitted our results using the following equation:

Rpred(θ1, θ2) = c[S1ⁿR1(θ1) + βS2ⁿR2(θ2)] / (S1ⁿ + βS2ⁿ + σⁿ),  (7)

where β is a positive parameter and appears in both the numerator and the denominator. This parameter allows the relative response weights for the two stimulus components to vary. When β is >1, the response weight for the high-contrast component (R2) is greater than that for the low-contrast component (R1). As expected, this equation fitted the data well, accounting for >70% of the response variance for both the overlapping and spatially separated conditions (Table 2). However, the normalization model itself does not provide an explanation for why the response weight is greater for the high-contrast component in the overlapping condition but not in the spatially separated condition. Our interpretation of the success of this model fit is that the term β in Equation 7 serves to capture the relative signal strength of the two stimulus components, such that the signal strength of the high-contrast and low-coherence (or low-speed) component is greater than that of the low-contrast and high-coherence (or high-speed) component under the overlapping but not spatially separated condition. Equation 7 is similar to a tuned normalization model used by Ni and colleagues (Ni et al., 2012; Ni and Maunsell, 2017, 2019) to explain the effect of attention on the neuronal response to multiple stimuli within the RF. In Equations 3A and 3B of Ni et al. (2012), there is also a term β in the numerator, which increases the weight of the attended stimulus component. Although β in the numerator can reflect attentional modulation as shown in these previous studies, β can also represent biased feedforward drive that favors one stimulus component under the overlapping condition. Our model simulations, presented in the following section, support the latter possibility. We will further consider the potential involvement of attention in the Discussion.
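Continuing the sketch above, the two normalization variants (Eqs. 6, 7) can be written as function handles; the fourth parameter is α for Eq. 6 and β for Eq. 7, and these forms follow our reconstruction of the equations rather than verified code.

```matlab
% Eq. 6: alpha rescales the contribution of S2 to the normalization pool (denominator only).
eq6 = @(p) p(3) * (S1 ^ p(1) * R1 + S2 ^ p(1) * R2) ./ (S1 ^ p(1) + p(4) * S2 ^ p(1) + p(2) ^ p(1));
% Eq. 7: beta appears in both numerator and denominator, so it changes the relative response weights.
eq7 = @(p) p(3) * (S1 ^ p(1) * R1 + p(4) * S2 ^ p(1) * R2) ./ (S1 ^ p(1) + p(4) * S2 ^ p(1) + p(2) ^ p(1));
% Each variant is fitted as for Eq. 5, e.g.:
pFit7 = fmincon(@(p) sum((Rbi - eq7(p)) .^ 2), [1, 10, 1, 1], [], [], [], [], [1, 1e-3, 1e-3, 1e-3], []);
```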
Computer simulations using a V1-MT model
Our spatially separated visual stimuli fall inside the RFs of single MT neurons, whereas only one of the stimulus components would fall inside the RFs of single V1 neurons. Hence, our spatially separated visual stimuli can interact within the RFs of MT neurons but not V1 neurons. In contrast, the overlapping stimuli can interact within the RFs of both MT and V1 neurons. To explore the neural mechanisms underlying our physiological findings, we conducted computer simulations using a hierarchical feedforward model adapted from Simoncelli and Heeger (1998). This model consists of two processing stages corresponding to areas V1 and MT. Each stage carries out a series of computations, including spatiotemporal filtering, spatial pooling, rectification, and divisive normalization. At the V1 stage, simple cells receive input directly from the visual stimulus, and complex cells pool inputs from rectified and divisively normalized responses of V1 simple cells. At the MT stage, MT neurons pool inputs from V1 complex cells, followed by rectification and divisive normalization (Simoncelli and Heeger, 1998; Rust et al., 2006).
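The stage ordering can be summarized by the following heavily simplified MATLAB sketch; it omits the temporal dimension, the spatiotemporal frequency tiling, and the Gaussian spatial weighting of the normalization pools, and all names and the squaring nonlinearity are illustrative choices rather than the exact model implementation.

```matlab
function mtResp = v1mtSketch(stim, v1Filters, poolWeights, mtWeights, sigma2V1, sigma2MT)
% stim: [nPixels x 1] stimulus vector; v1Filters: [nSimple x nPixels] linear RFs;
% poolWeights: [nComplex x nSimple] complex-cell pooling; mtWeights: [nMT x nComplex] afferents.
simpleLin  = v1Filters * stim;                                        % V1 simple: linear filtering
simpleRect = max(simpleLin, 0);                                       % half-wave rectification
simpleNorm = (1 - sigma2V1) * simpleRect ./ (sigma2V1 + sum(simpleRect));  % divisive normalization
complexR   = poolWeights * simpleNorm;                                % V1 complex: pooling
mtPooled   = mtWeights * complexR;                                    % MT: pooling of V1 complex inputs
mtRect     = max(mtPooled, 0) .^ 2;                                   % rectification + static nonlinearity,
                                                                      % applied after pooling (see Methods)
mtResp     = (1 - sigma2MT) * mtRect ./ (sigma2MT + sum(mtRect));     % divisive normalization within MT
end
```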
We generated random-dot visual stimuli that are similar to those used in our physiological experiments with the same size and dot density and simulated the neuronal responses in areas MT and V1. The visual stimuli and a simplified architecture of the model are illustrated in Figure 7. The diameter of each random-dot patch was 3°, extending 63 pixels. The dot density of each random-dot patch was 2.7 dots/degree². The RF sizes of model V1 and MT neurons, set by the sizes of the convolution filters, were 1.2° and 10° in diameter, respectively (see Materials and Methods). The populations of model neurons in V1 and MT stages approximately tiled a sphere in the spatiotemporal frequency domain, as in the Simoncelli and Heeger (1998) model. The RFs of V1 and MT neuron populations covered a region of the visual field that was 17.3° × 17.3°. In the overlapping condition, the apertures of two random-dot patches overlapped within the RFs (Fig. 7A). In the spatially separated condition, the two random-dot patches were placed side by side, separated by a blank gap that was 1.2° wide, within the RFs of single MT neurons (Fig. 7B). In the overlapping condition, the V1 neurons whose RFs covered site a were activated by both stimulus components (Fig. 7A). In the spatially separated condition, V1 neurons were activated by only one stimulus component, either at site a or site b (Fig. 7B).
We tuned the model parameters (see Materials and Methods) to match the experimentally measured contrast response functions of V1 and MT neurons (Sclar et al., 1990) and the coherence response function of MT neurons (Britten and Newsome, 1998). The simulated contrast response functions of V1 and MT neurons fitted the experimental data almost perfectly, and the simulated coherence response function of MT neurons matched the data reasonably well (Fig. 8A–C). As far as we know, an experimentally measured coherence response function of V1 neurons has not been described previously. Our simulations show that V1 responses increased with the coherence level of moving random-dot stimuli (Fig. 8D). The model V1 neurons had slightly higher firing rates in response to low-coherence stimuli and, as expected, more trial-to-trial variability in comparison with the model MT neurons due to small V1 RFs (Fig. 8C,D).
The MT responses elicited by our visual stimuli that competed between luminance contrast and motion coherence were well captured by the model. Consistent with our experimental data (Fig. 2), the tuning curve of model MT neurons to the low-contrast and high-coherence component had a greater peak response than that of the high-contrast and low-coherence component (Fig. 9A,B). In the overlapping condition, the simulated MT response elicited by the bidirectional stimuli was nearly completely biased toward the weaker high-contrast and low-coherence component (Fig. 9A), as found in the neural data. The model also captured the change of MT response tuning when visual stimuli were rearranged spatially. In the spatially separated condition, the tuning curve of model MT neurons elicited by the bidirectional stimuli was no longer dominated by the high-contrast and low-coherence component (Fig. 9B).
At the V1 stage of the model, the tuning curves of V1 complex cells showed a slightly greater mean peak response to the high-contrast and low-coherence component than to the low-contrast and high-coherence component (Fig. 9C). In the overlapping condition, the simulated V1 response elicited by the bidirectional stimuli was strongly biased toward the high-contrast and low-coherence component (Fig. 9C), to an extent similar to that found in model MT neurons (Fig. 9A), as measured by the weights for the component responses using the SNL model fits. The bias index (Eq. 4) for the V1 model neurons was 0.77, and that for the MT model neurons was 0.84. These simulation results suggest that the strong bias toward the high-contrast and low-coherence component found in MT neurons is inherited from V1.
In the spatially separated condition, the V1 response elicited by the bidirectional stimuli was the same as that elicited by the single stimulus component placed within the RFs of V1 neurons (Fig. 9D,E). Although the V1 peak response elicited by the high-contrast and low-coherence component at site a was slightly stronger than that elicited by the low-contrast and high-coherence component at site b, the MT response elicited by the bidirectional stimuli was skewed toward the low-contrast and high-coherence component, consistent with the average of the component responses (Fig. 9B). These simulation results suggest that MT response elicited by the bidirectional stimuli in the spatially separated condition (Fig. 9B) may be due to feature competition within MT.
The dot density of overlapping stimuli was twice the density of a single stimulus in the spatially separated condition. To understand whether the change of stimulus competition with the spatial arrangement of visual stimuli was confounded by the dot density, we conducted the same model simulations using a dot density that was either half (1.35 dots/degree²) or twice (5.4 dots/degree²) the original dot density (2.7 dots/degree²). We found essentially the same results using these three different dot densities (results not shown), suggesting that our findings cannot be explained by the difference of dot density under the overlapping and spatially separated conditions.
The response tuning curves of single MT neurons measured by varying the VA direction of the bidirectional stimuli can be mapped to the responses of a population of MT neurons that have different PDs, elicited by the bidirectional stimuli moving in a given VA direction. Figure 7 summarizes the changes of the response distributions across neuron populations at V1 and MT stages under the overlapping and spatially separated conditions.
To determine whether our findings can be generalized to visual stimuli other than random dots, we conducted model simulations using sinusoidal gratings that had different luminance contrasts and spatial frequencies (SFs). Because V1 and MT neurons have heterogeneous SF selectivity, we set the SF of the visual stimuli in relation to the preferred SF of the model MT neuron. We chose the stimulus parameters such that one grating component had a "high-contrast and less-preferred SF," whereas the other grating component had a "low-contrast and preferred SF." The two grating components drifted at the same temporal frequency in two directions separated by 90° and were either overlapping or spatially separated within the RF of an MT neuron. Except for the visual stimuli, all other model parameters were identical to those in our model simulations using random-dot stimuli.
We found the same results using drifting gratings as those obtained using random-dot stimuli (Fig. 10). The model MT neuron showed a strong response bias toward the high-contrast and less-preferred SF component when two grating components were overlapping (Fig. 10A), even though the MT response elicited by the high-contrast and less-preferred SF component presented alone (Fig. 10A, green curve) was significantly weaker than that elicited by the low-contrast and preferred SF component (Fig. 10A, blue curve). The strong bias toward the high-contrast and less-preferred SF grating was abolished when the two gratings were spatially separated (Fig. 10B).
In the model, a given MT neuron pools the inputs from a group of V1 complex cells. The model V1 neuron shown in Figure 10 had the same preferred SF as the model MT neuron. The direction tuning curves of this V1 neuron showed greater response to the high-contrast and less-preferred SF component than to the low-contrast and preferred SF component (Fig. 10C). In the overlapping condition, the simulated V1 response elicited by the bidirectional stimuli was strongly biased toward the high-contrast and less-preferred SF grating, completely ignoring the low-contrast and preferred SF grating (Fig. 10C). When gratings were spatially separated, the V1 neuron only responded to one grating component (Fig. 10D,E). These simulation results further support the idea that the strong bias toward the high-contrast component found in MT under the overlapping condition is inherited from V1. Our model simulations predict that MT neurons may inherit stimulus preferences of V1 neurons across a wide range of feature domains when multiple stimuli compete within the RFs of V1 neurons. Together, our results reveal the importance of neural processing at different stages of the visual hierarchy in determining how multiple visual stimuli compete within RFs of neurons in a given brain area.
Discussion
We have shown that the way MT neurons represent multiple stimuli competing in more than one feature domain depends on the spatial arrangement of the visual stimuli. When two stimuli are overlapping, MT responses are dominated by the stimulus component that has high contrast. When two stimuli are spatially separated, the contrast dominance is abolished. Our neural data and model simulations suggest that the contrast dominance found with overlapping stimuli is due to normalization occurring at an input stage fed to MT, and MT neurons cannot overturn this contrast dominance based on their own feature selectivity. The interaction between spatially separated stimuli can largely be explained by normalization within area MT. By using multiple visual stimuli competing in more than one feature domain, our study revealed how neural processing along the hierarchical visual pathway shapes neural representation of multiple visual stimuli in extrastriate cortex.
Consideration of the effect of attention
Attention can bias neuronal responses elicited by multiple stimuli in the RF in favor of the attended stimulus (Treue and Maunsell, 1996; Ferrera and Lisberger, 1997; Reynolds et al., 1999; Treue and Martínez Trujillo, 1999; Recanzone and Wurtz, 2000; Li and Basso, 2005; Lee and Maunsell, 2010). Although the animals in this study performed a fixation task that did not require goal-directed attention, could the high-contrast component capture stimulus-driven attention (Corbetta and Shulman, 2002) and bias the neuronal response elicited by the overlapping stimuli? Several considerations argue against this possibility. Although an abrupt stimulus onset captures attention (Yantis and Jonides, 1984), a visual stimulus that is brighter than other distractors does not automatically capture attention (Jonides and Yantis, 1988). The two components of our overlapping stimuli were turned on and started to move at the same time. The stimulus onset may automatically draw attention toward the spatial location of the overlapping stimuli, but it is unlikely to draw attention toward only the high-contrast component. Furthermore, stimulus-driven attention arises with a time delay (Nakayama and Mackeben, 1989), and its effect on neuronal responses in MT is transient, lasting ∼70 ms (Busse et al., 2008). In contrast, we found that the response bias toward the high-contrast component was present at the very beginning of the neuronal response following stimulus onset and remained stable throughout the motion period (Fig. 5). In addition, Wannig et al. (2007) have shown that attention directed to one of two overlapping surfaces can alter the responses of MT neurons. However, the effect of this surface-based attention was also delayed, and it modulated MT response magnitude by only ∼20% between conditions in which attention was directed to the two different surfaces (Wannig et al., 2007). Even in the unlikely case that the animals consistently attended to the high-contrast component throughout the stimulus presentation period in our study, the effect of attention would be insufficient to account for the immediate and nearly complete dominance of the high-contrast component.
Mechanisms underlying stimulus interactions
The primate visual system is hierarchically organized (Maunsell and van Essen, 1983; Felleman and Van Essen, 1991). The response properties of neurons in a visual area are shaped by feedforward input, as well as intra-areal and feedback processes. To understand the mechanisms underlying neural encoding of multiple stimuli, it is important to determine how these processes contribute to the RF properties in a given visual area. However, it is often difficult to disentangle the contribution of feedforward input from other neural processes. We have previously found that, in response to overlapping stimuli, MT neurons show a bias toward the stimulus component that has a higher signal strength, defined by either luminance contrast or motion coherence (Xiao et al., 2014). The response bias can be described by a model of divisive normalization. Because neurons in V1 also show a bias toward the stimulus component that has a higher contrast (Busse et al., 2009; MacEvoy et al., 2009) and divisive normalization may occur in both V1 and MT (Simoncelli and Heeger, 1998; Heuer and Britten, 2002), it was unclear how the feedforward input from V1 contributed to the response bias found in MT.
In this study, we were able to differentiate the impact of feedforward input from that of other neural processes on the response properties of MT neurons. Our results suggest that, given the sensitivity of V1 neurons to contrast and coherence, neurons in V1 may respond more strongly to the high-contrast and low-coherence component than to the low-contrast and high-coherence component used in our experiment. When the two stimuli overlap, the V1 responses elicited by the two components together may already be strongly biased toward the high-contrast and low-coherence component because of divisive normalization in V1 (Fig. 9C). MT neurons are then no longer able to remix the stimulus components according to their own sensitivities to contrast and coherence; in other words, MT neurons inherit the response bias toward the high-contrast component from their input. When the two stimuli are spatially separated, MT neurons receive inputs from two different pools of V1 neurons, each responding to only one stimulus component (Fig. 7B). The responses elicited by the two stimulus components thus remain separated in V1, and MT neurons can mix them via spatial and directional pooling and divisive normalization within MT. As a result, the mixing in MT may well reflect the sensitivities of MT neurons to different stimulus features. Our model simulations make predictions about how V1 and MT neurons respond to multiple competing stimuli (e.g., Figs. 9C, 10) that can be tested in future physiological studies.
Implications for normalization and the encoding of multiple visual stimuli
Our finding that the response weighting of competing stimuli depends on their spatial arrangement provides a new perspective on the well-established normalization model (Carandini and Heeger, 2011). The basic form of the normalization equations (Eqs. 5, 6) predicts that the response weight of a stimulus component increases with its signal strength, but it does not take the spatial arrangement of the visual stimuli into account (see the illustrative form below). We made the surprising finding that the MT response to overlapping stimuli cannot be predicted from the population neural responses elicited in MT by the individual stimulus components; one must instead consider the neural computations occurring along the hierarchical visual pathway. Majaj et al. (2007) showed that pattern direction-selective neurons in MT, characterized with overlapping drifting gratings (i.e., a plaid), did not integrate the directions of the component gratings when the gratings were spatially separated within the RF, suggesting that the computation underlying pattern direction selectivity in MT is local. Unlike a plaid, the overlapping random-dot stimuli used in our study elicit a percept of motion transparency. We showed that changing the spatial arrangement of visual stimuli has a substantial impact not only on motion integration but also on the competition between multiple stimuli. Our results revealed that contrast has a dominant effect in determining stimulus competition within a local spatial region when multiple stimuli differ in more than one feature domain, and that this effect of contrast is substantially reduced when the stimuli are spatially separated.
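As a concrete point of reference, one generic weighted-summation form of such a normalization equation, written here in our own notation (the exact Eqs. 5 and 6 are defined in Materials and Methods and may differ in detail), is

\[ R_{12} = \frac{c_1^{\,n} R_1 + c_2^{\,n} R_2}{c_1^{\,n} + c_2^{\,n} + \sigma}, \]

where \(R_1\) and \(R_2\) are the responses elicited by the two components presented alone, \(c_1\) and \(c_2\) are their signal strengths (e.g., contrast or coherence), \(n\) is an exponent, and \(\sigma\) is a semisaturation constant. The weight on component \(i\), \(w_i = c_i^{\,n}/(c_1^{\,n} + c_2^{\,n} + \sigma)\), grows with that component's own signal strength but contains no term for where the components fall within the RF, which is why this basic form cannot by itself capture the difference between the overlapping and spatially separated conditions.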
A seminal model in which MT neurons pool inputs from V1, with divisive normalization in both V1 and MT, has been successful in explaining a range of experimental results on MT responses (Simoncelli and Heeger, 1998; Rust et al., 2006). However, the model in its original form does not specify how features are spatially integrated and does not differentiate between overlapping and spatially separated stimuli (Majaj et al., 2007). In our study, we adapted this model to simulate both the overlapping and the spatially separated conditions and showed that the framework can explain our main physiological findings. Using this model, Busse et al. (2009) previously demonstrated the impact of response normalization in V1 on neural responses in MT. They showed that, when the contrasts of the two drifting gratings of a plaid were made unequal, the response of a model MT neuron changed from representing the pattern motion of the plaid to mostly representing the higher-contrast grating component, likely because of contrast normalization in V1 (Busse et al., 2009). However, the MT response elicited by the higher-contrast grating alone could also be greater than that elicited by the lower-contrast grating alone, so response normalization within MT may also have contributed to the model-predicted bias toward the higher-contrast component, akin to our experimental result obtained using random-dot stimuli with unequal contrasts (Xiao et al., 2014). In comparison, our current study provides unequivocal new evidence on how responses in MT are shaped by the hierarchical network. By using two stimuli competing in more than one feature domain, we demonstrated neurophysiologically and computationally the substantial impact of stimulus competition at the input stage on neuronal responses in MT, and how that impact changes with the spatial arrangement of the visual stimuli. Our findings may also apply to other visual areas in the hierarchical network, including those in the ventral visual stream, where response normalization has been well documented.
Footnotes
This work was supported by National Institutes of Health Grant R01-EY-022443 and in part by the Office of the Director, National Institutes of Health Grant P51-OD-011106 to the Wisconsin National Primate Research Center, University of Wisconsin–Madison. This research was conducted at a facility constructed with support from Research Facilities Improvement Program Grants RR15459-01 and RR020141-01. We thank Jianbo Xiao for assistance with electrophysiological recordings and data analysis, Bryce Arseneau for technical support, Drs. Jennifer Coonen and Saverio Capuano for veterinary care, and Dr. Kevin Brunner for assistance with surgery.
The authors declare no competing financial interests.
Correspondence should be addressed to Xin Huang at Xin.Huang@wisc.edu