We examined how spatially directed attention affected the integration of motion in neurons of the middle temporal (MT) area of visual cortex. We recorded from single MT neurons while monkeys performed a motion detection task under two attentional states. Using 0% coherent random dot motion, we estimated the optimal linear transfer function (or kernel) between the global motion and the neuronal response. This linear kernel filtered the random dot motion across direction, speed, and time. Slightly less than one-half of the neurons produced reasonably well defined kernels that also tended to account for both the directional selectivity and responses to coherent motion of different strengths. This subpopulation of cells had faster, more transient, and more robust responses to visual stimuli than neurons with kernels that did not contain well defined regions of integration. For those neurons that had large attentional modulation and produced well defined kernels, we found attention scaled the temporal profile of the transfer function with no appreciable shift in time or change in shape. Thus, for MT neurons described by a linear transfer function, attention produced a multiplicative scaling of the temporal integration window.
Neurons throughout the visual cortex are modulated by spatial attention (Desimone and Duncan, 1995; Braun et al., 2001; Treue, 2003). The number of action potentials produced by a neuron typically increases when attention is directed to the region in space that coincides with the receptive field (RF) of a neuron. How this modulation affects the way neurons integrate visual stimuli is currently an active area of investigation (McAdams and Maunsell, 1999; Treue and Martinez-Trujillo, 1999; Reynolds et al., 2000; Fries et al., 2001; Martinez-Trujillo and Treue, 2002; Niebur et al., 2002). In this study, we used linear systems identification to understand how attention modulates the temporal integration of neurons in the middle temporal (MT) visual area.
Recent studies suggest that attention operates in a multiplicative manner and does not change stimulus selectivity (McAdams and Maunsell, 1999; Treue and Martinez-Trujillo, 1999; Recanzone and Wurtz, 2000). Thus, attentional modulation (AM) can be thought of as a gain change that scales the tuning curves of cortical neurons without altering their shape. This observation has important implications for the way AM of neuronal activity produces attentional effects on behavior (Cook and Maunsell, 2002a). What is not known, however, is the extent to which the multiplicative action of attention also applies to other aspects of neuronal processing, such as temporal integration.
One possibility is that attention increases the sensitivity of neurons without altering the time course of their response. This would result in the gain change that has been observed in tuning curves. Alternatively, because tuning curves are typically measured by counting spikes over hundreds of milliseconds, it is possible that attention alters both sensitivity and the time course of the neuronal response. Biophysical mechanisms that may underlie a change in the gain of neuronal responses, such as the opening or closing of membrane conductances, would also produce changes in the membrane time constant that could lead to changes in the temporal integration of synaptic inputs (Rall, 1959). Thus, attention might have measurable effects on the time course of neuronal responses. Knowing whether or not attention affects the temporal integration properties of cortical neurons is important for understanding the way attention exerts its influence on neuronal activity and has not been previously examined in detail.
We recorded the activity of single neurons in area MT in response to random dot motion from monkeys under two attentional states. We used the correlation between the motion stimulus and neuronal response to estimate the way the neurons integrated the motion in our random dot stimulus. Neurons that produced linear kernels with well defined regions of integration had distinctly different temporal response properties compared with cells that produced kernels with less-defined integration regions. For the subset of MT neurons that produced good linear kernels and experienced high AM, we found that attention scaled the temporal integration profile without changing its shape. This suggests that attention exerts a multiplicative effect on the temporal integration properties of MT neurons.
Materials and Methods
Behavioral task. Two monkeys (Macaca mulatta) were trained to perform a motion detection task. While the animal fixated on a central point, two patches of random motion were presented (see Fig. 1A). Initially, there was no net motion (0% coherent) in the stimulus, and at a random time between 500 and 8000 msec, coherent motion began in one of the patches (motion onset times were exponentially distributed). Once coherent motion began, the monkeys had a reaction time window from 200 to 750 msec in which to release the lever to obtain a juice reward. The strength of the coherent motion was varied between three preset levels (low, medium, and high) that spanned the animal's detection threshold. The proportion of trials in which the animals correctly detected the coherent motion across all experiments was 50, 92, and 99% for the low, medium, and high coherences, respectively. The location and size of one patch were set to overlap the RF of the neuron under study, and the other patch was placed diametrically opposite, on the other side of the fixation point. The direction and speed of the coherent motion were matched to the preferred direction and speed of the neuron. Directional tuning curves were constructed using the motion detection task with high (50%) coherent motion in one of eight directions. Only trials in which the animals maintained fixation within 1° of the fixation point were analyzed. Trials in which the lever was released before the reaction time window were discarded.
At the beginning of each trial, a static cue of stationary dots was presented to indicate which patch was most likely to contain the coherent motion. The animals covertly directed their attention to the cued location during the trial. We refer to the condition in which the animal was cued to direct its attention to the patch of dots in the RF as the “attend-in” condition, whereas attention directed to the opposite patch is referred to as the “attend-out” condition. To verify that the animals were covertly directing their attention to the cued patch of dots, we used an invalid cue in 20% of the trials (Posner, 1980). Motion detection was better and reaction times were faster in the valid cueing conditions, when the animals were instructed to direct their attention to the patch of dots that contained the coherent motion stimulus (Cook and Maunsell, 2002a). For example, when the coherent motion occurred in the uncued patch, the animal's average detection performance dropped from 92 to 59%, indicating the animal used the spatial cue to direct its attention to the patch of dots that would most likely contain the coherent motion. Invalid cueing only occurred for trials containing the medium strength coherent motion. The static cue was presented in the same location for blocks of 15 completed trials.
The monkeys were also trained to perform a standard memory-delayed saccade task (White and Sparks, 1986). In this task, the monkey fixated on a central point while a peripheral target (0.25° in diameter) appeared for 500 msec. To obtain a reward, the monkey had to remember the target location for 500-2500 msec and then, after the central fixation point was extinguished, saccade to within 2.5° of its location within 300 msec.
Visual stimulus. The animal sat 62 cm from a computer monitor (±17 × ±13° of visual angle; 1600 × 1200 pixels; 75 Hz refresh). The stimuli consisted of two patches of white dots (each 0.25° diameter; 78 cd/m²) on a dark gray background (12 cd/m²) with a dot density of 2.1 dots/degree². Each patch of dots was updated every other frame (i.e., every 26.6 msec) using the following procedure. The dots in each patch were evenly divided into two groups. On each update, one group was replaced with new, randomly positioned dots, whereas dots in the other group were displaced by a fixed distance. The dots in this latter group determined the motion coherence. For 0% coherence, all the dots in this group moved a fixed distance in a random direction. For coherent motion greater than zero, a proportion of the dots moved a fixed distance in the same direction. This proportion determined the strength of the coherent motion. On the next update (26.6 msec later), the groups were switched. This arrangement ensured that all the dots had a lifetime of four video frames (two updates or 53.2 msec) before they were replaced and that there would be no changes in the apparent dot density associated with the onset of coherent motion. Because half the dots were always randomly replotted, regardless of the proportion of dots moving coherently, our motion had a maximum strength of 50% coherent. For example, at 25% coherent motion, half the dots were randomly replotted, one-quarter moved with the same fixed distance and direction, and one-quarter moved with the same fixed distance in a random direction.
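The update rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the stimulus code used in the experiments: the function name `update_patch`, the square patch of half-width `half_width`, and the absence of any handling for dots that drift past the patch edge are all hypothetical choices.

```python
import numpy as np

def update_patch(pos, group, coherence, direction_deg, step, half_width, rng):
    """One stimulus update (every other video frame) for one patch.

    pos:       (N, 2) dot positions in degrees.
    group:     boolean mask of the dots that are displaced on this update;
               the remaining dots are replotted at random positions.
    coherence: fraction of ALL dots that move coherently (0 to 0.5).
    ASSUMPTION: square patch, no wrapping of dots that leave the patch.
    """
    pos = pos.copy()
    moved = np.flatnonzero(group)
    replot = np.flatnonzero(~group)

    # Replace one group with new, randomly positioned dots.
    pos[replot] = rng.uniform(-half_width, half_width, size=(replot.size, 2))

    # Displaced dots get a random direction by default ...
    angles = rng.uniform(0.0, 2.0 * np.pi, size=moved.size)
    # ... and a subset of them moves in the common (coherent) direction.
    n_coherent = min(int(round(coherence * pos.shape[0])), moved.size)
    coherent_idx = rng.choice(moved.size, size=n_coherent, replace=False)
    angles[coherent_idx] = np.deg2rad(direction_deg)

    # Every displaced dot moves the same fixed distance.
    pos[moved] += step * np.column_stack((np.cos(angles), np.sin(angles)))

    # The two groups swap roles on the next update.
    return pos, ~group
```

With 40 dots and `coherence=0.25`, each update replots 20 dots, moves 10 dots in the common direction, and moves 10 dots in random directions, which matches the 25% example in the text.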
Data collection. Using standard extracellular recording techniques, we recorded from single neurons in area MT in both animals. When a neuron was isolated, the RF was mapped using a manually controlled bar while the animal fixated on a central spot. The diameter of the RFs ranged from 3.9 to 10.7° (median, 7.4). RF center eccentricities ranged from 3.9 to 11.1° (median, 7.9). The preferred speed was also judged using a bar moved by hand. The animals were trained to perform the task at slow or moderate motion speeds, so we usually selected neurons with a preferred speed between 4 and 12°/sec. For most cells, once the RF location, size, preferred direction, and speed were determined, the memory saccade task was run with the targets at the centers of where the random dot patches would be located. Five to 30 (median, 12) correctly completed trials were collected for this task. The motion detection task was then run, and we recorded from the neuron as long as possible. The number of completed trials per coherence level for the motion detection task ranged from 15 to 175 (median, 35). The monkey's performance varied with patch location, size, and motion speed, which were determined by the response properties of the neuron under study. Consequently, different neurons were tested with different coherence levels. The animal's eye position was measured every 5 msec using a scleral search coil (Robinson, 1963; Judge et al., 1980), and the occurrence of action potentials was recorded to the nearest millisecond.
Extracting the global motion from the stimulus. Given two frames of a random dot stimulus, what is the motion stimulus? To answer this, we used a method described by Barlow and Tripathy (1997) that used the correspondence between dots in two sequential frames. Figure 1B illustrates this approach using two consecutive frames that each contain four dots. Because it is unknown how each dot moved from frame 1 to frame 2, we calculated all possible motion vectors. The possible ways in which the center dot (labeled 1) could have moved are shown by the four motion vectors. Each motion vector has a direction and magnitude, and there are 16 motion vectors in this example (four possible vectors for each of four dots). Thus for N dots, we computed N² motion vectors between consecutive updates of the random dot motion stimulus. The set of N² motion vectors represents our description of the global motion from one update to the next. Because the size and location of the patch of dots were set to overlap the RF of the neuron under study, this model of the motion assumes spatial uniformity of motion vectors within the RF. Thus, our motion description has no spatial component.
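As an illustration of the all-pairs correspondence described above, the following sketch computes the N² candidate motion vectors between two updates. The function name and array layout are assumptions made for this example.

```python
import numpy as np

def all_pair_motion_vectors(prev_dots, curr_dots, dt):
    """Direction (deg) and speed (deg/sec) for all N*N dot pairings.

    prev_dots, curr_dots: (N, 2) arrays of dot positions in degrees.
    dt: time between updates in seconds (0.0266 for the stimulus in the text).
    """
    # Displacement from every dot in the previous update to every dot in
    # the current update; the result has shape (N, N, 2).
    disp = curr_dots[None, :, :] - prev_dots[:, None, :]
    direction = np.degrees(np.arctan2(disp[..., 1], disp[..., 0])) % 360.0
    speed = np.hypot(disp[..., 0], disp[..., 1]) / dt
    return direction.ravel(), speed.ravel()
```

For the four-dot example of Figure 1B, this returns 16 directions and 16 speeds.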
Figure 1C shows six updates of a 0% coherent random dot stimulus used during an experiment. Each patch of dots was updated every 26.6 msec (every other video frame). N² motion vectors were computed for each time point using the current and preceding update, and their distributions are shown in the polar histograms below each patch. In these histograms, the location of a bin corresponds to motion direction, and the distance a bin is from the origin corresponds to the magnitude of the motion vector (dot speed). Each plot has 96 bins (12 directions × 8 speeds), and the number of vectors per bin is indicated by the grayscale of the bin. For example, the transition from 756 to 783 msec shown in Figure 1C had, by chance, a relatively strong rightward motion component at ∼15°/sec as shown by the two nearby light gray bins (arrow).
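A minimal sketch of the 12 × 8 binning follows. The 23.5°/sec upper limit is the speed cutoff stated in the text; the bin edges are not given in the text, so the equal-width speed bins and the 30° direction bins centered on 0°, 30°, 60°, and so on are assumptions of this example.

```python
import numpy as np

N_DIR, N_SPEED, MAX_SPEED = 12, 8, 23.5  # 96 bins; speeds in deg/sec

def motion_histogram(direction_deg, speed):
    """Count motion vectors in a (12 directions x 8 speeds) grid."""
    direction_deg = np.asarray(direction_deg, dtype=float)
    speed = np.asarray(speed, dtype=float)
    keep = speed <= MAX_SPEED  # faster vectors are left out of the analysis
    # ASSUMPTION: 30 deg direction bins centered on 0, 30, 60, ... deg.
    d_idx = np.floor(((direction_deg[keep] + 15.0) % 360.0) / 30.0).astype(int)
    # ASSUMPTION: equal-width speed bins from 0 to MAX_SPEED.
    s_idx = np.minimum((speed[keep] / MAX_SPEED * N_SPEED).astype(int), N_SPEED - 1)
    hist = np.zeros((N_DIR, N_SPEED), dtype=int)
    np.add.at(hist, (d_idx, s_idx), 1)  # handles repeated bin indices correctly
    return hist
```

One such histogram per update gives the sequence of motion impulses that serves as input to the kernel estimation.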
One important issue was how to represent the sequence of motion vectors in time. The scanning process of the cathode-ray tube (CRT) display resulted in the dots located at the top of each patch being updated first. We used the simplifying assumption that the dots would be updated simultaneously at the time our CRT display scanned the vertical midpoint of the RF of the neuron. Thus, our motion stimulus was modeled as a set of motion impulses occurring every 26.6 msec (Fig. 1C).
Motion vectors with speeds >23.5°/sec were not used in the analysis because most of the MT neurons we recorded from preferred slower speeds (median, 10.7°/sec; maximum, 16°/sec) and were likely not sensitive to high speeds (Van Essen et al., 1981; Lagae et al., 1993). The reason for the bias toward slower speed preferences was that the monkeys were trained with only low speed coherent motion and had difficulty detecting the coherent motion at higher speeds. Thus, during recording sessions, we selected neurons that preferred the lower motion speeds.
Estimation of the linear transfer function. We assumed that the neuronal response of our MT neurons, R, was equal to the motion in our random dot stimulus, M, convolved with a linear kernel, K, or as follows:

$$R = M * K \qquad (1)$$
We did not use a spike-triggered average approach to estimate K because our motion stimulus, M, contained autocorrelations (i.e., was non-white). To estimate the linear kernel, we instead used the discrete formulation approach (DiCarlo et al., 1998; Theunissen et al., 2001; Blake and Merzenich, 2002). This approach expresses the convolution in Equation 1 as a linear regression problem that compensates for any autocorrelations in the motion stimulus.
Only the 0% coherent motion was used to estimate the linear kernel, K. The neuronal response is modeled as the sum of the output of 96 filters convolved in time for each motion direction and speed (Fig. 1D). The spike response of the neuron was represented as a sequence of zeros and ones (indicating spikes) at the 1 msec sampling interval.
Using digital signal processing routines in MATLAB, we filtered and resampled the sequence of motion impulses and the recorded spikes at a Nyquist sampling rate of 100 Hz. This corresponds to a sampling interval of 10 msec and limited our maximum frequency to 50 Hz. The filtered and resampled neuronal response was assumed to represent the underlying rate function that drives spike production. Other studies have usually binned the neuronal response (e.g., 10 msec bins) to estimate the rate function (DiCarlo et al., 1998). Although both methods are qualitatively similar, Nyquist filtering and resampling eliminates aliasing that could potentially add noise to the estimated kernels.
We defined each of our 96 motion filters to have nine weights (or taps) to cover a 90 msec range (Fig. 1D). We varied the latency of the kernel, $t_D$, to account for the neuronal latency and to ensure our kernel overlapped the neuronal integration window. We computed a separate kernel for each value of $t_D$ and selected the kernel that had the best signal-to-noise ratio (S/N; described below). The neuronal response $r$ at time sample index $t$ can be expressed as the weighted sum of the past motion vectors for each direction and speed, or as follows:

$$r_t = c + \sum_{d=1}^{12} \sum_{s=1}^{8} \sum_{i=0}^{8} k_{d,s,i}\, m_{d,s,\,t-i-t_D} \qquad (2)$$

where $m_{d,s,t}$ is the motion strength at time sample index $t$ and $k_{d,s,i}$ is the filter (or kernel) weight at tap $i$, both corresponding to direction index $d$ and speed index $s$. The value $c$ is a constant to account for any static offsets in $r$, and $t_D$ (expressed in 10 msec samples in Eq. 2) ranged from 20 to 100 msec in 10 msec intervals. Thus, for each of the 96 direction and speed bins in our motion stimulus, there is a linear filter with nine unknown weights corresponding to 10 msec sample intervals. Combining the output from all the linear filters results in 96 × 9 = 864 total unknown weights plus a constant (Fig. 1D). We refer to this collection of weights as the linear kernel of the neuron.
Equation 2 can be written as a single linear equation with 865 unknowns for every observation of r(t). For example, a neuron that produced 5 min of data would result in ∼30,000 observations (or equations). From this set of linear equations, the kernel weights were determined using built-in MATLAB routines for solving over-determined systems of linear equations in a least squares manner. Mathematically, this is equivalent to linear regression with 865 unknowns and estimates the first-order Wiener kernel that provides the best approximation to the linear impulse response function of the neuron in direction, speed, and time.
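The regression formulation can be illustrated with a short NumPy sketch. The analysis in the text was done in MATLAB; here `numpy.linalg.lstsq` stands in as the least-squares solver, and the function name, argument layout, and the expression of the latency in samples are assumptions of this example.

```python
import numpy as np

def estimate_kernel(motion, rate, n_taps=9, delay=0):
    """Least-squares estimate of a linear kernel plus a constant offset.

    motion: (T, B) array, one column per direction/speed bin (B = 96 in the text).
    rate:   (T,) neuronal response sampled on the same time base.
    delay:  kernel latency expressed in samples.
    Returns (kernel of shape (n_taps, B), offset c).
    """
    T, B = motion.shape
    first = delay + n_taps - 1  # earliest sample with a full stimulus history
    rows = np.arange(first, T)
    # Design matrix: for each response sample, the stimulus at lags
    # delay, delay+1, ..., delay+n_taps-1 for every bin, plus a column of ones.
    lagged = [motion[rows - delay - lag] for lag in range(n_taps)]
    X = np.hstack(lagged + [np.ones((rows.size, 1))])
    weights, *_ = np.linalg.lstsq(X, rate[rows], rcond=None)
    return weights[:-1].reshape(n_taps, B), weights[-1]
```

With B = 96 bins and nine taps, the design matrix has 865 columns, one per unknown. Because the fit uses the full design matrix rather than a spike-triggered average, correlations between stimulus columns are accounted for by the regression.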
Because the noise in the kernel estimates is uncorrelated among the coefficients (DiCarlo et al., 1998), kernels were smoothed by convolving with a three-dimensional low-pass filter (Hamming-windowed-based, linear phase, with a cutoff set at 0.7 of the normalized sampling frequency; from the Digital Signal Processing toolbox in MATLAB). The impulse response function of this smoothing filter is similar in width to a Gaussian filter with a SD of 1 bin but provides better filtering of the high-frequency noise in our kernels with less distortion of the low-frequency kernel shape. Reducing the noise in our estimated kernels was important for analyzing how the kernels were altered by attention. Using other smoothing filters, such as a 1 bin Gaussian, produced kernels with more noise but did not qualitatively change the results of our analysis.
In displaying the kernel weights, we used grayscale plots that show the kernel coefficients as a function of direction and speed. The center of each of the 96 direction and speed bins corresponds to the value of the kernel weight, and linear interpolation was used between the bin centers.
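Assessing the kernel quality. We assessed the quality of our kernels using a S/N. We assumed the total variance of our estimated kernel ($\sigma^2_{\mathrm{total}}$) was equal to the variance of the underlying kernel ($\sigma^2_{\mathrm{kernel}}$) plus the variance of the noise ($\sigma^2_{\mathrm{noise}}$), or as follows:

$$\sigma^2_{\mathrm{total}} = \sigma^2_{\mathrm{kernel}} + \sigma^2_{\mathrm{noise}} \qquad (3)$$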
To estimate the variance of the noise, $\sigma^2_{\mathrm{noise}}$, we computed a noncausal kernel for each neuron. The noncausal kernel correlated the neuronal response with motion vectors forward in time from 0 to 80 msec. Because future motion vectors should be uncorrelated with the current neuronal response, the true noncausal kernel is zero for all coefficients, and any variation represents noise in our estimation process. To check that the variance of the noncausal kernel provided an accurate estimate of the noise variance, we also computed noise variance using the SD of each kernel coefficient that was returned by our regression package in MATLAB. Both methods produced nearly identical estimates of the noise variance in our kernels (r² = 0.96 across our population of neurons). However, we used the noncausal kernel to estimate $\sigma^2_{\mathrm{noise}}$ because this allowed us to first smooth the noncausal kernels in the same manner as the causal kernels.
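Using the variance of our smoothed noncausal kernel as our estimate of the variance of the noise ($\sigma^2_{\mathrm{noise}}$), we computed the variance of the underlying kernel of the neuron as follows:

$$\sigma^2_{\mathrm{kernel}} = \sigma^2_{\mathrm{total}} - \sigma^2_{\mathrm{noise}} \qquad (4)$$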
We then defined the S/N of the estimated kernel as follows:

$$\mathrm{S/N} = \frac{\sigma_{\mathrm{kernel}}}{\sigma_{\mathrm{noise}}} \qquad (5)$$
Thus, for neurons with causal kernels containing well defined regions of integration and noncausal kernels that were mostly flat, the S/N would be large. If no underlying kernel existed for a particular neuron, the causal and noncausal estimates would have equal variances ($\sigma^2_{\mathrm{total}} = \sigma^2_{\mathrm{noise}}$) and the S/N would be zero.
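The S/N calculation can be sketched as follows, assuming the measure is the SD of the underlying kernel divided by the SD of the noise, with the noise variance taken from the noncausal kernel. Two details are assumptions of this example and are not stated in the text: variances are computed about the mean of the weights, and a negative variance estimate is clipped to zero.

```python
import math

def variance(values):
    """Population variance of a flat list of kernel weights (about the mean)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def kernel_snr(causal_weights, noncausal_weights):
    """S/N of a kernel: SD of the underlying kernel over SD of the noise."""
    var_total = variance(causal_weights)     # estimated (causal) kernel
    var_noise = variance(noncausal_weights)  # noise estimate from the noncausal kernel
    # ASSUMPTION: clip at zero so sampling error cannot give a negative variance.
    var_kernel = max(var_total - var_noise, 0.0)
    return math.sqrt(var_kernel / var_noise)
```

A causal kernel with four times the variance of its noncausal counterpart gives an S/N of √3 ≈ 1.73, and equal variances give an S/N of zero.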
Parametric fitting of the kernels. We fit each smoothed causal kernel with a product of Gaussian functions that described the tuning along direction (d), speed (s), and time (t):

$$K(d,s,t) = G \exp\!\left[-\frac{(d-\mu_d)^2}{2\sigma_d^2}\right] \exp\!\left[-\frac{(s-\mu_s)^2}{2\sigma_s^2}\right] \exp\!\left[-\frac{(t-\mu_t)^2}{2\sigma_t^2}\right] + C \qquad (6)$$

where $G$ is the gain of the kernel, $\mu_d$ is the preferred direction, $\sigma_d$ is the directional tuning width, $\mu_s$ is the preferred speed, $\sigma_s$ is the speed tuning width, $\mu_t$ is the peak temporal integration, $\sigma_t$ is the width of the temporal integration, and $C$ is an offset term to allow for inhibition. Both attentional conditions were fit separately using optimization routines in MATLAB. We limited $\sigma_t$ to a lower bound of 4 msec because this is approximately the smallest Gaussian that could be represented without distortion at our 10 msec sampling interval.
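For illustration, a separable Gaussian model with the parameters listed above can be written as a plain function. This is a sketch under assumptions: it takes each Gaussian in the standard form exp(-(x - μ)² / 2σ²), and it wraps direction differences onto ±180°, a practical choice for a circular variable that the text does not specify.

```python
import math

def gaussian_kernel_model(d, s, t, G, mu_d, sigma_d, mu_s, sigma_s, mu_t, sigma_t, C):
    """Kernel weight predicted at direction d (deg), speed s (deg/sec), time t (msec)."""
    # ASSUMPTION: direction differences are wrapped to [-180, 180) degrees.
    dd = (d - mu_d + 180.0) % 360.0 - 180.0
    g_d = math.exp(-dd ** 2 / (2.0 * sigma_d ** 2))
    g_s = math.exp(-(s - mu_s) ** 2 / (2.0 * sigma_s ** 2))
    g_t = math.exp(-(t - mu_t) ** 2 / (2.0 * sigma_t ** 2))
    return G * g_d * g_s * g_t + C
```

At the preferred direction, speed, and time, the model returns G + C. Fitting the eight parameters to a measured kernel would require a nonlinear optimizer, which is omitted here.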
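Multiplicative scaling of kernels. We scaled the kernels derived for the attend-out condition ($K_{\mathrm{OUT}}$) to match the kernels derived for the attend-in condition ($K_{\mathrm{IN}}$). Thus, we computed a scale factor, $\beta$, that best satisfied $K_{\mathrm{IN}} = \beta K_{\mathrm{OUT}}$. For this, we minimized the χ² merit function (Press et al., 1986):

$$\chi^2(\beta) = \sum_i \frac{\left(k^{\mathrm{IN}}_i - \beta\, k^{\mathrm{OUT}}_i\right)^2}{\sigma^2_{\mathrm{noise,IN}} + \beta^2 \sigma^2_{\mathrm{noise,OUT}}} \qquad (7)$$

where $k^{\mathrm{IN}}_i$ and $k^{\mathrm{OUT}}_i$ are the ith kernel weights for the attend-in and attend-out conditions and $\sigma^2_{\mathrm{noise,IN}}$ and $\sigma^2_{\mathrm{noise,OUT}}$ are the variances of the kernel noise estimated from the smoothed attend-in and attend-out noncausal kernels.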
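We also computed a second scaling factor using the estimated kernel variances $\sigma^2_{\mathrm{kernel,IN}}$ and $\sigma^2_{\mathrm{kernel,OUT}}$ (from Eq. 4) for the attend-in and attend-out conditions. In this case, a scaling factor, $\gamma$, that satisfied $K_{\mathrm{IN}} = \gamma K_{\mathrm{OUT}}$ was estimated as follows:

$$\gamma = \sqrt{\frac{\sigma^2_{\mathrm{kernel,IN}}}{\sigma^2_{\mathrm{kernel,OUT}}}} \qquad (8)$$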
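Both scale factors can be sketched in a few lines. The merit function here is assumed to take the errors-in-both-variables form Σ(k_IN,i − β k_OUT,i)² / (σ²_noise,IN + β² σ²_noise,OUT), and γ is assumed to be the ratio of the kernel SDs; the function names are hypothetical. Setting the derivative of this χ² with respect to β to zero gives a quadratic in β, which the sketch solves in closed form.

```python
import math

def chi2(beta, k_in, k_out, var_in, var_out):
    """Merit function with noise in both kernels (ASSUMED form)."""
    denom = var_in + beta ** 2 * var_out
    return sum((a - beta * b) ** 2 for a, b in zip(k_in, k_out)) / denom

def fit_beta(k_in, k_out, var_in, var_out):
    """Scale factor minimizing chi2; needs var_out > 0 and a nonzero cross term."""
    s_ii = sum(a * a for a in k_in)
    s_oo = sum(b * b for b in k_out)
    s_io = sum(a * b for a, b in zip(k_in, k_out))
    # d(chi2)/d(beta) = 0 reduces to the quadratic
    #   s_io*var_out*beta**2 + q*beta - s_io*var_in = 0
    q = s_oo * var_in - s_ii * var_out
    disc = math.sqrt(q * q + 4.0 * s_io ** 2 * var_in * var_out)
    # The root with the same sign as s_io is the minimum of chi2.
    return (-q + disc) / (2.0 * s_io * var_out)

def gamma_from_variances(var_kernel_in, var_kernel_out):
    """Second scale factor: ratio of kernel SDs (ASSUMED form)."""
    return math.sqrt(var_kernel_in / var_kernel_out)
```

If the attend-in kernel is an exact multiple of the attend-out kernel, `fit_beta` returns that multiple regardless of the two noise variances.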
To examine whether kernels changed their shape with attention, we examined the residuals as follows:

$$K_{\mathrm{res}} = K_{\mathrm{IN}} - \beta K_{\mathrm{OUT}} \qquad (9)$$

where $K_{\mathrm{res}}$ is the kernel residual and $\beta$ is the multiplicative scale factor attributable to attention. If attention produced a true multiplicative scaling of the kernels, then $K_{\mathrm{res}}$ would be flat (or zero). However, if attention produced a shape change in the kernels, then $K_{\mathrm{res}}$ would have some weights that are non-zero.
A difficulty in using the kernel residuals to determine the effects of attention is that any noise in the estimated kernels also contributes to non-zero elements in $K_{\mathrm{res}}$. However, given our estimates of the noise in the attend-in and attend-out kernels ($\sigma^2_{\mathrm{noise,IN}}$ and $\sigma^2_{\mathrm{noise,OUT}}$), we can predict the variance of $K_{\mathrm{res}}$ that we would expect given a pure scaling with no change in shape. This predicted variance, $\sigma^2_{\mathrm{pred}}$, of the residual kernel is as follows:

$$\sigma^2_{\mathrm{pred}} = \sigma^2_{\mathrm{noise,IN}} + \beta^2 \sigma^2_{\mathrm{noise,OUT}} \qquad (10)$$

where $\beta$ is our estimated multiplicative scale factor. If attention produced a change in kernel shape, the variance of $K_{\mathrm{res}}$ (referred to as $\sigma^2_{\mathrm{res}}$) would be larger than predicted by Equation 10. This hypothesis can be tested using the following statistic:

$$L = \frac{\nu\, \sigma^2_{\mathrm{res}}}{\sigma^2_{\mathrm{pred}}} \qquad (11)$$

where $L$ is a χ² value with ν = 864 - 1 degrees of freedom (Zar, 1999).
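The residual test can be sketched as below, assuming the statistic is the standard one-sample variance test, L = ν σ²_res / σ²_pred, with the predicted residual variance σ²_noise,IN + β² σ²_noise,OUT. These forms and the function name are assumptions of this example.

```python
def residual_shape_statistic(k_in, k_out, beta, var_noise_in, var_noise_out):
    """Return (L, nu) for the test of a shape change between two kernels."""
    n = len(k_in)
    # Residual kernel after removing the multiplicative scaling.
    residual = [a - beta * b for a, b in zip(k_in, k_out)]
    mean = sum(residual) / n
    nu = n - 1  # 864 - 1 for the kernels in the text
    var_res = sum((r - mean) ** 2 for r in residual) / nu
    # Variance expected from estimation noise alone under pure scaling.
    var_pred = var_noise_in + beta ** 2 * var_noise_out
    return nu * var_res / var_pred, nu
```

A one-tailed p value could then be read from the upper tail of the χ² distribution with ν degrees of freedom, for example with `scipy.stats.chi2.sf(L, nu)`.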
Results
The goal of this study was to understand the effects of attention on the neuronal integration of motion in MT. To accomplish this, our analysis proceeded in three steps. First, we determined the linear transfer function (or kernel) that described the integration of the motion stimulus by MT neurons. Second, we selected neurons that produced kernels with clear regions of integration as indicated by their good S/N. Finally, we used this subpopulation of neurons to examine the effects of attention on the kernel. Thus, by examining how attention affected the linear transfer function, we could see how attention affected the temporal integration of MT neurons. The neurophysiological recordings analyzed here have been used in two other studies that examined the link between neuronal activity in MT and the perceptual capabilities of the subjects (Cook and Maunsell, 2002a,b). This new study focuses exclusively on how attention affected the way MT neurons processed the motion stimulus. Portions of our results have been previously published in abstract form (Cook and Maunsell, 2003).
Estimating the MT motion integration kernels
Using neuronal responses to 0% coherent motion and an optimal estimation procedure, we constructed linear models that described how MT neurons integrated the motion stimulus. This type of analysis has been used to examine the integrative properties of neurons in many sensory cortical areas (Jones and Palmer, 1987; DeAngelis et al., 1995; DiCarlo et al., 1998; Blake and Merzenich, 2002) and recently in MT (Bair et al., 1997; Livingstone et al., 2001; Borghuis et al., 2003). The kernel weights in these models describe how an MT neuron integrates the motion signal. We wanted to know how these weights were affected by attention to the stimulus.
The optimal set of kernel weights (also referred to as the first-order Wiener kernel) was determined for both attentional states in each neuron using standard numerical methods (see Materials and Methods). Figure 2A shows the computed kernel weights for one MT neuron. Each column shows the time evolution of the kernels for one attentional condition. The kernel weights are shown using the same polar format as the stimulus in Figure 1C, except that the plots have been smoothed by linearly interpolating between bin centers. This interpolation makes it easier to distinguish excitatory (light shades of gray) and inhibitory (dark shades of gray) regions. This particular neuron began integrating the motion ∼40 msec after the motion onset. At 40 msec, the kernel weights showed a clear preference for motion up and to the right at a speed of ∼8.5°/sec (white). There was pronounced inhibition for motion in the opposite direction (black). By 60 msec after stimulus onset, the neuron no longer integrated the motion appreciably. The sequence of kernel weights over time provides a model of how the MT neuron integrated the motion.
Figure 3A shows the time evolution of the kernel weights for another MT neuron. For this cell, the integration window also starts at ∼40 msec, and the preferred direction is toward the left. Note that the grayscale range used for displaying the kernels was optimized individually for each neuron. The structure of our kernels is very similar to that calculated for MT neurons by Livingstone et al. (2001) using a sparse stimulus of two random dots. For both example neurons, the kernels look similar between the two attentional conditions. To more closely address the effects of attention on the kernels, however, we needed to first evaluate the quality of our estimated kernels.
Assessing the estimated kernels
The example neurons in Figures 2 and 3 produced kernels with clear regions of excitatory and inhibitory integration. Not all of our MT neurons produced kernels with as well defined regions of integration as shown by these two example cells. To separate neurons that produced kernels with well defined regions of integration from those that did not, we computed the S/N in our kernels (see Materials and Methods).
To estimate the noise in our kernels, we first computed a noncausal kernel using the same methods used to compute the causal kernels described above. A noncausal kernel is the optimal linear transfer function between future motion and current neuronal activity. Because there should be no correlations between the spike rate of a neuron and stimuli that will occur in the future, the expected noncausal kernel is flat with all coefficients equal to zero. Any deviations from zero represent chance correlations. These chance correlations are a measure of the noise in our kernels and can be expressed as the variance of the noncausal kernel, or $\sigma^2_{\mathrm{noise}}$.
For each neuron, we assumed the total variance of our estimated kernel ($\sigma^2_{\mathrm{total}}$) is attributable to the variance of the underlying kernel ($\sigma^2_{\mathrm{kernel}}$) plus noise ($\sigma^2_{\mathrm{noise}}$). Because we calculate $\sigma^2_{\mathrm{noise}}$ from the noncausal kernel, we can estimate the variance of the underlying kernel by subtracting the noise variance from the total variance of our kernel (Eq. 4). We then computed a S/N for our kernel by dividing the SD of the true kernel by the SD of the noise ($\sigma_{\mathrm{kernel}}/\sigma_{\mathrm{noise}}$). An attractive property of this measure of kernel quality is that no assumptions need to be made regarding the shape of the underlying kernel.
If an estimated kernel produced no well defined regions of integration, then the S/N would be expected to be small. Likewise, kernels with well defined regions of integration as in Figures 2 and 3 would be expected to produce relatively high values for the S/N. For our measure of how well the kernel contained well defined regions of integration, we averaged the S/N from both attentional conditions. The distribution of the average S/N is shown in Figure 4A. For reference, the kernels in Figures 2 and 3 produced an average S/N of 2.4 and 1.8, respectively. Figure 5 illustrates three other example cells with a progressively lower S/N. For kernels with a low S/N, there is less of a discernable region of motion integration. In Figure 5C, a S/N of 0.6 corresponds to a kernel that is dominated by noise and was not included in the analysis.
By inspection, we found that kernels with a S/N <0.75 lacked clear structure. We therefore set an arbitrary S/N acceptance criterion of 0.75, resulting in 44 of 93 MT cells classified as “good S/N” (Fig. 4A, gray bars). The remaining cells were considered to have “poor S/N” and not useful for examining the effects of attention on temporal integration. This is because we could not assess the effects of attention on the structure of the kernels if the kernels lacked any discernable structure to begin with.
None of our results were sensitive to the particular criterion used. Using other measures of kernel quality, such as the goodness of a Gaussian fit or the ability of the kernel to account for the average stimulus selectivity of the neuron (shown below) produced very similar sets of acceptable neurons and did not affect our results or conclusions.
To further assess the kernels, we compared the predicted response of the kernels to that of the actual response of the neurons. We wanted to know whether the kernels captured the directional tuning of the neuron as well as the increased firing rate to coherent motion. To do this, we calculated the average response of the neuron and kernel to coherent motion of varying strength and direction. Average responses were computed using the 300 msec period before and after the coherent motion began on each trial. We compared the average firing rate predicted by the kernel to the actual response of the neuron. It is important to emphasize that the kernel was estimated based on the response of the neuron to the 0% coherent motion. Thus, we tested the model using coherent motion that was not used to construct the model.
Figures 2B and 3B show the average firing rates of the two example MT neurons to coherent motion in eight different directions (gray filled triangles). The average firing rates predicted by the kernels (black filled circles) are also shown. The horizontal lines represent the average and predicted response to the 0% coherent motion that preceded the coherent motion in every trial. Although both kernels capture the directional selectivity of the neurons, the one in Figure 3B does a better job matching the average firing rate of the cell.
We also compared the response predicted by the kernel with the response of the cell to increasing coherent motion strength. Figures 2C and 3C show the average response of both the kernel and cell to the three levels of coherent motion in the preferred direction of the cell. The kernel in Figure 2C matched the firing rate of the cell as the strength of the coherent motion increased, whereas in Figure 3C, the kernel exhibits a slight reduction in gain. The solid horizontal line is the average response to the 0% coherent motion. As expected, the kernels predicted the average response to the 0% coherent motion in Figures 2C and 3C because they were estimated using these data. We also compared the response of the neuron and kernel when the animals withdrew their attention (attend-out), which occurred during the medium strength coherent motion only (Figs. 2C, 3C, open symbols). For this, we used the kernel derived when the animals were attending to the stimulus outside the RF. Withdrawing attention reduced neuronal responses to the medium coherent motion and to the 0% coherent stimulus (dashed lines).
We quantified the ability of the kernel to account for the neuronal response to the coherent motion stimuli. Using the directional tuning and response gain (Figs. 2B,C, 3B,C), we computed the proportion of the variance (rp2) in the neuronal response accounted for by the kernels. Twelve points were used (eight directions, three attended coherent motion levels, and one unattended medium coherent motion). We found a wide range of kernel performances as illustrated by the wide distribution of rp2 values for all MT neurons (Fig. 4B). For comparison, the linear kernels for the example neurons in Figures 2 and 3 had rp2 values of 0.68 and 0.84, respectively. Notably, the kernels that had a good S/N tended to have high rp2 values (Fig. 4B, gray bars) and thus captured the neuronal response to the coherent motion. Figure 4B also indicates some kernels with a poor S/N also had high rp2 values. These cells did not have well defined regions of integration, yet accounted for the mean firing rate to the coherent motion. Visual inspection of these cells revealed that many had either weak directional tuning to begin with or, in most cases, had regions of integration that were overwhelmed with noise.
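A proportion of variance of this kind could be computed as sketched here. The sketch assumes rp2 is the squared Pearson correlation between the measured and predicted mean rates across the 12 conditions; this section does not state whether that definition or one based on residual variance was used, so the choice is an assumption, as is the function name `variance_explained`.

```python
def variance_explained(observed, predicted):
    """Squared Pearson correlation between observed and predicted mean rates.

    Assumption: this is only one possible definition of the proportion of
    variance accounted for. It is insensitive to a pure gain mismatch.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    if s_oo == 0 or s_pp == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (s_op * s_op) / (s_oo * s_pp)
```

Under this definition, a kernel that captured the tuning shape but underestimated the response gain would still score highly, which is worth keeping in mind when reading the rp2 values alongside the gain comparisons.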
MT neurons usually increase their response when the animals orient their attention to the stimulus located in the RF of the neuron (Seidemann and Newsome, 1999; Treue and Maunsell, 1999; Cook and Maunsell, 2002a). The amount of AM of the average firing rate in response to the 0% coherent motion (in a ratiometric form) is shown for our population of MT neurons in Figure 4C, and as this histogram illustrates, the amount of modulation varies widely between cells. The mean AM to the 0% coherent motion for cells that produced kernels with a good and poor S/N was 11 and 21% (median, 7 and 16%), respectively. However, this difference in modulation was not significant (p = 0.12; two-sample t test).
Why did some neurons produce kernels with regions of well defined integration (as measured by the S/N), whereas others did not? Possible explanations include failure of our description of the motion stimulus to capture the true input driving the cell and the assumption that MT neurons are linear. The quality of the kernel can also be affected by the limited amount of data available for neurons that had very low firing rates or for which fewer trials were sampled. These last two possibilities were analyzed by computing the partial correlations (ρ) between the number of spikes recorded (ns), the S/N of the kernel, and the amount of AM. We found that the number of spikes recorded correlated with the S/N (ρ between ns and S/N = 0.70). Thus, neurons for which we had more data produced kernels with a higher S/N and better defined regions of integration. We found, however, no strong correlations between the kernel S/N and AM or between the number of spikes and AM (ρ between S/N and AM = -0.02; ρ between ns and AM = -0.15). Other studies that have investigated temporal integration of motion by MT neurons did not quantify the quality of their kernels (Bair et al., 1997; Livingstone et al., 2001; Borghuis et al., 2003), and thus we do not know whether the observed range of quality in our estimated kernels is typical.
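For three variables, a partial correlation can be obtained from the pairwise Pearson correlations with the standard first-order formula. The sketch below shows that calculation; the function names and the example use are illustrative assumptions, and the original computation may have been carried out differently.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

With per-cell lists of spike counts, kernel S/N, and AM, a call such as `partial_corr(spike_counts, snr, am)` would give the correlation between spike count and S/N with AM held fixed.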
In addition to the difference in the amount of AM between cells that produce good and poor S/N kernels, these two populations also exhibited substantially different response dynamics to visual stimuli. Figure 4, D and E, shows the average response of neurons to the onset of the coherent motion and to a flash of a small stationary target in the center of the RF. Normalized population averages are shown for the neurons with good S/N kernels (thick gray) and poor S/N kernels (thin dashed black). For each cell, we normalized by the peak neuronal response, allowing each neuron to contribute equally to the average responses shown. The responses to the coherent motion in Figure 4D came from directional tuning trials that used the same coherent motion level (50%) for all cells. The responses shown are for motion in the preferred direction (as determined from the directional tuning trials). The responses to the target in Figure 4E came from memory saccade trials performed for most cells (see Materials and Methods). In this case, the animals were fixating when a small target first appeared in the center of the RF (target on).
It is clear from Figure 4, D and E, that the two populations show different temporal dynamics in response to the same stimuli. For the coherent motion, neurons with good S/N kernels show strong, transient responses each time the coherent motion was updated (every 26.6 msec). Although neurons with poor S/N kernels demonstrate a comparable average response to the coherent motion, their dynamics exhibit much less temporal modulation. For the contrast change produced by the target onset in Figure 4E, cells with good S/N kernels had robust transient responses, whereas the cells with poor S/N kernels responded weakly. One possibility for this difference is that neurons that produced kernels with a high S/N have their response properties tightly linked to the visual stimulus. The response properties of neurons with poor S/N kernels may either be influenced by other aspects of the visual input or contain larger extra-retinal components, neither of which would be captured in our kernel estimates. Another possibility is that the integration window of neurons with poor S/N kernels may have exceeded the 90 msec duration defined by our analysis. We do not think this is likely because we varied the delay between stimulus and response up to 100 msec to make sure our kernels would overlap with the temporal integration window of the cell.
Effects of attention on Gaussian fits of the kernels
We wanted to know how the attentional state of the animal affected the kernels estimated for our MT neurons. To address this question, we only analyzed the population of neurons with good S/N kernels because those were the cells that produced kernels with reasonably well defined regions of motion integration that also tended to account for the average response to the coherent motion.
To reveal how attention affected temporal integration, we first parameterized the shape of the estimated kernels. For this, we fit each kernel with a function that was the product of three Gaussians corresponding to direction, speed, and time (Eq. 6). Figures 2, D and E, and 3, D and E, show the results of fitting kernels for our two example cells. In Figures 2D and 3D, we plot as a function of time the coefficients (or weights) of the kernels and the Gaussian fit corresponding to a constant (preferred) direction and speed. This slice of the kernel through time provides a profile of the temporal integration window. Each plot shows four time slices corresponding to the peak excitatory and inhibitory regions for the two attentional conditions. Fits were done for each attentional condition independently. In Figures 2E and 3E, we plot slices of the kernels through direction (constant speed and time), which shows the directional tuning profiles for both attentional conditions. Spatial attention increased the values of the kernel weights for both cells, although the effect was somewhat larger for the cell in Figure 3.
To examine the effects of attention on the kernels, we parameterized the shape of the kernels using the fits to our Gaussian function. Figure 6A shows five of the Gaussian parameters for the cells with good S/N kernels in the two attentional states: (1) the amplitude of the kernels (G); (2) the time of the peak excitatory and inhibitory integration regions (μt); (3) the SD of the temporal width of the kernels (σt); (4) the preferred direction of motion that corresponds to the peak (μd); and (5) the SD of the width of the directional tuning (σd). The other three parameters of the Gaussian for speed tuning and offset showed similar relationships with attention. Because the average AM of our neurons with good S/N kernels was relatively weak, we separated out those cells that also had high AM (≥ 10%) (Fig. 6A, filled circles) (n = 20).
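The parameterization can be illustrated with a product of three Gaussians over direction, speed, and time. This is a simplified sketch, assuming a single excitatory lobe plus a constant offset and a circular wrap on direction; it does not reproduce Equation 6, which also describes the inhibitory region. The function and argument names are assumptions chosen to mirror the parameters listed above.

```python
import math

def gauss(x, mu, sigma):
    """Unnormalized Gaussian with peak value 1 at x = mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def wrap_deg(delta):
    """Wrap an angular difference in degrees into the range [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0

def kernel_model(d, s, t, G, mu_d, sigma_d, mu_s, sigma_s, mu_t, sigma_t,
                 offset=0.0):
    """Kernel weight at direction d (deg), speed s (deg/sec), time t (msec).

    Eight parameters: amplitude G, a peak and SD for each of direction,
    speed, and time, and a constant offset.
    """
    direction_term = gauss(wrap_deg(d - mu_d), 0.0, sigma_d)
    speed_term = gauss(s, mu_s, sigma_s)
    time_term = gauss(t, mu_t, sigma_t)
    return offset + G * direction_term * speed_term * time_term
```

In a separable model of this form, a change in G alone rescales every slice through the kernel by the same factor, whereas a change in any peak or SD parameter alters its position or shape. This is why the comparison of the parameters between attentional states distinguishes scaling from shape change.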
In Figure 6B, we show the distribution of the AM of each parameter. AM was expressed in a ratiometric form (Pin - Pout)/(Pin + Pout), where P is the parameter of interest. The exception, however, was for the modulation of the time and direction peaks, which was computed as the difference between the two attentional conditions (Pin - Pout). The filled bars correspond to good S/N kernels that also experienced high AM. The median modulation of each parameter for good S/N neurons and good S/N with high-attention neurons is indicated by the open and filled triangles, respectively.
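The ratiometric index is simple to compute, and it can be converted back to the equivalent attend-in to attend-out ratio. The sketch below assumes positive parameter values; the function names are illustrative.

```python
def modulation_index(p_in, p_out):
    """Ratiometric attentional modulation: (Pin - Pout) / (Pin + Pout)."""
    total = p_in + p_out
    if total == 0:
        raise ValueError("index is undefined when Pin + Pout is zero")
    return (p_in - p_out) / total

def index_to_ratio(index):
    """Equivalent Pin / Pout ratio for a given ratiometric index."""
    if index >= 1:
        raise ValueError("an index of 1 or more has no finite ratio")
    return (1 + index) / (1 - index)
```

For example, a parameter that is 30 in the attend-in condition and 20 in the attend-out condition has an index of 0.2, which corresponds to a ratio of 1.5.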
Although there is a relatively large amount of variability in the data in Figure 6, only the kernel magnitude (G) tends to be consistently larger when the animals directed their attention to the RF of the neuron. The median AM of the kernel magnitude is 7 and 17% for the kernels with good S/N and good S/N plus high attention, respectively. This modulation was significantly different from zero only for the high-attention group (p = 0.21 and 0.004 for the good S/N and good S/N plus high attention, respectively; Wilcoxon signed rank test). The median modulation of the other parameters was not significantly different from zero for either the good S/N or the good S/N plus high-attention group (Wilcoxon signed rank test). Overall, these data suggest attention scaled the Gaussian fits of the kernels with no appreciable shift in peak or change in shape.
Effects of attention on population kernels
To further examine the effects of attention on the linear model, we compared the average population of good S/N kernels for each condition. For each cell, we normalized the kernel weights to the peak value in the attend-in condition. We then computed the population kernel by averaging the normalized kernels from each cell. Before averaging, we aligned both the temporal peak (at time 0 msec) and the speed peak (at 8.5°/sec) for each kernel based on the Gaussian fit to the attend-in condition. We then rotated the kernel so that the preferred direction was up. Because we used only Gaussian parameters for the attend-in condition to align the kernels, this analysis would reveal any systematic shifts or changes in kernel shape that occurred in the attend-out condition. Figure 7A shows the average population kernels for each attentional condition for our neurons with a good S/N. We chose to align the kernels at the peak to reveal the average integration period.
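The normalization and averaging steps can be sketched as follows. The sketch assumes each kernel has already been aligned and rotated as described and is stored as a flat list of weights, and that the peak is the largest attend-in weight. The function name `population_kernel` and the data layout are assumptions for illustration.

```python
def population_kernel(kernels_in, kernels_out):
    """Average attend-in and attend-out kernels across cells.

    kernels_in[i], kernels_out[i] : aligned kernels of cell i (flat lists)
    Both kernels of a cell are divided by that cell's attend-in peak, so
    any attentional scaling survives the normalization.
    Returns (mean_in, mean_out).
    """
    n_cells = len(kernels_in)
    n_weights = len(kernels_in[0])
    mean_in = [0.0] * n_weights
    mean_out = [0.0] * n_weights
    for k_in, k_out in zip(kernels_in, kernels_out):
        peak = max(k_in)                      # attend-in peak of this cell
        for i in range(n_weights):
            mean_in[i] += k_in[i] / peak / n_cells
            mean_out[i] += k_out[i] / peak / n_cells
    return mean_in, mean_out
```

Normalizing both conditions by the attend-in peak, rather than each by its own peak, is what allows the attend-out population kernel to appear smaller when attention scales the weights.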
The average population kernels for the 44 good S/N kernels are very similar for both attentional conditions. Figure 7C shows a comparison of the average normalized firing rate to the coherent motion for the cells and that predicted by the kernels. On average, the kernels capture the directional selectivity and increased response to the coherent motion. Figure 7C reveals, however, that the kernels predicted a persistent reduction in response amplitude at higher coherence levels. Because individual cells were recorded using different levels of coherent motion, responses were grouped into low, medium, and high coherent levels in Figure 7C. The average normalized response in the unattended condition shows a small amount of modulation at both 0% coherent (dashed line) and medium coherent motion (open symbols) for both the cells and kernels.
To better reveal the effects of attention on the kernels, we further subdivided the 44 good S/N neurons by computing an average population kernel using only cells with high AM ≥ 10% (Fig. 7B). The average amount of AM of the spike rate for this group was 33% (median, 27%). For these cells, attention did not significantly alter the shape of the kernels between the attend-in and attend-out conditions. Figure 7D shows the average normalized responses for the cells and predicted response for the kernels corresponding to the high-attention group. Note that the average predicted response did not match the gain of the neurons at high coherence levels. However, for the high-attention group of cells, the effect of withdrawing attention is readily seen in the normalized population response to the 0% coherent motion (dashed line) and medium coherent motion (open symbols) for both the cells and their corresponding kernels.
To examine the effect of attention on the integration of the motion, we fit the average population kernels in Figure 7, A and B, with the product of the Gaussian model of Equation 6 (fitting was done in the same manner as for the individual kernels). Figure 8, A and B, show the time, direction, and speed profiles of our population kernels and the optimal fits for both attentional conditions. For the temporal profiles, we show slices through both the peak excitatory and inhibitory integration regions. It should be noted that the reason some of the fits do not appear optimal in Figure 8 is because they were optimized on the entire set of kernel coefficients and not just the subset of data points shown.
For the fits to the population kernels, attention produced a 10 and 34% modulation in the amplitude parameter (G) in the good S/N and good S/N plus high-attention kernels, respectively, and was similar to the AM of the spike rate for the two populations of neurons. The width of the temporal integration window did not appreciably change for the high-attention population (attend-in vs attend-out: σt = 15.0 vs 14.0 msec for all good S/N and σt = 16.2 vs 17.4 msec for good S/N plus high attention). Comparing the two population temporal integration profiles suggests that cells with larger AM tended to have wider temporal integration windows [across our 44 good S/N cells, there was a weak correlation between temporal integration width in the attend-in condition (σt) and the amount of AM of 0.24].
The directional tuning widths of the population kernels were also similar between the two attention conditions (attend-in vs attend-out: σd = 55.6 vs 58.4° for all good S/N and σd = 57.7 vs 59.2° for good S/N plus high attention). Similar effects of attention were observed for the speed profile (attend-in vs attend-out: σs = 4.6 vs 4.7°/sec for all good S/N and σs = 4.0 vs 3.9°/sec for good S/N plus high attention). At the higher speeds, however, our Gaussian fits underestimated the data and may reflect the skewed shape of MT neuron speed tuning expressed on a linear scale (Lagae et al., 1993). Overall, attention scaled the average population kernels with no appreciable change in shape.
AM of kernels and spike rate
Our results thus far suggest that attention scaled the kernels with no appreciable change in shape. However, not all neuronal spike rates were equally modulated by attention. Some neurons experienced more modulation of their spike rate than others. We therefore wanted to know how changes in kernel amplitude corresponded with changes in the neuronal spike rate attributable to attention.
In Figure 9A, we plot the AM of the fitted Gaussian amplitude versus the AM of the spike rate in response to the 0% coherent motion. AM is expressed as a ratiometric index and, as in previous analyses, we only show the 44 neurons classified with good S/N kernels. Although there is scatter, this analysis suggests that as the modulation of the spike rate increases, so does the modulation of the fitted Gaussian amplitude (r2 = 0.46). To reveal how the noise in our kernels affected our analysis, we separated out those neurons with the highest S/N kernels (S/N ≥ 1.3; n = 15) (Fig. 9A, filled circles). Cells with the highest S/N kernels show good correlation between the fitted Gaussian amplitude and spike rate modulation (r2 = 0.52). Thus, for neurons for which we had the most reliable kernel estimates, AM of kernel amplitude and spike rate were closely matched.
Model-free effects of attention on the kernels
Until now, we have used the parameters of the Gaussian fits to examine the effects of attention on the motion kernels in MT. Although most kernels were well described by our product of Gaussian model, it is possible some kernels may have experienced changes attributable to attention that were not captured by the Gaussian fit. To address this, we used a model-free analysis to examine how our kernels changed with attention.
If attention multiplicatively scales the kernels, then it should be possible to estimate this attentional scale factor (β) with no prior assumptions on kernel shape. This can be expressed as KIN = βKOUT, where KIN and KOUT are the attend-in and attend-out kernels, respectively. To compute β, we nonlinearly scaled the attend-out kernels to match the attend-in kernels (Eq. 7). Figure 9B shows the scale factor, β, for each cell plotted against the AM of spike rate. Both β and spike rate modulation are expressed in a ratiometric form. For our 44 good S/N neurons, β and spike rate modulation weakly covaried (all points; r2 = 0.13). However, for the 15 neurons with the most reliable kernel estimates, the multiplicative scaling factor, β, was strongly related to the modulation in spike rate (filled points; r2 = 0.76).
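A scale factor of this kind can be illustrated with a closed-form least-squares estimate. This is a stand-in chosen for simplicity, not the nonlinear procedure of Equation 7, which is not reproduced in this section. The function names are assumptions. The ratiometric form follows from substituting KIN = βKOUT into (Pin - Pout)/(Pin + Pout), which gives (β - 1)/(β + 1).

```python
def scale_factor(k_in, k_out):
    """Least-squares beta minimizing sum((k_in - beta * k_out)^2).

    k_in, k_out : attend-in and attend-out kernels as flat lists of weights.
    Assumption: ordinary least squares replaces the original nonlinear fit.
    """
    numerator = sum(a * b for a, b in zip(k_in, k_out))
    denominator = sum(b * b for b in k_out)
    if denominator == 0:
        raise ValueError("attend-out kernel is all zeros")
    return numerator / denominator

def ratiometric(beta):
    """Express a multiplicative scale factor as (beta - 1) / (beta + 1)."""
    return (beta - 1.0) / (beta + 1.0)
```

A kernel pair with β = 1.5 therefore corresponds to a ratiometric modulation of 0.2, which places the scale factor on the same axis as the spike rate modulation.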
The multiplicative scaling of the kernels with attention can also be computed using the estimated kernel variance. If the kernels are multiplicatively scaled copies of each other, then the scale factor can also be computed as the ratio of the SDs of the kernels (Eq. 8). Figure 9C shows this scale factor, γ, plotted against the AM of spike rate (both expressed in an equivalent ratiometric form) and is nearly identical to the nonlinearly estimated scale factor β. Importantly, the two model-free scale factors, β and γ, are very similar to the scaling predicted by the Gaussian fits in Figure 9A. This is especially true for the kernels with the highest S/N. Thus, for cells for which we were able to estimate the most reliable kernels, the multiplicative scaling of the kernels with attention, regardless of how it was computed, did a good job of predicting the AM in average firing rates.
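The SD-ratio estimate can be sketched in a few lines. The sketch assumes the raw SD of the kernel weights is used directly; Equation 8 is not reproduced in this section and may include a correction for estimation noise, so this is an illustration rather than the original calculation. The function name is an assumption.

```python
import statistics

def sd_scale_factor(k_in, k_out):
    """Ratio of the SDs of the attend-in and attend-out kernel weights.

    If k_in is an exact scaled copy of k_out, this equals |beta|.
    Assumption: no correction is made for noise in the kernel estimates,
    which inflates both SDs.
    """
    sd_out = statistics.pstdev(k_out)
    if sd_out == 0:
        raise ValueError("attend-out kernel has zero variance")
    return statistics.pstdev(k_in) / sd_out
```

Unlike a least-squares fit, this estimate does not use the element-by-element correspondence between the two kernels, so agreement between the two estimates is itself a check that the kernels share a common shape.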
Computing a kernel scale factor does not address whether attention produced a multiplicative scaling alone or also produced a change in kernel shape. To examine whether kernels underwent a change in shape with attention, we computed the residual kernel Kres = KIN - βKOUT, using the estimated multiplicative scaling factor, β. If KIN and KOUT are multiplicatively scaled copies of each other, then Kres would be zero. Alternatively, non-zero elements in Kres would indicate a change in kernel shape with attention. One difficulty in determining whether Kres is flat (or contains all zeros) is that noise in our kernel estimates also contributes to non-zero elements in Kres. To determine whether the non-zero elements in Kres are attributable to noise or a shape change in our kernels, we used the estimates of kernel noise to predict the variance in Kres that we would expect from a pure multiplicative scaling with attention. If the observed variance in Kres is greater than this expected variance, this would suggest that attention produced a change in kernel shape.
We used a χ2 distribution to test whether the variance in Kres was greater than expected given the variance of the noise in KIN and KOUT (Eq. 11). For this analysis, we used the scaling factor β to compute the expected variance in Kres. The distribution of p values for this test is shown in Figure 9D for our 44 good S/N kernels (one-tailed χ2 test of the null hypothesis that the variance in Kres does not exceed the expected variance).
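One way to see how such an expected variance and test statistic could arise is sketched here. The symbols for the noise variances, the observed residual variance, and the number of kernel weights n are introduced only for illustration, and the derivation assumes independent noise in the two kernel estimates and treats β as fixed. The exact form of Equation 11 is not reproduced in this section and may differ.

```latex
% Under pure multiplicative scaling, the residual contains only noise:
K_{res} = K_{IN} - \beta K_{OUT}
\quad\Longrightarrow\quad
\sigma^2_{exp} = \sigma^2_{IN} + \beta^2 \sigma^2_{OUT}

% With s^2_{res} the observed variance of the n elements of K_{res},
% a one-tailed variance test compares
\chi^2 = \frac{(n - 1)\, s^2_{res}}{\sigma^2_{exp}}
% against a chi-square distribution with n - 1 degrees of freedom.
```

A large value of the statistic, and hence a small p value, would indicate more structure in the residual than noise alone can explain.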
Most p values in Figure 9D fall well above 0.05, which suggests no kernels underwent a significant change in shape. Because noise in the kernels reduces the power of this test to detect a shape change, neurons with the best S/N estimates are highlighted in Figure 9D (solid bars). Even for kernels with the lowest amounts of noise, the p values are well above significance, suggesting these kernels underwent a pure multiplicative scaling with attention. This result agrees with the population averages in Figure 7 that also suggest AM does not change kernel shape in a systematic way.
We examined the effects of attention on the estimated linear transfer function of MT neurons. From a subpopulation of neurons that produced well defined kernels, we found that attention scaled the directional, speed, and temporal profiles of the linear transfer function without an appreciable change in shape. This suggests that AM does not alter the period of temporal integration of single neurons in MT. In addition, we found that neurons that produced the best kernels with the lowest noise also had their response properties to the coherent motion well described by the kernels and demonstrated strong transient response dynamics.
For kernels with good S/N that also had high AM, the effect of attention was also a multiplicative scaling of the kernel. Attention scaled these kernels by 34% (Fig. 8B), approximately the same amount the average firing rate was modulated by attention. Thus, even strong AM did not overtly change the shape of the integration window or shift the window in time. This agrees with previous observations that suggest attention does not alter the stimulus selectivity of neurons in the visual cortex (McAdams and Maunsell, 1999; Treue and Martinez-Trujillo, 1999; Recanzone and Wurtz, 2000). In these studies, the tuning curves of neurons to various stimulus dimensions (e.g., orientation, motion direction) were multiplicatively scaled by attention. Multiplicative interactions have also been observed between different stimulus dimensions (Tolhurst, 1973; Tolhurst and Movshon, 1975; Holub and Morton-Gibson, 1981; Albrecht and Hamilton, 1982; Sclar and Freeman, 1982; Skottun et al., 1987; Hamilton et al., 1989; Friend and Baker, 1993; McLean and Palmer, 1994; Geisler and Albrecht, 1997), suggesting that attentional and sensory inputs may be processed in a similar manner (McAdams and Maunsell, 1999). That attention does not change the shape of the neuronal kernel is supported by a recent study that found attention had a multiplicative effect on the psychophysically measured perceptual filter in humans performing a visual detection task (Eckstein et al., 2002).
Recently, two abstracts have described the effect of attention on spatiotemporal RFs (STRFs) of neurons in V1 (McAdams and Reid, 2003) and V4 (David et al., 2002). The V1 study reported AM of the peak of the estimated STRFs but did not address whether there were corresponding changes in shape. In contrast, the V4 study, which was conducted under free-viewing conditions, reported changes in the spatial and temporal tuning of the estimated STRFs. It has also been suggested that attention produces subtle effects on spike timing without changes in the average rate (Niebur et al., 2002). Because of our 10 msec sampling interval, small changes in response latency or the length of the integration window would not be resolved by our kernel estimates.
Before we began this study, there was no a priori reason to think that spatial attention would multiplicatively scale temporal integration of motion. Previous studies used the number of spikes produced over hundreds of milliseconds to describe the multiplicative effects of attention on stimulus selectivity in cortical neurons. Because changes in the synaptic conductance associated with attention could theoretically alter the membrane time constant of neurons (Rall, 1959), it was possible that attention could have dramatically affected the time course of integration, yet still have a multiplicative effect on spike counts. Other models that include the effects of voltage-dependent mechanisms suggest that the membrane time constant can be maintained under changes in synaptic input (Cook and Johnston, 1999). However, these models do not explain how multiplication, which plays a central role in models of sensory processing, is implemented by neurons (Albrecht and Geisler, 1991; Andersen et al., 1997; Sun and Frost, 1998; Taylor et al., 2000; Pena and Konishi, 2001). Although various mechanisms have been proposed to underlie multiplicative interactions in neurons (Srinivasan and Bernard, 1976; Heeger, 1992; Koch and Poggio, 1992; Mel, 1993; Salinas and Abbott, 1996; Hahnloser et al., 1999; Chance et al., 2002), the biophysical mechanisms remain unclear (but see Gabbiani et al., 2002).
The structure of our MT kernels is equivalent to those estimated by Livingstone et al. (2001). In that study, the authors used the motion vectors between two randomly moving dots located in the RF and compensated for small changes in eye position during fixation. Given the relatively large size of the RFs in our population (3.9-10.7°; median, 7.4°), we do not think that small changes in fixation added much noise to our kernel estimates.
The average temporal profile of our kernels was similar in width to those computed by Borghuis et al. (2003) and Bair et al. (1997) using reverse correlation with random motion sequences. However, many of our best neurons that produced very high S/N kernels had narrow integration windows compared with these previous studies (such as the neuron in Fig. 2). We do not know the origin of this difference. In fact, neurons that produced well defined kernels with a good S/N tended to have shorter temporal integration (the correlation between the kernel S/N and the width of temporal integration was -0.27). Interestingly, neurons with higher AM tended to have slightly longer temporal integration windows (Fig. 8). The reason for this is not known.
It is also possible that some neurons may have been sensitive to motion occurring at speeds higher than that defined by our kernels. We do not think this was likely because we intentionally selected neurons with low speed preferences. This was because the animals were not trained with faster motion stimuli and had difficulty detecting the coherent motion at speeds greater than ∼15°/sec. Using the peak plus 1 SD along the speed dimension of the Gaussian fits (μs + σs) as a measure of maximum speed sensitivity, for the good S/N kernels, the median and maximum were 12.1 and 23.2°/sec, which are below the speed cutoff of 23.5°/sec used in our analysis.
Our average population kernels (Fig. 7A,B) show no trailing temporal inhibition, although several individual cells showed hints of this feature. In the cross-correlation between neuronal responses in MT and a one-dimensional moving stimulus, Bair et al. (1997) showed that some MT cells have kernels with weak trailing inhibition. Borghuis et al. (2003) reported that approximately half of their kernels showed trailing inhibition of some kind. The reason for this discrepancy is unknown. However, our random motion vectors are a more “broadband-like” stimulus compared with the full-field movement impulses used by both of the previous studies. Although we assumed that the relationship between stimulus and neuronal responses in MT is stationary, recent results suggest that stimulus attributes can affect the estimated linear transfer function of neurons (Bair et al., 1997; Theunissen et al., 2000; Blake and Merzenich, 2002), and this may explain the difference in our results. In addition, the use of the Gaussian function to parameterize the estimated kernel shapes was an approximation and does not imply that the kernel profiles are truly Gaussian.
Our analysis likely selected neurons that were driven by the global motion as described by our motion vectors. MT neurons respond to other stimulus attributes such as visual disparity (DeAngelis et al., 1998) and changes in contrast such as those that occurred during the replotting of the dots in our stimulus (flicker). We further assumed that MT neurons integrated the motion vectors uniformly across the RF. Therefore, our model would not account for non-uniform motion integration because of “hot spots” of integration in the RF. In addition, it has been suggested that MT neurons use a power-law summation when presented with multiple stimuli in the RF (Britten and Heuer, 1999), which would not be captured by our linear transfer function.
Livingstone et al. (2001) showed that some MT neurons can integrate motion across several previous stimulus frames, although the contribution of this integration was weak. Such interactions were not included in our model. Other factors, such as the number of spikes available (affected by the duration of recording and responsiveness to the 0% coherent motion), also contributed to whether the kernel of a cell could be estimated. It has also been shown that visual cortical neurons, including those in MT, respond more slowly to stimuli that increase firing rates than to those that reduce firing rates (Bair et al., 2002). This asymmetry in latency is not featured in our linear model.
Despite these limitations, the kernels did a good job of reproducing the directional selectivity of the 44 good S/N neurons in our population (Fig. 7C). Thus, our estimated kernels captured a large portion of how the neurons processed the motion stimulus. The nonlinear components of the neuronal processing that were not captured by our kernels are evident in the fact that the kernels underestimated responsiveness to coherent motion. It is possible that a static nonlinearity could be added to the output of the linear kernel to eliminate these discrepancies (Hunter and Korenberg, 1986). Although it is unlikely that attention would affect the nonlinear components of neuronal processing in such a way as to change the attentional effects on the linear kernels reported here, this is an important topic for future studies.
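The idea of passing the linear kernel output through a static nonlinearity can be sketched with a simple example. The rectified power-law form, the parameter names, and the default values are hypothetical choices made for illustration; no such stage was fit in this study, and the form is not taken from Hunter and Korenberg (1986).

```python
def static_nonlinearity(linear_output, gain=1.0, exponent=2.0, threshold=0.0):
    """Map the output of a linear kernel to a firing rate.

    The drive is rectified at `threshold` and raised to `exponent`.
    An exponent above 1 is expansive: it boosts large linear outputs
    more than small ones. All parameters are hypothetical.
    """
    drive = max(linear_output - threshold, 0.0)
    return gain * drive ** exponent
```

Because the function depends only on the instantaneous linear output, it leaves the temporal profile of the kernel untouched while changing how response amplitude grows with stimulus strength. An expansive stage of this kind is one candidate for offsetting the underestimated responses to coherent motion.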
This work was supported by the Howard Hughes Medical Institute (HHMI) and National Institutes of Health Grant R01 EY05911. J.H.R.M. is an Investigator with the HHMI. We thank W. Bosking, C. Boudreau, J. DiCarlo, G. Ghose, and T. Yang for helpful discussion on all aspects of this project. We also thank J. A. Movshon for comments on this manuscript, X. Pitkow for help with the data analysis, and D. Murray and T. Williford for technical assistance.
Correspondence should be addressed to Dr. Erik P. Cook, Department of Physiology, McGill University, 3655 Sir William Osler, Montreal, Quebec, H3G 1Y6 Canada.
Copyright © 2004 Society for Neuroscience 0270-6474/04/247964-14$15.00/0