Abstract
A key feature of neural networks is their ability to rapidly adjust their function, including signal gain and temporal dynamics, in response to changes in sensory inputs. These adjustments are thought to be important for optimizing the sensitivity of the system, yet their mechanisms remain poorly understood. We studied adaptive changes in temporal integration in direction-selective cells in macaque primary visual cortex, where specific hypotheses have been proposed to account for rapid adaptation. By independently stimulating direction-specific channels, we found that the control of temporal integration of motion at one direction was independent of motion signals driven at the orthogonal direction. We also found that individual neurons can simultaneously support two different profiles of temporal integration for motion in orthogonal directions. These findings rule out a broad range of adaptive mechanisms as being key to the control of temporal integration, including untuned normalization and nonlinearities of spike generation and somatic adaptation in the recorded direction-selective cells. Such mechanisms are too broadly tuned, or occur too far downstream, to explain the channel-specific and multiplexed temporal integration that we observe in single neurons. Instead, we are compelled to conclude that parallel processing pathways are involved, and we demonstrate one such circuit using a computer model. This solution allows processing in different direction/orientation channels to be separately optimized and is sensible given that, under typical motion conditions (e.g., translation or looming), speed on the retina is a function of the orientation of image components.
SIGNIFICANCE STATEMENT Many neurons in visual cortex are understood in terms of their spatial and temporal receptive fields. It is now known that the spatiotemporal integration underlying visual responses is not fixed but depends on the visual input. For example, neurons that respond selectively to motion direction integrate signals over a shorter time window when visual motion is fast and a longer window when motion is slow. We investigated the mechanisms underlying this useful adaptation by recording from neurons as they responded to stimuli moving in two different directions at different speeds. Computer simulations of our results enabled us to rule out several candidate theories in favor of a model that integrates across multiple parallel channels that operate at different time scales.
Introduction
Neural systems have the ability to rapidly optimize computations for their designated task in the face of the changing statistics of sensory input (Smirnakis et al., 1997; Brenner et al., 2000; Fairhall et al., 2001; Wark et al., 2007; Gepshtein et al., 2013). A set of rapid adaptive phenomena described across several sensory systems share common features, in that gain and temporal integration change in a consistent manner as a function of stimulus mean and variance. These properties are apparent in audition (Nagel and Doupe, 2006; Dahmen et al., 2010), olfaction (Olsen et al., 2010), and vision (Shapley and Victor, 1981; Dean and Tolhurst, 1986; Carandini and Heeger, 1994; Bair and Movshon, 2004; Borst et al., 2005). To gain insight into the mechanisms underlying these phenomena, we investigated adaptive temporal integration (ATI) in direction-selective (DS) neurons in macaque primary visual cortex (V1). ATI refers to the ability of neurons to display a shorter window of temporal integration for fast stimuli and a longer window for slow stimuli (Bair and Movshon, 2004).
Several basic biophysical mechanisms are known to cause changes in temporal integration consistent with ATI, and these can be divided into untuned and channel-specific classes. One example of an untuned mechanism is cortical normalization, whereby a pool of neurons, including all orientation preferences, drives a gain-control signal (Bonds, 1989; Heeger, 1992). Normalization has been associated with changes in temporal properties and continues to attract interest as a fundamental mechanism of cortical function (Reid et al., 1992; Carandini and Heeger, 1994; Carandini et al., 1997; Kouh and Poggio, 2008; Busse et al., 2009; Reynolds and Heeger, 2009; Sit et al., 2009; Carandini and Heeger, 2012). This and other untuned mechanisms, such as those occurring before orientation channels are established (e.g., Ozuysal and Baccus, 2012), would allow the statistics of motion in any direction (at any orientation) to affect temporal integration in all neurons regardless of preferred direction. Channel-specific mechanisms, however, allow only motion signals within a limited channel (e.g., the channel that matches the tuning of the neuron) to affect the temporal integration of a neuron. This would hold for adaptive mechanisms within individual DS cells, and it includes apparently adaptive changes in processing that have been shown to emerge intrinsically through the interaction of stimulus statistics and nonlinearities in the system (e.g., spike generation), without requiring any change in system parameters over time (Paninski et al., 2003; Yu and Lee, 2003; Yu et al., 2005; Gaudry and Reinagel, 2007; Hong et al., 2008). Such nonadapting mechanisms are candidates for explaining ATI (Bair and Movshon, 2004; Borst et al., 2005).
Using a random-motion stimulus with two orthogonal axes to independently drive different orientation channels, we found that ATI is channel-specific and that DS cells can simultaneously display two different profiles of temporal integration when processing two independent motion signals. We demonstrate that these observations are not accounted for by previously proposed mechanisms but can be explained by a parallel channel model.
Materials and Methods
Electrophysiology
Single-unit responses were recorded extracellularly from the primary visual cortex (V1) of 14 (5 male, 9 female) anesthetized, paralyzed macaques (Macaca mulatta). All procedures conformed to United Kingdom Home Office regulations on animal experimentation and were approved by the Oxford Committee of Animal Care and Ethical Review, and the Named Veterinary Surgeon of the Oxford University Veterinary Services. Detailed methods are available in our previous study (McLelland et al., 2010). Animals were anesthetized with a combination of respired isoflurane (0.25%) in a moist mixture of 50% O2 and 50% room air, and infused sufentanil citrate (6–30 μg/kg/h) in Hartmann's solution (3 ml/kg/h) supplemented with dextrose (2.5%) and potassium (final concentration 18 mmol/L); and paralyzed with vecuronium bromide (Norcuron, 0.1 mg/kg/h). Artificial respiration was maintained with rate adjustments to keep expired CO2 between 32 and 38 mmHg. Body temperature was maintained near 37°C with a heating pad. EEG and electrocardiogram were monitored to maintain a proper depth of anesthesia. Sterile surgery consisted of a 13 mm trephine craniotomy followed by a small durotomy, placed over parafoveal opercular V1, ∼10 mm lateral to the midline and 4 mm posterior to the lunate sulcus. The corneas were protected with gas-permeable hard contact lenses, with additional lenses to optimize neuronal responses to high spatial frequency (SF) stimuli. After 5 d, animals were given an overdose of sodium pentobarbital (65 mg/kg), exsanguinated, and perfused with 4% paraformaldehyde in saline.
A mechanical microdrive was used to advance quartz-platinum tungsten microelectrodes vertically into the brain (Thomas Recordings). Signals were digitized at 12.5 kHz using a National Instruments analog-to-digital board, and spikes were discriminated using time-amplitude windows (custom software, C-code) and stored at 1 ms resolution.
One experiment was performed in the laboratory of A.K. at Albert Einstein College of Medicine at Yeshiva University, New York (procedures approved by the Institutional Animal Care and Use Committee of the Albert Einstein College of Medicine at Yeshiva University and in compliance with the guidelines set forth in the United States Public Health Service Guide for the Care and Use of Laboratory Animals). Techniques have been described in detail previously (Smith and Kohn, 2008) and were similar to those described above, except that isoflurane was not used during recordings. In addition, recordings were made with a 4 × 4 mm multielectrode array (0.4 mm spacing and 1 mm electrode length, 100 electrodes), implanted into the upper layers of primary visual cortex, ∼10 mm lateral to the midline and ∼8 mm posterior to the lunate sulcus. Events larger than a user-defined threshold were recorded, with subsequent off-line spike sorting to yield single-unit activity.
Visual stimuli
Basic characterization.
We mapped each cell manually, using bars and gratings under mouse control on a CRT (96 or 100 Hz; mean luminance, 27 cd/m2) while adjusting the electrode depth in micrometer increments to obtain well isolated action potential waveforms. We next characterized each cell physiologically by presenting a series of drifting sinusoidal gratings under computer control to generate tuning curves for direction, SF, temporal frequency (TF), and size. Subsequent stimuli were presented with the optimal values of these parameters, except that TF was varied. We classified cells as simple or complex using a modulation index, MI = F1/DC, in response to an optimal drifting grating (Skottun et al., 1991), where DC is the mean evoked firing rate (in excess of spontaneous rate) and F1 is the amplitude of the Fourier component of the response at the TF of the grating.
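A minimal sketch of the MI = F1/DC computation, assuming the response is available as a PSTH in spikes/s that spans an integer number of grating cycles; the function name and arguments are illustrative, not the analysis code used here. By the usual convention (Skottun et al., 1991), MI > 1 indicates a simple cell.

```python
import numpy as np

def modulation_index(psth, bin_s, tf_hz, spont_rate=0.0):
    """F1/DC for a PSTH (spikes/s) covering an integer number of stimulus cycles."""
    psth = np.asarray(psth, dtype=float)
    t = np.arange(psth.size) * bin_s
    dc = psth.mean() - spont_rate                   # mean evoked rate above spontaneous
    # amplitude of the Fourier component at the grating TF
    f1 = 2.0 * np.abs(np.mean(psth * np.exp(-2j * np.pi * tf_hz * t)))
    return f1 / dc

# Example: 4-Hz grating, 1 s of 1-ms bins, rate modulated as 20 + 20 cos(2*pi*4*t)
t = np.arange(1000) * 0.001
mi_example = modulation_index(20 + 20 * np.cos(2 * np.pi * 4 * t), 0.001, 4.0)
```

For this fully modulated sinusoid the F1 amplitude equals the DC rate, so MI sits exactly at the conventional simple/complex boundary.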
Random motion stimuli.
To characterize motion integration by DS cells, we used the same dynamic stimulus as used previously (Bair and Movshon, 2004), in which an optimally oriented grating moves randomly (according to pseudo-random m-sequences) back and forth along the axis of preferred motion. During any single trial, the grating moves by a constant-magnitude phase shift from one frame to the next, yielding an equivalent temporal frequency (ETF): the TF the grating would have if it moved in the same direction for several consecutive frames. ETF values from 0.1 to 25 Hz were tested. Because we used this stimulus to test for the occurrence of ATI in simple cells, as opposed to complex cells in the earlier study (Bair and Movshon, 2004), care was taken to ensure that the starting spatial phase of the grating was near the preferred spatial phase for the cell. This is important at low ETF values, for which the random motion stimulus will not fully explore the spatial phase domain within a single trial.
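The relation between step size and ETF is simple arithmetic: ETF equals the phase step per frame (in cycles) times the frame rate. Assuming a 100-Hz display, the 1/4-cycle step corresponds to 25 Hz, and the 0.1-Hz condition to a step of 0.001 cycle (0.36° of phase). The sketch below also generates a ±1 direction sequence; purely for illustration it assumes a 4-bit maximal-length linear feedback shift register (polynomial x^4 + x^3 + 1, period 15), since the order and taps of the m-sequences actually used are not stated here. All function names are hypothetical.

```python
import numpy as np

def etf_hz(step_cycles, frame_rate_hz):
    """Equivalent temporal frequency: phase step per frame (cycles) x frame rate."""
    return step_cycles * frame_rate_hz

def mseq_directions(n_frames, state=1):
    """+1/-1 step direction per frame from a 4-bit Fibonacci LFSR
    (x^4 + x^3 + 1, period 15). `state` must be a nonzero 4-bit integer."""
    dirs = np.empty(n_frames, dtype=int)
    for i in range(n_frames):
        dirs[i] = 1 if state & 1 else -1        # LSB: 1 = preferred, 0 = antipreferred
        bit = (state ^ (state >> 1)) & 1        # feedback from taps 4 and 3
        state = (state >> 1) | (bit << 3)
    return dirs

def phase_trajectory(etf, frame_rate_hz, n_frames):
    """Grating spatial phase (cycles, wrapped to [0, 1)) after each frame."""
    step = etf / frame_rate_hz                  # constant step magnitude within a trial
    dirs = mseq_directions(n_frames)
    return dirs, np.cumsum(step * dirs) % 1.0
```

Over one period the register emits 8 preferred and 7 antipreferred steps, so the sequence is almost exactly balanced in direction.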
Random motion with mask.
In a second series of experiments, we tested whether motion integration along the preferred–antipreferred axis was sensitive to additional motion in the orthogonal axis. We initially tested motion integration in the preferred–antipreferred axis across a range of ETF values as above, but using a 50% contrast grating. We calculated the spike-triggered averages (STAs) in response to these stimuli and selected the lowest and highest ETFs that yielded clear peaks in the STA for use in the subsequent stimuli. Having selected a fast and slow ETF, we then introduced a similar (same position, size, SF, and fast and slow ETFs) randomly moving mask grating at 50% contrast, oriented orthogonally to the target grating. We tested responses to the mask grating alone, and to pairwise combinations of the fast and slow target with fast and slow mask.
Random motion in independent channels.
In a final set of experiments, we tested whether cells could integrate motion in two different directions independently. For this, we used the same pair of orthogonal 50% contrast gratings (fast and slow ETFs) but rotated so that the two gratings were at ±45° to the preferred orientation, such that each grating could drive the cell in a nonoptimal but nonetheless DS manner. For a small number of cells (n = 3), this stimulus did not drive cells adequately (their direction bandwidth was too narrow), and for these cells we rotated both gratings toward the preferred axis, to lie at ±26.5° to the preferred orientation.
Modeling: integrate-and-fire model
We tested the integration properties of a simple leaky integrate-and-fire (LIF) model (Koch, 1999; e.g., Abbott, 1999). The capacitance, C, was fixed (100 pF), and leak conductance, gleak, was varied to control the membrane time constant, τm. The stimulus for this model was an abstraction of the dynamic motion stimulus used to test DS cells in vivo. Specifically, it comprised a binary-amplitude current that randomly took a fixed positive or negative value on each “frame” (10 ms period). The positive value represented movement of the stimulus in the preferred direction, and negative value in the antipreferred direction. The amplitude of this binary current was varied (default value, ±20 pA) to represent stimuli of different strengths (e.g., changes in ETF of the motion stimulus). To this was added a fixed amplitude of Gaussian white noise (SD 40 pA) and a mean DC offset current to bring the stimulus into a range to generate a physiologically relevant firing rate. This offset varied with gleak, from ∼20 to 200 pA. We also included shot noise, in the form of large-amplitude current pulses, with a Poisson distributed rate of 1 Hz, such that a small percentage of generated spikes were completely independent of the binary stimulus. This yields STAs that are more physiological in appearance, but does not qualitatively change the results. STAs were calculated against the binary stimulus.
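A minimal sketch of this LIF protocol. The capacitance (100 pF), 10-ms frames, ±20-pA binary amplitude and 40-pA noise SD follow the text. The following are hypothetical choices made only so the example runs: leak reversal (−70 mV), threshold (−50 mV), reset (−60 mV), a 0.5-ms forward-Euler step, Gaussian noise drawn independently at every step, a 200-pA offset, and shot noise implemented as one-step 5-nA pulses.

```python
import numpy as np

def simulate_lif(g_leak=10e-9, amp=20e-12, offset=200e-12, noise_sd=40e-12,
                 shot_rate_hz=1.0, dur_s=60.0, frame_s=0.010, dt=0.0005, seed=0):
    rng = np.random.default_rng(seed)
    C = 100e-12                                         # capacitance (F), fixed
    E_L, V_th, V_reset = -0.070, -0.050, -0.060         # hypothetical voltages (V)
    n = int(round(dur_s / dt))
    spf = int(round(frame_s / dt))                      # time steps per "frame"
    n_frames = -(-n // spf)                             # ceiling division
    # +1 = preferred-direction step, -1 = antipreferred, redrawn every frame
    stim = np.repeat(rng.choice([-1.0, 1.0], size=n_frames), spf)[:n]
    I = offset + amp * stim + noise_sd * rng.standard_normal(n)
    I[rng.random(n) < shot_rate_hz * dt] += 5e-9        # rare stimulus-independent pulses
    V, spikes = E_L, []
    for i in range(n):
        V += dt * (-g_leak * (V - E_L) + I[i]) / C      # forward Euler, tau_m = C / g_leak
        if V >= V_th:
            spikes.append(i)
            V = V_reset
    return stim, np.array(spikes, dtype=int)

def sta(stim, spike_idx, n_lags):
    """Mean of the binary stimulus over the n_lags steps before each spike (oldest first)."""
    idx = spike_idx[spike_idx >= n_lags]
    return np.stack([stim[i - n_lags:i] for i in idx]).mean(axis=0)

stim, spikes = simulate_lif()
sta_vals = sta(stim, spikes, n_lags=400)                # 200 ms of history at 0.5 ms
```

Varying `g_leak` (and adjusting `offset` so that the model still fires at a reasonable rate) is the manipulation of τm described in the text. The shot-noise spikes are uncorrelated with `stim` and only dilute the STA.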
To emulate our dual-orientation visual stimulus in the LIF model, the input was as above, except that two binary signals of different amplitude were combined (35 and 55 pA in the data presented, although a wide range of values were tested), each independently taking a positive or negative value on each frame. The Gaussian noise and mean DC offset were the same as in the case of the independent signals (i.e., these were not doubled); and to keep the interpretation of results as simple as possible, no shot noise was included in this case.
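The dual-signal variant changes only the input. The sketch below uses the 35- and 55-pA amplitudes from the text, with noise and offset added once and no shot noise. The membrane voltages, time step and 200-pA offset are the same kind of hypothetical, illustrative assumptions needed to make a runnable LIF example. Separate STAs are then computed from the single spike train against each binary signal.

```python
import numpy as np

def dual_signal_lif(amps=(35e-12, 55e-12), g_leak=10e-9, offset=200e-12,
                    noise_sd=40e-12, dur_s=60.0, frame_s=0.010, dt=0.0005, seed=0):
    rng = np.random.default_rng(seed)
    C = 100e-12
    E_L, V_th, V_reset = -0.070, -0.050, -0.060          # hypothetical voltages (V)
    n = int(round(dur_s / dt))
    spf = int(round(frame_s / dt))
    n_frames = -(-n // spf)
    # one independent +/-1 sequence per orientation, each redrawn every frame
    stims = [np.repeat(rng.choice([-1.0, 1.0], size=n_frames), spf)[:n] for _ in amps]
    # noise and DC offset enter once (not once per signal); no shot noise here
    I = offset + noise_sd * rng.standard_normal(n)
    for a, s in zip(amps, stims):
        I = I + a * s
    V, spikes = E_L, []
    for i in range(n):
        V += dt * (-g_leak * (V - E_L) + I[i]) / C       # forward Euler
        if V >= V_th:
            spikes.append(i)
            V = V_reset
    return stims, np.array(spikes, dtype=int)

def sta(stim, spike_idx, n_lags):
    idx = spike_idx[spike_idx >= n_lags]
    return np.stack([stim[i - n_lags:i] for i in idx]).mean(axis=0)

stims, spikes = dual_signal_lif()
# both STAs come from the same spike train
sta_35 = sta(stims[0], spikes, n_lags=400)
sta_55 = sta(stims[1], spikes, n_lags=400)
```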
Modeling: spiking population model
We used a spiking population model of DS cells to gain insight into the mechanisms that could yield changes in temporal integration. This model has been described in detail previously (Baker and Bair, 2012). Briefly, the model comprises 4 subpopulations, as follows. Activity in the LGN layer is generated from a noisy conductance input, the amplitude of which is calculated as the linear convolution of spatiotemporal filters and the visual stimulus. The middle layers comprise excitatory and inhibitory simple cells, with no direction selectivity, but with orientation tuning established by the probabilistic selection of inputs from the LGN layers. Direction selectivity is achieved in the final layer by taking inputs from pairs of simple cells with similar receptive field locations and orientation tunings but a 90° offset in spatial phase selectivity (Adelson and Bergen, 1985; Nakayama, 1985). Spikes from the first cell of each pair set up a delayed temporal window that multiplicatively gates spikes from the second cell (Reichardt, 1961). DS cells in the final layer are driven by several such cell pairs and thus have complex receptive fields.
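The delayed gating step can be sketched using spike times alone. This is a simplified, hypothetical version: the window is rectangular, the gate is all-or-none (multiplication by 0 or 1), and the delay and window durations are illustrative. The window shape used by Baker and Bair (2012) may differ.

```python
import numpy as np

def gate_spikes(first_train, second_train, delay, window):
    """Keep a spike from the second cell only if the first cell fired
    between (delay + window) and delay seconds earlier."""
    first = np.sort(np.asarray(first_train, dtype=float))
    kept = []
    for t in second_train:
        lo, hi = t - delay - window, t - delay
        i = np.searchsorted(first, lo, side="left")     # first spike at or after lo
        if i < first.size and first[i] <= hi:           # ...that also falls before hi
            kept.append(t)
    return np.array(kept)

# Example: only second-cell spikes that follow a first-cell spike by 10-30 ms survive
out = gate_spikes([0.0, 0.1], [0.02, 0.05, 0.12], delay=0.01, window=0.02)
```

Motion in the preferred direction activates the first cell of a pair shortly before the second, so it falls inside the window more often than motion in the opposite direction. This ordering is the source of direction selectivity in this scheme.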
To test a channel-specific model of ATI, we used a modified version of the model in which the first three layers were duplicated to provide a second independent channel of input to the DS cell layer. This new channel had a longer temporal kernel at the LGN and a longer window of temporal interaction for the DS mechanism. Thus, the second channel was tuned for lower TF and slower motion.
Data analysis
STAs were calculated exactly as described previously (Bair and Movshon, 2004), with a box-car type representation of the stimulus that took a value of 1 or −1 for the full duration, ∼10 ms, between frames in which the grating had shifted in the preferred or antipreferred direction, respectively. In calculating STAs for the motion of the mask grating, preferred and antipreferred directions were assigned so that any significant STA peak was positive. Details of statistical tests, calculated using SPSS software, are given in Results.
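A minimal sketch of the box-car STA. It assumes integer 10-ms frames (i.e., a 100-Hz display) and spike times in whole milliseconds, and the function names are hypothetical. The last helper applies the sign convention for mask STAs by flipping the trace so that its largest deflection is positive, which is equivalent to relabeling the preferred and antipreferred directions.

```python
import numpy as np

def boxcar_stimulus(frame_dirs, frame_ms, total_ms):
    """1-ms box-car: each frame's +1/-1 is held for the full frame duration."""
    return np.repeat(np.asarray(frame_dirs, dtype=float), frame_ms)[:total_ms]

def spike_triggered_average(stim, spike_ms, n_lags):
    """Mean of the n_lags ms of stimulus preceding each spike (oldest first)."""
    spike_ms = np.asarray(spike_ms, dtype=int)
    spike_ms = spike_ms[(spike_ms >= n_lags) & (spike_ms <= stim.size)]
    windows = np.stack([stim[t - n_lags:t] for t in spike_ms])
    return windows.mean(axis=0)

def orient_sign(sta_vals):
    """Flip the STA so that its largest deflection is positive."""
    peak = sta_vals[np.argmax(np.abs(sta_vals))]
    return sta_vals if peak >= 0 else -sta_vals
```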
Results
ATI in simple cells
Before testing whether ATI is broadly tuned or channel-specific, we first tested whether it was apparent in simple cells (n = 15), given that it has previously been reported only in complex cells (Bair and Movshon, 2004). This serves to introduce the phenomenon and the stimulus used to assess it. A negative result would directly and compellingly link ATI to the development of complex cell properties, generally recognized as a critical step in models of the cortical visual hierarchy (Hubel and Wiesel, 1962). We assessed temporal integration for a wide range of stimulus speeds using an ensemble of randomly stepping grating stimuli (Fig. 1A), optimized for orientation, SF, and size, and with starting spatial phase close to the preferred phase of the neuron. Step size (thus speed) varied across trials. The smallest steps moved the grating at an ETF (the TF of the grating if it stepped in the same direction on each frame) of 0.1 Hz, and the largest steps (1/4 cycle) moved the stimulus at ETF 25 Hz. For a typical DS simple cell, Figure 1B shows the set of STAs for nine octaves of stimulus speed. Each trace is the average of the white-noise velocity stimulus segments that preceded each spike. The shape of the STAs, and in particular the width, depends on the stimulus speed: faster motion (high ETF) is associated with narrower STAs, and slower motion with wider STAs. To summarize the trend across our population of simple DS cells, we computed the average width (at half-height) of the STAs as a function of ETF (Fig. 1C), which provides a first-order characterization of the duration of the time window over which the visual input is integrated to produce the neuronal response. The average STA width ranges from ∼20 ms for the fastest stimulus to over 60 ms for the slowest. As with the complex DS cells studied previously (Fig. 1C, red), this trend was consistent, occurring in every DS simple cell studied, and it was not simply inversely related to firing rate (Fig. 1D), which did not increase monotonically with ETF.
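One way to compute the width at half-height is sketched below. It assumes the STA has a single dominant positive peak on a baseline near zero, and it locates the half-height crossings on either side of the peak by linear interpolation between samples. This is an illustrative choice, not necessarily the procedure used for Figure 1C, and the function name is hypothetical.

```python
import numpy as np

def width_at_half_height(sta_vals, dt_ms=1.0):
    """Full width (ms) of the main positive STA peak at half its height."""
    y = np.asarray(sta_vals, dtype=float)
    p = int(np.argmax(y))
    half = y[p] / 2.0
    i = p                                   # walk left while still above half height
    while i > 0 and y[i - 1] >= half:
        i -= 1
    left = float(i) if i == 0 else (i - 1) + (half - y[i - 1]) / (y[i] - y[i - 1])
    j = p                                   # walk right while still above half height
    while j < y.size - 1 and y[j + 1] >= half:
        j += 1
    right = float(j) if j == y.size - 1 else j + (y[j] - half) / (y[j] - y[j + 1])
    return (right - left) * dt_ms

# A triangular peak with a base of 8 samples has a width of 4 samples at half height
w = width_at_half_height([0, 1, 2, 3, 4, 3, 2, 1, 0])
```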
Standard DS models do not show ATI
Importantly, no such change in temporal integration appears when the same stimuli are presented to a standard motion energy model of direction selectivity (Bair and Movshon, 2004). To test whether biologically plausible network models of DS cells also failed to show ATI (Fig. 2A; see Materials and Methods), we presented our stimulus to a model composed of spiking cells driven by excitatory and inhibitory conductances. In this model, DS cells are driven by spikes from orientation-tuned non-DS simple cells, which were driven by spiking LGN ON and OFF cells (Baker and Bair, 2012). Figure 2B shows that the STAs from this model also do not show the systematic change in peak width that is characteristic of ATI in neurons (compare Fig. 1B). This raises the question of what fundamental aspects of the computation leading to cortical DS responses in simple and complex cells are missing from basic models of motion detection.
Differentiating broadly tuned and channel-specific mechanisms of ATI: the target/mask paradigm
We considered several ways that ATI might arise within the context of plausible spiking models of direction selectivity. One possibility is that temporal integration could change early in the system (e.g., in the retina or LGN) as a result of changes in the distribution of power in the stimulus (Shapley and Victor, 1981). We varied the duration of the temporal filters at the LGN level in our network model and found that, indeed, the STAs for the DS cells downstream changed accordingly (Fig. 2C), due to the propagation of temporal integration through the hierarchy. Alternatively, the change in temporal integration could occur in cortex, for example if cortical normalization, engaged by the higher firing rates and broader TF distribution associated with faster moving stimuli (Bair and Movshon, 2004), changed the neuronal membrane time constant, τm (Reid et al., 1992; Carandini et al., 1997). To demonstrate this principle, we examined a single-compartment integrate-and-fire model that received current input that was an abstraction of our random-motion stimulus (Fig. 2D; see Materials and Methods). Peaks in the STA calculated for current input become narrower as τm of the model decreases (Fig. 2E). Both of the above models make the clear prediction that stimuli at orientations other than those preferred by the DS cell would influence the time scale of integration, and thereby change the STA in response to a preferred moving stimulus.
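The link between τm and the integration window is simple arithmetic. Because τm = C/gleak, raising gleak with C fixed at 100 pF shortens τm in direct proportion. The membrane's impulse response, e^{−t/τm}, then falls to half its peak within τm ln 2. The gleak values below are hypothetical and chosen only to show the scaling; for example, 10 nS gives τm = 10 ms and a half-decay time of about 6.9 ms.

```python
import math

def membrane_tau(c_farad, g_leak_siemens):
    """Membrane time constant tau_m = C / g_leak, in seconds."""
    return c_farad / g_leak_siemens

def half_decay_time(tau_m):
    """Time for the impulse response exp(-t / tau_m) to fall to half its peak."""
    return tau_m * math.log(2.0)

C = 100e-12                                         # capacitance of the LIF model (F)
# hypothetical leak conductances (S) -> tau_m from 50 ms down to 5 ms
taus = {g: membrane_tau(C, g) for g in (2e-9, 5e-9, 10e-9, 20e-9)}
```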
To determine whether ATI is broadly tuned, we tested 27 DS cells in V1 (23 complex, 4 simple) using a variation of our stimulus paradigm in which an orthogonally oriented, dynamically moving mask grating is superimposed on the original optimally oriented target grating (Fig. 3A). Both target and mask can move with low or high ETF. If the mechanism controlling temporal integration is broadly tuned, then the mask should influence the window of temporal integration observed for the target. Figure 3B shows results from an example complex DS cell. The STA calculated for the slow target alone (ETF 1.6 Hz, black trace) is wider than that for the fast target alone (gray trace; ETF 25 Hz), characteristic of ATI. When a fast (25 Hz) orthogonal mask was superimposed on the slow target, the STA was essentially unchanged (Fig. 3B, dark red trace). This was also the case when a slow mask (1.6 Hz) was added (Fig. 3B, dark blue trace). This occurred despite a substantial suppression of firing rate in the presence of either mask (Fig. 3C). The STAs for the fast target (Fig. 3B, gray trace) were also unaffected by the slow and fast (pale blue and pale red traces, respectively; these traces are largely obscured by the overlying gray trace) masks. In short, the STAs for target motion were unaffected by the presence of the masks, indicating that motion signals in orthogonal orientation channels were not involved in setting temporal integration in this neuron.
It was not always the case that the mask had no effect on STAs. Figure 3D shows the results from a different complex DS cell in which the inclusion of the fast mask yielded a marked drop in the amplitude of the STA for the slow target (dark red trace, compare with black) along with a decrease in firing rate (Fig. 3E, dark red bar). This change, however, did not match the prediction of temporal narrowing that should occur if the fast, orthogonal mask were able to influence the temporal integration of the recorded cell. This cell was representative, in that the fast mask with slow target was the only pairing that yielded a change in the STA. The slow mask did not change the STA for either the fast or slow target, and STAs for the fast target were affected by neither mask. This is consistent with the idea that the fast mask generated strong signals, which somehow degraded the directional information in the signals of a weaker stimulus, thereby lowering STA amplitude, but without changing temporal integration (STA width) for the weaker stimulus.
These example cells were typical of the population (n = 27), which is summarized in Figure 4 and Table 1. Finding no difference between results for simple and complex cells, we grouped them together. On average, the masks tended to mildly suppress firing rate to both slow and fast targets (Fig. 4A, left and middle), whereas on their own, they generated a low firing rate above baseline (Fig. 4A, right). Figure 4B–G shows the changes in the STA peak width (left panels) and height (right panels) caused by changes in stimulus speed and by the addition of the masks (distribution means are provided in Table 1). In all cells, consistent with the previous report of ATI (Bair and Movshon, 2004), the change from slow to fast target motion yielded a decrease in STA width (Fig. 4B; significant change on average, paired t test, p < 0.001) but no average change in STA height (Fig. 4E; p > 0.05). In contrast, the inclusion of a fast mask with the slow target had no significant effect on STA width (p > 0.05; Fig. 4C, red dots; contrast with Fig. 4B) but on average caused a significant decrease in STA height (p < 0.001; Fig. 4F, red dots; contrast with Fig. 4E). STAs to the fast target (Fig. 4D,G) were not sensitive to the presence of either slow or fast mask (p > 0.05); and similarly, inclusion of the slow mask did not affect STAs to the slow target (p > 0.05; Fig. 4C,F, blue dots).
We conclude that, because the temporal profile of target motion integration was largely unaffected by mask motion, the mechanism that controls temporal integration is not broadly tuned for orientation and direction. Specifically, consider the case when a vertical grating is moving slowly while a superimposed horizontal grating is moving quickly. Our results imply that the vertically tuned DS population is encoding motion within a longer time window extending further back in time, whereas the horizontally tuned cells are encoding more recent motion in a shorter window, even though these cells have overlapping receptive fields, presumably share common LGN afferents, and could influence each other via classical cortical normalization, if it is indeed at play. Thus, ATI involves orientation-channel specific adaptive computation.
Implications for cortical normalization
Before further pursuing the mechanisms underlying ATI, we briefly address the question of how the fast mask can reduce the amplitude of the STA, thus the directionality of the signal, without affecting the temporal integration. We reasoned above that untuned cortical normalization, if present, was not influencing temporal integration, but consider here whether it might cause the observed amplitude change.
Most current models would suggest that the orthogonal mask should engage normalization, and the decrease we find in mean firing rate is consistent with this, and with cross-orientation suppression, originally attributed to normalization (Burr et al., 1981; Morrone et al., 1982; Bonds, 1989; Carandini and Heeger, 1994). However, the decrease in firing rate might be expected to cause an increase in STA amplitude if it resulted from normalizing inhibition that pulled the membrane potential away from spike threshold, thereby reducing the chance of noise-related discharge. We confirmed this by presenting the target/mask stimulus to our network DS model with and without cortical normalization. Without normalization, we found no influence of the mask on the STAs (Fig. 5A). With normalization, implemented by opening inhibitory conductances in the DS units, the firing rate decreased and the STA amplitude increased (Fig. 5B, red trace). This increase is opposite to the decrease in STA amplitude that we observed in DS neurons for the slow target when the fast mask was included (Figs. 3D, 4F), suggesting that something quite unlike cortical normalization is at play. However, this cannot be taken as evidence against the existence of cortical normalization: the increase in STA amplitude suggested by the model is rather small and could conceivably be masked by whatever mechanism actually does underlie the experimentally observed decrease in STA amplitude (explored below).
Mask-driven reduction in STA amplitude
How then can we account for the mask-associated decrease in STA amplitude (diagrammatically represented in Fig. 6A)? It turns out that addressing this question provides further insight into the mechanism of ATI. From a basic consideration of the STA calculation, the simplest way to scale down the STA amplitude without changing the temporal profile is to increase the fraction of spikes that are not stimulus driven (i.e., that are independent of the signal against which the STA is calculated). This would be the case if extra spikes were driven by the mask because the random sequences driving the mask and target motions are independent. Several lines of evidence suggest that this is the case.
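The argument can be written out explicitly. Let s(t) be the ±1 box-car target signal, with mean approximately zero. Let t_k be the times of the N_d spikes driven by the target and u_m the times of N_u additional spikes that are statistically independent of s. These symbols are introduced here for illustration.

```latex
\mathrm{STA}(\tau)
  = \frac{1}{N_d + N_u}\left[\sum_{k=1}^{N_d} s(t_k - \tau) + \sum_{m=1}^{N_u} s(u_m - \tau)\right]
  \approx \frac{N_d}{N_d + N_u}\,\mathrm{STA}_d(\tau),
\qquad
\mathrm{STA}_d(\tau) = \frac{1}{N_d}\sum_{k=1}^{N_d} s(t_k - \tau)
```

The approximation holds because, for spikes independent of the stimulus, the expected value of s(u_m − τ) is the stimulus mean, which is close to zero. The independent spikes therefore rescale the STA by the factor N_d/(N_d + N_u) at every lag without changing its shape. For instance, if a mask added N_u = 0.25 N_d such spikes (a hypothetical value), the target STA would shrink to 0.8 of the amplitude of the target-driven STA while its width stayed the same.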
First, if the decrease in STA amplitude were due to the introduction of mask-driven spikes, then we might expect the greatest fall in STA amplitude to occur in those cells driven by the mask itself. We plotted the change in STA height against the firing rate for the mask alone (Fig. 6B) and found that they were significantly inversely correlated across cells (Spearman's ρ = −0.459; p < 0.05). When the mask is present with the target, it is more difficult to judge whether the mask is driving independent spikes because of the opposing influence of cross-orientation suppression (Morrone et al., 1982; Bonds, 1989; Carandini and Heeger, 1994). Nonetheless, we found a significant correlation between the decrease in STA amplitude and the change in firing rate from the target-alone to target-with-mask condition (Fig. 6C; Spearman's ρ = −0.424; p < 0.05). In other words, the cells that showed the largest decreases in STA amplitude were those in which the mask least suppressed, or even increased, the target-driven firing rate.
A second feature relevant to the potential of the mask to drive spikes is direction-tuning bandwidth. Because the mask is 90° away from the preferred orientation, presumably only broadly tuned cells would respond to the mask. As Figure 6D shows, there was a significant correlation between decrease in STA amplitude and direction-tuning bandwidth (Spearman's ρ = −0.422; p < 0.05).
Given these observations, we asked whether cells could be DS for mask motion. This might occur if orientation tuning was broad and asymmetric around the peak value. We therefore computed STAs against the direction of mask motion rather than target motion. A negative finding would not be informative because STAs could be flat either because the mask did not drive spikes or because those responses were not DS. A positive finding, however, would imply that the mask is driving a certain proportion of spikes. Nearly one-third of cells (8 of 25) did indeed show the relevant peaks in STAs, as demonstrated by the mask STAs for an example cell in Figure 6F. We quantified the size of these peaks by measuring power in the mask STA (root mean squared from −100 to 0 ms, normalized by the same measure from −200 to −100 ms) and found this measure to be significantly correlated with the decrease in the amplitude of the target-derived STA (Fig. 6E; Spearman's ρ = −0.591; p < 0.005).
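A minimal sketch of the mask-STA power measure, assuming the STA is sampled over exactly the 200 ms preceding each spike and ordered oldest first (index 0 corresponds to −200 ms). The function name is hypothetical.

```python
import numpy as np

def mask_sta_power(sta_vals, dt_ms=1.0):
    """RMS of the STA over -100..0 ms divided by its RMS over -200..-100 ms."""
    y = np.asarray(sta_vals, dtype=float)
    n = int(round(100 / dt_ms))                 # samples per 100-ms window
    if y.size != 2 * n:
        raise ValueError("expected exactly 200 ms of STA samples")

    def rms(x):
        return np.sqrt(np.mean(x ** 2))

    return rms(y[n:]) / rms(y[:n])
```

A flat STA gives a ratio near 1. A peak within the 100 ms before the spike raises the ratio, so the measure grows with the strength of direction-selective driving by the mask.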
In summary, the effect of the mask appears to be twofold: (1) a mild suppression of firing rate that is consistent with cross-orientation suppression, and (2) the introduction of additional spikes that are uncorrelated with the target motion, possibly from additional noise but sometimes by the mask directly driving the cell, where direction bandwidth is sufficiently broad. This influence of the mask is notably distinct from the prediction that it would change the temporal integration.
Simultaneous target and mask STAs
The finding that cells can have STA peaks for mask motion leads to an important test for mechanisms of ATI: can a single cell simultaneously show different profiles of temporal integration for two superimposed stimuli? We were able to test this directly from the data for the slow target plus fast mask condition in those cells where the simultaneous STA peaks were sufficiently large for both the target and the orthogonal mask. There were only four such cells (although 8 cells showed STA peaks for the fast mask, in 4 of those, STAs for simultaneous slow target motion were too suppressed for their temporal profile to be accurately measured). Nevertheless, in all of these cells, distinct STAs for target and mask calculated for a single set of spikes clearly showed different temporal profiles: wide for the slow target and narrow for the fast mask. An example complex DS cell is shown in Figure 7A. Having made this observation, in the final experiments, we deliberately rotated the original target+mask stimulus by 45°, so that, rather than comparing target and mask stimuli, we presented two orthogonally oriented gratings (Fig. 7B), each of which was likely to drive the cell in a DS manner (in three cells, gratings at 45° to the preferred direction drove cells too weakly, and so we rotated the gratings slightly so that they were at ±26.5° to the preferred direction). We found that not only did the simultaneous STAs have different temporal profiles for the two independent motions in all cells (n = 7), but that by swapping the fast and slow gratings, we could switch the temporal profile of integration associated with each orientation (n = 5; this comparison was not possible for two of the cells because the slow stimulus in one of the orientations did not drive the cell strongly enough to generate a clear STA). Figure 7B presents the relevant STAs from a typical cell. 
Figure 7C compares the STA width for the fast grating with that obtained simultaneously for the orthogonal slow grating in all 7 cells tested (diagonal crosses; crosses of the same color come from a single cell in which we were able to switch the fast and slow grating orientations and still obtain measurable STA widths for both gratings), as well as in the 4 cells that yielded valid pairs of STAs with the original target and mask paradigm. Fast grating STA width was significantly smaller than slow grating STA width (paired t test, p < 0.0001, regardless of whether one or both values were included for cells that gave two pairs of results).
The observation that single cells show multitemporal encoding, meaning that their discharge simultaneously reflects two different temporal integration profiles for stimulus components with similar temporal statistics (i.e., it is ΔX, not ΔT, that varies with ETF in our motion stimulus), indicates that the observed ATI is unlikely to arise from an adaptive change at the soma (or at the whole-cell level) of the recorded DS cells, simply because a cell presumably cannot be in the two different adapted states required by the two STA widths at the same time. An explanation of ATI in terms of adaptation within the recorded DS neuron could be maintained only if the adaptation remained local to dendritic subcompartments that were segregated according to the preferred orientation of their afferents. As we demonstrate next, the nature of the multitemporal encoding observed here also precludes another recently proposed mechanism for ATI.
Stimulus statistics and the spiking nonlinearity
Recently, there has been substantial interest in models that show adaptation to stimulus statistics without requiring a change in model parameters (Rudd and Brown, 1997; Paninski et al., 2003; Yu and Lee, 2003; Borst et al., 2005; Yu et al., 2005; Gaudry and Reinagel, 2007; Hong et al., 2008). These studies have focused mainly on gain adaptation, but some specifically include a temporal adaptive component (Paninski et al., 2003; Yu and Lee, 2003; Borst et al., 2005; Gaudry and Reinagel, 2007). In the simplest case, these effects can arise in an LIF model through the interaction between stimulus statistics and the spiking nonlinearity. In this case, an increase in either the mean or the variance of the input can narrow the temporal kernel (Paninski et al., 2003; Yu and Lee, 2003).
This kind of mechanism intuitively provides a good candidate for ATI: changes in stimulus ETF could effectively change input variance to DS cells, and the resulting adaptive change in processing would be rapid and channel-specific. However, it is not obvious whether it would support the multitemporal encoding observed in DS neurons for our target/mask paradigm. We therefore simulated an LIF model with an input current that represented our visual stimulus. Specifically, preferred and antipreferred motions were represented as positive and negative values, respectively, of a binary current signal, and ETF as the amplitude of the signal (Fig. 8A; see Materials and Methods). We first confirmed that low- and high-amplitude inputs, presented independently, yielded relatively broad and narrow STAs, respectively (Fig. 8B, solid green and blue traces), consistent with the literature above. However, when low- and high-amplitude currents were input simultaneously (with independent switching between negative and positive values), the STAs for both signals were substantially narrowed (Fig. 8B, dashed lines), and the STA for the low-amplitude input was now narrower than that for the high-amplitude input. We repeated this test over a broad range of stimulus parameters (mean current offset, Gaussian noise, and binary signal amplitudes), and also for a conductance-based input stepping between different small and large excitatory conductance levels; in every case, for the combined stimulus, the STA for the low-amplitude signal was narrower than that for the high-amplitude signal. This behavior does not match our in vivo observation that a fast mask induces no change in STA width for a slow stimulus (Fig. 4C). Thus, we see no way that the proposed theory of intrinsic adaptation, when applied to spike generation in DS cells, can account for our experimental findings.
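The LIF test described above can be sketched as follows. This is a minimal sketch, not the authors' simulation: the time constant, threshold, noise level, mean current, amplitudes, frame length, and the crude half-peak width measure are all hypothetical choices. It drives a forward-Euler LIF neuron with a low-amplitude binary signal alone, a high-amplitude binary signal alone, and their sum (with independent switching), then computes an STA for each signal. The printed widths depend on these assumed parameters and are not claimed to reproduce Fig. 8B.

```python
import random


def binary_signal(n_steps, frame, rng):
    """Random +/-1 sequence (preferred vs antipreferred), each value held for `frame` steps."""
    out = []
    while len(out) < n_steps:
        out.extend([rng.choice((-1.0, 1.0))] * frame)
    return out[:n_steps]


def simulate_lif(current, rng, tau=10.0, dt=1.0, threshold=1.0, noise_sd=0.3):
    """Forward-Euler leaky integrate-and-fire neuron; returns spike time indices."""
    v, spikes = 0.0, []
    for t, i_t in enumerate(current):
        # Gaussian noise is added to the input current at every step.
        v += dt / tau * (-v + i_t + rng.gauss(0.0, noise_sd))
        if v >= threshold:
            spikes.append(t)
            v = 0.0  # reset after each spike
    return spikes


def sta(signal, spikes, n_lags):
    """Mean of signal[t - lag] over spike times t, for lag = 0 .. n_lags - 1."""
    valid = [t for t in spikes if t >= n_lags]
    return [sum(signal[t - lag] for t in valid) / len(valid) for lag in range(n_lags)]


def width_above_half(kernel):
    """Crude STA width: number of lags at which the kernel is >= half its peak."""
    peak = max(kernel)
    return sum(1 for k in kernel if k >= 0.5 * peak)


# Hypothetical parameters (not those of the original simulations).
n_steps, n_lags, frame = 100_000, 60, 20
mean_current, amp_low, amp_high = 0.9, 0.3, 0.6
rng = random.Random(1)
low = binary_signal(n_steps, frame, rng)
high = binary_signal(n_steps, frame, rng)

spikes_low = simulate_lif([mean_current + amp_low * a for a in low], random.Random(2))
spikes_high = simulate_lif([mean_current + amp_high * b for b in high], random.Random(3))
spikes_both = simulate_lif(
    [mean_current + amp_low * a + amp_high * b for a, b in zip(low, high)],
    random.Random(4),
)

kernels = {
    "low alone": sta(low, spikes_low, n_lags),
    "high alone": sta(high, spikes_high, n_lags),
    "low in combined": sta(low, spikes_both, n_lags),
    "high in combined": sta(high, spikes_both, n_lags),
}
for name, kernel in kernels.items():
    print(f"{name}: width = {width_above_half(kernel)} steps")
```

Sweeping `mean_current`, `noise_sd`, and the two amplitudes, as the text describes, is a matter of looping over these hypothetical parameters and comparing the four widths.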
Two further observations regarding these simulation results are worth making. First, the combined input signal has a higher variance, which is associated with narrower STAs, as mentioned above; consistent with this, it has previously been reported (Bair and Movshon, 2004) that, for an LIF model, the STA for a given input narrows when independent noise is added (their Fig. 11). Second, although the LIF model does not capture our results, it does show multitemporal integration, in that the two simultaneously produced STAs have different widths despite the similar temporal statistics of the independent binary inputs. The two STAs arise from the interaction of two parallel inputs that differ in amplitude with the spiking nonlinearity. In contrast, we next examine a model involving parallel inputs that can explain our observations without appealing to the spike-generating nonlinearity.
A channel-specific model of ATI
We have shown that some major models proposed to account for changes in temporal processing cannot account for ATI. Here we put forward a model representing a class of circuits that can account for our observations of channel specificity and multitemporal encoding, through the critical property of having parallel channels drive the DS cell. In this particular model, LGN populations with different temporal filters drive orientation-tuned simple cells in two channels with different temporal frequency preferences (Fig. 8C; see Materials and Methods). These simple cells drive DS mechanisms, envisioned here as dendritic subunits (Häusser and Mel, 2003), with different windows of temporal interaction. Each DS subunit contributes a conductance that is summed by the LIF soma of the DS cell. When tested with the dynamic motion stimulus, this model showed a change in STA width as stimulus ETF was changed (Fig. 8D). Further, when driven by the 45°-rotated dual-grating stimulus, the model simultaneously showed two different STA widths (Fig. 8E), and swapping the orientations of the fast and slow gratings swapped the kernel widths (Fig. 8E, bottom).
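The critical property of this class of circuits, parallel channels that each impose their own temporal signature, can be illustrated with a much simpler toy than the model of Fig. 8C. This is a hypothetical linear-threshold sketch, not the LGN/simple-cell/DS-subunit/LIF model described above: each of two independent binary inputs with identical statistics reaches the output only through its own first-order low-pass filter (time constants of 5 and 30 steps, assumed), and "spikes" are simply time steps at which the summed channel outputs exceed a fixed threshold. The STA of each input then inherits the time constant of its channel, and swapping which input feeds which channel swaps the widths.

```python
import math
import random


def unit_variance_lowpass(signal, tau):
    """First-order recursive low-pass filter (time constant `tau` steps), scaled so
    that a white +/-1 input yields unit stationary output variance."""
    a = math.exp(-1.0 / tau)
    scale = math.sqrt((1.0 + a) / (1.0 - a))
    y, out = 0.0, []
    for s in signal:
        y = a * y + (1.0 - a) * s
        out.append(scale * y)
    return out


def sta(signal, spikes, n_lags):
    """Mean of signal[t - lag] over spike times t, for lag = 0 .. n_lags - 1."""
    valid = [t for t in spikes if t >= n_lags]
    return [sum(signal[t - lag] for t in valid) / len(valid) for lag in range(n_lags)]


def width_above_half(kernel):
    """Crude STA width: number of lags at which the kernel is >= half its peak."""
    peak = max(kernel)
    return sum(1 for k in kernel if k >= 0.5 * peak)


def threshold_spikes(channel_1, channel_2, threshold=2.0):
    """Time steps at which the summed channel outputs exceed `threshold`."""
    return [t for t, (c1, c2) in enumerate(zip(channel_1, channel_2)) if c1 + c2 > threshold]


rng = random.Random(0)
n_steps, n_lags = 100_000, 80
# Two independent binary inputs, e.g. motion along two orthogonal axes.
input_a = [rng.choice((-1.0, 1.0)) for _ in range(n_steps)]
input_b = [rng.choice((-1.0, 1.0)) for _ in range(n_steps)]

# Input a feeds the fast channel, input b the slow channel.
spikes = threshold_spikes(unit_variance_lowpass(input_a, 5.0), unit_variance_lowpass(input_b, 30.0))
sta_a, sta_b = sta(input_a, spikes, n_lags), sta(input_b, spikes, n_lags)

# Swapping the channel assignment should swap the two STA widths.
swapped = threshold_spikes(unit_variance_lowpass(input_b, 5.0), unit_variance_lowpass(input_a, 30.0))
sta_a_swapped, sta_b_swapped = sta(input_a, swapped, n_lags), sta(input_b, swapped, n_lags)

print(width_above_half(sta_a), width_above_half(sta_b))
print(width_above_half(sta_a_swapped), width_above_half(sta_b_swapped))
```

Because the width of each STA is set by a fixed channel property rather than by an adapted state of the output stage, one set of "spikes" carries two temporal profiles at once.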
The key feature of this model is that multiple parallel mechanisms impart different temporal signatures to the visually driven signals. These signatures then appear in the STAs. Another model that fits this criterion is the Reichardt detector model (Reichardt, 1961), like that used by Borst et al. (2005), which uses two clearly distinct temporal filter pathways, a low-pass and a high-pass, that stamp their temporal signatures into the STAs. This Reichardt detector model also shows both channel-specific ATI and multitemporal encoding for our direction modulation stimulus (data not shown) but otherwise has temporal properties that differ strongly from those of cortical DS cells.
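For readers unfamiliar with the correlation-type detector, a minimal sketch follows. It assumes one common variant (high-pass filters in both input lines, a first-order low-pass filter as the delay, and opponent subtraction of mirror-symmetric products); the time constants, stimulus frequency, and phase offsets are hypothetical, and it is not the specific implementation of Borst et al. (2005). A sinusoid that reaches input `a` before input `b` gives a positive mean output, and the reverse order gives a negative one.

```python
import math


def lowpass(x, tau, dt=1.0):
    """Forward-Euler first-order low-pass filter (time constant in ms)."""
    y, out = 0.0, []
    for v in x:
        y += dt / tau * (v - y)
        out.append(y)
    return out


def highpass(x, tau, dt=1.0):
    """High-pass filter computed as the input minus its low-pass-filtered copy."""
    return [v - l for v, l in zip(x, lowpass(x, tau, dt))]


def reichardt(a, b, tau_lp=50.0, tau_hp=200.0):
    """Opponent correlator: delayed (low-passed) arm of one input times the
    undelayed arm of the other, minus the mirror-image product."""
    ha, hb = highpass(a, tau_hp), highpass(b, tau_hp)
    la, lb = lowpass(ha, tau_lp), lowpass(hb, tau_lp)
    return [la_t * hb_t - ha_t * lb_t for la_t, ha_t, lb_t, hb_t in zip(la, ha, lb, hb)]


def mean_response(phase_lag, freq_hz=2.0, duration_ms=5000, warmup_ms=1000):
    """Mean detector output for a sinusoid arriving at input b `phase_lag` radians
    after input a (positive lag = motion from a toward b); dt = 1 ms."""
    w = 2.0 * math.pi * freq_hz / 1000.0  # radians per ms
    a = [math.sin(w * t) for t in range(duration_ms)]
    b = [math.sin(w * t - phase_lag) for t in range(duration_ms)]
    out = reichardt(a, b)[warmup_ms:]  # discard the filters' start-up transient
    return sum(out) / len(out)


preferred = mean_response(math.pi / 4)
null = mean_response(-math.pi / 4)
print(preferred, null)
```

In such a detector, the fixed low-pass and high-pass filters are what stamp distinct temporal signatures onto the response, which is the property the text highlights.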
Other forms of channel-specific models can be conceived that do not make use of parallel, differently tuned TF channels. One alternative is adaptation within an orientation-specific channel. However, to support different temporal kernels simultaneously, this would again require multiple parallel (orientation-tuned) channels feeding the DS cell. The model presented suggests a means of testing the relative contributions of TF-tuned and orientation-tuned channels experimentally. In the current stimulus paradigm, the two components differ in both TF and orientation; consider instead a similar stimulus comprising superimposed gratings with different ETFs but identical orientation. If DS cells are still able to generate distinct STAs for the two gratings, as is the case for our model, then parallel TF channels must play a role.
To return to more general principles, the fundamental point is not this particular model but that we are unable to conceive of nonparallel architectures that can account for our results.
Discussion
In seeking a deeper understanding of how the cortex controls temporal integration of dynamic visual images, particularly with respect to ATI in the signals carried by DS neurons in V1, we have identified two important principles: channel specificity and the capacity for multitemporal encoding. Channel specificity means that the temporal integration of motion at a particular orientation is little influenced by motion in independent orientation channels. It raises the possibility that the visual system gains some advantage by separately optimizing the processing of signals from distinct components within a local region of the image. It offers a refined perspective relative to that of untuned cortical normalization, which uses one very broadly tuned signal to calibrate the encoding across all orientation components. Multitemporal encoding also refines our picture of temporal processing by suggesting that the output of the neuron may represent a multiplex of signals processed at various time scales, as opposed to one temporal signature limited by gross biophysical properties, such as a somatic time constant. Below we discuss how these observations constrain the mechanisms involved in ATI and how they relate to past studies.
Channel specificity of ATI
We have shown that ATI is not consistent with broadly tuned adaptive mechanisms, and thus cannot result from phenomena in the retina or LGN or from cortical normalization, in which signals from stimuli of all orientations become intermixed. Instead, temporal integration is set in a channel-specific manner. Channel specificity has long been recognized for other adaptive processes, such as pattern adaptation (Blakemore and Campbell, 1969; Maffei et al., 1973; Kohn, 2007), which shows specificity for stimulus orientation, SF, and TF. However, these phenomena seem distinct from ATI in that they require prolonged presentation of the adapting stimulus, over the course of seconds or minutes.
Would the channel specificity of ATI contribute to optimization of the system for stimulus statistics? If the statistics of visual motion in natural conditions were relatively constant across directions, one might expect an adaptive process that aims to improve sensitivity to be broadly tuned. In effect, the system would average across the speeds of motion present to arrive at a single optimal adapted state. However, if motion statistics can vary substantially across orientation channels, it would be advantageous to have channel-specific tuning, as observed here for ATI. Indeed, there are situations in which motion statistics within a scene might vary considerably across directions. For example, during self-motion (e.g., running through a forest), all things orthogonal to the axis of motion move rapidly, whereas all things nearly parallel to the motion move slowly. Even in the simplest case of an object translating across the visual field with constant velocity, v, a range of TFs is present, simply because TF is a function of the orientation of components relative to the direction of movement (TF = v × cos(θ) × SF, where θ is the angle between v and the normal to the orientation). Similar reasoning argues for separate optimization in SF channels, as TF is a function of SF for a translating pattern. This calls for further work measuring the statistics of motion across orientation and SF channels in natural images, and testing whether channel specificity applies to SF in addition to orientation.
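The relation TF = v × cos(θ) × SF can be evaluated directly to show how one translating object yields a range of component TFs. This is a minimal sketch; the function name `component_tf` and the example speed (10 deg/s) and SF (2 cycles/deg) are hypothetical. With speed in deg/s and SF in cycles/deg, TF comes out in Hz.

```python
import math


def component_tf(speed_deg_per_s, sf_cyc_per_deg, theta_deg):
    """TF (Hz) of a grating component: TF = v * cos(theta) * SF, where theta is the
    angle between the velocity and the normal to the component's orientation."""
    return speed_deg_per_s * math.cos(math.radians(theta_deg)) * sf_cyc_per_deg


# Hypothetical object translating at 10 deg/s, with components at 2 cycles/deg.
for theta in (0, 30, 60, 90):
    print(theta, round(component_tf(10.0, 2.0, theta), 2))

# For a fixed orientation, TF scales linearly with SF (the argument for SF channels).
print(component_tf(10.0, 4.0, 0) / component_tf(10.0, 2.0, 0))
```

Components whose normal is aligned with the motion (θ = 0) have the highest TF, and components parallel to the motion (θ = 90°) have a TF near zero, so a single object drives different orientation channels at very different TFs.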
Is ATI an intrinsic result of the spiking nonlinearity?
It is known that apparently adaptive changes can arise through the interaction of stimulus statistics and nonlinearities (e.g., spike threshold), without requiring any change in system parameters over time (Rudd and Brown, 1997; Paninski et al., 2003; Yu and Lee, 2003; Borst et al., 2005; Yu et al., 2005; Gaudry and Reinagel, 2007; Hong et al., 2008). Yu and Lee (2003) showed that, in both LIF and Hodgkin-Huxley models, temporal kernels contract with increased stimulus mean or variance. Increasing the speed of a dynamically moving stimulus could translate into increased variance in the input to a DS cell, which makes this an attractive hypothesis for the generation of ATI. However, our results suggest that this hypothesis cannot hold for macaque DS neurons.
First, from a single neuron point of view, changes in input variance or mean will typically correlate with changes in firing rate (Yu and Lee, 2003). It has already been noted that, in vivo, STA widths could be different for equal firing rates (Bair and Movshon, 2004). Here we extend this by showing that, conversely, STA width can remain strikingly constant in the face of significant changes in firing rate caused by the mask (Fig. 3B,C).
Second, we do not observe these effects in simulations using the spiking population model. For example, for the STAs shown in Figure 2B, there is minimal change in STA width despite a more than twofold change in both firing rates and in the variance of conductances to the DS cell. This is not to imply that the model proves that there is no scope for the spiking nonlinearity mechanism to set STA width; indeed, presumably a parameter range exists in which it does. Rather, it demonstrates that in a network of functioning DS units with physiologically relevant single-cell properties and connectivity, the temporal integration of the stimulus reflected in the STA can be dominated by other factors, which in this version of the model includes the temporal filter of the LGN cells and the temporal window of the DS interaction.
Third, and most tellingly, we have shown that individual DS neurons can simultaneously show different profiles of temporal integration for independent visual inputs and that these temporal profiles are unchanged from those observed when each input is presented separately. This behavior turns out to be quite unlike that observed when an LIF model is driven by the sum of two independent inputs with similar temporal properties. Using an LIF model (Fig. 8B), we confirmed the previously reported relationship between input variance and temporal integration (Paninski et al., 2003; Yu and Lee, 2003) for a binary input current and showed that, when two such independent inputs were combined, their STAs were distinct. However, unlike in the experimental data, both STAs were narrowed. Further, the STA for the lower variance signal was the broader of the two when that signal was presented in isolation but the narrower for the combined signal. This final point is important: one could speculate that, in the experiments, secondary changes, such as a decrease in mean input level (e.g., through cortical normalization), offset the narrowing effect of increased variance for one of the STAs; but because the low variance signal now has a narrower STA than the high variance signal, no such secondary effect can restore the original temporal profiles of both STAs, as would be required to reproduce our experimental findings.
Candidate mechanisms of ATI
We have shown that ATI is already present in simple DS cells in V1, but that it is unlikely to arise from changes in the temporal integrative properties of cells earlier in the visual processing hierarchy that are untuned for stimulus orientation. Further, it does not rely on a broadly tuned cortical feedback mechanism, nor does it arise at the output of DS cells either via an adaptive mechanism (Sanchez-Vives et al., 2000) or intrinsically through the interaction of input statistics and spiking nonlinearity. Using our spiking population model, we have shown that a system based on parallel channels with different temporal tuning is a plausible solution.
An alternative is classical adaptation, whereby a system parameter changes over time, albeit rapidly, in response to changes in stimulus statistics. However, we cannot envisage how a single channel could simultaneously support two different kernels, requiring two different adapted states, and thus we are constrained again to posit a parallel-channel architecture, this time in the orientation, rather than the TF domain as in our model. This may not be trivial: it would imply that a DS cell having relatively broad directional tuning does so, not because it is driven by a single broadly tuned orientation channel, but because it takes input from a range of orientation channels. Our model suggests an experimental paradigm to test specifically for the contribution of parallel TF or orientation channels (see Results).
Regarding candidates for the adaptive process itself, if there is one, we have already noted that adaptation within the DS cell itself, such as spike rate or calcium-dependent adaptation, must be excluded, unless directional computations and adaptation can be compartmentalized (e.g., within individual dendrites). This is not as unlikely as it might at first seem: the capacity of dendrites to respond nonlinearly according to the spatiotemporal sequence of input activation has recently been demonstrated (Branco et al., 2010); and if the fine-scale connectivity exists to yield DS responses on that basis, then there is no reason to suppose that different dendrites could not adapt independently. Alternatively, a model of ATI for stimulus contrast based on synaptic depression has been suggested (van Rossum et al., 2008). It will be interesting to explore whether the implementation of such a mechanism in a DS model can account for our observations.
Footnotes
This work was supported by a Wellcome Trust Senior Research Fellowship in the Basic Biomedical Sciences. B.A. was supported by BBSRC Project Grant BBC5049431 and by the Wellcome Trust. W.B. was supported by St. John's College, Oxford, and by the National Science Foundation Collaborative Research in Computational Neuroscience Grant IIS-1309725. Recordings in the A.K. laboratory were supported by National Institutes of Health Grant EY016774.
The authors declare no competing financial interests.
- Correspondence should be addressed to Dr. Douglas McLelland, Centre National de la Recherche Scientifique CERCO Unité Mixte de Recherche 5549, Pavillon Baudot CHU Purpan, BP 25202, 31052 Toulouse Cedex, France. mclelland{at}cerco.ups-tlse.fr