A variety of studies in the visual system demonstrate that coarse spatial features are processed before those of fine detail. This aspect of visual processing is assumed to originate in striate cortex, where single cells exhibit a refinement of spatial frequency tuning over the duration of their response. However, in early visual pathways, well known temporal differences are present between center and surround components of receptive fields. Specifically, response latency of the receptive field center is relatively shorter than that of the surround. This spatiotemporal inseparability could provide the basis of coarse-to-fine dynamics in early and subsequent visual areas. We have investigated this possibility with three separate approaches. First, we predict spatial-frequency tuning dynamics from the spatiotemporal receptive fields of 118 cells in the lateral geniculate nucleus (LGN). Second, we compare these linear predictions to measurements of tuning dynamics obtained with a subspace reverse correlation technique. We find that tuning evolves dramatically in thalamic cells, and that tuning changes are generally consistent with the temporal differences between spatiotemporal receptive field components. Third, we use a model to examine how different sources of dynamic input from early visual pathways can affect tuning in cortical cells. We identify two mechanisms capable of producing substantial dynamics at the cortical level: (1) the center-surround delay in individual LGN neurons, and (2) convergent input from multiple cells with different receptive field sizes and response latencies. Overall, our simulations suggest that coarse-to-fine tuning in the visual cortex can be generated completely by a feedforward process.
The manner by which sensory information is encoded and transmitted is a central concern in neurobiology. In the visual system, a number of theoretical (Marr and Poggio, 1979; Watt, 1987), behavioral (Breitmeyer, 1975; Harwerth and Levi, 1978; McSorley and Findlay, 1999; Morrison and Schyns, 2001), and physiological (Ringach et al., 1997; Pack and Born, 2001; Bredfeldt and Ringach, 2002; Mazer et al., 2002; Menz and Freeman, 2003; Frazor et al., 2004; Nishimoto et al., 2005) studies have presented evidence suggesting a sequential analysis of information. Specifically, coarse features of a stimulus are processed before those of fine detail, producing a refinement in resolution as response latency increases. This coarse-to-fine process has been documented for spatial frequency (SF) tuning (Bredfeldt and Ringach, 2002; Mazer et al., 2002; Frazor et al., 2004; Nishimoto et al., 2005), orientation selectivity (Ringach et al., 1997; Chen et al., 2005) (but see Gillespie et al., 2001; Mazer et al., 2002), direction preference (Pack and Born, 2001), and disparity tuning (Menz and Freeman, 2003).
Although the presence of coarse-to-fine processing is well established, the details of where and how this sequential analysis develops are not clear. Physiological studies of the dynamics of SF tuning, perhaps the most fundamental feature of spatial vision, have all been conducted in the primary visual cortex, where it is tacitly assumed the effects originate. Intracortical mechanisms (Bredfeldt and Ringach, 2002) and convergent magnocellular-parvocellular input (Mazer et al., 2002; Frazor et al., 2004) have been proposed as the basis of the effect. However, it is plausible that a coarse-to-fine mechanism originates in subcortical pathways. A well known temporal response difference between center and surround receptive field (RF) components of neurons in the retina (Enroth-Cugell et al., 1983) and lateral geniculate nucleus (LGN) (Dawis et al., 1984; Cai et al., 1997) could form the basis of coarse-to-fine processing which propagates to higher visual areas.
We have undertaken a comprehensive set of studies to elucidate the characteristics of SF tuning dynamics in early and late visual pathways. First, we analyzed the linear spatiotemporal response properties of single-cell recordings from a large population of neurons in LGN. Second, we recorded from LGN cells to measure changes in SF tuning directly. Third, we used a feedforward model to analyze how several sources of dynamic input affect processing sequences in the visual cortex. Our experimental results show conclusively that the coarse-to-fine process begins in early visual pathways, and our model simulations suggest that tuning changes in the visual cortex can be entirely accounted for by feedforward processes. Considered together with other studies, these results point to a generalized scheme of neural processing in the visual pathway that appears to be applicable to other sensory systems.
Materials and Methods
Extracellular recordings are made from cells in the LGN of anesthetized and paralyzed mature cats in conformance with guidelines adopted by the Society for Neuroscience. Single-unit recordings are obtained using multiple tungsten microelectrodes from vertical electrode penetrations at Horsley–Clarke coordinates A6 L9 (Horsley and Clarke, 1908). After a unit is identified by its response waveform, RF parameters are measured using drifting sinusoidal gratings and random-noise stimuli presented on a cathode-ray tube (CRT) monitor (75 Hz refresh rate; 50 cd/m2 mean luminance). Because X and Y cells of the LGN exhibit similar linear RF organization (Cai et al., 1997), we have not differentiated them in this study. All cells described in this study had RFs located within 20° of area centralis. Details of recording procedures have been provided previously (DeAngelis et al., 1993; Cai et al., 1997; Anzai et al., 1999).
SF dynamics of LGN cells are first examined by analyzing spatiotemporal RFs from our large electrophysiology database. Spatiotemporal RFs selected for this analysis were mapped using a one-dimensional sparse noise reverse-correlation technique (DeBoer and Kuyper, 1968) (for details, see Cai et al., 1997). For this version of reverse correlation, the visual stimulus is a random sequence of elongated bright and dark bars displayed at 20 or 30 positions along a stimulus patch (see Fig. 1A). The stimulus patch width is adjusted to cover the entire RF, as estimated with preliminary search programs. Typically, this width is between 3 and 6°. The patch length is set to 15°; thus, each bar is ∼0.3 × 15° in size. Bars are displayed for 13 or 26 ms (one or two frames of the CRT monitor). Because some cells in the LGN are known to have an orientation bias (Vidyasagar and Urbas, 1982), the stimulus bars are adjusted to match the cell's preferred orientation.
We use Fourier analysis to convert response functions from the spatial domain to the SF domain. For each time point of the spatiotemporal map, the spatial waveform is zero padded to fill a 1 × 32 vector. We then apply the fast Fourier transform to the data and analyze the resulting amplitude spectrum. This procedure assumes linear spatial summation of the RF, which is approximately true for the majority of LGN cells (Cai et al., 1997) (but see Bonin et al., 2005). This procedure is also highly dependent on the quality of the mapped RF. Because low signal in the spatial domain can bias responses toward lower frequencies, we limit our analysis to RFs that are adequately mapped. This is determined objectively by setting a threshold for the signal-to-noise ratio (SNR) of the RF. The SNR is estimated as the SD of the spatial response at the peak correlation delay divided by the SD of the response occurring at negative time delays.
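The conversion from spatial profile to SF amplitude spectrum, and the SNR estimate, can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the bar spacing, and the example Gaussian profiles are assumptions, and a direct discrete Fourier transform stands in for the fast Fourier transform (the two give the same amplitudes).

```python
import cmath
import math
import statistics

N_FFT = 32  # zero-padded vector length stated in the text


def amplitude_spectrum(profile, n_fft=N_FFT):
    """Zero-pad a spatial profile and return DFT amplitudes for bins 0..n_fft/2."""
    padded = list(profile) + [0.0] * (n_fft - len(profile))
    amps = []
    for k in range(n_fft // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n_fft)
                    for i, v in enumerate(padded))
        amps.append(abs(coeff))
    return amps


def bin_frequencies(dx_deg, n_fft=N_FFT):
    """Spatial frequency (c/deg) of each bin for a sample spacing of dx_deg."""
    return [k / (n_fft * dx_deg) for k in range(n_fft // 2 + 1)]


def rf_snr(peak_slice, noncausal_slices):
    """SD of the spatial response at the peak delay over SD at negative delays."""
    noise = [v for s in noncausal_slices for v in s]
    return statistics.pstdev(peak_slice) / statistics.pstdev(noise)


# Hypothetical example: 20 bar positions spaced 0.3 deg apart.
positions = [(i - 9.5) * 0.3 for i in range(20)]
center_only = [math.exp(-(x / 0.5) ** 2) for x in positions]       # early slice
center_surround = [c - 0.3 * math.exp(-(x / 1.2) ** 2)             # later slice
                   for c, x in zip(center_only, positions)]

early = amplitude_spectrum(center_only)
late = amplitude_spectrum(center_surround)
early_peak_bin = early.index(max(early))  # low-pass: peak at zero frequency
late_peak_bin = late.index(max(late))     # bandpass: peak at a nonzero frequency
```

In this toy case the center-only slice is low-pass, and adding a surround moves the spectral peak to a nonzero frequency.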
Subspace reverse correlation.
The procedures used for subspace reverse correlation are similar to those used in previous studies of SF dynamics (Bredfeldt and Ringach, 2002; Mazer et al., 2002; Nishimoto et al., 2005). Iso-oriented sinusoidal gratings at 50% contrast, with one of 15 SFs and eight spatial phases, are flashed for 26 ms in a randomized sequence over the RF of the cell. Blanks are not inserted in the stimulus sequence. Gratings are positioned and sized to completely cover the cell's RF, as determined with a preliminary coarse mapping procedure. Most gratings are between 3 and 6° in diameter. The appropriate range of logarithmically spaced SFs and the preferred orientation are determined beforehand with standard grating tests. Reverse correlation stimulus sequences are presented 150–300 times, after which they are cross-correlated with evoked responses to obtain a map of SF and spatial phase selectivity. Maps are calculated in 6 ms bins, the highest temporal resolution we can achieve while maintaining sufficient SNR for all cells. Because response maps are highly linear over phase, we combine phase information by taking the modulation across phases at each SF and correlation delay. This procedure captures all major features of the SF response maps. Three of 35 cells show slight suppression below baseline at very high frequencies over all phases. Although this feature is not captured in the modulating component, temporal dynamics of the tuning peak or width are not affected.
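Combining phase information into a single modulation amplitude can be sketched as below. This is one plausible reading of "taking the modulation across phases" (the amplitude of the first harmonic over equally spaced phases); the function name and example values are assumptions.

```python
import math


def phase_modulation(responses, phases_deg):
    """Amplitude of the first harmonic of the response across spatial phase.

    Assumes the phases are equally spaced over a full cycle.
    """
    n = len(responses)
    re = sum(r * math.cos(math.radians(p)) for r, p in zip(responses, phases_deg))
    im = sum(r * math.sin(math.radians(p)) for r, p in zip(responses, phases_deg))
    return 2.0 * math.hypot(re, im) / n


# Hypothetical response at one SF and one correlation delay: a sinusoid over
# phase (amplitude 3, preferred phase 40 deg) riding on a phase-independent offset.
phases = [i * 45 for i in range(8)]
responses = [1.0 + 3.0 * math.cos(math.radians(p - 40)) for p in phases]

modulation = phase_modulation(responses, phases)         # recovers the amplitude
flat_modulation = phase_modulation([2.0] * 8, phases)    # phase-independent response
```

A response that is the same at all phases contributes nothing to this measure, which is why uniform suppression below baseline is not captured by the modulating component.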
To compare direct measurements of SF temporal dynamics with predictions, we also map spatiotemporal RFs of the same cells with a two-dimensional (2D) dense noise version of reverse correlation (see Fig. 6A). In this paradigm, RFs are mapped with white noise stimuli generated according to binary m-sequences (for details, see Anzai et al., 1999). The stimulus grid, composed of 16 × 16 or 32 × 32 square elements, is updated with each display frame (13 ms). All other properties of the stimulus grid (position, width, and orientation) are identical to those in the subspace reverse correlation procedure, producing stimulus elements which are ∼0.25 × 0.25° in size. The resulting spatiotemporal RF has two spatial dimensions: Y, which is parallel to the preferred orientation of the cell, and X, which is orthogonal to Y. Because we measure SF tuning over the X dimension in our direct measurements (see Fig. 4A), we integrate the RF over Y before taking the Fourier transform and examining SF tuning.
We use dense noise rather than sparse noise to map the RFs for this analysis because it better matches the stimulus energy of the subspace reverse correlation sequence. Because stimulus energy can affect the latency, duration, and magnitude of responses, sequences should be comparable if meaningful comparisons are to be made (Albrecht, 1995). For both the sequence of sinusoidal gratings and the 2D m-sequence, stimuli cover the entire RF at all times, and the luminance across the grid always sums to the mean luminance (neither of which applies in the case of sparse noise).
SF tuning analysis.
SF tuning and dynamics are characterized identically for measured and predicted spectrotemporal RFs. In cases where multiple assessments of SF are made (see Fig. 7), analyses on each set of data are performed independently. For each spectrotemporal RF, we first determine the appropriate time points over which to perform a tuning analysis. The first time point (tinitial) is defined as the slice at which response variance exceeds that of the baseline by >5 SDs (Mazer et al., 2002). Baseline variance is calculated from noncausal delays. Determining the final time point at which to perform analysis is slightly more difficult because temporal response profiles of LGN cells are highly diverse. The majority of thalamic cells exhibit two response phases, although monophasic and triphasic profiles are also observed (Cai et al., 1997). In addition, the relative strengths between phases are heterogeneous: second (and third) phases are often a fraction of the initial response, but many cells show additional phases of equal or (in a minority of cells) greater strength (Cai et al., 1997). For these reasons, we limit our analysis of SF tuning to the first phase of the response. The initial phase is also the most likely input to the “initial transient” response in cortical cells (Frazor et al., 2004; Nishimoto et al., 2005) and, therefore, is the most relevant response period to use when comparing geniculate SF tuning dynamics to those in visual cortex. Qualitatively, for cells with a strong biphasic response, the tilt of the second phase in the SF–time plane is similar to the tilt of the first phase. For many cells, second-phase response contours do not extend to SFs as high as during the first phase, although this could be caused by reduced SNR. A quantitative analysis comparing response dynamics between phases has not been performed. The end of the first phase, tfinal, is defined as the time point at which response variance falls to a local minimum after the first peak.
If no local minimum is present (i.e., for monophasic response profiles), tfinal is defined as the time point at which variance decreases to the baseline level. Using these criteria, the mean duration of the analysis window (tinitial to tfinal) is 30.6 ms, with an SD of 7.9 ms.
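The selection of the analysis window can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a list of per-slice response variances with the noncausal (baseline) slices first, the threshold is taken as the baseline mean plus five baseline SDs, and "baseline level" is taken as the baseline mean. The function name and example values are hypothetical.

```python
import statistics


def analysis_window(slice_variance, n_noncausal, n_sd=5.0):
    """Return (t_initial, t_final) as slice indices, or None if no response."""
    baseline = slice_variance[:n_noncausal]
    base_mean = statistics.mean(baseline)
    threshold = base_mean + n_sd * statistics.pstdev(baseline)

    # t_initial: first causal slice whose variance exceeds the threshold.
    t_initial = next((i for i in range(n_noncausal, len(slice_variance))
                      if slice_variance[i] > threshold), None)
    if t_initial is None:
        return None

    # Climb to the first peak of the variance profile.
    i = t_initial
    while i + 1 < len(slice_variance) and slice_variance[i + 1] >= slice_variance[i]:
        i += 1

    # Descend to the first local minimum, or stop once baseline is reached.
    while i + 1 < len(slice_variance) and slice_variance[i + 1] < slice_variance[i]:
        i += 1
        if slice_variance[i] <= base_mean:
            break
    return t_initial, i


# Hypothetical variance profiles; the first five slices are noncausal delays.
biphasic = [1.0, 1.1, 0.9, 1.0, 1.0, 4, 9, 12, 8, 5, 7, 6, 2, 1]
monophasic = [1.0, 1.1, 0.9, 1.0, 1.0, 4, 9, 6, 3, 1.0, 0.9]

biphasic_window = analysis_window(biphasic, n_noncausal=5)
monophasic_window = analysis_window(monophasic, n_noncausal=5)
```

For the biphasic profile the window ends at the dip between the two phases; for the monophasic profile it ends where variance returns to baseline.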
To characterize SF tuning dynamics, we examine the tuning peak and width at each time point in our analysis window. Before measuring these parameters, we fit the data with a difference of Gaussians (DOG) function to reduce susceptibility to noise. We find that the least-squares best fit (Levenberg–Marquardt algorithm) accounts for a large degree of the variance in the data (r² > 0.90 for 95% of curves). From the DOG fit, a cell's optimal SF at time t is defined as the SF at the peak of the tuning curve. Bandwidth is defined as the log ratio of the high cutoff SF to SFpeak: $\mathrm{bw}(t) = \log_2[\mathrm{SF}_{\mathrm{high}}(t)/\mathrm{SF}_{\mathrm{peak}}(t)]$, where SFhigh is the SF at which amplitude falls to half of the peak value. To estimate tuning changes over time, we compare parameters at initial and final time points. The change in peak is the log ratio $\Delta\mathrm{peak} = \log_2[\mathrm{SF}_{\mathrm{peak}}(t_{\mathrm{final}})/\mathrm{SF}_{\mathrm{peak}}(t_{\mathrm{initial}})]$, and the change in bandwidth is the difference, Δbw = bw(tfinal) − bw(tinitial).
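Extracting the peak, bandwidth, and their changes from fitted tuning curves can be sketched as below. The fit itself is omitted; the DOG parameterization, the parameter values, and the base-2 logarithm (chosen so that results are in octaves) are assumptions of this sketch.

```python
import math


def dog_tuning(sf, a_c, s_c, a_s, s_s):
    """Difference-of-Gaussians tuning curve in the SF domain (hypothetical form)."""
    return a_c * math.exp(-(sf / s_c) ** 2) - a_s * math.exp(-(sf / s_s) ** 2)


def peak_and_bandwidth(params, sf_max=2.0, n=4000):
    """Return (SF_peak, bandwidth in octaves) from a fitted curve on a fine grid."""
    sfs = [sf_max * (i + 1) / n for i in range(n)]
    amps = [dog_tuning(sf, *params) for sf in sfs]
    i_peak = max(range(n), key=amps.__getitem__)
    half = amps[i_peak] / 2.0
    # High cutoff: first SF above the peak where amplitude falls to half of peak.
    # Assumes the curve falls below half height before sf_max.
    i_high = next(i for i in range(i_peak, n) if amps[i] <= half)
    return sfs[i_peak], math.log2(sfs[i_high] / sfs[i_peak])


# Hypothetical fitted parameters at the first and last analysis time points.
early_params = (1.0, 0.4, 0.3, 0.1)   # weak low-SF suppression, broad tuning
late_params = (1.0, 0.6, 0.8, 0.2)    # stronger suppression, higher cutoff

peak_i, bw_i = peak_and_bandwidth(early_params)
peak_f, bw_f = peak_and_bandwidth(late_params)

delta_peak = math.log2(peak_f / peak_i)  # octaves; positive = shift to higher SF
delta_bw = bw_f - bw_i                   # octaves; negative = narrowing
```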
To investigate the contribution of feedforward mechanisms to SF tuning dynamics in cortical cells, we model LGN–cortical connections with “push–pull” circuitry (Hubel and Wiesel, 1962; Jones and Palmer, 1987; Ferster, 1988; Reid and Alonso, 1995; Hirsch et al., 1998; Troyer et al., 1998). Push-only models, such as the structural model described by Frazor et al. (2004), could have been used, but such a construction does not account for the sharpening in the low-frequency limb of the SF tuning curve, nor is it consistent with pharmacological experiments showing that this sharpening involves inhibitory circuitry (Bauman and Bonds, 1991; Vidyasagar and Mueller, 1994; Pernberg et al., 1998). Our model accounts for general aspects of cortical SF tuning, although its scope is limited. We consider only responses of layer IV simple cells, receiving excitatory input from nonlagged LGN cells with central RFs and biphasic temporal structure. Simple cells are modeled as having two dominant RF subregions that do not vary systematically in position over time (i.e., nondirection selective). In addition, our model does not include known structural and biophysical mechanisms, such as expansive output nonlinearities, spike thresholds, or intracortical correlation-based excitation. We exclude these factors to reduce the number of free parameters and to keep the model as simple and interpretable as possible. Previous work (Troyer et al., 1998) has demonstrated that full incorporation of these mechanisms in a computational model produces similar behavior to a more conceptual version. This suggests that a more complex construction might alter the exact numerical results in our simulations, but would not change the general outcome.
For each simulation, we construct one excitatory and one inhibitory cortical cell whose RFs are 180° out of phase. LGN inputs are combined to form cortical cells using rules of connectivity between thalamic and cortical simple cells (Alonso et al., 2001). For simplicity, cortical RFs are modeled as having two primary subregions, although weaker flanking subregions are also present because of LGN RF structure (see Fig. 9, cortical RFs). Primary subregions are separated by 1°, the average distance between subregions from our own database of cortical RFs (data not shown). Each subregion receives input from 15 LGN cells whose positions are drawn from a normal distribution with mean equal to the center of the subregion and an SD of 0.15° (Alonso et al., 2001). The sizes of LGN RF centers are distributed around the widths of the subregions (Alonso et al., 2001), described in greater detail below. In preliminary simulations, we covaried input efficacy with the overlap of geniculate and cortical RFs and also used a “same sign” rule with a probability of 70% in accordance with Reid and Alonso (1995) (Alonso et al., 2001). However, the small number of LGN inputs led to highly variable cortical RFs with occasional atypical organization. Therefore, in the final version of the model, all LGN cells contributing to a single subregion share the same sign and have equal efficacy. Intracortical connections between cells with similar RF structure, which are not included in this model, could function to increase stability and robustness of cortical cells, as has been proposed previously (Troyer et al., 1998).
LGN RF parameters.
Spatiotemporal LGN RFs are modeled as in Cai et al. (1997). Spatial profiles are described with a DOG, and temporal profiles are described as a difference of gamma functions, with distinct center and surround components. The full expression is RF(x,t) = Fc(x)Gc(t) − Fs(x)Gs(t), where $F_c(x) = A_c \exp(-x^2/\sigma_c^2)$, with x measured from the RF center, and Fs(x) is defined analogously. The temporal filter for the center is as follows: $G_c(t) = K_1 \frac{[c_1(t - t_1)]^{n_1} e^{-c_1(t - t_1)}}{n_1^{n_1} e^{-n_1}} - K_2 \frac{[c_2(t - t_2)]^{n_2} e^{-c_2(t - t_2)}}{n_2^{n_2} e^{-n_2}}$. Most parameters are fixed to the geometric means of their distributions as reported in Cai et al. (1997): As/Ac = 0.3; K1 = 1.05; c1 = 0.14; n1 = 7; K2 = 0.7; c2 = 0.12; n2 = 8. During simulations which do not include space–time correlations (see Fig. 9A,C), t1 and t2 are held constant at −6 ms. This set of parameters produces a biphasic temporal profile with a fast, initial phase which peaks at 38 ms and a slower, weaker second phase which peaks at 85 ms and decays fully by 150 ms. Because it is unlikely that all geniculate inputs which converge onto a single simple cell have identical temporal profiles (Alonso et al., 2001), we performed additional simulations in which response latency was permitted to vary. As expected, this addition increases the variability in cortical output, but otherwise yields results identical to those produced with a fixed t1 and t2.
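A difference-of-gamma-functions temporal filter with the listed parameter values can be sketched as follows. The peak-normalized gamma form used here is an assumption of this sketch, as are the function names and the 1 ms sampling; with this form, the profile is biphasic with a first peak in the high 30s of milliseconds and a second-phase extremum near 85 ms.

```python
import math

# Parameter values listed in the text; t1 = t2 = -6 ms (no space-time correlation).
K1, C1, N1 = 1.05, 0.14, 7
K2, C2, N2 = 0.7, 0.12, 8
T1 = T2 = -6.0


def gamma_lobe(t, c, n, t0):
    """Gamma function normalized to a peak of 1 (assumed form); zero before t0."""
    u = t - t0
    if u <= 0:
        return 0.0
    return (c * u) ** n * math.exp(-c * u) / (n ** n * math.exp(-n))


def temporal_filter(t):
    """Difference of two gamma lobes: fast positive phase, slower negative phase."""
    return K1 * gamma_lobe(t, C1, N1, T1) - K2 * gamma_lobe(t, C2, N2, T2)


times = list(range(0, 201))                       # ms
profile = [temporal_filter(t) for t in times]
first_peak_ms = times[profile.index(max(profile))]
second_peak_ms = times[profile.index(min(profile))]
residual_at_150 = abs(temporal_filter(150)) / max(profile)
```

Shifting t1 and t2 by the same amount translates this profile along the time axis without changing its shape.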
The distribution of LGN RF sizes contributing to a single cortical cell is based on previous reports (Alonso et al., 2001). These data show that LGN RF centers are typically equal to or slightly greater than the subregion width of the cortical cell; geniculate cells with RF centers larger than two times the subregion width also contribute input, but with reduced frequency. To approximate this distribution for a subregion separation of 1°, the RF center diameter is drawn from N(0.8,0.36) ≥ 0.7° (i.e., a modified Gaussian distribution with mean 0.8° and SD 0.6°, rectified <0.7°). The median of this distribution is 1.15°, and ∼15% of inputs have a RF center that is more than two times the cortical subregion width.
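Drawing center diameters from this modified Gaussian can be sketched as below. The sketch assumes that "rectified" means draws below 0.7° are rejected and redrawn; this reading reproduces the stated median of about 1.15°, whereas clipping low values to 0.7° would leave the median at 0.8°. The function name, sample size, and seed are hypothetical.

```python
import random
import statistics


def sample_center_diameters(n, mean=0.8, sd=0.6, lower=0.7, seed=0):
    """Draw n RF center diameters (deg) from N(mean, sd^2), rejecting draws < lower."""
    rng = random.Random(seed)
    diameters = []
    while len(diameters) < n:
        d = rng.gauss(mean, sd)
        if d >= lower:
            diameters.append(d)
    return diameters


diameters = sample_center_diameters(20000)
median_diameter = statistics.median(diameters)
```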
From the LGN RF center, we derive the size of the RF surround and the temporal response function. The size of the surround is related to the size of the center by σs = 1.5 × σc + 0.4, a linear relationship found in previous model fits of LGN spatiotemporal RFs (Cai et al., 1997) (r² = 0.41; p < 10⁻⁶, linear regression). To incorporate correlations between thalamic RF size and response latency (Weng et al., 2005), we can delay or advance the temporal profile by adjusting parameters t1 and t2. Note that changing t1 and t2 corresponds to a simple shift along the time axis, not a stretching or contracting of the curve. For each cell, the time shift is computed by first finding the difference in RF center area (assuming circularly symmetric RFs): $D = \pi (d_c/2)^2 - \pi (d_m/2)^2$, where dc is the center diameter of each LGN cell, and dm is the median center diameter from the distribution, 1.15°. We multiply the difference in area, D, by a space–time slope [in milliseconds per degree squared (ms/deg²)] to obtain the temporal shift, which is added to t1 and t2. Space–time slopes are all negative, such that cells with larger RF centers have shorter latencies. For reference, a space–time slope of −3.5 ms/deg² produces a population of LGN RFs with optimal latencies separated by ∼10 ms. This separation is similar to the measured difference in peak latencies between connected geniculate and simple cells, ∼5–15 ms (Alonso et al., 2001).
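The surround size and latency shift derived from the center can be sketched as follows. Function names are hypothetical, and the example diameters of 0.7° and 2.0° are chosen only to illustrate the scale of the effect.

```python
import math

MEDIAN_CENTER_DIAMETER = 1.15  # deg, median of the center-diameter distribution


def surround_sigma(sigma_c):
    """Surround size from center size, using the linear relation in the text."""
    return 1.5 * sigma_c + 0.4


def latency_shift_ms(d_c, slope_ms_per_deg2, d_m=MEDIAN_CENTER_DIAMETER):
    """Temporal shift added to t1 and t2 for a cell with center diameter d_c (deg)."""
    area_difference = math.pi * (d_c / 2) ** 2 - math.pi * (d_m / 2) ** 2
    return slope_ms_per_deg2 * area_difference


shift_small = latency_shift_ms(0.7, -3.5)   # small center: positive shift (later)
shift_large = latency_shift_ms(2.0, -3.5)   # large center: negative shift (earlier)
latency_spread = shift_small - shift_large
```

With a slope of −3.5 ms/deg², centers of 0.7° and 2.0° differ in latency by roughly 10 ms, in line with the separation quoted in the text.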
To measure the SF tuning of our model cortical cells, we calculate LGN responses to static sinusoidal gratings at different SFs, ranging from 0.01 to 1.5 cycles (c)/deg, and four different phases. Thalamic input to cortical cells is calculated as the sum of the rectified firing rates of each LGN cell, with a geniculate spontaneous firing rate of 10 spikes/s. In addition to excitatory thalamic input, the cortical cell receives input from the inhibitory cortical cell, which is antiphase inhibition parameterized by a weight, W, and a time delay, τ. The total input to the excitatory cortical cell can be expressed as follows: $I(t) = \mathrm{LGN}_e(S_{f,\varphi}, t) - W \cdot \mathrm{LGN}_i(S_{f,\varphi}, t - \tau)$, where $\mathrm{LGN}(S_{f,\varphi}, t)$ is the combined output from all the LGN cells connecting to a cortical cell in response to a sinusoidal grating, S, at SF, f, and phase φ. LGNe represents the response from cells connecting directly to the excitatory cell (see Fig. 8A), whereas LGNi refers to responses from cells connected to the inhibitory cell (see Fig. 8B). For simplicity, we have treated the inhibitory response as linear: the effective inhibitory current in the excitatory cell is simply a weighted version of the output from LGN cells (LGNi), with an additional temporal shift to account for the disynaptic pathway. To obtain output from the excitatory cell, the input is integrated with a time constant of 10 ms and rectified.
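The push–pull combination of thalamic drive, delayed inhibition, temporal integration, and rectification can be sketched as below. This is a minimal illustration, not the simulation code: the first-order leaky integrator, the 1 ms time step, the treatment of samples before the delay, and all function names are assumptions.

```python
SPONTANEOUS_RATE = 10.0  # spikes/s, geniculate spontaneous rate from the text


def rectify(x):
    """Half-wave rectification: firing rates cannot be negative."""
    return max(0.0, x)


def thalamic_input(linear_responses):
    """Sum of rectified firing rates of the LGN cells feeding one cortical cell."""
    return sum(rectify(r + SPONTANEOUS_RATE) for r in linear_responses)


def cortical_output(lgn_e, lgn_i, weight, delay_steps, dt_ms=1.0, tau_ms=10.0):
    """Excitatory-cell output for time series of summed LGN drive.

    lgn_e: drive from LGN cells connected to the excitatory cell.
    lgn_i: drive from LGN cells connected to the inhibitory (antiphase) cell.
    """
    state = 0.0
    output = []
    for t in range(len(lgn_e)):
        # Inhibition is a weighted, delayed copy of the antiphase LGN drive;
        # before the delay has elapsed, the first sample is held.
        inhibition = weight * lgn_i[max(t - delay_steps, 0)]
        total_input = lgn_e[t] - inhibition
        state += dt_ms / tau_ms * (total_input - state)  # leaky integration
        output.append(rectify(state))
    return output


# Hypothetical constant drives, 100 ms long.
excited = cortical_output([10.0] * 100, [0.0] * 100, weight=1.0, delay_steps=2)
suppressed = cortical_output([10.0] * 100, [20.0] * 100, weight=1.0, delay_steps=2)
```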
Following protocols used in previous studies of cortical SF dynamics (Bredfeldt and Ringach, 2002; Frazor et al., 2004; Nishimoto et al., 2005), full spectrotemporal maps for each cortical cell are obtained by averaging responses to stimulus gratings across all phases. Tuning shifts are then calculated by comparing parameters from time points at which response variance rises or falls to 20% of the maximum value. The duration of this time period is ∼40 ms, facilitating comparisons to earlier reports (Frazor et al., 2004; Nishimoto et al., 2005). Because our model permits the precise position and size of LGN RFs to vary, we repeat simulations with each set of parameters 15 times.
Our results are organized into three sections. First, we describe the SF dynamics observed in a large population of LGN cells as predicted from spatiotemporal RFs. Second, we assess the validity of these predictions using a subspace reverse correlation technique for a sample of cells. Third, we present a simple model, based on push–pull circuitry, which examines the extent to which cortical tuning changes can be explained by feedforward input.
Predicted SF dynamics
We analyzed responses of 118 LGN cells from our electrophysiology database for which the spatiotemporal RF of each unit was mapped using one-dimensional sparse noise reverse correlation (Fig. 1A). The spatiotemporal RF for a typical LGN neuron in our sample is shown in Figure 1B. RFs are plotted as contour maps, which indicate dark (blue) and bright (red) excitatory regions. A feature evident in the RF of this OFF-center cell is the delay of the surround response with respect to the center, a property common to most LGN neurons (Cai et al., 1997). Slices through different time points in the RF (Fig. 1C) show the temporal evolution of the spatial profile more clearly. At t = 20 ms, the center response is already present, but the surround response does not fully develop until t = 40 ms.
This characteristic temporal inseparability in the spatial domain predicts corresponding changes in the spectral domain. The spectrotemporal RF, obtained by taking the Fourier transform of the spatiotemporal RF, is shown in Figure 1D. This cell is strongly biphasic, and exhibits two distinct response regions, both of which are slightly slanted in the SF–time plane. This slanting indicates dynamic SF tuning. Time slices through the spectrotemporal RF (Fig. 1E) show that SF tuning is low-pass and broad at time points during which only the center is present (Fig. 1C). The later development of the surround suppresses responses to very low SFs, and the tuning becomes bandpass.
To quantify tuning changes, we find the optimal SF (Fig. 1F) and tuning width (Fig. 1G) for each time point, and then calculate the changes in these values over time. We fit the data (Fig. 1E, blue dots) with a DOG function (Fig. 1E, black lines) to improve our estimation of tuning parameters. For cells with multiphasic temporal profiles, we limit our analysis to points spanning only the first phase of the response (∼30 ms), because some cells have a substantially weaker second phase which may not be captured well by the RF mapping procedure.
The distributions of average tuning parameters (Fig. 2A,B) and the temporal shifts in these parameters (Fig. 2C,D) are shown for our sample of LGN cells. Average tuning curves are obtained by integrating the spectrotemporal response from tinitial to tfinal, where tinitial is the first time point used in our analysis [e.g., t = 20 (Fig. 1)], and tfinal is the last. The distribution of optimal SF (Fig. 2A) is broad, ranging from 0.01 to 0.75 c/deg, with a mean of 0.26 c/deg. Figure 2B summarizes the tuning bandwidths found in this population. The low selectivity exhibited by most LGN cells (Lehmkuhle et al., 1980; Troy, 1983) is evident, as most half widths are >0.8 octaves. For comparison, cells in the primary visual cortex typically have tuning half-widths between 0.3 and 0.8 octaves (Movshon et al., 1978; Nishimoto et al., 2005).
We now examine SF tuning as a function of response latency. If SF tuning is temporally static, the parameters describing tuning curves at each time point will be similar, and the shifts in these parameters should be distributed evenly around zero. However, for our LGN sample, shifts in tuning peak (Fig. 2C) are distributed almost entirely above zero (i.e., preferred SF changes from low to high values during the course of the cell's response). On average, the optimal SF changes by over an octave (1.14 ± 0.076 octaves, mean ± SEM). Similarly, the shifts in tuning bandwidth (Fig. 2D) are nearly all distributed over negative values, corresponding to a narrowing of the curve, or a selectivity increase with greater latency. The mean of this distribution is −0.95 ± 0.072 octaves.
Direct measurements of SF dynamics
Accurate predictions of SF tuning from spatiotemporal RFs require that (1) the spatiotemporal RF of the cell is completely captured with the mapping procedure, and (2) the cell's response is linear. Requirement 1 is highly dependent on the properties of the stimulus used to map the RF. In the spatial domain, large stimuli will blur fine features, biasing results toward lower SFs. Small stimuli, however, may not excite the cell sufficiently to reach threshold. These effects are illustrated in Figure 3, which shows the spatiotemporal and spectrotemporal RFs of a single cell mapped with two different stimulus grid sizes. The RFs achieved with large (Fig. 3A,C) and small (Fig. 3B,D) stimulus grids are similar, but substantial differences are immediately obvious. The spatiotemporal RF obtained with large stimuli (Fig. 3A) exhibits a strong surround in both phases of the response. In comparison, the surround response of the RF mapped with smaller stimuli (Fig. 3B) is considerably weaker in the first phase, and almost nonexistent in the second. As a consequence, the SF response of the second phase (Fig. 3D) is predominately low-pass and shows little of the dynamics clear in the spectrotemporal map obtained with large stimuli (Fig. 3C). However, the large pixels are incapable of capturing the full narrowing of the center component over time. Compared with the fine stimuli spectrotemporal map (Fig. 3D), this leads to a noticeable decrease in the high SF cutoff (Fig. 3C).
Requirement 2, LGN cell response linearity, has been addressed previously (Derrington and Lennie, 1984; Dan et al., 1996; Cai et al., 1997). We note that nonlinear phenomena are present in LGN responses, and that the nonlinear component can be modeled with a suppressive field, preferentially tuned to low SFs (Bonin et al., 2005). Although studies suggest that nonlinear contributions to LGN responses are small, SF-specific suppression with a distinct time course might have large effects on the temporal evolution of the response, and therefore must be considered in this study. To provide a more definitive analysis of LGN SF dynamics, we have measured SF tuning directly as a function of time.
These measurements are conducted using a subspace reverse correlation procedure (Bredfeldt and Ringach, 2002; Mazer et al., 2002; Nishimoto et al., 2005) (Fig. 4A). A randomized sequence of sinusoidal gratings, flashed briefly over the cell's RF, is cross-correlated with the evoked spike train to obtain a 2D map of SF and spatial phase selectivity (Fig. 4B). For the cell shown in Figure 4 and all others (n = 35), the response is sinusoidal as a function of phase. Stimuli evoking responses at one phase (e.g., red contours at 0°) reduce those at the antiphase below baseline level (e.g., blue contours at 180°). This response property enables us to construct a single spectrotemporal RF for each cell by calculating the modulation amplitude across phase for each SF and correlation delay (Fig. 4C). Note that this procedure is analogous to taking the F1 component of a response to a drifting grating. From the modulation spectrotemporal RF, tuning dynamics are assessed using the same procedure that is applied to the predicted spectrotemporal RFs from the database.
Tuning for the example cell is clearly dynamic. Response contours in the SF–time plane tilt slightly to the right (Fig. 4C), and slices at different time points show a gradual shift from low-pass to bandpass tuning (Fig. 4D). The preferred SF (Fig. 4E, top) shifts to higher values over time, and bandwidth narrows (Fig. 4E, bottom). These tuning properties are representative of those observed for our sample of cells, shown in Figure 5. Thirty-two of 35 neurons show a change in optimal SF from low to high values, with an average shift of 1.8 ± 0.2 octaves (mean ± SEM) (Fig. 5C). Likewise, tuning selectivity increases with correlation delay for 31 of 35 cells (Fig. 5D). On average, bandwidth changes by −0.84 ± 0.17 octaves. From these data, we conclude that SF tuning follows a coarse-to-fine processing sequence for nearly all LGN neurons.
For 28 of the 35 neurons, we also mapped the spatiotemporal RF to compare predicted and measured spectrotemporal tuning. Because stimulus energy affects temporal response characteristics (Albrecht, 1995), we used a 2D dense noise sequence (Fig. 6A) to better match the energy of the stimulus used for subspace reverse correlation (Fig. 4A). Results for three example cells are shown in Figure 6. Data for each neuron are organized into columns, with rows showing the spatiotemporal RF (Fig. 6B), the predicted spectrotemporal RF obtained from Fourier analysis (Fig. 6C), the measured spectrotemporal RF (Fig. 6D), the peak SF over time (Fig. 6E), and the tuning width over time (Fig. 6F).
A qualitative comparison of measured and predicted spectrotemporal tuning suggests that gross response characteristics are well matched. The SF ranges over which each cell responds and the temporal profiles (i.e., strongly biphasic for cell 1, monophasic for cell 2) correspond well. In addition, the parameters describing measured and predicted tuning show similar changes over time (Fig. 6E,F). However, a systematic deviation in tuning is also evident: for all three neurons, the high SF cutoff is higher in the measured case (Fig. 6, compare C, D).
A more quantitative comparison of high SF cutoffs between measured and predicted tuning is presented in Figure 7A. A scatter plot of the data reveals a strong correlation (r = 0.97), although predictions consistently underestimate measured values, particularly at higher SFs. Differences between predicted and measured high cutoffs (projected in the histogram orthogonal to the unity line) are all less than zero, with a mean difference of −0.22 ± 0.04 octaves.
A similar relationship exists between the predicted and measured average optimal SF (Fig. 7B), although differences are less pronounced. Predicted and measured peaks are well correlated (r = 0.90), but most points lie above the unity line, indicating that predictions underestimate true values. Correspondingly, the distribution of differences in the peak (Fig. 7B, top right) is skewed toward negative values, with a mean of −0.42 ± 0.16 octaves.
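For readers who wish to reproduce this kind of comparison, the octave differences and their mean ± SEM can be computed as sketched below. This is a minimal illustration, assuming that each difference is the base-2 logarithm of the ratio of predicted to measured SF, so that negative values indicate underestimation. The function names and the example values are hypothetical and are not data from this study.

```python
import math
import statistics

def octave_difference(predicted_sf, measured_sf):
    """Signed difference in octaves; negative when the prediction is lower."""
    return math.log2(predicted_sf / measured_sf)

def mean_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem

# Hypothetical (predicted, measured) SF pairs in c/deg, for illustration only.
pairs = [(0.5, 0.6), (0.8, 1.0), (1.2, 1.3)]
differences = [octave_difference(p, m) for p, m in pairs]
print(mean_sem(differences))
```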
Predicted and measured rates of change in SFpeak are plotted in Figure 7C. Rates of change are more appropriate for comparisons than absolute shifts because analysis time points are determined independently for the different data sets and durations can differ by as much as 15 ms. The rate of change of preferred SF (in cycles per degree per millisecond) is the slope of the best fit line to the time versus the SFpeak curve (Fig. 6E). The scatter plot in Figure 7C shows that measured and predicted tuning changes are correlated (r = 0.78), and that there are no systematic deviations in the predictions. Points lie about evenly above and below the y = x line, and the distribution of differences is centered close to zero (3.6 × 10⁻⁴ ± 8.5 × 10⁻⁴ c/deg/ms).
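The rate of change described above is an ordinary least-squares slope. A minimal sketch follows, assuming an unweighted straight-line fit of SFpeak against time; the function name and the sample time course are hypothetical.

```python
def best_fit_slope(times_ms, sf_peaks):
    """Least-squares slope of SF peak (c/deg) against time (ms), in c/deg/ms."""
    n = len(times_ms)
    mean_t = sum(times_ms) / n
    mean_sf = sum(sf_peaks) / n
    covariance = sum((t - mean_t) * (s - mean_sf)
                     for t, s in zip(times_ms, sf_peaks))
    variance = sum((t - mean_t) ** 2 for t in times_ms)
    return covariance / variance

# Hypothetical SF peak time course, for illustration only.
times_ms = [30, 35, 40, 45, 50]
sf_peaks = [0.30, 0.36, 0.41, 0.47, 0.52]
rate = best_fit_slope(times_ms, sf_peaks)
print(rate)
```

Because the slope is normalized by elapsed time, two data sets with different analysis windows can be compared directly, which is the motivation given above for using rates rather than absolute shifts.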
These data are summarized in Figure 7D, which provides comparisons of predicted and measured tuning curves, averaged over all cells, for tinitial (left) and tfinal (right). Predicted tuning curves (dotted lines) peak at lower SFs than measured curves (solid lines) and underestimate responses to higher SFs. These deviations could result from nonlinear contributions or from biases in the mapping procedure (see Discussion). However, because the differences in tuning are somewhat consistent over time, predicted tuning changes are close to the measured values. From this, we conclude that linear predictions of spectrotemporal RFs provide good estimates of SF tuning dynamics for LGN neurons.
Relating coarse-to-fine dynamics in LGN and visual cortex
Previous studies have demonstrated coarse-to-fine SF tuning in the visual cortex (Bredfeldt and Ringach, 2002; Mazer et al., 2002; Frazor et al., 2004; Nishimoto et al., 2005). Our current results raise an obvious question: how much of the cortical effect can be accounted for by feedforward processing from early visual pathways? To address this, we used a model relating thalamic input to the first stage of cortical output (see Materials and Methods for full model details).
Our model uses push–pull circuitry, based on numerous studies of extracellular and intracellular recordings from LGN and the primary visual cortex (Hubel and Wiesel, 1962; Jones and Palmer, 1987; Ferster, 1988; Reid and Alonso, 1995; Hirsch et al., 1998). It is similar in structure to a model developed previously to explore contrast invariant orientation tuning (Troyer et al., 1998). A schematic of the model is presented in Figure 8. Spatially offset ON and OFF LGN cells (represented by their RFs in Fig. 8A) provide excitatory input (push) to distinct ON and OFF subregions of a cortical cell. LGN cells with identical spatial configuration and opposite phase (Fig. 8B) provide inhibitory input (pull) to the cortical cell, after routing through a cortical inhibitory interneuron. Inhibition is parameterized with a weight, W, relative to the level of excitation, and a time, τ, which describes the delay of inhibition relative to excitation.
As described previously (Ferster, 1988; Troyer et al., 1998; Lauritzen and Miller, 2003), this arrangement of inputs can filter spatial information. When the weight of inhibition equals or exceeds that of excitation (W ≥ 1), only stimuli that preferentially excite one phase of LGN cells (Fig. 8A,B) pass through the filter to produce a response in the excitatory cortical cell. These properties are instrumental in constructing cortical SF tuning. Stimuli at very low SFs excite neighboring ON and OFF LGN cells equally. Thus, downstream cortical neurons are simultaneously excited and inhibited and fail to fire. In contrast, stimuli at higher SFs excite the phases of LGN cells differently to produce a robust response in the excitatory cortical cell. The push–pull configuration thus sculpts the typically low-pass SF tuning of thalamic input into the more bandpass tuning characteristic of cortical neurons.
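The filtering argument can be made concrete with a deliberately reduced, static sketch of a push–pull circuit. It is not the model used in this study (see Materials and Methods): it assumes one ON and one OFF LGN cell providing the push, opposite-phase cells at the same two positions providing the pull, half-wave rectified responses to a sinusoidal grating, and a response averaged over grating phase. The names and values (offset_deg, sigma_deg, n_phases) are hypothetical.

```python
import math

def rect(x):
    """Half-wave rectification."""
    return max(x, 0.0)

def lgn_gain(f_cpd, sigma_deg=0.3):
    """Low-pass amplitude of a Gaussian LGN center (surround omitted here)."""
    return math.exp(-(math.pi * sigma_deg * f_cpd) ** 2)

def cortical_response(f_cpd, w, offset_deg=1.0, n_phases=360):
    """Phase-averaged output of the model cell for a grating of SF f_cpd.

    w is the weight of inhibition relative to excitation (W in the text).
    """
    a = lgn_gain(f_cpd)
    total = 0.0
    for k in range(n_phases):
        phi = 2 * math.pi * k / n_phases
        # Linear drive at the two LGN positions, offset_deg apart.
        drive_left = a * math.cos(phi - math.pi * f_cpd * offset_deg)
        drive_right = a * math.cos(phi + math.pi * f_cpd * offset_deg)
        push = rect(drive_left) + rect(-drive_right)   # ON left, OFF right
        pull = rect(-drive_left) + rect(drive_right)   # opposite-phase cells
        total += rect(push - w * pull)
    return total / n_phases

low_sf = cortical_response(0.01, w=1.25)   # both phases driven together
mid_sf = cortical_response(0.5, w=1.25)    # one phase driven preferentially
print(low_sf, mid_sf)
```

In this sketch, a very low SF drives the push and pull inputs almost identically, so with W ≥ 1 the output is nearly abolished, whereas a grating whose half-period matches the subregion spacing drives the push alone during half of each cycle. Setting w=0 removes the pull and leaves the low-pass tuning of the LGN input.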
Within the structure of the push–pull model, we examine several sources of dynamic input which could lead to SF changes in layer IV simple cells. These include (1) mechanisms within single cells (the time delay between center and surround responses in LGN RFs), (2) mechanisms across a population of cells (the correlation between response latency and RF size reported previously) (Derrington and Fuchs, 1979; So and Shapley, 1979; Sestokas and Lehmkuhle, 1986; Weng et al., 2005), and (3) mechanisms within a network (the time delay between feedforward excitation and feedforward inhibition). For each of these mechanisms, we incrementally vary model parameters and measure SF tuning of the model cortical cells.
Shifts in cortical SF tuning originating from different types of spatiotemporal dynamics are depicted in Figure 9. Increasing the time delay (td) between the LGN RF center and surround produces noticeable changes in the geniculate and cortical RFs (Fig. 9A, top). The simple cell has two primary subregions (i.e., LGN centers “map” to two regions), although flanking subregions also develop because of the LGN surround responses. The timing of the weaker flanking subregions is directly affected by the LGN center-surround delay, and these subregions lag behind the dominant subregions for td > 0. This aspect of the model simple cell RF is in agreement with experimental results showing that the weakest cortical RF subregions have the slowest time courses (Alonso et al., 2001).
As the slower flanking subregions develop, they narrow the primary subregions, shifting the SF tuning curve to higher frequencies (Fig. 9A, bottom). As td increases from 0 to 12 ms, the shift in tuning peak rises significantly from 0 to 0.5 octaves (p ≈ 0, one-way ANOVA). Varying the center-surround delay also affects the average SF tuning peak, although only optimal SFs from td = 0 ms (0.43 ± 0.006 c/deg) and td = 12 ms (0.41 ± 0.006 c/deg) are significantly different (p < 0.05, one-way ANOVA with Tukey's honest significant difference criterion). The average time delay observed in LGN neurons, ∼6 ms (Fig. 9A, blue circle) (Enroth-Cugell et al., 1983; Cai et al., 1997), produces a cortical SF tuning shift of 0.35 octaves. This value is consistent with measurements of SF shifts for simple cells in the visual cortex, which range from 0.2 to 0.6 octaves and average ∼0.5 octaves (Bredfeldt and Ringach, 2002; Frazor et al., 2004; Nishimoto et al., 2005).
A second source of spatiotemporal dynamics is the correlation between RF size and response latency (Fig. 9B, top). For cells responding to overlapping regions of visual space, this relationship is strongly linear (Weng et al., 2005) and can be parameterized as a negative slope in the space–time plane with units of ms/deg². Implementing this space–time slope in the model produces a gradual narrowing of cortical RF subunits over time because LGN cells with smaller RFs respond at longer latencies. As the space–time slope increases in amplitude, the time between responses from large and small LGN cells grows, and the shift in cortical SF tuning peak increases significantly (Fig. 9B, bottom) (p < 10⁻¹⁴). Changes in preferred SF as a result of space–time correlations are similar in magnitude to those found with the center-surround time delay, reaching ∼0.3 octaves at a slope of −7 ms/deg². As expected, increasing the slope affects the duration of the cortical response, but does not alter the average tuning peak (mean ± SEM ranges from 0.43 ± 0.006 c/deg at slope = 0 to 0.42 ± 0.006 c/deg at slope = −7 ms/deg²; p > 0.5). A typical slope relating size and latency, estimated as −3.5 ms/deg² in a study by Weng et al. (2005), produces a cortical SF tuning shift of ∼0.15 octaves.
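The population mechanism can be illustrated with a two-cell sketch of pooled LGN input. It is a simplification rather than the model used in this study: it assumes Gaussian RF centers, takes RF size to be the area π·sigma², assigns latency as a linear function of that area, and tracks the half-height high-SF cutoff of the summed input rather than the tuning peak of a model cortical cell. Only the −3.5 ms/deg² slope comes from the text; the intercept, sigma values, time constant, and function names are hypothetical.

```python
import math

SLOPE_MS_PER_DEG2 = -3.5   # typical slope cited in the text (Weng et al., 2005)
BASE_LATENCY_MS = 20.0     # hypothetical intercept of the size-latency line

def temporal_kernel(elapsed_ms, tau_ms=8.0):
    """Gamma-shaped response time course; zero before the cell's latency."""
    if elapsed_ms <= 0:
        return 0.0
    x = elapsed_ms / tau_ms
    return x ** 3 * math.exp(-x)

def latency_ms(sigma_deg):
    """Latency falls linearly with RF area, so larger RFs respond earlier."""
    area_deg2 = math.pi * sigma_deg ** 2
    return BASE_LATENCY_MS + SLOPE_MS_PER_DEG2 * area_deg2

def pooled_tuning(f_cpd, t_ms, sigmas=(0.2, 0.6)):
    """Summed low-pass input from a small and a large LGN cell at time t_ms."""
    return sum(temporal_kernel(t_ms - latency_ms(s))
               * math.exp(-(math.pi * s * f_cpd) ** 2) for s in sigmas)

def high_cutoff(t_ms, step=0.005):
    """Lowest sampled SF at which the pooled input is at or below half its value at 0 c/deg."""
    reference = pooled_tuning(0.0, t_ms)
    f = 0.0
    while pooled_tuning(f, t_ms) > 0.5 * reference:
        f += step
    return f

cutoff_early = high_cutoff(26.0)   # the large, fast cell dominates
cutoff_late = high_cutoff(56.0)    # the small, slow cell has caught up
print(cutoff_early, cutoff_late)
```

Because the small-RF cell lags the large-RF cell by a few milliseconds, its relative contribution grows with time and the pooled input extends to progressively higher SFs. A steeper slope lengthens that lag, which is the sense in which the size of the tuning shift depends on the space–time slope.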
The final type of temporally dynamic input examined here is the time delay (τ) between excitatory and inhibitory input (Fig. 9C, top). Because the pathway for feedforward inhibition requires two synapses (thalamocortical and corticocortical), the onset of inhibition is delayed with respect to excitation. Intracellular recordings from layer IV simple cells in the primary visual cortex show this delay to be ∼5 ms (Ferster, 1988; Hirsch et al., 1998) (Fig. 9C, bottom, blue circle). Our simulations indicate that the inhibitory delay generates a significant coarse-to-fine SF shift in cortical tuning (p < 10⁻¹¹), although the magnitude of this effect is quite small compared with other sources of dynamic input. At τ = 10 ms, the inhibitory time delay produces a shift of only 0.06 ± 0.005 octaves, suggesting that it is not the primary mechanism underlying SF tuning dynamics in the visual cortex.
In the previous simulations, the weight of inhibition (W) is fixed at a value of 1.25 (i.e., it slightly exceeds the level of excitation). This value of W is used because it produces SF tuning curves with realistic bandwidths (0.6–0.7 octaves), and is consistent with experimental (Hirsch et al., 1998) and theoretical (Troyer et al., 1998) studies demonstrating the presence and possible function of slightly dominant inhibition. However, because inhibition plays a critical role in shaping cortical tuning, we must also examine how the level of inhibition affects tuning dynamics. In the following simulations, we parametrically vary W while holding other variables fixed at the average values marked with blue circles in Figure 9.
Results from these simulations show that increasing the weight of feedforward inhibition (Fig. 10A) produces dramatic changes in the average SF tuning peak (Fig. 10B) and the shift in tuning peak over time (Fig. 10C). As W increases from 0 to 2, the optimal SF changes significantly from ∼0.32 to 0.42 c/deg, and the peak shift decreases from ∼1 to 0.4 octaves (p ≈ 0 for both). Full spectrotemporal RFs from sample model cells (Fig. 10D) provide an explanation of these effects. When inhibition is absent (W = 0, left), SF dynamics are similar to those observed in LGN cells (compare with Figs. 6D, 10D). The strongly low-pass response at short latencies leads to a large shift in optimal SF (>1 octave), which is comparable with changes observed in the LGN, but is considerably higher than average values observed in simple cortical cells (Bredfeldt and Ringach, 2002; Frazor et al., 2004; Nishimoto et al., 2005).
When the level of inhibition is increased (W = 1.25) (Fig. 10D, right), spectrotemporal dynamics are markedly different. SF tuning is initially bandpass and exhibits a moderate refinement and shift to higher frequencies with greater latencies. We find that for inhibition equal to or greater than the level of excitation, the average preferred SF (∼0.42 c/deg) and shift in tuning peak (∼0.45 octaves) of our modeled cortical cells are in good agreement with experimental results (Movshon et al., 1978; Bredfeldt and Ringach, 2002; Frazor et al., 2004; Nishimoto et al., 2005). This suggests that feedforward dynamic input can account for the SF tuning changes observed in the primary visual cortex, and that additional refinement from intracortical circuitry may not be necessary.
Our experiments in the LGN demonstrate that coarse-to-fine processing of spatial features originates in precortical structures. This effect can be largely accounted for by the temporal delay between center and surround components of the RF, although significant differences exist between measurements and linear predictions of SF tuning dynamics. In general, measured values of peak and high cutoff SFs are higher than their predictions. There are several potential factors that may account for this. First, the sampling grid of the spatial reverse correlation stimuli may bias the SF tuning toward lower values. If the sampling grid is not fine enough, the RF mapping procedure cannot resolve potential size changes of center/surround components. Smaller stimulus impulses can combat this problem, although they may not provide enough energy to elicit spike discharge. This is particularly true for the weaker surround (Fig. 3), which may be consistently underrepresented. A second factor that may explain this discrepancy is the contribution of SF-specific input that is activated differently by the oriented stimuli in the subspace reverse correlation procedure versus the individual pixels in spatial reverse correlation. A likely source of nonlinear input is the perigeniculate nucleus, which responds preferentially to large, low-frequency stimuli and provides inhibitory feedback to relay cells in the LGN (So and Shapley, 1981; Price and Morgan, 1987; Xue et al., 1988; Funke and Eysel, 1998).
Despite the differences between measured and predicted SF tuning, the rates of tuning changes are closely matched (Fig. 7D). This suggests that the SF tuning dynamics are largely a consequence of the spatiotemporal inseparabilities in the LGN RF structure. The center/surround temporal organization of LGN neurons is inherited from retinal ganglion cells, although LGN surrounds are typically more pronounced than those of their retinal inputs (Bullier and Norton, 1979; Usrey et al., 1999; Ruksenas et al., 2000). Thus, it is probable that coarse-to-fine processing begins at the first stage of the visual system, and is then enhanced in the thalamus.
To examine how dynamic thalamic input might affect SF tuning in the visual cortex, we incorporated spatiotemporal inseparabilities into a feedforward model. We find that both the center-surround delay in LGN RFs and the correlation between RF size and response latency are capable of producing cortical SF tuning changes similar in magnitude to those found experimentally (Bredfeldt and Ringach, 2002; Frazor et al., 2004; Nishimoto et al., 2005). Although our current study is the first to identify feedforward dynamics within LGN cells as a mechanism underlying cortical SF tuning dynamics, the space–time relationship observed across a population of cells has been proposed previously (Mazer et al., 2002; Frazor et al., 2004). Using a structural model, Frazor et al. (2004) found that the different response characteristics of magnocellular and parvocellular cells in primate LGN could account for a large portion of tuning dynamics measured in the visual cortex.
Convergence of magnocellular and parvocellular inputs, or X and Y inputs in the cat (So and Shapley, 1979; Sestokas and Lehmkuhle, 1986), can produce shifts in preferred SF as a function of response latency. However, it cannot explain the SF-specific suppression that was reported by Bredfeldt and Ringach (2002). In that study of cortical SF dynamics, the authors found a delayed “suppressive component” that was typically centered at low SFs. This finding is supported by several previous reports demonstrating inhibitory refinement of cortical SF tuning, particularly at the low-frequency limb of the curve (Bauman and Bonds, 1991; Vidyasagar and Mueller, 1994; Pernberg et al., 1998). Suppression at low SFs has been previously attributed to an intracortical network (Bredfeldt and Ringach, 2002). However, we find that this suppression emerges as a property of the push–pull configuration (i.e., feedforward inhibition) and, thus, it is not necessary to invoke additional cortical circuitry.
It is interesting to note that the dynamic push–pull model that we use has general applications and makes additional predictions of temporal responses in the cortex. In the simulations presented here, we consider only inputs to simple cells. However, there is evidence that some layer IV complex cells achieve phase invariance by receiving “push” from both phases of LGN cells (Hirsch et al., 2003). In these complex cells, SF tuning would be broader and more low-pass than tuning in simple cells, a difference that is reported in several studies (Movshon et al., 1978; Bauman and Bonds, 1991; Vidyasagar and Mueller, 1994). Because these cells might receive primarily antiphase excitation (as opposed to inhibition), they would undergo large SF tuning shifts over time, similar to the W = 0 case in Figure 10D. Notably, observed tuning shifts are approximately twice as large in complex cells (∼1 octave) as in simple cells (∼0.5 octaves) for both the cat and primate visual cortex (Frazor et al., 2004).
In conclusion, our results support coarse-to-fine spatial processing as a general property of information transmission in the visual system, and help to clarify the physiological processes underlying sequential analysis. Dynamic spatial processing begins early in the visual pathway, presumably within the retina (Enroth-Cugell et al., 1983), and propagates via several mechanisms to higher visual areas with more specialized functions (Ringach et al., 1997; Pack and Born, 2001; Bredfeldt and Ringach, 2002; Mazer et al., 2002; Menz and Freeman, 2003; Frazor et al., 2004; Nishimoto et al., 2005). Furthermore, recent work suggests that coarse-to-fine processing is not limited to visual processing. Spatiotemporal RFs of cells in peripheral and central structures of the somatosensory pathway exhibit inseparabilities that generate refined representation of tactile information with increased latency (Sripati et al., 2006). In addition, models of spectrotemporal RFs in the auditory cortex show that discrimination of complex stimuli is enhanced by delayed inhibition in a push–pull circuit (Narayan et al., 2005). Considered together, these studies point to coarse-to-fine processing as a fundamental coding strategy in the CNS.
This work was supported by National Institutes of Health Grants EY01175 and EY03176.
Correspondence should be addressed to Ralph D. Freeman at the above address.