Abstract
While studying the visual response dynamics of neurons in the macaque primary visual cortex (V1), we found a nonlinearity of temporal response that influences the visual functions of V1 neurons. Simple cells were recorded in all layers of V1; the nonlinearity was strongest in neurons located in layer 2/3. We recorded the spike responses to optimal sinusoidal gratings that were displayed for 100 ms, a temporal step response. The step responses were measured at many spatial phases of the grating stimulus. To judge whether simple cell behavior was consistent with linear temporal integration, the decay of the 100 ms step response at the preferred spatial phase was used to predict the step response at the opposite spatial phase. Responses in layers 4B and 4C were mostly consistent with a linear-plus-static-nonlinearity cascade model. However, this was not true in layer 2/3 where most cells had little or no step responses at the opposite spatial phase. Many layer 2/3 cells had transient preferred-phase responses but did not respond at the offset of the opposite-phase stimuli, indicating a dynamic nonlinearity. A different stimulus sequence, rapidly presented random sinusoids, also produced the same effect, with layer 2/3 simple cells exhibiting elevated spike rates in response to stimuli at one spatial phase but not 180° away. The presence of a dynamic nonlinearity in the responses of V1 simple cells indicates that first-order analyses often capture only a fraction of neuronal behavior. The visual implication of our results is that simple cells in layer 2/3 are spatial phase-sensitive detectors that respond to contrast boundaries of one sign but not the opposite.
Introduction
The segregation of neurons in the primary visual cortex (V1) into “simple” and “complex” categories (Hubel and Wiesel, 1962) is functionally significant. Hubel and Wiesel (1962) classified neurons as simple if they had non-overlapping excitatory and inhibitory regions and if stimuli placed in inhibitory and excitatory regions had mutually antagonistic effects on the response. The receptive field structure of simple cells permitted a good qualitative prediction of the responses to flashed or modulated stimuli (Hubel and Wiesel, 1962), consistent with the idea that simple cells are approximately linear. Following Maffei and Fiorentini (1973), many scientists have grouped V1 neurons into simple and complex classes based on responses to drifting gratings. Simple cells, being more linear, modulate their firing rate at the temporal frequency of the drifting grating; complex cells respond with unmodulated firing. The ratio of modulated to unmodulated components of the response (the modulation ratio or F1/F0) has been used as a quantitative measure of linearity that correlates well with the original simple/complex criteria (Movshon et al., 1978; De Valois et al., 1982; Skottun et al., 1991; Mata and Ringach, 2005) (but see Kagan et al., 2002).
It is important to study the time course of V1 responses to a variety of temporal modulation waveforms, to understand the different visual functions of simple and complex cells. For example, phenomena such as directional selectivity in V1 simple cells can be understood by considering their spatiotemporal responses (McLean and Palmer, 1989; Reid et al., 1991). Probing the temporal dimension has also revealed cortical nonlinearities. For example, the temporal frequency sensitivity of a simple cell can vary with stimulus temporal waveform (Tolhurst et al., 1980; Dean et al., 1982; Reid et al., 1992).
This study asks how linear the temporal behavior of simple cells is and how the nonlinearities might affect visual perception. We studied simple cells with stimuli that were similar in spatial structure but varied in their temporal modulations: (1) optimal gratings flashed for 100 ms (over a discrete set of spatial phases) and (2) a rapid sequence of gratings at random orientations, spatial frequencies, and spatial phases (Ringach et al., 1997). If temporal integration were linear, the time course of the response to a flashed stimulus could be used to predict the dynamics of the response to a stimulus of the opposite contrast polarity. Furthermore, the responses to the flashed stimuli should be well matched by the responses to the random sequence. We performed these tests on simple cells as identified by drifting grating responses with high F1/F0 ratios.
We found that laminar location influences response linearity, presumably through specific local circuitry. In layer 2/3, simple cell dynamics are poorly represented by a linear-threshold (LN) model, because responses to nonpreferred spatial phases are suppressed. The responses of simple cells in layers 4B and 4C are more consistent with an LN mechanism. These results are evidence for a dynamic nonlinearity that affects visual responses, particularly in upper layers. The functional consequence of our results is that simple cells in layer 2/3 are spatial phase-sensitive detectors that respond to contrast boundaries of one sign but not the opposite.
Materials and Methods
Preparation and recording.
Electrophysiological experiments were performed on anesthetized and paralyzed macaque monkeys (Macaca fascicularis), in compliance with regulations of the National Institutes of Health and the New York University Animal Use Committee. Animal preparation and single-unit recording proceeded as described in detail previously (Johnson et al., 2004; Henrie and Shapley, 2005). Anesthesia was induced with ketamine (30 mg/kg, i.m.; Ketaset; Fort Dodge Animal Health, Fort Dodge, IA) and maintained with intravenous infusion of sufentanil citrate (6–18 μg/kg/h; Hospira, Lake Forest, IL). After surgery was complete, muscle paralysis was induced and maintained with intravenous infusion of vecuronium bromide (0.1 mg/kg/h; SICOR, Irvine, CA). Expired CO2, blood pressure, electrocardiogram, EEG, and core body temperature were monitored continuously to ensure that anesthesia was maintained at a steady level. Ophthalmic atropine sulfate (1%; Bausch & Lomb, Tampa, FL) was administered to the eyes at the start of the experiment to dilate the pupils. Throughout the experiment, the eyes were protected by clear, gas-permeable contact lenses and a topical antibiotic solution (gentamicin sulfate, 3%; Alcon, Fort Worth, TX). In most experiments, fixation rings (Duckworth and Kent, Hertfordshire, UK) were used to minimize eye movements. When the fixation rings were used, an ophthalmic anti-inflammatory agent (TobraDex; Alcon, Fort Worth, TX) was also applied.
The foveae were mapped onto a tangent screen using a reversing ophthalmoscope. The visual receptive fields of isolated single neurons were then mapped onto the tangent screen with reference to the foveae. Cells were recorded in V1, typically in the region that represents 2–6° eccentricity.
Glass-coated tungsten microelectrodes (Merrill and Ainsworth, 1972) were advanced through a craniotomy over occipital cortex. Extracellular spikes were discriminated and time-stamped with 0.1 ms resolution via custom software running on a Silicon Graphics (Sunnyvale, CA) O2. Only well isolated single units were used in this study and were identified by action potentials that had a fixed shape and were separated by an absolute refractory period.
After an electrode penetration entered the white matter under the cortex, small electrolytic lesions (3 μA for 3 s) were made at several locations along the length of the penetration. At the end of the experiments, the monkeys were killed with an overdose of sodium pentobarbital and perfused through the heart. The electrolytic lesions were found in the fixed brain tissue and were used to reconstruct the electrode tracks and assign the recorded neurons to cortical layers by means of methods described previously (cf. Hawken et al., 1988).
Stimuli.
Stimuli were displayed on a Sony (Tokyo, Japan) Trinitron color cathode ray tube (CRT) model GDM-F520 at a screen resolution of 1024 × 768 and a refresh rate of 100 Hz. The calibrated monitor luminance output was linearized with a lookup table in custom-written software. The mean luminance was 90–100 cd/m². All stimuli were displayed on a background that was set to this mean.
The dominant eye was determined for each cell, and the other eye was occluded. The receptive field of the neuron was directed to the center of the CRT screen with a mirror. The animals' eyes were 115 cm from the CRT. Near-optimal values of orientation, spatial frequency, temporal frequency, and diameter were determined by hand and fine-tuned by running tuning curves for each of these parameters. The stimuli that were used for these measurements were drifting sinusoidal gratings cropped by a circular boundary. Other parameters that were measured for each cell included sensitivity to contrast, color (using isoluminant and cone-isolating stimuli), and spatial phase (with stationary contrast-reversing gratings). Cells were classified as simple when their F1/F0 ratio was >1.0 for the optimal drifting grating (Skottun et al., 1991).
A temporal step response for each cell was measured using briefly flashed sinusoidal gratings. We took the sinusoid that had produced the largest drifting grating response (best orientation, spatial frequency, and diameter) and presented it at 8 or 16 spatial phases equally spaced through the full 360°. Each stimulus consisted of a sinusoid of randomly selected spatial phase that was displayed for 10 video frames (100 ms) and followed by the mean luminance for 20 or 30 frames (200 or 300 ms), then followed immediately by the next stimulus. At least 16 repeats of each stimulus phase were shown, and additional blocks of 16 repeats were added as needed to achieve a good signal. Because this experiment takes very little time and can conveniently be used to check eye stability, the entire step experiment was sometimes rerun several times on a cell between other experiments. In these cases, statistics were calculated on each run separately and then averaged.
The spatiotemporal receptive field was measured using reverse correlation (Ringach et al., 1997). In this experiment, images were drawn randomly from a low-pass subset of the two-dimensional Hartley functions, so we call this experiment the “Hartley experiment” and the images the “Hartley stimuli.” The Hartley stimuli consist of an orthogonal set of sinusoids of evenly spaced orientations, spatial frequencies, and spatial phases. Spatial frequencies ranged from one cycle per stimulus width up to a maximum that was chosen for each cell to be higher than its high-frequency cutoff. Orientations were evenly spaced around the full 360°, and each stimulus in the set was matched by another that was offset by 90° in spatial phase. Stimuli were bounded by a square window, and the width was selected to be no smaller than the optimal drifting grating size. The width was also always at least as large as one cycle of the optimal spatial frequency determined using drifting gratings. Each sinusoid was presented for two consecutive video frames (20 ms) as part of a continuous 15 min stream.
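The structure of such a stimulus set can be illustrated with a minimal sketch. It assumes the standard discrete two-dimensional Hartley basis, cas(2π(kx + ly)/M) with cas(θ) = cos θ + sin θ; the function names, the image size, and the index convention are illustrative choices and are not taken from the experimental software. Contrast scaling, the low-pass selection, and the square window are omitted.

```python
import math

def cas(theta):
    """Hartley kernel: cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)

def hartley_image(k, l, size):
    """Return a size x size Hartley basis image as a list of rows.

    k and l are integer spatial-frequency indices (cycles per image width
    along x and y). Flipping the sign of both indices gives the partner
    image that is offset by 90 degrees in spatial phase.
    """
    return [[cas(2.0 * math.pi * (k * x + l * y) / size) for x in range(size)]
            for y in range(size)]

def inner_product(a, b):
    """Pixelwise dot product of two images of equal size."""
    return sum(pa * pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
```

Under these assumptions, distinct basis images have zero inner product, which is the orthogonality property that lets the response to each image be analyzed independently in reverse correlation.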
Analysis.
Neurons were identified as simple or complex based on their responses to drifting gratings, and the designations were checked against spatial phase sensitivity to the 100 ms step stimuli. For each neuron, the F1/F0 modulation ratio was calculated from its response to a high-contrast drifting grating that had been optimized for orientation, spatial frequency, and size. Neurons with F1/F0 modulation ratios above 1 were classified as simple; the rest were complex. This classification was also checked against an F1/F0 modulation ratio taken from the 100 ms step responses (the degree of modulation was measured across spatial phases at the time of the peak response), and the two classification schemes were found to match for 96 of the 106 cells for which significant step responses and layer data were available. The distributions of both versions of the F1/F0 modulation ratio were found to be significantly different from a unimodal distribution by Hartigan's dip test (Hartigan and Hartigan, 1985). Because of the good agreement between the two classification methods, the more traditional drifting grating classification was used.
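As an illustration of the classification rule, the sketch below computes a modulation ratio from a cycle-averaged firing-rate histogram. It assumes that the histogram spans exactly one temporal cycle of the grating and that no spontaneous rate is subtracted from F0; the text does not state either detail, and the function names are hypothetical.

```python
import math

def modulation_ratio(cycle_rates):
    """F1/F0 from a cycle-averaged firing-rate histogram.

    cycle_rates: firing rates in equal-width bins covering exactly one
    temporal cycle of the drifting grating.
    """
    n = len(cycle_rates)
    f0 = sum(cycle_rates) / n  # mean rate (assumption: no baseline subtraction)
    re = sum(r * math.cos(2.0 * math.pi * i / n) for i, r in enumerate(cycle_rates))
    im = sum(r * math.sin(2.0 * math.pi * i / n) for i, r in enumerate(cycle_rates))
    f1 = 2.0 * math.hypot(re, im) / n  # amplitude of the first harmonic
    return f1 / f0 if f0 > 0 else 0.0

def classify(cycle_rates, criterion=1.0):
    """Label a cell 'simple' if F1/F0 exceeds the criterion, else 'complex'."""
    return "simple" if modulation_ratio(cycle_rates) > criterion else "complex"
```

For reference, a half-wave rectified sinusoid has F1/F0 = π/2 (about 1.57), whereas an unmodulated response has F1/F0 = 0.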
To analyze responses to the 100 ms step stimuli, spikes at each spatial phase were first averaged across repeats in 1 ms bins and then smoothed with a 20 ms boxcar. The spatial phase that produced the peak amplitude was identified, and from its response, the response at the opposite phase (180° away) was subtracted. This procedure removes any baseline activity and makes it convenient to use the opposite spatial phase response (inverted) as an estimate of the inhibitory response to the preferred spatial phase; in a linear system, the two would be equal.
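The averaging, smoothing, and subtraction steps can be sketched as follows. This is an illustration only: the centering of the boxcar and its truncation at the edges of the record are assumptions, as are the function names and the 400-bin record length.

```python
import math

def psth(trials, n_bins):
    """Average spikes across repeats in 1 ms bins; result is in spikes/s.

    trials: list of repeats, each a list of spike times (ms from stimulus onset).
    """
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            b = math.floor(t)
            if 0 <= b < n_bins:
                counts[b] += 1
    return [1000.0 * c / len(trials) for c in counts]

def boxcar_smooth(rates, width=20):
    """Running mean over `width` bins (assumption: centered, truncated at edges)."""
    n = len(rates)
    half = width // 2
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + width)
        out.append(sum(rates[lo:hi]) / (hi - lo))
    return out

def combined_step_response(pref_trials, opp_trials, n_bins=400, width=20):
    """Smoothed preferred-phase response minus smoothed opposite-phase response."""
    pref = boxcar_smooth(psth(pref_trials, n_bins), width)
    opp = boxcar_smooth(psth(opp_trials, n_bins), width)
    return [p - o for p, o in zip(pref, opp)]
```

In the combined trace, positive deflections come from the preferred phase and negative deflections from the opposite phase, and any activity common to both phases cancels.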
To judge whether the results of the step were compatible with linear temporal integration, we compared the decay of the step response at the preferred spatial phase with the response at the opposite spatial phase. For a cell that is integrating its inputs linearly in time, the amplitude of the opposite-phase response should equal the amplitude of the decay at the preferred phase (see Appendix). The two measures used for the quantitative comparison were calculated from the smoothed 100 ms step responses. The first statistic, the phase ratio (PR) of the step responses, indicates the size of the opposite-phase response compared with the preferred-phase response. PR is the magnitude of the negative peak of the step response divided by the positive peak magnitude (see Fig. 3). The second statistic, the transience (TR), measured the amount of decay from the preferred-phase peak. TR is 1 minus the ratio of the mean of the smoothed response from 80 to 90 ms after response onset divided by the positive peak response (see Fig. 3). Because there is a lag between the stimulus onset and the onset of the response, we must estimate the actual response onset and offset times. The 80–90 ms range was used to avoid capturing any response evoked by the stimulus offset. Response onset was taken as the last point in the response time course (before the peak) that deviated by <1 SD from baseline. In other words, we stepped one point (1 ms) backward in time from the first point 1 SD away from baseline because that first point was often well up the side of the peak as a result of its rapid rise. The baseline level was the average response in the first 30 ms after each stimulus presentation and the last 30 ms of the blank time between stimuli. These times were chosen because all of the simple cell latencies in this sample appeared to be at least 30 ms. The point we calculated as response onset matches well the point one would choose by eye to estimate the response onset.
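The two statistics and the onset rule can be written compactly. The sketch below is a hedged illustration operating on a combined (preferred minus opposite) trace in 1 ms bins. Several details are assumptions: the 80 to 90 ms window is treated as half-open, the SD is a population SD, the baseline is taken from the first and last 30 bins of a single trace, and all function names are hypothetical.

```python
import statistics

def baseline_stats(trace, n=30):
    """Mean and SD of the first and last n bins (assumption: population SD)."""
    samples = trace[:n] + trace[-n:]
    return statistics.mean(samples), statistics.pstdev(samples)

def response_onset(combined, base_mean, base_sd):
    """Last bin before the positive peak that lies within 1 SD of baseline."""
    peak_idx = combined.index(max(combined))
    for i in range(peak_idx, -1, -1):
        if abs(combined[i] - base_mean) < base_sd:
            return i
    return 0  # fallback if no bin before the peak is within 1 SD

def phase_ratio(combined):
    """PR: magnitude of the negative peak divided by the positive peak."""
    return max(0.0, -min(combined)) / max(combined)

def transience(combined, onset_idx, start=80, stop=90):
    """TR: 1 minus (mean response 80-90 ms after onset) / (positive peak)."""
    window = combined[onset_idx + start:onset_idx + stop]
    late = sum(window) / len(window)
    return 1.0 - late / max(combined)
```

For a linear temporal integrator followed by a zero-threshold rectifier, the two functions should return approximately equal values for the same trace.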
With the responses to the Hartley experiments, two quantities were calculated for each cell: an estimate of the first-order spatiotemporal receptive field and an average temporal response to near-optimal stimuli. The reverse correlation algorithm was used to estimate the first-order spatiotemporal receptive field at 1 ms time resolution. The algorithm implicitly smoothed the correlations with a 20-ms-wide boxcar because that was the duration of the individual stimuli. To calculate the average temporal response to the best stimuli, stimulus-triggered averages were calculated for each grating in the Hartley stimulus sequence as follows. Each time a given grating appeared in the stimulus sequence, the next 400 ms of spikes were counted as one trial. For each grating, the mean was taken across all such trials, the result was smoothed with a 20 ms boxcar, and the responses from gratings that were 180° apart in spatial phase were subtracted from one another. Note that the resulting time courses exactly match the ones that are used to weight each orthogonal image in constructing the reverse-correlation estimate of the first-order receptive field. The time courses were sorted by peak height, and the 10 with the highest amplitude peaks were averaged to obtain the mean response to near-optimal stimuli. To determine whether a neuron should be included in the subsequent analysis, the SD of the noise was calculated from the 60 time points (1 ms bins) in the 60 ms preceding the grating onset, and the neuron was accepted only if its time course reached a peak that exceeded 7 SDs above 0 (the expected mean of the noise). One of 34 simple cells was dropped because of this criterion.
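A stimulus-triggered average of this kind, together with the inclusion test, might be sketched as below. The code is illustrative rather than a reconstruction of the analysis software: function names are hypothetical, the SD is taken as a population SD, and the 20 ms smoothing and the subtraction of the 180° partner are left out for brevity.

```python
import statistics
from bisect import bisect_left

def stimulus_triggered_average(spike_times, onset_times, window=400):
    """Mean spike count per 1 ms bin in the `window` ms after each onset.

    spike_times: sorted spike times in ms.
    onset_times: times (ms) at which one particular grating appeared.
    Multiply the result by 1000 to express it in spikes/s.
    """
    totals = [0] * window
    for onset in onset_times:
        first = bisect_left(spike_times, onset)
        last = bisect_left(spike_times, onset + window)
        for t in spike_times[first:last]:
            # min() guards against floating-point rounding at the window edge
            totals[min(int(t - onset), window - 1)] += 1
    return [c / len(onset_times) for c in totals]

def passes_noise_criterion(time_course, pre_onset_samples, n_sd=7.0):
    """Accept if the peak exceeds n_sd noise SDs above 0 (the expected noise mean)."""
    sd = statistics.pstdev(pre_onset_samples)  # assumption: population SD
    return max(time_course) > n_sd * sd
```

Because each 400 ms trial overlaps many later stimuli in the continuous stream, the average relies on the randomness of the sequence to cancel the contributions of those other stimuli.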
To measure the relative strengths of response to the preferred- and opposite-phase stimuli, a rebound ratio (RR) was calculated from this average temporal response. The RR was defined as the integral of the negative rebound of the temporal response divided by the integral of the positive peak of the response. The positive peak was summed between the peak onset and the first zero crossing; the negative rebound was summed between the zero crossing and the following 100 ms, to match the integration time of the step stimulus. If there was no negative rebound, the RR was set to 0.
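The RR definition translates into a short routine. This sketch assumes a time course in 1 ms bins, takes the first bin at or below zero after the positive peak as the zero crossing, and returns 0 when no negative rebound is found; the function name and these conventions are illustrative assumptions.

```python
def rebound_ratio(time_course, onset_idx, rebound_window=100):
    """RR: summed negative rebound divided by summed positive peak.

    The positive lobe runs from onset_idx to the first zero crossing after
    the peak; the rebound is summed over the following rebound_window bins.
    """
    peak_idx = time_course.index(max(time_course))
    zero_idx = None
    for i in range(peak_idx + 1, len(time_course)):
        if time_course[i] <= 0:
            zero_idx = i
            break
    if zero_idx is None:
        return 0.0  # response never returns to zero within the record
    positive = sum(time_course[onset_idx:zero_idx])
    negative = -sum(time_course[zero_idx:zero_idx + rebound_window])
    if positive <= 0 or negative <= 0:
        return 0.0  # no negative rebound
    return negative / positive
```

A monophasic time course therefore yields an RR of 0, and a biphasic time course whose two lobes have equal area yields an RR of 1.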
Results
Simple cells used in impulse and step experiments
This study focused on V1 simple cells. Except where noted otherwise, the V1 neurons used in this study were restricted to cells that responded to an optimal drifting grating with an F1/F0 modulation ratio >1. Cells were recorded from all depths (Fig. 1). We displayed at least one of our two stimulus sets to 138 well isolated single units during the course of this study. Of this group, 110 could be positively assigned to a cortical layer; 50% of these neurons had F1/F0 ratios >1. The F1/F0 ratios for the simple cells spanned similar ranges in all layers (Fig. 1), and the overall mean F1/F0 for the simple cells in this study was 1.5.
We used two types of stimuli in this study to test the linearity of temporal integration in V1: sinusoids that changed every 20 ms in a 15 min sequence (the Hartley stimuli; see Materials and Methods) and an optimally chosen sinusoid that was flashed for 100 ms between longer blank periods. This latter stimulus, which we call the “step” stimulus because it consisted of a 100-ms-long step from zero contrast to high contrast and back to zero, tested the neurons as they integrated their inputs during a temporally isolated stimulus presentation. Significant step responses were recorded from 51 simple cells, and significant Hartley responses were recorded from 33 simple cells for which layer information was also available (Fig. 1). We first consider the responses to the step stimulus and how well they can be modeled with a linear temporal integrator plus a half-wave rectifier. We then explore how other nonlinear functions might explain the deviations from linear responses, and, finally, we examine how the results compare with the responses to the Hartley stimuli.
Step responses
Figure 2 shows spike rasters and histograms collected from an example simple cell. All step stimuli in this study were displayed for 100 ms at the orientation, spatial frequency, and spatial extent that had produced peak responses in drifting grating experiments. Each sinusoid was repeatedly presented at 8 or 16 spatial phases equally spaced through the full 360°, and successive stimuli were separated in time by 300 ms for 32 neurons and 200 ms for 19 neurons. Stimulus contrast was high (80–99%), and the background was maintained at the mean luminance value. In Figure 2, the response to flashed sinusoids is displayed at two of the 16 spatial phases that were shown to this example cell. At the spatial phase (337°) that had the largest peak spike rate, the neuron started spiking vigorously around 40 ms but slowed during the stimulus presentation to a rate of ∼15% of the peak value (Fig. 2, left). At the opposite spatial phase (157°), the neuron responded, after the stimulus was shut off, with an average spike rate that reached ∼70% of the preferred-peak response (Fig. 2, right). We quantified both the preferred-phase decay and the opposite-phase response (Fig. 3). These two measures, the TR and the PR, are useful because their magnitudes are equal in a linear system (or a linear system plus a half-wave rectifier), as shown in the Appendix. In such a system, a big decay at the preferred phase is associated with a large “off” response at the opposite phase; both the decay and the off response occur because the impulse response that is integrated is strongly biphasic with a large negative rebound. Conversely, a small decay at the preferred phase predicts that the off response at the opposite phase should be correspondingly small.
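The equality of TR and PR in a linear system followed by a half-wave rectifier can be checked numerically. The following toy simulation is a sketch under stated assumptions, not the model of the Appendix: the biphasic impulse response (a 20 ms positive lobe followed by a 20 ms negative lobe of 0.6 times the amplitude) is hypothetical, response latency is ignored so that onset is at time 0, and the impulse response is assumed to have settled well before the 80 to 90 ms measurement window.

```python
def convolve(stim, kernel):
    """Causal discrete convolution, truncated to the length of stim."""
    out = [0.0] * len(stim)
    for t in range(len(stim)):
        for k, h in enumerate(kernel):
            if t - k >= 0:
                out[t] += h * stim[t - k]
    return out

def rectify(values, threshold=0.0):
    """Half-wave rectifier with an optional threshold."""
    return [max(0.0, v - threshold) for v in values]

def simulate_step(kernel, step_ms=100, total_ms=300):
    """Return (TR, PR) for a linear filter plus zero-threshold rectifier."""
    stim = [1.0 if t < step_ms else 0.0 for t in range(total_ms)]
    linear = convolve(stim, kernel)
    pref = rectify(linear)                  # preferred-phase step
    opp = rectify([-v for v in linear])     # opposite phase: inverted drive
    combined = [p - o for p, o in zip(pref, opp)]
    peak = max(combined)
    late = sum(combined[80:90]) / 10.0
    tr = 1.0 - late / peak
    pr = max(0.0, -min(combined)) / peak
    return tr, pr

# Hypothetical biphasic impulse response, 1 ms bins.
kernel = [1.0] * 20 + [-0.6] * 20
tr, pr = simulate_step(kernel)
```

With this kernel the step response peaks at 20, settles to 8 during the stimulus, and swings to -12 after stimulus offset, so TR and PR both come out at 0.6 (up to floating-point rounding). A monophasic kernel gives 0 for both measures.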
The cell shown in Figure 2 (bottom) has a TR of 0.84 and a PR of 0.71. The values are fairly close, as we would expect if the responses were generated by the linear integration of a single underlying impulse response plus a threshold at zero. But the TR and PR are not exactly matched. Given the strong decay at the preferred phase (TR, 0.84), a linear model would predict a slightly bigger opposite-phase response than was measured (PR, 0.71).
Figure 4 consists of examples of the diversity of step responses produced by V1 simple cells. The first example neuron (Fig. 4A) is a robustly responding layer 4B cell. The top box depicts the average response to each spatial phase, shown in separate rows; time progresses from left to right. This cell is obviously phase sensitive, as expected of a simple cell. Near the preferred spatial phase (Fig. 4, lower rows), it produces a response that peaks shortly after response onset and decays while the stimulus is still on. At the opposite spatial phase (Fig. 4, upper rows), spikes appear after the sinusoid is turned off. This cell responds to all spatial phases, but the time course of its responses varies with phase.
Figure 4 also displays for each cell a combined step-response time course (response at preferred phase minus opposite). The values of the TR and PR are recorded above and to the right of the time course. The tight match between these two measures in the Figure 4A example cell (TR, 0.53; PR, 0.52) is consistent with the hypothesis that this cell is linearly integrating its inputs (see Appendix).
As indicated from the set of examples in Figure 4, most V1 cells had transient step responses (cf. Müller et al., 2001). For many of the transiently responding cells, responses at the opposite spatial phase were smaller than expected from a linear device. The example cell in Figure 4C from layer 4C has a response around the preferred phase that decays from its peak almost completely to zero and then is reliably followed by a small increase in spiking. The cell responds to the opposite spatial phase after stimulus offset, resulting in a negative peak in the calculated time course. But the opposite-phase response is smaller (PR, 0.51) than expected given the big decay from the positive peak (TR, 0.78). Example cells from layer 2/3 (Fig. 4E,F) and layer 5 (Fig. 4G) are even more extreme examples of this mismatch between onset transients and opposite-phase responses. Cells in Figure 4E–G have very transient step responses at the preferred spatial phase but have little or no opposite-phase responses. In such cells, the TR and PR are quite dissimilar, and a dynamic linear filter followed by a half-wave rectifier is not an adequate model for the responses.
Figure 4 also contains examples of V1 cell responses that were sustained through most of the 100 ms step. The cell in Figure 4B, recorded in layer 4C, is one example of a cell with a sustained step response. Its response decayed only slightly while the stimulus was still on the screen (TR, 0.38), and there was very little response from the opposite spatial phase at any time (PR, 0.10). The cell fired at a steady background rate, but no obvious suppression is visible at either the preferred or opposite spatial phases. The examples in Figure 4, D and H, likewise show sustained responses from cells selected from layer 2/3 and layer 6. These latter two cells have no background spiking. We do not mean to suggest that cells in Figure 4, B, D, and H, would have completely sustained responses with a stimulus of any duration (it may be that a longer-lasting stimulus would uncover slow decay rates or would activate additional suppressive mechanisms). But as measured, the responses in Figure 4, B, D, and H, are consistent with a monophasic impulse response and linear temporal integration. We note that the examples in Figure 4, B, D, and H, are among the most sustained of all simple cells we studied. Such sustained cells were found much less often than transient cells, as can be seen in the laminar analysis below. Some previous reports of cells with sensitivity to a single contrast polarity found only sustained responses, similar to cells in Figure 4, B, D, and H (Kagan et al., 2002).
Laminar analysis of the step responses
Next, we offer an analysis of step responses across the V1 population we studied. To find out the typical strength of the response at the opposite phase, we measured the PR in a population of 51 simple cells (Fig. 5). Clearly the opposite-phase response varies in a layer-specific way. The layer 4 cells have a broad range of PRs, with especially large opposite-phase responses found in or near layer 4B. Small and moderate opposite-phase responses are found in layers 4B and 4C as well, and the median PR for layer 4 is 0.50. However, there is a striking pattern in the distribution of opposite-phase responses in layer 2/3. In contrast with layer 4, cells in layer 2/3 mostly have no response at the opposite spatial phase (median PR, 0.00). In one case in layer 2/3 (PR, 1.2) and one in layer 4 (PR, 1.8), the rebound response was larger than the onset response at any phase; this is usually called a “lagged” response (cf. Saul et al., 2005). Excluding the one lagged cell in layer 2/3, the two largest PR values in layer 2/3 were 0.23 and 0.15; all of the remaining values were <0.1. In layers 5 and 6, the population appears not to be as restricted as in layer 2/3 (one-third of these cells have a PR of at least 0.2). We next consider the actual measured degree of the TR of the preferred-phase responses.
Figure 6 presents the TR statistic calculated for the same 51 simple cells as shown in Figure 5. The histograms in Figure 6 show that macaque V1 contains a broad collection of sustained and transient cells, and the TR values vary substantially less by layer than do the PRs. The overall median TR is 0.67, but values range from 0.15 (highly sustained) to 1 (completely transient during the 100 ms stimulus). Layer 4 cells in this collection (Fig. 6, right, middle row) are slightly more transient (median TR, 0.76) than the overall average but generally possess a broad distribution of values, which is unsurprising given that both magnocellular (4Cα) and parvocellular (4Cβ) input layers are included. Cells in layers 2, 3, 5, and 6 likewise span the range from transient to sustained.
Comparison between TR and PR
For layer 2/3 particularly, the measured TR values are strikingly incongruent with the linear prediction based on the PR. Comparing Figures 5 and 6, it can be seen that nearly all layer 2/3 cells have a tiny opposite-phase response (Fig. 5), but that these same cells do not all have correspondingly small decays at the preferred phase (Fig. 6); in fact, more than half have TR values >0.5. Recall that in the linear model, a small opposite spatial phase response (PR) corresponds to a small decay in the preferred-phase response (TR). Many of these V1 neurons do not fit that expectation.
To judge whether the step stimuli produce responses that are consistent with linear temporal integration on a cell-by-cell basis, the PR and TR for the step responses are compared directly in Figure 7. In layer 2/3 (Fig. 7A), the values of the TR and PR are poorly matched, indicating that a strong nonlinearity resides in those layers. Although there is a broad range of TR values as seen previously in Figure 6, there is no correspondence with the low PR values measured from the opposite spatial phase. Some suppressive mechanism appears to be keeping these neurons from spiking when stimulated by the opposite spatial phase stimulus.
Figure 7B shows that many layer 4 neurons are not far off from the prediction of a model that consists of a dynamic linear filter followed by half-wave rectification. Note that there are cells at both ends of the scale: there are some very transient cells that have a correspondingly strong response at the opposite spatial phase (both the TR and PR are near 1), whereas other cells are fairly well matched in the mid-range and low end of these measures. Thus, many simple cells in layer 4 seem to come close to a purely linear response by this measure, although there is an overall bias toward having a higher TR than predicted by the opposite-phase response. Four layer 4 neurons have more obviously nonlinear responses, with a low PR but a high TR, like the layer 2/3 neurons. Some neurons in layers 5 and 6 (Fig. 7C) have a poorly matched PR and TR, whereas others are approximately consistent with linear behavior; we studied too few neurons from these layers to make a strong conclusion. Note that two cells in layers 5 and 6 that have high TR values also have fairly high PRs, a phenomenon that was not observed in layer 2/3.
Effect of other static nonlinearities
The PR equals the TR value for neurons that are well modeled by a dynamic linear filter, followed by a zero-threshold half-wave rectifier, the model shown in the Appendix. Perhaps a different form of static nonlinearity would be better able to explain the discrepancies between the PR and TR shown in Figure 7. In most neurons, the TR was larger than the PR. A nonzero threshold or accelerating function could exaggerate the higher response rate at the preferred spatial phase over the opposite and simultaneously make the preferred-phase response more transient. As we argue below, a higher threshold is a reasonable explanation for cases in which the TR value is slightly greater than the PR, but it is an unlikely cause for large discrepancies.
We examine the effect of a nonzero threshold in Figure 8. Consider first a neuron with a linear output that is quite transient (Fig. 8A). When the static nonlinearity is a zero-threshold half-wave rectifier, the PR and TR are equal (PR, 0.71; TR, 0.71). With a small increase in threshold, the measured TR goes up (TR, 0.91) and the PR decreases somewhat (PR, 0.63). So, the layer 4 cells with large TR values may have been shifted above the 1:1 line (Fig. 7B) by a nonzero threshold. At the other end, consider a neuron with only a slightly transient linear output (Fig. 8B). A small increase in threshold has little effect on the TR value but decreases the PR. This would cause a small shift to the left of the 1:1 line for the neurons with low TRs (Fig. 7). Small increases in threshold can explain the small discrepancies between TR values and the PR observed in many neurons in layers 4–6 (Fig. 7B,C).
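The threshold argument can be made concrete with simple arithmetic. The sketch below assumes a settled linear output described by two numbers, a peak P and a late plateau E during the step, so that the opposite-phase offset response has linear amplitude P − E; the output nonlinearity is taken to be max(0, v − θ). The function name and the parameter values in the usage note are hypothetical and were chosen only for illustration.

```python
def tr_pr_with_threshold(peak, plateau, threshold):
    """TR and PR after thresholding a settled linear step response.

    peak: linear response at the preferred-phase peak (P).
    plateau: linear response late in the step (E).
    threshold: amount subtracted before half-wave rectification.
    Returns None if the threshold abolishes the preferred-phase response.
    """
    def rect(v):
        return max(0.0, v - threshold)

    pref_peak = rect(peak)
    if pref_peak == 0.0:
        return None
    pref_late = rect(plateau)
    opp_peak = rect(peak - plateau)  # offset response at the opposite phase
    tr = 1.0 - pref_late / pref_peak
    pr = opp_peak / pref_peak
    return tr, pr
```

For a transient output with P = 1 and E = 0.29, a zero threshold gives TR = PR = 0.71, and a threshold of 0.22 gives TR ≈ 0.91 and PR ≈ 0.63, in line with the direction and size of the shifts described above. For a more sustained output (P = 1, E = 0.8), a threshold of 0.2 removes the opposite-phase response entirely (PR = 0) while raising TR only from 0.20 to 0.25.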
Larger discrepancies between the PR and TR are not so plausibly explained with higher thresholds. The layer 2/3 opposite-phase responses are nearly all zero, whereas their TR values vary widely (Fig. 7A). Figure 8A shows that for linear outputs that are sufficiently transient, a threshold that is high enough to eliminate the opposite-phase response (right) will also lead to a completely transient preferred-phase response (left). Only one neuron, a layer 2/3 cell, in our sample fit this combination of a PR of 0 and a TR of 1 (Fig. 7A). Starting instead with a linear filter that produces a less transient response (Fig. 8B), the response at the opposite spatial phase can be eliminated with a fairly low threshold. But the TR value will also be small unless threshold is set very high. To summarize, to create the large discrepancies between PR and TR values observed particularly in layer 2/3, an LN model requires specific outputs from the linear filter (weak decay) as well as very high thresholds. Such high thresholds nearly eliminate the preferred-phase response as well. Any other form of static nonlinearity that is capable of suppressing the opposite-phase responses will have these same issues. For these reasons, we believe that a static nonlinearity is not a compelling explanation for the layer 2/3 results. One possibility is that layer 2/3 simple cells are more susceptible than most layer 4 neurons to a delayed suppression that affects the off responses.
Responses to rapid random sinusoids
Because the step responses reflect a dynamic nonlinearity, perhaps a delayed suppression, it is important to use other stimulus sequences with different time courses to determine whether they uncover the same nonlinearity. The signature of the nonlinearity that we observed in the step responses was a smaller-than-expected response to the opposite-to-preferred-phase stimuli. Excitatory responses to opposite-phase stimuli were uniformly lacking in layer 2/3 neurons, whereas many layer 4B and 4C neurons responded robustly to both preferred- and opposite-phase stimuli. To see whether this same pattern of results would hold for a stimulus sequence with a different time course, we presented Hartley stimuli, a steady stream of briefly displayed sinusoidal gratings.
We used the Hartley stimuli (see Materials and Methods) to measure preferred- and opposite-phase responses in a sequence that was very different from the time course of the 100 ms step stimuli. The Hartley stimuli consist of orthogonal sinusoids of many orientations, spatial frequencies, and spatial phases, presented continuously at high contrast. This stimulus sequence can be used to estimate the first-order spatiotemporal receptive fields of simple cells using reverse correlation (Ringach et al., 1997). Here, its primary purpose is to present near-optimal sinusoids with a time course that is very different from that of the 100 ms step stimuli. Whereas the step stimuli are presented between periods of the mean gray, the Hartley stimulus sequence maintains a high-contrast stimulus on the screen at all times. Many non-optimal stimuli are included in the sequence, but because the individual sinusoids are each good at driving some V1 neurons, the local network is presumably more continuously active than with the step stimuli.
We analyzed the average responses to near-optimal stimuli in the Hartley sequence, as shown in Figure 9. Figure 9A shows the responses of a layer 4Cα simple cell to a single grating taken from the Hartley experiment. The responses are aligned at the onset of the grating every time it occurs in the long random sequence. The top half of the plot shows the responses at the preferred spatial phase, and an elevation in the spike rate is evident shortly after the stimulus appears. At the opposite spatial phase, the spike rate also increases, although at a longer delay than for the preferred-phase stimulus. The stimulus-triggered average time courses for the two spatial phases, smoothed with a 20 ms boxcar, are shown at the bottom of the plot. Both preferred- and opposite-phase stimuli produce elevations in the spike rate, as was the case in most of the layer 4 100 ms step responses.
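The stimulus-triggered averaging described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names, the 1 ms bin width, the 200 ms analysis window, and the handling of the boxcar at the edges are all assumptions.

```python
def stimulus_triggered_average(spike_times_ms, onset_times_ms, window_ms=200, bin_ms=1):
    """Mean spike rate (spikes/s) in bins aligned to each onset of one grating."""
    n_bins = window_ms // bin_ms
    counts = [0] * n_bins
    for onset in onset_times_ms:
        for spike in spike_times_ms:
            lag = spike - onset
            if 0 <= lag < window_ms:
                counts[int(lag // bin_ms)] += 1
    # Convert summed counts to a rate averaged over presentations.
    scale = 1000.0 / (bin_ms * len(onset_times_ms))
    return [c * scale for c in counts]


def boxcar_smooth(rate, width_bins=20):
    """Moving average over width_bins samples, truncated at the array edges."""
    half = width_bins // 2
    smoothed = []
    for i in range(len(rate)):
        window = rate[max(0, i - half):min(len(rate), i + half)]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Calling `boxcar_smooth(stimulus_triggered_average(spikes, onsets))` once for the preferred-phase onsets and once for the opposite-phase onsets would give two time courses comparable in form to those at the bottom of Figure 9A.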
If the dynamic nonlinearity uncovered by the 100 ms step stimuli is active in the responses to the Hartley stimuli, its effects should be visible in the responses to individual components of the Hartley stimulus set. Figure 9B shows the responses of a layer 2/3 simple cell aligned to the appearance of a single grating from the Hartley experiment. At the preferred spatial phase, the spike rate increases shortly after the stimulus appears; at the opposite phase, there is no increase above the average firing rate. This pattern is similar to what was seen in the step responses particularly in layer 2/3. Figure 9C shows the responses from a complex cell for comparison. As expected, responses are not sensitive to the spatial phase of the stimulus.
Responses to near-optimal gratings in the Hartley sequence were averaged, and opposite phases were subtracted for quantitative comparison to the step responses. In Figure 10A, the responses of a simple cell to the 10 best gratings have been averaged to show the typical response to stimuli that are close to the preferred spatial frequency and orientation. These time courses were the 10 smoothed stimulus-triggered averages that had the highest response peaks. Opposite-phase responses have been subtracted from the preferred-phase responses, following the same format as for the step stimuli. This neuron is the same 4Cα neuron for which the response to a single Hartley component was shown (Fig. 9A). Like the response in Figure 9A, this average time course is biphasic, reflecting the vigorous response to both preferred- and opposite-phase stimuli at near-optimal spatial frequencies and orientations. The RR for this neuron, defined as the area of the negative peak divided by the area of the positive peak, is 0.50. This value falls within the range reported for similar measures in the lateral geniculate nucleus (LGN) (Alonso et al., 2001; Reid and Shapley, 2002), although it turns out to be atypically high for the cortical values that we measured. In macaque LGN, the average RR for parvocellular neurons is 0.4–0.5; for magnocellular neurons, the average value is 1–1.1 (Reid and Shapley, 2002). Figure 10B shows the first-order spatiotemporal receptive field estimated from the responses to the entire Hartley stimulus sequence. The full two-dimensional spatial response is shown at 10 sample times, with the response peak occurring in the third frame at 56 ms. The frames are ordered in the forward direction, so the times indicate the number of milliseconds passed since stimulus onset. 
After the peak at 56 ms, the initial response decays and then reverses polarity and reaches a new peak between 96 and 106 ms, as expected from the average time course of the largest components (Fig. 10A).
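The RR described above can be computed from a pair of stimulus-triggered averages roughly as follows. This is a sketch under assumptions: the function name is hypothetical, lobe areas are approximated by summing all positive and all negative samples of the difference time course, and baseline handling and noise are ignored.

```python
def rebound_ratio(pref_course, opp_course):
    """Area of the negative lobe divided by area of the positive lobe."""
    # Subtract opposite-phase from preferred-phase responses, sample by sample.
    diff = [p - o for p, o in zip(pref_course, opp_course)]
    positive_area = sum(v for v in diff if v > 0)
    negative_area = -sum(v for v in diff if v < 0)
    if positive_area <= 0:
        return float("nan")  # no excitatory lobe, so the ratio is undefined
    return negative_area / positive_area


# Toy time courses (arbitrary units): a biphasic cell and a monophasic cell.
pref = [0, 4, 8, 4, 0, 0, 0, 0]
opp_biphasic = [0, 0, 0, 0, 2, 4, 2, 0]
opp_monophasic = [0, 0, 0, 0, 0, 0, 0, 0]

rr_biphasic = rebound_ratio(pref, opp_biphasic)
rr_monophasic = rebound_ratio(pref, opp_monophasic)
```

With these toy inputs the biphasic example has an RR of 0.5 and the monophasic example has an RR of 0, which correspond to the two kinds of time course contrasted in the text.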
The layer 2/3 simple cell from Figure 9B shows a complete lack of opposite-phase response in its average time course from near-optimal stimuli. Figure 11A shows the average time course of the top 10 components of the Hartley experiment for the neuron from Figure 9B. The time course is monophasic, meaning that for each of the stimuli included in the average, there is no response to the opposite-phase stimulus. Because the response lacks a negative dip, the RR is low (RR, 0). The estimate of the first-order spatiotemporal receptive field, shown in Figure 11B, likewise exhibits a monophasic temporal response. The activity of this cell builds to a peak at 71 ms and tapers off without ever reversing polarity. A completely monophasic response of this sort is almost never seen in response to reverse correlation stimuli in the LGN; even the most extreme neurons in cat and macaque LGN still have negative rebounds that are on the order of 20% of the area of the positive lobe (Alonso et al., 2001; Reid and Shapley, 2002). But monophasic impulse responses are observed in input layers in cat and macaque V1 (Alonso et al., 2001; Saul et al., 2005). Note that as with the step stimuli, the lack of an excitatory response from opposite-phase stimuli does not by itself indicate the presence of a nonlinearity. However, the results of the step experiments indicate that most layer 2/3 neurons do have a dynamic nonlinear contribution to their responses, which suggests that the first-order receptive field shown here is an incomplete characterization of the properties of this neuron. Furthermore, if the first-order characteristics were measured with some other stimulus set, such as drifting grating tuning curves or a sparse noise stimulus set, very different first-order receptive fields might be obtained.
RR by layer
Across the simple cells run with the Hartley stimuli, it is clear that RRs are small, particularly in layer 2/3. The RRs from 33 simple cells are shown in Figure 12. It is immediately apparent from this figure that the RRs are all biased toward low values. The median RR in layer 4 cells (RR, 0.24) is considerably lower than the reported LGN values cited above. This reduction from the LGN to V1 agrees qualitatively with the data in cat, in which the RR drops from an average LGN value of 0.8–1 to a V1 input layer average of 0.4–0.6 (Alonso et al., 2001). Although the average RR in layer 4 cells is already lower than LGN levels, what is really striking about these results is that cells outside of layer 4 are particularly strongly skewed toward the monophasic end of the scale. In layer 2/3, almost all neurons have nearly monophasic responses: all cells have rebounds that are <20% the size of the positive lobes (Fig. 12), meaning that cells in these layers are more monophasic than any of the most extreme LGN neurons. Because of differing stimulus sets, contrast levels, frame rates, and smoothing windows, one should be cautious in comparing the impulse responses of previous studies of the LGN to our results in V1. But our V1 results by themselves reveal a substantial shift in temporal properties from layer 4 to upper layers in the cortex.
Suppression of the opposite-phase response
The low RR values seen with the Hartley stimuli in layer 2/3 mirror the low PRs obtained from the 100 ms step stimuli (Fig. 5). Both types of stimuli fail to evoke excitatory opposite-phase responses in layer 2/3, whereas neurons in other layers can generate opposite-phase responses of varying magnitudes (Figs. 5, 12). It has been noted in reference to sparse-noise stimuli that unequal responses to opposite-polarity stimuli would create deviations from linearity (Palmer et al., 1991; DeAngelis et al., 1993; McLean et al., 1994). DeAngelis et al. (1993) observed in some neurons in cat V1 a discrepancy between drifting grating temporal frequency tuning and receptive fields measured using sparse noise. They proposed that the mismatch was a dynamic nonlinearity that might have reflected unequal activation by bright and dark stimuli. Here, we have directly observed that layer 2/3 neurons typically produce spikes to stimuli of only one contrast polarity, and with the 100 ms step responses, we have shown that this single-polarity response reflects a dynamic nonlinearity.
Discussion
We have shown here that simple cells in layer 2/3 typically respond strongly at one spatial phase of an optimal sinusoidal grating, but not at the opposite spatial phase. For cells with sustained responses (only 8% of the population we studied), this result is the linear expectation. However, for most V1 cells with transient step responses, the absence of responses to the opposite spatial phase implies a nonlinearity. In layer 2/3, the monophasic time courses of the first-order spatiotemporal kernel reflect this same nonlinear response suppression at opposite phases. Thus, in V1, simple cells in layer 2/3 and some in layer 4 possess spatial phase sensitivity but not linearity.
Because high values of the F1/F0 modulation ratio are associated with linearity and are used to define simple cells, it may seem paradoxical that simple cells can exhibit nonlinear behavior. But drifting gratings, which are used to calculate the F1/F0 modulation ratio, measure a steady-state response. Even if a neuron possesses time-varying nonlinearities, if its responses vary with spatial phase, it will produce high F1/F0 values. The dynamic nonlinearity observed in our results actually enhances spatial phase sensitivity in instantaneous and steady-state responses. Our results emphasize the point that the F1/F0 ratio only indirectly gauges linearity via spatial and temporal phase sensitivity. Although linear devices generate high F1/F0 ratios, not all devices with high F1/F0 ratios are linear.
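The point that a high F1/F0 ratio does not by itself establish linearity can be illustrated numerically. The sketch below computes the modulation ratio for two hypothetical steady-state responses to one cycle of a drifting grating: a half-wave rectified sinusoid and the same sinusoid passed through a rectifier with a nonzero threshold. The function name, the sample count, and the threshold of 0.5 are assumptions made for illustration.

```python
import math


def f1_f0(rate, cycles=1):
    """Ratio of the first-harmonic amplitude to the mean of a periodic response."""
    n = len(rate)
    f0 = sum(rate) / n
    re = sum(r * math.cos(2 * math.pi * cycles * i / n) for i, r in enumerate(rate))
    im = sum(r * math.sin(2 * math.pi * cycles * i / n) for i, r in enumerate(rate))
    f1 = 2.0 * math.hypot(re, im) / n  # amplitude of the component at the drift frequency
    return f1 / f0


n = 1000
drive = [math.sin(2 * math.pi * i / n) for i in range(n)]

half_wave = [max(0.0, d) for d in drive]          # zero-threshold rectifier
thresholded = [max(0.0, d - 0.5) for d in drive]  # rectifier with threshold 0.5

ratio_half_wave = f1_f0(half_wave)
ratio_thresholded = f1_f0(thresholded)
```

The half-wave rectified response gives a ratio close to π/2, and raising the threshold pushes the ratio higher still, so a stronger static nonlinearity increases rather than decreases this index of "linearity".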
Contrast polarity and perception
Sensitivity to contrast polarity is an important early step in building the perception of surface brightness and color. Consider, for example, a neuron that transiently responds to one spatial phase but not its opposite (Fig. 13). The cell is signaling, in this instance, “dark left–bright right” and not the reverse, because there is no response to the opposite spatial phase. Thus, this cell can take part in encoding a bright surface that lies to its right. If this neuron were linear, it would respond to both contrast polarities. Part of the off response to the opposite-phase stimulus would get integrated into the response to the next fixation, making the relationship between spatial phase and firing rate ambiguous. The neuron could signal the border formed by the local contrast change, but it would not provide information that could be used for surface perception unless it made use of a temporal code (e.g., Aronov et al., 2003). The observed off-response suppression operates without any need for a high-level inhibitory signal linked to saccades.
The usefulness of contrast-polarity sensitivity is not restricted to receptive fields with only two subunits. One theory of how surface features are reliably extracted and represented in early vision [MIRAGE model (Watt and Morgan, 1985; Morgan and Watt, 1997)] requires separate contrast-polarity channels for all spatial frequency filters. To achieve a high degree of localization in the presence of noise, this model divides the rectified responses from individual spatial frequency filters into positive and negative (opposite spatial phase) channels; summing the filters within the polarity-sensitive channels helps explain human psychophysical results on the accuracy of spatial localization (Watt and Morgan, 1985). Our analysis of the nonlinear responses of layer 2/3 cells suggests that they could serve as elements in such a model (Morgan and Watt, 1997).
Relationship to previous work on cells responsive to a single contrast polarity
Previous reports of cells that respond to a single contrast polarity have been inconsistent regarding their prevalence, laminar distribution, and spatial and temporal properties. Some studies used only bright stimuli and observed cells that produced excitation or suppression but not both (Toyama and Takeda, 1974; Gilbert, 1977; Palmer and Davis, 1981), but without responses to dark stimuli, it is impossible to know whether these cells were sensitive to only one polarity. In other studies that used bright and dark stimuli to document single-polarity responses, only sustained responses were observed. Such a result could be explained if these were linear cells with monophasic temporal impulse responses [T cells (Schiller et al., 1976)] (Kagan et al., 2002). But nonlinear response patterns have been reported in cells with only a single receptive field subunit [S1 cells (Schiller et al., 1976)] (Bishop et al., 1971; Hirsch et al., 2002; Martinez et al., 2005). One group previously addressed the ability of V1 and V2 neurons to use edge attributes to represent surface properties and found a nonlinear response to contrast polarity from many upper-layer V1 neurons as we observed (Zhou et al., 2000; Friedman et al., 2003). Some of the inconsistency in previous reports could stem from the challenges in using single bars to identify receptive field subunits (Schiller et al., 1976; Tolhurst and Dean, 1987) or from weak responses of V1 cells to flashed spots (Martinez et al., 2005). We used optimal flashed sinusoidal gratings because they drive most cells (Müller et al., 2001; Albrecht et al., 2002; Aronov et al., 2003) and because gratings can be used to map the first-order component of the receptive field efficiently (Ringach et al., 1997). A previous study by Aronov et al. (2003; their Fig. 2A) used flashed gratings and reported neurons that were unresponsive to opposite spatial phases like the layer 2/3 neurons that we studied.
Laminar data about simple/complex properties are available from Hirsch et al. (2002), Martinez et al. (2005), and Schiller et al. (1976). Our data agree more with Hirsch et al. (2002) and Martinez et al. (2005) than with Schiller et al. (1976) on the laminar localization of nonlinear responses to contrast polarity. Schiller et al. (1976) found cells responsive to a single contrast polarity in all layers of monkey V1. They reported that single contrast-polarity cells had only a single subunit in their receptive field, although the authors acknowledged that it would have been difficult to observe multiple single-polarity subunits with their methods. Martinez et al. (2005) used flashed squares to map cat V1 receptive fields while recording intracellularly and found that nearly all cells in layers 2, 3, 5, and lower 6 had either polarity-insensitive responses (“push–push”) or single-polarity responses (“push–null”). Their single-polarity cells always had only a single receptive field subunit. Although we agree with Martinez et al. (2005) that single polarity cells are more prevalent in layer 2/3 than in layer 4, the single polarity cells we found in macaque V1 usually had first-order spatial receptive fields with multiple subregions as in Figure 11B. Zhou et al. (2000) and Friedman et al. (2003) documented that contrast-polarity-specific responses are common in macaque layer 2/3 cells. They presented layer 2/3 cells with color- and luminance-modulated borders and reported a median value of 0.2 for the ratio of integrated off to integrated on responses (Friedman et al., 2003). This is comparable to the degree of opposite-phase suppression that we observed. However, in V1, Zhou et al. (2000) and Friedman et al. (2003) only studied layer 2/3 cells, so one cannot draw conclusions about interlaminar differences from their data.
We believe that it is important to document the presence of multiple receptive field subregions in the macaque layer 2/3 cells we studied. The number of zero-crossings of the first-order response (as in Fig. 11B) at the peak response time is a measure of the multiplicity of subregions, and Figure 14 shows the frequency distribution of this quantity. This distribution indicates that layer 2/3 simple cells are at least as spatially structured as simple cells in layer 4C. But the responses of layer 2/3 neurons are more typically nonlinear than layer 4 responses. The discrepancy between our receptive field measurements and other studies that found only single subregions in contrast-polarity-sensitive neurons (Bishop et al., 1971; Schiller et al., 1976; Hirsch et al., 2002; Kagan et al., 2002; Martinez et al., 2005) illustrates an important point about nonlinear responses: neurons with dynamic nonlinearities in their responses may have different-looking first-order receptive fields when different stimulus sets are used (Victor et al., 2006). We believe that the first-order receptive field structure that we measured here is different from previous studies, not because we recorded from a new class of neurons but because we used a different stimulus set to calculate first-order responses from nonlinear neurons. First-order responses even from simple cells need to be interpreted cautiously if nonlinear responses are common in V1 (David et al., 2004; Rust et al., 2005; Victor et al., 2006). It seems likely that even in simple cells, second-order (Jacobson et al., 1993; Rust et al., 2005) and higher-order interactions will need to be analyzed to fully characterize V1 neurons, particularly in layer 2/3.
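One simple way to count zero-crossings in a receptive field profile is sketched below. It assumes a one-dimensional cross section through the first-order receptive field at the peak response time (for example, perpendicular to the preferred orientation) and a hypothetical noise floor below which samples are ignored; the text does not specify how the counts in Figure 14 were obtained, so this is an illustration only.

```python
def count_zero_crossings(profile, noise_floor=0.0):
    """Number of sign changes between samples that exceed the noise floor."""
    # Keep only the sign of samples whose magnitude is above the noise floor.
    signs = [1 if v > 0 else -1 for v in profile if abs(v) > noise_floor]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Toy spatial profiles (arbitrary units).
three_subregions = [0.1, 0.8, 0.2, -0.5, -0.9, -0.1, 0.4, 0.6]
one_subregion = [0.2, 0.9, 0.3]

crossings_multi = count_zero_crossings(three_subregions, noise_floor=0.15)
crossings_single = count_zero_crossings(one_subregion, noise_floor=0.15)
```

In this toy case the first profile has two zero-crossings (three subregions of alternating sign) and the second has none (a single subregion).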
Nonlinear mechanism
Because the opposite phase is strongly represented in the LGN signal, and to a lesser degree in layer 4 cells, its near absence outside of layer 4 requires a mechanistic explanation. Perhaps a single mechanism suppresses the opposite spatial phase response in both the Hartley and the step experiments. Many complex cells respond to all spatial phases at both the onset and offset of a 100 ms step and also respond at all spatial phases to components of the Hartley set (Fig. 9C). Thus, inhibitory complex cells with projections to neighboring simple cells could be responsible for the lack of response at opposite spatial phases of the simple cells. Figure 15 illustrates two example time courses of complex cell step responses, together with simple cells from the matching layers. Complex cells could be a source for intracortical inhibition to neighboring simple cells at both stimulus onset and offset. Layer 4 cells receiving strong direct feedforward LGN input would have sufficient excitation to overcome the inhibition at both preferred and opposite phases. In layer 2/3, where the percentage of complex cells is higher than in layer 4C, complex cell-driven intracortical inhibition might be strong enough to overcome most responses at offset without eliminating onset responses. The details of response dynamics in different cell types and in different layers may be crucial for determining the amount of nonlinear suppression from such a mechanism.
Appendix
We show here how the linearity of a neuron can be tested by evaluating its responses to briefly flashed opposite-polarity stimuli (e.g., 100 ms presentations of two sinusoidal gratings that have spatial phases 180° apart). Figure 16 shows the responses of a hypothetical linear neuron to a temporal impulse and to a temporal step. Because spike rates are rectified (only rates above zero can be measured), the illustrated responses were obtained by taking the response from one contrast polarity and subtracting the response to the opposite contrast polarity. Linear devices respond to opposite-signed stimuli with opposite-signed responses, so this subtraction recreates the unrectified original response. The rectification threshold is taken to be zero. For other thresholds or other static nonlinearities, the results discussed here are modified (see Fig. 8 and the accompanying text in Results).
The example impulse response in Figure 16 is biphasic; there are two quantifiable features of the step response that should reflect this if the neuron being tested is linear. We measure the biphasicness of the impulse response with the RR, which we define to be the area of the negative lobe of the impulse response divided by the area of its positive lobe. In Figure 16, regions R1, R2, and R3 in the impulse response are all the same duration as the step that produced the step response. Because the step response can be formed by convolving the impulse response with this step, the heights P1, P2, and P3 are equal to the impulse response summed over regions R1, R2, and R3. If the step is longer than the impulse response, then the impulse response summed over regions R1 and R3 gives the areas under the positive and negative lobes of the response. Thus, the RR is equal to the sum over R3 divided by the sum over R1. There are two ways to use the step response to calculate the RR. The first is the PR, which is given by the following: PR = P3/P1. This is immediately seen to be equal to the RR because P3 is the sum over R3 and P1 is the sum over R1. Our second measure, TR, is given by the following: TR = (P1 − P2)/P1. If the step duration is at least as long as the impulse response, then the sum over R2 equals the sum over R1 minus the sum over R3. Then, P2 = P1 − P3 and P3 = P1 − P2. In the definition of PR, P3 can be replaced by P1 − P2. Thus, PR = TR. If these two quantities are measured from the step response of a real neuron and are found to be equal, then the responses of that neuron are consistent with linear temporal integration.
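The argument in this paragraph can be restated compactly in integral form. The notation below is introduced here only for illustration and does not appear in the original: h is the impulse response, T is the step duration (assumed at least as long as h), t_z is the time at which h crosses zero, and A_+ and A_- are the magnitudes of the areas of the positive and negative lobes. The ratios are taken as PR = P3/P1 and TR = (P1 − P2)/P1, as implied by the substitution described in the text.

```latex
% Sketch under assumed notation: h = impulse response, T = step duration,
% t_z = zero crossing of h, A_+ and A_- = magnitudes of the lobe areas.
\begin{align*}
y(t) &= \int_{t-T}^{t} h(\tau)\,d\tau,
      \qquad h(\tau) = 0 \ \text{for}\ \tau < 0 \ \text{or}\ \tau > T,\\
P_1 &= y(t_z) = \int_{0}^{t_z} h(\tau)\,d\tau = A_+,\\
P_2 &= y(T) = \int_{0}^{T} h(\tau)\,d\tau = A_+ - A_-,\\
P_3 &= -\,y(T + t_z) = -\int_{t_z}^{T} h(\tau)\,d\tau = A_-,\\
\mathrm{PR} &= \frac{P_3}{P_1} = \frac{A_-}{A_+} = \mathrm{RR},
\qquad
\mathrm{TR} = \frac{P_1 - P_2}{P_1} = \frac{A_-}{A_+} = \mathrm{RR}.
\end{align*}
```

Both ratios therefore reduce to the RR for a linear neuron with zero rectification threshold, which is why a measured difference between PR and TR is evidence against linear temporal integration.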
We used the Hartley experiments to measure a temporal impulse response. If the impulse response is longer in duration than the step stimulus, then these estimates could be corrupted. However, the responses to the Hartley stimuli generally lasted <100 ms or were very small in amplitude beyond the first 100 ms of their response; that is why the duration of the step was chosen to be 100 ms.
Footnotes
- This work was supported by National Eye Institute Grant R01 EY01472 and Core Grant EY-P031-13079. This work was included in part in a PhD thesis submitted by P.E.W. in 2006. We thank M. Hawken, J. A. Henrie, S. Joshi, and D.-J. Xing for helpful comments on a previous version of this manuscript and for help in collecting the physiological data and D. Tranchina and J. Victor for helpful comments on a previous version of this work. We also thank F. Mechler for code for Hartigan's dip test.
- Correspondence should be addressed to Dr. Patrick E. Williams, New York University Center for Neural Science, 4 Washington Place, Room 809, New York, NY 10003. patrickw{at}cns.nyu.edu