Abstract
How do cortical responses to local image elements combine to form a spatial pattern of population activity in primate V1? Here, we used voltage-sensitive dye imaging, which measures summed membrane potential activity, to examine the rules that govern lateral interactions between the representations of two small local-oriented elements in macaque (Macaca mulatta) V1. We find strong subadditive and mostly orientation-independent interactions for nearby elements [2–4 mm interelement cortical distance (IED)] that gradually become linear at larger separations (>6 mm IED). These results are consistent with a population gain control model describing nonlinear V1 population responses to single oriented elements. However, because of the membrane potential-to-spiking accelerating nonlinearity, the model predicts supra-additive lateral interactions of spiking responses for intermediate separations at a range of locations between the two elements, consistent with some prior facilitatory effects observed in electrophysiology and psychophysics. Overall, our results suggest that population-level lateral interactions in V1 are primarily explained by a simple orientation-independent contrast gain control mechanism.
SIGNIFICANCE STATEMENT Interactions between representations of simple visual elements such as oriented edges in primary visual cortex (V1) are thought to contribute to our ability to easily integrate contours and segment surfaces, but the mechanisms that govern these interactions are primarily unknown. Our study provides novel evidence that lateral interactions at the population level are governed by a simple contrast gain–control mechanism, and we show how this divisive gain–control mechanism can give rise to apparently facilitatory spiking responses.
- contrast gain control
- lateral interactions
- population coding
- striate cortex
- visual cortex
- voltage-sensitive dye imaging
Introduction
Primary visual cortex (V1) is organized topographically; neurons with similar tuning properties cluster together, forming several overlaid maps across the cortical surface. At the large (millimeter) scale, V1 contains a map of position in visual space (retinotopic map; Hubel and Wiesel, 1974; Van Essen et al., 1984; Tootell et al., 1988; Adams and Horton, 2003). Because the tuning properties of neighboring V1 neurons overlap extensively, a small isolated stimulus produces activity spreading over several square millimeters (Hubel and Wiesel, 1974; Van Essen et al., 1984; Grinvald et al., 1994). This spread [termed the “cortical point image” (CPI)] has a full width (at 10% of the peak response) of up to 8 mm for membrane potential and up to 4 mm for spiking activity (Palmer et al., 2012). Natural scenes usually contain many elements at nearby spatial locations. Thus, when viewing natural scenes, V1 responses to the component elements overlap substantially. This raises a fundamental question: How do cortical responses to local image elements combine to form the spatial pattern of population activity in primate V1?
In previous studies using voltage-sensitive dye imaging (VSDI), we measured V1 population responses to single oriented elements (Sit et al., 2009). We discovered that V1 responses to single elements are predicted by a simple computational model, termed “population gain control” (PGC). Our goal here was to determine whether this model also predicts V1 population responses to multiple local elements. Specifically, the PGC model assumes that local gain is controlled by a normalization pool that is independent of the orientation and direction of a flanking element relative to a central element, but this assumption has not been fully tested. For example, it is possible that flanking elements collinear or parallel with a central element make a larger contribution to the normalization pool, thereby leading to stronger subadditivity than equally distant elements orthogonal to the central element. A central goal of the current study was to test this possibility.
Although gain control mechanisms may explain some interactions in V1, facilitatory mechanisms may also be at play. Perception of a visual element can be profoundly modulated by nearby visual elements. Under some circumstances, detection of a small contour element is facilitated by nearby flanking elements (Polat and Sagi, 1993, 1994). Similarly, there is behavioral evidence of grouping mechanisms that integrate local contour elements into extended contours (Field et al., 1993). The first stage of this process may involve local mechanisms that facilitate the representations of two visual elements if they are likely to belong to the same contour in natural scenes (Field et al., 1993). This facilitatory associative mechanism may be partially implemented in V1 and mediated by its long-range horizontal connections (Gilbert and Wiesel, 1979; Ts'o and Gilbert, 1988; Hirsch and Gilbert, 1991) and/or by feedback connections into V1 (Angelucci et al., 2002). Alternatively, facilitatory mechanisms underlying perceptual grouping may be implemented downstream of V1, and V1 retinotopic-scale interactions may be fully explained by the PGC model.
Studies of single V1 neurons reveal diverse effects of flanking stimuli on spiking responses to a central element. Some studies using large-surround stimuli show clear suppression consistent with gain control mechanisms (Sceniak et al., 1999; Cavanaugh et al., 2002; Levitt and Lund, 2002), whereas others show configuration-specific facilitation consistent with an associative grouping mechanism (Maffei and Fiorentini, 1977; Kapadia et al., 1995; Polat et al., 1998). Given the diversity of these effects, it is unclear how flanking stimuli will affect neural responses at the population level. Although several previous optical imaging experiments demonstrate subadditive lateral effects at the population level in V1 (Kinoshita et al., 2009; Meirovithz et al., 2010; Reynaud et al., 2012), it is possible that suppressive effects dominate at the population level for some flanker positions and orientations while facilitatory effects dominate for others.
We addressed these questions using VSDI to measure population responses from V1 of three fixating monkeys while presenting individual oriented elements and pairs of oriented elements.
Materials and Methods
The results reported here are based on methods that have been described in detail previously (Chen et al., 2006, 2008). Here, we focus on details that are of specific relevance to the current study. All procedures have been approved by the University of Texas Institutional Animal Care and Use Committee and conform to NIH standards.
Experimental design and statistical analysis
The experiments described below incorporated a within-subjects design, with different stimulus configurations presented in pseudorandom sequences of trials within individual experimental sessions. In characterizing our results, we do not use traditional null-hypothesis significance tests. Instead, we report point and interval estimates of the relevant response parameters (population response amplitudes and facilitation indices). Details of these analyses are described below (see Analysis of imaging data) and in Results.
Behavioral task and stimuli
Three male adult monkeys (Macaca mulatta) were trained to maintain fixation while a small stationary Gabor, or a combination of Gabors, was presented on a uniform gray background. Each trial began when the monkey fixated on a small spot of light (0.1°) on a CRT display. After an initial fixation, the sine-phase Gabor stimuli (σ = 0.167°, f = 2 cpd) were flashed at 5 Hz for 1000 ms (60 ms on, 140 ms off) at visual eccentricities between 2.40 and 3.82°. Throughout the trial, the monkey was required to maintain gaze within a small window (<2° full width) around the fixation point to obtain a reward. The contrast of the flanking stimulus element, when presented, was 100%, whereas the central stimulus element appeared at 10 or 100% contrast. Stimuli were displayed at a mean luminance of 30 cd/m2, a resolution of 1024 × 768 pixels (subtending 20.5 × 15.4° of visual angle), a viewing distance of 108 cm, a 30-bit color depth, and a refresh rate of 100 Hz. In each session, trials representing the different visual stimulus conditions, including blank trials, were randomly interleaved. On average, the monkeys successfully completed 10 trials per condition per session.
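The flash timing and Gabor profile follow directly from the parameters listed above. The sketch below is only an illustration and is not the presentation code used in the experiments. The function names, the modulation of luminance about the mean, and the use of a horizontally varying carrier for a vertical Gabor are all assumptions.

```python
import math

REFRESH_HZ = 100           # display refresh rate (from the text)
ON_MS, OFF_MS = 60, 140    # flash on/off durations (from the text)
N_CYCLES = 5               # 5 Hz for 1000 ms (from the text)
MEAN_LUM = 30.0            # mean luminance, cd/m^2 (from the text)
SIGMA_DEG = 0.167          # SD of the Gabor envelope (from the text)
FREQ_CPD = 2.0             # carrier spatial frequency (from the text)


def flash_schedule():
    """Return one boolean per video frame: True while the Gabor is shown."""
    frame_ms = 1000 / REFRESH_HZ
    on_frames = round(ON_MS / frame_ms)
    off_frames = round(OFF_MS / frame_ms)
    return ([True] * on_frames + [False] * off_frames) * N_CYCLES


def gabor_luminance(x_deg, y_deg, contrast):
    """Luminance of a sine-phase Gabor at (x, y) degrees from its center.

    Assumption: contrast scales a modulation about the mean luminance, and
    the carrier varies along x (a vertically oriented grating).
    """
    envelope = math.exp(-(x_deg ** 2 + y_deg ** 2) / (2 * SIGMA_DEG ** 2))
    carrier = math.sin(2 * math.pi * FREQ_CPD * x_deg)
    return MEAN_LUM * (1 + contrast * carrier * envelope)


schedule = flash_schedule()
```

At 100 Hz, each 60 ms flash spans 6 video frames and each 140 ms blank spans 14, so the five cycles fill exactly 100 frames.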
Analysis of imaging data
Imaging data were collected at 110 Hz at a resolution of 512 × 512 pixels. The size of each pixel was ∼32 × 32 μm2. Our basic analysis was divided into four steps. First, we normalized the responses at each pixel by the average fluorescence at that pixel across all trials and frames. Second, we removed from each pixel a linear trend estimated on the basis of the response in the 100 ms interval before stimulus onset for each trial. Third, we removed trials with aberrant VSDI responses (generally fewer than 1% of the trials). Finally, we subtracted the average response to the blank condition from the stimulus-present conditions. In a supplementary analysis not presented here, we checked for fixation effects by additionally removing trials in which the monkey's gaze deviated >0.5° from the fixation point. However, we found that this manipulation did not meaningfully affect our results.
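The preprocessing steps can be sketched for the time course of a single pixel in a single trial. This is a minimal illustration under stated assumptions, not the analysis code. It assumes that the linear trend is an ordinary least-squares line fitted to the 100 ms before stimulus onset and extrapolated over the trial. The function names are hypothetical, and trial rejection is left out because the rejection criterion is not specified here.

```python
def fit_line(ts, ys):
    """Ordinary least-squares line through (ts, ys): returns (slope, intercept)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def preprocess_trial(times_ms, fluorescence, mean_fluorescence, blank_response):
    """Apply normalization, detrending, and blank subtraction to one pixel.

    times_ms are relative to stimulus onset (t = 0). blank_response is the
    average blank-condition time course, already preprocessed the same way.
    """
    # Step 1: normalize by the pixel's average fluorescence.
    normalized = [f / mean_fluorescence for f in fluorescence]
    # Step 2: fit a line to the 100 ms before onset and subtract its
    # extrapolation from every frame (assumed form of the trend removal).
    pre = [(t, y) for t, y in zip(times_ms, normalized) if -100 <= t < 0]
    slope, intercept = fit_line([t for t, _ in pre], [y for _, y in pre])
    detrended = [y - (slope * t + intercept) for t, y in zip(times_ms, normalized)]
    # Step 3 (rejection of aberrant trials) is omitted from this sketch.
    # Step 4: subtract the average response to the blank condition.
    return [y - b for y, b in zip(detrended, blank_response)]
```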
After this analysis, the spatial properties of the responses in each trial were determined. To preserve our ability to characterize the amplitude of each response independently of its temporal properties (e.g., response onset and offset latency, rising edge slope, falling edge slope), we fit a parametric temporal impulse response model to the response time course measured at each pixel. The temporal response to a single flashed stimulus rpulse(t) is well described by a scaled gamma distribution function:
where A is the response amplitude, t0 is the response onset latency, r0 is the baseline response, and γ(x; α, β) represents the (un-normalized) gamma distribution function with shape parameter α, rate parameter β, and a maximum value of 1.
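One form consistent with these definitions is given below as a sketch. It assumes that the baseline enters additively, that the response stays at baseline before onset, and that α > 1 so that the gamma function has an interior peak at x_m = (α − 1)/β.

```latex
r_{\mathrm{pulse}}(t) = r_0 + A\,\gamma(t - t_0;\ \alpha, \beta),
\qquad
\gamma(x;\ \alpha, \beta) =
\begin{cases}
\left(\dfrac{x}{x_m}\right)^{\alpha - 1} e^{-\beta\,(x - x_m)}, & x > 0,\\[6pt]
0, & x \le 0,
\end{cases}
\qquad
x_m = \frac{\alpha - 1}{\beta}.
```

Dividing the un-normalized gamma density x^{α−1}e^{−βx} by its value at the mode x_m gives a function with a maximum of exactly 1, so A can be read directly as the peak height above baseline.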
Measured responses to the five-pulse sequences were well fit by a model that used a common impulse function for each of the five flashes and combined the individual pulse responses using a max rule (Fig. 1c), as follows:
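The max rule can be illustrated with a short sketch. This is a hedged illustration rather than the fitted model. It assumes a peak-normalized gamma pulse with α > 1, pulse onsets spaced by the 200 ms flash period (60 ms on plus 140 ms off), and a baseline added after the maximum is taken. The function and parameter names are hypothetical.

```python
import math


def gamma_pulse(x, alpha, beta):
    """Un-normalized gamma function rescaled to a peak of 1 (needs alpha > 1)."""
    if x <= 0:
        return 0.0
    mode = (alpha - 1) / beta
    return (x / mode) ** (alpha - 1) * math.exp(-beta * (x - mode))


def max_rule_response(t, amplitude, t0, r0, alpha, beta,
                      n_pulses=5, period_ms=200.0):
    """Response at time t (ms) to a train of identical flashes.

    Each flash evokes the same pulse, delayed by a multiple of the flash
    period; the pulses are combined by taking their maximum, not their sum.
    """
    pulses = (gamma_pulse(t - t0 - k * period_ms, alpha, beta)
              for k in range(n_pulses))
    return r0 + amplitude * max(pulses)
```

Under this rule the combined response never exceeds the baseline plus the single-pulse amplitude, so the fitted amplitude keeps its meaning as a peak height even when successive pulse responses overlap in time.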
For each experiment, we computed the average spatial response amplitude Ā at each pixel by fitting Equation 3 to the average time course at that pixel. We computed the variance of this estimate using a delete-1 jackknife procedure (Efron and Stein, 1981). For each condition, we estimated mean spatial amplitudes for n subsamples of the trials in which each sample was formed by omitting one of the observations. For example, the ith mean was computed as follows:
The jackknife estimate for the variance of the mean amplitude was then computed from these means as follows:
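The delete-1 jackknife can be sketched as follows. The sketch uses the standard jackknife variance formula, (n − 1)/n times the sum of squared deviations of the leave-one-out estimates from their average. As a simplification, it forms the leave-one-out estimates as plain means of per-trial values, whereas in the analysis described above each estimate would come from a fit to the time course averaged over the retained trials. The function names are hypothetical.

```python
def leave_one_out_means(values):
    """Mean of the sample with each observation omitted in turn."""
    n = len(values)
    total = sum(values)
    return [(total - v) / (n - 1) for v in values]


def jackknife_variance(loo_estimates):
    """Standard delete-1 jackknife variance from n leave-one-out estimates."""
    n = len(loo_estimates)
    grand = sum(loo_estimates) / n
    return (n - 1) / n * sum((m - grand) ** 2 for m in loo_estimates)


amplitudes = [1.0, 2.0, 4.0, 7.0]  # made-up per-trial values
variance_of_mean = jackknife_variance(leave_one_out_means(amplitudes))
```

When the statistic is a plain mean, the jackknife estimate coincides with the usual squared standard error, s²/n, which gives a convenient check on the implementation.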
Computation of pooled responses.
For the summary spatial response plots in Figures 4 and 5, we pooled the response amplitudes by computing a weighted average across all animals (n = 3) and experiments (n = 14; five for Monkey 1, six for Monkey 2, and three for Monkey 3). The weighting served two purposes. First, the experiments were of variable quality and we wanted to give greater weight to the more reliable experiments. Second, the cortical magnification factor (CMF) varied across monkeys and across stimulus locations, so that the cortical distance associated with a particular visual interelement distance (IED) also varied across experiments. Because the size of the CPI is roughly constant across V1 (Palmer et al., 2012), we expected the interactions among visual elements to depend primarily on the cortical distance between their elicited responses (the cortical IED) rather than on their distance in visual coordinates (the visual IED). Our analyses, therefore, focused primarily on characterizing the interactions as a function of the cortical IED, which is equal to the separation in the visual field times the CMF. In computing the pooled responses, we assigned two weights to each experiment and condition, one (u) representing the reliability of the amplitude estimate for that experiment and one (w) representing an interpolation weight for the desired cortical IED. The pooled response amplitude for a particular location x was computed as follows:
where ε represents the cortical IED for which we want to interpolate the responses and N is the combined number of different experiments and visual IED conditions. The reliability weights were computed as follows:
where σAi is the SE associated with the amplitude estimate for experiment/condition i. This weight has the effect of reducing the contribution of variable (unreliable) experiments to the pooled amplitude estimate. The average value of these reliability weights varied across monkeys, with responses from Monkey 2 (across six experiments) and Monkey 3 (across three experiments) receiving greater weight than those from Monkey 1 (across five experiments). As a result of the differing reliability weights and numbers of experiments, each monkey contributed differently to the pooled responses (Monkey 1, 23%; Monkey 2, 48%; Monkey 3, 29%). However, supplementary analyses (data not shown) confirmed that response patterns did not differ meaningfully across individual monkeys.
The interpolation weights were Gaussian (i.e., a Gaussian radial basis function):
where εi represents the cortical IED for the ith experiment/condition and σRBF = 1 mm is the SD of the radial basis function. This weight has the effect of attenuating the contributions of experiments/conditions as their cortical IEDs deviate from the desired cortical IED ε. We computed variances for the pooled amplitude estimates by linearly combining the jackknife variance estimates from individual experiments obtained in Equation 5 using weights identical to those in Equation 6, which we then used to compute the symmetric 95% confidence bounds shown in Figures 4 and 5, a and b.
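The pooling scheme can be sketched as a weighted average in which each experiment/condition contributes the product of its two weights. This is a minimal illustration under stated assumptions. The Gaussian interpolation weight with σRBF = 1 mm follows the text. The inverse-variance form of the reliability weight and the normalization by the summed weights are assumptions, as are the function names.

```python
import math

SIGMA_RBF_MM = 1.0  # SD of the Gaussian radial basis function (from the text)


def cortical_ied(visual_separation_deg, cmf_mm_per_deg):
    """Cortical IED: visual-field separation times the cortical magnification factor."""
    return visual_separation_deg * cmf_mm_per_deg


def interpolation_weight(target_ied_mm, ied_mm):
    """Gaussian weight that falls off as an experiment's IED departs from the target."""
    return math.exp(-((target_ied_mm - ied_mm) ** 2) / (2 * SIGMA_RBF_MM ** 2))


def reliability_weight(standard_error):
    """Assumed inverse-variance weight: noisier estimates contribute less."""
    return 1.0 / standard_error ** 2


def pooled_amplitude(target_ied_mm, amplitudes, standard_errors, ieds_mm):
    """Weighted average of amplitudes at one cortical location."""
    weights = [reliability_weight(se) * interpolation_weight(target_ied_mm, ied)
               for se, ied in zip(standard_errors, ieds_mm)]
    return sum(w * a for w, a in zip(weights, amplitudes)) / sum(weights)
```

For example, two equally reliable measurements taken at cortical IEDs of 2 and 4 mm contribute equally to the pooled estimate at 3 mm, whereas a measurement with a tenfold larger SE would be down-weighted a hundredfold under the inverse-variance assumption.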
Finally, we found that the resulting spatial responses had a small positive baseline response equal, on average, to ∼8% of the maximum response. We removed this baseline when plotting the summary responses (Figs. 4, 5) and when computing the corresponding facilitation indices. This had the effect of raising the facilitation indices slightly, making the combined responses appear somewhat less subadditive than when the baseline was included.
Computing pooled facilitation indices.
Average facilitation indices were computed by applying Equation 12 to the pooled population response amplitudes. We obtained interval estimates around the average values using a simple parametric bootstrap procedure (Efron, 1985). For each stimulus configuration, we created 10,000 bootstrap replicates of the spatial response to the central, flanker, and combined stimuli by sampling from normal distributions with average amplitudes and variances computed as described above (see Computation of pooled responses). We then computed a distribution of facilitation indices by applying Equation 12 to each of the bootstrapped replicates. Finally, we obtained 95% confidence intervals by determining the facilitation indices corresponding to the 2.5 and 97.5 percentiles of this distribution.
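The parametric bootstrap can be sketched as follows. This is a hedged illustration, not the analysis code. It assumes that amplitudes at different locations are drawn independently, that the index is the ratio (combined − flanker)/center averaged over locations, and that percentiles are taken by a simple nearest-rank rule. The function names and input layout are hypothetical.

```python
import random


def facilitation_index(r_combined, r_flanker, r_center):
    """Average over locations of (combined - flanker) / center."""
    ratios = [(cf - f) / c for cf, f, c in zip(r_combined, r_flanker, r_center)]
    return sum(ratios) / len(ratios)


def bootstrap_ci(means, variances, n_boot=10_000, seed=0):
    """95% percentile interval for the facilitation index.

    means and variances are dicts with keys 'combined', 'flanker', and
    'center'; each value is a list with one entry per spatial location.
    """
    rng = random.Random(seed)

    def draw(key):
        # One bootstrap replicate of a spatial response profile.
        return [rng.gauss(m, v ** 0.5) for m, v in zip(means[key], variances[key])]

    indices = sorted(
        facilitation_index(draw("combined"), draw("flanker"), draw("center"))
        for _ in range(n_boot)
    )
    lower = indices[int(0.025 * (n_boot - 1))]
    upper = indices[int(0.975 * (n_boot - 1))]
    return lower, upper
```

Because the index is a ratio, its bootstrap distribution can be skewed when the center-alone response is small relative to its SE. Percentile bounds accommodate this, whereas a symmetric interval would not.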
PGC model definition and simulation
The structure of the population gain control model is identical to that defined by Sit et al. (2009). Briefly, the model consists of an input layer representing the raw visual input; a second layer representing the local spatial summation and gain control arising in the retina, LGN, and the input layers of V1; and a third layer that represents the local spatial summation and gain control in the superficial layers of V1, where the VSDI responses are measured. Each unit in the second and third layers computes the weighted sum of the input I(x, t) by cross-correlation with a Gaussian spatial receptive field:
where G(x) is a zero-centered Gaussian function with an SD of σG. The summation then passes through an RC circuit to produce the response, which is characterized by the following differential equation:
where C is the capacitance, A(x, t) is the receptive field summation activity, and g(x, t) is the conductance of the resistor for the stage. The conductance at each unit is defined as follows:
where b is a scaling factor that represents the strength of normalization and H(x) is the Gaussian weighting function (with SD σH) defining the normalization pool.
The parameters for all units within the same layer are identical, but they can differ between stages. Additionally, because the response in each model stage represents average membrane potentials, responses in the first stage are converted to spikes by a power function before being sent to the second stage.
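To illustrate how divisive gain control of this kind yields subadditive combination, the following sketch evaluates a single model stage at steady state. It is a deliberate simplification and not the simulated model. It assumes that the RC stage has the form C dV/dt = A − gV, so that the steady-state response is A/g. It also assumes that the conductance is 1 plus b times the pooled input, that the stimuli are point inputs scaled by contrast, and that the Gaussians have a peak of 1. The value b = 50 is illustrative only and is not the fitted value, since the input units here are arbitrary.

```python
import math


def gaussian(x, sigma):
    """Zero-centered Gaussian with a peak of 1 (normalization is an assumption)."""
    return math.exp(-x * x / (2 * sigma * sigma))


def steady_state_response(x_mm, elements, sigma_g, sigma_h, b):
    """Steady-state response A/g at cortical position x_mm.

    elements is a list of (position_mm, contrast) point inputs.
    """
    drive = sum(c * gaussian(x_mm - p, sigma_g) for p, c in elements)
    pool = sum(c * gaussian(x_mm - p, sigma_h) for p, c in elements)
    return drive / (1.0 + b * pool)


def index_at_center(separation_mm, sigma_g=0.87, sigma_h=1.07, b=50.0):
    """(combined - flanker) / center, evaluated at the center element's location."""
    center = (0.0, 0.1)            # low-contrast center element
    flanker = (separation_mm, 1.0)  # high-contrast flanker
    r_c = steady_state_response(0.0, [center], sigma_g, sigma_h, b)
    r_f = steady_state_response(0.0, [flanker], sigma_g, sigma_h, b)
    r_cf = steady_state_response(0.0, [center, flanker], sigma_g, sigma_h, b)
    return (r_cf - r_f) / r_c
```

In this simplified stage, a nearby flanker raises the conductance at the center location more than it raises the drive, so the index falls well below 1. Once the flanker lies several pool widths away, it contributes to neither term and the index returns to 1.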
The PGC model is characterized by a total of nine parameters. Four parameters (σG1, σH1, C1, b1) characterize the first response layer, another four parameters (σG2, σH2, C2, b2) characterize the second response layer, and one parameter (n) represents the spiking nonlinearity. In simulating the PGC model, we fixed the values of four of these parameters (C1, C2, b2, and n) to those used by Sit et al. (2009). In addition, like Sit et al. (2009), we assumed that the receptive and normalization fields in the second response layer were twice the size of those in the first input layer (i.e., σG2 = 2σG1 and σH2 = 2σH1). This left us with three free parameters: the size of the receptive field (σG1) and the size of the normalization field (σH1) in the first input layer, and the strength (b1) of the normalization in the first layer. We estimated the values of these parameters by adjusting them such that (1) the normalized response to the 10% center stimulus matched that obtained for the pooled VSDI responses, (2) the width of the PGC responses matched the width of the pooled VSDI responses, and (3) the facilitation index obtained for the PGC responses (Fig. 6d) approximately matched that obtained for the VSDI responses in the 1° IED condition (Fig. 4, top right). The obtained parameters were σG1 = 0.87 mm, σH1 = 1.07 mm, and b1 = 500. The model was simulated for a 20-mm-long strip centered on the response to the center element.
Results
We used VSDI to measure population responses to briefly flashed Gabor stimuli (sine phase; spatial frequency, 2 cpd; σ = 0.167°), presented individually or in pairs, in the primary visual cortex of three male macaque monkeys performing a fixation task.
Two features of our measurements are important to consider. First, VSDI signals are directly proportional to membrane voltage and reflect aggregate synaptic inputs to neurons in layers 1–3. We consider the potential implications of our results for understanding lateral interaction at the level of the spiking output in the Discussion section. Second, whereas VSDI in behaving monkeys can measure reliable signals at the columnar scale (∼1.2 cycles/mm; Chen et al., 2012), such high-resolution imaging requires zooming in on a small cortical area. Here, instead, we focus on maximal coverage, which is necessary to image the entire retinotopic-scale population response to local elements (the subthreshold CPI). The nature of lateral interactions at the scale of orientation columns will be examined in future studies. Our previous VSDI studies in behaving monkeys have demonstrated that retinotopic-scale V1 signals contain enough information to account for detection performance (Chen et al., 2006, 2008) and may contribute to shape perception (Michel et al., 2013). Therefore, it is important to characterize the nature of lateral interactions in V1 at this fundamental scale.
In preliminary measurements, we used a mapping procedure (Yang et al., 2007) to determine the precise layout of the retinotopic map in the imaged area (Fig. 1a,b). We then positioned the Gabor stimuli such that the VSDI response to a target element, termed “center,” fell entirely within the imaged area.
Cortical retinotopy and temporal response profile in an example experiment. a, Schematic illustration of the visual field with a 1 × 1° rectangular grid. The colored lines represent the approximate limits of visual space represented in our imaging windows. b, Image of the cortical vasculature across a 10 × 10 mm region of V1 (Monkey 1, left hemisphere) with overlaid scale marker, landmarks, and approximate retinotopy for the rectangular grid shown in a. c, The time course of the stimulus presentation (with gray bars representing periods of stimulus presentation) and of the corresponding neural responses. Stimulus presentations consisted of repeated 60 ms presentations of a Gabor pattern separated by 140 ms presentations of a blank gray screen. The blue curves represent the average VSDI response in a 1 × 1 mm2 window centered on the peak spatial response to the 100% (solid) and 10% (dashed) contrast Gabor stimuli, as a function of time, whereas the dashed/dotted black curve represents the fitted response (see Materials and Methods) for a single 60 ms pulse of the 100% contrast stimulus. Response amplitudes were well described by a model in which individual pulse response functions were combined using a maximum (max) rule (solid black curve).
The primary goal of the current study was to measure the spatial properties of the VSDI response amplitude (Fig. 1c, peaks of solid blue curve) rather than the temporal dynamics. Because we are interested in subtle interactions, it was important to maximize our signal-to-noise ratio (SNR). The variability in the VSDI signal is dominated by low temporal frequencies. Therefore, to improve the SNR of our VSDI measurements, each 1 s stimulus presentation consisted of five on/off cycles (Fig. 1c, gray bars). This improves the VSDI SNR by one order of magnitude (Chen et al., 2012). To estimate the peak amplitude, we assumed that each pulse, if presented alone, would produce a response that is well described by a scaled gamma distribution function (Fig. 1c, dashed curves). The simplest hypothesis is that responses to the individual pulses combine linearly. However, we found that the maximum of the individual responses in time (Fig. 1c, solid black curve) provides a slightly better description of the combined response. Therefore, the VSDI response amplitude was defined as the peak height of the best-fitting (in a least-squares sense) scaled gamma distribution function (Fig. 1c, peak height of the dashed black curve). Nonetheless, we found that the results described below are essentially the same using the amplitude of the gamma distribution function in the linear temporal model (data not shown).
Population responses to single Gabor stimuli
Consistent with our previous results (Chen et al., 2006; Sit et al., 2009; Palmer et al., 2012; Michel et al., 2013), single Gabor elements activated a localized ellipsoidal region, well described by a 2D Gaussian surface, that subtended several square millimeters of V1 (Fig. 2, first and second columns). The response to the target was anisotropic, with the major axis of the response ellipsoid oriented parallel to the V1/V2 border (Figs. 1b, 2; σmajor ≈ 1.9 mm, σminor ≈ 1.2 mm). Because the Gabor stimuli are small relative to the subthreshold receptive fields of V1 neurons at these eccentricities, the spatial extent of the activation across the cortical surface reflects the subthreshold CPI (Chen et al., 2012). Because of the membrane potential-to-spikes nonlinearity, the spatial extent of the VSDI response is about twice as large as the spatial extent of the spiking cortical response (Chen et al., 2012). Responses were measured to several different stimulus contrasts. The response amplitude depended strongly on contrast; however, as in previous experiments (Chen et al., 2006), the spatial profile of the response was mostly contrast invariant.
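The Gaussian SDs reported here can be related to the full-width-at-10%-of-peak measure used to describe the CPI in the Introduction. The sketch below applies the standard Gaussian identity, full width = 2σ√(2 ln(1/p)) at a fraction p of the peak. The function name is hypothetical, and applying the identity assumes that the response profile is exactly Gaussian.

```python
import math


def full_width(sigma_mm, fraction=0.10):
    """Full width of a Gaussian profile at a given fraction of its peak."""
    return 2 * sigma_mm * math.sqrt(2 * math.log(1 / fraction))


width_major = full_width(1.9)  # along the major axis
width_minor = full_width(1.2)  # along the minor axis
```

With σmajor ≈ 1.9 mm and σminor ≈ 1.2 mm, this gives full widths of roughly 8.2 and 5.2 mm, in line with the upper bound of about 8 mm cited for the membrane potential CPI.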
Visual stimulus configurations (top) and corresponding spatial response profiles (bottom) measured from V1 of Monkey 2 and averaged across four experiments. Response amplitudes (ΔF/F) were computed as described in Figure 1c. The blue and green dots indicate the mean peak locations of the 2D Gaussians fit to the response profiles elicited by 10% contrast center and 100% contrast flanker Gabor patterns, respectively. The dashed white lines denote the boundaries of a 1-mm-wide strip centered on the line passing through the center of these mean locations. The first and second columns from the left represent the center-only and flanker-only conditions, respectively, whereas the third and fourth columns represent conditions in which both elements were presented simultaneously. The fourth column [center & flanker (ortho)] represents the stimulus configuration and elicited response when the orientation of the high-contrast flanker is orthogonal to that of the low-contrast center element.
Population responses to paired Gabor stimuli
Our primary goal was to characterize the population response to multiple visual elements. Previous single-unit studies in V1 reported that elements outside the classical receptive field can facilitate or suppress the responses to oriented stimuli in the center of the receptive field. When the contrast of the center element is low, collinear flanking elements that fall outside of the classical receptive field tend to facilitate the response to the center element (Kapadia et al., 1995; Polat et al., 1998). This has led to the hypothesis that this neurophysiological facilitation underlies some of the flanker facilitation effects reported in the psychophysical literature (Polat and Sagi, 1993; Kapadia et al., 1995). Therefore, we were particularly interested in measuring population responses to collinear stimulus configurations similar to those used in the psychophysical studies. The experimental stimulus conditions (Fig. 2) consisted of the following: (1) a “center-only” condition, in which a low-contrast (10%) vertical target element was presented alone; (2) a “flanker-only” condition, in which a high-contrast (100%) vertical element, offset from the position of the center element, was presented alone; (3) a combined “center and flanker” condition, in which the low-contrast center element and high-contrast flanker were presented simultaneously; (4) a combined “center and flanker (ortho)” condition, in which the flanker was orthogonal in orientation to the center element; and (5) a “blank” (uniform gray) stimulus.
Psychophysical (Polat and Sagi, 1993, 1994; Kapadia et al., 1995), single-neuron (Kapadia et al., 1995), and neural population (Kinoshita et al., 2009; Meirovithz et al., 2010; Reynaud et al., 2012) studies have observed that the modulatory effects of flanking stimuli can depend on interelement spacing; therefore, we included several different separations between the center and flanker elements. In a given experiment, the center element always fell at the same retinal location. In the collinear conditions, the flanker elements were presented at positions corresponding to vertical separations ranging from 0.5 to 2.0° from the center element. In the orthogonal conditions, the separation was always 0.75°.
To analyze the response amplitude at different cortical locations in V1, we defined a narrow 1-mm-wide strip (Fig. 2, dashed lines) around the line connecting the peaks of the spatial responses to center (blue dot) and flanker (green dot) elements. Within this strip, we divided the imaging pixels into small (∼0.5 mm) bins according to their locations along the strip, averaging the response amplitudes within each bin. Example measurements from one monkey at five different positions of the flanker are shown in Figure 3. The response to the center element alone is shown in blue and is the same in all panels. The response to the flanker alone is shown in green, and the combined response is shown in red. Because of the spatial extent of the neural activity noted above, the flankers typically elicited substantial activity at the cortical site corresponding to the peak of the center stimulus (Fig. 3, green curves). Our analysis focused on examining how the responses elicited by the flanker elements interacted with the response elicited by the center element. The black curves show the sum of the responses to the center alone and flanker alone. Note that the red curves are substantially below the black curves except at the largest separation, indicating strong subadditivity. Another way to visualize the degree of additivity is to subtract the response to the flanker alone from the response to the combined stimuli (Fig. 3, dashed blue curve), which is equivalent to asking what is the response to the center in the presence of the flanker. If the responses to the center and flanker element were additive, then the dashed blue curve would be superimposed on the solid blue curve. Figure 4 shows the combined measurements from all the experiments in all three monkeys (see Materials and Methods for details about how the measurements were combined). As can be seen, the pattern of results is quite similar to the example measurements in Figure 3.
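The strip analysis described at the start of this section can be sketched as a projection of pixel coordinates onto the line joining the two response peaks. This is a minimal illustration, not the analysis code. It assumes that pixel positions have already been converted to millimeters (e.g., using the ∼32 μm pixel size) and that bins are measured from the center peak toward the flanker peak. The function name and input layout are hypothetical.

```python
import math
from collections import defaultdict


def strip_profile(pixels, center_peak, flanker_peak,
                  strip_width_mm=1.0, bin_mm=0.5):
    """Average amplitudes in bins along the center-to-flanker line.

    pixels is an iterable of (x_mm, y_mm, amplitude). Returns a dict that
    maps each bin's midpoint (mm from the center peak) to its mean amplitude.
    """
    (cx, cy), (fx, fy) = center_peak, flanker_peak
    length = math.hypot(fx - cx, fy - cy)
    ux, uy = (fx - cx) / length, (fy - cy) / length  # unit vector along the strip
    bins = defaultdict(list)
    for x, y, amplitude in pixels:
        dx, dy = x - cx, y - cy
        along = dx * ux + dy * uy     # position along the strip
        across = -dx * uy + dy * ux   # signed distance from the strip axis
        if abs(across) <= strip_width_mm / 2:
            bins[math.floor(along / bin_mm)].append(amplitude)
    return {(k + 0.5) * bin_mm: sum(v) / len(v) for k, v in sorted(bins.items())}
```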
Comparison of spatial response profiles across conditions (Monkey 2, n = 4 experiments). Colored curves represent cross sections of the 2D spatial response amplitudes, averaged across the 1 mm strips indicated in Figure 2. The curves represent amplitudes of the VSDI responses to the center element presented alone (blue curves), the flanker element presented alone (green curves), and the simultaneous presentation of both elements (red curves) for stimulus configurations having four IEDs and two center–flanker alignment conditions. The blue and green triangles at the bottom represent the peak response locations for the center and flanker elements, respectively, computed using 2D Gaussian fits to the cortical responses (Fig. 2). Black curves represent the responses expected under linear combination (summation) of center and flanker responses. Dashed blue curves represent the net response to the target in the combined condition, after subtracting the response to the flanker (i.e., the red curve minus the green curve); if the responses are additive, the dashed blue curve must overlay the blue curve. The gray text insets indicate the computed facilitation index (see Equation 12), with IF values smaller than 1 indicating a sublinear combination.
Comparison of spatial response profiles across all conditions and experiments (n = 14) in all three monkeys. See the legend for Figure 3.
We performed several different analyses to quantify the interactions between the center and flanker elements. First, we computed the response at the center location as a function of the cortical separation between the center and the flanker, for center alone, flanker alone, and center plus flanker (Fig. 5a). To do this, we averaged the data across all experiments after normalizing the responses in each experiment by the response to the 100% contrast center element presented alone. The response to the center element at 10% contrast (Fig. 5, blue horizontal curve) was ∼40% of that to the center element at 100% contrast. The response at the center location to the flanker element alone was larger than that to the center element when the interelement separation was <2.5 mm and fell to zero at about 6 mm separation (Fig. 5, green curve). The response to the combined stimulus was equal to the response to the flanker alone at small separations and became equal to the response to the center alone at a separation of ∼4 mm (Fig. 5, red curve). At every separation where the flanker was still producing a significant response at the center location, the combined response was strongly subadditive (Fig. 5, compare red and black curves).
Summary of spatial response properties across all experiments. a, Normalized average response amplitude in a 1 × 1 mm cortical patch representing the location of the peak response to the center element (Fig. 2, blue dot) as a function of the interelement cortical distance between the center and flanker elements. As in Figure 4, the blue curves represent the amplitude of the VSDI responses to the center element alone, the green curves represent the responses to the flanker alone, the red curves represent the observed responses to the combined (center and flanker) stimulus, and the black curves represent the responses expected under a linear combination. Markers represent pooled responses computed at each of the five stimulus IEDs shown in Figure 4. Shaded regions and error bars represent 95% confidence intervals for the mean response amplitude. b, Same as a but for a cortical patch representing the midpoint between the peak responses to the center and flanker elements. c, The value of the facilitation index IF computed for the pooled spatial responses as a function of cortical interelement distance (bottom axis) and as a function of the interelement separation measured in wavelengths λ of the Gabor carrier (top axis). The solid black curve represents the facilitation index computed for collinear stimulus configurations, whereas the dashed magenta curve represents the facilitation index computed for orthogonal stimulus configurations. The shaded areas represent 95% confidence intervals determined using a bootstrap procedure. The unhatched region on the right side of the plot represents the range of distances over which significant behavioral threshold facilitation has been demonstrated (Polat and Sagi, 1993).
Next, we used the same procedure to characterize the interactions at the midpoint between cortical locations of the peak responses to the center and the flanker (Fig. 5b). Note that in this case, the cortical separation from the midpoint to the center of each stimulus is half the cortical separation between the elements. Again, the combined response is equal to the flanker-alone response at small interelement separations and is strongly subadditive at all interelement separations out to ∼3 mm from the midpoint, beyond which the combined response becomes fairly linear.
Finally, to summarize the interactions at all spatial locations along the line connecting the peak responses, we used a simple facilitation index. To compute this index, we took the difference between the response to center plus flanker and flanker alone (Fig. 4, dashed blue curves) and divided it by the response to the center alone (Fig. 4, solid blue curves). The value of the index was defined to be the average over a spatial strip measuring 4 × 1 mm, centered on the peak response to the target alone. Formally, the index is defined by the following equation:
$$I_F = \frac{1}{n_C} \sum_{x \in \Omega_C} \frac{r_{C+F}(x) - r_F(x)}{r_C(x)}$$
where $r_{C+F}(x)$ is the response to the center plus flanker, $r_F(x)$ is the response to the flanker alone, $r_C(x)$ is the response to the center alone, $\Omega_C$ is the set of spatial locations over which the average is computed, and $n_C$ is the number of points in the set of spatial locations. An index value >1.0 indicates facilitation, an index value equal to 1.0 indicates additivity (independence), and an index value <1.0 indicates subadditivity. The facilitation index values are given in Figures 3 and 4 and are plotted for all separations between center and flanker in Figure 5c. Strong subadditivity is observed when the flanker element is near the center element, and the interaction approaches additivity when the distance to the flanker exceeds 6 mm. Facilitation was not observed at any separation. This pattern is broadly consistent with previous studies (Grinvald et al., 1994; Kinoshita et al., 2009; Meirovithz et al., 2010) that used different stimuli and recording techniques to examine lateral interactions in V1 population activity.
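As an illustration of the index described above (the mean, over the averaging strip, of the combined response minus the flanker-alone response, divided by the center-alone response), the following minimal sketch computes it for made-up response profiles. The function name `facilitation_index` and all response values are hypothetical and are not taken from the recorded data.

```python
import numpy as np

def facilitation_index(r_cf, r_f, r_c):
    """Mean over the strip of (r_cf - r_f) / r_c, evaluated pointwise."""
    r_cf, r_f, r_c = (np.asarray(a, dtype=float) for a in (r_cf, r_f, r_c))
    return float(np.mean((r_cf - r_f) / r_c))

# Hypothetical responses at four locations along the strip (arbitrary units).
r_c = np.array([0.4, 0.4, 0.4, 0.4])   # center alone
r_f = np.array([0.6, 0.3, 0.1, 0.0])   # flanker alone

# Perfectly additive combination: the index equals 1.
additive = facilitation_index(r_c + r_f, r_f, r_c)

# A "winner-take-all" combination (pointwise maximum): the index falls below 1.
winner = facilitation_index(np.maximum(r_c, r_f), r_f, r_c)

print(additive, winner)
```

With these toy numbers the additive case gives an index of 1 and the pointwise-maximum case gives 0.5, matching the interpretation that values below 1.0 indicate subadditivity.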
We next examined the responses when the flanker element was orthogonal to the center element. Figure 3 (top right) and Figure 4 (bottom right) show the results for the orthogonal condition. Clear subadditivity was also observed in the orthogonal conditions. Furthermore, the average facilitation index across all monkeys was slightly lower (more subadditive) for the collinear conditions than for the orthogonal conditions (Fig. 4).
Population gain–control model and simulations
The primary goal of the present study was to use VSDI to begin characterizing the population responses in V1 to multiple visual elements. Our choice of small oriented elements was motivated in large part by psychophysical studies that have found strong facilitatory detection effects and contour-grouping effects for such stimuli. If the mechanisms underlying these effects are located in V1, then we should have observed facilitatory interactions for certain configurations of the elements in our experiments. Instead, the interactions we observed were either additive or subadditive, suggesting that the mechanisms underlying the facilitatory-detection and contour-grouping effects are primarily downstream of the membrane potential activity in V1 (however, see Discussion).
A stronger test of this hypothesis is to determine whether the multielement interactions we measured can be accounted for by the well-known suppressive mechanisms that have been identified in V1.
In previous work, we found that VSDI responses to single-element stimuli were well predicted by a PGC model (Sit et al., 2009) based on the contrast gain–control mechanisms that have been identified in single-unit studies (Albrecht and Geisler, 1991; Heeger, 1992; Carandini and Heeger, 1994; Carandini et al., 1997; Cavanaugh et al., 2002). In the PGC model, the gain of the response of a group of neurons at a given cortical location is controlled divisively by the pooled activity of a (potentially more spatially extended) population of neurons (the “normalization pool”). Thus, according to this model, a flanking stimulus that does not directly activate a group of V1 neurons responding to a central element may, nevertheless, affect their response if it activates neurons in the normalization pool. Such an effect is expected to be suppressive or subadditive, as flanking stimuli will reduce the gain of the response to the central element.
The model (Fig. 6a,b) consists of an input layer representing the raw visual input; a second layer representing the local spatial summation and gain control arising in the retina, LGN, and the input layers of V1; and a third layer that represents the local spatial summation and gain control in the superficial layers of V1, where the VSDI responses are measured. The blue and red regions in Figure 6 represent the spatial summation and gain–control (normalization) pools, respectively, for each location in the subsequent layer. As indicated in Figure 6, the normalization pool has a larger spatial extent than the summation pool, in agreement with the single-unit literature (Cavanaugh et al., 2002). Each unit computes two weighted sums of the units in the projecting layer. One of these sums, G(x), controls the input current, and the other, H(x), controls the conductance and thus the gain of the circuit. The output voltage of this simple RC circuit represents the unit's response. We note that this is a functional/computational model that is not meant to represent any specific biophysical mechanism. More details of the model can be found in the study by Sit et al. (2009).
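The divisive role of the two weighted sums can be sketched for a single stage and a one-dimensional cortical axis. This is not the implementation of Sit et al. (2009): the Gaussian pool shapes, the pool widths, the leak conductance `g0`, and the circuit equation C dV/dt = G − (g0 + H)V are assumptions chosen only to show how a wider normalization pool produces subadditive combination.

```python
import numpy as np

def gaussian_pool(x, centers, contrasts, sigma):
    """Gaussian-weighted sum of stimulus contrasts at each cortical location x (mm)."""
    x = np.asarray(x, dtype=float)[:, None]
    centers = np.asarray(centers, dtype=float)[None, :]
    weights = np.exp(-0.5 * ((x - centers) / sigma) ** 2)
    return weights @ np.asarray(contrasts, dtype=float)

def steady_state_response(x, centers, contrasts,
                          sigma_sum=1.0, sigma_norm=2.0, g0=0.1):
    """Steady state of the assumed circuit C dV/dt = G - (g0 + H) V."""
    G = gaussian_pool(x, centers, contrasts, sigma_sum)    # drives the input current
    H = gaussian_pool(x, centers, contrasts, sigma_norm)   # adds conductance (gain control)
    return G / (g0 + H)

x = [0.0]  # read out the response at the center element's location
for ied in (3.0, 12.0):  # hypothetical interelement distances in mm
    r_c = steady_state_response(x, [0.0], [0.1])              # 10% contrast center
    r_f = steady_state_response(x, [ied], [1.0])              # 100% contrast flanker
    r_cf = steady_state_response(x, [0.0, ied], [0.1, 1.0])   # both elements
    print(ied, r_cf[0], r_c[0] + r_f[0])
```

In this toy version the flanker at 3 mm barely drives the summation pool at the center location but still raises the conductance through the wider normalization pool, so the combined response is well below the linear sum. At 12 mm neither pool is reached and the combination is effectively additive.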
Figure 6. PGC model and response properties. a, The model of Sit et al. (2009) consists of an input layer representing the raw visual input; an intermediate layer representing the nonlinearities arising in the retina, LGN, and the input layers of V1; and a third layer that represents the superficial layers of V1, where the VSDI responses are measured. The blue and red regions represent the receptive fields and normalization pools, respectively, of individual units in the subsequent layers. b, The processing in a model unit (intermediate and output layers). Each unit computes two weighted sums [G(x) and H(x)] of the units in the projecting layer that feed into a parallel RC circuit, with G(x) controlling the input current, H(x) controlling the conductance and thus the gain of the circuit, and the voltage across the capacitor representing the unit's response. [a and b are modified from the study by Sit et al. (2009)]. c, A sample of the PGC model's temporal response profile for the stimuli used in the current study (symbols as in Fig. 1c). d, A sample of the PGC model's spatial response profile (cross section) for stimuli used in the current study. Here, the Gabor elements are spaced 1° apart at an assumed CMF of 3.7 mm/° (symbols as in Fig. 4).
Figure 6c shows the PGC model's response to a stimulus consisting of a single element. As can be seen, the temporal dynamics predicted by the model are largely consistent with the observed dynamics (compare Figs. 6c, 1c). For example, the model's responses are also better described as the maximum of the individual responses in time rather than the sum of the responses. To generate predictions for the spatial responses to single and multiple elements, we applied the same method used to analyze the voltage-sensitive dye responses. Specifically, the model's response amplitude was defined as the peak height of the best-fitting (minimum squared error) scaled gamma distribution function as in Figure 1c. In Figure 6d, the blue curve shows the predicted response across space to the center element alone, and the green curve shows the response to the flanker element alone at a distance of 1°. The red curve (Fig. 6d) shows the predicted response to the two elements presented together, and the black curve (Fig. 6d) shows the linear sum of the responses to the two elements. Again, these predictions are generally consistent with our results (Fig. 4, top right).
Figure 7b–d summarizes the predicted response for all separations of the center and flanker elements. Again, the predicted results are generally consistent with the data (Fig. 5). When the element separation in cortex is small (<4 mm), the combined responses are strongly subadditive; they become more linear at larger separations. The gain–control in the PGC model is not orientation tuned, and hence it predicts equally strong suppression for the conditions with parallel and orthogonal flankers (Fig. 4). In summary, the results of the modeling are consistent with the hypothesis that the interactions we observed in membrane potential responses are primarily attributable to overlapping receptive fields and contrast gain control. In the Discussion, we consider the potential implications for spike responses.
Figure 7. Spatial response properties for the PGC model of membrane potential activity, for comparison with the VSDI results presented in Figure 5. a, A cross section of the PGC model's spatial response profile for Gabor elements whose peak responses are spaced 5.0 mm apart (symbols as in Fig. 4). b, c, f, g, Normalized responses computed at the location of the center (b, f) and at the midpoint between the center and flanker elements (c, g). d, h, Facilitation index of the PGC model as a function of cortical interelement separation, computed using the cross sections of the spatial response patterns. e, A spatial response profile corresponding to the same stimulus configuration after the PGC responses have been passed through a spiking nonlinearity. a–d represent simulated subthreshold responses whereas e–h represent the corresponding responses after application of a spiking nonlinearity.
Discussion
We used VSDI to measure the spatiotemporal neural population responses in a large region of primary visual cortex (V1) to pairs of collinear and orthogonally oriented stimulus elements. The voltage-sensitive dye responses are proportional to the real-time summed membrane potential responses in the upper cortical layers (i.e., the output layers that project to other cortical areas) and can be measured over the entire area activated by the center stimuli (Fig. 2), making them ideal for characterizing interactions at the whole-population level. Our primary finding is that nearby contour elements combine in a subadditive fashion (mutual suppression) and that distant elements combine in a more additive fashion (simple summation). We also found slightly less suppression when the flanker element was orthogonal to the test element, consistent with the study by Cavanaugh et al. (2002). We found no evidence of superadditive interaction (facilitation). Indeed, the results as a whole were qualitatively predicted by a population contrast–gain–control model consistent with the contrast–gain–control (normalization) mechanisms that have been identified in single-unit studies of the retina, LGN, and visual cortex. Thus, our results would seem most consistent with the hypothesis that most of the (presumably) facilitatory neural interactions that underlie behavioral contour detection and contour grouping performance occur downstream of primary visual cortex.
Nonetheless, VSDI measures membrane potentials rather than spiking activity, and it is the spiking activity in the upper cortical layers that is being transmitted to other cortical areas. Intracellular recording of cortical neurons shows that there is a nonlinear relationship between membrane potential and spike rate. Specifically, spike rate is approximately a power function of the membrane potential with an exponent that averages around 3.0 (Priebe and Ferster, 2008; Tan et al., 2014). A power exponent >1.0 can lead to superadditivity. For example, if stimulus a and stimulus b each produce a membrane potential response of 2 (units arbitrary) when presented alone, then a power exponent of 3.0 will produce a spike response proportional to 8 for each. If membrane potentials are additive, then the membrane potential response to a simultaneous presentation of both stimuli would be 4 (2 + 2 = 4), and the spike response would be proportional to 64 (4³ = 64). This spike response is much larger than the sum of the individual spike responses, which would be proportional to 16 (8 + 8 = 16). Cardin et al. (2010) have empirically demonstrated this phenomenon (i.e., subadditive response combination in membrane potentials leading to superadditive response combination in the spiking responses) in intracellular recordings of individual V1 neurons.
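The arithmetic of this example can be checked with a few lines of code. The sketch below assumes a pure power-law relation between membrane potential and spike rate with an exponent of 3.0; the function name `spike_rate` and the subadditive combined potential of 3 are illustrative choices, not values from the recordings.

```python
def spike_rate(v, exponent=3.0):
    """Assumed power-law mapping from membrane potential to spike rate."""
    return v ** exponent

v_a = v_b = 2.0  # membrane potential response to each stimulus alone

sum_of_separate = spike_rate(v_a) + spike_rate(v_b)   # 8 + 8
additive_vm = spike_rate(v_a + v_b)                   # potentials add: 4 ** 3
subadditive_vm = spike_rate(3.0)                      # combined potential below 4

print(sum_of_separate, additive_vm, subadditive_vm)   # 16.0 64.0 27.0
```

Even when the combined membrane potential is subadditive (3 rather than 4), the spike response of 27 still exceeds the sum of the separate spike responses (16), which is the pattern of subadditive potentials yielding superadditive spiking described in the text.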
To assess the potential effect of the spiking nonlinearity, we applied a power exponent to the predicted population membrane potential responses of the PGC model. However, directly applying the average exponent measured in intracellular single-unit recording to the predicted voltage-sensitive dye responses is not justified because applying a power exponent to individual membrane potentials and summing (which would estimate the spiking population activity) is not algebraically equivalent to summing the membrane potentials and then applying a power exponent. To evaluate this issue, we simulated a large population of neurons having a distribution of half-saturation contrasts (values of c₅₀) similar to those measured in macaque V1 (Albrecht and Hamilton, 1982; Sclar et al., 1990). To simulate population spiking activity, we applied a response exponent of 3.0 to each individual neuron's membrane potential response and summed across the population. For comparison, we first summed the membrane potential responses of all the neurons and then applied a single response exponent to the sum. We found that applying a response exponent of 2.7 to the summed membrane potential responses closely matched the simulated summed spiking activity. Thus, applying an exponent of 2.7 to the predicted population membrane potential responses should give us approximately the results for population spiking activity.
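The logic of this comparison can be sketched as follows. The sketch is not the authors' simulation: the hyperbolic-ratio contrast response c / (c + c₅₀), the log-uniform spread of c₅₀ values, the contrast range, and the log-log least-squares fit are all assumptions made for illustration, so the fitted value should not be expected to reproduce the 2.7 reported above.

```python
import numpy as np

def effective_exponent(c50, contrasts, neuron_exponent=3.0):
    """Exponent that best maps pooled membrane potential onto pooled spiking.

    Fits log(pooled spikes) against log(pooled potential) by least squares
    and returns the slope.
    """
    c50 = np.asarray(c50, dtype=float)[:, None]
    c = np.asarray(contrasts, dtype=float)[None, :]
    v = c / (c + c50)                                       # assumed membrane potential response
    pooled_spikes = np.mean(v ** neuron_exponent, axis=0)   # exponent first, then pool
    pooled_vm = np.mean(v, axis=0)                          # pool first
    slope, _ = np.polyfit(np.log(pooled_vm), np.log(pooled_spikes), 1)
    return float(slope)

rng = np.random.default_rng(0)
contrasts = np.logspace(-2, 0, 25)   # 1% to 100% contrast

# Hypothetical heterogeneous population: c50 log-uniform between 5% and 50%.
c50_mixed = np.exp(rng.uniform(np.log(0.05), np.log(0.5), size=1000))
p_mixed = effective_exponent(c50_mixed, contrasts)

# Homogeneous population: pooling and exponentiation commute, so the fit recovers 3.
p_same = effective_exponent(np.full(1000, 0.2), contrasts)

print(p_mixed, p_same)
```

When every neuron has the same c₅₀, the order of pooling and exponentiation does not matter and the fit returns the single-neuron exponent. With a spread of c₅₀ values the two orders differ, which is why an effective population exponent has to be estimated rather than assumed.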
A summary of this analysis is shown in Figure 7e–h, which is analogous to the membrane potential predictions in Figure 7a–d. When the element separation in cortex is small (<2 mm), the combined responses are subadditive, but they become facilitatory at separations between 2 and 7 mm and additive beyond that. As mentioned above, the gain–control in the PGC model is not orientation tuned, and hence it also predicts the same suppression and facilitation for the parallel and orthogonal flankers (Fig. 4). The predicted facilitatory and suppressive effects are weak in Figure 7f, which shows the predicted spiking responses at the location of the center element. Thus, the PGC model predicts that, on average, it should be relatively difficult to observe facilitatory or suppressive effects in the spiking activity of neurons with receptive fields (RFs) centered on a low-contrast element when the flanker is >1.5 mm distant. Figure 7g suggests that facilitatory responses should be relatively easier to observe by recording from neurons that are between the center and flanker elements. Figure 7h suggests that for spacing between center and flanking elements of 2–6 mm, there should be strong facilitation in total spiking response pooled over a 4 × 1 mm region centered on the 10% contrast element.
The analysis in Figure 7e–h makes the important point that an entirely local spiking nonlinearity can create facilitatory interactions in spiking activity from purely suppressive (or additive) interactions at the level of membrane potentials. Indeed, for every V1 neuron, we would expect there to be some separation of the two elements that could produce some superadditivity because of the spiking nonlinearity. Thus, our VSDI results and the PGC model are not necessarily inconsistent with some of the facilitatory effects reported in the single-unit literature.
Polat and Sagi (1993; Fig. 3) reported that, for psychophysical detection experiments with collinear Gabor elements of similar spatial frequency and bandwidth to those tested here (note that they tested a variety of spatial frequencies), suppression (threshold elevation) is observed up to distances equal to approximately the spatial wavelength λ (1/spatial frequency) of the elements, and that facilitation is observed from there up to more than 8λ. This range of distances corresponds to a range of 0.5–4° of visual angle for the stimuli used in the current experiment, which extends well beyond the maximal cortical IED at which we predict facilitation of spiking population responses (i.e., ∼8 mm; Fig. 7h). Furthermore, they find that when the flanking high-contrast element is orthogonal to the test element, there is little effect on threshold, whereas we find weaker subthreshold suppression (and therefore stronger spiking facilitation) for the orthogonal flanker (Figs. 4, 5c). Thus, their results are not very consistent with our VSDI results and with the predictions of the population spiking activity in Figure 7e–h.
It is also important to note that predicted facilitation in mean spiking activity in Figure 7, g and h, does not directly imply better detection performance (i.e., better signal-to-noise ratio). Whether one should expect better detection performance depends on the sources of the noise. For example, if the dominant noise is before the spiking nonlinearity, then the nonlinearity should have little effect on detection performance.
A final question to consider is how our results might be related to the mechanisms of contour grouping. A standard view of contour grouping is that (1) local pattern-detection mechanisms identify oriented contour elements, (2) local grouping mechanisms bind/associate nearby elements that are consistent with the statistical properties of natural contours, and (3) global grouping mechanisms form representations of extended contours. There is little doubt that the receptive field properties of neurons in V1 contribute to step (1), but it is not so clear that they contribute to step (2). A recent study showed a clear grouping-related population signal in V1 of monkeys performing a grouping task (Gilad et al., 2013). However, these grouping-related signals were significantly delayed relative to the visual responses, suggesting that these signals could reflect post-grouping attentional effects. The facilitatory lateral effects on detection performance and in V1 neural responses occur primarily when one of the contour elements has low or near-threshold contrast (Polat and Sagi, 1993, 1994; Kapadia et al., 1995). On the other hand, perceptual contour grouping is typically measured when the elements belonging to a contour are of high and equal contrast (Field et al., 1993). Single-unit studies suggest that under such circumstances, lateral interactions are generally suppressive (Polat et al., 1998; Sceniak et al., 1999; Cavanaugh et al., 2002; Levitt and Lund, 2002). It is still possible that there is a specialized subset of V1 neurons that contribute to contour grouping or that the elongated RFs of V1 neurons contribute to local contour grouping (De Valois et al., 1982; Webster and De Valois, 1985; Ringach, 2002; Michel et al., 2013), but it seems likely that local and global contour grouping occur primarily in later cortical areas.
Footnotes
This work was supported by NIH Grants NEI-EY016454 (E.S.), NEI-EY024662 (W.S.G. and E.S.) and NEI-EY11747 (W.S.G). We thank Tihomir Cakic for technical assistance, Bill Bosking for initiating this project and collecting some of the preliminary data, and current and former members of the Seidemann laboratory for assistance with this project.
The authors declare no competing financial interests.
- Correspondence should be addressed to Melchi M. Michel at the above address. melchi.michel{at}rutgers.edu