Abstract
In a typical visual scene we continuously perceive a “figure” that is segregated from the surrounding “background” despite ongoing microsaccades and small saccades that are performed when attempting fixation (fixational saccades [FSs]). Previously reported neuronal correlates of figure-ground (FG) segregation in the primary visual cortex (V1) showed enhanced activity in the “figure” along with suppressed activity in the noisy “background.” However, it is unknown how this FG modulation in V1 is affected by FSs. To investigate this question, we trained two monkeys to detect a contour embedded in a noisy background while simultaneously imaging V1 using voltage-sensitive dyes. During stimulus presentation, the monkeys typically performed 1–3 FSs, which displaced the contour over the retina. Using eye position and a 2D analytical model to map the stimulus onto V1, we were able to compute FG modulation before and after each FS. On the spatial cortical scale, we found that, after each FS, FG modulation follows the stimulus retinal displacement and “hops” within the V1 retinotopic map, suggesting visual instability. On the temporal scale, FG modulation was initiated in the new retinotopic position before it disappeared from the old retinotopic position. Moreover, the FG modulation developed faster after an FS than after stimulus onset, which may contribute to visual stability of FG segregation along the timeline of stimulus presentation. Therefore, despite spatial discontinuity of FG modulation in V1, the higher-order stability of FG modulation along time may enable our stable and continuous perception.
- contour integration
- figure-ground
- fixational saccades
- primary visual cortex
- visual stability
- voltage-sensitive dye imaging
Introduction
During attempted visual fixation, small saccades and microsaccades change the gaze position a few times per second. We will refer to these collectively as fixational saccades (FSs) (Martinez-Conde et al., 2004; Rolfs, 2009). Because FSs cause a stimulus displacement over the retina, neuronal activity is modulated in relation to FSs. Neuronal responses in primary visual cortex (V1) are strongly influenced by FSs, exhibiting mainly enhancement and, to a lesser extent, suppression (Wurtz, 2008; Martinez-Conde et al., 2013). Recently, we showed that neuronal population activity in V1 “hops” within the retinotopic map of V1 after each FS, following the stimulus retinal shift (Meirovithz et al., 2012). However, all of the above effects were reported while the animals were performing fixation and were presented with various visual stimuli (Martinez-Conde et al., 2013). Therefore, the direct influences of FSs on neuronal activity related to perceptual processing (e.g., figure-ground [FG] segregation) remained unknown.
In contour integration, similarly oriented elements are perceptually grouped to form a coherent contour (“figure”) that is segregated from a noisy background (“ground”) (Field et al., 1993; Lamme, 1995; Li et al., 2006). We recently found that the population response to the contour increased while activity in the noisy “background” was simultaneously suppressed (Gilad et al., 2013). The activity difference between the “circle” and “background” was defined as FG modulation. FG modulation develops relatively late (>100 ms) and probably reflects top-down influences. However, it was not clear whether FG modulation persists throughout the entire stimulus presentation because FSs (starting typically ∼250 ms after stimulus onset) displace the stimulus over the retina. Specifically, it is not known whether FG segregation is recomputed after each FS.
Here we used voltage-sensitive dye imaging (VSDI) to study how population responses in V1, related to FG segregation, are affected by FSs. We used eye position measurement along with an analytical 2D retinotopic model to predict the cortical coordinates of the “circle” and “background” elements before and after each FS for each trial. Thus, we could compute the FG modulation before and after each FS.
Materials and Methods
Behavioral task.
Two adult male monkeys (Macaca fascicularis; Monkey L and Monkey T) were trained on a contour detection task (Gilad et al., 2013). After a random fixation interval (3–4 s), either a contour or noncontour stimulus was displayed for 1 s on a uniform gray background (see Fig. 1A). After stimulus offset, the monkeys reported whether a contour or noncontour stimulus appeared. Performance level was 91% for Monkey L and 89% for Monkey T.
Eye position recording and FS detection.
Throughout the trial, the animals were required to maintain fixation. Eye position was monitored by an infrared eye tracker (Dr. Bouis Device, Karlsruhe, Germany), sampled at 1 kHz, and recorded at 250 Hz. Throughout stimulus presentation, the monkeys made 1–3 small FSs (see below). An algorithm for detecting microsaccades (amplitude < 1°) and saccades (amplitude > 1°) (Martinez-Conde et al., 2009) was applied to the eye position data (Engbert and Mergenthaler, 2006; Meirovithz et al., 2012).
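As an illustration of this kind of velocity-threshold detection, the following is a minimal sketch in the spirit of the cited Engbert-type algorithm; it is not the code used in this study. The threshold multiplier (lam), the minimum event duration (min_samples), and the definition of amplitude as the displacement between the first and last supra-threshold samples are assumptions made for the example. Only the 250 Hz recording rate and the 1° microsaccade/saccade boundary come from the text.

```python
import math
from statistics import median

def smoothed_velocity(pos, dt):
    """Velocity from a 5-sample moving window; the two samples at each edge stay 0."""
    v = [0.0] * len(pos)
    for i in range(2, len(pos) - 2):
        v[i] = (pos[i + 2] + pos[i + 1] - pos[i - 1] - pos[i - 2]) / (6.0 * dt)
    return v

def median_sd(v):
    """Median-based spread estimate, robust to the rare saccadic samples."""
    var = median([a * a for a in v]) - median(v) ** 2
    return math.sqrt(max(var, 1e-12))  # floor avoids division by zero on flat traces

def detect_fs(x_deg, y_deg, rate_hz=250.0, lam=5.0, min_samples=3):
    """Return FS events found in one trial (lam and min_samples are assumed values)."""
    dt = 1.0 / rate_hz
    vx, vy = smoothed_velocity(x_deg, dt), smoothed_velocity(y_deg, dt)
    tx, ty = lam * median_sd(vx), lam * median_sd(vy)
    # A sample is saccadic when its velocity lies outside the threshold ellipse.
    above = [(a / tx) ** 2 + (b / ty) ** 2 > 1.0 for a, b in zip(vx, vy)]
    events, start = [], None
    for i, flag in enumerate(above + [False]):  # trailing False closes an open run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_samples:
                end = i - 1
                amp = math.hypot(x_deg[end] - x_deg[start], y_deg[end] - y_deg[start])
                events.append({
                    "onset_ms": 1000.0 * start * dt,
                    "onset_sample": start,
                    "amplitude_deg": amp,
                    "kind": "microsaccade" if amp < 1.0 else "saccade",
                })
            start = None
    return events

# Synthetic 1 s trace: small jitter plus a 0.6 deg rightward step starting at sample 100.
x = [0.01 * math.sin(0.7 * i) + 0.6 * min(max(i - 100, 0), 4) / 4 for i in range(250)]
y = [0.01 * math.cos(0.9 * i) for i in range(250)]
for event in detect_fs(x, y):
    print(event)
```

The median-based spread makes the threshold adapt to the noise level of each trial, so a single multiplier can be used across sessions.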
Surgeries and VSDI imaging.
The surgical procedure and VSD staining have been reported previously (Slovin et al., 2002). All experimental procedures were performed according to the National Institutes of Health guidelines, approved by the Animal Care and Use Guidelines Committee of Bar-Ilan University, and supervised by the Israeli authorities for animal experiments. Imaging was performed with the Micam Ultima system: 10 ms/frame; 10,000 pixels/frame; imaged area per pixel: 170 × 170 μm. The protocol of data acquisition in VSDI was described previously (Slovin et al., 2002).
Data analysis.
Data analysis was performed on 15 and 13 recording sessions from Monkey L and Monkey T, respectively (data from Monkey L were partially reanalyzed from Gilad et al., 2013). Here we focus on the population response in relation to FSs, typically starting ∼250 ms after stimulus onset (in Gilad et al., 2013, the data were analyzed only in relation to stimulus onset and within the first 250 ms after stimulus onset). MATLAB (version 2010b; MathWorks) software was used for analyses.
Basic VSDI analysis.
The basic analysis of the VSDI signal has been detailed previously (Slovin et al., 2002; Ayzenshtat et al., 2010, their supplemental Fig. 12). In addition, we removed pixels exceeding a threshold of intertrial SD, which excluded mainly pixels in the vicinity of large blood vessels. VSDI maps were low-pass-filtered with a 2D Gaussian filter (σ = 1–1.5 pixels) for visualization purposes only.
Retinotopic mapping of the “circle” and “background” Gabors.
To analytically transform the visual stimulus into cortical coordinates, we implemented a 2D spatial transformation according to an analytical retinotopic model (Schira et al., 2007; Ayzenshtat et al., 2012). This model reliably maps a visual stimulus from its visual field coordinates (i.e., eccentricity and polar angle) into its cortical coordinates (see Fig. 1C). Registration and assessment of fitting quality were performed using a separate set of experiments with point/Gabor stimuli and have been described previously (Ayzenshtat et al., 2012). Using eye position data, after each FS, we transformed and registered the stimulus onto its new cortical coordinates (see Fig. 2B).
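To make the idea of remapping a stimulus after a gaze shift concrete, the sketch below uses the much simpler monopole log-polar map, w = k·log(z + a), in place of the analytical model cited above; it is not the transformation used in this study. The constants K_MM and A_DEG are hypothetical placeholders rather than fitted values, and the Gabor and gaze coordinates are invented for the example.

```python
import cmath
import math

K_MM = 12.0   # hypothetical cortical scaling factor (mm); placeholder, not a fitted value
A_DEG = 0.75  # hypothetical foveal offset (deg); placeholder, not a fitted value

def retinal_position(stim_xy_deg, gaze_xy_deg):
    """Stimulus position relative to the current gaze, as (eccentricity, polar angle)."""
    dx = stim_xy_deg[0] - gaze_xy_deg[0]
    dy = stim_xy_deg[1] - gaze_xy_deg[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def visual_to_cortex(ecc_deg, polar_rad, k_mm=K_MM, a_deg=A_DEG):
    """Monopole map w = k*log(z + a), shifted so the fovea lands at (0, 0) mm.

    Only meaningful within one hemifield (x >= 0), which projects to one hemisphere.
    """
    z = cmath.rect(ecc_deg, polar_rad)  # visual-field position as a complex number
    w = k_mm * (cmath.log(z + a_deg) - math.log(a_deg))
    return w.real, w.imag

# Invented example: one Gabor on the screen, gaze before and after a 0.5 deg FS.
gabor = (2.0, -1.0)
before = visual_to_cortex(*retinal_position(gabor, (0.0, 0.0)))
after = visual_to_cortex(*retinal_position(gabor, (0.3, 0.4)))
shift_mm = math.hypot(after[0] - before[0], after[1] - before[1])
print(f"predicted cortical shift: {shift_mm:.2f} mm")
```

The key step for the present analysis is retinal_position: the stimulus is fixed on the screen, so each FS changes only the gaze term, and the cortical ROI is recomputed from the new retinal coordinates.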
Criteria for trial analysis and FS characteristics.
To compute the FG modulation (FG-m) (see below) in relation to FS, the imaged V1 area before/after an FS needed to include pixels belonging to the “circle” and to the “background.” Only some of the trials with FSs met this constraint. We set two criteria for a trial to be further analyzed: (1) A minimum of 25 pixels in the “circle” and in the “background” ROIs (pixels within each Gabor outline). This was used to achieve a reasonable signal-to-noise ratio. (2) A minimum FS amplitude of 0.5°, to minimize the overlap between the “circle” ROIs before and after an FS. The number of trials passing these criteria was as follows: after the first FS (see Figs. 3 and 4A, second from top), 264 and 133 contour trials for Monkey L and Monkey T, respectively; after the second FS, 165 and 106 trials for Monkey L and Monkey T, respectively (see Fig. 4A, third from top). The number of trials was insufficient to compute the responses in the new location after the third FS. FS onset histograms are shown in Figure 4B. Median amplitude was 2.91°/3.32°, 3.10°/2.79°, and 0.72°/0.31° for the first, second, and third FS for Monkey L and Monkey T, respectively.
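The two inclusion criteria can be expressed as a simple filter. This is a sketch under stated assumptions: the dictionary layout and function name are hypothetical, the pixel threshold is assumed to apply to every ROI supplied (e.g., both before and after the FS), and the amplitude threshold is assumed to be inclusive.

```python
MIN_ROI_PIXELS = 25          # from the text: minimum number of pixels per ROI
MIN_FS_AMPLITUDE_DEG = 0.5   # from the text: minimal FS amplitude (assumed inclusive)

def trial_passes(roi_pixel_counts, fs_amplitude_deg):
    """Check one FS of one trial against both criteria.

    roi_pixel_counts maps a ROI name (hypothetical naming) to the number of its
    pixels that fall inside the imaged area.
    """
    enough_pixels = all(n >= MIN_ROI_PIXELS for n in roi_pixel_counts.values())
    return enough_pixels and fs_amplitude_deg >= MIN_FS_AMPLITUDE_DEG

example = {"circle_old": 40, "background_old": 60, "circle_new": 30, "background_new": 55}
print(trial_passes(example, fs_amplitude_deg=0.55))
```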
FG-m in relation to FS onset or stimulus onset.
We defined FG-m as the mean population response in the “circle” ROI minus the mean population response in the “background” ROI. This was computed for each time frame. The FG-m was aligned to the onset of the first (see Figs. 3A and 4A,B), second, or third FS (see Fig. 4A,B) in each trial and then averaged over all trials. We also computed the FG-m aligned on the stimulus onset (see Fig. 3B). The FG-m was normalized as follows: (1) For each FG-m computed in a single contour trial, we subtracted the average FG-m in the noncontour condition. This was done to discard nonhomogeneous VSD staining. (2) The FG-m in the new retinotopic location was set to 0 at FS onset time. This was done to eliminate FG-m residuals from ROIs in the old location. Nonnormalized FG-ms show similar results (data not shown).
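The definition and the two normalization steps can be sketched as follows. This is an illustrative outline, not the analysis code of the study: the data layout (one list of pixel values per 10 ms frame), the function names, and the assumption that the noncontour average is a time course aligned in the same way as the contour trace are all choices made for the example.

```python
from statistics import fmean

def fg_m(frames, circle_idx, background_idx):
    """FG-m per frame: mean over "circle" pixels minus mean over "background" pixels."""
    return [
        fmean([frame[i] for i in circle_idx]) - fmean([frame[i] for i in background_idx])
        for frame in frames
    ]

def align(trace, onset_frame, pre, post):
    """Cut a window of `pre` frames before to `post` frames after the onset frame."""
    return trace[onset_frame - pre: onset_frame + post + 1]

def normalize(contour_trace, noncontour_mean_trace, onset_index=None):
    """Step 1: subtract the average noncontour FG-m (corrects uneven staining).
    Step 2 (new location only): set FG-m to 0 at FS onset."""
    out = [c - n for c, n in zip(contour_trace, noncontour_mean_trace)]
    if onset_index is not None:
        baseline = out[onset_index]
        out = [v - baseline for v in out]
    return out

def average_trials(traces):
    """Average aligned traces of equal length across trials, frame by frame."""
    return [fmean(column) for column in zip(*traces)]

# Toy trial: 5 frames, pixels 0-1 belong to the "circle" and pixels 2-3 to the "background".
frames = [[1, 1, 1, 1], [2, 2, 1, 1], [3, 3, 1, 1], [4, 4, 2, 2], [5, 5, 2, 2]]
trace = fg_m(frames, circle_idx=[0, 1], background_idx=[2, 3])
window = align(trace, onset_frame=2, pre=1, post=2)
print(normalize(window, [0.5] * len(window), onset_index=1))
```

Zeroing at FS onset is applied only to the new location because, before the FS, those pixels still carry the response of whatever part of the stimulus occupied them in the old gaze position.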
Statistical analysis.
A paired signed-rank test and the Mann–Whitney U test were used to assess statistical significance. We computed FG-m on two control datasets (10⁴ iterations): (1) label shuffling between contour and noncontour trials (see Fig. 3A, gray dashed lines); and (2) random onset times for FSs in each trial.
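A label-shuffling control of this kind can be sketched as below for a single time frame. The sketch assumes that each trial contributes one FG-m value at that frame, that the shuffled statistic is the difference between the mean of the pseudo-contour and pseudo-noncontour groups, and that significance is judged against the mean ± 2 SD of the shuffled distribution (the criterion stated in Results). The function name, the iteration count, and the toy data are illustrative only.

```python
import random
from statistics import fmean, pstdev

def shuffle_band(contour_vals, noncontour_vals, n_iter=1000, seed=0):
    """Return (lower, upper) = mean -/+ 2 SD of the label-shuffled FG-m difference."""
    rng = random.Random(seed)          # seeded so the example is reproducible
    pooled = list(contour_vals) + list(noncontour_vals)
    n_contour = len(contour_vals)
    null = []
    for _ in range(n_iter):
        rng.shuffle(pooled)            # reassign contour/noncontour labels at random
        null.append(fmean(pooled[:n_contour]) - fmean(pooled[n_contour:]))
    mu, sd = fmean(null), pstdev(null)
    return mu - 2 * sd, mu + 2 * sd

# Toy data: FG-m values (arbitrary units) at one frame for 20 trials per condition.
contour = [1.0 + 0.01 * i for i in range(20)]
noncontour = [0.0 + 0.01 * i for i in range(20)]
observed = fmean(contour) - fmean(noncontour)
lower, upper = shuffle_band(contour, noncontour)
print("significant" if observed > upper or observed < lower else "not significant")
```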
Results
We trained two monkeys on a contour detection task where they had to discriminate between contour and noncontour stimuli (see Materials and Methods) (Gilad et al., 2013). Using VSDI, we measured population responses at high spatial and temporal resolution from V1, during task performance. The main goal of this study was to investigate how FSs, performed when attempting fixation and typically starting ∼250 ms after stimulus onset, affect FG processing in V1. The voltage-sensitive dye signal measures the sum of membrane potential from all neuronal elements in the imaged area, emphasizing subthreshold synaptic potentials (Shoham et al., 1999; Grinvald and Hildesheim, 2004). Data analysis was performed on 15 and 13 recording sessions from Monkey L and Monkey T, respectively.
Retinotopic mapping of the “circle” and “background” elements, before and after each FS
The contour stimulus comprised a circular contour of similarly oriented Gabor elements, embedded in a noisy background (an array of randomly oriented Gabors; Fig. 1A, left). The noncontour stimulus comprised the background alone (Fig. 1A, right), where the contour elements were randomly rotated along their circular path axis. The stimulus part that is mapped onto the V1 imaged area is approximately outlined in Figure 1A, B (yellow rectangle). We imaged V1 population responses that were evoked by a few Gabor elements comprising the “circle” (C1-C3) and the “background” (Bg1-Bg3; Fig. 1B). Using an analytical 2D retinotopic model, we transformed the stimulus in Figure 1B from its location in the visual field (based on the monkey's eye position) onto the imaged area (Fig. 1C) (Ayzenshtat et al., 2012; Schira et al., 2007) (see Materials and Methods). The predicted Gabor positions fit well with the activation patches of the response map evoked by the contour stimulus (Fig. 1D). To further quantify the fit, in additional experiments (see Materials and Methods), the monkeys performed fixation alone and were presented with 1–2 Gabor elements comprising the “circle” (Fig. 1E, top) or “background,” and the predicted Gabor positions were compared with the activation patches (Fig. 1E, bottom). The root-mean-square deviation values between predicted and empirical Gabor/point stimuli coordinates were 0.42 ± 0.10 and 0.77 ± 0.48 mm for Monkey L and Monkey T, respectively (mean ± SD). This means that the accuracy of our retinotopic mapping is on average ∼3–5 pixels (Ayzenshtat et al., 2012).
Visual stimuli and retinotopic mapping of the “circle” and “background” elements. A, Examples for contour and noncontour stimuli. fp, Fixation point. For behavioral paradigm, see Materials and Methods. The yellow rectangle (not visible to the animal) represents the approximated part of the visual stimulus that is mapped to the imaged area in V1. B, The part of the contour stimulus (including 3 “circle” and 3 “background” Gabors) that is approximately mapped onto the imaged area. C, A 2D spatial transformation of the stimulus part in B onto the imaged area (see Materials and Methods). The predicted positions (ROIs) of 6 “circle” and “background” Gabors are outlined in black and white, respectively, on the imaged area. D, The 6 Gabor ROIs shown in C superimposed on the average population response map evoked early (60–80 ms) after stimulus onset (contour trials). E, Maps of population response (bottom), obtained by presenting 1–2 “circle” Gabor elements (top), along with the corresponding Gabor ROIs.
Next, we implemented an algorithm for microsaccade and saccade detection (Engbert and Mergenthaler, 2006; Meirovithz et al., 2012) on the monkeys' eye position data and used the retinotopic model to predict the cortical coordinates of the Gabor elements in the “circle” or “background,” before and after each FS. This was done for each trial and each FS separately.
The effects of FSs on figure-ground processing
To investigate the effects of FSs on FG processing, we analyzed single contour trials. Figure 2A–C shows a representative single-trial analysis. As expected, after stimulus onset a clear figure-ground modulation (FG-m) develops (i.e., increased activity at the “circle” Gabor elements and lower activity in the background area) (Fig. 2B, left; FG-m is the activation difference). After the first FS occurring after stimulus onset (Fig. 2A), the contour image is shifted over the retina and the FG-m follows this shift. The FG-m in the old location (defined by eye position before the FS; Fig. 2B, left) is eliminated and then reinitiated in the new location (defined by eye position after the FS; Fig. 2B, right). This is further quantified in Figure 2C, where the VSDI population response in the old “circle” location (black line) and in the new “circle” location (gray line) is plotted as a function of time. The “circle” in the old location becomes (after the FS) part of the “background” area in the new location; similarly, the “circle” in the new location is part of the “background” area in the old location (Fig. 2B). Thus, Figure 2C shows the FG-m in the old location (area shaded blue) and in the new location (area shaded pink). Similar results are shown in two additional single contour trials with a microsaccade (Fig. 2D) and a small saccade (Fig. 2E). Therefore, the FG processing in V1 is recomputed after each FS and follows the stimulus retinal displacement, thus showing spatial discontinuity after each FS.
FG-m shifts from one cortical site to another after an FS. A–C, An example from one contour trial. A, Left, Horizontal and vertical eye position displaying one FS (onset = 343 ms after stimulus; magnitude = 0.55°). Right, Gaze position (red point) before and after the FS (arrow), superimposed on the contour stimulus. B, Activation maps showing FG-m (i.e., increased activity at the “circle” elements and decreased activity in the background) in the old (left; averaged at 300–340 ms after stimulus onset) and new (right; averaged at 560–620 ms after stimulus onset) retinotopic locations. The “circle” elements (C1, C2, C3) are depicted in black (old location) and gray (new location). The dashed black line indicates the border between the “circle” area (above the line) and the “background” area (below the line) in the old location. The “circle” elements in the new location mostly fall within the general background area of the old location. C, Time course of the population response for all “circle” elements in the old (black curve; also the background in the new location) and new (gray curve; also the background area in the old location) retinotopic locations. The FG-m is shaded blue in the old location and pink in the new location. D, E, Examples from two additional contour trials, with a microsaccade (D; amplitude < 1°; see Materials and Methods) and a saccade (E; amplitude > 1°).
The above results were confirmed for the pooled data, across all trials, from two monkeys as depicted by the normalized FG-m (see Materials and Methods) in Figure 3. FG-m in the old location was eliminated (Fig. 3A, blue curve) and the FG-m in the new location was reinitiated (Fig. 3A, red curve). A significant FG-m was defined as exceeding the mean ± 2 SD of the FG-m calculated for the label shuffling control (gray lines; see Materials and Methods). Interestingly, the FG-m elimination in the old location overlapped in time with the FG-m reinitiation in the new location (Fig. 3A). At the time of intersection (100/90 ms after FS onset for Monkey L/T), there was a significantly positive FG-m in both old and new locations for each monkey separately (FG-m = 1.17 ± 0.07 × 10⁻⁴ and 1.38 ± 0.09 × 10⁻⁴ (mean ± SEM) in the old and new locations, respectively, in Monkey L; FG-m = 0.52 ± 0.05 × 10⁻⁴ and 0.47 ± 0.05 × 10⁻⁴ in the old and new locations, respectively, in Monkey T; p < 0.001); each FG-m was compared with FG-m computed from the label shuffling control. To test whether FG-m dynamics (elimination in the old location and reinitiation in the new location) are specifically related to the FS onset, rather than to nonspecific temporal changes along contour trials, we performed the following analyses. First, FG-m was maintained high throughout stimulus presentation (p < 0.05; data not shown) when we removed the FS effects (trials were truncated at the FS onset). Second, FG-m elimination in the old location and reinitiation in the new location were significant compared with the random FS onset control (see Materials and Methods; p < 0.01; data not shown). These results suggest that the overlap in time between FG-m in the old and new locations is specific to the FS onset.
Elimination and reinitiation of FG-m after an FS, grand average. Data are pooled for both monkeys (n = 264 trials from Monkey L; n = 133 trials from Monkey T). A, The FG-m in the old retinotopic location (blue) and the new retinotopic location (red) both aligned on the first FS. Error bars indicate SEM over trials. Gray dashed lines indicate the FG-m from shuffled trial labels (mean ± 2 SD; see Materials and Methods). B, The FG-m in the old retinotopic location aligned on the stimulus onset (blue) and the FG-m in the new retinotopic location aligned on the first FS (red; as in A). Error bars are as in A.
Next, we compared the time course of the FG-m evoked by stimulus onset with the FG-m reinitiated after the first FS, in the new location (Fig. 3B; data pooled for both monkeys). The FG-m after FS onset (Fig. 3B, red trace; new location) developed faster than after stimulus onset (Fig. 3B, blue trace). The FG-m value was higher after the first FS than after stimulus onset in each monkey separately (FG-m = 2.52 ± 0.23 × 10⁻⁴ and 0.98 ± 0.26 × 10⁻⁴ (mean ± SEM) aligned to the first FS and stimulus onset, respectively, in Monkey L; FG-m = 0.83 ± 0.18 × 10⁻⁴ and 0.27 ± 0.22 × 10⁻⁴ aligned to the first FS and stimulus onset, respectively, in Monkey T; p < 0.01; Mann–Whitney U test; FG-m averaged at 110–170 ms; Fig. 3B, gray bar). Therefore, it seems that the FG-m after an FS may reflect prior information concerning the new retinotopic position of the “circle” and the “background,” thus allowing a faster recomputation of FG-m.
To study whether the elimination and reinitiation of FG-m also occur for subsequent FSs that are performed throughout stimulus presentation, we computed the FG-m before and after each FS along the timeline of each contour trial as well as the FG-m evoked by stimulus onset (see Materials and Methods). A similar analysis was done for each contour trial, and the results were then averaged across trials and monkeys (Fig. 4). Figure 4A displays the FG-m aligned on stimulus onset (top); first FS, old and new locations (second from top; same as in Fig. 3A); second FS, old and new locations (third from top); and third FS, old location (bottom). It is clear that, for each FS, the FG-m is eliminated from the old retinotopic location (defined before each FS; solid curves with circle symbols) and reinitiated in the new retinotopic location (defined after each FS; solid curves), and this occurs repeatedly for all successive FSs. In Figure 4B, the separated panels in Figure 4A are joined together, along the timeline of the contour trials. The medians of FS onset times for the first, second, and third FSs are as follows: 287, 507, and 790 ms, respectively (Fig. 4B, bottom). The elimination and reinitiation of the FG-m aligned for each FS overlap in time; that is, there is a significant positive FG-m in the intersection between blue and red curves (first FS) and between red and green curves (second FS; p < 0.01 for both monkeys compared with the label shuffle control). Importantly, the maximal value of FG-m along time is continuously high throughout the contour stimulus presentation (Fig. 4C), despite the occurrence of successive FSs.
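The "maximal value of FG-m along time" can be read as a pointwise maximum over the curves placed on a common timeline. The sketch below illustrates that reading only; the representation of undefined samples as None and the toy values are assumptions for the example.

```python
def max_envelope(curves):
    """Pointwise maximum over curves on a common timeline.

    Each curve is a list with one value per time point, or None where the curve
    is not defined (e.g., a new-location curve before its FS onset).
    """
    envelope = []
    for samples in zip(*curves):
        present = [s for s in samples if s is not None]
        envelope.append(max(present) if present else None)
    return envelope

# Toy example: an old-location curve decaying while a new-location curve rises.
old_location = [1.0, 0.8, 0.3, None]
new_location = [None, 0.2, 0.9, 1.0]
print(max_envelope([old_location, new_location]))
```

Because the two curves overlap in time, the envelope never drops to the low values each curve reaches individually, which is the sense in which FG-m stays high across successive FSs.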
Continuity of FG-m along the timeline in contour trials, grand average. Data are pooled from two monkeys (see Materials and Methods). A, The FG-m as a function of time aligned to stimulus onset or successive FSs (from top to bottom): stimulus onset, first, second, and third FSs. For each FS, FG-m is shown in the old location (defined by the gaze position before the FS; solid curves with filled-circle symbols) and in the new location (defined by the gaze position after the FS; solid curves). In each curve, the FG-m peak was normalized to 1. To mark temporal discontinuities on the time axis, we added // on the x-axis, denoting the first point in the curves aligned on FS onset (curves with filled circles). Error bars indicate SEM over trials. B, The separated panels in A are joined together, along the common timeline of contour trials. Vertical arrows point to the FS onset-time histograms, across all trials, for the first, second, and third FSs. C, The black line indicates the maximal FG-m value for each time point.
Discussion
In this study, we asked how FSs influence FG processing in V1. In general, most neurophysiological studies that employ demanding visual tasks make an effort to discard the effects of FSs. In contrast, studies investigating the effects of FSs on neuronal activity mostly use tasks of fixation alone (Martinez-Conde et al., 2013). This is mainly because FSs are unpredictable, making it difficult, for example, to place a small stimulus inside a neuron's receptive field after each FS. Using a 2D analytical model, eye position measurement, and VSDI, we were able to compute FG-m in relation to FSs on single trials and perform a reliable investigation of the relation between FSs and perceptual processes, such as FG segregation.
FG-m probably involves top-down influences and is recomputed in V1 after each FS. This can be considered “wasted” processing because we previously showed that FG-m at early times (<250 ms), before any FSs are executed, could discriminate well between contour and noncontour trials (Gilad et al., 2013). Thus, it is possible that the post-FS FG reprocessing reflects an automatic process that occurs during long fixation over the same stimulus, as in the current task (Nachmias, 1961; Poletti and Rucci, 2010). Nevertheless, FSs performed under natural viewing conditions and ocular drifts were shown to play an important role in visual tasks involving analysis of higher spatial details (Ko et al., 2010; Ahissar and Arieli, 2012).
In contrast to the spatial discontinuity of FG-m in V1, we found that the FG-m initiated in the new retinotopic location (defined by eye position after the FS) overlaps in time with the elimination of the FG-m in the old retinotopic location (defined by eye position before the FS). This overlap is facilitated by a faster development of FG-m after an FS than after stimulus onset. This result is in accordance with a previous report on the effects of instructed saccades on attentional modulation (Khayat et al., 2004). Our results suggest that, after an FS, V1 may receive prior information concerning the postsaccadic location of the “circle” and the “background,” thus allowing a faster recomputation of FG-m. This information may arrive directly or indirectly from other areas involved in predictive response to saccades (Duhamel et al., 1992; Umeno and Goldberg, 1997). It also supports theories proposing that information about segmentation is maintained across FSs (Duhamel et al., 1992; Melcher, 2007; Wurtz, 2008).
The temporal overlap of FG-m between the old and new locations suggests that, for a specific delay after the FS, the figure is represented simultaneously in two different locations. It is unlikely that both representations are integrated within single small receptive fields in V1. This dual representation might be forwarded to higher visual areas with larger receptive fields for spatial integration, or the representation in the old location can be suppressed in higher areas. Thus, V1 response may reflect a stable representation of FG segregation, originating from higher areas. These areas may hold a stable FG-m representation across FSs.
In conclusion, our results show that FG-m in V1 is recomputed after each FS; it follows the stimulus retinal displacement and “hops” within the V1 retinotopic map as dictated by the FS parameters. This part of the results clearly suggests spatial discontinuity of FG-m in V1. However, we also found that the FG-m was kept high throughout stimulus presentation in the contour trials. This part of the results suggests a stable representation of FG-m across FSs in V1, possibly originating from higher areas. Together, our results suggest spatial discontinuity combined with higher-order temporal stability of FG segregation, which might be involved in our stable and continuous perception.
Notes
Supplemental material for this article is available at http://neuroimag.ls.biu.ac.il/supp/SuppGilad.pdf. There are two supplemental figures in the Supplemental Material file: Figure S1. FG-m is not enhanced during ocular drift. Figure S2. An example of the different gaze positions, ROIs, and population response across a full-length example trial. This material has not been peer reviewed.
Footnotes
This work was supported by Deutsche Forschungsgemeinschaft: Program of German–Israeli Project cooperation (DIP Grant 185/1-1) and the Israeli Center of Research Excellence (I-CORE) in Cognition (I-CORE Program 51/11). We thank Elhanan Meirovithz and Uri Werner-Reiss for help with the experiments and Guy Zurawel for help with the implementation of the 2D retinotopic model.
The authors declare no competing financial interests.
- Correspondence should be addressed to Dr. Hamutal Slovin, Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat Gan, 52900 Israel, Hamutal.Slovin{at}biu.ac.il