Abstract
When stimulus information enters the visual cortex, it is rapidly processed for identification. However, sometimes the processing of the stimulus is inadequate and the subject fails to notice the stimulus. Human psychophysical studies show that this occurs during states of inattention or absent-mindedness. At a neurophysiological level, it remains unclear what these states are. To study the role of cortical state in perception, we analyzed neural activity in the monkey primary visual cortex before the appearance of a stimulus. We show that, before the appearance of a reported stimulus, neural activity was stronger and more correlated than before a not-reported stimulus. This indicates that the strength of neural activity and the functional connectivity between neurons in the primary visual cortex participate in the perceptual processing of stimulus information. Thus, to detect a stimulus, the visual cortex needs to be in an appropriate state.
- cortical state
- multiunit recording
- neurophysiology
- V1
- primary visual cortex
- figure–ground
- perception
- visual processing
- monkey
Introduction
Vision depends on the cerebral cortex. Retinal information enters the primary visual cortex (V1) via the thalamus and, from there, is transferred to the higher visual areas of the cerebral cortex. At these higher levels, increasingly elaborate processing of the visual information occurs. In addition, feedback connections (Mignard and Malpeli, 1991; Salin and Bullier, 1995) and horizontal connections (Gilbert, 1993) provide recurrent interactions between localized, low-level information and more global, high-level information (Lamme and Roelfsema, 2000). These interactions have been identified as the basis for various modulatory influences on neuronal activity in V1 (Gilbert and Wiesel, 1990; Zipser et al., 1996; Hupé et al., 1998). The observed modulations often reflect relatively high-level perceptual attributes of the stimuli that fall within the small receptive fields of the neurons. For example, perceived brightness (Rossi et al., 1996), perceptual grouping (Kapadia et al., 1995), or figure–ground segregation (Lamme, 1995) may modify the response to an otherwise identical receptive field stimulus.
We recently reported a direct link between these modulations and the animal's percept. More specifically, we showed that, when figure–ground modulation in V1 does not occur, the animal does not perceive the figure (Supèr et al., 2001). Assuming that these modulations depend on recurrent interactions within V1 and between V1 and higher areas (Lamme et al., 1998), this suggests that the proper occurrence of recurrent interactions determines whether a stimulus is processed up to a perceptual level. The question we address here is what prevents the normal evolution of such interactions, resulting in the animal's failure to report the stimulus.
Psychophysical studies in humans show that a stimulus remains unnoticed during specific states of the subject, such as inattention or absent-mindedness (Rock et al., 1992; Block, 1996). This implies that the success of stimulus detection depends on the state of the subject and, by extension, on the internal state of the visual cortex. In this study, we tested monkeys in a figure–ground detection task and analyzed the neural activity in the primary visual cortex before the appearance of the stimulus. This allowed us to determine the influence of the state of the cortex on stimulus detection without interference from stimulus-evoked activity. We show that, for a detected stimulus, the preceding neural activity was stronger than for a not-detected stimulus. In addition, the amount of synchrony between neurons was stronger before correctly reported stimuli than before not-reported stimuli. Thus, the strength of neural activity and the functional connectivity between neurons in the primary visual cortex predict whether a stimulus will be perceived. Apparently, an appropriate internal state of the primary visual cortex is essential for the processing of stimulus information up to a perceptual level.
Materials and Methods
Stimulus and task. Stimuli were presented on a 21-inch monitor driven by TIGA software. The display resolution was 1024 × 768 pixels, and the refresh rate was 72.3 Hz. The monkey was seated in a primate chair in a dark room, 75 cm from the monitor screen, which subtended 28 × 21° of visual angle. In each trial (Fig. 1), a textured figure (a 3 × 3° square), defined by a difference in line orientation, was presented at one of three possible locations, chosen at random, at an eccentricity of 2.74–4.4° from the fixation point (a central red spot of 0.2°). Before the appearance of the stimulus, the screen consisted of randomly oriented line segments (prestimulus screen). Onset of figure-present trials consisted of an abrupt transition from this texture of randomly oriented line segments to a texture of oriented line segments with a 90° orientation difference between figure and ground. On catch trials, all line segments had the same orientation, so that no figure appeared. Line segments were 16 × 1 pixels (0.44 × 0.027°), and the density was five line segments per square degree.
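The display numbers above can be cross-checked with a little arithmetic. The sketch below assumes that degrees of visual angle scale linearly with pixels across the screen (only approximately true away from the centre) and derives the angular size of one pixel, the size of a line segment, the physical screen width implied by 28° at 75 cm, and the approximate number of line segments on the screen and inside the 3 × 3° figure. All names are illustrative.

```python
import math

# Values taken from the Methods text.
SCREEN_DEG = (28.0, 21.0)       # horizontal, vertical extent in degrees
SCREEN_PX = (1024, 768)         # display resolution
VIEWING_DISTANCE_CM = 75.0
SEGMENT_PX = (16, 1)            # line-segment length x width in pixels
DENSITY_PER_DEG2 = 5            # line segments per square degree
FIGURE_DEG = 3.0                # figure is a 3 x 3 degree square


def deg_per_pixel() -> tuple[float, float]:
    """Degrees of visual angle per pixel (linear approximation)."""
    return SCREEN_DEG[0] / SCREEN_PX[0], SCREEN_DEG[1] / SCREEN_PX[1]


def screen_width_cm(extent_deg: float, distance_cm: float) -> float:
    """Physical extent implied by a visual angle centred on the line of sight."""
    return 2 * distance_cm * math.tan(math.radians(extent_deg / 2))


dx, dy = deg_per_pixel()
segment_deg = (SEGMENT_PX[0] * dx, SEGMENT_PX[1] * dy)
n_segments_screen = DENSITY_PER_DEG2 * SCREEN_DEG[0] * SCREEN_DEG[1]
n_segments_figure = DENSITY_PER_DEG2 * FIGURE_DEG * FIGURE_DEG

print(f"{dx:.4f} deg/px horizontally, {dy:.4f} deg/px vertically")
print(f"segment ~ {segment_deg[0]:.2f} x {segment_deg[1]:.3f} deg")
print(f"screen width ~ {screen_width_cm(SCREEN_DEG[0], VIEWING_DISTANCE_CM):.1f} cm")
print(f"~{n_segments_screen:.0f} segments on screen, ~{n_segments_figure:.0f} in the figure")
```

Both axes give the same value (about 0.027° per pixel), so the pixels are square in visual angle, and a 16 × 1 pixel segment comes out at roughly 0.44 × 0.027°, in line with the values stated above.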
Two monkeys (Macaca mulatta) were trained to fixate the fixation point on the monitor and to make a saccade toward the figure location as soon as it appeared (300 msec after fixation onset), or to maintain fixation when no figure appeared (catch trials; 20% of the trials). Trials in which the eye left the fixation window (1 × 1°) before figure appearance were aborted and discarded. After stimulus appearance, figure-present trials were considered correct (“reported”) when a saccade entered the figure window (at the approximate size and position of the figure) within 500 msec; otherwise, the trial was incorrect (“not reported”). Catch trials were considered correct when the animal maintained fixation for 500 msec after stimulus onset and incorrect when the fixation window was left before 500 msec. Eye movements were monitored using scleral search coils with the modified double-magnetic induction method and digitized at 400 Hz (Bour et al., 1984).
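The scoring rules above map each completed trial onto the four outcomes used in Results (hits, misses, correct rejections, false alarms). The following is a minimal sketch under simplifying assumptions: each trial is reduced to a hypothetical record holding whether a figure was shown, the time (relative to stimulus onset) at which the eye left the fixation window, and whether the saccade landed in the figure window. The `Trial` class and its fields are illustrative, not the original analysis code.

```python
from dataclasses import dataclass
from typing import Optional

RESPONSE_WINDOW_MS = 500  # deadline after stimulus onset, from the Methods


@dataclass
class Trial:
    figure_present: bool
    # Hypothetical field: ms after stimulus onset at which the eye left the
    # fixation window, or None if fixation was held throughout.
    fixation_break_ms: Optional[float]
    # Hypothetical field: True if the saccade landed inside the figure window.
    landed_in_figure: bool = False


def classify(trial: Trial) -> str:
    """Label a completed trial as hit, miss, false_alarm or correct_rejection."""
    if trial.fixation_break_ms is not None and trial.fixation_break_ms < 0:
        # Fixation broken before stimulus onset: such trials were aborted.
        raise ValueError("aborted trial: fixation broken before stimulus onset")
    left_in_time = (
        trial.fixation_break_ms is not None
        and trial.fixation_break_ms < RESPONSE_WINDOW_MS
    )
    if trial.figure_present:
        # Reported only if a saccade reached the figure window within 500 ms;
        # holding fixation or an incorrect saccade both count as a miss.
        return "hit" if left_in_time and trial.landed_in_figure else "miss"
    # Catch trial: correct if fixation was maintained for the full 500 ms.
    return "false_alarm" if left_in_time else "correct_rejection"
```

Note that under these rules a "miss" lumps together maintained fixation and saccades to the wrong location, which is why Results points out that the term is not used in the strict signal-detection sense.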
Recordings and data analysis. Multiunit neural activity was recorded through microwire electrodes (16 electrodes per animal, selected out of ∼40 implanted ones; impedances of 100–350 kΩ at 1000 Hz) that were surgically implanted into the operculum of area V1. The signals were amplified (40,000×), bandpass filtered (750–5000 Hz), full-wave rectified, and then low-pass filtered (<200 Hz). The resulting low-frequency signal represents the amount of spiking activity (Legatt et al., 1980). Before the experiments, the aggregate receptive field size and position at each electrode were determined using moving bars. Receptive field size ranged from 0.4 to 1.0°, and eccentricity ranged from 1.25 to 5.7°. For each monkey, figure positions and electrodes were chosen such that the figure at one location covered the receptive fields of all 16 electrodes simultaneously (“figure” condition). Therefore, many recorded neurons had overlapping receptive fields (Supèr et al., 2001). At the other two figure locations, the receptive fields were covered by ground (“ground” condition). For the analysis, we averaged the data obtained from all figure positions, i.e., both figure and ground conditions.
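The signal chain above (bandpass 750–5000 Hz, full-wave rectification, low-pass below 200 Hz) turns the raw extracellular voltage into an envelope that tracks multiunit spiking. The sketch below reproduces that chain offline on a digitized signal. It is an illustration only: the original filtering was done in the recording hardware, the raw sampling rate is not stated (20 kHz here is a hypothetical value), and an FFT "brick-wall" filter is swapped in for the unspecified filter design.

```python
import numpy as np

FS_HZ = 20_000  # hypothetical raw sampling rate; not stated in the text


def fft_filter(x: np.ndarray, fs: float, low: float, high: float) -> np.ndarray:
    """Zero-phase brick-wall filter keeping components with low <= f <= high."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)


def mua_envelope(raw: np.ndarray, fs: float = FS_HZ) -> np.ndarray:
    """Bandpass 750-5000 Hz, full-wave rectify, then keep components <= 200 Hz."""
    band = fft_filter(raw, fs, 750.0, 5000.0)   # isolate the spike band
    rectified = np.abs(band)                    # full-wave rectification
    return fft_filter(rectified, fs, 0.0, 200.0)  # smooth into an envelope
```

For example, a large 10 Hz field-potential oscillation is removed by the first filter, whereas a burst of high-frequency activity shows up as a raised envelope over the burst.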
We subtracted the DC component (average baseline activity from 0 to 30 msec after stimulus onset) from the responses. In addition, the average responses at each electrode were normalized: at each electrode, the responses were divided by a constant factor, namely the maximum response found for any of the conditions (i.e., correct, incorrect, figure, ground, etc.) within a 500 msec recording period extending from 300 msec before to 200 msec after stimulus onset (the period ends at 200 msec to avoid contamination attributable to saccades). In this way, each electrode contributed equally to the population average, while relative differences between conditions were preserved. Data were obtained during several sessions, and figure-present trials were randomly interleaved with 20% catch (figure-absent) trials.
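The two preprocessing steps above can be written down compactly. This sketch assumes a hypothetical data layout: a dictionary mapping each condition name to an (electrodes × time) array of trial-averaged activity, plus a vector of times in msec relative to stimulus onset. It also assumes that "maximum response" means the largest value of the baseline-corrected average response within the −300 to 200 msec window, and that this maximum is positive.

```python
import numpy as np


def baseline_and_normalize(
    responses: dict[str, np.ndarray], t_ms: np.ndarray
) -> dict[str, np.ndarray]:
    """Subtract the 0-30 ms baseline, then scale each electrode by one factor.

    responses maps a condition name to an (n_electrodes, n_times) array of
    trial-averaged activity; t_ms gives each column's time re stimulus onset.
    """
    base_win = (t_ms >= 0) & (t_ms <= 30)
    norm_win = (t_ms >= -300) & (t_ms <= 200)

    # DC removal: subtract each electrode's mean over the 0-30 ms window.
    corrected = {
        cond: r - r[:, base_win].mean(axis=1, keepdims=True)
        for cond, r in responses.items()
    }
    # One scale factor per electrode: the maximum over all conditions within
    # -300..200 ms, so relative differences between conditions are preserved.
    peak = np.max(
        np.stack([r[:, norm_win].max(axis=1) for r in corrected.values()]), axis=0
    )
    return {cond: r / peak[:, None] for cond, r in corrected.items()}
```

Because a single factor is applied to all conditions at a given electrode, a condition that was half as strong as another before normalization is still half as strong afterward.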
For the analysis of coherent activity, two-dimensional cross-correlograms with time versus lag on the x-axis and y-axis and correlation strength on the vertical (color) axis [joint peristimulus time histograms (J-PSTHs)] were calculated. The activity from electrode i on the r-th trial is denoted $S_i^r(t)$ (Brody, 1999), and the averaged response, or PSTH, is denoted $P_i(t) = \langle S_i^r(t) \rangle_r$. We calculated a matrix of covariances for all combinations of electrodes, averaged over all trials. Shuffle-corrected covariance matrices are defined as follows:

$$V_{ij}(t_1, t_2) = \langle S_i^r(t_1)\, S_j^r(t_2) \rangle_r - P_i(t_1)\, P_j(t_2)$$

This denotes the averaged (over all trials r) cross-product of the responses from electrodes i and j, minus the cross-product of the averaged responses. The cross-product of the PSTHs has been termed the shuffle predictor and is used to reduce common input attributable to the stimulus (Palm et al., 1988). This equation is also known as the unnormalized J-PSTH (Aertsen et al., 1989; Brody, 1999) and was used in this study for the calculation of the correlations. To reduce covariances attributable to common changes in excitability, we also subtracted the DC from each individual response. The time-dependent SD of the response of electrode i is derived from the auto-covariance matrix of i:

$$\sigma_i(t) = \sqrt{V_{ii}(t, t)}$$

that is, the square root of the values on the main diagonal of the auto-covariance matrix. Normalized covariance matrices, or normalized J-PSTHs, can then be defined as follows:

$$C_{ij}(t_1, t_2) = \frac{V_{ij}(t_1, t_2)}{\sigma_i(t_1)\, \sigma_j(t_2)}$$

In this equation, division by the cross-product of the time-dependent SDs of the i-th and j-th electrodes normalizes the covariance matrix to obtain a cross-correlation matrix.
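The definitions of $V_{ij}$, $\sigma_i$ and $C_{ij}$ above translate directly into array operations. The following sketch computes a normalized J-PSTH for one electrode pair from (trials × time bins) arrays. One assumption is flagged in the code: the text does not say which "DC" is removed from each individual response, so here each trial's mean over the analysis window is subtracted. The function name `jpsth` is illustrative.

```python
import numpy as np


def jpsth(si: np.ndarray, sj: np.ndarray) -> np.ndarray:
    """Normalized J-PSTH C_ij(t1, t2) for one pair of electrodes.

    si, sj: (n_trials, n_times) arrays of single-trial responses S_i^r(t)
    and S_j^r(t), aligned to the same time bins.
    """
    # Assumption: the per-trial "DC" is that trial's mean over the analysis
    # window; removing it reduces covariance from common excitability changes.
    si = si - si.mean(axis=1, keepdims=True)
    sj = sj - sj.mean(axis=1, keepdims=True)
    n_trials = si.shape[0]

    p_i = si.mean(axis=0)                       # PSTH P_i(t)
    p_j = sj.mean(axis=0)                       # PSTH P_j(t)
    # <S_i^r(t1) S_j^r(t2)>_r minus the shuffle predictor P_i(t1) P_j(t2)
    v_ij = si.T @ sj / n_trials - np.outer(p_i, p_j)
    # sigma(t) = sqrt(V_ii(t, t)), i.e. the across-trial SD in each time bin
    sd_i = si.std(axis=0)
    sd_j = sj.std(axis=0)
    return v_ij / np.outer(sd_i, sd_j)
```

As a sanity check, correlating a response with itself gives values of 1 on the main diagonal (zero lag), and two independent responses give values near zero everywhere.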
Results
Stimulus detection
Monkeys were trained to report the presence or absence of a figure in a figure–ground detection task. Animals fixated a small central red dot on the prestimulus screen, which consisted of a texture of randomly oriented line segments (Fig. 1A). At stimulus onset (t = 0 msec), a textured figure–ground display (figure-present trials) appeared, and the monkey was rewarded after making a saccade toward the location of the figure (Fig. 1B). In catch trials, no figure was presented, and the monkey was rewarded for maintaining fixation for 500 msec. In this way, the monkey could report whether he perceived the stimulus (Moore et al., 1995; Supèr et al., 2001).
On average, the detection performance as measured by d′ was 2.35. The animals made a correct eye movement (hits, reported condition) in 84% of the figure-present trials and failed to report the figure (misses, not-reported condition) in 16% of the figure-present trials. Total number of hits was 4948, and total number of misses was 954. In these not-reported figure-present trials, i.e., the misses, the monkey either maintained fixation or made an incorrect eye movement (and, in that sense, the term miss is not used in the standard Signal Detection Theory connotation). In the figure-absent (catch) trials, the animals scored 88% correct (correct rejections) and 12% incorrect (false alarms). Although the location of the figure varied across trials, important factors for detection, such as its eccentricity or shape, or the difference in texture orientation between figure and background, were the same on every trial. Therefore, failure to detect the figure did not relate to the stimulus (e.g., the saliency of the figure) but was attributable to the subject.
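For reference, d′ is the difference between the z-transformed hit rate and false-alarm rate, d′ = z(H) − z(FA). The minimal sketch below uses Python's standard-library `statistics.NormalDist` for the inverse normal CDF. The hit rate itself follows from the trial counts above: 4948 / (4948 + 954) ≈ 0.84.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Signal-detection sensitivity d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


pooled = d_prime(0.84, 0.12)
print(f"d' from the pooled, rounded rates: {pooled:.2f}")
```

Plugging in the pooled, rounded percentages gives d′ ≈ 2.17. The reported 2.35 is an average, and an average of d′ values need not equal the d′ computed from averaged rates, so the two numbers are not expected to coincide.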
Prestimulus responses and stimulus detection
To understand how the internal state of the subject affects sensory processing, we analyzed neural activity in the primary visual cortex during the 300 msec immediately preceding the onset of the figure–ground stimulus, for both reported and not-reported trials. Analyzing the prestimulus responses allowed us to study the state of the cortex in relation to stimulus detection without interference from stimulus-evoked activity. During the first 200 msec after the start of fixation, the average strength of neural activity was similar for reported and not-reported figure-present trials (p = 0.72; Mann–Whitney U test) (Figs. 2, 3A). However, starting 100 msec before the appearance of the stimulus, the activity for the reported figure-present trials increased (compared with the activity during the first 200 msec after fixation onset) and was significantly stronger than for the not-reported figure-present trials (p < 0.01; Kruskal–Wallis test) (Figs. 2, 3B). Thus, the strength of the neural activity just before the presentation of a stimulus is related to whether that stimulus will be reported.
In a previous study, we showed that the late part of the stimulus-evoked response also correlates with the detection of the figure–ground stimulus, whereas the early, transient response does not; more specifically, when figure–ground modulation (a difference in response to figure elements compared with ground elements, starting at ∼80 msec) is absent, animals do not detect the figure. Here, we averaged over figure and ground responses (because we used all figure positions), so figure–ground modulation cannot be observed (for that analysis, see Supèr et al., 2001). To determine whether the prestimulus responses were equally strong for all stimulus conditions, we analyzed the responses for each stimulus condition separately. The strength of the prestimulus activity did not differ significantly between the figure and ground conditions of the figure-present trials (p = 0.68; Mann–Whitney U test) (Fig. 4A,C).
Analysis of the figure-absent (catch) trials showed that, for one animal (“Uri”), the prestimulus activity for correct catch trials (in which fixation was maintained) was significantly stronger than for incorrect catch trials (p = 0.001; Mann–Whitney U test) (Fig. 4B,C). For the other animal, we observed no difference. A possible explanation is that the latter animal frequently went through periods of low arousal or inattention and then merely maintained fixation. This would have resulted in many correct catch trials and, correspondingly, in many not-reported figure-present trials. Assuming that higher than default prestimulus activity is needed for proper perception, this would weaken the prestimulus effects for the catch trials but not for the figure-present trials.
To further test the relationship between the strength of the prestimulus activity and behavioral performance, we sorted the prestimulus activities of all trials in ascending order and divided them into eight bins, each containing the same number of trials. Each trial within a bin is associated with both the strength of its prestimulus activity and the task-related behavioral data (i.e., figure-present or figure-absent trial, and correct or incorrect response). For each bin, we computed the average prestimulus activity and d′ (Green and Swets, 1988; Ress et al., 2000). Prestimulus activity was quantitatively related to behavioral performance (Fig. 5): linear regression showed a significant positive relationship between the strength of prestimulus activity and d′ (p < 0.05 for each animal). This applied only to the activity immediately preceding stimulus onset (−100 to 0 msec). The early prestimulus activity (−300 to −100 msec relative to stimulus onset) showed no correlation with performance.
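The binning procedure above can be sketched as follows. Trials are sorted by prestimulus activity, split into eight equal-count bins, and d′ is computed per bin from that bin's hits, misses, false alarms and correct rejections. A line is then fitted through (mean activity, d′). Two choices here are assumptions not stated in the text: a log-linear correction (adding 0.5 to counts) that keeps d′ finite when a bin has a hit or false-alarm rate of 0 or 1, and `numpy.polyfit` as the least-squares line fit. Function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def d_prime_corrected(hits, misses, fas, crs):
    """d' with a log-linear correction so rates of 0 or 1 stay finite (assumption)."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (fas + 0.5) / (fas + crs + 1)
    return z(hit_rate) - z(fa_rate)


def binned_dprime(prestim, figure_present, correct, n_bins=8):
    """Sort trials by prestimulus activity, split into equal-count bins, and
    return per-bin mean activity, per-bin d', and the regression slope."""
    prestim = np.asarray(prestim, dtype=float)
    figure_present = np.asarray(figure_present, dtype=bool)
    correct = np.asarray(correct, dtype=bool)

    order = np.argsort(prestim)                 # ascending prestimulus activity
    means, dprimes = [], []
    for idx in np.array_split(order, n_bins):   # near-equal trial counts per bin
        fp, ok = figure_present[idx], correct[idx]
        hits = np.sum(fp & ok)                  # reported figure-present trials
        misses = np.sum(fp & ~ok)               # not-reported figure-present trials
        fas = np.sum(~fp & ~ok)                 # incorrect catch trials
        crs = np.sum(~fp & ok)                  # correct catch trials
        means.append(prestim[idx].mean())
        dprimes.append(d_prime_corrected(hits, misses, fas, crs))
    slope, _intercept = np.polyfit(means, dprimes, 1)
    return np.array(means), np.array(dprimes), slope
```

Equal-count bins (rather than equal-width bins) ensure that every d′ estimate rests on the same number of trials, so bins at the extremes of the activity distribution are not noisier than the rest.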
A relationship between prestimulus and poststimulus activity
Combined, our present and previous (Supèr et al., 2001) findings suggest that there are periods before stimulus onset (−100 to 0 msec) and after stimulus onset (>100 msec) that both correlate with figure detection, separated by a period (the response transient, 30–100 msec) that shows much less correlation with perception. This raises the possibility that these two periods of neural processing are related. We therefore conducted a correlation analysis between the prestimulus and poststimulus intervals. For each trial, we computed the average prestimulus activity (−100 to 0 msec) and the average activity during two poststimulus periods. The strength of the prestimulus activity correlated better with the late part of the stimulus-evoked response (100–200 msec) than with the early, transient response (0–100 msec) (Fig. 6). This suggests that the early part of the response is dominated by the stimulus, whereas the late response reflects the influence of both the stimulus and the internal state of the animal.
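The per-trial analysis above amounts to averaging each trial within three windows and correlating the resulting vectors across trials. This is a minimal sketch assuming a (trials × time bins) array and a time vector in msec relative to stimulus onset. The text does not name the correlation coefficient, so the Pearson coefficient (`numpy.corrcoef`) is an assumption.

```python
import numpy as np


def window_mean(trials: np.ndarray, t_ms: np.ndarray, start: float, stop: float):
    """Mean activity of each trial within [start, stop) ms of stimulus onset."""
    mask = (t_ms >= start) & (t_ms < stop)
    return trials[:, mask].mean(axis=1)


def prestim_vs_poststim(trials: np.ndarray, t_ms: np.ndarray):
    """Pearson r (assumed) between per-trial prestimulus activity (-100..0 ms)
    and the early (0..100 ms) and late (100..200 ms) poststimulus windows."""
    pre = window_mean(trials, t_ms, -100, 0)
    early = window_mean(trials, t_ms, 0, 100)
    late = window_mean(trials, t_ms, 100, 200)
    r_early = np.corrcoef(pre, early)[0, 1]
    r_late = np.corrcoef(pre, late)[0, 1]
    return r_early, r_late
```

In a toy simulation in which a trial-specific "state" adds to both the prestimulus and the late window but not to the early transient, `r_late` is high while `r_early` stays near zero, which is the pattern the analysis above is designed to detect.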
Motivation of the animal and stimulus detection
It is possible that the motivation of the animal influenced performance. Animals could be more motivated at the start of a recording session than at the end, when satiated with rewards or tired. If that were influencing our results, one would expect most not-reported trials to occur toward the end of a recording session. We therefore analyzed the performance on figure-present trials throughout the recording sessions. We divided each recording session (34 in total) into 20 equal time bins, counted the number of hits (reported figure-present trials) and misses (not-reported figure-present trials) per bin, and calculated the percentage correct. We fitted a linear regression line through these data points to test for a correlation between accuracy and session time. Performance remained at a constant level throughout the recording sessions (Fig. 7A), indicating that the motivation of the animal remained stable during the experiment and that the failure to detect the stimulus is not attributable to monotonic changes in the motivational state of the animal. In addition, we analyzed the strength of the prestimulus activity in data collected during the first, middle, and final thirds of each session (Fig. 7B). The increase in prestimulus activity for reported figure-present trials, compared with not-reported trials, did not differ significantly between the three periods (p > 0.1; Kruskal–Wallis test), and performance remained constant across these periods (83, 86, and 82% correct, respectively). These observations support the conclusion that the difference in prestimulus activity between reported and not-reported figure-present trials is not attributable to monotonic changes in the animal's motivation.
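The session-time control above can be sketched for a single session as follows. Figure-present trials are assigned to 20 equal time bins, percent correct is computed per bin, and a least-squares line is fitted through the bin centres. The inputs (trial times, a reported/not-reported flag, and the session end time, all in the same arbitrary units) are a hypothetical layout, empty bins are simply skipped, and `numpy.polyfit` stands in for the unspecified regression routine.

```python
import numpy as np


def accuracy_vs_session_time(trial_times, reported, session_end, n_bins=20):
    """Percent correct of figure-present trials in n_bins equal time bins of one
    session, plus the slope of a least-squares line through those points."""
    trial_times = np.asarray(trial_times, dtype=float)
    reported = np.asarray(reported, dtype=bool)
    edges = np.linspace(0.0, session_end, n_bins + 1)
    # Bin index 0..n_bins-1; a trial exactly at session_end goes in the last bin.
    bins = np.clip(np.digitize(trial_times, edges) - 1, 0, n_bins - 1)

    centers, pct = [], []
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():                              # skip empty bins
            centers.append((edges[b] + edges[b + 1]) / 2)
            pct.append(100.0 * reported[in_bin].mean())
    slope, _intercept = np.polyfit(centers, pct, 1)
    return np.array(centers), np.array(pct), slope
```

A slope close to zero corresponds to the flat performance described above, whereas declining motivation would show up as a clearly negative slope.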
Eye movements
An additional potential concern was that the position or movements of the eyes might differ subtly during the fixation period and thereby cause the differences in the prestimulus response. To control for differences in fixation behavior, we separated the eye movements of the figure-present trials according to response type (reported and not-reported) and analyzed the 100 msec interval before stimulus onset. We calculated the SDs of the x and y coordinates of the eye position during each trial; the higher this value, the less accurately fixation was maintained. The average SDs for reported and not-reported figure-present trials did not differ [horizontal, 0.073 (reported trials), 0.064 (not-reported trials); vertical, 0.087 (reported trials), 0.089 (not-reported trials); p > 0.2; Kruskal–Wallis test]. In addition, no significant difference in the mean eye position and the SD of the average eye position was found between correct and incorrect responses [mean ± SD; horizontal, 0.056 ± 0.036 (reported trials), 0.059 ± 0.052 (not-reported trials); vertical, 0.080 ± 0.050 (reported trials), 0.086 ± 0.057 (not-reported trials); p > 0.05; ANOVA]. Therefore, the differences in prestimulus activity between reported and not-reported trials are not the result of differences in eye movements.
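The fixation-stability measure above is the per-trial SD of horizontal and vertical eye position over the last 100 msec before stimulus onset. At the 400 Hz digitization rate given in Methods, that window is 40 samples. This sketch assumes a hypothetical array layout (trials × samples × 2 coordinates, ending at stimulus onset).

```python
import numpy as np

EYE_FS_HZ = 400   # eye-coil digitization rate from the Methods
WINDOW_MS = 100   # analysis window immediately before stimulus onset


def fixation_sd(eye_xy: np.ndarray) -> np.ndarray:
    """Per-trial SD of horizontal and vertical eye position over the last 100 ms.

    eye_xy: (n_trials, n_samples, 2) eye positions whose final sample is the
    one just before stimulus onset (an assumed layout).
    Returns an (n_trials, 2) array: SD of x and SD of y for each trial.
    """
    n = EYE_FS_HZ * WINDOW_MS // 1000        # 40 samples at 400 Hz
    return eye_xy[:, -n:, :].std(axis=1)
```

The per-trial values can then be averaged separately over reported and not-reported trials and compared with a nonparametric test, as in the text.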
Synchrony and stimulus detection
To test whether the changes in prestimulus firing rate were accompanied by other manifestations of a change in cortical state, we investigated the functional connectivity between neurons by calculating the strength of correlated firing over time. We constructed auto- and cross-covariance matrices for each possible pair of recording sites. By normalizing these matrices by the cross-product of the time-dependent SDs of the responses, we obtained normalized J-PSTHs. These J-PSTHs show how the strength of the correlation between two neurons varies over time (Aertsen et al., 1989; Vaadia et al., 1995). The diagonal of such a matrix represents the correlation strength at zero time lag, and the points above and below this diagonal represent positive and negative time delays between the two neurons. Analysis of these J-PSTHs revealed that, before the presentation of the stimulus, correlated activity between neurons was present in both the reported and not-reported conditions, whereas after stimulus onset, these correlations tended to decrease or disappear (Fig. 8). Thus, correlated firing of neurons in the primary visual cortex was present before the onset of the figure–ground texture.
To quantify the strength of these prestimulus correlations, we calculated the average cross-correlation functions for the 100 msec period before stimulus onset. The strength of the correlation was estimated as the area under the peak within a window ranging from −12.5 to 12.5 msec lag. Of all possible electrode pairs (120 per animal), 96% showed significant (p < 0.01) positive correlation (mean ± SEM correlation strength; monkey “Toni,” 0.065 ± 0.003; monkey “Uri,” 0.039 ± 0.002) before stimulus onset in correctly reported figure-present trials. Before not-reported trials, only 85% of the electrode pairs showed significant (p < 0.01) positively correlated activity (mean ± SEM correlation strength; monkey “Toni,” 0.052 ± 0.005; monkey “Uri,” 0.034 ± 0.005). In addition, the strength of the prestimulus correlation was lower for not-reported than for reported trials (Fig. 9). Of all the electrode pairs, 73% showed stronger correlation in the reported condition than in the not-reported condition (Toni, p < 10⁻¹¹; Uri, p < 10⁻⁶; paired t test). The average cross-correlation functions thus show that, during the prestimulus period, neural activity is more synchronous in the reported trials than in the not-reported trials.
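Given a normalized J-PSTH over the 100 msec prestimulus window, the average cross-correlation function is obtained by averaging the matrix along its diagonals (each diagonal holds one time lag), and correlation strength is the area under that function within ±12.5 msec. This sketch makes two assumptions not stated in the text: a time-bin width (2.5 msec here, a hypothetical value) and a rectangle-rule area (sum of coefficients × bin width). Because of the second assumption, absolute values need not match the reported strengths. Function names are illustrative.

```python
import numpy as np


def cross_correlation_function(c: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Average a normalized J-PSTH along its diagonals.

    c[t1, t2] correlates electrode i at t1 with electrode j at t2, so the
    diagonal with offset k holds all pairs with lag t2 - t1 = k bins.
    Returns (lags in bins, mean correlation at each lag).
    """
    n = c.shape[0]
    lags = np.arange(-(n - 1), n)
    ccf = np.array([np.diagonal(c, offset=k).mean() for k in lags])
    return lags, ccf


def correlation_strength(c: np.ndarray, bin_ms: float = 2.5,
                         max_lag_ms: float = 12.5) -> float:
    """Area under the cross-correlation function within +/- max_lag_ms,
    using the rectangle rule (bin_ms is a hypothetical bin width)."""
    lags, ccf = cross_correlation_function(c)
    in_window = np.abs(lags * bin_ms) <= max_lag_ms
    return float(ccf[in_window].sum() * bin_ms)
```

Computing this strength for every electrode pair separately for reported and not-reported trials yields two matched vectors of values that can be compared with a paired t test (for example, `scipy.stats.ttest_rel`).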
Discussion
In summary, our present results indicate that both the strength and the amount of synchrony of neural activity in the primary visual cortex during a ∼100 msec period before the appearance of a behaviorally important stimulus predict whether that stimulus will be detected. Note that these two results are independent, because we calculated the amount of synchrony from correlation coefficients normalized for overall activity level. Together, our results provide direct evidence for the idea that the internal neural state of the subject at the moment of stimulus arrival affects subsequent stimulus detection.
It has been shown that states of attention and expectancy (Egeth and Yantis, 1997; Pashler, 1998) are accompanied by a general increase in synchronous neural activity and neural responses (Cardoso de Oliveira et al., 1997; Riehle et al., 1997; Steinmetz et al., 2000; Ress et al., 2000; Fries et al., 2001). In addition, different states of arousal have been associated with different firing patterns of cortical neurons (Evarts, 1964; Steriade et al., 1993), and dynamical switches between these states have been shown to occur in the awake animal (Nunez, 1995; Sherman, 2001). The present results are in accordance with these findings, in the sense that attention, arousal, or an interaction between the two (Coull, 1998) may explain our observations. Note, however, that our effects are not location specific: enhanced and more synchronous activity at the site of the recording electrodes promotes the detection of a stimulus at any location. Variations in focused attention therefore do not explain our results. It is more likely that spontaneous variations in the general state of the cortex underlie our findings.
We found that only the activity immediately preceding the onset of the stimulus (∼100 msec, and not earlier) was related to the animal's perception of that stimulus. This indicates that a switch in cortical state occurs within a relatively short time frame, much shorter than a single trial; otherwise, activity shortly after fixation onset should already have been related to detection. Apparently, the visual cortex has to attain an appropriate state quickly, before the stimulus information enters the cortex. Enhanced and synchronized activity preceding stimulus presentation is followed by correct detection, whereas failure to detect the stimulus follows a moment of reduced and less synchronized activity. The failure to develop an appropriate cortical state may thus represent a neurophysiological correlate of a moment of inattention or reduced expectancy, or of a state of low arousal of the subject. This deviates somewhat from the standard concept of arousal. Changes in states of arousal as measured by EEG recordings, for example, generally last for longer time periods (seconds to hours) (Atlas Task Force of the American Sleep Disorders Association, 1992; Coull, 1998; Drinnan et al., 1998), which is at odds with the fast changes in cortical state shown here. These fast changes are more in line with temporal changes in EEG activity that have been associated with changes in attention and discrimination (Vogel and Luck, 2000; Arnott et al., 2001; Bastiaansen and Brunia, 2001) or with the dynamical switches in neural spiking behavior (bursting vs tonic) that have been shown to occur in the thalamocortical circuit of awake animals (Sherman, 2001).
Recently, we reported on the neural activity recorded after onset of the figure–ground display in this paradigm (Supèr et al., 2001). We distinguished two stages of processing after stimulus onset: one dominated by the early (<100 msec) response transient, and the other occurring at longer latencies (>100 msec). The early stage is associated with feedforward processing and early feature extraction, and the later stage is associated with recurrent processing and higher-level visual processes such as perceptual grouping and segmentation (Lamme and Roelfsema, 2000). For example, at a latency of ∼100 msec, V1 single-unit and multiunit responses are stronger when the line segments within the receptive field of the neurons belong to the figure than when they belong to the background, a phenomenon termed contextual modulation (Lamme, 1995; Zipser et al., 1996; Lamme et al., 1999).
In our previous study, we found that early stimulus driven activity (0–100 msec) did not relate to whether the figure was seen or not seen. However, when contextual modulation was absent, animals did not see the figure (Supèr et al., 2001). Also, contextual modulation is selectively suppressed in anesthetized animals, although responses remain selective for low-level features such as orientation of texture bars (Lamme et al., 1998a). Like the prestimulus activity reported here, late-onset contextual modulation thus relates to the processing of visual information up to a perceptual stage. A difference between the two, however, is that higher and more synchronous prestimulus activity promotes figure detection at any location, whereas figure–ground contextual modulation is confined to the region of the figure (Lamme, 1995; Lamme et al., 1999). In contextual modulation, the response to figure elements is enhanced compared with background elements when a figure is perceived, although this enhancement is absent when a figure is not perceived (Supèr et al., 2001). Apparently, V1 activity related to detection performance is initially not confined to a particular spatial region but becomes spatially selective during the late period of the stimulus-evoked response. Whether these prestimulus and poststimulus response modulations represent similar or related neural mechanisms remains to be investigated.
The late stimulus-driven response modulations representing figure–ground segregation have been conjectured to depend on horizontal connections within V1 and feedback connections between V1 and higher visual areas (Payne et al., 1996; Lamme et al., 1998b; Wang et al., 2000). On that basis, we suggested that perception depends strongly on recurrent interactions between visual areas (Lamme, 2000; Supèr et al., 2001). Taking these and the present results together, it appears that the different states of the brain preceding stimulus onset (receptive vs unreceptive, so to speak) have little or no effect on the early activity evoked by the stimulus but are specifically associated with the occurrence of later recurrent interactions between areas, reflecting figure–ground perception. This idea is supported by the finding that anesthesia has relatively little effect on feedforward responses in V1, reflecting receptive field tuning properties, whereas figure–ground modulation is abolished by anesthesia (Lamme et al., 1998a).
Footnotes
H.S. is supported by a grant from Medical Sciences, which is subsidized by the Netherlands Organization for Scientific Research. We thank Kor Brandsma and Jacques de Feiter for biotechnical support and Peter Brassinga and Hans Meester for technical assistance.
Correspondence should be addressed to Hans Supèr, Graduate School Neurosciences Amsterdam, Department Visual System Analysis, Academic Medical Center, University of Amsterdam, P.O. Box 12011, 1100 AA Amsterdam, The Netherlands. E-mail: h.super{at}ioi.knaw.nl.