Object recognition might be achieved by the recreation of a meaningful internal image from visual fragments. This recreation might be achieved by neuronal synchronization that has been proposed as a solution for the perceptual binding problem. In this study, we evaluated synchronization between the occipitotemporal regions bilaterally using electroencephalograms during several visual recognition tasks. Conscious recognition of familiar objects spanning the visual midline induced transient interhemispheric electroencephalographic coherence in the α band, which did not occur with meaningless objects or with passive viewing. Moreover, there was no interhemispheric coherence when midline objects were not recognized as meaningful or when familiar objects were presented in one visual hemifield. These data suggest a close link between site-specific interregional synchronization and object recognition.
- object recognition
- neuronal synchronization
- interhemispheric coherence
- perceptual binding
- visual recognition task
Neuronal synchronization among the spatially distributed cortical visual areas has been proposed as the mechanism for integrating (perceptual binding) the parallel pathways that process the different features of visual objects (Eckhorn et al., 1988; Gray and Singer, 1989; Gray et al., 1989). Previous studies suggested that the coherence between neuronal activities is the physiologic correlate of the segmentation of a unitary object from a visual scene (Engel et al., 1991b; Singer and Gray, 1995). In a cognitively demanding task, it is conceivable that only successfully bound features might be further processed as meaningful information. This selective enhancement of synchronization has been proposed as a substrate for visual recognition (Crick and Koch, 1990; Singer, 1998). Previous studies using human electroencephalograms (EEGs) showed the importance of EEG synchronization in various frequencies depending on the task (for review, see Basar et al., 1997; Andres and Gerloff, 1999; Klimesch, 1999; Tallon-Baudry et al., 1999). We investigated the role of EEG synchronization in an object detection task in humans by using frequency domain analysis.
In the first midline presentation experiment, subjects viewed either familiar or meaningless objects randomly presented at the center of the visual field with variable stimulus duration (see Fig. 1 A). To recognize familiar objects successfully, interhemispheric communication must occur to bind images between two visual hemifields that are processed separately in the two cerebral hemispheres; this may be achieved by synchronization. To test this hypothesis, we measured the amount of interhemispheric synchronization by the event-related analysis of EEG coherence after presentation of familiar and meaningless objects. Second, to determine whether this interhemispheric coherence, if any, is associated with perception or recognition, the effect of active attention was assessed by comparing the object judgment with the passive-viewing task. Third, to investigate whether this coherence correlates with the internal image or the physical attributes of external stimuli, the effect of stimulus duration on this coherence was systematically compared.
To understand the functional relevance of this coherence, it is important to know whether the interhemispheric coherence reflects the task-specific binding of visual information separated in the two hemispheres or just represents the cognitive and attentional demand of the object detection task itself. To exclude the latter possibility, we performed a second experiment using the same set of stimuli but presented either on the right or left side of the visual field. We expected that the interhemispheric coherence would not be necessary for the early stage of object recognition, when stimuli are presented in a hemifield and when one hemisphere handles all of the visual information.
MATERIALS AND METHODS
The experimental protocol was approved by the Institutional Review Board, and written informed consent was obtained from subjects before the experiment.
Midline presentation experiment. Eight subjects (three males and five females; age range, 25–58 years) viewed black-and-white drawings of either familiar or meaningless objects randomly presented at the center of the visual field (Fig. 1 A). The target stimuli were presented between checkerboard-type masks (from 2 sec before the stimulus to 1.5 sec after the stimulus) to prevent an afterimage, and pseudo-randomized stimulus durations were 17, 34, 50, and 100 msec. Subjects were asked to indicate whether or not they recognized a familiar meaningful object by pressing one of two buttons with their right hand after the GO stimulus (judgment task, Fig. 1 C). Subjects were instructed to judge the target as a familiar object only when they were sure about what they saw. These instructions were given to minimize the number of false-alarm trials (meaningless objects recognized as familiar ones), and these trials were not analyzed further. For a control condition using the same set of stimuli, subjects were asked to respond to the GO stimulus by pressing the button, regardless of the preceding target (passive viewing).
Target stimuli were ∼9° of visual angle centered at the fixation point on a computer screen 1–1.2 m in front of the subject. Sixty familiar objects and corresponding meaningless objects were used. Meaningless objects were produced by swirling the familiar objects. Stimuli were presented randomly with four different durations. In the preliminary analysis, we confirmed that repetition of the same stimuli did not elicit any systematic difference in either hit rate or reaction time. The GO signal (duration, 100 msec) was given 1.5 sec after the target stimulus onset. The interval between trials was 3–5 sec. A total of 480 trials was divided into four sessions. Subjects performed eight randomized sessions of judgment and passive-viewing tasks. A delayed reaction time task was used to separate the sensory processing from the motor preparation.
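The counts given above are consistent with each of the 120 drawings (60 familiar, 60 meaningless) being shown once at each of the four durations, which gives 480 trials. A minimal sketch of how such a randomized trial list could be built and split into four sessions follows. The function name, the dictionary keys, the fixed seed, and the equal-sized split are assumptions made for illustration; the actual presentation software and randomization scheme are not described in the text.

```python
import random

DURATIONS_MS = [17, 34, 50, 100]   # stimulus durations stated in the text
N_OBJECTS = 60                     # familiar objects, each with a swirled counterpart
N_SESSIONS = 4                     # sessions per task, as stated in the text


def build_sessions(seed=0):
    """Return a list of sessions, each a shuffled list of trial dictionaries.

    Hypothetical scheme: every object/kind/duration combination appears once,
    the full list is shuffled, then cut into equal-sized sessions.
    """
    rng = random.Random(seed)
    trials = [
        {"object_id": obj, "kind": kind, "duration_ms": dur}
        for obj in range(N_OBJECTS)
        for kind in ("familiar", "meaningless")
        for dur in DURATIONS_MS
    ]
    rng.shuffle(trials)
    size = len(trials) // N_SESSIONS
    return [trials[k * size:(k + 1) * size] for k in range(N_SESSIONS)]


sessions = build_sessions()
```

With these assumptions each session holds 120 trials, and familiar and meaningless drawings are balanced over the whole experiment rather than within each session.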
Lateral presentation experiment. The second experiment using the same set of stimuli was performed on seven subjects (three males and four females; age range, 28–56 years), but stimuli were presented either on the right or left side of the visual field. Target stimuli, which were the same as those in the midline presentation experiment, were presented between the 9 × 18° rectangle masks either at the right or left side in randomized order. The stimulus duration was 50 msec. Other conditions were essentially the same as in the first experiment. Subjects were asked to fixate the center of the monitor and to discriminate the target irrespective of the stimulus location (60 meaningless and familiar objects on each side).
Recording and analysis. The EEG was filtered from DC to 50 Hz and recorded at 250 Hz from 29 surface electrodes over the scalp (Neuroscan, Neuroscan, Inc., Herndon, VA; Fig. 1 B). Digitally linked earlobe electrodes served as the reference. Trials were analyzed separately for correctly recognized objects (object recognition), correctly rejected ones (meaningless object), and missed ones (failed object recognition). For the midline presentation experiment, one subject who had a very high false-alarm rate was excluded from the analysis. Object recognition for the 17-msec-duration trials was not analyzed because the hit rate was not significantly different from the false-alarm rate (p = 0.31). Signals were segmented into epochs from 1 sec before to 3 sec after the target stimulus onset. Trials with eye movement or excessive muscle artifacts were removed by careful visual inspection. For each condition, 15–20 (mean, 17.8) trials were analyzed.
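The segmentation step can be illustrated with a short sketch. At the stated 250 Hz sampling rate, an epoch from 1 sec before to 3 sec after stimulus onset spans 1000 samples. The function names and the half-open indexing convention below are assumptions for illustration, not the recording software's own procedure.

```python
FS_HZ = 250  # sampling rate stated in the text


def epoch_bounds(onset_sample, pre_s=1.0, post_s=3.0, fs=FS_HZ):
    """Return (start, stop) sample indices of a half-open epoch around onset."""
    start = onset_sample - int(round(pre_s * fs))
    stop = onset_sample + int(round(post_s * fs))
    return start, stop


def extract_epoch(signal, onset_sample):
    """Slice one channel; return None if the epoch runs off the recording."""
    start, stop = epoch_bounds(onset_sample)
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]
```

Returning None for incomplete epochs keeps truncated trials out of later spectral estimates, which assume equal-length segments.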
For quantitative analysis of interhemispheric coherence, four electrode pairs of interest connecting the right and left occipitotemporal areas (Fig. 1 B, TP7, TP8, T5, and T6 electrodes) were selected on the basis of an a priori hypothesis because previous studies showed the importance of the lateral occipital area and the ventral “what” pathway for object recognition (Tanaka, 1993; Ungerleider and Haxby, 1994; Malach et al., 1995). EEG signals at the same four electrodes were analyzed for the quantitative power analysis. EEG segments from −512 to −256 msec and from −256 to 0 msec preceding the stimulus were pooled and used for the baseline. EEG segments were processed by a fast Fourier transform algorithm using a Hanning window at 10% of the edge. To observe the temporal pattern of the EEG power change, power spectra were log-transformed and normalized by setting the prestimulus baseline to zero [event-related power change (ERPow)]. Cross-spectra were calculated in the same way and normalized by the autospectra to compute coherence, according to the following equation:
$$|R_{xy}(j)|^2 = \frac{|f_{xy}(j)|^2}{f_{xx}(j)\,f_{yy}(j)}$$
In this equation, $f_{xx}(j)$, $f_{yy}(j)$, and $f_{xy}(j)$ are values of auto- and cross-spectra at a given frequency $j$. Coherence is expressed as a real number between 0 and 1; 1 indicates a perfect linear association between two signals, and 0 indicates a complete absence of linear association. Coherence was normalized with an arc hyperbolic tangent transformation to stabilize the variance (Halliday et al., 1995). The event-related coherence change (ERCoh) was computed by setting the prestimulus baseline to the zero level. Power and coherence spectra were estimated for each subject for each condition. ANOVA (main effects of object, task, and duration) was used for the statistical comparison. Where appropriate, a t test was also used.
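The coherence estimate described here, the squared magnitude of the trial-averaged cross-spectrum divided by the product of the two autospectra, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis code. The raised-cosine taper over 10% of each edge is one reading of the windowing description, the single-bin discrete Fourier transform stands in for a full FFT, and applying the arc hyperbolic tangent to the square root of coherence is an assumption, because the text does not say whether the transform was applied to coherence or to its square root. All function names are hypothetical.

```python
import cmath
import math
import random


def edge_taper(n, frac=0.10):
    """Raised-cosine taper on the first and last `frac` of n samples (assumed shape)."""
    m = max(1, int(round(frac * n)))
    w = [1.0] * n
    for i in range(m):
        v = 0.5 * (1.0 - math.cos(math.pi * (i + 0.5) / m))
        w[i] = v
        w[n - 1 - i] = v
    return w


def dft_bin(x, k):
    """Tapered discrete Fourier coefficient of segment x at frequency bin k."""
    n = len(x)
    w = edge_taper(n)
    return sum(w[t] * x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


def coherence(trials_x, trials_y, k):
    """|fxy|^2 / (fxx * fyy), with spectra accumulated over trials."""
    fxx = fyy = 0.0
    fxy = 0j
    for x, y in zip(trials_x, trials_y):
        X, Y = dft_bin(x, k), dft_bin(y, k)
        fxx += abs(X) ** 2
        fyy += abs(Y) ** 2
        fxy += X * Y.conjugate()
    return abs(fxy) ** 2 / (fxx * fyy)


def z_transform(coh):
    """Variance-stabilizing atanh of sqrt(coherence); clamped to avoid atanh(1)."""
    return math.atanh(math.sqrt(min(max(coh, 0.0), 1.0 - 1e-12)))


def ercoh(coh_post, coh_base):
    """Event-related coherence change: transformed coherence minus its baseline."""
    return z_transform(coh_post) - z_transform(coh_base)


# Synthetic demonstration (random noise, not EEG data).
rng = random.Random(1)
n, k, n_trials = 64, 2, 50   # 64 samples at 250 Hz; bin 2 lies near 8 Hz
noise_x = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_trials)]
noise_y = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_trials)]
scaled_x = [[-0.5 * v for v in trial] for trial in noise_x]
coh_linear = coherence(noise_x, scaled_x, k)  # exact linear relation between signals
coh_indep = coherence(noise_x, noise_y, k)    # unrelated signals
```

In the demonstration, a signal and a scaled copy of itself give coherence of 1 up to rounding, whereas two independent noise sources give a value near zero that shrinks as more trials are averaged. This dependence on trial count is one reason to compare conditions with similar numbers of trials.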
RESULTS
Midline presentation experiment
On average, 79 ± 17% (mean ± SD) of meaningless objects were correctly rejected as meaningless, irrespective of the stimulus duration (Fig. 2). In contrast, the number of correct object recognitions was linearly correlated with stimulus duration (Pearson correlation r = 0.779; p < 0.001). For a duration of 17 msec, the correct detection rate for familiar objects was not significantly different from the false-alarm rate for meaningless objects (p = 0.31). Reaction time was 605 ± 173 msec and did not vary significantly across conditions.
Preliminary analysis of averaged event-related potentials (ERPs) revealed that the earliest cortical activity after the stimulus was a positive potential localized over the occipital cortex at 115 ± 14 msec (Fig. 1 C). The ERP difference between object recognition and meaningless objects was widely observed over the bilateral central, parietal, and temporal areas. ERPs showed larger positivity for object recognition (onset of the statistically significant difference determined by the t-map, 460 msec; the mean peak difference, 570 msec). Starting from the time of the initial positive potential, we computed the power and coherence spectra for every 256 msec time window; this enabled a ∼3.9 Hz frequency resolution. For the coherence analysis, the prestimulus baseline level was not significantly different among conditions (ANOVA, object, p = 0.1; task, p = 0.4; duration, p = 0.3). Compared with that of meaningless objects, the early phase (117–373 msec) of object recognition showed a coherence increase within the α band (8–12 Hz) predominantly over the posterior part of the brain, connecting the right and left sides (Fig. 3). A significant coherence increase during object recognition was not observed in other frequencies, although we computed the coherence spectra up to 50 Hz with a finer frequency resolution at 2 Hz (Fig. 1 D). Therefore, only the results in the α band are reported.
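The relation between analysis window length and frequency resolution can be checked with simple arithmetic. A 256 msec window sampled at 250 Hz contains 64 points, so the spacing between Fourier bins is 250/64, or about 3.9 Hz. A window twice as long gives about 2 Hz. The sketch below makes this explicit. The function names are hypothetical, and the 512 msec window used for the finer resolution is an assumption, because the window length for that analysis is not stated in this paragraph.

```python
FS_HZ = 250.0  # sampling rate stated in the Methods


def window_samples(window_ms, fs=FS_HZ):
    """Number of samples in an analysis window of the given length."""
    return int(round(window_ms * fs / 1000.0))


def freq_resolution_hz(window_ms, fs=FS_HZ):
    """Spacing between adjacent Fourier bins for the given window."""
    return fs / window_samples(window_ms, fs)


res_256 = freq_resolution_hz(256)  # 64 samples
res_512 = freq_resolution_hz(512)  # 128 samples (assumed window for the 2 Hz analysis)
```

The trade-off is the usual one: halving the bin spacing requires doubling the window, which halves the temporal resolution of the event-related estimate.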
For the object recognition trials, average interhemispheric coherence at 117–373 msec increased significantly compared with the baseline (p < 0.001). Trials with meaningless objects, failed object recognition, and passive viewing of familiar objects did not induce similar effects (Fig. 4 A). This interhemispheric coherence restricted to object recognition was transient and immediately followed the earliest visual cortical evoked potential. Stimulus duration had no effect on its magnitude (p > 0.4). Phase lag during the coherence increase (117–373 msec) was 2 ± 3 msec (mean ± SEM) from right to left, which is not significantly different from zero and is, therefore, a “near-zero” phase lag.
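A phase lag expressed in milliseconds is obtained from the phase angle of the cross-spectrum at a given frequency. The following sketch shows the conversion in both directions. The function names, the sign convention (positive phase read as a positive lag), and the 10 Hz example frequency are assumptions for illustration; the text does not state which frequency bin or convention was used.

```python
import cmath
import math


def phase_lag_ms(cross_spectrum, freq_hz):
    """Convert the phase of a complex cross-spectrum value to a time lag in msec."""
    phase_rad = cmath.phase(cross_spectrum)
    return 1000.0 * phase_rad / (2.0 * math.pi * freq_hz)


def lag_to_degrees(lag_ms, freq_hz):
    """Express a time lag as a phase angle in degrees at the given frequency."""
    return 360.0 * freq_hz * lag_ms / 1000.0
```

At an assumed 10 Hz, a 2 msec lag corresponds to a phase difference of 7.2°, a small fraction of the 100 msec cycle.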
To investigate the possible effect of the linked earlobe reference on coherence estimate, we also computed the ERCoh by using a current source density (CSD) function of EEG (dipole derivation utility of the Neuroscan software), which can achieve a reference-free EEG signal with minimal volume conduction. Interhemispheric coherence at the electrode pairs of interest was 0.160 ± 0.013 and 0.215 ± 0.029 for baseline and early recognition, respectively (significantly different, p < 0.01).
For the power analysis, the prestimulus baseline EEG power in the α band during the judgment task was significantly smaller than that during the passive-viewing task (ANOVA, object, p = 0.9; task, p < 0.01; duration, p = 0.1). To concentrate on the change of the power induced by the task, ERPow was computed by setting the prestimulus baseline to zero. The α power significantly dropped at 373–629 msec during all conditions (p < 0.05). Because the EEG α rhythm is regarded as the “cortical idling rhythm,” we quantitatively analyzed the normalized power decrease as an index of cortical activation (Pfurtscheller and Aranibar, 1979). In both judgment and passive-viewing tasks, the decrease of power was larger for familiar objects compared with meaningless objects (main effect, object, p < 0.001; task, p < 0.001; interaction, object × task, p > 0.1). In agreement with the task performance during the judgment task, the difference between familiar and meaningless objects was significant when the stimuli were presented for 34, 50, or 100 msec in duration but not for 17 msec (post hoc test with Bonferroni correction; p < 0.05, p < 0.005, p < 0.005, and p = 0.5, respectively). For familiar object recognition trials, the magnitude of the EEG power decrease at 373–629 msec was linearly correlated with the stimulus duration (Fig. 5, p < 0.001).
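The ERPow normalization (log-transformed power with the prestimulus baseline set to zero) can be sketched as follows. The function name is hypothetical, and two details are assumptions because the text does not specify them: the logarithm is taken to base 10, and the two pooled baseline segments are combined by averaging their power before the log transform.

```python
import math


def erpow(power_post, baseline_powers):
    """Log power change relative to the mean of the pooled baseline segments.

    Negative values indicate a power decrease relative to baseline.
    """
    base = sum(baseline_powers) / len(baseline_powers)
    return math.log10(power_post) - math.log10(base)
```

Under these assumptions, a halving of α power relative to baseline gives an ERPow of about −0.30, and unchanged power gives 0, so conditions with different absolute baseline power (such as the judgment and passive-viewing tasks) can be compared on the same scale.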
Lateral presentation experiment
Task performance was not different for the side of stimuli. The mean proportion of correct object recognition was 80 ± 10%, and that of correctly rejected meaningless objects was 86 ± 13%. Reaction time was 569 ± 144 msec on average.
Event-related coherence and power analysis were performed as described in the first experiment using a 512 msec time window. Prestimulus baseline coherence and power levels during the lateral presentation experiment were not significantly different from those during the judgment task of the first experiment (p = 0.1 and p = 0.3, respectively). No interhemispheric coherence increase was correlated with object recognition in this experiment (Fig. 6). In contrast, the occipitotemporal EEG power started to decrease at 373 msec, and the amount of decrease (373–885 msec) was larger for the object recognition compared with meaningless objects (p < 0.001). There was no significant laterality of the power decrease for either stimulus side.
DISCUSSION
We found that the dynamic change of interhemispheric coherence in the α band is associated with the recognition of familiar objects presented at the center of the visual field. The difference in the coherence estimate between object recognition and passive viewing of familiar objects suggests a link between the level of neural synchrony and visual recognition. Previous animal studies suggested that the perception of a unitary object might be associated with increased coherence (Engel et al., 1991b; Singer and Gray, 1995). In our experimental paradigm, it may be argued that even the swirled object can be perceived without being recognized as a familiar object. Thus, it is possible that perceptual binding could occur within and between the primary visual areas (V1) even for meaningless objects or during passive viewing. However, detection of these localized binding signals by a noninvasive macroscale technique, such as EEG coherence, might be difficult because of the anatomical location of V1 (medial site of the occipital cortex). On the other hand, it is likely that object recognition requires synchronous activation of a larger cortical network extending to the temporal and lateral occipital areas, which might correlate with the coherence increase in the present experiment. During binocular rivalry in monkeys, neural activity at only the higher but not the lower level visual areas correlated with conscious perception (Leopold and Logothetis, 1996; Pigarev et al., 1997; Sheinberg and Logothetis, 1997). Human-neuroimaging studies that measured the cerebral blood flow change have shown that active visual attention enhances local activation in the striate and extrastriate cortex (Moran and Desimone, 1985; Shulman et al., 1997; Kastner et al., 1998). It is conceivable that visual recognition also enhances the task-specific neuronal coupling or interregional coherence.
Coherence computed from the scalp-recorded referenced EEG has often been criticized because of the possible contamination of the apparent coherence change associated with volume conduction and the common reference effect (Mima and Hallett, 1999; Mima et al., 2000). Electrodes of interest over the right and left hemispheres were, however, at least 16 cm apart in the present study (Fig. 1 B), which enabled us to eliminate the volume conduction effect (Nunez et al., 1997). To remove the possible inflation of coherence caused by the common earlobe reference, we used a reference-free EEG (CSD computation). This procedure also provided sharp spatial filtering, excluding the possible contamination of a single widespread source affecting both hemispheres. Because the same results could be obtained by the CSD, our data cannot be explained by contamination of the earlobe reference activity or by volume conduction.
Because the conventional potential analysis showed no significant difference at the time when the coherence increase occurred (117–373 msec), contamination of the coherence estimate by the evoked potential is not likely. Thus, our results should reflect the induced but not the evoked α activity. For the same reason, the phase resetting associated with the stimuli (Jansen and Brandt, 1991; Haig and Gordon, 1998) is not the main generator mechanism of this coherence.
The coherence change was largest at the pairs connecting the right and left temporal areas. It is unlikely that a subcortical central EEG source would generate this coherence pattern connecting limited focal areas. At the time when this coherence was shown, the magnitude of the α power stayed at the same level as baseline, which excludes the possibility of the emergence of a strong widespread subcortical rhythm generator or the elimination of incoherent sources projecting to the right and left hemispheres differently. It is more likely that neuronal synchronization during object recognition was achieved by modulating the temporal structure of the oscillations without changing the power in that particular frequency band (Fries et al., 1997).
In agreement with a previous study (Vanni et al., 1997), the α power decrease probably reflects the subjective meaning of the object as well as the visual stimulus per se. Stimulus duration significantly affected this power change but not the coherence. This finding supports the idea that coherence reflects the endogenous process associated with object recognition. We also confirmed that the power change for failed object recognition was similar to that associated with meaningless objects (Vanni et al., 1997). We found that the meaning of stimuli modulates the power even during passive viewing, possibly reflecting attention capture.
The topographic pattern of the EEG power decrease in both experiments was not lateralized, suggesting the activation of a large cortical network associated with a later phase of recognition that is common to both experiments. The second experiment provides direct evidence that the transient increase of interhemispheric coherence exclusively correlates with the processing of visual stimuli spanning the midline of the visual field.
Several lines of evidence have suggested the functional relevance of α and γ band oscillation in human recognition (Basar et al., 1997; Klimesch, 1999; Knyazeva et al., 1999; Miltner et al., 1999; Rodriguez et al., 1999; Tallon-Baudry et al., 1999). Recent studies have shown the interaction between the α and γ band oscillation (Bringuier et al., 1992; Chatila et al., 1992; Young et al., 1992; Schanze and Eckhorn, 1997; Brecht et al., 1998). More recently, it was proposed that nonmoving static objects might be partly represented by the α band oscillatory activity (Sewards and Sewards, 1999). Most of the studies in neuronal synchrony within the γ band have used a moving bar as a stimulus object (Eckhorn et al., 1988; Gray et al., 1989; Gray and Singer, 1989), and the recent human EEG study using “Mooney” faces excluded the α band from the analysis (Rodriguez et al., 1999). Our study provides direct support for the hypothesis that α band coherence plays an important role in (static) object recognition. In our experimental design, it is possible that part of the γ band activity might be diminished by the low-pass filter at 50 Hz and that the transient γ band coherence might be smeared out by the analysis time window of 256 msec.
Previous human studies have suggested the importance of coherence at other frequencies such as 4–8 Hz (Sarnthein et al., 1998; Srinivasan et al., 1999) and 13–20 Hz (Classen et al., 1998; Andres et al., 1999; von Stein et al., 1999). Because these studies did not use an event-related analysis, our results, which represent the dynamic aspects of human recognition, cannot be directly compared with them.
To clarify the temporal profile further, we computed the coherence change every 64 msec using a 128 msec time window (Fig. 4 B). Because temporal and frequency resolutions are trade-offs, we focused on 4–12 Hz, centering at 8 Hz. Onset latency of the coherence increase was as early as 64–192 msec, whereas the cognition-related power change began at 192–320 msec. Thus, the interhemispheric coherence might reflect the earliest stage of active attention and recognition. The processing of incoming sensory information can primarily be divided into two systems: preattentive and attentive (Neisser, 1967). The preattentive system can be activated by the rare stimuli in the absence of active attention and is preperceptual, such as echoic or iconic memory. In spite of its very short latency, interhemispheric coherence might be the first gate of an active attention system because such coherence was not observed during passive viewing. It is also important to note that familiar objects caused a larger EEG power change even during passive viewing. It is possible that the EEG α power is partly modulated by the preattentive system. Increased coherence was not followed by a transient decrease. Because it has been hypothesized that the role of active desynchronization is the neuronal correlate of a transition/punctuation from perception to action (Varela, 1995; Rodriguez et al., 1999), it is likely that this desynchronization, if any, might be smeared in the delay period between the target and the GO signal in our experiment.
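The sliding-window scheme can be made concrete with a short sketch. Windows of 128 msec advanced in 64 msec steps overlap by half, and a 128 msec window has a bin spacing of 1000/128, or about 7.8 Hz, so the single bin near 8 Hz covers roughly 4–12 Hz. The function name is hypothetical, and the choice of 0 msec (stimulus onset) as the first window start and 512 msec as the upper limit are assumptions for illustration.

```python
def sliding_windows(first_start_ms, last_end_ms, length_ms=128, step_ms=64):
    """Return (start, end) pairs for overlapping analysis windows in msec."""
    windows = []
    start = first_start_ms
    while start + length_ms <= last_end_ms:
        windows.append((start, start + length_ms))
        start += step_ms
    return windows


windows = sliding_windows(0, 512)   # assumed range: stimulus onset to 512 msec
resolution_hz = 1000.0 / 128        # bin spacing for a 128 msec window
```

Under these assumptions the list contains the 64–192 msec and 192–320 msec windows named in the text. Because adjacent windows share half their samples, onset latencies read from such a series are accurate only to about one step.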
In regard to the neuronal substrate of interhemispheric coherence, our observation of a near-zero phase lag and the possible relevance of a thalamocortical circuit in the generation of EEG α rhythm might suggest the contribution of subcortical structures in producing the interhemispheric coherence (Da Silva et al., 1980; Steriade and Llinas, 1988; Munk et al., 1996). There is evidence that long-range zero-phase lag cortical synchrony does not necessarily require a common subcortical drive (Konig et al., 1995; Traub et al., 1996; Roelfsema et al., 1997). A recent study showed that long-range synchronization within the thalamus is controlled by corticothalamic feedback (Contreras et al., 1996). Lesion studies in split-brain patients and animal models suggested that the corticocortical connections via the corpus callosum might be essential for generating interhemispheric coherence (Engel et al., 1991a; Munk et al., 1995; Nowak et al., 1995; Brazdil et al., 1997). Therefore, it is likely that neuronal networks including corticocortical connections, thalamocortical projections, and cortical modulation of thalamic activity are all involved in generating corticocortical coherence.
We thank Dr. H. Shibasaki for helpful comments and discussions, S. Thomas-Vorbach for expert technical assistance, and D. G. Schoenberg for skillful editing.
Correspondence should be addressed to Dr. Mark Hallett, National Institutes of Health, National Institute of Neurological Disorders and Stroke, Building 10, Room 5N226, 10 Center Drive, MSC-1428, Bethesda, MD 20892-1428.
T. Mima's present address: Department of Brain Pathophysiology, Human Brain Research Center, Kyoto University Graduate School of Medicine, Shogoin, Sakyo-ku, Kyoto 606-8507, Japan.