Abstract
The coherent representation of an object in the visual system has been suggested to be achieved by the synchronization in the γ-band (30–70 Hz) of a distributed neuronal assembly. Here we measure variations of high-frequency activity on the human scalp. The experiment is designed to allow the comparison of two different perceptions of the same picture. In the first condition, an apparently meaningless picture that contained a hidden Dalmatian, a neutral stimulus, and a target stimulus (twirled blobs) are presented. After the subject has been trained to perceive the hidden dog and its mirror image, the second part of the recordings is performed (condition 2). The same neutral stimulus is presented, intermixed with the picture of the dog and its mirror image (target stimulus). Early (95 msec) phase-locked (or stimulus-locked) γ-band oscillations do not vary with stimulus type but can be subdivided into an anterior component (38 Hz) and a posterior component (35 Hz). Non-phase-locked γ-band oscillations appear with a latency jitter around 280 msec after stimulus onset and disappear in averaged data. They increase in amplitude in response to both target stimuli. They also globally increase in the second condition compared with the first one. It is suggested that this γ-band energy increase reflects both bottom-up (binding of elementary features) and top-down (search for the hidden dog) activation of the same neural assembly coding for the Dalmatian. The relationships between high- and low-frequency components of the response are discussed, and a possible functional role of each component is suggested.
Oscillatory synchronization has been suggested to characterize a neuronal assembly coding for an object; neurons distributed across different brain areas would synchronize their firing in the 30–70 Hz range (γ-band) (von der Malsburg and Schneider, 1986; Singer, 1993). There is growing evidence for stimulus-specific high-frequency oscillatory events in the visual cortex of the anesthetized cat (Eckhorn et al., 1988; Gray and Singer, 1989; Gray et al., 1989, 1990, 1992; Brosch et al., 1995; Freiwald et al., 1995) and of the awake monkey (Eckhorn et al., 1993; Frien et al., 1994; Kreiter and Singer, 1996). All of these studies report oscillatory events non-phase-locked to stimulus onset (oscillatory events appearing with a latency jitter and thus disappearing in averaged data) and support the idea of a role of γ-band oscillatory synchronization in feature binding. Still, the existence and the functional role of these oscillatory synchronizations remain controversial (Ghose and Freeman, 1992; Tovee and Rolls, 1992; Young et al., 1992).
In humans, stimulus-related, non-phase-locked, high-frequency oscillatory activities have been observed both in electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, in the auditory modality (Jokeit and Makeig, 1994), in motor tasks (Kristeva-Feige et al., 1993; Nashmi et al., 1994; Pfurtscheller et al., 1994), in a somatosensory task (Desmedt and Tomberg, 1994), and in a lexical decision task (Lutzenberger et al., 1994; Pulvermüller et al., 1996). In the visual modality, γ-band responses seem to correlate with stimulus coherency (Lutzenberger et al., 1995; Tallon et al., 1995; Tallon-Baudry et al., 1996), providing further support to a possible functional role of γ-band activity in feature binding.
The experiment presented here is designed to allow the comparison of two different perceptions of the same picture. We used for this purpose the well-known Dalmatian picture, slightly modified to make the dog more difficult to recognize without learning (Fig. 1). In the first condition, subjects were presented with a picture they considered to be made of meaningless blobs. In the second condition, they had been told of the presence of the hidden dog and searched actively for it. A neutral stimulus, which is presented in both conditions but the perception of which does not change, allows us to isolate effects corresponding to the active search for the Dalmatian in the second condition. Preliminary results have been published in abstract form (Tallon-Baudry et al., 1995).
Figure 1. The six stimuli used. In the first condition (left column), subjects were presented three types of stimuli: a neutral stimulus (meaningless blobs), an unperceived dog (Dalmatian, with its head to the right and tail to the left, unperceived as a dog because subjects were not informed of its presence), and a target stimulus, made of meaningless twirled blobs. Before the beginning of the second recording session (Condition 2, right column), subjects were trained to perceive the Dalmatian with its head turned either to the right or to the left (the hidden outlines of the dogs are given on the right). In both conditions, the task of the subjects was to silently count the occurrences of the target stimulus.
MATERIALS AND METHODS
Stimuli. The experiment was divided into two conditions. In the first condition (Fig. 1, left column), subjects were presented a neutral picture (meaningless black blobs on a light gray background), an “unperceived dog” picture (a Dalmatian is hidden in the picture, but the subject is not aware of its presence), and a target stimulus (twirled blobs). The task of the subjects was to silently count the occurrences of the target stimulus and to report this number verbally at the end of the session. The subjects were then instructed as to the presence of the hidden Dalmatian, and were trained to perceive it with its head turned right or left (the dog outline is presented in Fig. 1). During the training session, subjects were shown Dalmatians with their head to the left or to the right and neutral stimuli in random order and were asked to name each picture. Training was performed until subjects could accurately recognize each stimulus in a block of 50 stimuli of each type. In the second condition (Fig. 1, right column), the same neutral stimulus, the “perceived dog” (head rightward) and the target dog (head leftward) were presented. The task was to count silently the occurrences of the target stimulus. The neutral stimulus is derived from the target dog (head leftward); blobs surrounding the fixation point have rather similar shapes in both pictures, to prevent recognition of the target dog from one of its visual details.
Stimuli were delivered for 500 msec in pseudorandomized order (no more than three consecutive presentations of the same stimulus) on a video monitor, subtending a visual angle of 4° × 5° at a viewing distance of 2.8 m. A fixation point remained permanently on the screen. Interstimulus interval was randomized between 2 and 3 sec.
Subjects. Thirteen right-handed subjects were recorded (8 males, 5 females, mean age 23 years). All subjects had normal or corrected to normal vision and could perceive the Dalmatian after training. The study was performed with the understanding and written consent of all subjects.
Recordings. EEG was recorded continuously at a sampling rate of 1000 Hz (0.1–320 Hz analog bandwidth) from 13 Ag-AgCl electrodes referenced to the nose. Their locations, according to the international 10–20 system, are as follows: Iz, T5, O1, O2, T6, POz, P3, Pz, P4, C3, Cz, C4, and Fz. Electrode impedances were kept below 5 kΩ. Electrode placement on the head was computer assisted (Echallier et al., 1992). Horizontal eye movements were monitored, and a rejection threshold was set for each subject at a potential value corresponding to a saccade to a corner of the picture. Four blocks of ∼130 stimuli (50 neutral stimuli, 50 Dalmatian pictures, and ∼30 target stimuli) were delivered in the first condition to each subject. After the training session, another four blocks of ∼130 stimuli were delivered. Epochs containing artifacts [EEG > 100 μV or electro-oculogram (EOG) > saccade threshold] were rejected off-line. Seventy-nine percent of the responses were considered artifact-free, corresponding to a mean of 158 responses to each nontarget stimulus and 98 responses to each target stimulus.
Data analysis. The method used to quantify changes in γ-band oscillatory activity is based on a time–frequency (TF) wavelet decomposition of the signal between 20 and 100 Hz (Bertrand et al., 1996). This method provides a better compromise between time and frequency resolution (Sinkkonen et al., 1995) than previously proposed methods using short-term Fourier transforms (Makeig, 1993). It provides a time-varying energy of the signal in each frequency band, leading to a TF representation of the signal. The energy of each single trial can be averaged, allowing one to analyze non-phase-locked high-frequency components (TF energy averaged across single trials), provided they have a high signal-to-noise ratio. This method can also be applied to the averaged evoked potential, providing information about high-frequency components phase-locked to stimulus onset (TF energy of the averaged evoked potential). An additional factor, called the phase-locking factor, allows us to study statistically the phase locking of high-frequency components, regardless of their amplitude (TF representation of the phase-locking factor).
The signal is convolved with complex Morlet wavelets w(t,f0) (Kronland-Martinet et al., 1987) having a Gaussian shape both in the time domain (SD σt) and in the frequency domain (SD σf) around their central frequency f0:
$$w(t, f_0) = A \exp\left(-\frac{t^2}{2\sigma_t^2}\right) \exp\left(2i\pi f_0 t\right)$$
with σf = 1/(2πσt). Wavelets are normalized so that their total energy is 1, the normalization factor A being equal to:
$$A = \left(\sigma_t \sqrt{\pi}\right)^{-1/2}$$
A wavelet family is characterized by a constant ratio (f0/σf), which should be chosen in practice greater than ∼5 (Grosmann et al., 1989). The wavelet family we used is defined by f0/σf = 7 (wavelet duration 2σt of about two periods of oscillatory activity at f0), with f0 ranging from 20 to 100 Hz in 1 Hz steps. At 20 Hz, this leads to a wavelet duration (2σt) of 111.4 msec and to a spectral bandwidth (2σf) of 5.8 Hz, and at 100 Hz to a duration of 22.2 msec and a bandwidth of 28.6 Hz. The time resolution of this method thus increases with frequency, whereas the frequency resolution decreases.
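The duration and bandwidth figures follow directly from the constant ratio of 7 and the reciprocal relation between the temporal and spectral SDs. The short Python sketch below recomputes them; the function name `morlet_widths` is hypothetical and introduced only for illustration. The results agree with the values quoted above to within about 0.1.

```python
import math

def morlet_widths(f0, ratio=7.0):
    """Return (sigma_f in Hz, sigma_t in s) for a wavelet with f0 / sigma_f = ratio."""
    sigma_f = f0 / ratio                         # spectral SD
    sigma_t = 1.0 / (2.0 * math.pi * sigma_f)    # temporal SD, from sigma_f = 1 / (2*pi*sigma_t)
    return sigma_f, sigma_t

for f0 in (20.0, 100.0):
    sigma_f, sigma_t = morlet_widths(f0)
    print(f"f0 = {f0:5.1f} Hz: duration 2*sigma_t = {2e3 * sigma_t:6.1f} msec, "
          f"bandwidth 2*sigma_f = {2 * sigma_f:5.1f} Hz")
```

Because the ratio is fixed, halving the wavelet duration doubles its bandwidth, which is the trade-off between time and frequency resolution described in the text.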
The time-varying energy [E(t,f0)] of the signal in a frequency band is the square norm of the result of the convolution of a complex wavelet [w(t,f0)] with the signal [s(t)]:
$$E(t, f_0) = \left| w(t, f_0) * s(t) \right|^2$$
Convolution of the signal by a family of wavelets provides a TF representation of the signal. By averaging the TF energy of each single trial, both phase-locked and non-phase-locked activities will be added up, as well as noise; only activities the amplitude of which is high enough compared with background high-frequency EEG will emerge. Simulation studies have shown that our method can detect oscillations of 4 μV amplitude and of latency jitter as great as 80 msec embedded in real background EEG (Bertrand et al., 1996). Applied to the averaged evoked potential, this method allowed us to characterize phase-locked activities. In both cases, the mean TF energy of the prestimulus period (between −200 and −50 msec) is considered as a baseline level, which is subtracted from the pre- and poststimulus energy, in each frequency band.
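The difference between the two uses of the TF energy (averaged across single trials versus computed on the averaged evoked potential) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and not the analysis code of the study: the standard unit-energy complex Morlet form, truncation of the wavelet at ±4 SDs, the function names `morlet` and `tf_energy`, and the synthetic data (a 35 Hz burst with random phase and ±40 msec latency jitter in white noise).

```python
import numpy as np

def morlet(f0, fs, ratio=7.0, n_sd=4.0):
    """Unit-energy complex Morlet wavelet sampled at fs, with f0 / sigma_f = ratio."""
    sigma_t = ratio / (2.0 * np.pi * f0)
    half = int(n_sd * sigma_t * fs)          # truncation at +/- n_sd SDs (assumption)
    t = np.arange(-half, half + 1) / fs
    a = (sigma_t * np.sqrt(np.pi)) ** -0.5   # normalization to unit total energy
    return a * np.exp(-t**2 / (2.0 * sigma_t**2)) * np.exp(2j * np.pi * f0 * t)

def tf_energy(trials, times, freqs, fs, baseline=(-0.200, -0.050)):
    """Baseline-subtracted TF energy, averaged over the rows of `trials`."""
    trials = np.atleast_2d(trials)
    in_base = (times >= baseline[0]) & (times <= baseline[1])
    out = np.empty((len(freqs), trials.shape[1]))
    for k, f0 in enumerate(freqs):
        w = morlet(f0, fs)
        # |w * s|^2 per trial; dividing by fs approximates the convolution integral
        e = np.mean([np.abs(np.convolve(x, w, mode="same") / fs) ** 2
                     for x in trials], axis=0)
        out[k] = e - e[in_base].mean()       # subtract the mean prestimulus energy
    return out

# Synthetic data only: 30 trials, non-phase-locked 35 Hz burst around 280 msec
fs = 1000.0
times = np.arange(-400, 800) / fs
freqs = np.arange(20, 101, 5)
rng = np.random.default_rng(0)
trials = []
for _ in range(30):
    latency = 0.280 + rng.uniform(-0.040, 0.040)
    phase = rng.uniform(0.0, 2.0 * np.pi)
    envelope = 4.0 * np.exp(-(times - latency) ** 2 / (2 * 0.030 ** 2))
    burst = envelope * np.cos(2 * np.pi * 35.0 * (times - latency) + phase)
    trials.append(burst + rng.normal(0.0, 1.0, times.size))
trials = np.array(trials)

single_trial_avg = tf_energy(trials, times, freqs, fs)        # keeps non-phase-locked energy
of_average = tf_energy(trials.mean(axis=0), times, freqs, fs)  # phase-locked energy only
row = int(np.where(freqs == 35)[0][0])
print("peak latency at 35 Hz (s):", times[np.argmax(single_trial_avg[row])])
print("peak ratio (average / single trials):",
      of_average[row].max() / single_trial_avg[row].max())
```

In this toy example the burst survives when energies are averaged across single trials but largely cancels in the energy of the averaged signal, which is the property used to separate the two kinds of components.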
The phase locking of an oscillatory activity is evaluated in the time–frequency domain by adapting the “phase-averaging” method developed by Jervis et al. (1983) in the frequency domain. The normalized complex time-varying energy Pi of each single trial i, with signal si(t):
$$P_i(t, f_0) = \frac{w(t, f_0) * s_i(t)}{\left| w(t, f_0) * s_i(t) \right|}$$
is averaged across single trials, leading to a complex value describing the phase distribution of the time–frequency region centered on t and f0. The modulus of this complex value, ranging from 0 (non-phase-locked activity) to 1 (strictly phase-locked activity), will be called the phase-locking factor. To test whether an activity is significantly phase-locked to stimulus onset, a statistical test (Rayleigh test) of uniformity of angle is used (Jervis et al., 1983).
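As a rough illustration of the phase-locking factor, the sketch below normalizes each complex single-trial value to unit modulus, averages across trials, and takes the modulus of the mean. The function names are hypothetical, the input is synthetic complex data standing in for wavelet-convolved trials, and the p value uses the common large-sample approximation exp(−nR²) of the Rayleigh test, which may differ from the exact procedure of Jervis et al. (1983).

```python
import numpy as np

def phase_locking_factor(convolved):
    """convolved: complex array (n_trials, ...) of wavelet-convolved single trials."""
    unit = convolved / np.abs(convolved)   # discard amplitude, keep phase only
    return np.abs(unit.mean(axis=0))       # 0 = uniform phases, 1 = identical phases

def rayleigh_p(plf, n_trials):
    """Large-sample approximation of the Rayleigh test p value (assumption)."""
    return np.exp(-n_trials * plf ** 2)

# Synthetic example at one time-frequency point
rng = np.random.default_rng(1)
n = 100
amplitude = rng.uniform(0.5, 2.0, n)
locked = amplitude * np.exp(1j * rng.normal(0.0, 0.3, n))             # narrow phase spread
unlocked = amplitude * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))  # uniform phases

for name, data in (("locked", locked), ("unlocked", unlocked)):
    plf = phase_locking_factor(data)
    print(f"{name}: phase-locking factor = {plf:.2f}, approx. p = {rayleigh_p(plf, n):.3g}")
```

Because the amplitudes are divided out before averaging, a weak but consistently timed component can still yield a high phase-locking factor, which is why the measure is described as independent of amplitude.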
Below 20 Hz, wavelet duration is necessarily hundreds of milliseconds. The analysis of non-phase-locked oscillatory low-frequency components thus requires very long epochs of EEG to be analyzed correctly, and has a comparatively poor temporal resolution. Such an analysis on longer EEG epochs would drastically increase the number of rejected trials as a result of eye blink or movement. Our study aims specifically at analyzing the γ-band stimulus-induced responses; time–frequency analysis of phase-locked and non-phase-locked components will focus on the 20–100 Hz frequency range, and the results will be compared, at a similar temporal resolution, with the standard averaged evoked response digitally low-pass-filtered at 25 Hz, 12 dB per octave (low-frequency, phase-locked activity only).
The statistical analysis of TF energy values is performed with the use of the nonparametric Quade test for related samples and Conover procedures as post hoc tests of significance (Conover, 1980). The use of a nonparametric test is necessary because the distribution of the TF energy values is far from being Gaussian. The Quade test is an extension of the Wilcoxon signed-rank test to the case of several related samples. It is performed on ranked data paired by subjects and provides an F value indicating whether a global effect of stimulation type is significant. If so, Conover procedures allow us to compare all the possible combinations of experimental condition pairs, and thus to determine in which pairs significant differences occur. The statistical analysis of latency and frequency values is usually performed with the use of the Quade test, except when two factors have to be analyzed. Because we know of no two-way nonparametric test, a two-way ANOVA is used in this case.
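For readers unfamiliar with the Quade test, the following sketch computes its F statistic from a subjects × conditions table, following the textbook formulation (ranks within each subject, weighted by the rank of that subject's range). The function names and the toy data are hypothetical and unrelated to the measurements of this study; the p value would then be read from an F distribution with the returned degrees of freedom, and the Conover post hoc comparisons are not shown.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def quade_statistic(data):
    """data: (n_subjects, n_conditions). Returns (F, df1, df2).

    F is undefined (division by zero) if all subjects rank the conditions identically.
    """
    data = np.asarray(data, dtype=float)
    b, k = data.shape
    within = np.apply_along_axis(average_ranks, 1, data)      # ranks within each subject
    q = average_ranks(data.max(axis=1) - data.min(axis=1))    # rank of each subject's range
    s = q[:, None] * (within - (k + 1) / 2.0)                 # weighted, centered ranks
    a = np.sum(s ** 2)
    b_term = np.sum(s.sum(axis=0) ** 2) / b
    f = (b - 1) * b_term / (a - b_term)
    return f, k - 1, (b - 1) * (k - 1)

# Toy table: 4 hypothetical subjects x 3 hypothetical conditions
toy = [[1, 2, 4],
       [2, 3, 7],
       [3, 1, 5],
       [1, 2, 3]]
f_value, df1, df2 = quade_statistic(toy)
print(f"F({df1}, {df2}) = {f_value:.3f}")
```

Weighting by the rank of the within-subject range gives more influence to subjects whose responses differ strongly between conditions, which is what distinguishes this test from a plain rank-sum approach.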
RESULTS
Performance
Subjects easily performed the task in the first condition, with great accuracy (3.5% errors). They often reported the task in the second condition to be more difficult, but their performance was equally accurate (3.6% errors). None of the subjects reported perceiving the hidden Dalmatian before being instructed for it.
Dissociation of phase-locked and non-phase-locked high-frequency components
The representation of TF energy averaged across single trials (Fig. 2) shows the existence of a decrease in energy at 190 msec in the 20–60 Hz band, followed by an increase around 280 msec and 35 Hz, compared with baseline. Both of these components can be seen at all electrodes, with a maximum over posterior electrodes. They do not appear in the representation of the phase-locking factor (Fig. 3A) or in the TF energy of the averaged evoked potential (Fig. 3B); they are thus non-phase-locked to stimulus onset and will be studied with the use of TF energy averaged across single trials. The phase-locking factor shows a maximum at 95 msec, corresponding to a first peak of high-frequency activity phase-locked to stimulus onset. This first component will thus be studied with the use of the TF representation of the phase-locking factor and the TF energy of the averaged evoked potential.
Figure 2. Time–frequency representation of the energy averaged across single trials, grand averaged across subjects at electrode O1. Time is presented on the x-axis, stimulus onset being indicated by the vertical bar at 0 msec. Frequency between 20 and 100 Hz is presented on the y-axis. Energy values are coded on a gray scale, the highest energy values appearing white. Data are baseline-subtracted, thus providing positive and negative energy values. A decrease of energy compared with baseline level can be observed in response to all stimuli around 190 msec, followed by an increase of TF energy around 280 msec and 35 Hz. This increase is much more pronounced in Condition 2. It can be observed at all electrodes but tends to be stronger at posterior locations.
Figure 3. A, Phase-locking factor (grand average across subjects) at electrodes Cz (anterior) and O1 (posterior) in response to the unperceived and perceived dogs. An early phase-locked component peaks at 95 msec (arrows). There do not seem to be any differences between stimulus types, but a variation in peak frequency with electrode location can be observed; the frequency of this first peak is higher at anterior sites. The continuous line at 62 Hz corresponds to the frame rate of our video monitor, which is phase-locked to the stimulus onset. B, TF energy of the averaged evoked potential (grand average across subjects) at electrode Cz, in response to the unperceived and perceived dogs. Only the 95 msec, phase-locked peak of TF energy can be observed; both the decrease at 190 msec and the increase at 280 msec of TF energy averaged across single trials observed in Figure 2 disappear completely; these components are thus non-phase-locked to stimulus onset.
First peak in high-frequency activity (phase-locking factor, TF energy of the averaged evoked potential)
We measured the peak latency and frequency of the local maximal value of the phase-locking factor at each electrode for each subject and stimulus type. In 12 of the 13 subjects, this value reached the 1% significance level for phase locking (Rayleigh test) at between 4 and 13 electrodes, depending on the subject. The first component occurring at 95 msec is thus significantly phase-locked to stimulus onset.
The peak frequency of the phase-locking factor seems to vary across electrodes, being higher at Cz than at O1 (Fig. 3). The latency and frequency values of those peaks reaching the 1% significance level were averaged across electrodes Iz, T5, O1, O2, and POz (posterior group), and across electrodes P3, Pz, P4, C3, Cz, C4, and Fz (anterior group). We tested the effect of both topography (posterior/anterior group) and stimulus type on the latency and frequency at the phase-locking factor maximum (two-way ANOVA). The latency of the first peak did not vary with stimulus type (F = 1.56; p = 0.21), nor with topography (F = 0.21; p = 0.66). Its frequency did not vary with stimulus type (F = 1.03; p = 0.28) but did vary with topography (F = 19.05; p = 0.001). This effect corresponds to higher frequencies at anterior electrodes than at posterior electrodes (Table 1).
Table 1. Mean ± SEM latency and frequency values of the first peak measured on the phase-locking factor representations
The mean TF energy of the averaged evoked potential was computed in the region from 70 to 120 msec and 25 to 50 Hz. Grand average values across subjects for this measure are presented as topographical maps in Figure 4A. The existence of two local maxima is confirmed: a posterior one (O1, O2) and an anterior one (Cz, C4). No differences between stimulus types can be found (Quade test), even though there is an apparent increase of the evoked potential TF energy in response to the unperceived dog. This increase corresponds in fact to a larger spread in the data in this condition (Fig. 4B). It should be noted that the distribution of the TF energy values is not Gaussian, the mean and median values being different (Fig. 4B).
Figure 4. A, Topographic maps of the TF energy of the averaged evoked potential, averaged between 70 and 120 msec, 25 and 50 Hz (grand averaged across subjects). The montage is shown as viewed from 45° from behind the vertex of the head of the subject (electrodes Cz and O2 are indicated on top of the figure). Local maxima can be found at two different sites: one anterior (Cz, C4) and another posterior (O1, O2). B, Mean (symbol), median (horizontal line), and 75th and 25th percentiles (box) of the mean TF energy of the averaged evoked potential (70–120 msec, 25–50 Hz). The vertical line extends from the 10th percentile to the 90th percentile. This illustrates the high intersubject variability, as well as the non-Gaussian distribution of the data. No significant difference between stimulus types can be found. C, Topographic maps at 90 msec of the averaged evoked potential of one subject, 25–50 Hz filtered (left) and 0–25 Hz filtered (right). The topographies of the high- and low-frequency components of the averaged evoked potential are clearly distinct. Note the smaller amplitude of the 25–50 Hz filtered evoked potential compared with the 0–25 Hz filtered signal.
The location of the posterior maximum of the first high-frequency peak seems similar to the location of the low-frequency evoked potential at the same latency in the grand average data. Nevertheless, the intersubject topographical variability of the posterior 35 Hz component is quite strong; in some subjects, the topographies of the 35 Hz component and of the low-frequency component at the same latency are clearly distinct (Fig. 4C). Moreover, the latency of the first high-frequency peak (95 msec) is shorter than the latency of the first peak of the low-frequency averaged evoked potential (P1: mean latency, 108.8 msec; grand average across subjects and stimulus types).
Decrease in energy compared with baseline (TF energy averaged across single trials)
A decrease in energy compared with baseline can be observed in the time–frequency representation of energy averaged across single trials (Fig. 2). This decrease can be seen at all electrodes but tends to be more pronounced at posterior electrodes.
The maximal decrease in energy was measured across electrodes for each subject and in each condition. The decrease was usually maximum at electrodes posterior to Pz at a latency of 190 ± 4 msec and a frequency of 32 ± 0.9 Hz. The Quade test showed no significant differences between stimulus types in energy (p = 0.68), latency (p = 0.32), or frequency (p = 0.34).
Second peak in high-frequency activity (TF energy averaged across single trials)
The second peak in high-frequency activity (Fig. 2) can be observed at all electrodes but tends to be more pronounced at posterior electrodes. A visual inspection of single trials shows that its topography is widespread (Fig. 5). No polarity inversion between electrodes can be observed. The amplitude of the oscillations can reach 20 μV, and the responses are clearly non-phase-locked (Fig. 5, up to 40 msec latency jitter across single trials). The duration of the oscillatory episodes is rather brief, usually between 100 and 150 msec.
Figure 5. A, Single trials at electrode POz, in response to the perceived dog, 0–25 Hz filtered, 12 dB per octave (thin line) and 25–45 Hz filtered, 24 and 48 dB per octave (thick lines), and topographic maps of the maximal positive peak of the 25–45 Hz filtered potential (latency indicated below each map). Within trials, this topography is stable over several cycles. The amplitude of the γ-band oscillations can reach 20 μV. The duration of the oscillatory events is usually between 100 and 150 msec, and their jitter in time can reach 40 msec. The topography of the maximal positive peak is widespread, with a rather posterior maximum. B, Time course and peak topography of the mean 25–45 Hz energy averaged across single trials (same subject as in A). The highest energy is reached at electrodes O1 and POz.
The most striking effect is a strong increase of the second peak energy in response to any of the three stimuli in condition 2 (Fig. 2). We measured for each subject the maximal value of the TF energy of the second peak, as well as its peak latency and frequency, across electrodes (Fig. 6A). It usually peaks at electrodes posterior to Pz. A strong effect of stimulus type can be observed (Quade test, p < 10⁻⁵), corresponding to larger responses in condition 2 (Conover paired comparisons: neutral stimulus, first vs second condition, p = 0.0125; unperceived vs perceived dog: p = 0.005; target stimulus first vs second condition: p < 0.001). This increase in energy is accompanied by a decrease of frequency in the second condition (Quade test, p < 10⁻⁴; Conover procedures: neutral stimulus, first vs second condition, p = 0.021; unperceived vs perceived dog: p = 0.004; target stimulus first vs second condition: p = 0.035). The latency of the second peak remains similar in both conditions (Quade test, p = 0.57).
Figure 6. A, Mean (symbol), median (horizontal line), and 75th and 25th percentiles (box) of the maximal values of the second high-frequency peak measured on the TF energy averaged across single trials. The vertical line extends from the 10th to the 90th percentile. Note the high intersubject variability and the non-Gaussian distribution of the energy. Two effects can be observed: (1) the energy of the second peak is higher in Condition 2 than in Condition 1, and its frequency is lower; and (2) its energy is higher and its frequency lower in response to target stimuli than in response to nontarget stimuli. No significant differences can be found on the latency. B, TF representation of the energy averaged across single trials of two subjects at electrode Pz, in the second condition. Subject 3 (left) shows an increase for both the perceived and the target dogs, whereas Subject 7 (right) shows an increase only in response to the target dog.
We then searched for effects of stimulus type within each condition. Within the first condition, there was a tendency for the target stimulus to elicit a stronger response than the two other stimuli [Quade test: p = 0.018; Conover procedures: neutral stimulus vs unperceived dog, nonsignificant (p = 0.31); neutral versus target stimulus (twirled blobs), p = 0.006; unperceived dog versus target stimulus, p = 0.057], at a significantly lower peak frequency [Quade test, p = 0.016; Conover procedures: neutral stimulus vs unperceived dog, nonsignificant (p = 0.77); neutral vs target stimulus, p = 0.009; unperceived dog vs target stimulus, p = 0.017].
Within condition 2, a similar effect is found: the target stimulus (dog with head leftward) elicits a stronger response than the two other stimuli [Quade test: p < 10⁻⁴; Conover procedures: neutral vs perceived dog, nonsignificant (p = 0.44); neutral vs target stimulus (dog, head leftward), p < 10⁻⁴; perceived dog vs target stimulus, p < 10⁻⁴]. The frequency of the response to the target stimulus in the second condition seems to be lower than the frequency of the response to the two other stimuli, but this effect does not reach the significance level (Quade test, p = 0.081). The intersubject variability in the TF energy is quite strong; some subjects show a markedly larger response to both the perceived and target dogs (Fig. 6B, left), whereas others show an increase only in response to the target stimulus (Fig. 6B, right).
Baseline level (TF energy averaged across single trials)
Up to now, we considered the baseline-subtracted TF energy averaged across single trials. We thus had to determine whether this baseline level was modified during the experiment. We performed a Quade test on the mean 30–60 Hz energy averaged between −200 and −50 msec. No effect could be found (p = 0.90). Six of the subjects showed an increase of their baseline level in condition 2, whereas seven showed a decrease. This did not seem to be related to performance differences between subjects. Subdivision into two groups according to the increase or decrease of baseline level did not alter any of the above results.
Low-frequency (0–25 Hz) evoked potentials
The usual sequence of three waves labeled P1, N1, and P2 is observed on the 0–25 Hz filtered evoked potential (Fig. 7B). The peak amplitudes and latencies of these three waves were measured in each subject across electrodes (Table 2), and the effects of stimulus type (neutral stimulus/dog/target stimulus) and of condition (first/second) were tested (two-way ANOVA). No significant differences could be found for any of the three waves, either in amplitude or in latency.
Figure 7. A, Topographic maps of the 0–25 Hz filtered averaged evoked potentials, grand average across subjects. There are no topographical differences between stimulus types at the latencies of the first three major peaks (P1, N1, and P2). At longer latencies, potentials tend to be less positive in the second condition than in the first (shaded box at 275 and 325 msec). At 275 msec, an increased occipital negativity (N2, indicated by arrows) appears in response to both target stimuli and in response to the perceived dog. B, 0–25 Hz filtered average evoked potentials, grand average across subjects, at electrodes Iz, P4, and C4. The arrow indicates the enhancement of the N2 in response to the target stimuli and the perceived dog. Gray areas underline the overall difference between Conditions 1 and 2.
Table 2. Mean latency (msec) and amplitude (μV) of the P1, N1, and P2 components of the 0–25 Hz filtered averaged evoked potential ± SEM
At longer latencies, two superimposed effects can be observed: (1) a stimulus-specific, focal component at 292 msec (N2) appearing in response to both target stimuli as well as the perceived dog (Fig. 7, arrows); and (2) a long-lasting effect between 250 and 400 msec appearing at all electrodes and maximal at ∼340 msec. In the second condition, the 0–25 Hz filtered evoked responses to any of the three stimuli are less positive than in the first condition (Fig. 7, gray areas).
At posterior electrodes, the specific component labeled N2 seems to appear only in response to meaningful stimuli (both target stimuli as well as the perceived dog), as can be seen in Figure 7 (arrows). We measured the maximum of this peak for each subject (Table 3). It always appeared at electrodes posterior to Pz, usually at Iz, T5, or T6. We tested two factors (condition and stimulus type) on N2 peak latency and amplitude (two-way ANOVA). No effect could be found on latency (condition: F = 1.75, p = 0.21; stimulus type: F = 2.15, p = 0.15; condition × stimulus interaction: F = 2.15, p = 0.16). Amplitude was modulated by both stimulus type (F = 17.51, p < 0.001) and interaction between stimulus type and condition (F = 4.87, p = 0.034). The “condition” factor alone did not have a significant effect on amplitude (F = 3.036, p = 0.11). This double effect of stimulus type and stimulus type × condition reflects the increase of the N2 in response to the two target stimuli and a specific additional enhancement of the N2 in response to the perceived dog compared with the unperceived dog (Table 3). We did not find any difference between the two target stimuli and the perceived dog (F = 0.74, p = 0.49): the enhancement of the N2 is similar in these three conditions. The N2 is thus more pronounced in response to meaningful stimuli (perceived dog and target stimuli), regardless of the condition.
Table 3. Mean latency and amplitude of the N2 (0–25 Hz filtered evoked potential)
The latencies of the low-frequency N2 and of the second high-frequency peak have been compared; the latency of the second peak of high-frequency activity (281 msec) is significantly shorter (F = 6.24; p = 0.028) than the latency of the N2 (292 msec).
The long-lasting effect does not modify the topographies of the responses in the first and second conditions; they remain similar, despite a global amplitude effect (Fig. 7A). The maximal difference between conditions 1 and 2 is reached at ∼340 msec. The topography of this difference is rather widespread, with a maximum at POz. A two-way ANOVA was performed on the mean amplitude of the averaged evoked potential between 250 and 400 msec at all electrodes. No effect of stimulus type can be found (F = 1.69; p = 0.21), but the condition factor yields a highly significant effect (F = 12.58, p = 0.004; interaction F = 0.95, p = 0.39). This effect is mainly a result of the 0–8 Hz part of the evoked potential; it disappears when the evoked potentials are filtered between 8 and 25 Hz.
Summary of results
An early, phase-locked γ-band component peaks at 95 msec (Fig. 8). It does not vary with stimulus type but can be subdivided into two subcomponents, one anterior peaking at 38 Hz and one posterior peaking at 35 Hz. The low-frequency P1 rises at occipital electrodes at the same latency, and reaches its maximum at 108 msec. Although the P1 and the 35-Hz component both have a posterior maximum, there is a larger intersubject topographical variability of the 35-Hz component. The low-frequency P1, N1, and P2 components do not vary with stimulus type.
Figure 8. Summary of results. High-frequency components are described on top of the figure, low-frequency components on the bottom. Latencies at which significant effects occur are shaded. See the end of Results for details.
Around 190 msec, a decrease in the TF energy averaged across single trials compared with prestimulus level can be observed at all electrodes but tends to have a posterior maximum. It does not vary with stimulus type. It is followed by a second, non-phase-locked increase in γ-band activity, peaking at 281 msec. This second peak of high-frequency activity is higher in condition 2 than in condition 1 at lower frequencies. Its energy is larger and its peak frequency lower in response to a target stimulus compared with a nontarget one.
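The distinction between phase-locked and non-phase-locked activity can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not the analysis pipeline used here: the sampling rate, the 40 Hz burst shape, the noise level, the trial count, and the Morlet wavelet with a frequency-to-bandwidth ratio of 7 are all hypothetical choices. It compares the energy of the averaged trial with the energy averaged across single trials, once for bursts at a fixed latency and once for bursts whose latency jitters by up to one oscillation cycle around 280 msec.

```python
import cmath
import math
import random

FS = 1000.0       # sampling rate in Hz (assumed)
N_SAMPLES = 600   # 0-600 msec epoch (assumed)
FREQ = 40.0       # oscillation frequency in Hz (illustrative)
LATENCY = 0.280   # mean burst latency in s


def make_wavelet(freq, t_center, ratio=7.0):
    """Complex Morlet wavelet on the epoch; ratio = freq / sigma_f (assumed)."""
    sigma_t = ratio / (2.0 * math.pi * freq)
    wavelet = []
    for i in range(N_SAMPLES):
        t = i / FS - t_center
        wavelet.append(math.exp(-t * t / (2.0 * sigma_t ** 2))
                       * cmath.exp(-2j * math.pi * freq * t))
    return wavelet


def energy(signal, wavelet):
    """Squared modulus of the projection of the signal onto the wavelet."""
    projection = sum(x * w for x, w in zip(signal, wavelet)) / FS
    return abs(projection) ** 2


def simulate_trials(n_trials, jitter, rng, noise_sd=0.5):
    """Gaussian-windowed bursts whose latency varies uniformly by +/- jitter."""
    trials = []
    for _ in range(n_trials):
        onset = LATENCY + rng.uniform(-jitter, jitter)
        trial = []
        for i in range(N_SAMPLES):
            t = i / FS - onset
            burst = (math.exp(-t * t / (2.0 * 0.030 ** 2))
                     * math.cos(2.0 * math.pi * FREQ * t))
            trial.append(burst + rng.gauss(0.0, noise_sd))
        trials.append(trial)
    return trials


def evoked_and_total(trials, wavelet):
    """Return (energy of the averaged trial, average of single-trial energies)."""
    n = len(trials)
    average = [sum(column) / n for column in zip(*trials)]
    evoked = energy(average, wavelet)
    total = sum(energy(trial, wavelet) for trial in trials) / n
    return evoked, total


rng = random.Random(0)
wavelet = make_wavelet(FREQ, LATENCY)
locked_evoked, locked_total = evoked_and_total(
    simulate_trials(200, 0.0, rng), wavelet)
jittered_evoked, jittered_total = evoked_and_total(
    simulate_trials(200, 0.0125, rng), wavelet)
print("fixed latency,    evoked/total: %.3f" % (locked_evoked / locked_total))
print("jittered latency, evoked/total: %.3f" % (jittered_evoked / jittered_total))
```

With a fixed latency, nearly all of the single-trial energy survives averaging. With a jitter of one cycle (±12.5 msec at 40 Hz), the phases at a given latency are spread uniformly, so the bursts cancel in the average while the energy averaged across single trials is almost unchanged.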
A focal low-frequency component peaking at 292 msec at posterior electrodes is enhanced in response to meaningful stimuli (target stimuli as well as the perceived dog). In the same latency range, the low-frequency evoked potentials are less positive in condition 2 than in condition 1, at all electrodes (long-lasting effect, 250–400 msec, peaking at ∼340 msec).
DISCUSSION
Phase-locked high-frequency activity
The first component of the high-frequency response at 95 msec is phase-locked to stimulus onset. Similar phase-locked, early γ-band activities have already been described in response to visual stimuli in humans (Jokeit et al., 1994) and found to be insensitive to stimulus type (Tallon et al., 1995; Tallon-Baudry et al., 1996). The phase-locked γ-band response can be subdivided into two components, one central (38 Hz) and one occipital (35 Hz). This finding suggests the existence of two distinct groups of oscillating structures in the same latency range. A possible structure underlying the component peaking at Cz–C4 is the lateral geniculate nucleus; γ-band oscillatory activity has repeatedly been observed in this structure (Ghose and Freeman, 1992; Nunez et al., 1992; Funke and Wörgötter, 1995; Guido and Weyand, 1995; Wörgötter and Funke, 1995; Neuenschwander and Singer, 1996; Sherman, 1996). Nevertheless, reports on the phase-locked or non-phase-locked nature of these oscillations in cat in response to flashed dots are contradictory, one study reporting a “strongly stimulus-locked firing pattern” (Wörgötter and Funke, 1995) and another a non-phase-locked oscillatory activity (Neuenschwander and Singer, 1996). Other possible thalamic candidates are the intralaminar nucleus (Steriade et al., 1993) and the pulvinar (Shumikhina and Molotchnikoff, 1995). The component appearing maximal at occipital electrodes may reflect activity in the visual cortex; Maunsell and Gibson (1992) reported the existence of an early phase-locked 30–60 Hz oscillatory event in the striate cortex of the macaque monkey. In any case, the neurons engaged in the two oscillatory activities (central and occipital) are at least partially distinct from those underlying the low-frequency P1 because high- and low-frequency potentials may have different topographies in some subjects.
An early, phase-locked 40-Hz activity can also be observed in the auditory averaged evoked potential (Galambos et al., 1981) and was shown to originate in the auditory cortex (Pantev et al., 1991). Other studies suggest instead that it corresponds to the activation of thalamocortical loops (Ribary et al., 1991; Llinas and Ribary, 1993). It has been claimed to be modified by attention (Tiitinen et al., 1993) but not by physical stimulus features (Tiitinen et al., 1994) and was proposed to reflect temporal binding (Joliot et al., 1994). Still, a difference between the auditory and visual modalities is that the auditory phase-locked 40-Hz response and low-frequency potentials have similar topographies (Bertrand and Pantev, 1994).
Non-phase-locked high-frequency activity
The decrease of TF energy observed at 32 Hz and 190 msec does not vary with stimulus type. Such a stimulus type-insensitive decrease was observed previously in humans in the visual (Tallon-Baudry et al., 1996) and auditory stimulus-induced response (Bertrand et al., 1996). It also resembles the depression seen in the driven 40-Hz auditory steady-state response (Makeig and Galambos, 1989). In animals, a similar phenomenon of suppression of γ oscillations was observed by Eckhorn et al. (1992) in the cat visual cortex, and by MacDonald and Barth (1995) after an auditory stimulation in rat. The functional significance of these suppressions remains unclear.
The latency, duration, and topography of the second peak of γ-band activity are very similar to those of the non-phase-locked, high-frequency component we recorded previously from the human scalp (Tallon-Baudry et al., 1996). This component is non-phase-locked to stimulus onset, as are the oscillatory events observed in cat (Eckhorn et al., 1988; Gray and Singer, 1989; Gray et al., 1989, 1990, 1992; Brosch et al., 1995; Freiwald et al., 1995) and monkey (Eckhorn et al., 1993; Kreiter and Singer, 1996). The oscillatory events recorded here last between 100 and 150 msec, in the same range as γ-band activity observed in the visual cortex of cat (Gray et al., 1992) and monkey (Freeman and van Dijk, 1987; Kreiter and Singer, 1992). Nevertheless, we do not know which structures generate the activity recorded on the scalp. Because its topography is widespread, it may correspond to deep and/or distributed structures. The visual cortex (striate and extrastriate) is a likely candidate, but the hippocampus (Leung, 1992; Bragin et al., 1995) or the cingulate cortex (Leung and Borst, 1987) might also be involved.
The non-phase-locked oscillatory activity at 280 msec is strongly enhanced in condition 2, which is characterized by a search for the Dalmatian with its head turned leftward. This Dalmatian does not pop out of the picture but can be perceived after training; its detection probably involves top-down mechanisms, or, in other words, a representation of what is searched for. The strong γ-band activity in condition 2 might thus reflect the activation of a representation of the target dog. This interpretation does not rule out a role of oscillatory activity in binding elementary features; γ-band synchronization could be a means to generate an object representation, either by grouping physical features and building up a neural assembly (bottom-up process), as suggested by a previous experiment (Tallon-Baudry et al., 1996), or by activating the assembly corresponding to the attended object (top-down process) (Milner, 1974). This convergence of bottom-up and top-down mechanisms was theoretically predicted by Singer (1994), who states that “shifting attention by top-down processes would be equivalent with biasing synchronization probability of neurons at lower levels by feed-back connections from higher levels. These top-down influences could favor the emergence of coherent states in selected subpopulations of neurons—the neurons that respond to contours of an ‘attended’ object or pattern.”
The non-phase-locked γ-band activity is also larger, with a lower peak frequency, in response to both target stimuli. In condition 1, the increase of γ-band activity in response to the target stimulus (twirled blobs) most likely corresponds to a grouping mechanism because the twirl pops out and is thus probably perceived mainly through bottom-up processes. In the second condition, the γ-band activity reflects the convergence of top-down and bottom-up processes required to perceive the target dog. It should be noted that top-down mechanisms are probably also involved in the first condition but to a lesser extent than in the second condition.
A puzzling question is why we do not observe larger energy and lower frequency oscillatory activity in response to the perceived dog compared with the neutral stimulus. Trends toward this energy increase and frequency decrease are observed (Figs. 2, 6) but are far from reaching significance because intersubject variability is quite strong. This may be related to the fact that we checked the correct perception of only the target dog, by asking subjects to report the number of its occurrences. Some subjects may have perceived the target dog more consistently than the nontarget one because the task could indeed be performed correctly without recognizing the nontarget dog at each of its occurrences.
Low-frequency evoked potentials
The low-frequency N2 is enhanced in response to both the target stimuli and the perceived dog. A similar enhancement of the posterior N2 in response to target stimuli was described previously (Czigler and Csibra, 1990; Heinze and Münte, 1993; Luck and Hillyard, 1994a). Extensive testing of this component by Luck and Hillyard (1994b) showed that it can also be observed for stimuli resembling targets, and that it disappears when competing information is suppressed; this component seems to be related to spatial filtering of irrelevant information, when the stimulus is identified as a possible target. In the present experiment, the N2 may reflect the spatial filtering of the blobs surrounding the meaningful part of the picture.
The N2 enhancement is followed by less positive potentials in condition 2. This can be attributed to the superimposition of a slow negative wave, as observed by Begleiter et al. (1993) in the 180–800 msec range in a visual delayed matching to sample task in response to the second stimulus, and by Stuss et al. (1992) in the 250–550 msec range in a naming task of incomplete pictures; the more incomplete the image, the larger the negativity. In both of these tasks, stimulus identification implies some comparison with a picture already stored in memory. In the second condition, the slow negativity could reflect the late part of the mechanism of comparison between an internal representation of the Dalmatian and the occurring stimulus. Nevertheless, this negativity does not vary with stimulus type but only with the condition; it could also reflect a nonspecific parameter, like the difficulty of the task. This component has a rather distributed topography, affecting all of the electrodes as the non-phase-locked, high-frequency component does. Nevertheless, high- and low-frequency potentials may reflect different types of neuronal processes; it has been shown that the depth cortical profile of 30–40 Hz spontaneous rhythms in cat does not reverse, whereas slow cortical waves show a polarity reversal across cortical layers (Steriade et al., 1996).
The neural mechanisms involved in both tasks could be tentatively summarized as follows:
(1) The 280 msec, non-phase-locked, high-frequency activity may reflect two processes: the activation of an assembly coding for a meaningful object (bottom-up binding process), and the activation of an assembly coding for the attended object (top-down process related to selective attention in condition 2).
(2) The low-frequency negativity at 292 msec may correspond to the spatial filtering of surrounding blobs when an object has been identified in the picture.
(3) The slow wave peaking at 340 msec may reflect further matching of the occurring stimulus with the attended object, when top-down processes are involved.
Footnotes
This work was supported by Human Frontier and Science Program Grant RG-20/95B and by French Ministry of Research Grant ACC-SV12 (functional brain imaging). We thank J. F. Echallier, P. Bouchet, and P. E. Aguera for helpful technical assistance.
Correspondence should be addressed to Dr. Catherine Tallon-Baudry, Brain Signals and Processes Laboratory, INSERM U280, 151 Cours Albert Thomas, F-69424 Lyon Cedex 03, France.