In noisy environments, we use auditory selective attention to actively ignore distracting sounds and select relevant information, as during a cocktail party to follow one particular conversation. The present electrophysiological study aims at deciphering the spatiotemporal organization of the effect of selective attention on the representation of concurrent sounds in the human auditory cortex. Sound onset asynchrony was manipulated to induce the segregation of two concurrent auditory streams. Each stream consisted of amplitude modulated tones at different carrier and modulation frequencies. Electrophysiological recordings were performed in epileptic patients with pharmacologically resistant partial epilepsy, implanted with depth electrodes in the temporal cortex. Patients were presented with the stimuli while they either performed an auditory distracting task or actively selected one of the two concurrent streams. Selective attention was found to affect steady-state responses in the primary auditory cortex, and transient and sustained evoked responses in secondary auditory areas. The results provide new insights on the neural mechanisms of auditory selective attention: stream selection during sound rivalry would be facilitated not only by enhancing the neural representation of relevant sounds, but also by reducing the representation of irrelevant information in the auditory cortex. Finally, they suggest a specialization of the left hemisphere in the attentional selection of fine-grained acoustic information.
In ecological situations, we are often confronted with a mixture of sounds and it is crucial to be able to select relevant information and resist auditory distracters for further cognitive processing and adapted behavioral response. The electrophysiological mechanisms of auditory selective attention have been extensively investigated through selective dichotic paradigms (for review, see Giard et al., 2000). Selective attention has been shown to modulate the processing of both relevant (Hillyard et al., 1973; Woldorff and Hillyard, 1991) and irrelevant stimuli (Donald, 1987; Michie et al., 1993), with effects at multiple levels of sensory analysis including the auditory cortex (Pugh et al., 1996; Jancke et al., 1999), the brainstem (Lukas, 1980, 1981) and down to the cochlea (Giard et al., 1994).
In a real cocktail party situation, however, the sounds do not reach the ears separately, as they do in the dichotic paradigms classically used in previous studies of selective auditory attention. To our knowledge, no electrophysiological study has investigated the influence of active selection during the perception of overlapping binaural streams. The main reason is the difficulty of dissociating the neural activity specifically corresponding to either sound stream. One way to solve this issue is to use long-duration sounds at different amplitude modulation frequencies. Indeed, each of these sounds elicits an evoked electrophysiological activity, named the steady-state response (SSR), whose defining property is that it occurs at the same frequency as the amplitude modulation of the sound.
The present electrophysiological study aims at deciphering the spatiotemporal organization of the effect of selective attention on the representation of concurrent sounds in the human auditory cortex. Two concurrent streams at different carrier frequencies and amplitude modulation frequencies (21 and 29 Hz) (see Fig. 1) were used. The perceptual segregation of these two simultaneous streams was induced by sound onset asynchrony (Bregman, 1990; Darwin et al., 1995; Turgeon et al., 2002): the 21 Hz stream always started before the 29 Hz stream. Intracranial electrophysiological (EEG) recordings were performed in epileptic patients with pharmacologically resistant epilepsy, implanted with multicontact depth electrodes in the temporal cortex. Patients were required either to focus their attention away from the streams [control (C) condition] or to actively select the 21 Hz stream (AS21 condition) or the 29 Hz stream (AS29 condition).
This paradigm therefore allowed us to compare the electrophysiological responses to acoustically identical stimuli in three different attentional contexts and to characterize the effect of selective attention before and during the sound rivalry. First, we analyzed the transient and sustained evoked responses that are well known to be modulated by attention (for review, see Picton et al., 1978; Giard et al., 2000). Second, thanks to the distinct amplitude modulation frequencies of the two simultaneous streams, the steady-state responses were used for tagging the electrophysiological activity corresponding to each stream independently. This approach thus allowed us to describe the multiple neural mechanisms of selective attention that may operate at distinct processing levels and in different areas of the auditory cortex (Heschl's gyrus, planum temporale, and planum polare).
Materials and Methods
We recorded the data from 12 patients (eight female and four male, age ranging from 21 to 48 years) suffering from pharmacologically resistant partial epilepsy and candidates for surgery. Because the location of the epileptic focus could not be identified using noninvasive methods, they were stereotactically implanted with multicontact depth probes. Electrophysiological recording is part of the brain functional evaluation that is performed routinely before epilepsy surgery in these patients. According to the French regulations concerning invasive investigations with a direct individual benefit, patients were fully informed about the electrode implantation, stereotactic EEG and evoked-potential recordings, and the cortical-stimulation procedures used to localize the epileptogenic and functional brain areas. All patients gave their informed consent to participate in the experiment. The signals described here were recorded away from the seizure focus. Several days before EEG recordings, antiepileptic drugs administered to the patients had been either discontinued or drastically reduced. No patient was administered benzodiazepines. None of the patients reported any auditory complaint.
Stimuli and task.
Stimuli were composed of two acoustic streams (Fig. 1): a 21 Hz stream and a 29 Hz stream. The 21 Hz stream was composed of two amplitude-modulated tones separated by two octaves, the carrier frequency of the lower one being equiprobably chosen between 659, 698, 740, or 784 Hz at each trial. These two tones were both amplitude modulated in phase at a frequency of 21 Hz. The 29 Hz stream consisted of one tone separated by one octave from each tone of the first stream. This tone was amplitude modulated at a frequency of 29 Hz. The 21 Hz stream always started first (stim-part1; lasting 0.810 or 1 s for the first patient). Then, the 29 Hz stream started and the duration of the sound rivalry (stim-part2) was equiprobably chosen between 0.810, 0.905, 1, or 1.095 s (1.095, 1.190, 1.286, or 1.381 s for the first patient) at each trial. All component onsets and offsets were linearly ramped during 10 ms. Because the two streams started at different instants (onset asynchrony), auditory stream segregation was induced and two distinct streams were perceived.
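The stimulus construction described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the sampling rate, the sinusoidal shape and 100% depth of the modulation envelope, and the function names `am_tone` and `make_stimulus` are not given in the text, and the intensity correction and matching of the streams are not modeled.

```python
import math

FS = 16000  # assumed audio sampling rate (Hz); not specified in the text


def am_tone(carrier_hz, mod_hz, n_samples, fs=FS, ramp_s=0.010):
    """Amplitude-modulated tone with 10 ms linear onset and offset ramps."""
    n_ramp = round(ramp_s * fs)
    samples = []
    for i in range(n_samples):
        t = i / fs
        # Assumption: sinusoidal envelope with 100% depth, values in [0, 1].
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * mod_hz * t))
        # Linear ramp up at onset and down at offset, flat (1.0) in between.
        ramp = min(1.0, i / n_ramp, (n_samples - 1 - i) / n_ramp)
        samples.append(ramp * envelope * math.sin(2.0 * math.pi * carrier_hz * t))
    return samples


def make_stimulus(low_carrier_hz=659.0, part1_s=0.810, part2_s=0.810, fs=FS):
    """Mix the 21 Hz stream (two tones) with the delayed 29 Hz stream (one tone)."""
    n1 = round(part1_s * fs)  # stim-part1: 21 Hz stream alone
    n2 = round(part2_s * fs)  # stim-part2: both streams (sound rivalry)
    # 21 Hz stream: two tones two octaves apart, modulated in phase.
    low = am_tone(low_carrier_hz, 21.0, n1 + n2, fs)
    high = am_tone(4.0 * low_carrier_hz, 21.0, n1 + n2, fs)
    # 29 Hz stream: one tone, one octave away from each tone of the first stream.
    middle = am_tone(2.0 * low_carrier_hz, 29.0, n2, fs)
    mix = [a + b for a, b in zip(low, high)]
    for i, value in enumerate(middle):
        mix[n1 + i] += value  # onset asynchrony: starts after stim-part1
    return mix
```

Calling `make_stimulus(low_carrier_hz=698.0, part2_s=0.905)` would give one of the other carrier and duration combinations listed above.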
There were three attention conditions in separate blocks. In the first one (control condition), the patients had to detect rare noise bursts (targets) superimposed on the stimuli and, thus, orient their attention away from the stimulus content. They were instructed to answer as soon as they heard a noise burst, by pressing a button. The superimposed target sounds were 150 ms bandpass-filtered noise bursts (20 semitones wide, starting at 784 Hz with 10 ms rise/fall times). The targets were delivered in 15% of the stimuli and randomly occurred during the stimulus, 0.2, 0.5, 1.2, or 1.5 s before the end of the stimulus. When a target was present in a trial, the next stimulus started between 0.7 and 0.1 s after the patient's response, otherwise the intertrial interval was randomized between 0.9 and 1.4 s.
In the other two conditions, the stimuli included an additional, third part. During this last part, one or both streams changed in spatial direction. Patients were instructed to attend to a given stream (the 21 or 29 Hz stream) and to indicate its final direction (left or right) with a two-direction joystick. Thus, these tasks corresponded to the attentional selection of one of the two streams (attend to the 21 Hz stream, AS21 condition; attend to the 29 Hz stream, AS29 condition). Interaural intensity differences (IIDs) were used to give a realistic impression of spatially moving streams during the last part of the stimulus. This part lasted 0.4 and 0.7 s for AS21 and AS29 conditions, respectively. Thus, in total, the stimuli lasted between 1.620 and 1.905 s in the C condition, between 2.020 and 2.305 s in the AS21 condition, and between 2.320 and 2.605 s in the AS29 condition. In these two conditions, in 50% of the trials, only the attended stream changed in spatial direction (25% to the left and 25% to the right), and in the other 50%, both streams changed in opposite directions (in 25%, the attended stream changed to the left and the unattended to the right, and in 25% the attended stream changed to the right and the unattended to the left). The four categories of stimuli were randomly presented. The level of difficulty was adjusted in each patient by choosing different values of IIDs generating different ranges of spatial motion. In most of the cases, the easiest level (for which the motion appears to end in one ear only) was chosen to obtain a good rate of correct responses. The next trial started between 1.5 and 1.7 s after the patient's response. These tasks were quite difficult. Although all the patients could perform the AS21 task, only two patients (patients 4 and 10) could correctly perform the AS29 task.
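The total stimulus durations quoted above follow directly from the durations of the three stimulus parts. As a quick check, the arithmetic can be written out as below, working in integer milliseconds and using the values that apply to all patients except the first; the names are illustrative only.

```python
PART1_MS = 810                              # stim-part1 (21 Hz stream alone)
PART2_MS = (810, 905, 1000, 1095)           # stim-part2 (sound rivalry)
PART3_MS = {"C": 0, "AS21": 400, "AS29": 700}  # spatial-motion part, if any


def duration_range_ms(condition):
    """Shortest and longest total stimulus duration for one condition."""
    totals = [PART1_MS + part2 + PART3_MS[condition] for part2 in PART2_MS]
    return min(totals), max(totals)


for name in ("C", "AS21", "AS29"):
    print(name, duration_range_ms(name))
```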
The intensities of all tones were corrected according to their carrier frequency [see Botte et al. (1989), their Fig. 1.2] and then 21 and 29 Hz streams were matched in intensity. Stimuli were delivered at an intensity level judged comfortable by the patient before the beginning of the experiment. Noise bursts were 5 dB above the stimulus intensity level.
Stimuli were presented to patients in two blocks of 80 trials each for the C task and in four blocks of 40 trials for the AS21 and AS29 tasks (resulting in 160 repetitions in each condition). Stimulus duration, carrier frequency, and noise burst occurrence or final spatial direction were randomized to limit habituation and predictability.
EEG recording and signal analysis.
Intracranial recordings were performed at the Functional Neurology and Epilepsy Department (Lyon Neurological Hospital, Lyon, France). EEG recordings were made from 64 or 128 intracranial electrode contacts referenced to an intracranial contact away from the superior temporal cortex. The ground electrode was at the forehead. Signals were amplified, filtered (0.1–200 Hz bandwidth), and sampled at 1000 Hz (Synamps; Neuroscan Labs, Sterling, VA) for the first six patients and were amplified, filtered (0.1–200 Hz bandwidth), and sampled at 512 Hz (Brain Quick System Plus; Micromed, Treviso, Italy) for the next six patients.
The analysis was restricted to the electrodes located in the temporal cortex and its immediate vicinity. Raw data were visually inspected and trials showing epileptic spikes were discarded. Because the superior temporal cortex was subsequently found to be the location of their epileptic focus, two patients were excluded from all analysis. The evoked sustained and transient responses from patient 3 were also excluded from analysis because of excessive epileptic spikes, but the data were kept for the SSR analysis because their frequency bands were less contaminated by the spectral content of the epileptic spikes. In the AS29 condition, only the data from two patients (4 and 10) who could perform the task were kept for analysis. In the C condition, trials that included a target occurring before 1.620 s after stimulus onset (∼12% of all trials) were not analyzed. In all conditions, trials with a motor response occurring before 1.620 s, with a false alarm, or with an incorrect response were rejected from additional analysis. The mean numbers of correct and nonartifacted trials were 142, 136, and 123 of 160 in the C, AS21, and AS29 conditions, respectively. The EEG of four patients (6, 7, 8, and 9) were notch-filtered at 50 Hz because of excessive power line artifact.
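The trial rejection rules above amount to a simple filter on single trials. The sketch below is one hypothetical way to express them; the `Trial` record and its field names are assumptions made for illustration and do not describe the software actually used.

```python
from dataclasses import dataclass
from typing import Optional

CUTOFF_S = 1.620  # shortest total stimulus duration in the C condition


@dataclass
class Trial:
    target_time_s: Optional[float]    # noise-burst time from stimulus onset, None if absent
    response_time_s: Optional[float]  # motor response time from stimulus onset, None if absent
    false_alarm: bool
    correct: bool
    has_spike: bool                   # epileptic spike seen on visual inspection


def keep_trial(trial):
    """Return True if the trial passes all rejection criteria."""
    if trial.has_spike:
        return False
    if trial.target_time_s is not None and trial.target_time_s < CUTOFF_S:
        return False
    if trial.response_time_s is not None and trial.response_time_s < CUTOFF_S:
        return False
    if trial.false_alarm or not trial.correct:
        return False
    return True
```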
Evoked responses were averaged over a 2 s period and corrected with respect to the same baseline defined between −150 and 0 ms before stim-part1 onset.
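As an illustration of this step, averaging and baseline correction could be sketched as follows. The epoch layout (a list of equal-length single trials starting at `epoch_start_s` relative to stim-part1 onset) and the function name are assumptions.

```python
def evoked_response(trials, fs, epoch_start_s, baseline=(-0.150, 0.0)):
    """Average single trials, then subtract the mean of the baseline window.

    trials: list of equal-length lists of samples for one electrode contact.
    fs: sampling rate (Hz). epoch_start_s: time of the first sample (s),
    relative to stim-part1 onset (negative if the epoch includes prestimulus data).
    """
    n_samples = len(trials[0])
    mean = [sum(trial[i] for trial in trials) / len(trials) for i in range(n_samples)]
    # Convert the baseline limits (in seconds) to sample indices.
    i0 = round((baseline[0] - epoch_start_s) * fs)
    i1 = round((baseline[1] - epoch_start_s) * fs)
    base = sum(mean[i0:i1]) / (i1 - i0)
    return [value - base for value in mean]
```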
The periodic SSR, evoked by the amplitude modulation of the streams at 21 or 29 Hz was analyzed by means of a wavelet decomposition, which provides a good compromise between time and frequency resolutions. Each single-trial signal was transformed in the time-frequency (TF) domain by convolution with complex Gaussian Morlet's wavelets with a ratio f/σf of 10, with f being the central frequency of the wavelet and σf its SD (for details, see Tallon-Baudry and Bertrand, 1999). A baseline correction was applied on TF plots by subtracting, in each frequency band, the prestimulus power between −250 and −150 ms before stimulus onset. From this TF analysis, time profiles of the 21 Hz and 29 Hz SSR could be computed and were used for statistical analysis.
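A minimal sketch of such a single-trial decomposition is given below for one frequency. It assumes the usual complex Morlet form, a Gaussian envelope with temporal SD σt = 1/(2πσf) multiplied by a complex exponential at f, with σf = f/10 as stated above. The amplitude normalization, the truncation of the wavelet at ±4 σt, the zero padding at the epoch edges, and the function names are assumptions and may differ from the implementation actually used.

```python
import cmath
import math


def morlet_power(signal, freq_hz, fs, ratio=10.0, n_sd=4.0):
    """Time course of power at freq_hz from convolution with a complex Morlet wavelet."""
    sigma_f = freq_hz / ratio                   # spectral SD, from f / sigma_f = 10
    sigma_t = 1.0 / (2.0 * math.pi * sigma_f)   # temporal SD
    norm = (sigma_t * math.sqrt(math.pi)) ** -0.5  # assumed normalization factor
    half = int(n_sd * sigma_t * fs)             # wavelet truncated at +/- n_sd SDs
    wavelet = [
        norm
        * math.exp(-((k / fs) ** 2) / (2.0 * sigma_t ** 2))
        * cmath.exp(2j * math.pi * freq_hz * k / fs)
        for k in range(-half, half + 1)
    ]
    n = len(signal)
    power = []
    for i in range(n):
        acc = 0j
        for j, w in enumerate(wavelet):
            idx = i - (j - half)                # convolution: sum_k w[k] * x[i - k]
            if 0 <= idx < n:                    # zero padding outside the epoch
                acc += w * signal[idx]
        power.append(abs(acc / fs) ** 2)
    return power


def baseline_correct(power, fs, epoch_start_s, window=(-0.250, -0.150)):
    """Subtract the mean prestimulus power computed over the given window."""
    i0 = round((window[0] - epoch_start_s) * fs)
    i1 = round((window[1] - epoch_start_s) * fs)
    base = sum(power[i0:i1]) / (i1 - i0)
    return [p - base for p in power]
```

Averaging the output of `morlet_power` at 21 and 29 Hz across single trials, and then applying `baseline_correct`, would give time profiles of the kind used for the statistical analysis.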
Statistical analysis in the three conditions focused on several electrophysiological components: the transient responses evoked by the onsets of stim-part1 and stim-part2, the sustained responses elicited during stim-part1, and the SSR at 21 and 29 Hz during both parts of the stimulus (restricted to the shortest duration of stim-part1 and stim-part2, i.e., 0.810 s for each part). Because the data were not normally distributed, only nonparametric tests (Wilcoxon or Mann–Whitney) were used.
Statistical analysis of the sustained responses was performed on the 0.2–15 Hz bandpass-filtered EEG to remove the SSR. Because the transient responses to stim-part2 onset could be direct-current shifted by the ongoing sustained response (of stim-part1), statistical analysis of the transient evoked responses was performed on the 2–150 Hz bandpass-filtered EEG. To identify in each patient the electrode contacts where a transient or sustained response was emerging, a time-varying Wilcoxon test was computed from the single trials. It was applied to the mean amplitude of successive 20 ms time windows between 0 and 300 ms (transient responses) and successive 100 ms windows between 300 and 800 ms (sustained responses) after the onset of each stimulus part, compared with a prestimulus baseline (defined between −100 and 0 ms before stim-part1 onset). For contacts that showed a significant emerging response, differences between conditions were estimated by Mann–Whitney or Kruskal–Wallis tests applied to the same windows as described for the response identification test.
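The windowing scheme described above can be sketched as follows; the statistical test itself is not reproduced here. The function names and the single-trial data layout are assumptions.

```python
def analysis_windows_ms():
    """Successive analysis windows (ms) after the onset of a stimulus part."""
    transient = [(start, start + 20) for start in range(0, 300, 20)]     # 0-300 ms
    sustained = [(start, start + 100) for start in range(300, 800, 100)]  # 300-800 ms
    return transient, sustained


def window_means(trial, fs, onset_index, windows_ms):
    """Mean amplitude of one single trial in each window.

    trial: list of samples; fs: sampling rate (Hz);
    onset_index: sample index of the onset of the stimulus part.
    """
    means = []
    for start_ms, stop_ms in windows_ms:
        i0 = onset_index + round(start_ms * fs / 1000)
        i1 = onset_index + round(stop_ms * fs / 1000)
        segment = trial[i0:i1]
        means.append(sum(segment) / len(segment))
    return means
```

The per-trial window means, paired with the per-trial baseline means, could then be passed to a signed-rank routine (for example `scipy.stats.wilcoxon`, if SciPy is available) to test the emergence of a response in each window.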
The emergence of SSR was assessed by Wilcoxon tests comparing the mean power of each frequency (21 and 29 Hz) over successive 100 ms windows between 0 and 800 ms after the onset of each stimulus part to the prestimulus baseline power (between −250 and −150 ms before stim-part1 onset) of the respective frequency (note that the prestimulus baseline is shifted away from 0 ms because wavelet analysis tends to stretch out the early poststimulus low frequency components). For contacts that showed a significant SSR, differences between conditions were estimated by Mann–Whitney or Kruskal–Wallis tests applied to the same windows as described for the response-identification test.
For each EEG component, a first statistical analysis in each of the 10 patients was performed by directly comparing the C and AS21 conditions using Mann–Whitney tests. In patients 4 and 10, we further compared the three conditions using a global Kruskal–Wallis test followed by two-by-two Mann–Whitney post hoc tests.
Only effects that met Bonferroni-corrected p value criteria are discussed. In each patient, the probability threshold of 0.05 was thus divided by the number of tests performed (i.e., the number of time windows tested over all electrode contacts investigated).
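The correction is a simple division, sketched below. The numbers of windows and contacts used in the example call are purely hypothetical and are not taken from any patient.

```python
def bonferroni_threshold(n_windows, n_contacts, alpha=0.05):
    """Per-test p value threshold after Bonferroni correction."""
    return alpha / (n_windows * n_contacts)


def is_significant(p_values, n_windows, n_contacts, alpha=0.05):
    """Flag each p value that survives the corrected threshold."""
    threshold = bonferroni_threshold(n_windows, n_contacts, alpha)
    return [p < threshold for p in p_values]


# Hypothetical example: 20 time windows tested at each of 50 contacts.
print(bonferroni_threshold(20, 50))
```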
All signal analyses were performed using the ELAN-Pack software developed at Institut National de la Santé et de la Recherche Médicale, Unité 821.
Electrode implantation, anatomical registration, and normalization.
Electrode contacts were 2 mm long and spaced every 3.5 mm (center to center). Depth probes (diameter, 0.8 mm) with 10 or 15 contacts each were inserted perpendicularly to the sagittal plane using Talairach's stereotactic grid (Talairach and Tournoux, 1988). Numbering of contacts is increasing from medial to lateral along an electrode track. Electrode locations were measured on x-ray images obtained in the stereotactic frame. The depth of penetration of each contact was measured on the frontal x-ray image from the tip of the electrode to the midline, which was visualized angiographically by the sagittal sinus. The coregistration of the lateral x-ray image and a midsagittal magnetic resonance imaging (MRI) scan, both having the same scale of 1, allowed us to measure the electrode coordinates in the individual Talairach's space defined by the median sagittal plane, the anterior commissure–posterior commissure (AC–PC) horizontal plane, and the vertical AC frontal plane, these anatomical landmarks being identified on the three-dimensional (3D) MRI scans. With this procedure, we could superpose each electrode contact onto the patients' structural MRI scans. The accuracy of the registration procedure was 2 mm, as estimated on another patient's MR images obtained just after electrode explantation and in which electrode tracks were still visible.
Four patients were implanted in the right hemisphere only, two in the left only and four in both hemispheres. In all implanted hemispheres, at least one electrode track was located in the superior temporal cortex. Electrodes H and H′ (prime denoting the left hemisphere) were positioned posteriorly, passing through Heschl's gyrus (HG), the planum temporale (PT), and the superior temporal gyrus (STG), and electrodes T, T′, and W were positioned anteriorly, passing through the HG and the anterior PT or the planum polare (PP). Electrodes A, A′, B, B′, and C were penetrating through the middle temporal gyrus (MTG). Electrodes N and N′ were located just above the superior temporal plane, in the parietal operculum. Although intracranial recordings in epileptic patients provide a sparse spatial sampling of the auditory cortex, we could access not only primary auditory areas (posteromedial part of the HG), but also posterior and anterior secondary auditory regions (PT and PP).
The electrode coordinates of each patient were converted from the individual Talairach's space to the normalized Talairach's space (Talairach and Tournoux, 1988), and then to the Talairach's space of the Montreal Neurological Institute (MNI) standard brain. Eventually, electrode contacts and experimental effects of all patients were plotted on a 3D rendering of the temporal cortices of the MNI standard brain (cortical surface segmentation by FreeSurfer software, http://surfer.nmr.mgh.harvard.edu). This procedure facilitated the comparison across patients of the activated sites that could be positioned with respect to the main superior temporal structures that were delineated on the standard brain (Fig. 2).
Results
The control task presented no major difficulty, but was demanding enough to keep the patients alert (mean, 99.3% of correct responses and 0.2% of false alarms). The stream-selection attention tasks (AS21 and AS29) were more difficult. Although all the patients could perform the AS21 task with at least 85% of correct responses (mean, 97.2%), only two patients could reach the same percentage of correct responses in the AS29 task (patient 4, 85%; patient 10, 88.1% of correct responses).
The difference of performance between the two stream-selection tasks could be attributable to the asymmetric construction of the stimuli. Indeed, the 21 Hz stream could be easier to select because it contained the lowest and highest frequencies of the signal and not the middle one, because it started first, and because it contained two frequencies and not only one.
This difference of performance probably reflects a greater attentional load in the AS29 than in the AS21 condition. This difference is likely to favor the detection of attentional effects on the electrophysiological responses in the AS29 condition.
In the following, for each EEG component, the results of the statistical analysis comparing the C and AS21 conditions in 10 patients (for steady-state responses) or 9 patients (for transient and sustained responses) will be presented first, followed by the results of a second analysis considering the three conditions in patients 4 and 10.
Distinct electrophysiological components were identified in different sites of the auditory cortex. Figure 3 illustrates these components on the 3D rendering of patient 10's right temporal cortex. The evoked responses are shown after single trial averaging. The transient and sustained responses (Fig. 3 A,B) could be dissociated by digitally filtering the EEG signal using two different bandwidths, 2–150 and 0.2–15 Hz, respectively. The periodic SSRs clearly visible on the (unfiltered or transient) evoked responses were further analyzed using a time-frequency decomposition of the electrophysiological signals (time-frequency power averaged after a wavelet-based decomposition of each single trial). The time profiles of the SSR power could then be constructed at 21 and 29 Hz from the time-frequency plots (Fig. 3 C).
All of these components were present during both stim-part1 and stim-part2. Effects of attention on these electrophysiological components could thus be investigated during both stimulus parts, because the inputs were acoustically identical and only the focus of attention was changing. These attentional effects will be presented, for the group of patients, on a schematic view of the superior temporal cortex constructed from the MNI standard brain (Fig. 2). For the sake of clarity, we only reported, in Figures 4–8, electrodes passing through the superior temporal cortex and the attention effects found at these electrodes. These plots show that the auditory cortex could be covered with a good spatial sampling over the group of patients. To illustrate the time course of the responses, typical waveforms have been plotted from electrode contacts showing attentional effects.
Transient evoked responses (nine patients)
We found significant transient evoked responses after both stim-part1 and stim-part2 onsets at most contacts of electrodes H, H′, T, T′, and W (199 contacts of 208 tested contacts). Three main transient waves could be observed: a first one maximal between 30 and 75 ms, a second one ∼100 ms (between 75 and 150 ms), and a third one starting ∼150–170 ms. These waves had smaller amplitudes and slightly later latencies after stim-part2 onset compared with stim-part1 onset.
After stim-part1 onset (onset of the 21 Hz stream), transient waves were found modulated by attention at 67 contacts (of 199) across the nine patients kept for this analysis (Fig. 4). These waves had significantly larger amplitudes in the AS21 than in the C condition at all the 67 contacts. The three waves could be affected by attention, but most effects were found on the second wave (between 75 and 150 ms) (Fig. 4, green ovals) in several sites of the auditory cortex (HG, PP, and PT) in both hemispheres. For the two patients who could perform the AS29 task, the transient waves were found modulated by attention at 10 contacts (of 36). The main effects were larger amplitudes in the AS21 than in the C condition at nine contacts, in the AS21 than in the AS29 condition at six contacts, and in the AS29 than in the C condition at eight contacts. In other words, we globally found larger amplitudes in the AS21 than in the AS29, and, in turn, greater than in the C condition (in the rest of the study, this will be formulated as AS21 > AS29 > C).
To summarize, the main effects of attention on the transient evoked responses to the onset of the 21 Hz stream (AS21 > AS29 > C) strongly suggest the existence of two levels of attentional selection: a first one to select amplitude modulated streams from target sounds (AS21 > C and AS29 > C) and a second to select one of the two streams (AS21 > AS29). These effects show that selective attention to auditory streams (compared with the control condition) operates by enhancing the neural representation of the relevant information: the more relevant the sound, the more increased its neural representation.
After stim-part2 onset (onset of the 29 Hz stream), the transient waves were found modulated by attention at 34 contacts (of 199) across eight of the nine patients kept for this analysis (Fig. 5). These waves had significantly smaller amplitudes in the AS21 than in the C condition at 31 of the 34 contacts. The three transient waves could be affected by attention, but most effects were found on the third wave (between 150 and 280 ms) (Fig. 5, blue squares) in several sites of the auditory cortex (HG, PP, and PT), and mainly in the left hemisphere. We found attentional effects in only one of the two patients who could perform the AS29 task, at seven contacts (of 13) in the right hemisphere. For this patient, the main effects were larger amplitudes of the transient waves in the C than in the AS21 condition at four contacts, in the AS21 than in the AS29 condition at three contacts, and in the C than in the AS29 condition at five contacts. These effects could be summarized by C > AS21 > AS29. The transient evoked responses at the stim-part2 onset most likely correspond to an electrophysiological response to the onset of the 29 Hz stream. The main effect in the left hemisphere, AS21 < C, indicates that selective attention operates by reducing the neural representation of irrelevant information (the 29 Hz stream in the AS21 condition). This reduction process occurred later in latency (between 150 and 250 ms) than the attentional enhancement observed during stim-part1 (between 75 and 150 ms).
These transient waves were also found at some contacts of electrodes A, A′, B, B′, and C in the MTG, or N and N′ in the parietal operculum. These responses were present at several successive contacts with smaller amplitudes than at electrode contacts in the superior temporal cortex. Thus, the transient waves observed in the MTG and the parietal operculum probably mainly reflect the diffusion by volume conduction of generators located in the superior temporal cortex. The few attention effects found in these regions corresponded closely to the effects found on the nearest electrode in the superior temporal cortex.
Sustained evoked responses (nine patients)
At most of the electrode contacts presenting transient responses (179 of 199), a sustained response was present during both parts of the stimulus (Fig. 6). Although the sustained response during stim-part1 most likely corresponds to the processing of the 21 Hz stream only, it is not possible to dissociate the sustained responses corresponding to the processing of the 21 and 29 Hz streams, respectively, during sound rivalry (stim-part2). Therefore, the attentional effects on the sustained responses were investigated during stim-part1 only.
The sustained responses were found modulated by attention at 60 contacts (of 179) across six of the nine patients kept for this analysis (Fig. 6). These responses had significantly larger amplitudes in the AS21 than in the C condition, at 58 of the 60 contacts. These effects were mainly concentrated between 400 and 600 ms, in several areas of the auditory cortex (HG, PP, and PT) in both hemispheres. For the two patients who could perform the AS29 task, the sustained waves were found modulated by attention at 20 contacts (of 33). The main effects were larger amplitudes in the AS21 than in the C condition at 18 contacts, in the AS21 than in the AS29 condition at six contacts, and in the AS29 than in the C condition at three contacts. These effects could be summarized by AS21 > AS29 > C.
Thus, the attentional modulations observed in the sustained responses evoked by the 21 Hz stream (AS21 > AS29 > C) are similar to the effects found on the transient evoked responses after stim-part1 onset: the more relevant the sound, the more increased its neural representation.
As for the transient waves, the sustained responses found at some electrode contacts in the MTG and in the parietal operculum mainly reflected the diffusion by volume conduction of generators located in the superior temporal cortex. Similarly, the effects of attention found in these regions corresponded to the effects found on the nearest electrode in the superior temporal cortex.
Steady-state evoked activities (10 patients)
SSRs emerged at 21 and 29 Hz at several contacts (80 and 39 of 208 contacts, respectively) along the HG in all patients (Figs. 7, 8, disks). They were very focal because they could not be observed either on other electrode contacts in the superior temporal cortex (PT or PP), or in the MTG or in the parietal operculum. The distributions of the 21 Hz SSR were quite similar during stim-part1 and stim-part2. Two bilateral clusters could be observed in the HG: a first one in its posteromedial part and a second one in a more anterior part (Fig. 7). The 21 and 29 Hz SSRs presented similar distributions (Figs. 7, 8), except that the 29 Hz SSR was absent in the anterior HG cluster in the left hemisphere (Fig. 8). In most of the cases, the maximum values of the 21 and 29 Hz SSRs were at different, but adjacent, contacts. These activities lasted several hundred milliseconds during each stimulus part.
During stim-part1, 21 Hz SSRs were modulated by attention at 20 contacts (of 80) in eight patients (Fig. 7). The power of the SSR was significantly larger in the AS21 than in the C condition at five contacts in the left hemisphere, and significantly more prominent in the C than in the AS21 condition at 15 contacts in the right hemisphere. For the two patients who could perform the AS29 task, the 21 Hz SSRs were found modulated by attention at six contacts (of 14). The main effects were AS21 > AS29 at one contact in the left hemisphere and C > AS29 > AS21 at five contacts in the right hemisphere. In both hemispheres, these effects were mainly concentrated between 250 and 550 ms.
In the left hemisphere, the main effects on the 21 Hz SSRs (AS21 > C and AS21 > AS29) again indicate that selective attention can operate by enhancing the neural representation of relevant information in the auditory cortex. In the right hemisphere, the main effect (C > AS29 > AS21) is opposed to effects observed in the left hemisphere. This rather unexpected result suggests a different specialization of the right and left auditory cortices in attentional processes.
During stim-part2, 21 Hz SSRs were also found modulated by attention at six contacts (of 80) in two patients. The power of the SSR was significantly larger in the C than in the AS21 conditions at one contact in the left hemisphere and at five contacts in the right hemisphere. For the two patients who could perform the AS29 task, the main effects were AS21 > AS29 and C > AS29 at one contact in the left hemisphere and C > AS29 > AS21 at five contacts in the right hemisphere. The effects were mainly concentrated between 350 and 600 ms in the left hemisphere, and between 50 and 800 ms in the right hemisphere.
Thus, in the left hemisphere, the effects on the 21 Hz SSRs (AS21 > AS29 and C > AS29) show that selective attention can function by reducing the neural representation of the irrelevant information (the 21 Hz stream in the AS29 condition). This reduction process occurred later in latency (∼350 ms after stim-part2 onset) than the attentional enhancement observed during stim-part1 (∼250 ms after stim-part1 onset). The attentional effect on the 21 Hz SSR in the right hemisphere (C > AS29 > AS21) indicates that the hemispheric specialization noted above would remain during the sound rivalry.
All the effects on 21 Hz SSR were mainly located in the posteromedial cluster of the HG.
During stim-part2, 29 Hz SSRs were found modulated by attention at three contacts (of 39) in two patients (Fig. 8). The power of the SSR was significantly larger in the AS21 than in the C condition at one contact in the left hemisphere, and in the C than in the AS21 condition at one contact in each hemisphere. In the two patients who could perform the AS29 task, the main effects were AS29 > C and AS29 > AS21 at three contacts in each hemisphere (of seven). The effects were mainly concentrated between 300 and 500 ms in both hemispheres.
Thus, the main condition effects, in both hemispheres, on the 29 Hz SSRs (AS29 > C > AS21) would confirm that selective attention can both enhance the neural representation of relevant sounds (the 29 Hz stream in the AS29 condition) and reduce the neural representation of irrelevant information (the 29 Hz stream in the AS21 condition).
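The logic of comparing SSR power at each modulation frequency across conditions can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the signal is a toy simulation, the amplitudes are arbitrary values chosen for illustration, and the function names (`simulate_response`, `ssr_power`) are hypothetical. The sketch only shows that two concurrent streams tagged at 21 and 29 Hz can, in principle, be separated in the frequency domain.

```python
import numpy as np

def simulate_response(a21, a29, fs=1000, duration=1.0, noise_sd=0.0, seed=0):
    """Toy 'recording': sinusoids at 21 and 29 Hz plus optional Gaussian noise.

    a21 and a29 are arbitrary amplitudes, not values from the study.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    clean = a21 * np.sin(2 * np.pi * 21 * t) + a29 * np.sin(2 * np.pi * 29 * t)
    return clean + noise_sd * rng.standard_normal(t.size)

def ssr_power(signal, fs, freq):
    """Squared amplitude of `signal` at the FFT bin closest to `freq` (Hz)."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - freq)))
    amplitude = 2.0 * np.abs(spectrum[k]) / n  # one-sided amplitude estimate
    return amplitude ** 2

fs = 1000
# Hypothetical conditions: larger 21 Hz component when the 21 Hz stream is attended.
attend_21 = simulate_response(a21=2.0, a29=1.0, fs=fs)
control = simulate_response(a21=1.5, a29=1.5, fs=fs)

for name, sig in [("attend 21 Hz", attend_21), ("control", control)]:
    print(name, ssr_power(sig, fs, 21), ssr_power(sig, fs, 29))
```

With a 1 s window sampled at 1000 Hz, both modulation frequencies fall exactly on FFT bins, so there is no spectral leakage between them. Real recordings would require averaging across trials and a statistical comparison between conditions.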
The present results provide both a precise time course and a detailed localization of the effects of selective attention on the representation of concurrent sounds in the human auditory cortex. They demonstrate that steady-state responses, generated in the primary auditory cortex, can be modulated by selective attention, whereas the transient and sustained evoked responses are instead modulated in anterior and posterior associative auditory areas. They also provide new insights into the neural mechanisms of selective auditory attention.
Attentional modulation of SSRs
Twenty-one and 29 Hz SSRs were found bilaterally along the HG in two foci: in the posteromedial part of the HG, corresponding to the primary auditory cortex (PAC) (Liegeois-Chauvel et al., 1991; Rivier and Clarke, 1997), and in a more anterior part of the HG, considered an associative auditory area. An origin of the steady-state activities in two foci of the HG is consistent with previous magnetoencephalographic (MEG) (Gutschalk et al., 1999) and intracranial EEG (Liegeois-Chauvel et al., 2004) studies.
Very few studies have investigated the effect of attention on SSRs, and this issue is still debated. We found that SSRs can be altered according to the orientation of attention in both foci of the HG. Our results differ from some previous observations (Linden et al., 1987), but are consistent with others (Ross et al., 2004). This discrepancy may be explained by the different tasks used in these studies. In the present experiment, attention was manipulated during a situation of sound rivalry, which requires strong attentional effort to select the relevant amplitude-modulated stream. In the study by Ross et al. (2004), attention was directed to the amplitude modulation of the sounds whereas, in the experiment by Linden et al. (1987), attention was directed to sound frequency or intensity.
Thus, it seems that attention can affect SSRs in specific situations: when the task-relevant feature is the amplitude modulation of the sound (i.e., the very acoustic feature responsible for the SSR), or in more complex situations when the attentional selection is difficult.
Localization of auditory selective-attention effects
In the present study, SSRs generated in the PAC, as well as SSRs and sustained and transient waves generated in anterior or posterior secondary auditory areas (PP and PT, respectively), were modulated by selective attention. Although effects of selective attention in nonprimary auditory areas have been observed in several previous experiments using positron emission tomography (PET) (Alho et al., 1999), functional MRI (Pugh et al., 1996; Jancke et al., 1999), MEG (Woldorff et al., 1993; Ahveninen et al., 2006), or cortical EEG (Neelon et al., 2006), only a few neuroimaging studies have reported attention effects in the primary auditory cortex (Fujiwara et al., 1998; Alho et al., 1999; Jancke et al., 1999). The present results confirm that selective auditory attention can alter the sensory responses in both the primary and associative auditory areas. These attentional modulations in sensory areas could be controlled by the frontal cortex. Indeed, prefrontal lesions have been shown to increase early auditory evoked potentials to irrelevant sounds (Knight et al., 1989), and frontal regions would be activated according to the amount of attention required to perform an auditory task (Jancke and Shah, 2002).
Neural mechanisms of auditory selective attention
The present experiment manipulated two levels of selection: a first level to select targets from amplitude-modulated streams (control condition), and a second level to select one of the two auditory streams (AS21 and AS29 conditions). Therefore, in the control condition, the same amount of attention was devoted to both streams, whereas in the other two attentional conditions, one auditory stream was attended and the other ignored. The main effects of selective attention are summarized in Figure 9 and interpreted below. It can be noticed that effects on the 21 Hz SSRs in the right hemisphere present a particular pattern (hatched circles) that will be discussed in terms of hemispheric specializations.
Attentional effects on relevant and irrelevant sound representation
Before the period of sound rivalry (Fig. 9 A), selective attention seems to operate only by enhancing the neural representation (transient, sustained, and 21 Hz SSR responses) of the relevant sound (21 Hz stream), as early as 75 ms after stimulus onset. This mechanism of attentional facilitation has been observed in previous electrophysiological studies on transient waves (Hillyard et al., 1973; Woldorff and Hillyard, 1991; Alcaini et al., 1995), sustained responses (Picton et al., 1978), and SSRs (Ross et al., 2004).
During sound rivalry (Fig. 9 B), selective attention seems to operate mainly by reducing the neural encoding of the irrelevant sounds (21 Hz SSR in the AS29 condition; and transient responses to 29 Hz stream onset and 29 Hz SSR in the AS21 condition) from 150 ms after stimulus onset. This reduction of irrelevant information processing is consistent with previous electrophysiological (Alho et al., 1987, 1994; Donald, 1987; Michie et al., 1990, 1993; Alain and Woods, 1994), PET (Ghatan et al., 1998; Kawashima et al., 1999) and lesion (Knight et al., 1989) studies on auditory selective attention. As shown previously (Donald, 1987; Michie et al., 1990, 1993), this reduction occurred later in the sensory processing chain than the attentional enhancement. Thus, attentional facilitation and inhibition are most likely distinct processes with different temporal properties. In addition, the inhibition mechanisms seem to occur preferentially in situations of sound rivalry. This agrees with Lavie's (2005) proposal that only high perceptual load can prevent distracter processing.
To summarize, our findings show that in cases of difficult attentional selection, mechanisms of active rejection of irrelevant sounds complement those of enhancement of relevant information processing, probably to improve the neural signal-to-noise ratio of the attended input.
Sensory filtering model of auditory attention
The transient (negative or positive) waves elicited after stream onsets were found to be enhanced or reduced in amplitude, according to the focus of attention. These transient responses recorded from intracranial electrodes correspond to the activity of neural populations in very specific auditory sites, whereas event-related potentials recorded from the scalp surface may be generated in multiple temporal and frontal regions. Thus, our findings strongly suggest that the activity of auditory areas involved in the “obligatory” sensory analysis of acoustic stimuli is modulated by selective attention. This is consistent with the “gain” theory (Hillyard et al., 1973), according to which selective attention acts as a filtering mechanism capable of facilitating attended sounds and inhibiting unattended ones. However, this does not rule out the involvement of an “attentional trace” in the selection process, as proposed by Näätänen et al. (1978). This trace, expressed in scalp evoked potentials as a long-lasting negative wave, could be generated in other auditory areas or in the frontal cortex.
The transient responses at the 29 Hz stream onset were found affected by attention mainly in the left hemisphere. Moreover, in the right hemisphere, unexpected effects (C > AS29 > AS21) on the 21 Hz SSR were observed during the whole duration of the stimulus: it thus seems that the more relevant the sound, the more reduced its representation. This suggests a functional specialization of each hemisphere in attentional processes.
The left hemisphere could be mainly involved in attentional selection whereas the activity in the right hemisphere could be inhibited as a function of the attentional load. A stronger attention selectivity of the left hemisphere has been observed in previous studies in the visual (Reuter-Lorenz et al., 1990; Zani and Proverbio, 1995) and auditory (Coch et al., 2005) modalities. In the auditory modality, Alcaini et al. (1995) have suggested that the left hemisphere could be preferentially involved in voluntary, selective attention whereas the right hemisphere would be more engaged in automatic attentional orientation to unexpected stimuli.
Hemispheric specialization has also been found in local and global processing of visual stimuli. Early visual areas would be more activated by selective attention to local features in the left hemisphere and by attention to global information in the right hemisphere (Fink et al., 1996, 1997; Proverbio et al., 1998; Evans et al., 2000; Han et al., 2002). Although, to our knowledge, there are no data clearly showing the same hemispheric specialization between the auditory cortices, our findings suggest such an asymmetry. Indeed, we can consider that in the stream-selection attention conditions, local processing of the auditory scene was necessary to select the relevant sound and, thus, enhanced the neural activity in the left auditory cortex. In the control condition, a more global analysis of the sounds was sufficient to detect superimposed noise bursts and, thus, preferentially enhanced the neural activity in the right auditory cortex.
The present results demonstrate that selective auditory attention can (1) affect the evoked responses in the primary and associative auditory areas, (2) modulate the steady-state responses in a situation of sound rivalry, (3) operate by enhancing the neural representation of relevant sounds, and (4) also function by reducing the neural representation of irrelevant information when sound competition makes the attentional selection difficult. Finally, our data suggest a specialization of the left auditory cortex in the attentional selection of fine-grained acoustic information.
We thank Rhodri Cusack for constructive discussion.
Correspondence should be addressed to A. Bidet-Caulet, Institut National de la Santé et de la Recherche Médicale, Unite 821, Brain Dynamics and Cognition, Centre Hospitalier le Vinatier, 95 Boulevard Pinel, 69500 Bron, France.