Interactive report
Organizing sound sequences in the human brain: the interplay of auditory streaming and temporal integration¹
Introduction
In natural acoustic environments, the ears are confronted by a mixture of sounds emanating from several simultaneously active sources. A large part of early auditory processing is concerned with decomposing the auditory input into its possible constituents and determining the individual events (units) of the auditory scene. One of the key mechanisms of this function is auditory stream segregation [2]. Bregman (1990) defined a ‘stream’ as an auditory perceptual entity, a sequence of sounds perceived as belonging together [1]. Acoustic similarities play a major role in determining the grouping/segregation of sounds. The auditory streaming effect (originally described by Miller and Heise [7]), in which high and low tones presented at a fast rate form separate streams, is a special case of stream segregation. When a sound sequence ‘streams’, across-stream temporal and sequential relationships are lost in perception. This suggests that auditory streaming stems from early pre-perceptual processing. Temporal integration (see e.g., [28]) is another important mechanism of sound organization, which is also assumed to be automatic. From loudness summation (see e.g., [12]) to detection masking (see e.g., [19]), a large number of perceptual phenomena suggest that acoustic signals presented within a temporal window of ca. 200 ms are integrated into a single perceptual unit (see review by Cowan [3], [4]). The temporal window of integration (TWI; [10]) is an aspect of temporal integration which determines the initial unit in which sound energy is handled in the early, pre-attentive stages of auditory information processing.
The question motivating the present investigation was how closely spaced sounds are grouped or segregated during the early phases of auditory stimulus processing: are they first treated as a single unit irrespective of their spectral features (i.e., TWI precedes stream segregation), or does temporal integration occur only within auditory streams (i.e., stream segregation precedes the TWI processes)? The answer to this question has important implications for theories of auditory processing. For example, one interesting question in speech processing is how phonemes are prevented from masking each other despite the fact that two or sometimes even more consonants and/or vowels are pronounced within 200 ms.
In the present study, the magnetic equivalent (MMNm; [11]) of the mismatch negativity (MMN; [6], [9]) event-related brain potential (ERP) was measured; this component can be employed to probe pre-attentive transient sound representations. MMN(m) is elicited by infrequent violations of a regular sound sequence whether or not attention is focused on the auditory stimulation (see recent reviews by Schröger [15]; Näätänen and Winkler [8]). The elicitation of this component indicates that some regular feature of the sound sequence was pre-attentively detected and that this regularity was violated by the MMN-eliciting auditory event. Both temporal integration and auditory streaming have previously been observed to affect the elicitation of the MMN(m). Winkler et al. [23] reported that MMN elicitation was prevented when both the regular (standard) sounds and the rare deviant sounds were followed by a mask presented with a very short (<150 ms) test-to-mask interval. This masking effect was interpreted as a sign of integration between the test and mask sounds, occurring only when the two sounds were delivered within a short interval (see e.g., [22], [23]). A similar effect was observed by Tervaniemi et al. [18], who reported that infrequently omitting the second tone of a repetitively presented tone pair elicited the MMN only with short (<200 ms) within-pair intervals. The duration of the TWI was estimated to be 160–170 ms in studies that tested the elicitation of MMN (or MMNm) at different stimulus onset asynchronies (SOA) by infrequently omitting one tone in a repetitive tone sequence [24], [25], [26]. Further evidence compatible with the notion of the TWI was obtained by Winkler et al. [20] and Sussman et al. [16]. As for the auditory streaming effect, Sussman et al. [17] showed that alternating high and low tones at a fast stimulus presentation rate (100 ms SOA) resulted in the pre-attentive detection of separate regularities for the high and the low tones, which did not occur at a lower (750 ms SOA) stimulus presentation rate. The authors interpreted their results as demonstrating the emergence of auditory streaming at the short but not at the longer SOA (this interpretation is compatible with the parameters found to induce auditory streaming in perception; for a review see [1]). Compatible evidence was obtained by Ritter et al. [13] using simultaneous differences in multiple features between two sets of test tones. In all of the above MMN studies, subjects performed some task unrelated to the auditory stimuli, which they were instructed to ignore. This shows that the MMN method allows one to study the pre-attentive effects of both temporal integration and auditory stream segregation.
The aim of the present study was to determine the order (primacy) of these two presumably pre-attentive grouping mechanisms in audition: the temporal window of integration and auditory streaming. Testing whether infrequently omitting a tone from a regularly alternating sequence of two tones (ABABAB…) elicits the MMN(m) allows one to answer this question. If the SOA of the sequence is set so that two consecutive tones (AB or BA) fall within the TWI (<160 ms) but neighbors within the separate high and low streams (AA or BB) do not, then one can investigate the effects of the amount of frequency separation between the A and B tones on the response to infrequent tone omissions. If TWI processing precedes frequency-based auditory streaming, then MMN should be elicited irrespective of the frequency separation between the two alternating tones. In contrast, if temporal integration occurs only within streams then tone omission should only elicit the MMN when the A and B tones are grouped into a single stream (which occurs when the frequency separation between A and B is small), but not when the two tones are segregated into separate low and high streams (large frequency separation between A and B).
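The timing argument behind this design can be made concrete with a short sketch. It is illustrative only and rests on assumptions not stated in this passage: the TWI is taken as 160 ms (the lower end of the 160–170 ms estimates cited above), the SOA as 125 ms (the value used in the perceptual pretest), and the function names and hypothesis labels are hypothetical. In an ABAB… sequence, consecutive tones are one SOA apart, whereas neighbors within a segregated stream (AA or BB) are two SOAs apart.

```python
"""Timing logic of the omission paradigm (illustrative sketch only)."""

TWI_MS = 160.0  # assumed TWI duration; the text cites estimates of 160-170 ms
SOA_MS = 125.0  # assumed SOA, borrowed from the perceptual pretest


def is_diagnostic(soa_ms: float, twi_ms: float = TWI_MS) -> bool:
    """True if consecutive tones (AB) fall inside the TWI but
    same-frequency neighbors (AA or BB, two SOAs apart) do not."""
    return soa_ms < twi_ms <= 2 * soa_ms


def predicts_mmn(hypothesis: str, segregated: bool,
                 soa_ms: float = SOA_MS, twi_ms: float = TWI_MS) -> bool:
    """Predict whether a tone omission elicits the MMN(m).

    hypothesis: "twi_first" or "streaming_first" (hypothetical labels).
    segregated: whether the A and B tones split into separate streams.
    """
    if hypothesis == "twi_first":
        # Integration ignores frequency: the relevant gap is always one SOA.
        gap_ms = soa_ms
    elif hypothesis == "streaming_first":
        # Integration operates within a stream: the gap doubles after segregation.
        gap_ms = 2 * soa_ms if segregated else soa_ms
    else:
        raise ValueError(f"unknown hypothesis: {hypothesis!r}")
    return gap_ms < twi_ms


if __name__ == "__main__":
    print("diagnostic design:", is_diagnostic(SOA_MS))
    for hyp in ("twi_first", "streaming_first"):
        for seg in (False, True):
            label = "segregated" if seg else "single stream"
            print(hyp, label, predicts_mmn(hyp, seg))
```

Under these assumed values the two hypotheses differ only in the segregated (large frequency separation) case, which is why that condition is the critical one in the design.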
Subjects and stimuli
The experiments took place in the electrically and magnetically shielded sound-attenuated chamber of the National Institute for Physiological Sciences in Okazaki, Japan. Prior to the magnetic recordings, the subjects’ perception of an alternating sequence of two tones (3000 Hz, ‘high’ and 500 Hz, ‘low’ presented at 125 ms stimulus onset asynchrony [SOA]) was tested. Only those (10 out of the 13 candidates) who reported clear perception of separate low and high streams participated in the main
Neuromagnetic responses to omissions
Superimposed (37 channels) individual responses of one representative subject to the different auditory events are presented in Fig. 1. (For the ‘Repetitive Tone’ condition, the response to every second tone was averaged separately for compatibility with the alternating conditions.) Sizable distinct deflections, identified as MMNm, were elicited by the omissions in the ‘Repetitive Tone’ and the ‘Alternation with Small Frequency Separation’ conditions, but not in the ‘Alternation with
Discussion
Sizable MMNm components were elicited by occasional stimulus omissions in the repetitive tone sequence as well as in the alternating sequence of two tones with moderately different frequencies. In contrast, no significant MMNm was elicited by infrequently omitting a stimulus from the alternating sequence of two tones with largely different frequencies. Since Yabe et al.’s previous studies [24], [25], [26] showed that infrequent stimulus omissions only elicit the MMN (and MMNm) when the SOA
Acknowledgements
The authors are grateful to Prof. R. Näätänen (Helsinki) and Prof. E. Schröger (Leipzig) for helpful comments and discussions, to Mr Y. Takeshima and Mr O. Nagata for technical help, and to the staff of the Department of Integrative Physiology (Okazaki) for helping to conduct the experiments. This study was supported by the Grant-in-Aid for Scientific Research (09670972, 09710064) from the Ministry of Education, Science and Culture of Japan, and by the Hungarian National Research Fund (OTKA
References (28)
- et al. Responses of the primary auditory cortex to pitch changes in a sequence of tone pips: neuromagnetic recordings in man. Neurosci. Lett. (1984)
- et al. Magnetoencephalography in studies of human cognitive brain function. Trends Neurosci. (1994)
- et al. Early selective attention effects on evoked potential reinterpreted. Acta Psychol. (1978)
- et al. Temporal integration of auditory stimulus deviance as reflected by the mismatch negativity. Neurosci. Lett. (1999)
- et al. Temporal integration of auditory information in sensory memory as reflected by the mismatch negativity. Biol. Psychol. (1994)
- et al. Adaptive modeling of the unattended acoustic environment reflected in the mismatch negativity event-related potential. Brain Res. (1996)
- Auditory Scene Analysis: The Perceptual Organization of Sound (1990)
- et al. Auditory segregation: stream or streams? J. Exp. Psychol. Hum. Percept. (1975)
- Sensory memory and its role in information processing
- On short and long auditory stores. Psychol. Bull. (1984)
- Magnetoencephalography - Theory, instrumentation, and application to noninvasive studies of the working human brain. Rev. Mod. Phys.
- The trill threshold. J. Acoust. Soc. Am.
- The concept of auditory stimulus representation in cognitive neuroscience. Psychol. Bull.
- Attention and Brain Function
1 Published on the World Wide Web on 27 February 2001.