Elsevier

Brain Research

Volume 897, Issues 1–2, 6 April 2001, Pages 222-227

Interactive report
Organizing sound sequences in the human brain: the interplay of auditory streaming and temporal integration¹

https://doi.org/10.1016/S0006-8993(01)02224-7

Abstract

The present study examined the relationship between two of the early brain processes of sound organization: auditory streaming and the temporal window of integration (TWI). Presented at a fast stimulus delivery rate, two tones alternating in frequency are perceived as separate streams of high and low sounds. However, when two sounds are presented within a ca. 200 ms temporal window, they are often processed as a single auditory event. Both stream segregation and temporal integration occur even in the absence of focused attention as was shown by their effect on the mismatch negativity (MMN) event-related potential. The goal of the present study was to determine the precedence between these two sound organization processes by using the stimulus-omission MMN paradigm. Infrequently omitting one stimulus from a homogeneous tone sequence only elicits an MMN when the stimulus onset asynchrony separating successive tones is shorter than 170 ms. This demonstrates the effect of the TWI. Magnetic brain responses elicited by infrequent stimulus omissions appearing in a sequence of two alternating tones were recorded. The magnetic MMN was elicited by tone omission when the alternating tones formed a single stream (with no or only small frequency separation between the two tones) but not when separate high and low streams emerged in perception (large frequency separation between the two alternating tones). This result shows that auditory streaming takes precedence over the processes of temporal integration.

Introduction

In natural acoustic environments, the ears are confronted by a mixture of sounds emanating from several simultaneously active sources. A large part of early auditory processing is concerned with decomposing the auditory input into its possible constituents and determining the individual events (units) of the auditory scene. One of the key mechanisms of this function is auditory stream segregation [2]. Bregman (1990) defined ‘stream’ as an auditory perceptual entity, a sequence of sounds perceived as belonging together [1]. Acoustic similarities play a major role in determining the grouping/segregation of sounds. The auditory streaming effect (originally described by Miller and Heise [7]), in which high and low tones presented at a fast rate form separate streams, is a special case of stream segregation. When a sound sequence ‘streams’, across-stream temporal and sequential relationships are lost in perception. This suggests that auditory streaming stems from early pre-perceptual processing. Temporal integration (see e.g., [28]) is another important mechanism of sound organization, which is also assumed to be automatic. From loudness summation (see e.g., [12]) to detection masking (see e.g., [19]), a large number of perceptual phenomena suggest that acoustic signals presented within a ca. 200 ms temporal window are integrated into a single perceptual unit (see review by Cowan [3], [4]). The temporal window of integration (TWI; [10]) is an aspect of temporal integration which determines the initial unit in which sound energy is handled in the early, pre-attentive stages of auditory information processing.

The question motivating the present investigation was how closely spaced sounds are grouped/segregated during the early phases of auditory stimulus processing: are they first treated as a single unit irrespective of their spectral features (i.e., TWI precedes stream segregation), or does temporal integration only occur within auditory streams (i.e., stream segregation precedes the TWI processes)? The answer to this question has important implications for theories of auditory processing. For example, one interesting question of speech processing is how phonemes are prevented from masking each other despite the fact that two or sometimes even more consonants and/or vowels are pronounced within 200 ms.

In the present study, the magnetic equivalent of the mismatch negativity (MMN and the magnetic MMNm components; [6], [9] and [11], respectively) event-related brain potential (ERP) was measured, which can be employed to probe pre-attentive transient sound representations. MMN(m) is elicited by infrequent violations of a regular sound sequence whether or not attention is focused on the auditory stimulation (see recent reviews by Schröger [15]; Näätänen and Winkler [8]). The elicitation of this component indicates that some regular feature of the sound sequence was pre-attentively detected and this regularity was violated by the MMN-eliciting auditory event. Both temporal integration and auditory streaming have been previously observed to affect the elicitation of the MMN(m). Winkler et al. [23] reported that MMN elicitation was prevented when both the regular (standard) sounds and the rare deviant sounds were followed by a mask presented with a very short (<150 ms) test-to-mask interval. This masking effect was interpreted as a sign of integration between the test and mask sounds occurring only when the two sounds were delivered within a short interval (see e.g., [22], [23]). A similar effect was observed by Tervaniemi et al. [18], who reported that infrequently omitting the second tone of a repetitively presented tone pair only elicited the MMN with short (<200 ms) within-pair intervals. The duration of the TWI was estimated to be 160–170 ms in studies that tested the elicitation of MMN (or MMNm) at different stimulus onset asynchronies (SOA) by infrequently omitting one tone in a repetitive tone sequence [24], [25], [26]. Further evidence compatible with the notion of the TWI was obtained by Winkler et al. [20] and Sussman et al. [16]. As for the auditory streaming effect, Sussman et al. [17] showed that alternating high and low tones at a fast stimulus presentation rate (100 ms SOA) resulted in the pre-attentive detection of separate regularities for the high and the low tones which did not occur at a lower (750 ms SOA) stimulus presentation rate. The authors interpreted their results as demonstrating the emergence of auditory streaming at the short but not at the longer SOA (this interpretation is compatible with the parameters found to induce auditory streaming in perception; for a review see [1]). Compatible evidence was obtained using simultaneous difference in multiple features between two sets of test tones by Ritter et al. [13]. In all of the above MMN studies, subjects performed some task unrelated to the auditory stimuli, which they were instructed to ignore. This shows that the MMN method allows one to study the pre-attentive effects of both temporal integration and auditory stream segregation.

The aim of the present study was to determine the order (primacy) of these two presumably pre-attentive grouping mechanisms in audition: the temporal window of integration and auditory streaming. Testing whether infrequently omitting a tone from a regularly alternating sequence of two tones (ABABAB…) elicits the MMN(m) allows one to answer this question. If the SOA of the sequence is set so that two consecutive tones (AB or BA) fall within the TWI (<160 ms) but neighbors within the separate high and low streams (AA or BB) do not, then one can investigate the effects of the amount of frequency separation between the A and B tones on the response to infrequent tone omissions. If TWI processing precedes frequency-based auditory streaming, then MMN should be elicited irrespective of the frequency separation between the two alternating tones. In contrast, if temporal integration occurs only within streams then tone omission should only elicit the MMN when the A and B tones are grouped into a single stream (which occurs when the frequency separation between A and B is small), but not when the two tones are segregated into separate low and high streams (large frequency separation between A and B).
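The logic of these two competing predictions can be made concrete with a small sketch. It is illustrative only and not the authors' analysis: the function name, the 170 ms window (taken from the omission-MMN estimate quoted above), and the 125 ms SOA (borrowed from the screening sequence described under Subjects and stimuli) are assumptions made for the example. The key point is that once an ABAB… sequence splits into streams, neighbouring tones within a stream are two SOAs apart.

```python
# Illustrative sketch only; parameter values are assumptions, not the study's analysis.
TWI_MS = 170  # assumed upper bound of the temporal window of integration


def omission_mmn_predicted(soa_ms, segregated, streaming_first, twi_ms=TWI_MS):
    """Return True if an omitted tone is predicted to elicit the MMN(m).

    soa_ms          -- onset-to-onset interval between successive tones (A to B)
    segregated      -- True if A and B split into separate streams
                       (large frequency separation)
    streaming_first -- True for the hypothesis that streaming precedes the TWI
    """
    if streaming_first and segregated:
        # Integration only sees same-stream neighbours (A to A or B to B),
        # which are two SOAs apart in an alternating sequence.
        effective_interval_ms = 2 * soa_ms
    else:
        # Integration operates on successive tones regardless of frequency.
        effective_interval_ms = soa_ms
    return effective_interval_ms < twi_ms


SOA_MS = 125  # assumed SOA, taken from the screening sequence

for streaming_first in (False, True):
    label = "streaming first" if streaming_first else "TWI first"
    for segregated in (False, True):
        separation = "large" if segregated else "small"
        predicted = omission_mmn_predicted(SOA_MS, segregated, streaming_first)
        print(f"{label}, {separation} separation: MMN predicted = {predicted}")
```

Under these assumptions the two hypotheses differ only for the large frequency separation, which is the condition that lets the design discriminate between them.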

Subjects and stimuli

The experiments took place in the electrically and magnetically shielded sound-attenuated chamber of the National Institute for Physiological Sciences in Okazaki, Japan. Prior to the magnetic recordings, the subjects’ perception of an alternating sequence of two tones (3000 Hz, ‘high’ and 500 Hz, ‘low’ presented at 125 ms stimulus onset asynchrony [SOA]) was tested. Only those (10 out of the 13 candidates) who reported clear perception of separate low and high streams participated in the main

Neuromagnetic responses to omissions

Superimposed (37 channels) individual responses of one representative subject to the different auditory events are presented in Fig. 1. (For the ‘Repetitive Tone’ condition, the response to every second tone was averaged separately for compatibility with the alternating conditions.) Sizable distinct deflections, identified as MMNm, were elicited by the omission responses in the ‘Repetitive Tone’ and the ‘Alternation with Small Frequency Separation’ conditions, but not in the ‘Alternation with

Discussion

Sizable MMNm components were elicited by occasional stimulus omissions in the repetitive tone sequence as well as in the alternating sequence of two tones with moderately different frequencies. In contrast, no significant MMNm was elicited by infrequently omitting a stimulus from the alternating sequence of two tones with largely different frequencies. Since Yabe et al.’s previous studies [24], [25], [26] showed that infrequent stimulus omissions only elicit the MMN (and MMNm) when the SOA

Acknowledgements

The authors are grateful to Prof. R. Näätänen (Helsinki) and Prof. E. Schröger (Leipzig) for helpful comments and discussions, to Mr Y. Takeshima and Mr O. Nagata for technical help, and to the staff of the Department of Integrative Physiology (Okazaki) for helping to conduct the experiments. This study was supported by the Grant-in-Aid for Scientific Research (09670972, 09710064) from the Ministry of Education, Science and Culture of Japan, and by the Hungarian National Research Fund (OTKA

References (28)

  • M. Hämäläinen et al., Magnetoencephalography — Theory, instrumentation, and application to noninvasive studies of the working human brain, Rev. Mod. Phys. (1993)
  • G.A. Miller et al., The trill threshold, J. Acoust. Soc. Am. (1950)
  • R. Näätänen et al., The concept of auditory stimulus representation in cognitive neuroscience, Psychol. Bull. (1999)
  • R. Näätänen, Attention and Brain Function (1992)

¹ Published on the World Wide Web on 27 February 2001.
