Auditory environments vary as a result of the appearance and disappearance of acoustic sources, as well as fluctuations characteristic of the sources themselves. The appearance of an object is often manifest as a transition in the pattern of ongoing fluctuation, rather than an onset or offset of acoustic power. How does the system detect and process such transitions? Based on magnetoencephalography data, we show that the temporal dynamics and response morphology of the neural temporal-edge detection processes depend in precise ways on the nature of the change. We measure auditory cortical responses to transitions between “disorder,” modeled as a sequence of random frequency tone pips, and “order,” modeled as a constant tone. Such transitions embody key characteristics of natural auditory edges. Early cortical responses (from ∼50 ms post-transition) reveal that order–disorder transitions, and vice versa, are processed by different neural mechanisms. Their dynamics suggest that the auditory cortex optimally adjusts to stimulus statistics, even when this is not required for overt behavior. Furthermore, this response profile bears a striking similarity to that measured from another order–disorder transition, between interaurally correlated and uncorrelated noise, a radically different stimulus. This parallelism suggests the existence of a general mechanism that operates early in the processing stream on the abstract statistics of the auditory input, and is putatively related to the processes of constructing a new representation or detecting a deviation from a previously acquired model of the auditory scene. Together, the data reveal information about the mechanisms with which the brain samples, represents, and detects changes in the environment.
- auditory evoked response
- auditory cortex
- integration window
- change detection
- scene analysis
Auditory environments are constantly changing. To rise to its ecological challenges, an organism must be able to make sense of the variable information that reaches the ears, decompose it into representations of the sound-generating sources, localize them, recognize them, and react appropriately. One useful cue in this process is the detection of temporal edges corresponding to onsets and offsets of auditory objects. These transitions appear superimposed on the already fluctuating context and the objective of the present study is to investigate how the human auditory cortex detects the boundaries of such events within an ongoing stimulus. To detect a temporal edge, a listener must acquire some representation of the ongoing stimulus (Dean et al., 2005; Ulanovsky et al., 2003, 2004), compare, in real time, the incoming information to this internal model, and react if a deviation occurs. In particular, one has to be able to differentiate fluctuations expected of a specific auditory scene from unexpected deviations that indicate the occurrence of a new source.
To study the cortical dynamics of edge detection, we contrasted magnetoencephalography (MEG) responses to stimuli with no “edges,” either a constant tone (C) or a random sequence of tones (R), with responses to stimuli containing a transition from constant to random (CR) or vice versa (RC). These relatively simple stimuli embody key characteristics of temporal edges (see Fig. 1). Theoretically, an ideal observer can immediately detect the transition in the constant-to-random case: the first waveform sample that violates the acquired regularity model suffices to signal the transition. The opposite transition, from random to constant, necessarily takes longer to detect because the observer must distinguish the onset of regularity from an oscillation that might occur by chance. This depends on the statistical properties of the ongoing fluctuating stimulus (DeWeese and Zador, 1998); in the present case, the observer must wait at least the duration of a pip to detect the change from random to constant (see Fig. 1). Here, we target the neural mechanisms underlying the detection of the two types of transitions and the extent to which they adjust to the properties of the fluctuating stimulus.
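The ideal-observer argument can be made concrete with a minimal sketch. It assumes a stimulus described as one frequency value per millisecond and an observer that knows the pip duration; the function names are hypothetical and are not part of the study. The sketch ignores chance repeats of the same frequency in consecutive pips, which is one reason a single pip duration is only a lower bound for the random-to-constant case.

```python
def first_violation_ms(freq_per_ms):
    """CR observer: time (ms) of the first sample whose frequency
    differs from the initial constant frequency, or None."""
    f0 = freq_per_ms[0]
    for t, f in enumerate(freq_per_ms):
        if f != f0:
            return t
    return None


def first_persistence_ms(freq_per_ms, pip_ms):
    """RC observer: first time (ms) at which the current frequency has
    lasted longer than one pip, i.e. an expected pip boundary was missed."""
    run_start = 0
    for t in range(1, len(freq_per_ms)):
        if freq_per_ms[t] != freq_per_ms[t - 1]:
            run_start = t              # a new pip started here
        elif t - run_start >= pip_ms:
            return t                   # frequency outlived one pip
    return None


# Constant for 840 ms, then 15 ms pips: detectable at the transition itself.
cr = [500] * 840 + [700] * 15 + [900] * 15
print(first_violation_ms(cr) - 840)

# Two 30 ms pips, then constant from 60 ms on: detectable one pip later.
rc = [500] * 30 + [700] * 30 + [900] * 100
print(first_persistence_ms(rc, 30) - 60)
```

In this toy setting the constant-to-random delay is zero and the random-to-constant delay equals the pip duration, matching the asymmetry described above.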
Our findings relate to, but are distinct from, the prominent mismatch-negativity change-detection response (Näätänen et al., 1978; Kujala and Näätänen, 2003) (see Discussion). We show that early auditory cortical responses to transitions between constant tones and random sequences of tone pips are remarkably different from responses to the opposite transition. Furthermore, this asymmetry resembles an asymmetry observed previously for transitions between correlated and uncorrelated noise signals, which represent “constancy” and “random variation” in the dimension of interaural similarity (Chait et al., 2005). Because of this resemblance, we interpret the responses as reflecting a low-level edge-detection mechanism that monitors the time-varying statistics of auditory signals. The data suggest that the temporal integration requirements for detecting different transitions recruit distinct cortical circuits and illuminate the heuristics with which auditory cortex processes changes in the acoustic environment, including those that are not task relevant.
Materials and Methods
Twenty-four subjects (mean age 20.2 years, 14 female) took part in the experiment. All were right handed (Oldfield, 1971), reported normal hearing, and had no history of neurological disorder. The experimental procedures were approved by the University of Maryland Institutional Review Board and written informed consent was obtained from each participant. Subjects were paid for their participation.
Stimuli were 1440 ms in duration and consisted of a pure tone modulated in frequency and amplitude according to four patterns (C, R, CR, RC). C and R stimuli served as controls for the CR and RC stimuli. For C, the frequency and amplitude were constant (no modulation). Onset and offset were shaped with 3 ms raised-cosine ramps, and the frequency set to one of 20 values equally spaced on a log scale between 222 and 2000 Hz. For R, the frequency of the pure tone was modulated in steps, with frequencies drawn randomly from the above set of 20 values. Step duration was 15, 30, or 60 ms, depending on the condition. The amplitude of each step was shaped by initial and final 3 ms raised-cosine ramps so that an R stimulus effectively consisted of a sequence of tone pips. For CR, the stimulus consisted of an initial 840 ms constant-frequency segment followed by a 600 ms post-transition sequence of pips, and for RC, the first 840 ms were modulated and the last 600 ms were constant (Fig. 1).
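The stimulus construction described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, the sampling rate is taken as 44,000 Hz from the "44 kHz" stated later in the text, and ramping the end of the constant segment at the transition is an assumption. The frequency constraints on the random draws are not applied here.

```python
import math
import random

SAMPLE_RATE = 44000  # Hz; assumed from the "44 kHz" given in the text
RAMP_MS = 3          # raised-cosine onset/offset ramp duration


def frequency_set(n=20, lo=222.0, hi=2000.0):
    """n frequencies equally spaced on a log scale between lo and hi (Hz)."""
    step = math.log(hi / lo) / (n - 1)
    return [lo * math.exp(i * step) for i in range(n)]


def tone_pip(freq, dur_ms, sr=SAMPLE_RATE, ramp_ms=RAMP_MS):
    """A pure tone of dur_ms with raised-cosine onset and offset ramps."""
    n = round(sr * dur_ms / 1000)
    n_ramp = round(sr * ramp_ms / 1000)
    samples = []
    for i in range(n):
        gain = 1.0
        if i < n_ramp:                       # onset ramp
            gain = 0.5 * (1 - math.cos(math.pi * i / n_ramp))
        elif i >= n - n_ramp:                # offset ramp
            gain = 0.5 * (1 - math.cos(math.pi * (n - 1 - i) / n_ramp))
        samples.append(gain * math.sin(2 * math.pi * freq * i / sr))
    return samples


def cr_stimulus(pip_ms, freqs, rng, pre_ms=840, post_ms=600):
    """Constant segment followed by a random pip sequence (CR pattern).
    Frequencies are drawn without the balancing constraints of the study."""
    signal = tone_pip(rng.choice(freqs), pre_ms)
    for _ in range(post_ms // pip_ms):
        signal.extend(tone_pip(rng.choice(freqs), pip_ms))
    return signal


freqs = frequency_set()
stim = cr_stimulus(15, freqs, random.Random(0))
print(len(stim) / SAMPLE_RATE)  # stimulus duration in seconds
```

An RC stimulus would follow the same pattern with the segment order reversed (840 ms of pips, then a 600 ms constant tone).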
We generated 40 signals for each of the four patterns (C, R, CR, RC). Frequencies in the R segments were randomly drawn with two constraints: the change in frequency at the transition at 840 ms postonset had to be at least 20% to make it sufficiently perceptually salient and, at each instant, all of the frequencies had to occur the same number of times (twice) to ensure balanced summation of MEG responses over epochs. MEG responses are known to be frequency-dependent, and an imbalance in frequency content might have created artifactual differences between conditions. CR and RC stimuli were initially created as mirror images of each other, and then trimmed to the required duration. By chance, two consecutive pips could share the same frequency; this occurred at a rate of ∼5%.
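One way to satisfy the balancing constraint is to shuffle, for every pip position, a pool in which each frequency appears the same number of times. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, the 20% criterion is interpreted as a relative change with respect to the pre-transition frequency, and how the two constraints were actually combined is not specified in the text.

```python
import random


def balanced_pip_frequencies(n_signals, n_pips, freqs, rng):
    """Return table[signal][pip] such that, at every pip position, each
    frequency occurs equally often across the n_signals signals."""
    reps, rem = divmod(n_signals, len(freqs))
    if rem:
        raise ValueError("n_signals must be a multiple of len(freqs)")
    table = [[None] * n_pips for _ in range(n_signals)]
    for p in range(n_pips):
        column = list(freqs) * reps   # e.g. 20 frequencies x 2 = 40 entries
        rng.shuffle(column)
        for s in range(n_signals):
            table[s][p] = column[s]
    return table


def salient_transition(f_before, f_after, min_change=0.20):
    """True if the relative frequency change is at least min_change
    (assumed definition: |f_after - f_before| / f_before)."""
    return abs(f_after - f_before) / f_before >= min_change


table = balanced_pip_frequencies(40, 10, list(range(1, 21)), random.Random(1))
first_column = [row[0] for row in table]
print(sorted(first_column) == sorted(list(range(1, 21)) * 2))
```

With 20 log-spaced values between 222 and 2000 Hz, adjacent frequencies differ by roughly 12%, so under this definition the 20% criterion rules out transitions to the same or a neighboring frequency.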
Pip durations (15, 30, and 60 ms) were presented by blocks in the main experiment. In each of the three blocks, subjects heard 120 repetitions of every one of the four patterns. The order of blocks was balanced between subjects (Latin-square design). Within each block, the order of presentation was randomized. The interstimulus interval (ISI) was randomized between 600 and 1400 ms.
In addition to these tonal stimuli for which MEG responses were recorded, the stimulus set included a proportion (33%, or 240 per block) of wide-band noise bursts of 200 ms duration with 10 ms raised-cosine onset and offset ramps. Subjects were instructed to detect these decoy stimuli. This ensured that subjects remained vigilant and attentive to the auditory modality, but the task did not involve any processing of the tonal changes that were the focus of our study. This choice of task was motivated by the desire to ensure that MEG responses probe low-level auditory processes and not higher-level processes engaged by the task. The noise burst-detection task may bias the neural activity to detect white noise, but presumably this bias is equal across all conditions (CR, RC), which may not be the case if the task were an explicit change-detection task.
The stimuli were created off-line and saved in 16-bit stereo wave format at a sampling rate of 44 kHz. The signals were delivered to the subjects' ears with tubephones (E-A-RTONE 3A 50 ohm; Etymotic Research, Elk Grove Village, IL) attached to E-A-RLINK foam plugs inserted into the ear canal and presented at a comfortable listening level.
A follow-up experiment used six additional participants and included 60 and 120 ms pip durations as well as a constant-to-constant (CC) condition, where the pretransition and post-transition segments were of constant, but different, frequencies (see Fig. 6). Stimulus generation and procedure were identical to the main experiment.
Subjects were naive as to the real purpose of the investigation (in order that they not focus special attention on tonal transitions). They lay supine inside a magnetically shielded room. Before the experiment, subjects listened to 200 repetitions of a 1 kHz 50 ms sinusoidal tone (ISI randomized between 750 and 1550 ms). Responses to these tones were used to verify that signals from the auditory cortex had a satisfactory signal-to-noise ratio, that the subject was positioned properly in the machine, and to determine which MEG channels best respond to activity within the auditory cortex. In the experiment proper (∼1.5 h), subjects listened to stimuli while performing the noise burst-detection task as described above. They were instructed to respond by pressing a button, held in the right hand, as soon as a noise burst appeared. The instructions encouraged speed and accuracy. Stimulus presentation was divided into runs of 160 stimuli. Between runs, subjects were allowed a short rest, but were required to stay still.
Neuromagnetic recording and data analysis.
Methods and analysis are described in more detail by Chait et al. (2005). The magnetic signals were recorded using a 160-channel, whole-head axial gradiometer system (KIT, Kanazawa, Japan). Data for the pre-experiment were acquired with a sampling rate of 1 kHz, filtered online between 1 Hz (hardware filter) and 58.8 Hz (17 ms moving average filter), stored in 500 ms stimulus-related epochs starting 100 ms preonset, and baseline-corrected to the 100 ms preonset interval. Data for the main (change detection) experiment were acquired continuously with a sampling rate of 0.5 kHz, filtered in hardware between 1 and 200 Hz with a notch at 60 Hz (to remove line noise), and stored for later analysis. Effects of environmental magnetic fields were reduced based on several sensors distant from the head using the CALM algorithm (Adachi et al., 2001) and responses were then smoothed by convolution with a 39 ms Hanning window (cut-off, 55 Hz). These are standard signal-processing methods; additional processing is described below.
Auditory evoked responses to the 1 kHz pure tone sequence were examined and the M100 onset response identified. The M100 is a prominent deflection at ∼100 ms postonset, robust across listeners and stimuli, and has been the most investigated auditory MEG response (Roberts et al., 2000) (for review of the electric N1 response, see Näätänen and Picton, 1987). It was identified for each subject as a dipole-like pattern (i.e., a source/sink pair) in the magnetic field contour plots distributed over the temporal region of each hemisphere. In previous studies, under the same conditions, the M100 response was attributed to a current source localized to the upper banks of the superior temporal gyrus in both hemispheres (Hari, 1990; Pantev et al., 1995; Lütkenhöner and Steinsträter, 1998). For each subject, the 20 strongest channels at the peak of the M100 (five in each sink and source, yielding 10 in each hemisphere) were chosen for the analysis of the experimental data.
Stimulus-evoked magnetic fields, measured outside the head by MEG, are generated by synchronous neuronal currents flowing in tens of thousands of cortical pyramidal cells on the supratemporal gyrus (Hämäläinen et al., 1993). This electromagnetic fluctuation can be modeled as resulting from a magnetic dipole source characterized by position, orientation, and strength. Because of the location of the source inside a cortical fold, responses from the auditory cortex are characterized by source/sink pairs that are antisymmetric across the two hemispheres. We use two-dimensional contour maps to display this information.
Two measures of the dynamics of the brain response are reported: the time course of the root mean square (rms) over the selected channels, reflecting the instantaneous amplitude of neural responses, and the spatial distributions of the magnetic field (contour plots) over all channels, sampled at certain times postonset. For illustration purposes, we plot the rms of the grand-averaged response (average over all subjects), but statistical analysis is always performed on a subject-by-subject, hemisphere-by-hemisphere basis using the rms over the 10 channels chosen for each subject in each hemisphere.
In the main experiment, 1400 ms epochs (including 200 ms preonset) were created for each of the 12 stimulus conditions (three pip durations by four patterns). Epochs with amplitudes larger than 3 pT (∼5%) were considered artifactual and discarded. The rest were averaged, low-pass filtered at 30 Hz (67-point-wide Hanning window), and baseline-corrected to the preonset interval. In each hemisphere, the rms of the field strength across the 10 channels, selected in the pre-experiment, was calculated for each sample point. Twenty-four rms time series, one for each condition in each hemisphere, were thus created for each subject.
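The per-subject processing chain described above (artifact rejection, averaging, baseline correction, rms across channels) can be sketched as below. This is a simplified illustration rather than the authors' pipeline: the function names are hypothetical, epochs are assumed to be stored as lists of channels by samples in tesla, the rejection criterion is assumed to apply to the absolute amplitude, and the 30 Hz low-pass filtering step is omitted.

```python
import math


def reject_artifacts(epochs, threshold=3e-12):
    """Keep epochs whose absolute amplitude never exceeds threshold
    (3 pT by default). epochs[e][channel][sample] is in tesla."""
    return [ep for ep in epochs
            if all(abs(v) <= threshold for ch in ep for v in ch)]


def average_epochs(epochs):
    """Average across epochs, separately for each channel and sample."""
    n = len(epochs)
    n_ch, n_t = len(epochs[0]), len(epochs[0][0])
    return [[sum(ep[c][t] for ep in epochs) / n for t in range(n_t)]
            for c in range(n_ch)]


def baseline_correct(avg, n_pre):
    """Subtract each channel's mean over the first n_pre (preonset) samples."""
    corrected = []
    for ch in avg:
        base = sum(ch[:n_pre]) / n_pre
        corrected.append([v - base for v in ch])
    return corrected


def rms_over_channels(avg):
    """Root mean square across channels at each sample point."""
    n_ch = len(avg)
    return [math.sqrt(sum(ch[t] ** 2 for ch in avg) / n_ch)
            for t in range(len(avg[0]))]


# Toy data: 3 epochs, 2 channels, 2 samples; the third epoch exceeds 3 pT.
epochs = [
    [[1e-13, 3e-13], [-1e-13, -3e-13]],
    [[1e-13, 5e-13], [-1e-13, -5e-13]],
    [[0.0, 9e-12], [0.0, 0.0]],
]
kept = reject_artifacts(epochs)
rms = rms_over_channels(baseline_correct(average_epochs(kept), n_pre=1))
print(len(kept), rms)
```

Because source and sink channels have opposite signs, taking the rms rather than the mean across channels keeps them from canceling.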
To evaluate congruity across subjects, the individual rms time series were combined into 24 group rms (rms of individual rmss) time series. Consistency of peaks in each group rms was automatically assessed with the Bootstrap method (Efron and Tibshirani, 1993) (500 iterations; balanced). The consistency across subjects of magnetic field distributions at those peaks was assessed automatically by dividing the 20 channels chosen for each subject into four sets (five channels each): left temporofrontal, left posterior-temporal, right temporofrontal, and right posterior-temporal (see Fig. 2). For each set, the activation was averaged over a 20 ms window defined around the group-rms peak, and the set was classified as either a “sink” (negative average amplitude) or a “source” (positive average amplitude). If the majority of subjects showed the same sink–source configuration, the pattern was considered consistent across subjects.
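The group-level steps described above can be sketched in simplified form. The code assumes hypothetical function names and data layouts; the text does not say exactly how the bootstrap was applied to the peaks, so only the balanced resampling itself is shown (each subject appears equally often over all resamples). Treating an average of exactly zero as a "sink" is an arbitrary choice made here.

```python
import math
import random
from collections import Counter


def group_rms(subject_rms):
    """rms across subjects of the individual rms time series."""
    n = len(subject_rms)
    return [math.sqrt(sum(s[t] ** 2 for s in subject_rms) / n)
            for t in range(len(subject_rms[0]))]


def balanced_bootstrap_indices(n_subjects, n_iter, rng):
    """Balanced bootstrap: shuffle n_iter copies of the subject indices and
    cut them into n_iter resamples, so each subject is used n_iter times."""
    pool = list(range(n_subjects)) * n_iter
    rng.shuffle(pool)
    return [pool[i * n_subjects:(i + 1) * n_subjects] for i in range(n_iter)]


def classify_set(window_means):
    """Label a channel set from its mean amplitude in the peak window."""
    avg = sum(window_means) / len(window_means)
    return "source" if avg > 0 else "sink"


def consistent_pattern(configs):
    """configs: one tuple of four labels per subject (one label per channel
    set). Returns the most common configuration and whether a majority of
    subjects share it."""
    pattern, count = Counter(configs).most_common(1)[0]
    return pattern, count > len(configs) / 2


resamples = balanced_bootstrap_indices(24, 500, random.Random(0))
usage = Counter(i for sample in resamples for i in sample)
print(len(resamples), set(usage.values()))
```

A peak statistic would then be recomputed on each resample (for instance, on the group rms of the resampled subjects) to judge how stable the peak is across subjects.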
Results
Subjects were accurate and fast at performing the decoy task (detecting the noise bursts). The average miss and false-positive counts were 4.5 (SD, 10.2) and 1.5 (SD, 1.3), respectively, of a total of 240 presentations per block. The average response time was 375 ms (SD, 75 ms). None of the measures differed between blocks, implying that subjects maintained a constant vigilance level throughout the experiment.
MEG responses to onsets and transitions
Magnetic waveform and field distribution analyses reveal that all participants had comparable responses. Figure 2 shows the rms (black) of the grand-averaged auditory evoked responses to CR and RC transitions for a pip duration of 15 ms. The appropriate control conditions (C and R, respectively) are plotted in gray. The origin of the time scale coincides with the onset of the signals and the transition occurs at 840 ms after stimulus onset. The evoked MEG activity exhibits a deflection at ∼100 ms after stimulus onset and a later deflection after the transition that begins at ∼900 ms postonset (60 ms post-transition). These two aspects of the response are discussed, in turn, below.
Figure 3 shows onset responses to CR (black) and RC (gray) stimuli for each pip-duration condition. Responses to all conditions had similar dynamics (latency and shape of the deflection) and magnetic field distributions. Specifically, all conditions produced a prominent onset response with a peak at ∼110 ms postonset with a characteristic M100 field distribution (Fig. 2). A repeated-measures ANOVA on amplitude differences between corresponding conditions, with hemisphere and pip duration as factors, revealed only a main effect of pip duration (F(2,46) = 10.648; p < 0.001), stemming from the fact that the difference in amplitude between CR and RC conditions was negatively related to pip duration in both hemispheres. The progressive increase in amplitude difference between corresponding C and R M100 onset responses as pip size decreased is consistent with previous reports of a ∼40 ms temporal window of integration during which stimulus attributes are accumulated in processes leading up to the formation of the M100 peak (Gage and Roberts, 2000; Roberts et al., 2000; Gage et al., 2006).
Unlike onset responses, which are qualitatively similar across conditions, transition responses differ greatly between CR and RC in both temporal dynamics and field distribution (Fig. 2). The transition from a constant tone to a random sequence of tone pips evokes two consecutive deflections, at ∼70 and 150 ms post-transition (Fig. 2A), whereas the first response to the opposite (RC) transition occurs at ∼150 ms post-transition (Fig. 2B). In some subjects it is followed by an additional deflection (visible at ∼1100 ms after stimulus onset) (Fig. 2B) with a dipolar distribution similar to the peaks in Figure 2A, but this feature was not very consistent across subjects.
Magnetic field distributions differ between CR and RC responses: both response peaks in Figure 2A are of opposite polarity from the first response peak in Figure 2B. Following the nomenclature used to describe auditory evoked onset responses, CR transitions evoke prominent M50 (P1 in EEG) and M150 (P2 in EEG) responses, whereas RC transitions evoke no M50 response, the first activation being a late M100 response (N1 in EEG). The sources underlying the two dipolar patterns, disregarding their sign, are too close to be adequately differentiated with the spatial resolution of our recording technique. However, given the origin of the MEG signal, sustained dendritic currents in pyramidal neurons (Nunez and Silberstein, 2000), spatial distributions of opposite polarity likely reflect the activation of distinct neural substrates (Lütkenhöner, 2003) (see also Jones, 2002).
Therefore, the response to the RC transition is not only delayed with respect to that of the CR transition, but also involves a different neural population. A similar asymmetry was reported previously for responses to changes between interaurally correlated and uncorrelated noise (Chait et al., 2005). This is further discussed below.
Dependence on pip duration
An ideal observer can immediately detect the transition in the constant-to-random case, but must wait at least the duration of a pip to detect the opposite transition (Fig. 1). The listener needs that time to distinguish the RC transition from a pip-to-pip transition within the pip train. We therefore expect RC, but not CR, responses to exhibit dependence on pip size. Figure 4A illustrates the response to CR and RC transitions for each pip duration. Indeed, for RC, the amplitudes (Fig. 4C) and latencies (Fig. 5C) exhibit a dependence on pip size, such that deflections occur earlier and with a larger amplitude for shorter pips. A repeated-measures ANOVA on RC peak amplitudes (Fig. 4C), with hemisphere and pip duration as factors, revealed only a main effect of pip duration (F(2,46) = 54.69; p < 0.0001). Planned comparisons showed that amplitude significantly increased with decreasing pip size between 60 and 30 ms in both hemispheres (paired sample t test, df = 23; LH, t = −4.388, p < 0.0001; RH, t = −2.3, p = 0.03) and between 30 and 15 ms in the right hemisphere (t = −6.67; p < 0.0001).
For response latencies, a repeated-measures ANOVA on RC peak latencies (Fig. 5C, light gray bars), with hemisphere and pip duration as factors, revealed only a main effect of pip duration (F(2,46) = 80.06; p < 0.0001). Planned comparisons indicated that latency decreased with decreasing pip size between 60 and 30 ms (paired sample t test, df = 23; LH, t = 4.213, p < 0.0001; RH, t = 5.825; p < 0.0001) and 30 and 15 ms (LH, t = 5.811, p < 0.0001; RH, t = 2.61, p = 0.01) in both hemispheres. Moreover, after subtracting the appropriate pip duration, the latencies were ∼150 ms and did not differ between conditions (Fig. 5C, dark gray bars), and a follow-up experiment with 120 ms tone pips also found the same corrected latency. This indicates that responses scale precisely with stimulus properties, such that the brain response lags the ultimate frequency transition by 150 ms plus one pip size.
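The scaling relation stated above (a lag of about 150 ms plus one pip duration) amounts to a simple additive model. The sketch below only restates that arithmetic; the function names are hypothetical, and the printed values are model predictions from the approximate 150 ms figure, not measured latencies.

```python
BASE_LATENCY_MS = 150  # approximate pip-corrected latency given in the text


def predicted_rc_latency_ms(pip_ms, base_ms=BASE_LATENCY_MS):
    """Predicted RC peak latency after the final frequency transition."""
    return base_ms + pip_ms


def corrected_latency_ms(observed_ms, pip_ms):
    """Latency after subtracting the pip duration, as in the correction
    applied to the RC peak latencies."""
    return observed_ms - pip_ms


for pip in (15, 30, 60, 120):
    print(pip, predicted_rc_latency_ms(pip))
```

Under this model the corrected latency is the same for every pip duration, which is the pattern described for the 15, 30, 60, and 120 ms conditions.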
In contrast to these findings for RC stimuli, for CR transitions, the two successive deflections at ∼70 and 150 ms after change, in both hemispheres, do not differ significantly between pip duration conditions in either amplitude or latency (Figs. 4B, 5A,B).
One aspect of the response to CR stimuli that is dependent on pip duration is the emergence of a third, “M100-like” peak (Fig. 6, II), visible in Figure 4A as a discrete bump between the two major peaks. This peak, with a dipolar distribution similar to that of the stimulus onset response (and of the response to change in RC stimuli), is not prominent for the 15 ms condition, but is visible in the 60 ms condition (still dominated, however, by the much larger M50 and M150 peaks). In Figure 6, we replot the data for the 60 and 15 ms conditions compared with the response evoked by a constant-to-constant stimulus (a transition between two constant-frequency tones; green) acquired in a follow-up study. Because the number of participants in the main and follow-up studies is substantially different, response amplitudes and latencies are not directly comparable. But what is clear from Figure 6 is that, in terms of response dynamics, as pip duration increases, responses to constant-to-random stimuli resemble responses to constant-to-constant transitions. The amplitude difference between the 60 and 15 ms conditions parallels the differences seen at onset (Fig. 3).
Discussion
The main finding of this study is a fundamental asymmetry in the auditory cortical responses to acoustically symmetric transitions between “random” and “constant” signals (Fig. 2). The first observed peak in the constant-to-random transition occurred ∼80 ms earlier than the first observed peak in the reverse transition, and with an opposite dipolar distribution that indicates distinct underlying neural machinery (Lütkenhöner, 2003). Closely parallel response characteristics were found in another study (Chait et al., 2005) measuring MEG responses evoked by transitions between noise stimuli that were interaurally correlated (identical at the two ears) or uncorrelated (different signals at the two ears) (Fig. 7). The stimuli of the two studies differ acoustically (narrowband vs broadband, monaural vs binaural, stationary vs fluctuating), and the perceptual attributes of the change are accordingly different. However, they share a common abstract characteristic: a transition between disorder (or random fluctuation) and order (or constancy). Binaural disparity is hypothesized to be centrally represented as a form of subtraction of the signals at the two ears (Durlach, 1963). For interaurally correlated noise, the result of this process is a constant value of zero, whereas uncorrelated noise results in a randomly fluctuating value. Detection of a transition from correlated to uncorrelated noise can occur rapidly, but detection of the opposite transition requires some temporal integration to distinguish between the “zero” characteristic of the correlated state and a spurious zero value caused by random fluctuations in the uncorrelated state. The temporal integration requirement in the binaural case resembles that discussed for our tone-sequence stimuli, and the similarity of responses between studies suggests that identical mechanisms might be at work in both cases.
In particular, the notable resemblance of responses between studies excludes explanations in terms of particular stimulus features, suggesting instead that both experiments tap the same “edge detection” computation: identical cortical processing signatures to temporal edges that share the same abstract statistical properties.
The temporal-edge response that we characterize appears to be distinct from the mismatch-negativity (MMN) response derived from change-detection paradigms commonly used to study adaptation to stimulus environment and novelty detection (for review, see Polich, 2003). In MMN paradigms, the occurrence of rare, deviant stimuli inserted in a sequence of repeated “standard” stimuli elicits a brain potential that peaks between 150 and 250 ms after deviant onset (Näätänen et al., 1978; Kujala and Näätänen, 2003). The MMN is thought to reflect a discrepancy between the memory trace, or expectations generated by the standard stimulus, and the new, deviant information (Näätänen, 1992; Sams et al., 1993) or processes that update the internal representation when a previously registered regularity is violated (Winkler et al., 1996; Winkler, 2003). MMN experiments are usually conducted with sequences of stimuli separated by silent intervals, for which the notion of “event” (standard or deviant) is straightforward. However, ecological scenes are rarely so neatly segmented; changes are superimposed on the ongoing waveform that enters the ears and a listener thus needs a mechanism for detecting the boundaries of events within an ongoing stimulus. It is such a mechanism that we believe we are tapping into with these experiments. Our paradigm thus targets a stage of neural processing that logically (and temporally) precedes the novelty-detection mechanisms probed in standard MMN designs. Indeed, the earliest transition-related responses that we observed occurred with significantly shorter latencies than typical MMN responses, and with temporal/morphological asymmetries that are not found in typical MMN studies. It is quite possible, however, that some generators are common to the two types of responses (e.g., the later peaks observed here) (see also Jones, 2002).
What computations do these responses reflect? The early deflections observed at ∼70 ms post-transition in the constant-to-random (or correlated-to-uncorrelated) conditions may reflect the output of a “minimal integration window,” representing the integration that occurs at subcortical and early cortical processing stages (the latency is thus a measure of the minimum duration the system takes to respond to an apparent change). In contrast, the random-to-constant (or uncorrelated-to-correlated) conditions require longer integration and the different surface distribution maps suggest that this is provided by a distinct neuronal population. The amount of additional temporal integration (and, thus, the response latency) depends on stimulus statistics (see below). Put differently, this process reflects the acquisition (for RC) or discharge (for CR) of a representation of stimulus regularity. Indeed, responses to CC transitions have characteristics that resemble a combination of both (Fig. 6). The first response at ∼70 ms might reflect detection of a deviation from the previous “constant” representation, and that at ∼100 ms, the construction of a representation of the new constant signal. It is interesting to note that the second deflection in the CC transition (Fig. 6, II), which resembles the response that characterizes the RC transitions, occurs ∼50 ms earlier than the pip-duration-adjusted latency of the latter. This is possibly attributable to facilitatory (“resetting”) effects provided by the mechanisms underlying the first (∼70 ms after change) peak in the CC transition, which is absent in RC responses.
How adjustable are these integration mechanisms? As noted above, an observer presented with an RC stimulus requires a certain amount of time after the transition to discriminate it from a transition between pips. For equal-duration pips, this time is at least one pip duration. Importantly, the latencies of early auditory cortical responses observed by MEG in fact reflect this adjustment accurately, even when the parameter is not directly behaviorally relevant (subjects are performing a task that does not require the processing of transitions in the stimuli or adjustment to stimulus pip duration). Our choice of pip durations in the 15–60 ms range was intended to bracket a value (30 ms) that has been proposed as the size of a putative cortical short-term integration window (Poeppel, 2003; Wang et al., 2003). We expected responses to 15 and 30 ms stimuli to exhibit a similar response pattern, distinct from the 60 ms condition. Instead, we found a regular progression of response latencies from 15 to 30 to 60 ms, and to 120 ms in the follow-up control. It is therefore necessary to use smaller and larger pip durations to investigate the limits of this cortical adjustment or to interleave the pip durations.
The stimuli used here are simple examples of order/disorder. One may ask whether the relevant characteristic of R stimuli (as opposed to C) is randomness, or mere temporal fluctuation. We addressed the question in an additional study (our unpublished observation) that showed the same response patterns when the constant tone was replaced by a regularly alternating pip sequence (Fig. 8), suggesting that the important factor is indeed a transition between regularity and irregularity, and not simply between constant and varying signals. It is also not possible to rule out the alternative that R stimuli cover a wider spectral range than C. However, the fact that we detect entirely parallel response asymmetries to transitions in interaurally correlated wideband noise makes an explanation in terms of differential spectral content unlikely.
The current paradigm, based on measuring responses to transitions within an ongoing stimulus, is not in itself new. Similar stimulus configurations, a test signal possessing a feature of interest, immediately preceded by a baseline signal whose properties match those of the test signal except for the experimentally relevant dimension, are increasingly used in the literature (Lavikainen et al., 1995; Kaernbach et al., 1998; Martin and Boothroyd, 2000; Jones and Perez, 2002; Krumbholz et al., 2003; Gutschalk et al., 2004; Chait et al., 2006). Responses to these transitions tend to be interpreted as reflecting processing specific to the test feature, distinct from the generic stimulus energy-onset response. The present study introduces a different approach to the interpretation of these transition responses as reflecting the processing of abstract “auditory temporal edges.” Indeed, previous MEG studies measuring responses to disorder/order transitions between irregular and regular click trains (Gutschalk et al., 2004) and between white noise and iterated rippled noise (Krumbholz et al., 2003; Rupp et al., 2005) report responses that share the asymmetry characteristics of those observed here. These responses have been interpreted by the authors as related to pitch processing mechanisms because all other aspects of the stimuli (spectral content, energy) are not altered in the transition. However, the fact that they are similar to responses evoked by our stimuli indicates that they may not be specific to pitch processing mechanisms per se, but to a process that handles transitions between states that differ along a more abstract dimension (e.g., degree of regularity or order).
Do the asymmetric brain responses correlate with behavioral asymmetries? There is some evidence (Chait et al., 2005) that this is the case; however, the exact relationship between recorded MEG responses and behavior should be elaborated on in future studies. These investigations may provide a key as to the aspects of stimulus statistics to which listeners are sensitive (Yost et al., 2005) as well as illuminate the dimensions of auditory signals that are relevant for the construction of perceptual representations.
This work was supported by National Institutes of Health Grant R01DC05660 (M.C., D.P.). We are grateful to Jeff Walker for excellent MEG technical support.
- Correspondence should be addressed to Maria Chait, Equipe Audition, Département d'Etudes Cognitives, École Normale Supérieure, 29 rue d'Ulm, Paris 75230, France.