I Heard That Coming: Event-Related Potential Evidence for Stimulus-Driven Prediction in the Auditory System

The auditory system has been shown to detect predictability in a tone sequence, but does it use the extracted regularities for actually predicting the continuation of the sequence? The present study sought to find evidence for the generation of such predictions. Predictability was manipulated in an isochronous series of tones in which every other tone was a repetition of its predecessor. The existence of predictions was probed by occasionally omitting either the first (unpredictable) or the second (predictable) tone of a same-frequency tone pair. Event-related electrical brain activity elicited by the omission of an unpredictable tone differed from the response to the actual tone right from the tone onset. In contrast, early electrical brain activity elicited by the omission of a predictable tone was quite similar to the response to the actual tone. This suggests that the auditory system preactivates the neural circuits for expected input, using sequential predictions to specifically prepare for future acoustic events.


Introduction
In every natural environment, part of the acoustic information is lost due to overlaps between concurrent sounds. Fortunately, our auditory system is able to fill in such gaps, as exemplified in phoneme restoration (Warren, 1970) and the continuity illusion (Miller and Licklider, 1950) (cf. review by Bregman, 1990). However, it is unclear whether these are prospective or retrospective phenomena. The prospective account states that the system extrapolates from the information before the gap (i.e., derives a prediction about what should come), whereas the retrospective account suggests that the system reconsiders all available information after the continuation of the sound (i.e., restores the missing piece after detecting the gap).
On grounds of behavioral data, Bregman (1990) argued that the retrospective account is more reasonable. Yet recently, the idea that the auditory system works in a prospective, predictive manner has attracted much interest (Winkler et al., 1996; Näätänen and Winkler, 1999; Baldeweg, 2006; Denham and Winkler, 2006; Zanto et al., 2006; Grimm and Schröger, 2007; Schröger, 2007; Winkler, 2007; Dubnov, 2008). Current theories suggest that the auditory system constantly predicts what will come next in a sequence of tones. Such predictions would be beneficial not only for dealing with missing information, but also for the efficient processing of any upcoming stimulus that meets the predictions (Sinkkonen, 1999).
Arguments for the predictive account are mostly based on the fact that the auditory system detects violations in predictable sound sequences [as illustrated by the elicitation of the mismatch negativity (MMN) event-related potential (ERP) (Näätänen et al., 1978; Kujala et al., 2007; Näätänen et al., 2007)]. A reasonable explanation of this finding is that the auditory system extracts regularities from the input and compares new input with predictions derived from these regularities (Winkler, 2007). It is, however, equally possible to posit a retrospective explanation by suggesting that the system attempts to match each stimulus to the preceding sequence only after it has encountered the stimulus. This idea of recalculation may seem uneconomical, yet it cannot be ruled out on the basis of previous studies.
The present study was designed to distinguish between the prospective and retrospective accounts. It is based on a comparison of ERPs elicited by tone omissions in sequences manipulating the predictability of the omitted tone. The sequence provides information as to which tone was omitted either before (predictable condition) or after (restorable condition) the omission, or else it provides no such information (control condition). Finding that the ERP response elicited by omissions in the predictable condition differs from that obtained in the other two conditions would support the hypothesis of the predictive nature of the auditory system, because a specific prediction about the upcoming tone can only be formed in the predictable condition. If, on the other hand, the auditory system works in a retrospective manner, similar omission ERP responses should be elicited in the predictable and restorable conditions, both differing from that obtained in the control condition, in which neither prediction nor restoration of the missing tone is possible.

Materials and Methods
Participants. Fourteen healthy volunteers (10 male, 4 left-handed; mean age 21.9 years) participated in the experiment. All participants had frequency thresholds not higher than 20 dB SPL in the 250-4000 Hz range and no threshold difference exceeding 10 dB between the two ears (assessed with a Mediroll, SA-5 audiometer). None of the participants were taking any medication affecting the CNS. Before the beginning of the experiment, written informed consent was obtained from each participant according to the Declaration of Helsinki after experimental procedures and aims were explained to them. The study was approved by the Ethical Committee of the Institute for Psychology, Hungarian Academy of Sciences.
Apparatus and stimuli. Participants were seated in an acoustically shielded chamber at the Institute for Psychology, Hungarian Academy of Sciences. A computer screen was placed in front of them at a distance of 100 cm. Sinusoidal tones with an intensity of ~60 dB sensation level (above hearing threshold, adjusted individually for each participant) were presented binaurally via headphones in a continuous series with a stimulus-onset asynchrony (SOA) of 150 ms.
The duration of each tone was 50 ms (including 5 ms rise and 5 ms fall times). Tone frequencies were chosen from a set with 5% steps in the range of 400-1500 Hz. In the control condition (Fig. 1), the frequency of each tone was chosen randomly with equal probability within the set. In the restorable and predictable conditions, tones were presented in pairs, i.e., each frequency was repeated once before the next random choice of frequency. Note that the pair structure was induced solely by the frequency repetition, as tones were presented in an isochronous manner.
In each condition, 10% of the tones were replaced by silence. In the control condition, these omissions occurred quasirandomly with the restriction of at least four tones being delivered between two successive tone omissions. Additional restrictions were imposed in the restorable condition, where the omitted tone was always the first one of the pairs, and in the predictable condition, where the omission always occurred on the second tone of the pairs.
For each condition, 1350 stimuli and 150 omissions were presented. Stimulation was randomized individually for each participant (see Fig. 1 for stimulus sequence examples of the three conditions).
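The sequence constraints described above can be sketched in code. This is a minimal illustration, not the stimulation software used in the study. The function names, the reading of the 5% steps as multiplicative steps starting at 400 Hz, and the way omission positions are drawn are all assumptions. The four-tone minimum spacing is stated for the control condition; applying it to the paired conditions is a further assumption, as is allowing successive random choices to land on the same frequency.

```python
import random


def frequency_set(f_min=400.0, f_max=1500.0, step=0.05):
    """Frequencies from f_min upward in multiplicative steps (assumed reading of '5% steps')."""
    freqs, f = [], f_min
    while f <= f_max:
        freqs.append(f)
        f *= 1.0 + step
    return freqs


def spaced_positions(n_slots, n_picks, min_distance, rng):
    """Draw n_picks sorted indices in range(n_slots), neighbours at least min_distance apart."""
    slack = n_slots - (n_picks - 1) * (min_distance - 1)
    base = sorted(rng.sample(range(slack), n_picks))
    # Shifting the i-th pick by i * (min_distance - 1) enforces the spacing.
    return [b + i * (min_distance - 1) for i, b in enumerate(base)]


def make_sequence(condition, n_slots=1500, n_omissions=150, seed=None):
    """Return a list of tone frequencies in Hz, with None marking an omitted tone."""
    rng = random.Random(seed)
    freqs = frequency_set()
    if condition == "control":
        tones = [rng.choice(freqs) for _ in range(n_slots)]
        # Index distance of 5 leaves at least four tones between two omissions.
        omitted = spaced_positions(n_slots, n_omissions, 5, rng)
    elif condition in ("restorable", "predictable"):
        n_pairs = n_slots // 2
        tones = [f for f in (rng.choice(freqs) for _ in range(n_pairs)) for _ in range(2)]
        # Restorable: omit the first tone of a pair; predictable: omit the second.
        offset = 0 if condition == "restorable" else 1
        # Pair distance of 3 leaves at least five tones between two omissions.
        pairs = spaced_positions(n_pairs, n_omissions, 3, rng)
        omitted = [2 * p + offset for p in pairs]
    else:
        raise ValueError(f"unknown condition: {condition}")
    for pos in omitted:
        tones[pos] = None
    return tones
```

With 1500 slots (1350 tones plus 150 omissions) at an SOA of 150 ms, one block lasts 225 s, which matches the 3.75 min block duration given in the Procedure.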
Procedure. Participants watched a silent, subtitled movie and were instructed to ignore the tones. Each condition was administered in one stimulus block of 3.75 min duration. Condition order was randomized separately for each participant. The experiment was administered conjointly with another ERP and a behavioral study, both of which are to be reported elsewhere.
Data recording and analysis. EEG was continuously recorded with Ag/AgCl electrodes placed at Fz, Cz, Pz, F3, F4, C3, and C4 according to the international 10-20 system (Jasper, 1958). Additional electrodes were placed at the tip of the nose, which served as a reference, and at the left and right mastoid sites (LM, RM). Eye movements were monitored by electrodes placed above and below the left eye and at the outer canthi of both eyes, which were bipolarized off-line to yield vertical and horizontal electro-ocular activity (EOG), respectively. EEG and EOG signals were amplified (0-40 Hz) by NuAmps amplifiers (Neuroscan), sampled at 250 Hz, and filtered off-line using a 1-20 Hz bandpass filter.
For each trial, epochs of 250 ms duration including a 50 ms prestimulus baseline were averaged with reference to stimulus onset (or "expected" stimulus onset in case of omissions) to form ERPs. Later latency ranges were not analyzed to avoid the confounding overlap from ERP responses elicited by the tone immediately following the omission. Epochs with amplitude changes exceeding 100 µV on any channel were rejected from averaging, which led to the exclusion of 12.8% of the stimuli on average.
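The rejection and averaging step can be illustrated with a short sketch. It is not the analysis pipeline used in the study, and it rests on several assumptions: "amplitude changes" is read as the peak-to-peak range within an epoch, the prestimulus baseline mean is subtracted from each epoch before averaging, and epochs are represented as hypothetical dictionaries mapping channel names to lists of microvolt samples. Cutting the epochs out of the continuous recording is left out, because at 250 Hz a 50 ms baseline is not a whole number of samples (12.5) and the rounding rule is not stated. Plain lists keep the example free of dependencies.

```python
def peak_to_peak(samples):
    """Largest amplitude change within one channel of one epoch."""
    return max(samples) - min(samples)


def is_clean(epoch, limit_uv=100.0):
    """True if no channel of the epoch exceeds the peak-to-peak limit (in µV)."""
    return all(peak_to_peak(ch) <= limit_uv for ch in epoch.values())


def baseline_correct(samples, n_baseline):
    """Subtract the mean of the first n_baseline (prestimulus) samples."""
    baseline = sum(samples[:n_baseline]) / n_baseline
    return [s - baseline for s in samples]


def average_erp(epochs, channel, n_baseline, limit_uv=100.0):
    """Average the clean epochs of one channel; returns (erp, number_of_epochs_kept)."""
    kept = [
        baseline_correct(epoch[channel], n_baseline)
        for epoch in epochs
        if is_clean(epoch, limit_uv)  # an artifact on ANY channel rejects the epoch
    ]
    if not kept:
        raise ValueError("all epochs were rejected")
    erp = [sum(column) / len(kept) for column in zip(*kept)]
    return erp, len(kept)
```

An epoch is dropped for every channel as soon as one channel crosses the limit, which mirrors the "on any channel" wording of the rejection criterion.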
Epochs for tones and omissions were separately averaged for the three conditions (predictable, restorable, control) with the first two tones following an omission excluded from the averaging. As predictive processing should have the strongest effect immediately after the expected onset of the tone, following visual inspection, average ERP amplitudes were measured at F3, Fz, and F4 in the interval of 10-50 ms after stimulus onset. Amplitudes were compared across conditions using a repeated-measures ANOVA with the factors condition (three levels: predictable, restorable, control) and electrode (three levels: F3, Fz, F4). Post hoc tests were performed with the Bonferroni correction of the confidence level for multiple comparisons. The Greenhouse-Geisser correction (Greenhouse and Geisser, 1959) was applied when the assumption of sphericity was violated. Significant ANOVA effects are reported with the partial η² effect size measure.
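For readers who want to check reported effect sizes, partial η² can be recovered from an F value and its degrees of freedom. The identity follows from the definitions: partial η² is SS_effect / (SS_effect + SS_error), and F is (SS_effect / df_effect) / (SS_error / df_error). The function name below is hypothetical, and the sketch is offered only as a consistency check, not as the software used for the analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    # SS_effect / SS_error equals F * df_effect / df_error, so the ratio below
    # is SS_effect / (SS_effect + SS_error).
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

For example, the condition effect on the omission ERPs, F(2,26) = 6.776, gives a value of about 0.343, in agreement with the effect size reported in the Results. A Greenhouse-Geisser correction scales both degrees of freedom by the same factor and therefore leaves this ratio unchanged.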
To control for context influences on processing the omissions (i.e., preceding frequency repetition vs change in the restorable vs predictable conditions), difference waveforms were calculated by subtracting from the ERPs elicited by omissions those elicited by tones delivered in the corresponding position (the first tone of the pairs in the restorable and the second tone of the pairs in the predictable condition). Because difference waveforms appeared to be modulated by condition through a longer period of time, the difference ERP amplitudes were measured in an interval of 10-100 ms after stimulus onset and tested in an ANOVA of the same structure as was done for the omission responses.
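The subtraction and the window measurement can be sketched as follows. The function names are hypothetical, the ERPs are assumed to be equal-length lists of amplitudes with a matching list of sample times in milliseconds relative to (expected) stimulus onset, and treating both window edges as inclusive is an assumption, since the handling of edge samples is not specified.

```python
def difference_wave(omission_erp, tone_erp):
    """Omission-minus-tone difference waveform, sample by sample."""
    if len(omission_erp) != len(tone_erp):
        raise ValueError("ERPs must have the same number of samples")
    return [o - t for o, t in zip(omission_erp, tone_erp)]


def mean_amplitude(erp, times_ms, start_ms, end_ms):
    """Mean amplitude over samples with start_ms <= time <= end_ms (edges inclusive)."""
    if len(erp) != len(times_ms):
        raise ValueError("erp and times_ms must have the same length")
    window = [v for v, t in zip(erp, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the window")
    return sum(window) / len(window)
```

Under these assumptions, the omission measure corresponds to `mean_amplitude(erp, times_ms, 10, 50)` and the difference measure to `mean_amplitude(difference_wave(omission_erp, tone_erp), times_ms, 10, 100)`, computed per participant, condition, and electrode before entering the ANOVA.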

Results
The ERPs elicited by tones (Fig. 2, top row) showed only a prominent P1 response with clear polarity reversal at the mastoid leads. The ERPs were very similar across conditions as confirmed by a nonsignificant influence of the condition factor on the ERP amplitudes (F(2,26) = 0.235, p = 0.792). In contrast, the ERPs elicited by the omissions (Fig. 2, middle row) differed between conditions in the analysis window of 10-50 ms after the expected stimulus onset (F(2,26) = 6.776, p < 0.01, η² = 0.343). Post hoc comparisons revealed that this was due to more positive amplitudes elicited in the predictable condition than in the restorable and control conditions (both p values < 0.05), whereas amplitudes in the latter two conditions did not significantly differ from each other (p = 1.000). Neither tone nor omission ERP amplitudes were influenced by electrode (tones, F(2,26) = 2.243, p = 0.126; omissions, F(2,26) = 0.008, p = 0.992), nor was an interaction between the condition and electrode factors observed (tones, F(4,52) = 0.423, p = 0.792; omissions, F(4,52) = 1.131, p = 0.352).
When controlling for the influence of the context preceding the omission by subtracting the ERPs elicited by tones from the omission responses obtained in the same context (Fig. 2, bottom row), a negative peak appeared on the difference waveforms. This negativity was brought about by subtracting the P1 elicited by tones from the omission ERPs, which showed no sign of P1 elicitation. Moreover, the differences between conditions obtained in the omission ERPs were visible in the context-corrected difference waveforms as well, and they continued throughout the P1 latency range. Consequently, in a latency range of 10-100 ms after the expected stimulus onset, difference amplitudes were significantly modulated by condition (F(2,26) = 3.463, p < 0.05, η² = 0.210). A planned contrast analysis showed that this was again due to ERPs in the predictable condition significantly deviating from ERPs in the restorable and control conditions (F(1,13) = 7.047, p < 0.05, η² = 0.352). More specifically, one-sample, two-tailed t tests against zero revealed that the ERPs elicited by tones and by tone omissions did not significantly differ from each other in the predictable condition [t(13) = -0.517, p = 0.614], whereas significant differences between the activity elicited by omissions and the corresponding tones were obtained in the restorable [t(13) = -2.942, p < 0.05] and control [t(13) = -4.846, p < 0.001] conditions. Difference amplitudes were additionally influenced by the electrode factor (F(2,26) = 3.432, p < 0.05, η² = 0.209) (due to more negative amplitudes at Fz than at F3, p < 0.05), but no interaction of condition with electrode was obtained (F(4,52) = 0.645, p = 0.633).
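The one-sample t tests against zero operate on one difference amplitude per participant, so with 14 participants they have 13 degrees of freedom. A minimal sketch of the statistic is given below. The function name is hypothetical, the input is assumed to be a list of per-participant mean difference amplitudes, and the p value is not computed because the Python standard library has no t distribution.

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """t statistic and degrees of freedom for a one-sample t test against popmean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # Standard error of the mean uses the sample standard deviation (n - 1 denominator).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    if standard_error == 0:
        raise ValueError("values have zero variance")
    t_value = (statistics.mean(values) - popmean) / standard_error
    return t_value, n - 1
```

A negative t value indicates that the omission response was more negative than the response to the corresponding tone, which is the direction of all three reported tests.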

Discussion
The present data show a differential brain response to tone omissions depending on whether the omitted tone was fully predictable (predictable condition) or whether the tone sequence only yielded information that "some tone will occur at a given time" (restorable and control conditions). The predictable and restorable conditions were identical except for the position of the omitted sound. Thus the difference between the ERP responses to omissions cannot be explained by differences in global characteristics of the sequence. Moreover, the presence of the predictability effect in the difference waveforms rules out the possible confound of late ERP effects elicited by the tone preceding the omission, since these are eliminated by the subtraction procedure. Therefore, the effect of sequential predictability observed in the present experiment suggests that the auditory system indeed predicts upcoming sounds when the sound input allows specific predictions.
The modulation of the ERPs by sequential predictability occurred in a very early time range, which corresponds to the auditory middle latency responses (Yvert et al., 2001). During the first 50 ms from the expected onset of the tone, the course of the ERP elicited by the omission of a fully predictable tone resembled that elicited by the actual tone presentation. This suggests that the system was set to process the expected tone, and that processing was only interrupted when the omission was detected, possibly shortly before P1 would have been elicited. The difference between conditions extending to ~100 ms suggests that sequential predictability modulated processing even after the omission was detected. Thus the present results strongly support the view that the auditory system continuously attempts to predict the sound input in the immediate future (Baldeweg, 2006; Schröger, 2007; Winkler, 2007; Dubnov, 2008). Note that the current stimulus paradigm and analyses were not optimized toward testing the omission-related MMN, and that the predictability effect was found substantially earlier than the expected latency range of the omission-related MMN [100-150 ms from the onset of the expected tone (cf. Yabe et al., 1997, 1998)]. The current conclusion is supported by the results of Tervaniemi et al. (1994), who likewise observed initial processing similarities between tones and their omissions. Critically, this and other previous studies using tone omissions [e.g., Yabe et al. (1997, 1998, 2001), Shinozaki et al. (2003), and Winkler et al. (2005)] did not manipulate the sequential predictability of the omitted tone. The present study reveals that initial processing similarities are only observed with full predictability of the omitted tone, but not with partial predictability as given in the restorable and control conditions.
Further corroborating evidence was provided by Sussman and Winkler (2001), who found that the second of two successive violations of the same auditory regularity within 150 ms does not elicit MMN when deviations always occur in pairs within the sequence (i.e., the second deviation is fully predicted by the first). However, the second of two successive deviations elicits an MMN when the sequence includes both single and paired deviations (i.e., the first deviance does not fully predict the second one). Another case for effects of the predictive power of the auditory system at the level of middle latency responses has been provided from the corollary-discharge approach (Baess et al., 2009). Baess et al. (2009) found an amplitude attenuation of the middle-latency Pa and Nb components for self-initiated relative to externally generated sounds. These results, as well as similar effects of auditory predictability (Kraemer et al., 2005; Widmann et al., 2007), may, however, have been based on conscious expectations, whereas in the present study predictability was extracted in a stimulus-driven manner as participants had no task related to the auditory stimuli. The present results thus show that the auditory system predicts upcoming sounds by using sequential relations that it has just extracted.
Based on results of a symbol-to-sound matching paradigm, Widmann et al. (2007) proposed that cortical auditory representations of expected sounds are preactivated and then compared against the actual sensory input. Possible neuronal bases for such preactivation could be formed by predictive oscillatory patterns (Engel et al., 2001). Widmann et al.'s (2007) participants performed a matching task between the auditory and visual patterns. However, in studies in which participants did not actively link the auditory and visual stimuli, presenting visual cues before each sound in a sequence failed to modulate the auditory MMN as a function of the predictive value of the visual cue (Ritter et al., 1999; Sussman et al., 2003). Thus it appears that whereas prediction within the auditory modality may be a stimulus-driven process, prediction across different modalities probably requires voluntary processing of cross-modal links.
The early ERP response to tone omission showed only effects of predictability but not of restorability as compared with the response elicited by omitting an unspecified tone (i.e., the control condition, in which the quality of the omitted tone cannot be established even after encountering the first tone following the omission). This does not rule out the possibility that in the restorable condition the missing information was restored after the arrival of the tone immediately following the omission. Therefore, the present results cannot be taken to show that the auditory system would never use restoration for filling gaps. With complex sounds, such as linguistic material, a retroactive strategy may be quite possible (Bregman, 1990). However, the present data show that the predictive strategy is indeed used by the auditory system. This finding fits with predictive elements implemented in modeling approaches of continuity perception (Masuda-Katsuse and Kawahara, 1999) and with animal data showing neuronal activity associated with illusory continuity percepts during noise occlusions (Petkov et al., 2007).
Beyond the specific phenomena of filling gaps, the present study provides electrophysiological evidence for the predictive character of human audition. This is in line with findings on tone repetition (Haenschel et al., 2005; Baldeweg, 2006), deviance detection (Sussman and Winkler, 2001; Paavilainen et al., 2007; Bendixen et al., 2008), and music processing (Kraemer et al., 2005; Zanto et al., 2006; Ladinig et al., 2009; Winkler et al., 2009), and it fits current theoretical accounts of the auditory system (Baldeweg, 2006; Schröger, 2007; Winkler, 2007; Dubnov, 2008). Predictions within the auditory system can also be linked to wider theories on sensory and motor systems (Friston, 2005; Prinz, 2006; Schubotz, 2007). The unfolding of information over time is crucial for the auditory system, yet the system does not wait for this information to occur; it actively generates hypotheses about its environment (Gregory, 1980), as becomes increasingly evident in all sensory domains (Engel et al., 2001; Bar, 2007).