Surviving in a complex and changeable environment relies on the ability to extract probable recurring patterns. Here we report a neurophysiological mechanism for rapid probabilistic learning of a new system of music. Participants listened to different combinations of tones from a previously unheard system of pitches based on the Bohlen-Pierce scale, with chord progressions that form 3:1 ratios in frequency, notably different from 2:1 frequency ratios in existing musical systems. Event-related brain potentials elicited by improbable sounds in the new music system showed emergence over a 1 h period of physiological signatures known to index sound expectation in standard Western music. These indices of expectation learning were eliminated when sound patterns were played equiprobably, and covaried with individual behavioral differences in learning. These results demonstrate that humans use a generalized probability-based perceptual learning mechanism to process novel sound patterns in music.
- probability learning
- auditory perception
- pattern processing
- event-related potentials
- mismatch negativity (MMN)
The brain's ability to perceive sound patterns is necessary for speech and music. Electrophysiological and neuroimaging evidence indicates that while the auditory cortex analyzes sound features such as pitch (Zatorre et al., 1994), the lateral prefrontal cortex further processes stimuli and selects for further actions (Alain et al., 1998; Miller and Cohen, 2001). Sound pattern learning can be measured using various brain signatures. Frequency and harmonicity tuning in the auditory cortex depends on interactions with sounds, as has been observed from song learning in birds (Grace et al., 2003) and noise exposure in rats (Zhang et al., 2002). Violations of simple sound patterns in animals and humans elicit distinct brain signatures including a negative event-related potential (ERP) waveform onsetting 150–210 ms after pattern violation (Näätänen et al., 1982; Näätänen and Alho, 1995; Deouell and Bentin, 1998; Woldorff et al., 1998) termed the mismatch negativity (MMN). The MMN is generated in the superior temporal plane (Näätänen and Alho, 1995; Woldorff et al., 1998), is thought to index echoic memory (Alain et al., 1998; Näätänen et al., 2005), and at a cellular level is dependent on intact functioning of NMDA receptors (Javitt et al., 1996).
Further brain signatures of structural violation are elicited by language and Western music. Language studies have shown that syntactical violations elicit a left-lateralized negativity peaking around 200 ms termed the ELAN (early left anterior negativity) (Hahne and Friederici, 1999). Semantically incongruous words generate the N400, a negative waveform largest central-parietally around 400 ms after word onset (Kutas and Hillyard, 1980; Bentin et al., 1993). In the music domain, violations of Western musical rules or expectations are shown to elicit a negative-polarity ERP at 150–210 ms (Koelsch et al., 2000). This waveform is largest frontally and was originally observed to be right-sided and thus termed the early right anterior negativity (ERAN) (Koelsch et al., 2000), but has been later observed as bilateral and termed the early anterior negativity (EAN) (Loui et al., 2005; Leino et al., 2007). Additionally, violations in Western music elicit a late negativity (LN) or N5 (Koelsch et al., 2000; Loui et al., 2005), a negative-going waveform 400–600 ms after the unexpected chord, largest over prefrontal sites.
While EAN and LN components are elicited by unexpected chords in traditional Western music, nothing is known about neural processing of non-Western music. Thus, the EAN and LN could reflect processing of rules specific to Western music (Leino et al., 2007; Miranda and Ullman, 2007), or more general processing of sound patterns in any system of pitches. Children and adults with no formal musical training show ERPs to Western musical violations (Koelsch et al., 2003; Loui et al., 2005), suggesting either an innate system specialized for musical processing (Peretz, 2002), or a rapid and implicit ability to learn musical patterns based on probabilities (Huron, 2006). If the latter is true, novel and/or non-Western musical systems may elicit similar brain signatures.
We tested the hypothesis that music perception recruits rapid probability learning by measuring electrophysiological responses to a novel, unfamiliar musical system. In the current study, we examined the development of auditory pattern perception and context integration by manipulating the probabilities with which participants heard these novel chord progressions. We tested whether these novel sound patterns elicited brain responses similar to those observed in well-learned Western music. Furthermore, we used these brain signatures to trace the emergence of auditory learning and individual differences in it.
Materials and Methods
Design of a new musical system.
Musical systems around the world are based on the octave, which is a 2:1 ratio in frequency. For instance, the most commonly used Western scale has 12 logarithmically even divisions of the octave, such that the equation for the frequency (F) of each note is as follows: $F = k \times 2^{n/12}$, where n is the number of steps along the scale and k is taken to be the reference point of the scale, usually 440 Hz.
In contrast to the Western musical system, the new musical system used in the present study is based on the Bohlen-Pierce scale, an artificial scale that has pitches recurring at the 3:1 (tritave) rather than 2:1 ratio (octave) (Krumhansl, 1987; Mathews, 1988; Sethares, 2004). The Bohlen-Pierce scale contains 13 logarithmically even divisions of the tritave, and the frequency of each tone in the scale is expressed by the following formula: $F = k \times 3^{n/13}$.
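The two scale formulas differ only in the recurring ratio (2 versus 3) and the number of equal logarithmic divisions (12 versus 13). The following minimal sketch makes that parallel explicit. The function names and the default reference values are illustrative choices, not part of the original stimulus software.

```python
def equal_division_frequency(k: float, n: int, ratio: float, divisions: int) -> float:
    """Frequency n steps above the reference k when `ratio` is split
    into `divisions` logarithmically even steps."""
    return k * ratio ** (n / divisions)


def western_frequency(n: int, k: float = 440.0) -> float:
    """12 equal divisions of the 2:1 octave."""
    return equal_division_frequency(k, n, ratio=2.0, divisions=12)


def bohlen_pierce_frequency(n: int, k: float = 220.0) -> float:
    """13 equal divisions of the 3:1 tritave."""
    return equal_division_frequency(k, n, ratio=3.0, divisions=13)


if __name__ == "__main__":
    # One full tritave of the Bohlen-Pierce scale starting from k = 220 Hz.
    for step in range(14):
        print(step, round(bohlen_pierce_frequency(step), 1))
```

Twelve Western steps double the reference frequency, whereas thirteen Bohlen-Pierce steps triple it.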
In these experiments, three different starting points (k) were used, with k being set to 220 Hz, 289.5 Hz, and 167.2 Hz. These three starting values correspond to n of 10, 0, and 3, three neighboring keys in the Bohlen-Pierce scale (Krumhansl, 1987). The use of three starting values increases variability among the stimuli to ensure that results observed reflected generalized pattern learning, rather than rote memory of a single stimulus. For each set value of k, it was possible to solve for n such that the resultant F values formed tones with frequencies that were approximately related to each other in low-integer ratios. As tones with frequencies that form low-integer ratios (such as the Pythagorean ratios of 3:4:5) are known to sound consonant and relatively pleasant when played together (Kameoka and Kuriyagawa, 1969; Blood and Zatorre, 1999; Sethares, 2004), three tones that approximated ratios of 3:5:7 could be played simultaneously to form chords. Four of these chords were played sequentially to form a chord progression known as the “standard” chord progression. The chords in each progression were chosen such that each chord shared one tone with its predecessor and successive chords contained no large leaps, in accordance with perceptual principles that give rise to voice-leading principles in Western music (Huron, 2001). These three standard chord progressions, each corresponding to one value of k and a set of 12 (3 simultaneous × 4 sequential) values of n (see Fig. 1, left, for an illustration of tone frequencies in the three standard chord progressions), were used as the most common stimulus type.
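To illustrate how one might solve for n so that tones approximate low-integer ratios, the sketch below finds the scale step nearest to a target ratio and reports the mistuning in cents. The helper names are hypothetical. The resulting step offsets follow from the scale formula alone and are not taken from the stimulus list of the study.

```python
import math


def nearest_step(target_ratio: float, base: float = 3.0, divisions: int = 13) -> int:
    """Integer step n for which base**(n/divisions) is closest, on a
    logarithmic scale, to target_ratio."""
    return round(divisions * math.log(target_ratio) / math.log(base))


def cents_error(n: int, target_ratio: float, base: float = 3.0, divisions: int = 13) -> float:
    """Mistuning of step n relative to target_ratio, in cents
    (1200 cents = one 2:1 octave)."""
    actual_ratio = base ** (n / divisions)
    return 1200.0 * math.log2(actual_ratio / target_ratio)


def chord_frequencies(k: float, root_step: int, offsets=(0, 6, 10)) -> list:
    """Three tone frequencies approximating 3:5:7 above a root step.
    The default offsets are the nearest steps to 5/3 and 7/3."""
    return [k * 3.0 ** ((root_step + offset) / 13) for offset in offsets]


if __name__ == "__main__":
    for ratio in (5 / 3, 7 / 3):
        n = nearest_step(ratio)
        print(n, round(cents_error(n, ratio), 2))
```

Under these assumptions the nearest steps to 5/3 and 7/3 are 6 and 10, each mistuned by only a few cents, which is why a 3:5:7 chord is well approximated in this scale.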
To investigate the violation of novel pitch patterns, the “deviant” chord progression was constructed by substituting another three-tone chord, which also obeyed the low-integer ratio of 3:5:7, into the position of the third chord for each of the chord progressions with different starting points. This resulted in another stimulus type of three chord progressions, in contrast to the three standard chord progressions (Fig. 1, middle).
In addition, a third stimulus type was created for each starting point, forming the “fadeout” chord progressions. These stimuli were identical to the standard stimuli, except that one of the four chords in each chord progression was changed in amplitude so as to create a rapid fadeout in volume (Fig. 1, right). The chord which contained the amplitude decrease was randomized. During the experiment, participants were required to respond via button press whenever they heard the volume decrease. This ensured that participants listened attentively to all stimuli, but responded to a feature unrelated to pitch patterns.
All tones were artificially generated sine waves (pure tones); this was to avoid any possible influence of overtones in most instrumental timbres on the perception of the novel scale (Sethares, 2004). The three stimulus types were presented at different probabilities, with standard sounds being presented with 70% probability, deviant sounds at 20%, and fadeout chords at 10%. The three starting points k were used equiprobably. (See Fig. 1 for a schematic of stimuli.) During the experiment, the participant's task was to press a button upon detecting fadeout sounds, ensuring attentive listening to all auditory stimuli.
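A pure-tone chord, and a fadeout version of it, can be sketched as a sum of sine waves with an optional amplitude ramp. The sample rate, chord duration, and linear ramp shape below are assumptions made for illustration; the text does not specify them.

```python
import math


def sine_chord(freqs, duration_s=0.5, sample_rate=44100, fade_out=False):
    """Sum of equal-amplitude sine tones, normalized to stay within [-1, 1].

    With fade_out=True, a linear ramp from full amplitude down to silence
    is applied across the chord (assumed shape of the fadeout).
    """
    n_samples = int(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(math.sin(2 * math.pi * f * t) for f in freqs) / len(freqs)
        if fade_out:
            value *= 1.0 - i / n_samples
        samples.append(value)
    return samples


if __name__ == "__main__":
    # Hypothetical chord: three tones in approximate 3:5:7 ratios above 220 Hz.
    chord = sine_chord([220.0, 366.7, 513.3], fade_out=True)
    print(len(chord))
```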
Twelve normal healthy adults (8 females, 4 males, mean age 23.5 years, age range 19–29) participated in this study. All subjects were right handed and reported having normal hearing, normal or corrected-to-normal vision, and no history of neurological or psychiatric disorder. All subjects were recruited as volunteers from the University of California at Berkeley community; each subject gave written informed consent before the experiment and was paid $10 per hour for participation. Subjects had no prior exposure to the musical system used in the present study. All research was approved by the Committee for the Protection of Human Subjects at UC Berkeley.
Participants were seated in a sound-attenuated, electrically shielded chamber. Electroencephalograms (EEGs) were recorded while sounds were presented. Participants were instructed to make a button-press response on a joystick immediately upon detecting each fadeout chord. Stimuli were presented at a level of 70 dB on a PC using Presentation 9.90 software (Neurobehavioral Systems) with a pair of Altec Lansing computer speakers, which were placed 100 cm from each ear. Each experiment included 10 runs, with each run containing 100 chord progressions in total. Thus, each participant heard 1000 chord progressions overall, with 700 being standard, 200 being deviant, and 100 being fadeout targets.
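The resulting stimulus schedule can be sketched as follows. This version assumes that each run of 100 progressions contained exactly 70 standard, 20 deviant, and 10 fadeout items in shuffled order, and that the starting point k was drawn at random on every trial. Only the overall totals and the equiprobable use of k come from the description above; the per-run balancing and the function name are assumptions.

```python
import random

MAIN_COUNTS = {"standard": 70, "deviant": 20, "fadeout": 10}
STARTING_POINTS_HZ = (220.0, 289.5, 167.2)


def make_run(counts=None, starting_points=STARTING_POINTS_HZ, rng=None):
    """Return one shuffled run as a list of (stimulus_type, k) pairs."""
    counts = counts or MAIN_COUNTS
    rng = rng or random.Random()
    kinds = [kind for kind, n in counts.items() for _ in range(n)]
    rng.shuffle(kinds)
    # Each starting point is equally likely on every trial (assumed sampling scheme).
    return [(kind, rng.choice(starting_points)) for kind in kinds]


if __name__ == "__main__":
    session = [trial for _ in range(10) for trial in make_run()]
    print(len(session))
```

Passing different counts, for example 45 standard, 45 deviant, and 10 fadeout items, yields a schedule in which standards and deviants are equally probable.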
EEGs were recorded from a 64-channel electrode cap which corresponded to the international 10–20 system, with six additional external electrodes placed at the outer canthi of the eyes, below the left eye, on the nose, and on each mastoid. EEGs and behavioral data were acquired using a BioSemi system with ActiView 5.1 software. Electrode impedances were kept below 25 kΩ for all electrodes. All channels were continuously recorded with a bandpass filter of 0.01–100 Hz and referenced to the right mastoid during recording. The raw signal was digitized with a sampling rate of 512 Hz. Recordings took place in an electrically shielded, sound-attenuated chamber. A video zoom lens camera was used to monitor participants' movements during recording.
Data analysis and statistical testing.
Raw EEG data were imported into BioSemi software BrainVision Analyzer for analysis. Raw data were referenced to the averaged signal of the left and right mastoids and high-pass filtered at 0.5 Hz to eliminate low-frequency drift. EEG epochs containing fluctuations of >100 μV were rejected to eliminate noise due to eye blinks, eye movements, excessive muscle activity, and other artifacts. ERPs were segmented and averaged separately for each condition (standard, deviant, and fadeout) over the time window of 200 ms before stimulus to 1000 ms after stimulus, and then bandpass filtered at 0.5–20 Hz and baseline corrected relative to a period of 200 ms before stimulus to 0 ms (stimulus onset). ERPs were grand averaged across 12 subjects on mean amplitudes across latency windows of 150–210 ms (EAN) and 400–600 ms (LN). Peak and latency ANOVAs were conducted over the most activated site for each time epoch: FCz (EAN) and Fpz (LN). Scalp topography statistics were calculated by clustering electrodes into five regions: anterior frontal (Fpz, Fp1, Fp2, AFz, AF3, AF4, AF7, AF8), frontal (Fz, F1, F2, F3, F4, F5, F6, F7, F8, FCz, FC1, FC2, FC3, FC4, FC5, FC6, FT7, FT8), central (Cz, C1, C2, C3, C4, C5, C6, CPz, CP1, CP2, CP3, CP4, CP5, CP6, T7, T8, TP7, TP8), parietal (Pz, P1, P10, P2, P3, P4, P5, P6, P7, P8, P9, POz, PO3, PO4, PO7, PO8), and occipital (Oz, O1, O2, Iz). Frontal electrodes were selected based on regions of interest defined for the EAN in previous studies (Koelsch et al., 2007).
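The core of this pipeline (artifact rejection, averaging, baseline correction, and mean amplitude in a latency window) can be sketched for a single channel as follows. This is not the analysis software used in the study. Filtering and re-referencing are omitted, and the >100 μV criterion is interpreted here as peak-to-peak amplitude within an epoch, which is an assumption.

```python
def epoch_times_ms(sample_rate_hz=512, start_ms=-200.0, end_ms=1000.0):
    """Time stamp (ms relative to stimulus onset) of each sample in an epoch."""
    n = int(round((end_ms - start_ms) * sample_rate_hz / 1000.0))
    return [start_ms + i * 1000.0 / sample_rate_hz for i in range(n)]


def is_artifact(epoch, threshold_uv=100.0):
    """Flag an epoch whose peak-to-peak range exceeds the threshold (assumed criterion)."""
    return max(epoch) - min(epoch) > threshold_uv


def average_epochs(epochs):
    """Sample-by-sample average across epochs of equal length."""
    return [sum(column) / len(epochs) for column in zip(*epochs)]


def baseline_correct(erp, times_ms, base_start=-200.0, base_end=0.0):
    """Subtract the mean of the prestimulus interval from every sample."""
    baseline = [v for v, t in zip(erp, times_ms) if base_start <= t < base_end]
    offset = sum(baseline) / len(baseline)
    return [v - offset for v in erp]


def mean_amplitude(erp, times_ms, start_ms, end_ms):
    """Mean amplitude within a latency window, e.g. 150-210 ms or 400-600 ms."""
    window = [v for v, t in zip(erp, times_ms) if start_ms <= t <= end_ms]
    return sum(window) / len(window)


def condition_erp(epochs, times_ms):
    """Reject artifacts, average the remaining epochs, then baseline correct."""
    clean = [epoch for epoch in epochs if not is_artifact(epoch)]
    return baseline_correct(average_epochs(clean), times_ms)
```

The EAN and LN measures for one condition would then be `mean_amplitude(erp, times, 150, 210)` and `mean_amplitude(erp, times, 400, 600)`, computed on the averaged waveform at FCz and Fpz, respectively.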
Follow-up behavioral experiment.
To assess the relationship between the observed brain potentials and the behavioral ability to learn grammatical rules, participants of the EEG experiment were invited for a behavioral follow-up experiment that assessed their learning and generalization of grammatical rules.
Chord progressions used as stimuli in the EEG experiment were used as an artificial grammar from which sequences of pitches were generated. Each note in the chord progression could either repeat itself, go up or down vertically within the chord, or go forward to any note within the next chord. Supplemental Figure 1 (available at www.jneurosci.org as supplemental material) illustrates the derivation of a pitch sequence, or a melody, from the pitches in the chord progression. Four hundred ten melodies were composed using this artificial grammar. At the end of a 30 min exposure phase to these melodies, participants were given a two-alternative forced choice test to measure their ability to generalize the rules they had implicitly learned from exposure to the new musical system toward new melodies. The generalization test contained 10 trials. In each trial, two melodies were presented; one melody followed the grammatical rules whereas the other melody violated the rules. Participants were asked to choose the melody that sounded most familiar to them. The identification of novel grammatical melodies is an appropriate test of generalization as it requires knowledge of the combinations of possible melodies that can be generated from chord progressions.
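The melody grammar described above can be sketched as a set of legal transitions over (chord, position) states. The pitch numbers in the example progression are hypothetical placeholders, not the stimuli of the study. The sketch also assumes that "up or down" means moving to the adjacent tone within the chord, that melodies begin in the first chord, and that melodies have a fixed length.

```python
import random

# Hypothetical progression: four chords of three scale steps each,
# with every chord sharing one tone with its predecessor.
PROGRESSION = [[0, 6, 10], [3, 6, 12], [3, 7, 10], [0, 6, 10]]


def legal_moves(chord_idx, pos, progression):
    """States reachable in one step: repeat, adjacent tone in the same
    chord, or any tone of the next chord."""
    moves = [(chord_idx, pos)]
    if pos > 0:
        moves.append((chord_idx, pos - 1))
    if pos < len(progression[chord_idx]) - 1:
        moves.append((chord_idx, pos + 1))
    if chord_idx + 1 < len(progression):
        next_chord = progression[chord_idx + 1]
        moves.extend((chord_idx + 1, p) for p in range(len(next_chord)))
    return moves


def generate_melody(progression, length=8, rng=None):
    """Random walk through the grammar, starting on a tone of the first chord."""
    rng = rng or random.Random()
    state = (0, rng.randrange(len(progression[0])))
    melody = [progression[state[0]][state[1]]]
    while len(melody) < length:
        state = rng.choice(legal_moves(*state, progression))
        melody.append(progression[state[0]][state[1]])
    return melody


def is_grammatical(melody, progression):
    """True if some path through the grammar produces the pitch sequence."""
    states = {(0, p) for p, pitch in enumerate(progression[0]) if pitch == melody[0]}
    for pitch in melody[1:]:
        states = {
            nxt
            for state in states
            for nxt in legal_moves(*state, progression)
            if progression[nxt[0]][nxt[1]] == pitch
        }
        if not states:
            return False
    return bool(states)
```

A forced-choice trial of the kind described would pair a melody for which `is_grammatical` is true with one for which it is false.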
ERPs to standard and deviant sounds
First we compared event-related potentials elicited by high-probability standard sounds with low-probability deviant sounds. ERPs elicited by deviant sounds showed both the EAN and the LN effects when compared with ERPs of standard sounds (Fig. 2). Topography statistics were tested using electrodes clustered by region (prefrontal, frontal, central, parietal, and occipital). All other statistics were calculated using the single frontal electrode FCz for the EAN and the prefrontal electrode Fpz for the LN, as these sites were hypothesized to show EAN and LN effects based on previous studies of traditional Western music.
An EAN was observed for deviant chords at 150–210 ms after stimulus onset (F(1,22) = 5.70, p < 0.02), with a bilateral frontal scalp distribution (F(4,118) = 13.87, p < 0.001). An LN to deviant chords was also observed (F(1,22) = 13.91, p = 0.001), and this response was maximal bilaterally over prefrontal channels onsetting at 400 ms after stimulus (F(4,118) = 17.08, p < 0.001). EAN versus LN topographies differed as indicated by a three-way interaction between time course (150–210 vs 400–600 ms), stimulus type (standard vs deviant), and electrode region (prefrontal, frontal, central, parietal, occipital): F(4,236) = 2.59, p < 0.05; indicating that different neural processes contributed to the EAN and the LN respectively. Importantly, the topography and timescale of these two components paralleled that observed for traditional Western music (Loui et al., 2005), suggesting that perceiving novel patterns of pitches recruits the same neural systems engaged in the perception of Western music.
ERPs and probability learning
To examine probability learning as a function of time, data from the EEG recording sessions were divided evenly into three temporal blocks. The first and last blocks of EEG data (first block = first 20 min, last block = last 20 min) were compared separately to assess the evolution of EAN and LN effects over the 1 h duration of the recording session. ERPs for standard stimuli in the early and late phases were identical (Fig. 3a). However, a comparison of the deviant stimulus types in the early and late phases showed an enhanced EAN in the late phase (F(1,22) = 4.99, p = 0.03) (Fig. 3a). No significant differences between early and late phases were observed for the LN.
The EAN results may reflect the brain's sensitivity to differential probabilities of sounds. However, an alternative account is that these effects were driven by a physical difference between standard and deviant chords. Such a physical difference may include surface features of the deviant stimuli such as dissonance arising from interactions between tones in a chord (Kameoka and Kuriyagawa, 1969). To address the alternative possibility that the EAN reflects surface features of sounds rather than their relative probabilities of occurrence, we implemented an additional control condition. Before the beginning of the experiment, standard and deviant sounds were presented equiprobably (45%). The remaining 10% of sounds contained a rapid fadeout in amplitude, and, as in the rest of the experiment, participants indicated when they detected these amplitude fadeouts. No significant difference between standard and deviant chords was observed when the sounds were played equiprobably (Fig. 3b). The fact that this equiprobable control condition elicited no EAN or LN effects supports the claim that subjects were sensitive to the relative conditional probabilities of sound patterns, rather than surface properties of the sound stimuli or the occurrence of the rote deviant items.
If the EAN indexes probability learning, an individual's behavioral capabilities in probability learning should be reflected in EAN effects. To test this hypothesis, we conducted a follow-up behavioral experiment of probability learning in another session with the same participants.
EAN reflects grammatical generalization
Replicating previous results (Loui and Wessel, 2008), the behavioral follow-up experiment demonstrated successful rule generalization: participants were able to identify melodies that followed the same rules as being more familiar, even when they had not heard the specific melodies before. Performance on generalization trials was confirmed as being above chance by a t test against chance level of 50% correct (mean performance = 66.4% correct; SD = 14%; two-tailed t test against chance: t(10) = 3.79, p < 0.01).
To relate behavioral results to electrophysiological indices, results for each individual (in proportion correct out of 1.0) were correlated with the size of the early anterior negativity for each participant. Individual participants' generalization scores correlated with the amplitude of their EAN in the ERP study (Pearson's r = 0.75; p = 0.02, two-tailed) (Fig. 4), suggesting that the EAN may be an index of grammar learning.
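The two statistics used in this section, a one-sample t test against chance and a Pearson correlation, can be written out as below. The functions are a generic sketch and take per-participant values as input; no data from the study are reproduced here. Whether EAN size enters the correlation as a signed amplitude or as a magnitude is left to the caller.

```python
import math


def one_sample_t(scores, chance=0.5):
    """t statistic and degrees of freedom for mean(scores) versus chance."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)  # sample variance
    t = (mean - chance) / math.sqrt(variance / n)
    return t, n - 1


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Here `scores` would hold each participant's proportion correct on the generalization test, and `xs` and `ys` would hold the generalization scores and EAN sizes of the same participants in matching order.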
The current data show that the human brain rapidly and flexibly integrates novel sound patterns to form a musical context. Both the early anterior negativity and the late negativity, considered indices of syntax processing in Western music, are elicited by low-probability sound patterns in the novel music system. The time course and scalp topographies of these waveforms parallel findings from Western music, supporting a shared neural mechanism for processing well known as well as novel sound patterns.
We also observed rapid probability-based learning during the course of the experiment. Both EAN and LN were significant in the main comparison of standards vs deviants; and the EAN was significantly larger in the late phase when compared with the early phase. The increase in amplitude of the EAN over the course of the experiment reflects gradual development of expertise as a function of exposure, suggesting that the EAN is an effective index of probability learning in the auditory modality, in line with previous findings of larger MMN in musicians than nonmusicians (Tervaniemi et al., 2001).
These results are also consistent with language learning research using both natural and artificial languages (Friederici et al., 2002). Second-language learners elicit increased N400 component amplitudes for incorrect words during the course of language acquisition (McLaughlin et al., 2004). Our results converge with this observation by showing adaptive functioning of the brain via buildup of expectations and the development of context-dependent sensitivity for incongruous events. However, the present data reveal rapid learning over the course of 1 h, compared with linguistic studies, which report development of expertise over several months. Finally, EAN amplitude reflected behavioral performance in grammar learning, suggesting that the EAN may provide a neural correlate of individual differences in learning.
The present data suggest that the EAN reflects perceptual mechanisms of expectation violation, whereas the LN may reflect further cognitive analysis, specifically an integration of an unexpected event into its context. MEG and patient data (Alain et al., 1998; Woldorff et al., 1998) have implicated the superior temporal planes as sources of the EAN, with top-down modulation from the lateral prefrontal cortex (Maess et al., 2001; Barcelo and Knight, 2007). The lateral prefrontal cortex has been implicated in maintaining contextual information (Huettel et al., 2002; Barcelo and Knight, 2007), converging with neuroimaging results (Levitin and Menon, 2003) supporting the view that musical structure is processed in a neural network in which prefrontal areas couple with auditory cortices.
As rapid discrimination learning has been shown previously with ERPs, specifically the MMN (Näätänen et al., 1993), one question arises regarding whether the EAN is the same as or different from the MMN. The ERAN or EAN is thought to be a special case of the MMN (Koelsch, 2009) which reflects the processing of memory traces and rules specific to musical syntax (Miranda and Ullman, 2007). In this case we employ the nomenclature EAN, rather than ERAN, to reflect the observation that the waveform observed here is not right-lateralized, but appears to be bilateral across multiple studies (Loui et al., 2005).
Another question regarding the present data concerns why the LN does not change to reflect learning over time or individual differences in grammar learning. Several possibilities might account for this observation: one is that the task does not require grammar learning, but rather the efficient monitoring of sound volume, a feature unrelated to grammatical structure or musical syntax. If the LN is sensitive to task effects, neural generators of LN may not be differentially taxed by standard and deviant sound types as the experiment progresses. Another, less interesting explanation is that the LN is more sensitive to experimental noise compared with the EAN, and therefore more power is required to detect amplitude changes as a function of individual differences in grammar learning.
Together, our results show that the perception of pitch patterns engages a generalized neural mechanism which rapidly develops expectations and integrates sounds into new contexts. Such neural mechanisms of learning are dictated by the probabilities of sounds and may also subserve speech perception (Hickok and Poeppel, 2000), language acquisition (Friederici et al., 2002), and more general identification of patterns and contexts (Barcelo and Knight, 2007) in the development of sensitivity toward probable events in an ever-changing environment.
This work was supported by National Institute of Neurological Disorders and Stroke Grants NS21135 and PO 40813. We thank Pearl Chen and Judy Wang for help with data collection, Christina Karns, Catherine Dam, Ani Flevaris, and Mark Kishiyama for help with experimental setup and data analysis, Carol Krumhansl for helpful discussions on designing the new music system, and Ayelet Landau and Carla Hudson Kam for helpful comments.
- Correspondence should be addressed to Psyche Loui, Department of Neurology, Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Avenue, Palmer 127, Boston, MA 02215.