Critical periods in language acquisition have been discussed primarily with reference to studies of people who are deaf or bilingual. Here, we provide evidence on the opening of sensitivity to the linguistic environment by using event-related potentials to study the response to a change of phoneme at a native and at a nonnative phonetic boundary in full-term and preterm human infants. Full-term infants show a decline in their discrimination of nonnative phonetic contrasts between 9 and 12 months of age. Because the womb acts as a low-pass filter that attenuates high frequencies, many phonemes are strongly degraded in utero. Preterm infants thus receive earlier and richer exposure to broadcast speech. We find that preterms do not take advantage of this enriched linguistic environment: the decrease in amplitude of the mismatch response to a nonnative change of phoneme at the end of the first year of life depended on maturational age and not on the duration of exposure to broadcast speech. The shaping of phonological representations by the environment is thus strongly constrained by brain maturation factors.
During the first year of life, speech perception becomes attuned to the native language. Infants first learn suprasegmental prosodic properties and then segmental phonetic properties (Werker and Tees, 1984; Kuhl et al., 1992; Dehaene-Lambertz and Houston, 1998). The mechanisms underlying phonetic attunement remain unclear. Statistical analysis of the phoneme distributions available in the speech input plays a crucial role (Maye et al., 2002). Systematic exposure to the most frequent, well formed exemplars of phonemes may distort the initial phonetic space, decreasing perceptual sensitivity in the neighborhood of native prototypes (Kuhl, 2000). By 10 months, passive exposure, although still effective, is less efficient at maintaining discrimination of nonnative contrasts (Yoshida et al., 2010), whereas meaning-related input (Yeung and Werker, 2009) and social interaction (Kuhl et al., 2003) become more so.
Preterm infants receive broadcast speech stimulation several weeks earlier than full-terms. They hear the full frequency range of speech, in contrast to the low-pass-filtered speech that fetuses hear in the womb, and they experience face-to-face interactions with their caregivers as well as sensorimotor and auditory feedback from their own vocalizations. Do preterms benefit from this richer environment? In preterms, the amount of vocalization correlates positively with exposure to parental talk (Caskey et al., 2011), but it is unclear whether preterms are able to extract linguistic regularities from the speech input. A recent study showed that discrimination of languages from the same rhythmic class, which takes place at ∼4.5 months in full-term infants, was not accelerated in healthy preterm infants (Peña et al., 2010). However, it can be argued that preterms have no advantage over full-terms in rhythm tasks because filtering by maternal tissues only weakly degrades the rhythmic properties of speech. In contrast, consonant perception is degraded in utero, particularly for place and manner of articulation (Griffiths et al., 1994). Information relevant to the native consonant repertoire is thus more accessible to preterm infants than to fetuses. If perceptual attunement to the native phonetic repertoire depends only on the statistical analysis of environmental input and on communicative interactions, preterms should be ahead of full-terms.
Using event-related potentials, we compared discrimination of a native and a nonnative place-of-articulation contrast in healthy full-term and preterm infants. Full-term infants should respond equally to both contrasts at 9 months, whereas at 12 months their response should be weaker to nonnative than to native contrasts (Rivera-Gaxiola et al., 2005). Preterms were born, and thus exposed to broadcast speech, nearly 3 months earlier than full-term infants (Fig. 1). We therefore evaluated whether preterm infants at 9 months of postterm age, whose duration of exposure to broadcast speech matches that of 12-month-old full-terms, behave like full-terms matched for maturational age or like full-terms matched for duration of exposure to speech.
Materials and Methods
We tested two groups of healthy full-term infants aged 9 and 12 months (FT9 and FT12) and two groups of preterm infants evaluated at the same maturational ages (PT9 and PT12), all from monolingual Spanish-speaking environments. Thirty-two infants (5 FT9, 12 FT12, 8 PT9, and 7 PT12) were excluded because they had fewer than 12 artifact-free EEG trials per condition. We thus report on 32 FT9 (20 male), 60 FT12 (39 male), 24 PT9 (14 male), and 39 PT12 (23 male) infants. Preterm infants were born between 27 and 31 weeks of gestational age (wGA; mean = 28.6 ± 1.6 wGA) and full-term infants between 38 and 42 wGA (mean = 39.5 wGA). At birth, all infants (1) had Apgar scores higher than 6 and 8 at 1 and 5 min, respectively; (2) had normal weight, length, and head circumference for their gestational age; (3) had normal otoacoustic emissions; and (4) had neuropediatric scores corresponding to their gestational age. In preterm infants, auditory brainstem-evoked responses and brain ultrasonography were normal for gestational age. All infants came from lower-middle socioeconomic class families and had normal clinical outcomes over 4 years of follow-up.
Preterm infants were exposed to broadcast speech in several circumstances. First, after birth, they heard speech outside the incubator during the Kangaroo procedure, a Neonatal Intensive Care Unit protocol involving skin-to-skin contact between the mother and the newborn (Feldman et al., 2002). Second, once clinically stable, preterm infants were placed in open cradles, where they were systematically exposed to broadcast speech from their mothers and from other people who talked to or near them. Finally, following international recommendations (American Academy of Pediatrics, 1998), preterm infants were discharged at ∼34 wGA; from then on, like full-terms, they were exposed to speech at home. At the time of testing, the mean durations of exposure to broadcast speech for FT9, FT12, PT9, and PT12 were 39.9, 49.9, 49.9, and 59.2 weeks, respectively. Ethical approval was obtained from the Sótero del Río Hospital ethics committee, and informed written consent was obtained from the parents.
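The contrast between maturational age and duration of exposure comes down to simple bookkeeping: maturational (corrected) age is counted from term, whereas exposure to broadcast speech is counted from birth. The minimal sketch below assumes the usual corrected-age convention, with term fixed at 40 wGA and 52/12 weeks per month; neither constant is given in the text, and the function name is hypothetical. The reported group means were presumably computed from each infant's own dates rather than from this formula.

```python
# Hypothetical helper contrasting corrected (maturational) age with duration
# of ex utero speech exposure. Assumptions: term = 40 wGA and
# 1 month = 52/12 weeks; these conventions are not taken from the paper.
TERM_WGA = 40.0
WEEKS_PER_MONTH = 52.0 / 12.0


def exposure_weeks(ga_at_birth_wga: float, corrected_age_months: float) -> float:
    """Weeks elapsed since birth for an infant tested at a given corrected age.

    Corrected age counts from the nominal term date, whereas exposure to
    broadcast speech counts from birth, so an infant born before term gains
    (TERM_WGA - ga_at_birth_wga) extra weeks of exposure.
    """
    weeks_since_term = corrected_age_months * WEEKS_PER_MONTH
    return weeks_since_term + (TERM_WGA - ga_at_birth_wga)


if __name__ == "__main__":
    # Group-mean gestational ages from the text, tested at 9 months corrected age.
    preterm = exposure_weeks(28.6, 9)
    fullterm = exposure_weeks(39.5, 9)
    print(f"PT9 ~ {preterm:.1f} wk, FT9 ~ {fullterm:.1f} wk, "
          f"extra ~ {preterm - fullterm:.1f} wk")
```

With the group-mean gestational ages, this gives about 50.4 weeks for PT9 and 39.5 weeks for FT9. Both values fall within about half a week of the reported means. The roughly 11 extra weeks are simply the gap between birth and term.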
We used the synthetic consonant–vowel stimuli prepared by Werker and Lalonde (1988) to study categorical perception in English and Hindi speakers. This continuum comprises eight steps along the voiced place-of-articulation dimension, from bilabial /b/ through dental /d/ to retroflex /D/, each followed by the vowel /a/ (hereafter S1 to S8). Along this continuum, native English speakers perceive two phonetic categories (S1–S3 as /ba/ and the remaining steps as /da/), whereas native Hindi speakers perceive three (S1–S3 as /ba/, S4–S5 as /da/, and S6–S8 as retroflex /Da/). Six-month-old full-term English-learning infants perceive both boundaries, whereas at 12 months they fail to perceive the Hindi boundary, like English-speaking adults (Werker and Lalonde, 1988). Adult Spanish speakers from Chile perceive only one boundary, between S3 and S4. However, they identify these categories as /pa/ and /ta/, because Spanish voiced stops are prevoiced and the short voice onset time of these syllables falls within the Spanish voiceless categories; they also perceive the end of the continuum (S7–S8) as less natural. Each syllable was 275 ms long and was delivered through loudspeakers at 60 dB SPL.
Infants were tested in a soundproof Faraday booth. The infant sat on the parent's lap while the parent listened to music through earphones to mask the speech stimuli. To limit body movement, infants saw attention-grabbing images and could play with a small toy during testing. Infants heard 180 randomly ordered trials, 30 in each of six experimental conditions [three trial types (i.e., standard, acoustic, and phonetic) × two phonetic contrast types (i.e., native and nonnative)]. Each trial comprised four consecutive syllables with 600 ms interstimulus intervals. The first three syllables were always identical; the fourth was the same syllable in standard trials (i.e., S3 S3 S3 S3 or S6 S6 S6 S6), a syllable from the other phonetic category in phonetic trials (i.e., S4 S4 S4 S3 or S5 S5 S5 S6), or a different syllable from the same phonetic category in acoustic trials (i.e., S2 S2 S2 S3 or S7 S7 S7 S6). The test syllable was thus identical in all three conditions at the native (S3) and at the nonnative (S6) boundary. Because S7 was perceived as unnatural, we did not analyze the acoustic condition at the nonnative boundary. Intertrial intervals varied randomly between 3000 and 3500 ms.
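The trial structure described above can be sketched as a small schedule generator. The category map and the six conditions are taken from the text. Everything else is an assumption: the names, the use of Python's `random` module, reading the 600 ms interval as onset-to-onset, and measuring the intertrial interval from the offset of the fourth syllable.

```python
import random

# Category labels along the S1-S8 continuum for a Hindi-like listener, as
# described in the text: S1-S3 /ba/, S4-S5 dental /da/, S6-S8 retroflex /Da/.
HINDI_CATEGORY = {1: "ba", 2: "ba", 3: "ba", 4: "da", 5: "da",
                  6: "Da", 7: "Da", 8: "Da"}

# (contrast, trial type) -> (context syllable repeated 3 times, test syllable).
CONDITIONS = {
    ("native", "standard"): (3, 3),
    ("native", "phonetic"): (4, 3),
    ("native", "acoustic"): (2, 3),
    ("nonnative", "standard"): (6, 6),
    ("nonnative", "phonetic"): (5, 6),
    ("nonnative", "acoustic"): (7, 6),
}

TRIALS_PER_CONDITION = 30
SYLLABLE_MS = 275
SOA_MS = 600                 # assumption: 600 ms read as onset-to-onset
ITI_RANGE_MS = (3000, 3500)  # assumption: measured from the last syllable's offset


def crosses_boundary(context: int, test: int) -> bool:
    """True if context and test syllables fall in different Hindi categories."""
    return HINDI_CATEGORY[context] != HINDI_CATEGORY[test]


def build_session(seed: int = 0) -> list:
    """Hypothetical randomized 180-trial schedule with syllable onsets in ms."""
    rng = random.Random(seed)
    order = [cond for cond in CONDITIONS for _ in range(TRIALS_PER_CONDITION)]
    rng.shuffle(order)
    session, t = [], 0.0
    for contrast, trial_type in order:
        context, test = CONDITIONS[(contrast, trial_type)]
        onsets = [t + i * SOA_MS for i in range(4)]
        session.append({"contrast": contrast, "type": trial_type,
                        "syllables": [context, context, context, test],
                        "onsets_ms": onsets})
        # Next trial starts after the last syllable ends plus a jittered gap.
        t = onsets[-1] + SYLLABLE_MS + rng.uniform(*ITI_RANGE_MS)
    return session
```

Under the onset-to-onset reading, the fourth (test) syllable starts 1800 ms after trial onset. That places it inside the 2500 ms each epoch spans after trial onset. An offset-to-onset reading would place it at 3 × (275 + 600) = 2625 ms, past the end of the epoch, so the sketch uses the former.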
Data acquisition and processing.
EEG data were collected using a 64-electrode geodesic sensor net (EGI) referenced to the vertex, with a sampling rate of 500 Hz. Maximal impedance was 40 kΩ. The continuous recording was first bandpass filtered (0.5–20 Hz) and then segmented into 3000 ms epochs, including the 500 ms preceding the first syllable of the trial. Epochs in which >20 electrodes showed voltage fluctuations exceeding ±150 μV or transients exceeding ±100 μV were rejected. The remaining trials were averaged, baseline corrected (using the 200 ms preceding trial onset for the analysis of the response to the first syllable, or the interval from 200 ms before trial onset to the onset of the fourth syllable for the analysis of the response to the fourth syllable), and re-referenced to the average reference.
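The rejection, averaging, baseline, and re-referencing steps can be sketched with NumPy. This is a minimal sketch, not the authors' pipeline, and it rests on several assumptions. "Fluctuations" is read as absolute amplitude and "transients" as sample-to-sample jumps. The 0.5–20 Hz filter is assumed to have been applied to the continuous data beforehand. The function names are hypothetical.

```python
import numpy as np

FS_HZ = 500    # sampling rate (from the text)
PRE_MS = 500   # each 3000 ms epoch starts 500 ms before trial onset


def ms_to_sample(latency_ms: float) -> int:
    """Sample index within an epoch for a latency relative to trial onset."""
    return int(round((latency_ms + PRE_MS) * FS_HZ / 1000))


def keep_mask(epochs, amp_uv=150.0, jump_uv=100.0, max_bad_channels=20):
    """Boolean mask of epochs that survive artifact rejection.

    epochs: array (n_epochs, n_channels, n_samples) in microvolts.
    Assumption: a channel is bad if its absolute amplitude exceeds amp_uv or a
    sample-to-sample jump exceeds jump_uv; an epoch is rejected when more than
    max_bad_channels channels are bad.
    """
    too_large = np.abs(epochs).max(axis=2) > amp_uv
    too_fast = np.abs(np.diff(epochs, axis=2)).max(axis=2) > jump_uv
    n_bad = (too_large | too_fast).sum(axis=1)
    return n_bad <= max_bad_channels


def condition_erp(epochs, baseline_ms=(-200, 0), min_trials=12):
    """Average surviving epochs, baseline-correct, re-reference to the average.

    Returns None if fewer than min_trials epochs survive (the inclusion
    criterion used for infants). For the fourth syllable, the baseline would
    be (-200, onset of syllable 4), e.g. (-200, 1800) if the SOA is 600 ms.
    """
    kept = epochs[keep_mask(epochs)]
    if kept.shape[0] < min_trials:
        return None
    erp = kept.mean(axis=0)                                 # (channels, samples)
    b0, b1 = ms_to_sample(baseline_ms[0]), ms_to_sample(baseline_ms[1])
    erp = erp - erp[:, b0:b1].mean(axis=1, keepdims=True)   # per-channel baseline
    return erp - erp.mean(axis=0, keepdims=True)            # average reference
```

Subtracting the across-channel mean at each sample implements the average reference. Because baseline correction and re-referencing are both linear, their order does not change the result.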
We first checked whether our groups processed syllables differently by inspecting the response to the first syllable of the trials, which induced the strongest response (Dehaene-Lambertz and Dehaene, 1994). The grand average computed across all infants revealed a first peak between 90 and 190 ms (positive over 19 frontocentral electrodes and negative over 20 occipitotemporal electrodes), followed by a second component from 300 to 512 ms (positive over 4 lateral frontal electrodes on each side and negative over 11 occipitotemporal electrodes), as previously described at this age (Kushnerenko et al., 2002). For each component, infant, and syllable type (S3, S6, S4, and S5), we averaged the voltage over the corresponding time window and over the electrodes of the positive pole and of the negative pole. These values were submitted to an ANOVA with Electrodes (Positive and Negative poles) and Syllable (S3, S6, S4, and S5) as within-subjects factors and Group (FT9, FT12, PT9, and PT12) as a between-subjects factor. We tested whether the factor Group interacted with any other variable.
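The dependent variable for this ANOVA is a mean over a latency window and an electrode cluster. The minimal sketch below computes it for one infant. It assumes a 500 Hz ERP array whose first sample lies 500 ms before trial onset and an inclusive window end. The channel indices are hypothetical placeholders, because the exact cluster memberships are not listed.

```python
import numpy as np

FS_HZ, PRE_MS = 500, 500  # sampling rate and pre-trial interval (from the text)


def ms_to_sample(latency_ms: float) -> int:
    """Sample index for a latency (ms) relative to trial onset."""
    return int(round((latency_ms + PRE_MS) * FS_HZ / 1000))


def window_cluster_mean(erp, channels, t0_ms, t1_ms, onset_ms=0.0):
    """Mean voltage over a channel cluster and an inclusive latency window.

    erp: (n_channels, n_samples) average for one infant and one syllable type.
    Latencies are relative to the onset of the syllable of interest, which
    starts onset_ms after trial onset (0 for the first syllable).
    """
    i0 = ms_to_sample(onset_ms + t0_ms)
    i1 = ms_to_sample(onset_ms + t1_ms) + 1  # make the window end inclusive
    return float(erp[np.asarray(channels), i0:i1].mean())


def component_table(erps, pos_channels, neg_channels, window_ms):
    """Rows (syllable, pole, mean voltage) for one infant, ready for an ANOVA.

    erps: dict mapping a syllable label (e.g. "S3") to that infant's ERP.
    """
    rows = []
    for syllable, erp in erps.items():
        for pole, chans in (("positive", pos_channels), ("negative", neg_channels)):
            rows.append((syllable, pole, window_cluster_mean(erp, chans, *window_ms)))
    return rows
```

Stacking these rows across infants, with a Group column added, gives the long-format table that a mixed-design ANOVA routine expects.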
Second, we examined the ERPs to the fourth syllable to study the infants' response to a change of phoneme. As expected from the literature (Dehaene-Lambertz and Gliga, 2004), deviant stimuli elicited a mismatch response (MMR) between 170 and 310 ms after the onset of the fourth syllable, with a positive pole over frontal electrodes and a polarity reversal over posterior electrodes (Fig. 2). We restricted our analysis to the time window and electrode groups at which deviant and standard trials differed significantly when pooled across all four groups and both phonetic contrasts (two-tailed t test, p < 0.05 corrected by false discovery rate; q < 0.05). For each infant and condition, we then computed the difference between the mean voltages for the third and fourth syllables over this time window in the anterior and posterior electrode groups. We first confirmed the linguistic nature of the MMR at the native boundary by showing that a linguistic change (Phonetic) induced a stronger MMR than a similar nonlinguistic change (Acoustic) (Dehaene-Lambertz and Baillet, 1998). An ANOVA was thus computed with Electrodes (Anterior and Posterior) and Condition (Phonetic and Acoustic, at the native boundary) as within-subjects factors and Group (FT9, FT12, PT9, and PT12) as a between-subjects factor. We then estimated the effects of neural maturation and of duration of speech exposure on the MMR with an ANOVA with Electrodes (Anterior and Posterior), Condition (Deviant and Standard), and Phonetic Contrast (Native and Nonnative) as within-subjects factors and Group (FT9, FT12, PT9, and PT12) as a between-subjects factor.
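Two computations in this paragraph lend themselves to a short sketch. The first is the Benjamini–Hochberg false discovery rate correction used to select electrodes and time points. The second is the per-infant MMR measure. Two-tailed p-values per channel and time point are assumed to be computed beforehand, for example with a one-sample t test such as `scipy.stats.ttest_1samp`. The sign convention (fourth minus third syllable), the 600 ms onset-to-onset spacing, and the function names are assumptions.

```python
import numpy as np

FS_HZ, PRE_MS = 500, 500
SOA_MS = 600  # assumption: 600 ms read as onset-to-onset


def bh_fdr(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean mask (same shape as pvals, e.g. channels x time points)
    marking tests that are significant at false discovery rate q.
    """
    p = np.asarray(pvals, dtype=float)
    flat = p.ravel()
    m = flat.size
    order = np.argsort(flat)
    passed = flat[order] <= q * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.flatnonzero(passed).max()  # largest rank meeting its threshold
        mask[order[: k + 1]] = True       # declare all ranks up to k significant
    return mask.reshape(p.shape)


def ms_to_sample(latency_ms: float) -> int:
    """Sample index for a latency (ms) relative to trial onset."""
    return int(round((latency_ms + PRE_MS) * FS_HZ / 1000))


def mmr_amplitude(erp, channels, window_ms=(170, 310)):
    """Mean voltage after syllable 4 minus the same window after syllable 3.

    erp: (n_channels, n_samples) for one infant and one condition.
    """
    def win_mean(onset_ms):
        i0 = ms_to_sample(onset_ms + window_ms[0])
        i1 = ms_to_sample(onset_ms + window_ms[1]) + 1
        return erp[np.asarray(channels), i0:i1].mean()

    return float(win_mean(3 * SOA_MS) - win_mean(2 * SOA_MS))
```

The procedure is step-up: once the largest passing rank is found, every smaller p-value counts as significant, even one that missed its own threshold.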
Our first analyses confirmed that, at these ages, the groups did not differ in the general processing of a sound (no main effect of Group and no significant interaction of Group with the other variables in the ANOVA on the ERP to the first syllable of the trials), nor in the linguistic nature of the response at the native boundary: we observed the expected significant Electrode × Condition (Acoustic vs Phonetic change) interaction, F(1,151) = 11.72, p = 0.001, η² = 0.072, which did not interact with Group (F(3,151) < 1). In each group, the MMR for phonetic trials was significantly stronger than for acoustic trials (FT9: p = 0.035; FT12: p = 0.042; PT9: p = 0.022; PT12: p = 0.012).
Our main goal was to explore how neural maturation and speech exposure influence nonnative phoneme discrimination. The predicted group-dependent decrease of the MMR for the nonnative relative to the native phonetic contrast was confirmed by a significant Electrodes × Condition × Phonetic Contrast × Group interaction (F(3,151) = 5.141, p < 0.002, η² = 0.093; Fig. 3). This effect was strongest over the anterior electrode cluster (Condition × Phonetic Contrast × Group: F(3,151) = 3.616, p < 0.015, η² = 0.067). We thus restricted subsequent analyses to this location. As predicted, a significant Condition × Group interaction was observed for the nonnative contrast (F(3,151) = 8.484, p < 0.001, η² = 0.144) but not for the native contrast (F(3,151) = 1.606, p = 0.190).
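The η² values reported in these results are consistent with partial eta squared. Partial eta squared can be recomputed from each F ratio and its degrees of freedom as η²p = F·df1 / (F·df1 + df2), where df2 = N − 4 for a four-group between-subjects factor. Reading the values as partial η² is an inference from the numbers, not something the text states. The sketch below recomputes them.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio: F*df1 / (F*df1 + df2)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # (F, df_effect, df_error, reported eta^2) taken from the Results.
    reported = [
        (11.72, 1, 151, 0.072),  # Electrode x Condition, native boundary
        (5.141, 3, 151, 0.093),  # Electrodes x Condition x Contrast x Group
        (3.616, 3, 151, 0.067),  # same interaction, anterior cluster
        (8.484, 3, 151, 0.144),  # Condition x Group, nonnative contrast
    ]
    for f, d1, d2, eta in reported:
        print(f"F({d1},{d2}) = {f}: partial eta^2 = "
              f"{partial_eta_squared(f, d1, d2):.3f} (reported {eta})")
```

The same formula also reproduces the later effect sizes. For example, F(1,59) = 9.958 gives 0.144 and F(1,38) = 13.878 gives 0.268.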
To disentangle the impact of neural maturation versus amount of broadcast speech exposure, we compared PT9 and FT9 (same maturational age) and PT9 and FT12 (comparable amount of broadcast speech exposure). The MMRs were significantly different between PT9 and FT12 (Condition × Group: F(1,82) = 19.787, p < 0.001, η² = 0.194) but not between PT9 and FT9 (Condition × Group: F(1,54) = 0.794, p = 0.377), suggesting a stronger effect of maturational age than duration of exposure. Post hoc analyses confirmed that the Condition × Contrast interaction was indeed significant in FT12 (F(1,59) = 9.958, p < 0.003, η² = 0.144) and PT12 (F(1,38) = 13.878, p < 0.001, η² = 0.268), indicating that these infants no longer reacted to changes crossing the nonnative boundary, but it was not significant in FT9 (p = 0.51) or PT9 (p = 0.69), showing that an MMR was present for both phonetic contrasts in these younger groups. Thus, preterm infants did not gain an advantage from longer exposure to broadcast speech and reacted similarly to full-term infants of the same maturational age. Because our results might have been affected by the higher number of subjects in FT12, we drew 50,000 subsamples of 30 subjects from this group to check the reproducibility of our results. The results remained significant at p < 0.05 in 92.5% to 99.9% of the draws, depending on the analyses (97.8% to 100% at p < 0.1).
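One plausible reading of the subsampling check is sketched below for the within-group Condition × Contrast test only. Subsamples of 30 FT12 infants are drawn without replacement, and the sketch counts how often the interaction remains significant. The text does not specify how the between-group comparisons were recomputed. The double-difference formulation, the hard-coded critical value, the seed, and the function names are assumptions. For a 2 × 2 within-subjects interaction, F(1, n − 1) equals the square of a one-sample t on each infant's double difference.

```python
import numpy as np


def interaction_t(double_diff) -> float:
    """One-sample t on per-infant double differences.

    double_diff[i] = (deviant - standard)_native - (deviant - standard)_nonnative
    for infant i. For a 2 x 2 within-subject interaction, F(1, n - 1) = t**2.
    """
    x = np.asarray(double_diff, dtype=float)
    return float(x.mean() / (x.std(ddof=1) / np.sqrt(x.size)))


def proportion_significant(double_diff, n_sub=30, n_draws=50_000,
                           t_crit=2.045, seed=0) -> float:
    """Fraction of random subsamples (without replacement) with |t| > t_crit.

    t_crit = 2.045 is the two-tailed 5% critical value for 29 df, i.e. it
    matches n_sub = 30; change it together with n_sub.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(double_diff, dtype=float)
    hits = 0
    for _ in range(n_draws):
        sub = rng.choice(x, size=n_sub, replace=False)
        hits += abs(interaction_t(sub)) > t_crit
    return hits / n_draws
```

Because each subsample holds half of a 60-infant group, successive draws overlap heavily. The resulting proportion therefore measures stability within this sample, not replication in new infants.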
To confirm the effect of maturational age over duration of exposure to broadcast speech, we regressed the magnitude of the MMR to the nonnative contrast on these two factors across the four groups of infants. To isolate a pure effect of maturational age, we first regressed the MMR on duration of exposure (R² = 0.06, F(1,153) = 9.7, p = 0.002) and then regressed the residuals of that model on maturational age. A significant negative relation with maturational age was still observed when exposure was partialed out (R² = 0.03, F(1,153) = 5.0, p = 0.027). Conversely, when the MMR was first regressed on maturational age (R² = 0.11, F(1,153) = 19.1, p < 0.001) and the residuals were then regressed on exposure, no significant effect remained (F(1,153) < 1). The same analyses conducted for the native contrast showed no effect of duration of exposure on MMR amplitude (F(1,153) = 3.1, p = 0.08) but a negative effect of maturational age (R² = 0.04, F(1,153) = 5.6, p = 0.019) that was no longer significant when duration of exposure was partialed out (F(1,153) = 1.4, p = 0.22).
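The sequential (residualized) regressions can be sketched with ordinary least squares via `numpy.polyfit`. This is a minimal sketch, assuming one predictor per step and the MMR as a single signed value per infant. The helper names are hypothetical. With n infants, F(1, n − 2) = R²/(1 − R²)·(n − 2), so the reported F(1,153) degrees of freedom correspond to n = 155.

```python
import numpy as np


def simple_regression(x, y):
    """OLS fit of y on a single predictor x.

    Returns slope, intercept, R^2, the F(1, n - 2) statistic, and residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    f_stat = r2 / (1.0 - r2) * (x.size - 2)
    return float(slope), float(intercept), float(r2), float(f_stat), resid


def residualized_effect(first, second, mmr):
    """Regress the MMR on `first`, then regress those residuals on `second`.

    Returns the second regression's (slope, intercept, R^2, F, residuals),
    i.e. the effect of `second` once `first` has been partialed out.
    """
    *_, resid = simple_regression(first, mmr)
    return simple_regression(second, resid)
```

Following the text, `residualized_effect(exposure, age, mmr)` tests maturational age with exposure removed, and `residualized_effect(age, exposure, mmr)` tests the reverse.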
Our results show that preterm infants do not benefit from supplementary exposure to broadcast speech. These results are in line with previous clinical neuropediatric observations indicating that the cognitive and neurological development of healthy, highly premature infants is guided more by neural maturation than by exposure to external stimuli (deRegnier, 2007). Why do preterm infants not benefit from additional exposure to well formed speech input and meaningful interactions when developing their native phonetic repertoire? One explanation could be that their auditory system is too immature to compute phonetic representations before term. This seems implausible, because neonates display mismatch responses to phonetic contrasts (Dehaene-Lambertz and Peña, 2001). Moreover, a recent experiment showed that the capacity to discriminate at least some phonetic distinctions (e.g., /ba/ vs /ga/) is present at very early ages (28–32 wGA) (M. Mahmoudzadeh, G. Dehaene-Lambertz, M. Fournier, G. Kongolo, S. Godjil, J. Dubois, R. Grebe, and F. Wallois, unpublished observations).
Language acquisition presents several periods during which learning is easier, also called windows of opportunity or critical periods, and these may not yet be open during the last weeks of gestation. Recent animal studies have shown that critical periods are biologically guided by biochemical factors that promote the maturation of GABAergic neurons and shape corticocortical connectivity (Yuan et al., 2011). In rodents and cats, manipulating these biochemical factors can reopen critical periods after closure and can advance both their opening and their closure beyond the expected time (Hensch, 2005; Barkat et al., 2011). A similar acceleration by biochemical factors has recently been reported in humans (Weikum et al., 2012). Closure can also be delayed if no relevant stimulation is experienced (Slater et al., 1988). Similarly, in humans, the closure of windows of opportunity appears to be delayed in infants born deaf (for review, see Werker and Tees, 2005) or whose mothers are depressed (Weikum et al., 2012), owing to a lack of speech stimulation, although if the delay is too long, the ultimate level of sensitivity is compromised.
Little is known, however, about how and when these windows of opportunity open in humans. During the last weeks of gestation, neurons are still migrating to their final positions in the cortical plate (Kostović et al., 1995), and the first thalamocortical circuits, established with subplate neurons, are progressively replaced by the definitive cortical circuits (Yuan et al., 2011). Before term (<38 wGA), immature networks may be sufficient to discriminate tones (Draganova et al., 2005) and syllables (Shahidullah and Hepper, 1994; Dehaene-Lambertz, 1998; M. Mahmoudzadeh, G. Dehaene-Lambertz, M. Fournier, G. Kongolo, S. Godjil, J. Dubois, R. Grebe, and F. Wallois, unpublished observations), but stabilizing representations, including those induced by the environment, may require more complex circuitry, involving GABAergic interneurons in particular.
The fact that full-term neonates have already memorized their mother's voice (DeCasper and Fifer, 1980) and have become sensitive to the prosodic properties of their native language (Mehler et al., 1988; Byers-Heinlein et al., 2010) demonstrates that learning is possible in utero. However, these studies do not indicate when this learning starts. Studies specifically testing learning in utero have tested fetuses after 37 wGA, i.e., at term. deRegnier et al. (2002) found no evidence of recognition of the mother's voice in neonates a few days old who were born between 35 and 38 wGA, in contrast to the results obtained in those born after 39 wGA. There is also no evidence that neonates have learned about the consonants of their native language from prenatal or even early postnatal listening experience. Indeed, the phonetic repertoire seems to be largely established in the absence of specific experience, as shown by the substantial body of work indicating that very young infants can discriminate not only familiar native speech sounds but also nonnative speech sounds they have never heard before. Together, these facts suggest that before 37–38 wGA, the conventional definition of term in humans, the cortical circuitry might not yet be ready to begin perceptual tuning to the native language environment.
Optimal windows of learning have often been discussed in the context of language acquisition. The fact that the premature brain cannot take advantage of its earlier and richer exposure to broadcast speech to speed up tuning to the properties of the native language (Peña et al., 2010), even though it is able to discriminate some phonemes, underscores the dependence of at least some aspects of speech acquisition on biological factors. Our results suggest that, as in rodents (Barkat et al., 2011), a cascading series of developmental windows might open and close at different ages to shape the auditory and linguistic networks. These aspects should be taken into account in neurocognitive models of early human cognitive development.
This work was supported by Fondecyt Grants 1060767 and 1110928 to M.P. We are grateful to the infants and parents who participated in the study; to Enrica Pittaluga, Margarita Luna, and Elizabeth Pavez for their assistance in infant recruitment; and to Samantha Bangayan, who edited this manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Marcela Peña, Laboratorio de Neurociencias Cognitivas, Escuela de Psicología, P. Universidad Católica de Chile, 7820436 Santiago, Chile.