Abstract
The quantity and quality of the language input that infants receive from their caregivers affects their future language abilities; however, it is unclear how variation in this input relates to preverbal brain circuitry. The current study investigated the relation between naturalistic language input and the functional connectivity (FC) of language networks in human infancy using resting-state functional magnetic resonance imaging (rsfMRI). We recorded the naturalistic language environments of five- to eight-month-old male and female infants using the Language ENvironment Analysis (LENA) system and measured the quantity and consistency of their exposure to adult words (AWs) and adult–infant conversational turns (CTs). Infants completed an rsfMRI scan during natural sleep, and we examined FC among regions of interest (ROIs) previously implicated in language comprehension, including the auditory cortex, the left inferior frontal gyrus (IFG), and the bilateral superior temporal gyrus (STG). Consistent with theory of the ontogeny of the cortical language network (Skeide and Friederici, 2016), we identified two subnetworks posited to have distinct developmental trajectories: a posterior temporal network involving connections of the auditory cortex and bilateral STG and a frontotemporal network involving connections of the left IFG. Independent of socioeconomic status (SES), the quantity of CTs was uniquely associated with FC of these networks. Infants who engaged in a larger number of CTs in daily life had lower connectivity in the posterior temporal language network. These results provide evidence for the role of vocal interactions with caregivers, compared with overheard adult speech, in the function of language networks in infancy.
Significance Statement
Infants whose caregivers speak to them more develop better language skills. It is unclear, however, how real-world language input is associated with preverbal brain circuitry. Resting-state functional magnetic resonance imaging (rsfMRI) during natural sleep can noninvasively measure patterns of brain activation in infancy. The present study finds that the quantity of vocal interactions infants engage in with their caregivers in daily life correlates with the strength of resting-state functional connectivity (FC) in regions of the brain implicated in language comprehension. These results provide evidence for the role of vocal interactions with caregivers in the function of language networks in infancy. Interventions that focus on increasing vocal interactions may be associated with infant brain function in a manner that ultimately enhances language abilities.
Introduction
How do children's earliest experiences with language influence brain development?
Observational and experimental research indicates that the quantity and quality of language input that infants receive from their caregivers affects their future language abilities (Weisleder and Fernald, 2013; Ferjan Ramírez et al., 2019); however, it is unclear how variation in naturalistic language input relates to preverbal brain circuitry. Given rapid neurodevelopment in infancy, maturational processes including synaptogenesis, axonal growth, and myelination are responsive to early environmental exposures (Tau and Peterson, 2010). These neurodevelopmental events, involving increasing anatomic connectivity, are theorized to underlie the emergence of coordinated neuronal activity between brain regions [i.e., functional connectivity (FC); Cao et al., 2017; Grayson and Fair, 2017]. Resting-state functional magnetic resonance imaging (rsfMRI), which measures the hemodynamics associated with spontaneous neural activity (Biswal et al., 1995; Fox et al., 2005), can noninvasively measure differences in FC corresponding to variation in naturalistic language input in infancy.
Language networks are more plastic in infancy but reflect emerging adult-like organization. The ontogeny of the cortical language circuit is theorized to involve two stages: the “bottom-up” processing stage, which begins in utero and involves connections of the bilateral superior temporal auditory system facilitating word form detection and segmentation of prosodic information, and the “top-down” processing stage, which begins in early childhood and involves connections of the left inferior frontal gyrus (IFG) that gradually refine to enable processing of complex syntax (Skeide and Friederici, 2016). Thus, language processing involves both primary sensory networks, which are topologically mature in neonates, and higher-order cognitive subnetworks, which synchronize (Gao et al., 2015; Wen et al., 2019) and specialize (Skeide et al., 2014; Emerson et al., 2016) across childhood. Despite the relative immaturity of their language networks, sleeping infants exposed to speech sounds activate brain regions implicated in language processing in older children and adults, including the IFG and superior temporal gyrus (STG; Blasi et al., 2011). Neonates possess the white matter tracts, including the arcuate fasciculus, that connect regions of the mature language network (Sket et al., 2019).
Although exposure to language is experience expectant, the nature of this exposure is experience dependent (Greenough et al., 1987). Children's language experiences vary widely (Romeo, 2019). Naturalistic observations indicate that each day young children hear from 3000 to 30,000 adult words (AWs) and participate in from 60 to over 1000 adult–child conversational turns (CTs; Gilkerson et al., 2017). While some children experience consistent language input across the day, others experience input during circumscribed periods (King et al., 2020). Compared with the amount of exposure to AWs, CTs are linked to greater gains in language abilities (Gilkerson et al., 2018) and explain variation in brain function and structure in children (Romeo et al., 2018a,b; Merz et al., 2020). For example, preschool children who experience more CTs exhibit greater activation in Broca's area during a story-listening task and more coherent connectivity of the left arcuate fasciculus (Romeo et al., 2018a,b). Thus, multiple dimensions of language may differentially influence brain development.
The current study investigated the relation between naturalistic language input and the FC of resting-state language networks in infants. We recorded the naturalistic language environments of five- to eight-month-old infants using the Language ENvironment Analysis (LENA) system (Gilkerson et al., 2017) and measured the quantity and consistency of their exposure to AWs and CTs in daily life. Infants completed an rsfMRI scan during natural sleep, and we examined FC among brain regions involved in language comprehension, including the auditory cortex, the IFG, and the bilateral STG. We anticipated that measures of adult–infant CTs (i.e., the quantity and consistency of CTs) would be associated more strongly with FC in language networks than would measures of overheard adult speech (i.e., the quantity and consistency of AWs). Results supporting this hypothesis would provide evidence for the role of vocal interactions with caregivers, compared with overheard adult speech, in the function of language networks in infancy.
Materials and Methods
Participants
Women and their infants were recruited from communities in the San Francisco Bay Area to participate in the Brain and Behavior Infant Experiences (BABIES) project (see Humphreys et al., 2018; King et al., 2019; Camacho et al., 2020), an observational study of the association between perinatal experiences and infant and toddler psychobiological development. The sample for the current analyses included mother–infant dyads who completed at least 8 h of recording of the language environment (i.e., the LENA assessment) and provided usable rsfMRI data when infants were ages five to eight months. We focused on infants ages five to eight months for several reasons. First, we were interested in examining brain function within the first year of life given that this period is defined by rapid neurodevelopment and, therefore, enhanced sensitivity to environmental input (Graham et al., 2015). Second, whereas most infant fMRI studies have focused on newborns who have minimal exposure to the postnatal environment (Azhari et al., 2020), we were interested in examining the earliest associations of variation in the postnatal environment with infant brain function. Infants ages five to eight months have considerable exposure to the postnatal environment and yet their brain development and exposure are limited compared with older children. Third, infants ages five to eight months have reached important developmental milestones that allow us to investigate not only their exposure to total adult speech but also their experience of linguistic interactions with adults. Specifically, most infants of this age are not only vocalizing or “babbling” (e.g., vowels, squeals) but are also engaging in canonical babbling (i.e., producing the consonant-vowel syllables [e.g., “ma”] that are the building blocks of words; Oller et al., 1997).
Of the 151 dyads who participated in the BABIES project when infants were age five to eight months, 100 completed the LENA assessment (41 did not attempt to complete and 10 completed when infants were more than eight months), 51 of whom also provided usable infant rsfMRI data (see below, MRI data preprocessing).
Procedure
The BABIES Project was approved by the Stanford Institutional Review Board. Mothers provided informed written consent for themselves and their infants and were compensated for their time. Participants included in the current analyses were recruited either during their pregnancies (16–35 weeks of gestation) or when their infants were less than or equal to six months through online advertisements and flyers posted in the local community. Participants recruited during pregnancy participated in additional sessions not included in the current analyses. All participants were screened for inclusion/exclusion criteria through a phone interview. When infants were approaching six months, mother–infant dyads were invited to attend a laboratory session. Inclusion criteria for this session were that mothers had a singleton infant between five and eight months, were ≥18 years, were fluent in English, and had no immediate plans to leave the geographic area. Exclusion criteria included maternal bipolar disorder, maternal psychosis, maternal severe learning disabilities, severe complications during birth, infant head trauma, infant premature birth (<36 weeks of gestation), infant congenital, genetic, or neurologic disorders, and contraindication for infant MRI. At the laboratory session, dyads participated in mother–infant interactions and mothers completed questionnaires and interviews. At the end of this session, a research coordinator provided dyads with a LENA audio recording device and scheduled an infant MRI brain scan session for the infant. The MRI scans were completed an average of 1.91 weeks (SD = 1.94) after the LENA assessment.
Code and data availability
All templates and scripts used in the MRI data processing are available at https://github.com/babies-study/language_rest. Additional details about the materials and methods, processed data in tabular format, statistical analysis scripts for the associations among language input variables and resting-state FC, and results of analyses probing the robustness of our findings are available at https://github.com/lucysking/infant_rsfMRI.
Measures
Language input and vocalizations
As described by King et al. (2019), we provided mother–infant dyads with a LENA audio recording device and specialized infant clothing with written and oral instructions that the infant wear the device from waking to bedtime across at least one typical day at home. The reliability of the LENA system for measuring language input has been extensively documented (Xu et al., 2008, 2014; Zimmerman et al., 2009; Oller et al., 2010). Research comparing LENA counts of AWs, CTs, and child vocalizations to counts based on human transcription finds that the LENA is reliable for children ages 2–48 months and that reliability does not vary as a function of age (correlations between LENA and human-based measures ranged from 0.82 to 0.95, and differences between LENA and human-based measures were uncorrelated with age). The LENA system has also been validated in several languages in addition to English, including Spanish, Chinese, French, and Swedish (Weisleder and Fernald, 2013; Gilkerson et al., 2015; Canault et al., 2016; Schwarz et al., 2017).
The LENA device records for up to 16 h. To be included in the current analyses, families were required to have completed ≥8 h of recording. We uploaded the recordings to the LENA Pro analysis software (Xu et al., 2009), which yields counts of AWs (overheard “near and clear” words spoken by adults in the infant's vicinity), CTs (adult–infant or infant–adult back-and-forth vocalizations separated by ≤5 s), and infant vocalizations within each 5-min epoch in the day. Using these estimates, we calculated the following measures:
1) To measure the quantity of AWs, CTs, and infant vocalizations, we followed previous research testing the association between naturalistic language input and the brain (Romeo et al., 2018a,b) to compute the maximum hourly total of AWs, CTs, and infant vocalizations.
2) To measure the consistency of exposure to AWs and CTs, we calculated the proportion of 5-min epochs out of the total number of 5-min epochs in the recording in which ≥1 AW (AW consistency) or ≥1 CT (CT consistency) was present.
Whereas the quantity of language input reflects the maximum level of stimulation the infant receives, the consistency of language input reflects the dependability of this stimulation throughout the day. Figure 1 presents the distributions of CTs across the day for two infants: one exposed to lower quantity but higher consistency, and one exposed to lower consistency but higher quantity.
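The two measures defined above can be illustrated with a minimal sketch. It assumes that the input is a list of per-epoch counts (one value per 5-min epoch) and that the "maximum hourly total" is taken over consecutive, non-overlapping blocks of 12 epochs; the exact hour boundaries used by the LENA software are not specified here, so that choice and the function names are assumptions made for illustration only.

```python
from typing import Sequence

EPOCHS_PER_HOUR = 12  # twelve 5-min epochs make up one hour


def max_hourly_total(epoch_counts: Sequence[int]) -> int:
    """Quantity: largest hourly total of events (AWs, CTs, or vocalizations).

    Assumption: hours are consecutive, non-overlapping blocks of 12 epochs
    starting at the first epoch; a trailing partial hour is also counted.
    """
    hourly_totals = [
        sum(epoch_counts[start:start + EPOCHS_PER_HOUR])
        for start in range(0, len(epoch_counts), EPOCHS_PER_HOUR)
    ]
    return max(hourly_totals)


def consistency(epoch_counts: Sequence[int]) -> float:
    """Consistency: proportion of 5-min epochs containing at least one event."""
    return sum(1 for count in epoch_counts if count >= 1) / len(epoch_counts)


# Hypothetical 2-h recordings (24 epochs each); values are illustrative only.
steady = [2] * 24                         # a few CTs in every epoch
bursty = [0] * 10 + [30, 30] + [0] * 12   # CTs concentrated in two epochs

for label, counts in (("steady", steady), ("bursty", bursty)):
    print(label, max_hourly_total(counts), round(consistency(counts), 3))
```

In this toy example the "bursty" recording has the higher quantity but the lower consistency, which mirrors the contrast between the two infants described above.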
Covariates
Infant characteristics
We calculated infant gestational age at birth and corrected age at the MRI session based on infants' reported due date and birth date. Mothers reported their infants' birthweight (two mothers did not provide this information). To assess infant temperament, mothers completed the negative affectivity subscale (Cronbach's α = 0.87) of the Infant Behavior Questionnaire-Revised Short Form (IBQ-R-SF; Putnam et al., 2014). One mother did not complete the IBQ-R-SF.
Maternal and family characteristics
Mothers reported their primary language, their number of previous pregnancies (two mothers did not provide this information), whether they were currently breastfeeding (five mothers did not respond to this question), their education level, their annual household income in bins ranging from 1 ($0–50,000) to 7 (>$150,000), and the number of people in their household. We calculated family income-to-needs ratio by dividing the annual household income (median point of each bin) by the county-specific low-income threshold for the number of people in the household (www.huduser.gov). To assess maternal crystallized English verbal knowledge, we calculated total scores from mothers' responses to the Vocabulary section of the Shipley Institute of Living Scale–2 (Shipley et al., 2009; Cronbach's α = 0.90). To measure maternal short-term memory, we calculated the total number of mothers' correct responses in a forward and backward Digit Span task (four mothers did not complete this task).
MRI data acquisition
As previously described (Camacho et al., 2020), dyads were provided with an MRI prep kit including earplugs and audio-recordings of MRI sounds to prepare infants for the MRI session. Dyads arrived at the MRI session ∼30 min before the infant's typical bedtime. For scanning, infants were undressed, changed into a disposable diaper, swaddled in a muslin cloth, and placed in a MedVac Immobilizer. Sound protection included earplugs paired with active noise cancelling headphones playing white noise or Natus Medical neonatal noise attenuators (miniMuffs). Infants were then soothed and fed according to their typical bedtime routine. Once the infant had been sleeping for 10 min, the infant was transferred to the MRI scanner bed. If the infant remained asleep during transition and for the next 5 min, MRI acquisition was initiated. Acquisition was stopped if the infant woke and was restarted after the infant had been sleeping again for 10 min. This process continued until all sequences were collected, the infant's mother wanted to stop scanning, or the infant refused to soothe or sleep in the scanner room. A staff member remained with the infant at all times in the scanner room. Of the 151 dyads who participated in the larger project, 101 attempted the MRI scan visit, and 24 of these infants were unable to soothe to sleep.
MR images were collected using a 3 Tesla GE MR750 Discovery scanner equipped with a 32-channel NovaMedical head coil. High-resolution T2-weighted (T2w) images were collected using a 3D fast spin echo sequence (1.0 × 1.0 × 0.8 mm voxel, 204 sagittal slices, 256 × 256 acquisition matrix, flip angle = 90°, FOV = 256 mm, TR = 2502 ms, TE = 91.4 ms). Resting-state BOLD fMRI data (rsfMRI) were collected using a T2*-weighted spiral in and out sequence (Glover, 2012) designed to reduce dropout in orbitofrontal regions (3.4 × 3.4 × 3.0 mm voxel, 35 axial slices, 64 × 64 acquisition matrix, flip angle = 80°, FOV = 220 mm, TR = 2500 ms, TE = 30 ms). Given the probability of the infant waking during the scan, rsfMRI data were collected in two runs of 6 min each. If the infant remained asleep through the end of the first acquisition, a second acquisition was collected, resulting in 6–12 min of data for each infant (145–290 volumes). Data were visually inspected for artifacts before data processing. At two points during the study, scan sessions were moved to another identical scanner and scanning sequences were harmonized between sites (15 infants on scanner one, four on scanner two, 32 on scanner three). Thus, scanner was included as a covariate in all primary analyses.
MRI data preprocessing
Of the 77 infants who successfully slept during the MRI scan visit, six infants woke up before the rsfMRI sequence was completed, two infants were removed for too much motion (<5 min of low motion data), two were removed for image artifacts (one likely because of a corrupted data file and one because of a large prefrontal distortion), and one infant was removed because of an extreme outlying value in FC in the frontotemporal network (i.e., beyond 3 SD from the mean). Of the 66 infants with usable rsfMRI data, 51 infants also had LENA recordings. MR image processing was conducted in Python version 3.7 using the NiPype framework (Gorgolewski et al., 2011).
T2w anatomical data
First, T2w images were corrected for nonuniform intensities (Tustison et al., 2010); next, the brain was extracted using FSL's brain extraction tool (BET; Jenkinson et al., 2005), down-sampled to a 2 mm isotropic voxel size, and registered to a sample-specific T2w anatomic template using FSL's FLIRT (Jenkinson et al., 2002). This template was created from artifact-free T2w images collected from the same infants analyzed here using the ANTs template creation pipeline, which involves iterative diffeomorphic registrations before eventual averaging (Avants et al., 2011).
rsfMRI data
Resting-state fMRI data were trimmed of the first four volumes, slice-time corrected, and then rigidly aligned to the middle volume in the acquisition using FSL's MCFLIRT (Jenkinson et al., 2002), which produced frame-by-frame measurements of translation and rotation in each direction that were used in later de-noising (see below, Resting-state fMRI noise characterization and removal). Next, rsfMRI data were co-registered to the down-sampled T2w (see above, T2w anatomical data), and the transform from the T2w to template registration was applied to the rsfMRI data, warping them into the template space using FSL's FLIRT (Jenkinson et al., 2002). Registrations were visually inspected for accuracy and modified as needed to correct alignment. Only one infant's data required manual alignment, and this infant was ultimately removed because of excessive motion. Measurement noise from motion and global signals are both known to dramatically influence BOLD signal and connectivity measurements (Satterthwaite et al., 2012; Murphy and Fox, 2017); thus, we took careful denoising steps (see below, Resting-state fMRI noise characterization and removal) before temporally interpolating across volumes with a frame-wise displacement (FD) greater than 0.5 (Power et al., 2012) and finally bandpass filtering to retain signal fluctuations between 0.005 and 0.1 Hz (Hallquist et al., 2013). The FD cutoff of 0.5 was determined based on the relation between FD and the change in average whole brain signal (known as DVARS; Smyser et al., 2010) in our sample; DVARS spikes greater than 2 SD from mean DVARS within each infant were associated with FD values of 0.5 and greater. Thus, “low-motion” volumes were those with FD less than 0.5 and DVARS less than 2 SD from the mean. After bandpass filtering, interpolated volumes were dropped before conducting connectivity analysis. Infants with less than 5 min of usable data after these procedures were removed (n = 2).
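The low-motion criterion described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes FD is computed in the manner of Power et al. (2012) as the sum of absolute frame-to-frame changes in the six realignment parameters, with rotations converted to millimeters on a sphere whose radius (50 mm here) is an assumed value that may not suit infant heads. It also assumes that the DVARS criterion is an upper bound of mean + 2 SD within each infant. Function names are hypothetical.

```python
import math
from typing import List, Sequence

TR_SECONDS = 2.5  # repetition time reported for the rsfMRI sequence


def framewise_displacement(params: Sequence[Sequence[float]],
                           radius_mm: float = 50.0) -> List[float]:
    """FD per volume from (tx, ty, tz, rx, ry, rz) rows.

    Translations are in mm and rotations in radians; the sphere radius used
    to convert rotations to mm is an assumption, not a value from the study.
    """
    fd = [0.0]  # the first volume has no preceding volume
    for prev, curr in zip(params, params[1:]):
        translation = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        rotation = sum(abs(c - p) * radius_mm for p, c in zip(prev[3:], curr[3:]))
        fd.append(translation + rotation)
    return fd


def low_motion_mask(fd: Sequence[float], dvars: Sequence[float],
                    fd_cutoff: float = 0.5, dvars_sd: float = 2.0) -> List[bool]:
    """True for volumes with FD below the cutoff and no DVARS spike."""
    mean = sum(dvars) / len(dvars)
    sd = math.sqrt(sum((d - mean) ** 2 for d in dvars) / (len(dvars) - 1))
    return [f < fd_cutoff and d <= mean + dvars_sd * sd
            for f, d in zip(fd, dvars)]


def usable_minutes(mask: Sequence[bool], tr: float = TR_SECONDS) -> float:
    """Minutes of retained data; infants with less than 5 min were removed."""
    return sum(mask) * tr / 60.0
```

With a TR of 2.5 s, the 5-min inclusion threshold corresponds to 120 retained volumes.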
We did not conduct spatial smoothing of BOLD signal based on concerns that BOLD signal would be incorrectly localized to white matter given the small size of the infant brain and the locations of our regions of interest (ROIs; Baxter et al., 2019). Minimal smoothing kernels or no spatial smoothing is standard in infant imaging (Fitzgibbon et al., 2019; Howell et al., 2019).
Resting-state fMRI noise characterization and removal
Global signals
Global signals were characterized on the subject-specific and context-specific level. To characterize subject-specific noise, BOLD signal from outside the brain (“session noise”) and from white matter and cerebrospinal fluid within the brain (“physiological noise”) were each isolated by masking out the brain and gray matter, respectively. Noise volumes were then smoothed with a 4 × 4 × 4 mm Gaussian kernel, filling the removed brain areas with neighboring noise signal. For the session noise volume, the kernel base was extended to 22 mm to avoid zeros in the inner voxels of the brain (instead, these were near-zero values). To characterize noise associated with the general procedural context (e.g., the scanner sequence and the scanning experience), runs of rsfMRI were temporally averaged across all participants, producing voxel-level estimates of procedural noise. These steps produced three voxel-specific and time-specific regressors for each participant: session, physiological, and procedural. Regression of global signal shifts connectivity correlations from predominantly positive values to center at approximately zero across the brain (Murphy et al., 2009; Murphy and Fox, 2017). Thus, the negative correlations observed here should be interpreted as relatively less connectivity rather than negative connectivity.
Motion
Derivatives were calculated from the six motion parameters (translation and rotation in each of three directions) produced from rsfMRI rigid realignment. Nonlinear influences of each of these six motion parameters on BOLD signal were characterized by creating a Volterra series from the motion derivatives and the square of the motion derivatives. Specifically, each motion parameter was lagged six times, capturing a “memory effect” of motion on the signal up to 15 s later. In total, this procedure generated 42 motion regressors: six original parameters and 36 lagged derivatives. To determine which of the lagged parameters exerted the strongest influence on BOLD signal (immediate vs later, translational vs rotational, etc.), we computed the partial contribution of the lagged motion parameters in groups of three (total of 24 partial contribution estimates), operationalized as the R² fit lost by removing each group from the regression, and then ranked these estimates. The fourth, fifth, and sixth lags contributed least to the BOLD signal; thus, the final motion nuisance regressors were limited to the first three lags in addition to the original six motion parameters (24 parameters in total).
Altogether, the final denoising step included regressing voxel-specific session, physiological, and procedural signal as well as 24 motion regressors, resulting in 28 regressors in total when including the constant. We used these residuals in subsequent processing and analysis.
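The regressor counts above can be reproduced with a minimal sketch. It assumes that each lag shifts the temporal derivative of a motion parameter forward by one volume (so six lags at TR = 2.5 s span 15 s) and it omits the squared terms of the Volterra expansion, because the stated totals (6 + 36 = 42 and 6 + 18 = 24) are consistent with lagged derivatives alone. The function names and the zero-padding choice are assumptions for illustration.

```python
from typing import List, Sequence


def temporal_derivative(series: Sequence[float]) -> List[float]:
    """Backward difference, with 0.0 for the first volume."""
    return [0.0] + [b - a for a, b in zip(series, series[1:])]


def lag(series: Sequence[float], k: int) -> List[float]:
    """Shift a series forward by k volumes, zero-padding the start."""
    return ([0.0] * k + list(series))[:len(series)]


def motion_regressors(params_by_axis: Sequence[Sequence[float]],
                      n_lags: int = 3) -> List[List[float]]:
    """Original parameters followed by lagged derivatives of each parameter.

    params_by_axis holds six series (three translations, three rotations),
    each with one value per volume.
    """
    columns = [list(p) for p in params_by_axis]
    for p in params_by_axis:
        derivative = temporal_derivative(p)
        for k in range(1, n_lags + 1):
            columns.append(lag(derivative, k))
    return columns


# Hypothetical motion traces for 10 volumes; values are illustrative only.
toy_params = [[0.01 * axis * t for t in range(10)] for axis in range(1, 7)]
print(len(motion_regressors(toy_params, n_lags=6)))  # full set: 6 + 36
print(len(motion_regressors(toy_params, n_lags=3)))  # final set: 6 + 18
```

Adding the three voxel-specific noise regressors and a constant to the 24 motion columns gives the 28 regressors described above.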
Resting-state fMRI network analysis
Language network region selection
ROIs were selected based on a recent review of the ontogeny of the cortical language comprehension circuit (Skeide and Friederici, 2016) in which the language network is subdivided into two developmental stages corresponding to two subnetworks of functionally connected brain regions. The bottom-up processing stage begins in utero, involving connections between areas of the bilateral temporal cortices; in contrast, the top-down stage begins in early childhood, involving connections of the IFG. The following nine ROIs were selected based on this subdivision: the left temporal pole, left superior temporal sulcus (STS), right and left temporoparietal junction (TPJ), left auditory cortex, left frontal operculum, and three regions of the left IFG (pars opercularis, pars triangularis, and pars orbitalis). ROIs were manually delineated on the T2w anatomic template derived from the sample based on anatomic landmarks (see Table 1).
Estimation of language network FC
To estimate language network FC, we calculated pair-wise Pearson correlations of mean signal between ROIs. We generated a group-level adjacency matrix using the plotting function in Nilearn (https://nilearn.github.io/) showing the correlations and visually identified subnetworks based on sets of intercorrelated ROIs. To confirm statistically that visual designation of the subnetworks was appropriate, we also performed modularity detection using an iterative Louvain community detection approach. One hundred iterations of the Louvain algorithm (Brain Connectivity Toolbox Python implementation, γ = 0.25) were applied to the data (seeds 0–99), and connections that were placed within the same module in a majority of these iterations (at least 51%) were designated as part of the same module. We then extracted correlation coefficients for each infant and calculated subnetwork connectivity values by taking the mean of the correlation coefficients among the ROIs in each subnetwork. Given that language networks are still developing in infancy, we did not threshold correlation coefficients for our primary analyses; however, as a supplemental analysis, we calculated FC excluding coefficients <0.1 and examined the similarity of findings across the two approaches. We standardized final connectivity values using the Fisher r to z transformation.
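The subnetwork connectivity estimate described above can be sketched for a single infant as follows. The sketch assumes that the mean ROI time series are already denoised, that the mean correlation is computed before the Fisher transformation (following the order in which the steps are described), and that the supplemental threshold simply drops coefficients below 0.1. The function names and toy series are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def subnetwork_fc(roi_series: Dict[str, Sequence[float]],
                  members: Sequence[str],
                  min_r: Optional[float] = None) -> float:
    """Mean pairwise correlation among member ROIs, Fisher r-to-z transformed.

    min_r=None mirrors the unthresholded primary analysis; min_r=0.1 mirrors
    the supplemental analysis that excludes coefficients below 0.1.
    """
    coefficients = [pearson(roi_series[a], roi_series[b])
                    for a, b in combinations(members, 2)]
    if min_r is not None:
        coefficients = [r for r in coefficients if r >= min_r]
    if not coefficients:
        raise ValueError("no coefficients left after thresholding")
    mean_r = sum(coefficients) / len(coefficients)
    return math.atanh(mean_r)  # Fisher r-to-z


# Hypothetical mean signals for three ROIs; values are illustrative only.
toy_series = {
    "roi_a": [1, 2, 3, 4, 5],
    "roi_b": [2, 1, 4, 3, 5],
    "roi_c": [5, 3, 4, 1, 2],
}
print(subnetwork_fc(toy_series, ["roi_a", "roi_b", "roi_c"]))
```

Because global signals were regressed out, a negative value from such a function would indicate relatively lower connectivity rather than anticorrelation.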
Statistical tests
Statistical tests were conducted in R version 4.0.0 (R Core Team, 2020). First, we examined associations (Pearson's correlations for numerical variables, Spearman's correlations for ordinal variables, Welch's t tests for mean comparisons between groups) among the language input and infant vocalization variables (AW and CT quantity, AW and CT consistency, infant vocalizations), infant characteristics, maternal and family characteristics, and resting-state connectivity values. Next, we fit a series of four multivariate multiple regression models to examine the associations of the quantity and consistency of language input with connectivity values. Specifically, for each measure of language input, we fit a model in which variables for connectivity in each identified network were the dependent variables and the measure of language input was the statistical predictor. We used the “Manova” function from the “car” package in R (Fox and Weisberg, 2019) to calculate Pillai's trace statistics to determine whether the measure of language input explained significant variance in connectivity jointly across the networks. We set the family-wise α to 0.05, adjusting for multiple comparisons using the Meff correction (i.e., four multivariate regressions testing the association between each of the four measures of language input and FC across language networks; Derringer, 2018), such that a measure of language input was determined to be significantly associated with FC jointly across the networks when p < 0.014. If the Pillai's trace statistic was significant, we examined the linear regression model for each network. For significant associations, we quantified effect sizes using standardized βs, 95% confidence intervals (CI), and % variance explained above and beyond covariates (ΔR²).
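The logic of an effective-number-of-tests (Meff) correction can be illustrated with a short sketch. It assumes one commonly used eigenvalue-variance definition, Meff = 1 + (M − 1)(1 − Var(λ)/M), where λ are the eigenvalues of the correlation matrix among the M predictors, and a Bonferroni-style division of α by Meff; whether this matches the exact variant applied in the analyses reported here is an assumption. The sketch is written in Python for illustration, whereas the statistical tests themselves were run in R.

```python
from typing import Sequence


def effective_number_of_tests(corr: Sequence[Sequence[float]]) -> float:
    """Meff from the variance of the eigenvalues of a correlation matrix.

    For a symmetric correlation matrix of size M, the eigenvalues sum to M
    (mean of 1) and their squares sum to the sum of squared matrix entries,
    so the sample variance can be obtained without an eigendecomposition.
    """
    m = len(corr)
    sum_sq_eigenvalues = sum(r * r for row in corr for r in row)
    var_eigenvalues = (sum_sq_eigenvalues - m) / (m - 1)
    return 1 + (m - 1) * (1 - var_eigenvalues / m)


def adjusted_alpha(corr: Sequence[Sequence[float]], alpha: float = 0.05) -> float:
    """Per-test threshold under a Bonferroni-style division by Meff (assumed)."""
    return alpha / effective_number_of_tests(corr)


# Hypothetical limiting cases for four predictors; not the study's data.
independent = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
redundant = [[1.0] * 4 for _ in range(4)]
print(adjusted_alpha(independent))  # four independent tests
print(adjusted_alpha(redundant))    # four perfectly correlated tests
```

Under these assumptions, uncorrelated predictors recover the ordinary Bonferroni threshold and perfectly correlated predictors require no adjustment; correlated measures such as those analyzed here fall between the two.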
Finally, for significant associations, we used the “BayesFactor” package in R (Morey, 2019) to compute Bayes factors to quantify the strength of evidence in support of the alternative hypothesis (i.e., that the measure of language input explains variance in connectivity) versus the null hypothesis (i.e., that the measure of language input does not explain variance beyond an intercept-only model).
We used a formal model fitting approach (Chambers, 1992) to determine the inclusion of covariates in our primary analyses. Specifically, we iteratively added covariates (i.e., each of the measures of infant, maternal, and family characteristics described above, Covariates) to a base multivariate regression model in which estimates of connectivity in each identified network were the dependent variables and scanner was the single independent variable. To preserve model parsimony, only covariates that were significantly associated with resting-state connectivity as determined by the Pillai's test statistic in the multivariate multiple regression model were included. However, scanner was included as a centered effect-coded covariate in all models regardless of statistical significance.
Results
Sample characteristics
We present characteristics of the final sample in Table 2. We present histograms showing the distributions of each measure of language input in Figure 2. The 51 dyads included in the final sample did not differ significantly from dyads not included with respect to LENA metrics (AW quantity: t(95.74) = −1.47, p = 0.145; CT quantity: t(95.85) = −1.33, p = 0.187; AW consistency: t(99.63) = −0.87, p = 0.383; CT consistency: t(94.39) = −0.55, p = 0.581; infant vocalizations: t(95.96) = −0.77, p = 0.441), infant characteristics (age at MRI scan: t(96.65) = −0.12, p = 0.908; gestational age at birth: t(100.71) = −0.43, p = 0.670; birth weight: t(101.54) = −1.46, p = 0.148; negative affectivity: t(104.53) = 0.05, p = 0.960; sex: χ²(1) < 0.01, p = 0.999), or maternal characteristics [maternal age: t(99.68) = 0.34, p = 0.785; number of previous pregnancies: t(96.51) = 0.05, p = 0.962; income-to-needs ratio: t(118.71) = −0.90, p = 0.445; primary language (English vs another language): χ²(1) = 1.26, p = 0.261; education (more than or equal to four-year college degree vs less than four-year college degree): χ²(1) = 0.63, p = 0.428; Hispanic or Latinx ethnicity: χ²(1) < 0.01, p = 0.992; race (White vs person of color): χ²(1) = 0.25, p = 0.615].
Identification of language networks
We display the group-level adjacency matrix showing the correlations among the ROIs implicated in language comprehension in Figure 3. We visually identified two language networks based on two sets of ROIs that were positively intercorrelated. Consistent with the subdivision of cortical language comprehension proposed by Skeide and Friederici (2016), we identified a posterior temporal language network comprising the left auditory cortex, the bilateral TPJ, and the left STS, and a frontotemporal language network comprising the left temporal pole, the left frontal operculum, and three regions of the IFG (orbitalis, triangularis, opercularis). Distributions of FC estimates in the identified language networks are presented in Figure 4.
Results of Louvain community detection confirmed statistically that our visual designation of the subnetworks was appropriate. The modules detected were highly similar to the networks we created based on visual clustering and a priori designations, with the exception of the left frontal operculum, which was detected as distinct from the other two subnetworks. Results of our primary analyses presented below were highly similar and conclusions were identical when we excluded the left frontal operculum.
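Louvain community detection searches for the partition of a graph that maximizes modularity, Q. As an illustration of the quantity being optimized (not the authors' pipeline, which is not shown here), the sketch below computes Q for a candidate partition of a small weighted graph. The toy matrix and function name are hypothetical, and the sketch assumes non-negative edge weights; how negative correlations were handled is not stated in this section.

```python
def modularity(adj, labels):
    """Newman modularity Q of a partition of an undirected weighted graph.

    adj    : symmetric matrix (list of lists) of non-negative edge weights
    labels : community label for each node
    """
    n = len(adj)
    strength = [sum(row) for row in adj]   # weighted degree k_i
    two_m = sum(strength)                  # total weight, each edge counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected by chance.
                q += adj[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


# Hypothetical toy graph: two tight pairs joined by weak links.
toy = [
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.7],
    [0.0, 0.1, 0.7, 0.0],
]
q_two_modules = modularity(toy, [0, 0, 1, 1])
q_one_module = modularity(toy, [0, 0, 0, 0])
print(q_two_modules, q_one_module)
```

In this toy case the two-module partition has higher modularity than placing all nodes in one module, which is the kind of comparison a modularity-maximizing algorithm makes when deciding whether a visually identified split is supported.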
Associations of covariates with language input and FC in language networks
We present Pearson correlation coefficients of the associations among measures of language input and infant vocalizations in Figure 5. CT quantity was positively associated with CT consistency, AW quantity, and infant vocalizations (p values < 0.030), but was not significantly associated with AW consistency; CT consistency was positively associated with AW quantity, AW consistency, and infant vocalizations (p values < 0.027); AW quantity was positively correlated with AW consistency at the trend level (p = 0.065); neither AW quantity nor AW consistency was associated with infant vocalizations.
Gestational age at birth, infant age at the LENA assessment, infant sex, temperament, maternal primary language (English vs another language), breastfeeding status, maternal age, maternal education, income-to-needs ratio, maternal verbal knowledge scores, and maternal short-term memory scores were not significantly associated with any of the measures of language input or infant vocalizations (|r| values < 0.20, |t| statistics < 1.62); however, birth weight was significantly positively associated with CT quantity, CT consistency, and infant vocalizations (r values = 0.32–0.35, p values < 0.028). Finally, FD values were not associated with any of the measures of language input or infant vocalizations except AW consistency; infants with greater postcensoring FD values experienced lower AW consistency (r(49) = –0.33, p = 0.018).
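A Pearson correlation with df = n − 2 maps onto a t statistic through t = r·sqrt(df / (1 − r²)), which is how an r value and a t statistic can be reported interchangeably. The short sketch below applies this standard identity to the r(49) = −0.33 reported above; the helper name is hypothetical.

```python
import math


def r_to_t(r, df):
    """Convert a Pearson r with df = n - 2 into its t statistic."""
    return r * math.sqrt(df / (1.0 - r * r))


# Values reported in the text for AW consistency and postcensoring FD.
t_fd = r_to_t(-0.33, 49)
print(f"t(49) = {t_fd:.2f}")
```

The resulting t of roughly −2.45 on 49 degrees of freedom is in line with the two-sided p value of 0.018 given in the text.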
FD values, gestational age at birth, infant corrected age at the MRI scan, infant sex, birth weight, maternal primary language, maternal age, income-to-needs ratio, maternal verbal knowledge scores, and maternal short-term memory scores were not significantly associated with connectivity in either network (|r| values < 0.28, |t| statistics < 1.50); however, maternal education was negatively associated with connectivity in the frontotemporal network (Spearman's ρ = –0.38, p < 0.001).
Formal model fitting in which we entered estimates of connectivity in the posterior temporal and frontotemporal networks as the dependent variables and tested each of the infant, maternal, and family characteristics as covariates indicated that none of these variables improved model fit above and beyond the effect of scanner. Specifically, neither the addition of infant gestational age at birth (Pillai's trace, F(2,46) = 0.47, p = 0.626), infant corrected age at the MRI scan (linear age: Pillai's trace, F(2,46) = 1.07, p = 0.350; quadratic age: Pillai's trace, F(2,46) = 1.94, p = 0.110), infant sex (Pillai's trace, F(2,46) = 0.72, p = 0.492), infant birth weight (Pillai's trace, F(2,44) = 0.67, p = 0.725), breastfeeding status (Pillai's trace, F(2,46) = 0.02, p = 0.720), maternal primary language (English vs another language; Pillai's trace, F(2,41) = 1.90, p = 0.161), maternal age (Pillai's trace, F(2,46) = 1.42, p = 0.251), family income-to-needs ratio (Pillai's trace, F(2,46) = 2.05, p = 0.140), maternal education (Pillai's trace, F(2,46) = 0.49, p = 0.613), maternal verbal knowledge (Pillai's trace, F(2,46) = 1.58, p = 0.216), nor maternal short-term memory (Pillai's trace, F(2,42) = 0.01, p = 0.731), significantly improved model fit. Although the effect of the scanner on which the rsfMRI data were acquired was not significantly associated with FC across the posterior temporal and frontotemporal networks (Pillai's trace, F(4,48) = 1.64, p = 0.169), we retained the variable in all analyses.
Associations between the consistency of language input and FC in language networks
Neither the consistency of AWs (Pillai's trace, F(2,46) = 0.09, p = 0.916) nor the consistency of CTs (Pillai's trace, F(2,46) = 0.74, p = 0.484) was associated with FC across the posterior temporal and frontotemporal networks (see Fig. 6). Findings were highly similar when we examined associations between the consistency of language input and FC using a threshold of <0.1 for correlations among ROIs. The consistency of AWs was not associated with FC across the posterior temporal and frontotemporal networks (Pillai's trace, F(2,46) = 0.02, p = 0.622), nor was the consistency of CTs (Pillai's trace, F(2,46) = 0.04, p = 0.355).
Associations between the quantity of language input and FC in language networks
The quantity of AWs was not associated with FC across the posterior temporal and frontotemporal networks (Pillai's trace, F(2,46) = 0.79, p = 0.460; see Fig. 6); however, consistent with our hypothesis, the quantity of CTs was significantly associated with FC across the networks (Pillai's trace, F(2,46) = 4.75, p = 0.013). Although CT quantity was not significantly associated with connectivity in the frontotemporal network, CT quantity was significantly negatively associated with connectivity in the posterior temporal network (β = −0.35, SE = 0.13, t(47) = −2.81, p = 0.007, 95%CI[−0.65, −0.11], ΔR2 = 0.14; see Fig. 6). According to the Bayes factor, the alternative hypothesis that CT quantity explained variance in connectivity in the posterior temporal network was 7.05 times more likely than the null hypothesis, indicating positive evidence for the alternative hypothesis (Kass and Raftery, 1995).
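To make the multivariate test concrete, the sketch below computes Pillai's trace, V = trace(H(H + E)⁻¹), and its F transformation for two outcomes and a single predictor. It is a simplified illustration under stated assumptions: the model contains only an intercept and the predictor (the scanner covariate is omitted), the function name is hypothetical, and the data are toy values rather than study data. With one hypothesis degree of freedom the F transformation is exact, with df1 equal to the number of outcomes and df2 equal to the error df minus the number of outcomes plus one; for 47 error df and two outcomes this gives 46, matching the F(2,46) reported above.

```python
def pillai_one_predictor(x, y1, y2):
    """Pillai's trace and its F for one predictor and two outcomes.

    Simplification: the model has only an intercept and x (no scanner covariate).
    Returns (V, F, df1, df2).
    """
    n = len(x)
    mx = sum(x) / n
    xc = [v - mx for v in x]
    sxx = sum(v * v for v in xc)

    slopes, resid = [], []
    for y in (y1, y2):
        my = sum(y) / n
        b = sum(a * (v - my) for a, v in zip(xc, y)) / sxx
        slopes.append(b)
        resid.append([(v - my) - b * a for a, v in zip(xc, y)])

    # Hypothesis (H) and error (E) sums-of-squares-and-cross-products matrices.
    h = [[slopes[i] * slopes[j] * sxx for j in range(2)] for i in range(2)]
    e = [[sum(p * q for p, q in zip(resid[i], resid[j])) for j in range(2)]
         for i in range(2)]

    # V = trace(H (H + E)^-1), using the closed-form inverse of a 2x2 matrix.
    t = [[h[i][j] + e[i][j] for j in range(2)] for i in range(2)]
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    t_inv = [[t[1][1] / det, -t[0][1] / det], [-t[1][0] / det, t[0][0] / det]]
    v = sum(h[i][k] * t_inv[k][i] for i in range(2) for k in range(2))

    # With a single hypothesis df the F transformation is exact.
    df1 = 2
    df2 = (n - 2) - 2 + 1   # error df minus number of outcomes plus one
    f = (v / (1.0 - v)) * df2 / df1
    return v, f, df1, df2


# Hypothetical toy values, not study data.
ct = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fc_posterior = [0.52, 0.47, 0.49, 0.41, 0.44, 0.36, 0.38, 0.31]
fc_frontal = [0.30, 0.34, 0.29, 0.33, 0.31, 0.35, 0.28, 0.32]
v, f, df1, df2 = pillai_one_predictor(ct, fc_posterior, fc_frontal)
print(f"V = {v:.3f}, F({df1},{df2}) = {f:.2f}")
```

A significant multivariate test of this kind says only that the predictor relates to the outcomes jointly, which is why the text follows it with separate coefficients for each network.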
We conducted diagnostic tests to determine whether the significant association between the quantity of CTs and FC in the posterior temporal network was robust to potential outliers. First, we examined the distribution of Studentized residuals for the linear regression model in which FC in the posterior temporal network was entered as the dependent variable and the quantity of CTs and scanner were entered as the independent variables. All Studentized residuals were less than three, suggesting there were no extreme cases. Second, we used the “outlierTest” function from the “car” package in R (Fox and Weisberg, 2019) to run a Bonferroni-corrected outlier test for this model. The most extreme observation included in the model (Studentized residual = 2.49) was not a significant outlier (Bonferroni p = 0.829).
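The logic of a Bonferroni outlier test can be sketched in a few lines: the largest Studentized residual is referred to a t distribution with one fewer degree of freedom than the model's residual df, and the two-sided p value is multiplied by the number of observations. The Python below is an illustrative reimplementation of that logic, not the "car" source code. The inputs of 51 observations and 47 residual df are assumptions inferred from the sample size and the t(47) reported above, and the t distribution p value is obtained by numerical integration to keep the sketch free of external libraries.

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value for Student's t via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    central = area * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central


def bonferroni_outlier_p(max_abs_rstudent, n_obs, resid_df):
    """Bonferroni p for the largest Studentized residual (capped at 1 here)."""
    p = t_two_sided_p(max_abs_rstudent, resid_df - 1)
    return min(1.0, n_obs * p)


# Assumed inputs: 51 infants and 47 residual df, inferred from the text.
p_bonf = bonferroni_outlier_p(2.49, n_obs=51, resid_df=47)
print(round(p_bonf, 2))
```

Under these assumptions the adjusted p value lands close to the reported 0.829; the small difference is expected because the residual of 2.49 is itself rounded.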
Given previous findings of an association between socioeconomic status (SES) and language abilities (Fernald et al., 2013), we conducted additional analyses to determine whether the association between CT quantity and FC was independent of family income-to-needs ratio and maternal education. The effect of CT quantity remained significant when covarying for these variables (Pillai's trace, F(2,44) = 4.79, p = 0.013); further, neither variable moderated the association between CT quantity and FC.
Finally, findings were highly similar when we examined associations between the quantity of language input and FC using a threshold of <0.1 for correlations among ROIs. The quantity of AWs was not significantly associated with FC across the posterior temporal and the frontotemporal networks (Pillai's trace, F(2,46) = 0.74, p = 0.483); however, the quantity of CTs was significantly associated with FC across these networks (Pillai's trace, F(2,46) = 4.80, p = 0.013) such that infants who engaged in more CTs had lower FC in the posterior temporal network (β = −0.37, SE = 0.14, t(47) = −2.72, p = 0.009, 95%CI[−0.65, −0.010]).
Associations between infant vocalizations and FC in language networks
Given the strong positive correlation between CT quantity and infant vocalizations, we fit a separate multivariate regression model to test the association between infant vocalizations and FC in the language networks. The quantity of infant vocalizations was associated with FC across the posterior temporal and frontotemporal networks (Pillai's trace, F(2,46) = 3.24, p = 0.048). Specifically, infant vocalizations were not associated with connectivity in the frontotemporal network but were negatively associated with connectivity in the posterior temporal network (β = −0.35, SE = 0.14, t(47) = −2.40, p = 0.019, 95%CI[−0.64, −0.06], ΔR2 = 0.11). The Bayes factor for the strength of the evidence in support of the alternative hypothesis that infant vocalizations explained variance in connectivity in the posterior temporal network relative to an intercept only model was 2.18, indicating weak evidence for the alternative hypothesis (Kass and Raftery, 1995). Thus, although both CT quantity and infant vocalizations were significantly negatively associated with connectivity in the posterior temporal network, only CT quantity explained meaningful variance in connectivity.
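The contrast drawn between the two Bayes factors follows the evidence categories of Kass and Raftery (1995). As a small sketch, the function below maps a Bayes factor for the alternative over the null onto those categories; the function name is hypothetical, and the treatment of values falling exactly on a boundary is an assumption of this sketch.

```python
def kass_raftery_label(bf):
    """Evidence category for a Bayes factor (alternative vs. null)."""
    if bf < 1:
        return "favors the null"
    if bf < 3:
        return "not worth more than a bare mention"
    if bf < 20:
        return "positive"
    if bf < 150:
        return "strong"
    return "very strong"


# Bayes factors reported in the text.
for name, bf in [("CT quantity", 7.05), ("infant vocalizations", 2.18)]:
    print(name, "->", kass_raftery_label(bf))
```

The value of 7.05 for CT quantity falls in the "positive" range, whereas 2.18 for infant vocalizations falls in the lowest category, which the text describes as weak evidence.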
Discussion
We examined the relation between naturalistic language input and the FC of language networks in five- to eight-month-old infants. We passively recorded infants' language environments, calculating the quantity and consistency of AWs and adult–infant CTs in daily life. Infants completed an rsfMRI scan during natural sleep and we estimated FC among ROIs implicated in language comprehension. We first identified two subnetworks based on correlations of activation among these ROIs: a posterior temporal network comprising the left auditory cortex, left STS, and bilateral posterior STG, and a frontotemporal network comprising regions of the left IFG, the left frontal opercular cortex, and the left anterior STG. We next examined the associations of measures of naturalistic language input with FC in these language networks. Among the four measures of language input, only the quantity of CTs was associated with FC. Infants who engaged in more CTs evidenced lower connectivity in the posterior language network than did infants who engaged in fewer CTs.
Neural systems develop heterogeneously; previous research using rsfMRI to examine neurodevelopment indicates that primary sensory networks develop first, followed by higher order cognitive networks (Eyre et al., 2020). Our findings support a model in which the cortical language circuit is subdivided into two networks of functionally connected brain regions corresponding to distinct developmental stages (Skeide and Friederici, 2016). Whereas the posterior temporal network that we identified comprises regions hypothesized to support bottom-up processing of language input beginning in utero, the frontotemporal network that we identified comprises regions hypothesized to support top-down processing of lower-level language representations beginning in the second year of life (Skeide and Friederici, 2016). Although the top-down frontotemporal network develops later (Skeide et al., 2014), our results suggest that this network exists in infancy. Emergence of this network is likely supported by its structural connections, including the arcuate fasciculus, that are present in neonates (Sket et al., 2019).
Why was the quantity of CTs associated with FC in the posterior temporal network but not in the frontotemporal network? These two subnetworks likely have different developmental trajectories, allowing for the possibility that language input affects one more strongly than the other, based on the timing of exposure. On the one hand, the posterior temporal network may be especially sensitive to language input in the first months of life. During CTs, infants may coactivate cells in this network as they recruit the auditory cortex, STS, and STG to detect and categorize words. Frequent coactivation of these cells could lead to long-term potentiation (Segal, 2005). On the other hand, the top-down processing network is posited to gradually develop through the pruning of perisylvian neurons and the maturation of white matter tracts including the arcuate fasciculus (Skeide and Friederici, 2016). Associations between naturalistic language input and FC in the frontotemporal network may emerge later in development as children begin to understand complex sentences.
Whereas we found lower FC in the posterior temporal network in infants who engaged in higher numbers of CTs, the only other study to have examined the association between CTs and fMRI measures reported higher connectivity in the left IFG in relation to larger quantities of CTs (Romeo et al., 2018a). However, Romeo et al. (2018a) did not use rsfMRI; instead, they focused on responses to a language processing task in preschoolers. Thus, our findings are best interpreted in the context of research examining typical development of infant rsfMRI networks. Unfortunately, resting-state network topology in infancy is not well understood (Grayson and Fair, 2017). Although an investigation using spatial independent components analysis (ICA) to identify whole-brain networks found that the strength of connectivity within higher-order cognitive networks, including a network labeled as “auditory/language,” increased between the neonatal stage and one year, connectivity within the primary sensory network decreased across this period (Gao et al., 2015). A study focused on development from four to nine months found that the strength of connectivity decreased in all ICA-identified networks except the default mode, including in the auditory and temporal networks (Damaraju et al., 2014). Taken together, these apparently conflicting findings suggest that development of FC differs across circuits and is nonlinear across the first year of life. Rapid postnatal development may involve the preferential strengthening of long-distance connections between brain regions (see Grayson and Fair, 2017). Decreasing connectivity in certain networks may reflect more efficient within-network interactions as synapses that transmit less organized patterns of activity are pruned and more distant connections are formed (Tau and Peterson, 2010). Nonetheless, greater efficiency in a network does not necessarily underlie observations of lower FC (Poldrack, 2015).
Although our findings cannot be directly compared with studies of older children using different neuroimaging approaches, they are consistent with the general finding that CTs, as opposed to overheard adult speech, are uniquely associated with brain metrics (Romeo et al., 2018a,b; Merz et al., 2020). There are at least two explanations for the apparent unique association of CTs with FC in infancy. First, whereas overheard adult speech includes speech directed toward other adults, CTs include caregivers' initiations of and responses to infant vocalizations. Therefore, CTs likely involve a special speech register known as infant-directed speech, which is defined by the slow and melodic cadence with exaggerated pitch contours that caregivers tend to adopt when interacting with infants (Fernald et al., 1989). Infants prefer infant-directed speech to adult-directed speech (Byers-Heinlein et al., 2020), and are better able to discriminate speech sounds and segment words from infant-directed speech than from adult-directed speech (Trainor and Desjardins, 2002; Thiessen et al., 2005). Thus, CTs likely go hand-in-hand with infant-directed speech, possibly leading to enhanced language processing and, in turn, differential FC of the posterior temporal language network.
A second explanation for the unique association of CTs with FC is that CTs involve infant vocalizations. Infants who vocalize more may elicit greater responsiveness from their caregivers, leading to higher quantities of CTs. Thus, caregiver input could explain minimal variation in the brain; instead, infants who vocalize more could have differential FC of the posterior temporal network regardless of the level of caregiver input. Counter to this explanation, however, our findings suggest that adult–infant vocal interactions, rather than infant vocalizations in isolation, explain variation in FC. Although infant vocalizations were also negatively associated with FC in the posterior language network, Bayes factors indicated that only the association with CTs was meaningful. The dynamic between caregivers and infants, involving infants' active participation combined with caregivers' contingent responses, may be most important for language learning and associated brain function (Golinkoff et al., 2015).
It is important to consider the findings of this study in the context of our approach to defining the language network. We focused on a set of brain regions that have been identified as functionally specialized for language processing to constrain our analyses and conclusions. However, there are currently no consensus criteria for defining the language network (Fedorenko and Thompson-Schill, 2014). The ROIs we selected have also been implicated in other types of processing (e.g., working memory; Kumar et al., 2016; Ghaleh et al., 2020). Further, other brain regions that are involved in general purpose mental operations (e.g., areas implicated in cognitive control) may support language comprehension. Overall, any conclusions that are drawn about the function of a brain network are tied to how that network is defined (Fedorenko and Thompson-Schill, 2014).
The current study is limited by the correlational nature of the analyses. Although there is experimental evidence that increasing the amount of speech infants receive from their caregivers improves language abilities (Ferjan Ramírez et al., 2019), future research is needed to determine the effects of enhancing naturalistic language input on brain development. In addition, the sample for the current study was Western and highly educated. Unlike previous studies (Romeo et al., 2018a), we found no associations with SES. Nonetheless, we observed wide variation in language input within this restricted sample. Our findings must be replicated in a larger and more diverse sample. Infant fMRI is a burgeoning field, but less research has focused on infants beyond the neonatal period (Azhari et al., 2020). This study highlights the feasibility and importance of building a program of research focused on early postnatal experiences and infant neurodevelopment throughout the first year of life.
In conclusion, the current study provides evidence for the role of vocal interactions with caregivers, compared with overheard adult speech, in the function of language networks in infancy. The quantity of CTs was associated with patterns of neural activation in an early-developing posterior temporal language network that is responsible for the bottom-up processing of language stimulation. These findings extend those of previous studies that have identified associations between CTs and brain structure and function in older children (Romeo et al., 2018a,b; Merz et al., 2020), and suggest that rsfMRI is a useful method for investigating experience-dependent differences in infants' language networks.
Footnotes
We thank Anna Cichocki, Amar Ojha, Francesca Querdasi, Marissa Roth, Lucinda Sisk, and Jillian Segarra for their assistance in data collection and management. We also thank the participants for their contributions. This work was supported by National Institutes of Health Grants R21 MH111978 and R21 HD090493 (to I.H.G.), the National Science Foundation Graduate Student Research Fellowship (L.S.K. and M.C.C.), and the Jacobs Foundation Early Career Research Fellowship 2017-1261-05 (to K.L.H.).
The authors declare no competing financial interests.
Correspondence should be addressed to Lucy S. King at lucyking{at}stanford.edu