Abstract
The social worlds of young children primarily revolve around parents and caregivers, who play a key role in guiding children's social and cognitive development. However, a hallmark of adolescence is a shift in orientation toward nonfamilial social targets, an adaptive process that prepares adolescents for their independence. Little is known regarding neurobiological signatures underlying changes in adolescents' social orientation. Using functional brain imaging of human voice processing in children and adolescents (ages 7-16), we demonstrate distinct neural signatures for mother's voice and nonfamilial voices across child and adolescent development in reward and social valuation systems, instantiated in nucleus accumbens and ventromedial prefrontal cortex. While younger children showed greater activity in these brain systems for mother's voice compared with nonfamilial voices, older adolescents showed the opposite effect with increased activity for nonfamilial compared with mother's voice. Findings uncover a critical role for reward and social valuative brain systems in the pronounced changes in adolescents' orientation toward nonfamilial social targets. Our approach provides a template for examining developmental shifts in social reward and motivation in individuals with pronounced social impairments, including adolescents with autism.
SIGNIFICANCE STATEMENT Children's social worlds undergo a transformation during adolescence. While socialization in young children revolves around parents and caregivers, adolescence is characterized by a shift in social orientation toward nonfamilial social partners. Here we show that this shift is reflected in neural activity measured from reward processing regions in response to brief vocal samples. When younger children hear their mother's voice, reward processing regions show greater activity compared with when they hear nonfamilial, unfamiliar voices. Strikingly, older adolescents show the opposite effect, with increased activity for nonfamilial compared with mother's voice. Findings identify the brain basis of adolescents' switch in social orientation toward nonfamilial social partners and provide a template for understanding neurodevelopment in clinical populations with social and communication difficulties.
Introduction
Children's social worlds undergo a dramatic transformation during adolescence. In younger children, socialization revolves primarily around parents and caregivers, and parent–child interactions during this stage of development play a key role in children's cognitive (Landry et al., 2006; Bernier et al., 2012), language (Liu et al., 2003; Majorano et al., 2013), and socio-emotional development (Cassidy et al., 1996; Rah and Parke, 2008). During adolescence, however, individuals increasingly engage with social targets outside the family (Larson et al., 1996). Importantly, adolescents show increased sensitivity to interactions with nonfamilial (NF) social targets (Allen et al., 2005) and seek these individuals out for social support (Furman and Buhrmester, 1992). It is thought that this shift in social orientation is adaptive and represents a key component of healthy development that prepares children for their journey toward independence (Sachser et al., 2018).
Very little is known regarding neurobiological signatures underlying changes in social orientation that occur across childhood and adolescence. Functional brain imaging of human voice processing provides a powerful approach for examining these changes during development. The human voice provides a primary channel for social engagement, and listeners are highly adept at identifying “who” is speaking from the first days of life (Kisilevsky et al., 2009). Moreover, a recent fMRI study in children (7-12 years old) identified a functional brain circuit that is selectively engaged during the processing of mother's voice (Abrams et al., 2016), a biologically salient signal associated with cognitive (Liu et al., 2003) and social function (Seltzer et al., 2012), compared with novel NF voices. Results showed that, compared with NF voices, brief samples of mother's voice elicit increased neural activity not only in auditory processing regions, including primary auditory cortex and voice-selective superior temporal sulcus (STS) (Belin et al., 2000), but also in salience (Menon and Uddin, 2010) and reward processing regions (Haber and Knutson, 2010), instantiated in the anterior insula (AI), nucleus accumbens (NAc), and orbitofrontal cortex (OFC). Results further showed that, compared with NF voices, mother's voice recruited key nodes of the default mode network (DMN) (Greicius et al., 2003), anchored in the ventromedial prefrontal cortex (vmPFC) and posterior medial cortex (posterior cingulate cortex [PCC] and precuneus), which has been implicated in self-referential (Gusnard et al., 2001) and social valuation processing (Fareri et al., 2015; Kumar et al., 2019). In contrast, when children listened to novel NF female voices compared with nonsocial environmental sounds, neural activation was prominent in voice-selective STS and the amygdala but did not elicit activity in the DMN or salience and reward processing systems.
How these patterns of brain responses change during the transition from childhood to adolescence is not understood. While previous studies in children and adolescents have shown age-related increases in neural activity in the NAc and insula in response to visual social stimuli (i.e., images of faces) (Guyer et al., 2009; Somerville et al., 2011), previous studies have not examined these neurodevelopmental effects in the context of familial and NF communication targets. Specifically, it is unknown whether the pronounced social changes that occur during adolescent development shape the neural signatures of both biologically salient voices, such as mother's voice, and NF communication targets, who become increasingly important in adolescents' social world.
Here, we investigate brain features associated with the adaptive transition toward NF social stimuli that occurs between childhood and adolescence, and test neurodevelopmental models using a cross-sectional sample of children and adolescents (7.7-16.6 years old). The primary goal of our study was to determine developmental changes in brain response to NF and mother's voice and examine whether children and adolescents show stimulus-specific preferences for these stimuli across development. Our findings reveal neurodevelopmental changes in response to mother's and NF voices and identify the neural basis of adolescents' switch in social orientation toward NF targets.
Materials and Methods
Participants
The Stanford University Institutional Review Board approved the study protocol, and all participants provided written consent for their participation in the study. Participants were recruited locally from schools near Stanford University. All children and adolescents were required to have a full-scale IQ > 80, as measured by the Wechsler Abbreviated Scale of Intelligence (Wechsler, 1999). All children and adolescents were right-handed; had no history of neurologic, psychiatric, or learning disorders; had a negative personal and family history (first degree) of developmental cognitive disorders and heritable neuropsychiatric disorders; showed no evidence of significant difficulty during pregnancy, labor, delivery, or the immediate neonatal period; and had normal developmental milestones as determined by neurologic history and examination. Participants were the biological offspring of the mothers who were recorded for the fMRI and behavioral experiments and were raised in homes that included their mothers. Participants' neuropsychological characteristics are provided in Table 1.
Stimuli
Stimuli consisted of the three nonsense words, "teebudieshawlt," "keebudieshawlt," and "peebudieshawlt," produced by the participant's mother as well as two NF females who were also mothers (Fig. 1a; for audio examples, see Extended Data Figs. 1-1, 1-2, 1-3, 1-4, 1-5, and 1-6). Nonsense words were used to avoid activating semantic neural systems, thereby enabling a focus on the neural responses to each speaker's vocal characteristics (Binder et al., 2000; Raettig and Kotz, 2008). These particular nonsense words were selected for a number of reasons: first, the stimuli are exemplars from a standardized behavioral test of phonological abilities (Wagner et al., 1999); second, the use of stimulus contrasts that differ by only one phoneme (i.e., minimal pairs) enables a fine-grained assessment of phonological decoding in the auditory system; finally, the use of four-syllable nonsense words, differentiated by word-initial, place-of-articulation contrasts, provides challenging but realistic speech-like stimuli. These vocal stimuli were used in both the fMRI task and the voice identification behavioral task. A second class of stimuli included in the fMRI task was nonspeech environmental sounds. These sounds, which included brief recordings of laundry machines, dishwashers, and other household sounds, were taken from a professional sound effects library.
Stimulus recording
Recordings of each mother were made individually for use in the voice identification and fMRI tasks. Mother's voice stimuli and NF voices were recorded in a quiet conference room using a Shure PG27-USB condenser microphone connected to a MacBook Air laptop. The audio signal was digitized at a sampling rate of 44.1 kHz with 16-bit resolution. Mothers were positioned in the conference room so that early sound wave reflections would not contaminate the recordings. To provide a natural speech context for the recording of each nonsense word, mothers were instructed to repeat three sentences, each of which contained one of the nonsense words, during the recording. The first word of each of these sentences was their child's name, which was followed by the words "that is a," followed by one of the three nonsense words. A hypothetical example of a sentence spoken by a mother for the recording was "Johnny, that is a keebudieshawlt." Before beginning the recording, mothers were instructed on how to produce these nonsense words by repeating them to the experimenter until the mothers had reached proficiency. Importantly, mothers were instructed to say these sentences using the tone of voice they would use when speaking with their child during an engaging and enjoyable shared learning experience (e.g., if their child asked them to identify an item at a museum). The vocal recording session resulted in digitized recordings of the mothers repeating each of the three sentences ∼30 times to ensure multiple high-quality samples of each nonsense word for each mother.
Stimulus postprocessing
The goal of stimulus postprocessing was to isolate the three nonsense words from the sentences that each mother and NF speaker spoke during the recording session and to normalize them for duration and RMS amplitude for inclusion in the fMRI stimulus presentation protocol and the voice identification task. First, a digital sound editor (Audacity: http://audacity.sourceforge.net) was used to isolate each utterance of the three nonsense words from the sentences spoken by each mother. The three best versions of each nonsense word were then selected based on the audio and vocal quality of the utterances (i.e., eliminating versions that were mispronounced, included vocal creak, or were otherwise not ideal exemplars of the nonsense words). These nine nonsense words were then normalized for duration to 956 ms, the mean duration of the nonsense words produced by the NF voices, using Praat software, similar to previous studies (Abrams et al., 2008). On average, speech samples were adjusted by 8.7% during normalization, and this process did not affect the naturalness of the vocal stimuli. A 10 ms linear fade-in and fade-out was then applied to each stimulus to prevent click-like sounds at the beginning and end of the stimulus, and stimuli were then equated for RMS amplitude. These final stimuli were then evaluated for audibility and clarity to ensure that postprocessing manipulations had not introduced any artifacts into the samples. The same process was performed on the NF voices and environmental sounds to ensure that all stimuli presented in the fMRI experiment were of the same duration and RMS amplitude.
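For illustration, the fade and RMS-equating steps might look as follows in Python. This is a minimal sketch under stated assumptions: the duration normalization itself was done in Praat and is not shown, the target RMS value is arbitrary, and the function name is hypothetical.

```python
import numpy as np
import soundfile as sf  # pysoundfile, for reading/writing WAV files

def postprocess_token(in_path, out_path, target_rms=0.05, fade_ms=10):
    """Apply 10 ms linear onset/offset fades and equate RMS amplitude."""
    x, fs = sf.read(in_path)
    if x.ndim > 1:                    # collapse to mono if needed
        x = x.mean(axis=1)
    n = int(fs * fade_ms / 1000)      # fade length in samples
    ramp = np.linspace(0.0, 1.0, n)
    x[:n] *= ramp                     # fade in (prevents onset clicks)
    x[-n:] *= ramp[::-1]              # fade out (prevents offset clicks)
    x *= target_rms / np.sqrt(np.mean(x ** 2))  # equate RMS amplitude
    sf.write(out_path, x, fs)
```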
Voice identification behavioral task
All participants in the fMRI experiment completed an auditory behavioral test immediately following the voice processing fMRI scan. The goal of the voice identification behavioral task was to ensure that participants were able to reliably discriminate their mother's voice from NF female voices. Participants were seated in a quiet area of the brain imaging suite in front of a laptop computer and facing a wall, with noise-cancellation headphones placed over their ears to prevent distractions. Vocal stimuli were delivered via Eprime on the laptop computer. In each trial, participants were presented with a recording of a multisyllabic nonsense word spoken by either the participant's mother or an unfamiliar, NF voice, and the task was to indicate whether their mother spoke the word. The multisyllabic nonsense words used in the behavioral task were the exact same samples used in the fMRI task. Specifically, the stimuli presented in the voice identification task consisted of the three nonsense words produced 3 times each by each child's mother as well as by the two NF voices (i.e., all the vocal stimuli identified in Fig. 1a). This yielded a total of 27 stimuli (3 nonsense words × 3 repetitions × 3 different speakers). Each stimulus was presented twice in a random order during the voice identification task for a total of 54 trials: 18 trials of mother's voice nonsense words and 18 trials of each of the two NF voices' nonsense words. Participants were instructed to press a button on the laptop keyboard as soon as they knew the answer.
fMRI data acquisition parameters
All fMRI data were acquired in a single session at the Richard M. Lucas Center for Imaging at Stanford University. Functional images were acquired on a 3-T Signa scanner (General Electric) using a custom-built head coil. Participants were instructed to stay as still as possible during scanning, and head movement was further minimized by placing memory-foam pillows around the participant's head. A total of 31 axial slices (4.0 mm thickness, 0.5 mm skip) parallel to the anterior/posterior commissure line and covering the whole brain were imaged by using a T2*-weighted gradient-echo spiral in-out pulse sequence (Glover and Law, 2001) with the following parameters: TR, 3576 ms; TE, 30 ms; flip angle, 80°; one interleave. This TR can be calculated as the sum of the stimulus duration (956 ms), a 300 ms silent interval buffering the beginning and end of each stimulus presentation to avoid backward and forward masking effects, the 2000 ms volume acquisition time, and an additional 22 ms silent interval which helped the stimulus computer maintain precise and accurate timing during stimulus presentation. The FOV was 22 cm, and the matrix size was 64 × 64, providing an in-plane spatial resolution of 3.4375 mm. Reduction of blurring and signal loss arising from field inhomogeneities was accomplished using an automated high-order shimming method before data acquisition.
fMRI task
Auditory stimuli were presented in 10 separate runs, each lasting 4 min. One run consisted of 56 randomized trials of mother's voice and NF voices producing the three nonsense words, environmental sounds, and catch trials. Randomizing the order of presentation for all stimuli, including both mother's and NF vocal samples, during fMRI data collection creates the same level of (in)dependence between iterations of the nonsense words for both vocal sources included in the data analysis. Each stimulus was 956 ms in duration. Before each run, participants were instructed to play the "kitty cat" game during the fMRI scan. While lying in the scanner, participants were first shown a brief video of a cat and were told that the goal of the cat game was to listen to a variety of sounds, including "voices that may be familiar," and to push a button on a button box only when they heard cat meows (catch trials). During each run, four or five exemplars of each stimulus type, including three speakers producing three nonsense words, multiple environmental sounds, and three catch trials, were presented. Silent trials were not included in the fMRI task. At the end of each run, participants were shown another engaging video of a cat. Across the 10 runs, a total of 48 exemplars of each stimulus condition were presented to each subject. Auditory stimuli were presented to participants in the scanner using Eprime version 1.0 (Psychological Software Tools, 2002). Participants wore custom-built headphones designed to reduce the background scanner noise to ∼70 dBA (Abrams et al., 2011, 2013a). Headphone sound levels were calibrated before each data collection session, and all stimuli were presented at a sound level of 75 dBA. Participants were scanned using an event-related design. Auditory stimuli were presented during silent intervals between volume acquisitions to eliminate the effects of scanner noise on auditory discrimination. One stimulus was presented every 3.576 s.
fMRI preprocessing
fMRI data collected in each of the 10 functional runs were subjected to the following preprocessing procedures. The first five volumes were not analyzed to allow for signal equilibration. A linear shim correction was applied separately for each slice during reconstruction by using a magnetic field map acquired automatically by the pulse sequence at the beginning of the scan. Translational movement in millimeters (x, y, z) was calculated based on the SPM12 parameters for motion correction of the functional images in each subject. To correct for deviant volumes resulting from spikes in movement, we used a de-spiking procedure (Iuculano et al., 2014) similar to those implemented in the Analysis of Functional NeuroImages toolkit maintained by the National Institute of Mental Health (Cox, 1996). Volumes with movement exceeding 0.5 voxels (1.562 mm) or spikes in global signal exceeding 5% were interpolated using adjacent scans. The majority of repaired volumes occurred in isolation. After the interpolation procedure, images were spatially normalized to standard MNI space, resampled to 2 mm isotropic voxels, and smoothed with a 6 mm FWHM Gaussian kernel.
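A minimal Python sketch of the de-spiking logic follows. This is not the AFNI or Iuculano et al. implementation; the thresholds mirror the values above, and interpolation is simplified to averaging the two neighboring volumes.

```python
import numpy as np

def despike_run(vols, scan_to_scan_mm, global_signal,
                move_thresh=1.562, sig_thresh=0.05):
    """Interpolate deviant volumes from their temporal neighbors.

    vols: (T, X, Y, Z) array for one functional run
    scan_to_scan_mm: (T,) scan-to-scan displacement in mm
    global_signal: (T,) mean whole-brain signal per volume
    """
    sig_change = np.abs(np.diff(global_signal, prepend=global_signal[0]))
    bad = (scan_to_scan_mm > move_thresh) | \
          (sig_change / global_signal.mean() > sig_thresh)
    for t in np.where(bad)[0]:
        prev_t, next_t = max(t - 1, 0), min(t + 1, len(vols) - 1)
        # Simplification: neighbors assumed clean (most spikes are isolated)
        vols[t] = 0.5 * (vols[prev_t] + vols[next_t])
    return vols, bad
```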
Movement criteria for inclusion in fMRI analysis
For inclusion in the fMRI analysis, we required that each functional run had a maximum scan-to-scan movement of <6 mm and that no more than 15% of its volumes were corrected in the de-spiking procedure. Moreover, we required that each subject contribute at least seven functional runs meeting these criteria for scan-to-scan movement and percentage of volumes corrected; subjects with fewer than seven qualifying runs were excluded from the data analysis. All 46 participants included in the analysis had at least 7 functional runs that met our movement criteria: 32 participants had 10 qualifying runs, 6 had 9 runs, 6 had 8 runs, and 2 had 7 runs.
Voxel-wise analysis of fMRI activation
The goal of this analysis was to identify brain regions that showed differential activity levels in response to mother's voice, NF voices, and environmental sounds. Brain activation related to each task condition was first modeled at the individual subject level using boxcar functions with a canonical HRF and a temporal derivative to account for voxel-wise latency differences in hemodynamic response. Low-frequency drifts at each voxel were removed using a high-pass filter (0.5 cycles/min), and serial correlations were accounted for by modeling the fMRI time series as a first-degree autoregressive process (Friston et al., 1997). Voxel-wise t-statistics maps for each condition were generated for each participant using the GLM, along with the respective contrast images. Group-level activation was determined using individual subject contrast images and a second-level ANOVA. The contrasts of interest were as follows: (1) [NF voices – environmental sounds]; (2) [mother's voice – environmental sounds]; and (3) [NF voices – mother's voice]. To ensure that the inclusion of two NF voices compared with the (one) mother's voice stimulus did not bias fMRI contrast betas and T-maps, a value of 1 was entered into the contrast matrix for mother's voice, while a value of 0.5 was entered into the contrast matrix for each of the two NF voices. A fourth contrast of interest [NF voice #1 – NF voice #2] served as a control analysis to examine whether brain activation differences in response to the two NF voices used in the study vary as a function of age.
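To make the contrast weighting concrete, the vectors can be written out as follows. This is an illustrative Python sketch; the ordering of conditions in the design matrix is an assumption made for the example.

```python
import numpy as np

# Assumed condition order in the first-level design matrix:
# [mother's voice, NF voice 1, NF voice 2, environmental sounds]
c_nf_vs_env  = np.array([ 0.0,  0.5,  0.5, -1.0])  # contrast 1
c_mom_vs_env = np.array([ 1.0,  0.0,  0.0, -1.0])  # contrast 2
c_nf_vs_mom  = np.array([-1.0,  0.5,  0.5,  0.0])  # contrast 3
c_nf1_vs_nf2 = np.array([ 0.0,  1.0, -1.0,  0.0])  # contrast 4 (control)

# The 0.5 weights average the two NF regressors, so each side of every
# contrast carries equal total weight and the single mother's voice
# condition is not outweighed by the two NF voice conditions.
```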
For all voxel-wise analyses, including the age covariate analysis (Figs. 2-4) and main effects of all stimulus contrasts (Fig. 5), significant clusters of activation were determined using a voxel-wise statistical height threshold of p < 0.005, with family-wise error corrections for multiple spatial comparisons (p < 0.05; 67 voxels) determined using Monte Carlo simulations implemented in a custom MATLAB script. Significant correlations are inherent to all scatterplots included in the age covariate analysis (Figs. 2-4) because they are based on results from the whole-brain GLM analysis (Vul et al., 2009); however, the results provide important information regarding the distributions and covariation of activity strength in response to voices and age. For age covariate analyses, effect sizes were computed as Cohen's f according to Equation 1, where t is the mean t-score within a cluster and N is the sample size, as follows:
f = t/√N  (1)
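The cluster-extent correction above was computed with a custom MATLAB script; as a hedged illustration of the general AlphaSim-style logic (not the authors' implementation), the following Python sketch estimates the cluster size controlling family-wise error, assuming an illustrative grid and smoothness and omitting the brain mask.

```python
import numpy as np
from scipy import ndimage, stats

def cluster_extent_threshold(shape=(91, 109, 91), fwhm_vox=3.0,
                             p_voxel=0.005, alpha=0.05, n_iter=1000, seed=1):
    """Monte Carlo estimate of the FWE-corrected cluster-extent threshold."""
    rng = np.random.default_rng(seed)
    z_cut = stats.norm.isf(p_voxel)                  # voxel height threshold
    sigma = fwhm_vox / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    max_sizes = np.zeros(n_iter, dtype=int)
    for i in range(n_iter):
        noise = ndimage.gaussian_filter(rng.standard_normal(shape), sigma)
        noise /= noise.std()                         # restandardize after smoothing
        labeled, n_clusters = ndimage.label(noise > z_cut)
        if n_clusters:
            max_sizes[i] = np.bincount(labeled.ravel())[1:].max()
    # Cluster extent exceeded by chance in only alpha of null volumes
    return int(np.percentile(max_sizes, 100 * (1 - alpha)))
```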
To examine GLM results in the NAc, a small subcortical brain structure, we used a small volume correction at p < 0.005 using the Harvard-Oxford probabilistic maps of the NAc thresholded at 25%.
Acoustical analysis
The goal of the acoustical analysis was to examine whether physical attributes of the mother's voice stimuli may have contributed to age-related increases in brain activity for the NF voice versus mother's voice contrast (Fig. 4). Acoustical features of each mother's voice stimulus were extracted, including mean pitch, pitch SD, pitch slope, spectral center of gravity, and spectral SD, using Praat software, similar to previous studies (Abrams et al., 2016, 2019). These acoustical values were then averaged across pseudoword stimuli for each mother. We examined the relation between mean acoustical values for each mother and GLM betas extracted from the ROIs identified in Figure 4 for the [NF – mother's voice] contrast: left-hemisphere NAc and right-hemisphere vmPFC. Coordinates for these ROIs were based on the voxel with the peak T-score in that region for each of these whole-brain age covariate maps. The cortical ROI (i.e., vmPFC) was defined as a 4 mm sphere and the subcortical ROI (i.e., NAc) as a 2 mm sphere; signal level was calculated by extracting the β value from individual subjects' contrast maps within each ROI and computing the mean β value for each ROI. Results were FDR-corrected for multiple comparisons.
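The same features can be extracted in Python through parselmouth, a wrapper around Praat. This is a minimal sketch: the pitch slope here is computed as a simple linear fit over voiced frames, which may differ from Praat's slope measure, and the function name is hypothetical.

```python
import numpy as np
import parselmouth  # praat-parselmouth package

def acoustical_features(wav_path):
    """Mean pitch, pitch SD, pitch slope, spectral CoG, and spectral SD."""
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch()
    f0 = pitch.selected_array['frequency']
    voiced = f0 > 0                       # Praat codes unvoiced frames as 0 Hz
    times = pitch.xs()[voiced]
    spectrum = snd.to_spectrum()
    return {
        'mean_pitch': f0[voiced].mean(),
        'pitch_sd': f0[voiced].std(ddof=1),
        'pitch_slope': np.polyfit(times, f0[voiced], 1)[0],  # Hz/s, linear fit
        'spectral_cog': spectrum.get_center_of_gravity(),
        'spectral_sd': spectrum.get_standard_deviation(),
    }
```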
Sex difference analysis
A regression analysis was performed to examine the effect of sex on age-related changes in ROI signal level measured in response to the [NF voices > mother's voice] GLM contrast. GLM betas were extracted from left-hemisphere NAc and right-hemisphere vmPFC as described previously (see Acoustical analysis). We then built a regression model with age as the dependent variable and sex, mean β value for each ROI, and their interaction as predictors. Significant results for the [sex × mean β value] interaction are reported (p < 0.05).
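A minimal sketch of this regression, assuming a hypothetical per-participant table with age, sex, and mean ROI betas (the file and column names are illustrative), using statsmodels:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical table: one row per participant with age, sex ('F'/'M'),
# and mean ROI betas from the [NF voices > mother's voice] contrast.
df = pd.read_csv('roi_betas.csv')

for roi in ['nac_beta', 'vmpfc_beta']:
    fit = smf.ols(f'age ~ C(sex) * {roi}', data=df).fit()
    # The sex-by-beta interaction term tests whether the relationship
    # between age and ROI signal level differs by sex.
    print(roi, fit.pvalues[[k for k in fit.pvalues.index if ':' in k]])
```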
Support vector regression (SVR) analysis: brain activity levels and age prediction
The robustness and replicability of fMRI data remain a crucial concern for the field, and an established approach for addressing this issue is to perform a confirmatory cross-validation (CV) analysis (Cohen et al., 2010). We therefore used SVR, a machine-learning approach, to perform a confirmatory CV analysis with balanced fourfold CV combined with linear regression (Cohen et al., 2010). In this analysis, we extracted individual subject activation β values from the [NF voices > environmental sounds], [mother's voice > environmental sounds], and [NF voices > mother's voice] GLM contrasts as described above. Mean β values for each ROI were entered as independent variables in a linear regression analysis with age as the dependent variable. r(predicted, observed), a measure of how well the independent variable predicts the dependent variable, was first estimated using a balanced fourfold CV procedure. Data were divided into four folds so that the distributions of dependent and independent variables were balanced across folds: data were randomly assigned to four folds, and the independent and dependent variables were tested in one-way ANOVAs, repeating as necessary until both ANOVAs were nonsignificant, thereby ensuring balance across the folds. A linear regression model was built using three folds, leaving out the fourth, and this model was then used to predict the data in the left-out fold. This procedure was repeated 4 times to compute a final r(predicted, observed) representing the correlation between the data predicted by the regression model and the observed data. Finally, the statistical significance of the model was assessed using a nonparametric testing approach: the empirical null distribution of r(predicted, observed) was estimated by generating 1000 surrogate datasets under the null hypothesis that there was no association between changes in age and brain activity levels.
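A minimal sketch of this procedure follows, assuming a linear-kernel SVR with default hyperparameters and a p > 0.05 criterion for fold balance (the exact settings are not specified in the text):

```python
import numpy as np
from scipy import stats
from sklearn.svm import SVR

def balanced_folds(x, y, n_folds=4, seed=0, p_thresh=0.05):
    """Assign folds at random until one-way ANOVAs on x and y across
    folds are nonsignificant (i.e., folds are balanced)."""
    rng = np.random.default_rng(seed)
    while True:
        folds = rng.permutation(np.arange(len(y)) % n_folds)
        px = stats.f_oneway(*(x[folds == k] for k in range(n_folds))).pvalue
        py = stats.f_oneway(*(y[folds == k] for k in range(n_folds))).pvalue
        if px > p_thresh and py > p_thresh:
            return folds

def cv_r(x, y, n_folds=4):
    """r(predicted, observed) from balanced fourfold cross-validation."""
    folds = balanced_folds(x, y, n_folds)
    pred = np.empty(len(y))
    for k in range(n_folds):
        model = SVR(kernel='linear')
        model.fit(x[folds != k].reshape(-1, 1), y[folds != k])  # train on 3 folds
        pred[folds == k] = model.predict(x[folds == k].reshape(-1, 1))
    return stats.pearsonr(pred, y)[0]

def permutation_p(x, y, n_perm=1000, seed=0):
    """Nonparametric p value: shuffle age labels to build the null."""
    rng = np.random.default_rng(seed)
    r_obs = cv_r(x, y)
    null = np.array([cv_r(x, rng.permutation(y)) for _ in range(n_perm)])
    return r_obs, (np.sum(null >= r_obs) + 1) / (n_perm + 1)
```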
Results
Voice identification in children and adolescents
We first examined behavioral sensitivity for recognizing mother's voice in child and adolescent participants. Results from the voice identification behavioral task revealed that child and adolescent participants were highly accurate at identifying mother's voice, with a mean accuracy of 97.7% across all participants (SD = 0.05; Fig. 1b). There was no relationship between age and mother's voice identification accuracy (R = 0.09; p = 0.54), nor was there a group difference in accuracy when splitting the sample into younger (7- to 12-year-old) and older (13- to 16-year-old) participants (two-sample t test, p = 0.73). Reaction time (RT) results from the voice identification task revealed a negative relationship between age and RT (Fig. 1c; R = −0.32; p = 0.03), which is consistent with previous studies reporting reductions in RT as a function of age (Kail, 1991a,b; Kwon et al., 2002). Results from the voice identification task show that all participants were highly accurate at identifying the vocal sources included in the fMRI voice processing task.
Extended Data Figure 1-1
Exemplar of nonsense word "keebudieshawlt" used in the fMRI task, produced by NF female 1. Download Figure 1-1, WAV file
Extended Data Figure 1-2
Exemplar of nonsense word "peebudieshawlt" used in the fMRI task, produced by NF female 1. Download Figure 1-2, WAV file
Extended Data Figure 1-3
Exemplar of nonsense word "teebudieshawlt" used in the fMRI task, produced by NF female 1. Download Figure 1-3, WAV file
Extended Data Figure 1-4
Exemplar of nonsense word "keebudieshawlt" used in the fMRI task, produced by NF female 2. Download Figure 1-4, WAV file
Extended Data Figure 1-5
Exemplar of nonsense word "peebudieshawlt" used in the fMRI task, produced by NF female 2. Download Figure 1-5, WAV file
Extended Data Figure 1-6
Exemplar of nonsense word "teebudieshawlt" used in the fMRI task, produced by NF female 2. Download Figure 1-6, WAV file
Age-related changes in neural response to human voices compared with nonsocial stimuli
A primary goal of our study was to identify brain areas that showed age-related changes in neural response to human voices compared with nonsocial stimuli. An influential model identifies adolescence as a "sensitive period" for social information processing characterized by increased sensitivity to social signals (Blakemore and Mills, 2014), and a prediction of this model is that the transition from childhood to adolescence is accompanied by a general increase in sensitivity to social signals, including those produced by both NF and familial sources, compared with nonsocial stimuli. We therefore used fMRI to measure brain activity in response to brief (0.95 s) samples of three sound sources in 46 neurotypical children and adolescents (for demographic and IQ measures, see Table 1). Auditory stimuli included mother's voice, unfamiliar NF voices, and nonsocial environmental sounds, an acoustical and cognitive control condition of the same duration and intensity as the vocal stimuli (for stimulus design, see Fig. 1a).
Compared with nonsocial environmental sounds, both NF voices (Fig. 2) and mother's voice showed extensive age-related increases in neural activity (Fig. 3; for main effects for all contrasts, see Fig. 5). Age-related increases in activity in response to both vocal stimuli were evident in voice-selective STS, salience processing regions, including AI and dorsal anterior cingulate cortex (dACC), and a key node of the DMN, instantiated in PCC. A critical distinction between these response profiles is that age-related changes in response to NF voices (Fig. 2), but not mother's voice (Fig. 3), were evident in NAc and vmPFC. These results indicate that, compared with nonsocial environmental sounds, both mother's voice and NF voices show age-related increases across a wide expanse of auditory, salience, and social evaluative processing systems. Findings suggest that a large extent of the social brain increasingly “tunes in” to a range of social stimuli, including both mother's voice and NF voices, as children progress into adolescence.
Age-related changes in neural response to mother's versus NF voices
A critical next step in our analysis was to examine age-related changes in neural activity associated with the direct comparison between mother's and NF voices. This analysis provides an opportunity to examine whether children and adolescents show stimulus-specific neural preferences for these stimuli at different ages during child and adolescent development. Results from this analysis confirmed age-related increases in activity for NF voices compared with mother's voice in NAc of the reward processing system and vmPFC of the DMN (Fig. 4). Specifically, younger children showed a preference for mother's voice compared with NF voices in the NAc and vmPFC, whereas older adolescents showed a preference for NF voices compared with mother's voice in these brain regions. The zero-crossings of the fitted regression lines (scatterplots, dotted blue vertical lines) show that the neural transition from mother's voice to NF voice preference occurs between 13 and 14 years of age in the NAc and vmPFC.
Control analyses
We performed multiple control analyses to further probe these age-related increases in neural activity for NF voices compared with mother's voice. First, an analysis was performed to examine the influence of low-level acoustical features in the vocal stimuli on neural results. Acoustical characteristics of voices, which include pitch, harmonic, and amplitude features, are idiosyncratic between speakers, enabling listeners to rapidly discriminate between vocal sources (Hecker, 1971). Acoustical features of each mother's voice stimulus were extracted, including mean pitch, pitch SD, pitch slope, spectral center of gravity, and spectral SD, and were then averaged across pseudoword stimuli for each mother. The relation between acoustical values for each mother and signal levels measured in ROIs identified in Figure 4 for the [NF – mother's voice] contrast was then examined. Results showed that none of the correlations between acoustical features of mother's voice and brain activation in response to mother's voice survived FDR correction (q < 0.05), indicating that age-related changes in brain responses to vocal stimuli did not simply reflect low-level acoustical features of the stimuli. Rather, results suggest that neural response properties reflect the biological salience of these stimuli.
A second control analysis examined whether brain activation differences for the two NF voices vary as a function of age, potentially influencing the observed patterns of results. Results from this analysis failed to reveal a relationship between brain activation differences for the two NF voices and age in either of the brain regions highlighted in Figure 4 (p > 0.30 for all regions).
A third control analysis examined whether sex differences influence age-related increases in neural activity for NF voices compared with mother's voice, motivated by previous studies showing that sex differences are associated with behavioral (Kret and De Gelder, 2012) and neural processing (Whittle et al., 2011) of social-emotional stimuli. Regression analysis focused on the brain regions identified in the direct comparison between mother's and NF voices (Fig. 4). Male and female groups showed similar age-related increases in neural response for NF compared with mother's voice (p > 0.15 for male vs female comparisons in both NAc and vmPFC), indicating that sex differences did not contribute to age-related increases in neural activity for NF voices compared with mother's voice.
A fourth control analysis examined whether behavioral accuracy (Fig. 1b) or RT (Fig. 1c) measured during the voice identification task was related to neural activity for NF voices compared with mother's voice. Results from this analysis failed to reveal a relationship between neural activity for NF voices compared with mother's voice and either voice identification accuracy (p > 0.35) or RT (p > 0.10).
Robustness of age-related neurodevelopmental changes
To examine the robustness and reliability of brain activity levels for predicting age, we used SVR to perform a confirmatory CV analysis that uses a machine-learning approach with balanced fourfold CV combined with linear regression (Cohen et al., 2010; Abrams et al., 2016). Results showed that the strength of neural activity was a reliable predictor of age across child and adolescent participants for all regions identified in Figures 2-4 (R ≥ 0.34; p ≤ 0.006 for all regions). These results demonstrate that child and adolescent development is associated with robust age-related increases in neural activity across a wide array of brain systems in response to human voices compared with nonsocial environmental sounds, including both NF voices and mother's voice. Importantly, in a direct comparison between mother's voice and NF voices, stimulus-specific preferences in NAc and vmPFC for these stimuli were observed at different ages during development, reflecting social preference for mother's voice in young children and preference for NF voices in adolescents.
Discussion
A hallmark of children's social worlds is a focus on parents and caregivers, whereas adolescence is marked by a shift in social orientation toward NF social targets. Here we demonstrate stimulus-specific neural preferences for mother's voice and NF voices in key nodes of the reward and default mode networks across neurodevelopment. Specifically, younger children showed greater neural activity in NAc of the reward circuit and vmPFC of the DMN, a brain network associated with social valuation, in response to mother's voice compared with NF voices, whereas older adolescents showed greater neural activity in response to NF voices compared with mother's voice in these brain systems. Findings provide new information by highlighting neural features underlying changes in social orientation that occur during adolescent development.
Findings from the current study inform our understanding of several key principles regarding adolescent social development. First, a prominent developmental model identifies five distinct stages of social development across the lifespan, each of which is defined by a primary social target whose engagement is vigorously pursued within each developmental stage (Nelson et al., 2016). For infants, mothers and caregivers are the primary social target. During the juvenile phase, which extends from weaning to puberty, the mother/caregiver is considered a core target (with an increasing focus on peers), whereas the adolescent phase, which extends between puberty and full maturity, focuses on integration with NF peer social groups. This model further states that shifts across developmental stages are accompanied by a reduction in the motivation to engage with the target from the previous stage. For example, the model predicts that adolescence would be associated with both increased motivation to engage with NF social targets and reduced motivation to engage with parents and caregivers. Findings from the current study, which spanned the juvenile and adolescent phases from this model (Nelson et al., 2016), are consistent with this model by revealing that regional activity profiles within the NAc and vmPFC reflect the primary social targets identified for these two developmental stages, with younger children showing increased activity for mother's compared with NF voices and older adolescents showing the opposite effect. Increased activity for NF voices in the NAc is particularly relevant given extensive evidence from behavioral (Steinberg et al., 2008; Somerville et al., 2017) and neural studies (Galvan et al., 2006; Steinberg, 2008) showing heightened sensitivity to novelty in reward systems during adolescence. Current findings add to this literature by showing that brief (<1 s) auditory social cues produced by novel social targets elicit increased NAc activity in adolescents relative to mother's voice.
Findings further inform key components of a related developmental model that identifies adolescence as a "sensitive period" for social information processing characterized by increased sensitivity to social signals (Blakemore and Mills, 2014). For example, previous studies have consistently shown that adolescents are more adept at key aspects of social perception (Fuhrmann et al., 2016) and cognition (Güroğlu et al., 2009; Dumontheil et al., 2010) compared with pre-adolescent children. Therefore, a plausible hypothesis is that adolescence would be accompanied by a general increase in sensitivity to social signals, including those produced by both NF and familial sources, compared with nonsocial stimuli. Consistent with this hypothesis, results from the current study revealed age-related increases in neural activity in voice-selective STS, AI, and dACC of the salience network, and PCC of the DMN, in response to both NF and mother's voice stimuli compared with nonsocial environmental sounds (Figs. 2 and 3). While a comprehensive test of a sensitive period model requires longitudinal data and additional samples with adult participants, our findings are consistent with an important component of this model by showing a shift in neural activity profiles across sensory, salience, and default mode regions during social stimulus processing between childhood and adolescence. It is further hypothesized that "tuning into" perceptual aspects of social information, such as human vocal stimuli, during adolescence may serve as a critical precursor for increased higher-order social cognitive processing, including understanding the perspectives and intentions of others during interactions (Güroğlu et al., 2009; Dumontheil et al., 2010).
A consideration regarding the stimuli used in this study is that there are two major differences between the mother's voice and NF voice conditions: familial versus NF (social relationship) and familiar versus unfamiliar (familiarity). Therefore, there is some ambiguity regarding whether age-related changes reflect changes in the processing of familiarity or of the social relationship. The "familial" versus "familiar" question is inherent to all mother's voice studies, of which there is an extensive behavioral literature dating back many decades (Mills and Melhuish, 1974). Because of the unique nature of mother's voice, none of these studies address, or could possibly address, the question of whether infants recognize mother's voice because it is familial or because it is familiar. In the context of the current study, the crucial finding is that, consistent with mother's voice being a unique and biologically salient signal associated with social and language learning, children show increased neural selectivity in reward processing regions for mother's voice versus nonmaternal voices. Remarkably, our study is the first to show that adolescents show the opposite effect, with increased neural selectivity in reward processing regions for nonmaternal voices compared with mother's voice. We believe that our neurodevelopmental study provides an exciting starting point for future research to disentangle the contribution of these factors.
In conclusion, we have examined the developmental features underlying changes in neural sensitivity to familial and NF individuals across childhood and adolescence. Our findings demonstrate that brain systems involved in reward and social valuation processing show stimulus-specific preferences for mother's and NF voices during different ages across child and adolescent development. Findings provide a neurobiological template for understanding dynamic changes in social orientation throughout the lifespan in both neurotypical populations and clinical psychiatric populations who experience deficits in social function, such as individuals with autism (Abrams et al., 2013b, 2019).
Footnotes
This work was supported by National Institutes of Health Grants K01 MH102428 to D.A.A., DC011095 and MH084164 to V.M., and DC017950 and DC017950-S1 to D.A.A. and V.M.; Brain and Behavior Research Foundation NARSAD Young Investigator Grant to D.A.A.; the Singer Foundation; and Simons Foundation/SFARI 308939 to V.M. All fMRI activation maps reported in the manuscript will be made available at NeuroVault (https://neurovault.org/collections). Full single subject raw data will be made public on the National Institutes of Health NDAR repository, as per National Institutes of Health rules (procedure is ongoing). We thank all the children and their parents who participated in our study; Emma Adair and the staff at the Stanford Lucas Center for Imaging for assistance with data collection; Shelby Karraker for assistance with data processing; and Heidi Abrams and Cindy Anderson for help with stimulus production.
The authors declare no competing financial interests.
Correspondence should be addressed to Daniel A. Abrams at daa@stanford.edu or Vinod Menon at menon@stanford.edu