Abstract
As adolescents transition to the complex world of adults, optimizing predictions about others' preferences becomes vital for successful social interactions. Mounting evidence suggests that these social learning processes are affected by ongoing brain development across adolescence. A mechanistic understanding of how adolescents optimize social predictions and how these learning strategies are implemented in the brain is lacking. To fill this gap, we combined computational modeling with functional neuroimaging. In a novel social learning task, male and female human adolescents and adults predicted the preferences of peers and could update their predictions based on trial-by-trial feedback about the peers' actual preferences. Participants also rated their own preferences for the task items and similar additional items. To describe how participants optimize their inferences over time, we pitted simple reinforcement learning models against more specific “combination” models, which describe inferences based on a combination of reinforcement learning from past feedback and participants' own preferences. Formal model comparison revealed that, of the tested models, combination models best described how adults and adolescents update predictions of others. Parameter estimates of the best-fitting model differed between age groups, with adolescents showing more conservative updating. This developmental difference was accompanied by a shift in encoding predictions and the errors thereof within the medial prefrontal and fusiform cortices. In the adolescent group, encoding of own preferences and prediction errors scaled with parent-reported social traits, which provides additional external validity for our learning task and the winning computational model. Our findings thus help to specify adolescent-specific social learning processes.
SIGNIFICANCE STATEMENT Adolescence is a unique developmental period of heightened awareness about other people. Here we probe the suitability of various computational models to describe how adolescents update their predictions of others' preferences. Within the tested model space, predictions of adults and adolescents are best described by the same learning model, but adolescents show more conservative updating. Compared with adults, brain activity of adolescents is modulated less by predictions themselves and more by prediction errors per se, and this relationship scales with adolescents' social traits. Our findings help specify social learning across adolescence and generate hypotheses about social dysfunctions in psychiatric populations.
Introduction
As social networks grow in size and complexity, adolescents have to continuously adapt to ever more challenging environments (Crone and Dahl, 2012). This adaptation requires improving Theory of Mind (ToM) abilities, that is, improving predictions about others' preferences, mental states, and behaviors (Crick and Dodge, 1994). However, a mechanistic account of ToM development over the course of adolescence is urgently needed, especially given that aberrant social development during adolescence increases the risk for psychiatric disorders (Mrazek and Haggerty, 1994).
Mounting evidence suggests that multiple decision processes, including social decision making, and their neural implementations follow a nonlinear, inverted U-shaped trajectory (Crone and Dahl, 2012; Pfeifer and Blakemore, 2012; Hartley and Somerville, 2015). Studies with broad age ranges and longitudinal designs highlight adolescence as a sensitive period, in which individuals show more explorative behavior, cognitive flexibility, and reward sensitivity in nonsocial contexts (Cohen et al., 2010; Galvan, 2010; Hauser et al., 2015). Yet, emerging evidence indicates that adolescents are more rigid in social contexts (Jones et al., 2014).
Adolescents' reduced flexibility in social decision making and ToM capabilities may result from the dramatic changes in brain regions that support social information processing. Regions assigned to the ToM network, such as the mPFC (Yang et al., 2015; Rosenblau et al., 2016), undergo the most prominent structural and functional development during adolescence (Blakemore, 2010). Some evidence relates ToM abilities to brain plasticity over the course of adolescence (Blakemore and Mills, 2014); for instance, adolescents display greater mPFC activity during mental state inference than adults (Blakemore, 2007). But it is unclear to what extent the mPFC reflects ongoing development in making predictions about others. A precise characterization of social development, in particular ToM development, during adolescence is crucial for understanding adolescents' increased risk for developing neuropsychiatric disorders (Malti and Krettenauer, 2013; Laible et al., 2014).
Here, we combined computational modeling with functional neuroimaging to elucidate social learning in typical adolescents. Computational models, in particular variants of reinforcement learning (RL) models, provide a mechanistic description of how humans learn about rewards in the nonsocial domain; these models rely on prediction errors (PEs), the differences between expected and experienced outcomes, to update expectations about the future (Montague et al., 2006; Dayan and Niv, 2008). Recently, such models have been harnessed to describe how adults make dynamic social decisions (Hampton et al., 2008; Behrens et al., 2009; Ruff and Fehr, 2014; Garvert et al., 2015). But simple RL models fall short of capturing the complex dynamics of tracking another person's preferences (Hampton et al., 2008; Behrens et al., 2009), especially when self-related information processing and attribution biases may interfere with learning (Korn et al., 2012, 2016).
Here, we devised a novel task, in which participants had to predict other people's mental states, specifically their preferences for activities, fashion, and food items, over time, and could learn about these people's actual preferences from trial-by-trial feedback. By adapting RL models to describe learning about others, we followed a burgeoning literature (Behrens et al., 2008; Hampton et al., 2008; Ruff and Fehr, 2014). Specifically, we pitted variants of the simple RL model against more specific combination models, to describe how adolescents and adults updated predictions about a person's preference over time. The combination models assume that participants' predictions are based on trial-by-trial feedback about the person's actual preference and their own preference (OP) for a given item.
First, we expected adults to make more accurate predictions (resulting in overall lower PEs). Second, adolescents may rely more on their OPs than adults when making judgments. Third, we expected developmental changes in neural encoding of model variables, notably predictions and PEs in the mPFC. Finally, we hypothesized that adolescents' tendency to update predictions would vary with their social traits, measured with the Social Responsiveness Scale (SRS) (Constantino and Gruber, 2012), a parent-report questionnaire calibrated for typical adolescent samples and adolescents with autism (Payakachat et al., 2012). Given that the current literature suggests nonlinear development of decision-making skills, including social decision making (Casey et al., 2008; Hartley and Somerville, 2015), we tested linear as well as quadratic effects of age and social traits on adolescents' behavioral and neural responses (Shaw et al., 2012; Somerville et al., 2013; Braams et al., 2015).
Materials and Methods
Participants
Adult participants were recruited via mailing lists and advertisements at Yale University. Twenty-one adults took part in the study (12 female; mean ± SD, age = 28.4 ± 4.0 years; age range: 23–36 years). Twenty-eight adolescent participants, matched for gender, were recruited via mailing lists and existing participant databases (12 female, age = 13.8 ± 2.3 years; age range: 10–17 years). Participants met MRI inclusion criteria; they did not have any neuropsychiatric disorder and did not take psychotropic medication. Included participants did not show head motion deviating from the initial position by >4.5 mm or 4.5 degrees on any of the three translational or three rotational axes at any point throughout the scan. Four adolescents were excluded from the analysis due to excessive motion or insufficient behavioral responses (we excluded participants with <50% valid responses in any run, N = 2). This resulted in a final sample of 24 adolescents (10 female; age = 13.5 ± 2.2 years; age range: 10–17 years). The study was approved by the Human Research Protection Committee at Yale University. All participants provided written informed consent and received $100 compensation. We consented adolescent participants in the presence of at least one parent or guardian, obtaining both parental written informed consent and additional written assent from the adolescents.
Preferences survey
We designed a preference survey to acquire preference profiles for the main fMRI study. None of the survey participants took part in the main fMRI study. The survey comprised pictures of activities, fashion, and food items and a short demographic questionnaire. It was available through the Yale Qualtrics Survey Tool (www.yalesurvey.qualtrics.com) and took ∼25 min to complete. Survey participants rated how much they liked each item on a 10 point Likert scale ranging from 1 (not at all) to 10 (very much) and were offered a $5.00 gift card upon completion. Six adult coworkers, who were not otherwise involved in the study, and 6 adolescents, who had participated in previous studies, came to the laboratory to complete the survey. We selected the 3 adult and 3 corresponding adolescent profiles that were maximally distinguishable (i.e., the preferences of the 3 selected adults and adolescents showed the lowest absolute agreement in ratings, quantified with intraclass correlations [ICC]). The average absolute agreement of the three selected profiles in both groups was low to moderate (Hays and Revicki, 2005) (adult ICC = 0.495 with a 95% CI from 0.395 to 0.581, F(343,686) = 1.987, p < 0.001; adolescent ICC = 0.713 with a 95% CI from 0.655 to 0.762, F(343,686) = 3.565, p < 0.001). The participants who took part in the main fMRI study took the same survey after completing the preference task (see Experimental design and statistical analysis).
Stimuli
The 343 picture items of the preference survey were assigned to one of three categories with four subcategories each. The survey included 120 activity items (arts and crafts; music; sports; toys, gadgets, and games), 106 fashion items (accessories; bags; cosmetics; shoes), and 117 food items (fast food; healthy savory food; raw fruits and vegetables; sweets). We chose categories that were broad enough to reflect the interests of adolescents and adults (the stimuli are available by contacting the corresponding author). The picture stimuli either originated from validated stimulus sets (Brunyé et al., 2013) (http://ase.tufts.edu/psychology/spacelab/pubs.htm; http://cvcl.mit.edu/MM/) or were freely available online. All pictures were postprocessed with Adobe Photoshop (version CS5.1, Adobe Systems). To make the stimulus set more homogeneous, we replaced the backgrounds with a white background, removed writing such as brands from the objects, and saved all images with a resolution of 500 × 500 pixels (5.208 × 5.208 inches).
fMRI task stimuli
For the fMRI experiment, we randomly chose 49 activity, 30 fashion, and 41 food items (120 items in total) from the survey item pool, including approximately equal numbers of items from each subcategory. These were divided into three equivalent item sets of 40 items each. We subsequently assigned each of the three adult and adolescent preference profiles to one of the three item sets. This way, adolescents and adults saw the same items during the fMRI experiment but rated different individuals who were part of their own peer group. The overall preference distribution did not differ between the groups (overall: F(1,119) = 1.066, p = 0.727). Furthermore, variances of preference ratings for each category across the three profiles did not differ between the groups (activities: F(1,2) = 1.142, p = 0.934; fashion: F(1,2) = 0.058, p = 0.109; food: F(1,2) = 0.359, p = 0.528).
Experimental design
We first acquired preference profiles of adolescents and adults with an online survey (see Preference survey). The survey asked for a person's preference for a number of activity, fashion, and food items depicted by pictures. In the following fMRI study, participants who had not previously completed the survey were asked to infer the preferences of 3 people from their peer group. Following their rating for an item, they received trial-by-trial feedback about the other's actual preference rating (i.e., rating outcomes). The task consisted of three functional runs (one run for each person's preference ratings). One run lasted for ∼9 min 30 s (total task duration was ∼30 min).
fMRI experiment
First, participants read a short task introduction while in the magnet. It stated that they were going to judge how much a person likes certain items and that they should provide the answer by choosing a number on a 10 point Likert scale: from 1 (not at all) to 10 (very much). After rating each item, participants would see the person's actual rating. They were also instructed to memorize what people liked or not, but they were not given any specific memory strategies. Thereafter, participants practiced making the choices using a cylinder button box. They pressed buttons with the index or middle finger to move up and down the scale and selected their response by pressing the confirmation button with the thumb. Once they confirmed, a red box lit up around the number on the screen, indicating their final response. The confirmation button had to be pressed for the response to be included in the subsequent data analysis. The overall frequency of missing responses did not differ between adults and the remaining adolescent sample (adults: 3.4 ± 3.4; range: 0–14; adolescents: 3.3 ± 3.3; range: 0–12; t(43) = 0.05, p = 0.962).
Before each run, a short vignette introduced participants to the person for whom they would subsequently rate the items. The vignette contained name, age, and profession of the adult (e.g., “Mathew is 25 years old. He is a third-year medical student”) or adolescent person (e.g., “Lisa is 14 years old. She is in middle school in the ninth grade”). The rating outcomes were the true ratings of people who had participated in an initial preference survey (for more information please refer to the preference survey section). The names of the selected individuals were slightly changed to deidentify them. To ensure similar task difficulty for adults and adolescents and increase the task's resemblance to real life scenarios, participants were asked to infer preferences of individuals from their peer group: adults judged adult profiles and adolescents judged adolescent profiles (Fig. 1A). The vignettes for both age groups consisted of the same number of characters (t(4) = −1.21, p = 0.292) and did not significantly differ in their usefulness to adults and adolescents (as indicated by a lack of group difference in the three initial PEs; t(43) = −0.831, p = 0.4101).
Figure 1. Preference task. A, Before each run, participants were introduced to the person whose preferences they would subsequently rate. Adults and adolescents rated preferences of persons from their own peer group on a 10 point Likert scale (1 = "not at all" to 10 = "very much"; rating phase) and received trial-by-trial feedback about the person's actual rating for the item (feedback). B, After the preference task, participants rated their OPs for the same and similar additional items on the same rating scale. C, The reinforcement learning (RL) framework in the context of the preference task. The combination model, which best described participants' ratings in both groups, assumes that ratings rely on RL and participants' OPs for the items. γ: free parameter that formalizes the assumption that participants use a weighted combination of RL and their OPs to predict others' preferences.
On each task trial, participants were asked how much the person to whom they were introduced liked a particular item. They rated the item on a 10 point Likert scale, as described above (i.e., rating phase; 5 s). After a jittered interval (1–5 s), they saw the person's rating for the item (outcome phase; 2 s). Participants judged 40 items per run (120 across the 3 runs). Pictures of items did not repeat. Participants could, however, take the person's rating (i.e., feedback) into account when judging the next similar item. For instance, to judge the other's preference for apples, they could take into account how much the other preferred oranges on a previous trial (for a detailed description of the models used to describe participants' ratings, see Computational models).
Post-fMRI assessments
After participants completed the fMRI task, they were asked to describe the individuals based on what they had learned from the task. We chose an open answer format for this question to probe whether participants were able to form an impression based on a person's preferences for specific items. We summarized the attributes participants used to describe each of the 3 persons in word clouds (Fig. 2B). We also investigated whether participants formed a social impression of each person by generalizing from the items presented to our predefined categories and even further to character traits. Two raters, unfamiliar with the objective of this study, coded the frequency of classifications (i.e., categories, subcategories, and personality inferences) in both adult and adolescent groups (on average, raters agreed on 84.6% of their classifications). We computed between-group differences for classifications on which both raters agreed. Neither the frequency of overall classifications nor the specific classification types differed between groups (χ²(1, N = 184) = 1.39, p = 0.238; χ²(10, N = 184) = 13.48, p = 0.198; Fig. 2A). Last, participants completed the same preference survey as the initial survey respondents whose profiles we used in the fMRI task (Fig. 1B; see Preference survey). We assessed participants' OPs after the scanner task to avoid priming them toward their OPs when judging those of their peers. Participants did not know beforehand that they would be asked for their OPs in this study. We also tested whether adolescents had more rigid preferences for the fMRI task items compared with adults. This was not the case. The variance of OP ratings for the three categories (activities, fashion, and food) did not differ between adults and adolescents (Box's M = 2.419, F(6,44080) = 0.386, p = 0.888).
Figure 2. Free recollection of preference profiles in the adult and adolescent groups. A, Frequencies of predefined categories, subcategories, and personality trait inferences mentioned in the adult and adolescent groups. Both groups provided a similar amount of general descriptions of the other persons in a free recall task after the fMRI experiment. B, Word clouds represent the descriptions of adults and adolescents for the rated persons.
Social skill assessment in the adolescent group
For most adolescents (N = 21), we were able to obtain an additional, independent assessment of social skills. In a previous, unrelated study, parents had completed either the first (N = 5; Constantino and Gruber, 2005) or the second (N = 16; Constantino and Gruber, 2012) edition of the SRS. For 4- to 18-year-olds, the SRS-2 contains exactly the same item set as the SRS. The SRS measures the presence of impairments in reciprocal social behaviors typically associated with autism spectrum disorder and has been validated as a measure of autistic traits in typical and autism spectrum disorder samples (Payakachat et al., 2012). All adolescents in this sample scored in the normal, nonautistic range (i.e., their total SRS T-score was <59; 44.5 ± 4.5; range: 37–53). Even in this typically developing sample, with scores indicating typical social functioning, higher scores indicate lower levels of social traits.
Behavioral data analysis
On the fMRI task, the difference between participants' ratings and a person's actual preference represents an error independent of the direction of deviation (positive or negative). Overall accuracy was thus defined as the average of absolute PEs (i.e., the absolute differences between participants' ratings and the feedback they subsequently received).
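As a minimal illustration (example values are hypothetical, not participant data), this accuracy measure can be computed as follows:

```matlab
% Overall accuracy as the mean absolute prediction error (PE) between a
% participant's trial-by-trial predictions and the received feedback.
ratings  = [7 3 5 9 2];          % participant's predictions (example values)
feedback = [6 5 5 7 4];          % the other person's actual ratings (example values)
absPE    = abs(ratings - feedback);
accuracy = mean(absPE);          % lower values indicate more accurate predictions
```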
We first identified the computational models that best described the behavior of adolescents and adults (see Computational models). Second, we investigated whether the free parameters of the winning model (e.g., learning rates) differed between adolescents and adults. Finally, to further specify individual differences and the developmental trajectory of social learning in adolescence, we performed a hierarchical multiple regression to test linear and quadratic effects of age (age in days was transformed into age in years with 5-decimal precision) and SRS scores on the free parameters of the winning model. The variance of the behavioral variables of interest, such as PEs and parameter estimates, did not differ between groups. We assessed whether variables were normally distributed using the Kolmogorov–Smirnov test. Nonparametric group comparisons were performed for non-normally distributed variables.
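A sketch of this hierarchical regression, assuming hypothetical per-participant vectors age and alphaHat (estimated learning rates) and using the Statistics Toolbox function fitlm:

```matlab
% Step 1: linear age effect only; Step 2: add the quadratic term and test
% whether it explains learning rates beyond the linear effect.
tbl = table(age(:), age(:).^2, alphaHat(:), 'VariableNames', {'age','age2','alpha'});
mLinear    = fitlm(tbl, 'alpha ~ age');
mQuadratic = fitlm(tbl, 'alpha ~ age + age2');
% Inspect the R^2 change and the age2 coefficient, e.g.:
deltaR2 = mQuadratic.Rsquared.Ordinary - mLinear.Rsquared.Ordinary;
```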
Computational models
RL models have been instrumental in revealing behavioral and neural mechanisms of reward-based learning in animals and humans. Here we applied the RL framework to the social domain. Specifically, we probed the performance of various RL models and nonlearning models to describe developmental differences in learning about another person's preferences.
Model space
Main models.
We tested three main models. Model 1 (no-learning) assumes that participants do not learn about the others' preferences over the course of the experiment. Specifically, Model 1 assumes that participants perform a simple linear transformation of their OPs to predict the preferences of the other persons as follows:

$$ER_t = \beta_0 + \beta_1 \cdot OP_t$$

where ER indicates estimated ratings of the other persons and OP indicates participants' own preferences. To crosscheck, we estimated this model using the MATLAB function regress, which gave the same results as our own implementation using fminsearch.
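A sketch of this crosscheck, assuming hypothetical vectors OP (own preferences) and otherRating (the other person's ratings); regress is part of the Statistics Toolbox:

```matlab
% Model 1 as an ordinary linear regression: intercept (beta_0) and slope (beta_1).
X    = [ones(numel(OP), 1), OP(:)];   % design matrix: intercept and own preferences
beta = regress(otherRating(:), X);    % beta(1) = beta_0, beta(2) = beta_1
ER   = X * beta;                      % model-estimated ratings
```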
Model 2 (RL ratings) applied a variant of a Rescorla–Wagner RL rule to our task. Participants are assumed to adjust their upcoming rating of the other person's preference ERt+1 (for a given subcategory of items) on the basis of their current estimate (ERt) and the PE between this current estimate and the current feedback (Ft), which is the other person's actual preference rating. The PE is weighted by the learning rate α, which is a free parameter, as follows:

$$ER_{t+1} = ER_t + \alpha \cdot PE_t$$

with

$$PE_t = F_t - ER_t$$
At the first occurrence of an item from a new subcategory, participants cannot infer the estimated rating from past experience. In these instances, we initialized ERt to the midpoint of the scale 5.5, which can be regarded as an uninformative prior.
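For illustration, a minimal MATLAB sketch of the Model 2 update, tracking one estimate per subcategory (subcategory labels, parameter value, and feedback are made-up examples):

```matlab
alpha = 0.3;                                         % learning rate (free parameter)
ER = containers.Map('KeyType','char','ValueType','double');
subcats  = {'sweets','sweets','fast food'};          % subcategory of each trial
feedback = [8 7 3];                                  % other person's actual ratings
for t = 1:numel(subcats)
    if ~isKey(ER, subcats{t})
        ER(subcats{t}) = 5.5;                        % uninformative prior (scale midpoint)
    end
    PE = feedback(t) - ER(subcats{t});               % prediction error
    ER(subcats{t}) = ER(subcats{t}) + alpha * PE;    % Rescorla-Wagner update
end
```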
Model 3 (combination) combines the logic of the former two models (Fig. 1C). Participants are assumed to update their estimates of the other persons' ratings as in Model 2. At the same time, participants are assumed to use their OPs for the current item to infer those of the other persons as in Model 1. Specifically, Model 3 includes the free weighting parameter γ, which formalizes the assumption that participants use a weighted combination of RL and their OPs to predict the others' preferences:

$$ER_{t+1} = \gamma \cdot (ER_t + \alpha \cdot PE_t) + (1 - \gamma) \cdot OP_{t+1}$$

with PEt defined as in Model 2.
Again, ERt was initialized to 5.5 for the first item of a new subcategory.
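A corresponding sketch of the combination update for a single subcategory, under the weighted form given above (parameter values and ratings are arbitrary examples; trial indexing is simplified for illustration):

```matlab
alpha = 0.3;  gamma = 0.6;          % free parameters: learning rate and RL-vs-OP weight
OP       = [6 9 2];                 % own preferences for the successive items (example)
feedback = [8 7 3];                 % the other person's actual ratings (example)
ER = 5.5;                           % uninformative prior for a new subcategory
for t = 1:numel(feedback)
    PE = feedback(t) - ER;                                 % prediction error
    ER = gamma * (ER + alpha * PE) + (1 - gamma) * OP(t);  % weighted combination
end
```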
Supplemental models.
In addition to these three main models, we explored additional models, in particular plausible extensions of the combination model (Model 3), which we found to be the best-fitting model among the main models. Model 4 (RL-self-other-diff) assumes that participants use RL to update estimates of the differences (DIFF) between the other persons' ratings and their OPs. On each trial, participants use their current estimate of this difference DIFFt and their OP for the current item to infer the estimated rating of the other person. This contrasts with Models 2 and 3, in which participants are assumed to directly update the estimates of the other persons' ratings (regardless of the difference to their OPs):

$$ER_t = OP_t + DIFF_t$$

with

$$DIFF_{t+1} = DIFF_t + \alpha \cdot PE_t$$

and

$$PE_t = F_t - ER_t$$
Similar to Models 2 and 3, DIFFt was initialized to 0 for the first item of a new subcategory (and additionally for each category).
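A minimal sketch of the Model 4 update, reusing the illustrative OP and feedback vectors from the sketches above:

```matlab
alpha = 0.3;                         % learning rate (free parameter)
DIFF  = 0;                           % self-other difference, initialized to 0
for t = 1:numel(feedback)
    ERt  = OP(t) + DIFF;             % predicted rating anchored on own preference
    PE   = feedback(t) - ERt;        % prediction error
    DIFF = DIFF + alpha * PE;        % update the estimated self-other difference
end
```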
Model 5 (RL ratings α-cat) was an extension of Model 2 (RL ratings) with the only difference that the model included three free parameters to capture different learning rates αactivities, αfashion, and αfood for the three main categories of the items.
A similar logic was incorporated in Models 6, 7, and 8. These models extended Model 3 (combination) to account for potential differences between item categories. Model 6 (Combination-α-cat) allowed for different learning rates αactivities, αfashion, and αfood but included a single weighting parameter γ. Model 7 (Combination-γ-cat) allowed for different weighting parameters γactivities, γfashion, and γfood but had a single learning rate α. Model 8 (Combination-α-γ-cat) incorporated both category-specific learning rates and category-specific weighting parameters. Models 9 and 10 extend Models 2 and 3 with a decay parameter, d. This decay parameter was added to account for the possibility that participants might forget about subcategories for which they did not receive information on the immediately preceding trial. Subcategory information (SI) was decayed toward the initial ER of 5.5 according to the following rule:

$$SI_{t+1} = SI_t + d \cdot (5.5 - SI_t)$$
Model 9 is a “decay RL ratings model” with two free parameters, α and d. Model 10 is a “decay combination model” with three free parameters: α, γ, and d.
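A sketch of the decay step, reusing the per-subcategory estimates (the ER map) from the Model 2 sketch; which subcategories decay on a given trial is an arbitrary example here:

```matlab
d = 0.1;                                 % decay parameter (free parameter)
notUpdated = {'fast food'};              % subcategories without feedback on this trial
for k = 1:numel(notUpdated)
    ER(notUpdated{k}) = ER(notUpdated{k}) + d * (5.5 - ER(notUpdated{k}));
end
```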
Model 3 assumes one overall α and one overall γ for all three profiles (i.e., all three runs) and for all categories. Models 11, 12, and 13 extend Model 3 such that the free parameters α and/or γ are allowed to vary by preference profile, that is, by run. Model 11 estimates the learning rates α separately for each of the three profiles, that is, the three persons (αperson 1, αperson 2, and αperson 3) for whom participants made preference inferences, but includes only one weighting parameter γ. Model 12 allows for separate weighting parameters (γperson 1, γperson 2, and γperson 3) but only one learning rate α. In Model 13, both α and γ are estimated separately for each of the three persons' preference profiles.
Testing initialization of estimated ratings
In Models 2 and 3, we initialized ERt to the midpoint of the scale (5.5) for the first item of a new subcategory. We additionally tested whether initializing these values of ERt to participants' OPs resulted in a better fit. For this, we compared two model families with each other (Rigoux et al., 2014). The first family comprised Models 2 and 3 with ERt initialized to 5.5, whereas the second comprised Models 2 and 3 with ERt initialized to OP. This model comparison showed that initializing to 5.5 resulted in a better model fit (see Fig. 3B).
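In SPM12, this family-level comparison can be run roughly as follows; lme is assumed to be a participants-by-4 matrix of approximate log-evidences (−0.5 × BIC) for Models 2 and 3 under each initialization, and the column order is our assumption:

```matlab
family.infer     = 'RFX';                  % random-effects family inference
family.partition = [1 1 2 2];              % columns 1-2: init 5.5; columns 3-4: init OP
family.names     = {'init 5.5', 'init OP'};
[famPost, modelPost] = spm_compare_families(lme, family);
% famPost.xp holds the family exceedance probabilities (cf. Figure 3B).
```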
Figure 3. Model comparisons. A, Fixed-effects comparisons of main and supplemental models reveal that the combination model is the best-fitting model in the adult and adolescent groups. That is, the difference in relative log-group Bayes factor (BF) between the combination model and the next-best model is >3 for both groups. BF is calculated with respect to the No-learning model (bars for the poorly fitting RL models partly exceed the right-hand scale of the graph). Bounding model parameters between 0 and 1 did not change the results of the model comparison. B, Family random-effects analysis comparing the model fit of the combination and RL models when initialized to 5.5 for the first item of a new subcategory versus to participants' OPs. The exceedance probability plots show that the 5.5 initialization provides a better fit in both adult and adolescent groups. TEENS, Adolescents; ADU, adults.
Model estimation and comparison
We used least-squares estimation to determine best-fitting model parameters. Optimization used a nonlinear Nelder–Mead simplex search algorithm (implemented in the MATLAB function fminsearch) to minimize the sum of squared errors of prediction (SSE) over all trials for each participant. The maximum number of iterations allowed was set to 10⁷. All other tolerances and stopping criteria were kept at default values. Each parameter was initialized at 0.5. All model estimations converged, and estimations converged to the same values when exploring different initializing parameters. We additionally reran the models with constrained parameter ranges and obtained the identical pattern of results. For each model and each participant, we approximated model evidence by calculating the Bayesian Information Criterion (BIC), according to the following standard formula:

$$BIC = n \cdot \ln\left(\frac{SSE}{n}\right) + k \cdot \ln(n)$$
where n is the number of trials, k is the number of free parameters in the model, and SSE is the minimized sum of squared errors. The parameter count k penalizes model complexity. We report both fixed- and random-effects model comparisons. For fixed-effects analyses, we computed log-group Bayes factors by summing BIC values for each tested model across participants and then subtracting the value of the reference model (Model 1). According to the convention used here, smaller log-group Bayes factors indicate more evidence for the respective model versus the reference model. For random-effects analyses, we used the Bayesian Model Selection (BMS) procedure implemented in the MATLAB toolbox SPM12 (http://www.fil.ion.ucl.ac.uk/spm/; spm_BMS) to calculate protected exceedance probabilities, which measure how likely it is that any given model is more frequent than all other models in the population. This procedure was established for comparing models of functional connectivity in fMRI studies and can equally be applied to comparing models of participants' behavior (Stephan et al., 2010; Rigoux et al., 2014; Vossel et al., 2014; Korn et al., 2016).
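A condensed sketch of this pipeline for one participant; sse_combination is a hypothetical helper returning the sum of squared errors of the combination model's predictions for a given parameter vector:

```matlab
opts = optimset('MaxIter', 1e7);                       % iteration cap as in the text
pHat = fminsearch(@(p) sse_combination(p, data), [0.5 0.5], opts);

n   = 120;                                             % number of trials
k   = numel(pHat);                                     % free parameters (alpha, gamma)
sse = sse_combination(pHat, data);
BIC = n * log(sse / n) + k * log(n);                   % least-squares BIC

% Random-effects comparison: lme is a participants-by-models matrix of
% approximate log-evidences, e.g., lme = -0.5 * allBICs;
% [~, ~, xp, pxp] = spm_BMS(lme);                      % pxp: protected exceedance prob.
```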
Simulations of noise in participants' ratings
The underlying assumption of our task, and thus of the computational models, is that participants performing the task have a similar representation of the rating scale as the persons who completed the preference survey. To test whether model estimates are robust to noisy rating choices, we performed two simulations; both showed that the model-derived estimates are robust against random noise.
First, we added five different levels of normally distributed random noise to the model estimates of adults and adolescents. The distributions had a mean of 0 and SDs varying between 1 and 2 in increments of 0.25. For each noise level, we generated 100 different random variables and added these to the choices predicted by the winning combination model (Model 3; see Results). In subsequent model fits, we recovered the learning rates from these noisy estimates. Averaged parameter estimates using the noisy model-estimated ratings did not differ from the noise-free parameter estimates (adults' model estimates: t(20) = −0.237, p = 0.815; adolescents' model estimates: t(23) = 0.092, p = 0.928).
Second, we performed the same type of simulation, but this time we added the noise to participants' actual ratings and fitted the winning combination model to these noisy data. Averaged parameter estimates using noisy actual ratings also corresponded to the noise-free parameter estimates reported in Results (adults' actual ratings: t(20) = −1.237, p = 0.231; adolescents' actual ratings: t(23) = −0.457, p = 0.652).
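A sketch of the second simulation, assuming a hypothetical wrapper fit_combination that refits the combination model via fminsearch and returns the recovered parameters:

```matlab
noiseSDs = 1:0.25:2;                                   % five noise levels (SDs)
pNoisy   = zeros(numel(noiseSDs), 100, 2);             % alpha and gamma per fit
for s = 1:numel(noiseSDs)
    for rep = 1:100
        noisy = ratings + noiseSDs(s) * randn(size(ratings));  % perturb actual ratings
        pNoisy(s, rep, :) = fit_combination(noisy, data);
    end
end
% Average the recovered parameters over repetitions and compare with the
% noise-free estimates (paired t tests, as reported above).
```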
fMRI data acquisition
Images were collected at the Yale University Magnetic Resonance Research Center on a Siemens 3T Tim Trio scanner equipped with a 12-channel head coil. Whole-brain T1-weighted anatomical images were acquired using an MPRAGE sequence (TR = 2530 ms; TE = 3.31 ms; flip angle = 7°; FOV = 256 mm; image matrix = 256 × 256; voxel size = 1 mm³; 176 slices). Field maps were acquired with a double-echo gradient-echo field map sequence, using 51 slices covering the whole head (TR = 731 ms; TE = 4.92 and 7.38 ms; flip angle = 50°; FOV = 210 mm; image matrix = 84 × 84; voxel size = 2.5 mm³). The experimental paradigm data were acquired in three runs of 285 volumes each (TR = 2000 ms; TE = 25 ms; flip angle = 60°; FOV = 220 mm; image matrix = 64 × 64; voxel size = 3.4 × 3.4 × 4 mm³; 34 slices). The first five volumes of each run were discarded to allow the longitudinal magnetization to reach steady state.
fMRI data analysis: preprocessing
fMRI data processing was conducted using FEAT (fMRI Expert Analysis Tool) version 6.00 of FSL. The fsl_prepare_fieldmap tool was used to correct for geometric distortions caused by susceptibility-induced field inhomogeneities. Further preprocessing steps included motion correction using MCFLIRT (Jenkinson et al., 2002), slice-timing correction, nonbrain removal using BET (Smith, 2004), spatial smoothing with a Gaussian kernel of 5 mm FWHM, and high-pass temporal filtering. Images were registered to the high-resolution structural image and to the MNI template using FLIRT (fMRIB's Linear Registration Tool) (Jenkinson and Smith, 2001; Jenkinson et al., 2002). Because of significant group differences in mean ± SD absolute displacement (adults: 0.26 ± 0.14 mm; adolescents: 0.51 ± 0.42 mm; Mann–Whitney U = 128.5, p = 0.005), group analyses additionally included this variable as a covariate of no interest.
fMRI data analysis: statistical model
For each participant, the GLM included two regressors per run, modeling the two distinct phases of the task: the rating phase and the feedback phase. Our main question was how brain activity during these phases was modulated on a trial-by-trial basis by parameters derived from the winning behavioral model. Evidence that model parameters were reflected in brain activity would provide biological validity to the computational model of participants' behavioral responses. To address this question, we entered only the model-estimated ratings and PEs, and not participants' actual ratings and PEs, as parametric regressors into the GLM. That is, we investigated whether model-derived ratings correlated on a trial-by-trial basis with brain activity in rating phases, and PEs with brain activity in feedback phases. We also included participants' OP ratings and the received feedback as parametric regressors (these two metrics constitute the inputs to the winning model): OP ratings were included as additional parametric modulators of rating phases, and feedback, displayed as integers, was entered as a control variable to account for variance explained by seeing different numbers on the screen, preventing such variance from being erroneously assigned to the PE regressor. Finally, the six head motion parameters obtained after realignment were entered into the model as additional regressors of no interest.
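As an illustration of the general logic (our sketch, not the FSL FEAT implementation), a trial-by-trial parametric PE regressor can be built by scaling feedback-onset sticks with demeaned PEs and convolving with a canonical HRF; onsets (in seconds) and PEs are hypothetical vectors, and gampdf requires the Statistics Toolbox:

```matlab
TR = 2; nVol = 285; dt = 0.1;                     % acquisition values from above
t   = 0:dt:32;
hrf = gampdf(t, 6, 1) - gampdf(t, 16, 1) / 6;     % double-gamma canonical HRF shape
u   = zeros(round(nVol * TR / dt), 1);            % high-resolution stick function
u(round(onsets / dt) + 1) = PEs - mean(PEs);      % demeaned parametric modulator
r   = conv(u, hrf(:));  r = r(1:numel(u));        % convolve and trim
reg = r(1:round(TR / dt):end);                    % downsample to one value per volume
```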
We did not orthogonalize regressors, to avoid allocating the shared variance to only one of the regressors (Mumford et al., 2015). Correlations between parametric regressors for both groups are reported in Table 1.
Table 1. Correlation between parametric regressors in the fMRI analysis
The runs were combined in a second-level within-subject analysis, and at the group level, we performed mixed-effects analyses on the contrast images using FLAME (fMRIB's Local Analysis of Mixed Effects, Stages 1 and 2, with automatic outlier detection and deweighting) (Beckmann et al., 2003; Woolrich, 2008). We ran two group analyses. One analysis addressed group differences (adults vs adolescents) in the strength of correlations between ratings, PEs, and OPs and brain activity in rating and feedback phases. The second group analysis investigated individual differences across adolescence: we tested whether age and reciprocal social behavior, measured with the SRS, modulated brain activity related to ratings, PEs, and OPs in the adolescent group. As in the behavioral analyses, we included age (in days, transformed into age in years with 5-decimal precision) and SRS scores as linear and quadratic regressors. The age-squared and SRS-squared vectors were orthogonalized to the linear age and SRS vectors, respectively.
In light of the recent results by Eklund et al. (2016), which showed that whole-brain multiple-comparison cluster corrections at a p < 0.05 threshold are unacceptably prone to producing false-positive results, new quality standards have emerged, demanding more conservative significance levels and more thorough statistical reporting (Kessler et al., 2017; Nichols et al., 2017). In line with these recent guidelines, we report clusters of maximally activated voxels that survived family-wise error correction for multiple comparisons at a statistical threshold of p < 0.001 and z = 2.3.
Results
Behavior and main computational models
In line with our first hypothesis, we found that adolescents made less accurate predictions about the other persons' preferences than adults (i.e., they had higher average PEs throughout the experiment) (Fig. 4). In both groups, PEs diminished over time, and both groups were similarly able to generalize from the persons' preferences for individual items to predefined categories and even to form an impression of the persons' character traits (as revealed by postscan questionnaires; see Materials and Methods; Fig. 2A).
Figure 4. A, Mean PEs decreased over time in both adult and adolescent groups. Pearson's correlation coefficients were as follows: r = −0.330, p = 0.0376 in the adult group and r = −0.377, p = 1.65 × 10−2 in the adolescent group. B, Adults had significantly lower mean PEs than adolescents (t(43) = −4.037, p = 2.19 × 10−4). TEENS, Adolescents; ADU, adults. Asterisk indicates a significance level of p < 0.01.
Crucially, we compared the suitability of a baseline nonlearning regression model (Model 1), a pure RL model (Model 2), and a "combination model" (Model 3) in describing participants' changes in predictions over time (Fig. 5C). In both groups, the winning combination model (Model 3) included two components: RL based on past feedback and participants' OPs for the item at hand. The two model parameters, the learning rate α and the weighting parameter γ, which formalizes the trade-off between RL and participants' OPs, were not significantly correlated within groups (adults: r = −0.06, p = 0.811; adolescents: r = 0.182, p = 0.395) or over all participants (r = 0.131, p = 0.393).
Figure 5. Model comparison. A, Gamma parameters encoding the trade-off between RL and OP did not significantly differ between groups (Mann–Whitney U = 234, N1 = 21, N2 = 24, p = 0.682). B, Adolescents had lower learning rates (t(43) = −4.22, p = 2.44 × 10−3, Bonferroni corrected). Asterisk indicates a significance level of p < 0.01. C, Random-effects model comparison (using BIC) showed that the evidence for the combination model was highest in both groups. The combination model or a variant thereof also emerged as the winner when considering a larger model space (see Materials and Methods). A random-effects group analysis compared adults and adolescents using the same model (both Combination or both Simple-RL) versus different models (one group Combination and the other Simple-RL). The exceedance probabilities provide conclusive evidence that both groups used the same model. D, Regression analyses revealed a quadratic relationship between participants' ages and learning rates. Dashed confidence bands indicate the 95% CIs for the fitted regression line. The age regressor, calculated from days, is a continuous variable with 5 decimals. In the figure axes, age is rounded to years. TEENS, Adolescents; ADU, adults.
Contrary to our second hypothesis, the groups did not differ in their tendency to base their predictions on RL versus their OPs (as determined by a lack of evidence for a difference in the weighting parameter γ; Fig. 5A) and a lack of group difference in the optimal model initialization (5.5 vs OP; Fig. 3B). Surprisingly, however, adolescents were less flexible than adults when making social predictions as indicated by lower learning rates (i.e., a reduced tendency to update based on immediate feedback; Fig. 5B). Notably, individual differences in learning rates indicated ongoing social development across adolescence: Learning rates decreased from early to mid-adolescence and increased thereafter. This quadratic effect was significant after controlling for the linear effect (Table 2; Fig. 5D).
Table 2. Hierarchical multiple regression testing the linear and nonlinear relationships of learning rates from the winning model with age and social traits in the adolescent group
Extended model comparison
A more comprehensive model comparison of 13 models showed clearly that, in both age groups, variants of the combination model outperformed the other model types considered here (Fig. 3A, blue bars). Specifically, in the adult group, the simplest variant of the combination model, with one free parameter each for the learning rate and the weighting parameter (Model 3), fitted participants' choices best. In the adolescent group, a variant of the combination model with separate weighting parameters for each of the three preference profiles (Model 12) best described participants' choices according to a fixed-effects analysis. The simplest combination model variant (Model 3) performed second-best in adolescents. Adolescents' learning rates estimated by Model 12 did not significantly differ from those estimated by Model 3 (t(23) = 0.224, p = 0.824) and were significantly lower than the learning rates of adults according to Model 12 (t(43) = 4.29, p = 1.86 × 10−4). A subsequent random-effects group comparison of Model 12 versus the simpler Model 3 did not yield conclusive evidence that adolescents indeed use a different variant of the combination model than the adult group: the probability of the age groups using different variants of the combination model was 0.5, and thus at chance level. For stringent comparability across both age groups, we therefore investigated whether and how model parameters estimated by the simplest combination model (Model 3) are encoded in the brain activity of both age groups.
Behavioral control analyses
Our behavioral result that adolescents showed more conservative updating than adults held up against plausible alternative explanations. First, group differences in learning rates could arise if the adult preference profiles were easier to infer than the adolescent profiles. This was not the case: we tested how a nonsocial learning algorithm (a simple RL model) performs when predicting the adult and adolescent profiles (in the presented trial order for each participant). This model adjusts upcoming preference ratings by the veridical feedback about the other's preference, independent of participants' responses or other factors, such as their OPs. In these simulations, adolescents and adults had an equal opportunity to learn on the task. There were no group differences in learning rates for adult and adolescent profiles (χ²(1, N = 43) = 0.44, p = 0.509), demonstrating that an RL model per se can capture adult and adolescent profiles equally well and that pure RL about adults and adolescents on this task results in similar updating behavior.
Another possibility for the observed group differences in learning rates could be that adolescents and adults differed in the extent to which they remembered the persons' feedback on previous trials. To test this hypothesis, we modified the simple RL model and the combination model to include "forgetting" of the learned estimates (for a similar approach in the nonsocial domain, see Niv et al., 2015). The models accounted for forgetting by introducing a decay parameter: for each trial in which an item did not belong to a given subcategory, the decay parameter pulled the previously learned estimate of that subcategory toward the noninformative midpoint of the rating scale. Put differently, the decay parameter scales forgetting according to the number of trials that passed without feedback about the other person's preference for the given subcategory (see Materials and Methods). As described in the extended model comparison above, for both adolescent and adult participants, the winning combination model outperformed the models that included the decay parameter, which rules out that forgetting of feedback contributed to the observed group differences in learning rates.
Group differences in learning rates were also not due to the adolescent profiles or OPs being more rigid overall or in one of the three item categories. Preference profiles and OPs did not differ between adult and adolescent groups for any individual category or across categories (see Materials and Methods). Furthermore, we did not find evidence that participants who were more stable in their OPs, regardless of age group, performed better on our task: there were no significant correlations between learning rates and overall OP variance (adults: r = −0.086, p = 0.711; adolescents: r = 0.044, p = 0.839), between learning rates and the variance of the three item categories separately in either group (adults: activities: r = −0.204, p = 0.374; fashion: r = 0.106, p = 0.648; food: r = −0.166, p = 0.471; adolescents: activities: r = 0.012, p = 0.955; fashion: r = 0.006, p = 0.978; food: r = 0.018, p = 0.978), or across groups (activities: r = −0.129, p = 0.400; fashion: r = 0.090, p = 0.558; food: r = −0.133, p = 0.386).
Finally, we tested whether the fact that participants saw feedback from other people could have biased their OP ratings, which were elicited after the scanner task. Previous studies showed that feedback from peers, mostly from experts or large groups, can affect one's own judgment (Meshi et al., 2012; Izuma and Adolphs, 2013). To exclude the possibility that the feedback about the other persons' preferences (received during the scanner task) influenced participants' OPs for the respective items, we calculated how much the other persons' preferences differed from participants' self-ratings. Crucially, we compared these self-other discrepancies between pictures for which participants had received feedback (114 of 120 total task items; 6 items could not be matched) and additional similar items for which they had not received feedback (114 of 244 remaining unseen items; for a complete list of matched pairs, see Table 3). We found no significant differences in either group (adolescents: t(113) = −0.82, p = 0.415; adults: t(113) = −1.19, p = 0.237), suggesting that participants did not shift their OP ratings of previously seen items to more closely match the preference ratings of the other person.
Table 3. Pictures of items seen in the preference task and matched unseen picture items
Brain systems for social learning
Our fMRI analyses investigated whether group differences in the extent to which parameters of the winning combination model are encoded in brain activity reflect the observed behavioral differences. We were particularly interested in potential developmental differences, that is, whether model-estimated predictions, PEs, and OPs were encoded to a greater extent in brain activity of adults versus adolescents and vice versa. Table 4 provides a comprehensive overview of performed analyses.
Table 4. Brain activity correlated with parameters of the winning model
Paralleling the behavioral differences in updating, our fMRI data demonstrated a regional shift in encoding PEs from adolescence to adulthood. Activity in the fusiform cortex correlated more strongly with estimated PEs in adolescents compared with adults (Table 4; Fig. 6B). In contrast, mPFC activity represented the actual predictions (i.e., estimated preference ratings) more strongly in adults than in adolescents (Table 4; Fig. 6A). Age and social traits modulated the relationship between brain activity and PEs in the adolescent group: activity in the bilateral fusiform cortex was less related to PEs in mid-adolescents (Table 5; Fig. 6C), and brain activity in both the fusiform cortex and the mPFC was more closely linked to PEs in adolescents with lower levels of social traits (Table 5; Fig. 6D).
Figure 6. Group differences in neural encoding of model-estimated ratings and PEs on a trial-by-trial basis. A, Estimated ratings according to the winning model correlated more strongly with brain activity in adults compared with adolescents. B, Estimated PEs correlated more strongly with brain activity in adolescents compared with adults. C, Quadratic relationship between the magnitude to which adolescents encoded PEs in brain activity and age. The age regressor, calculated from days, is a continuous variable with 5 decimals. In the figure axes, age is rounded to years. D, The magnitude to which adolescents' brain activity encoded PEs correlated positively with the SRS total score (higher values indicate lower levels of social traits). TEENS, Adolescents; ADU, adults.
Table 5. Relationship between brain activity encoding model-based variables and social traits in the adolescent group
Discussion
We devised a novel, ecologically valid paradigm, in which adults and adolescents rated preferences of persons from their respective age groups. They subsequently received veridical feedback about these persons' actual preferences. Computational model comparison revealed that, of the tested models, the combination strategy best approximated participants' preference inferences across age groups. Participants adjusted ratings based on previous feedback and their OP for the item at hand.
Participants were not encouraged to think about their OPs before or during the task. The fact that they did so supports the notion of egocentric processing of social information (i.e., humans typically rely on their own preferences and experiences to make inferences about others' mental states) (Gallese and Goldman, 1998; Mitchell, 2009). Contrary to our expectations and previous studies (Lapsley and Murphy, 1985; Frankenberger, 2000), we found that adolescents were not more egocentrically inclined: the extent to which participants relied on their OPs to rate those of others did not differ between the two age groups.
Adolescents, however, were overall less accurate (i.e., they had on average higher absolute PEs) and were more conservative reinforcement learners, as indicated by lower learning rates. Interestingly, Davidow et al. (2016) found that adolescents show more conservative updating behavior (i.e., lower learning rates) in a nonsocial RL task. Because of the probabilistic nature of their task, this led to better learning performance of adolescents compared with adults. In our task, lower learning rates of adolescents were associated with less accurate preference predictions.
Most studies have shown that adolescents' choices are suboptimal in both social and nonsocial domains. In nonsocial reward settings, nonoptimal choices typically arise from greater impulsivity and cognitive flexibility (Galvan et al., 2006; Cohen et al., 2010). This is thought to be caused by an asymmetry in the underlying neurodevelopment: subcortical regions that support reward processes, such as the striatum, mature earlier than regions for cognitive control, in particular the mPFC (Casey et al., 2008; Steinberg, 2008). An emerging literature shows that anatomical and functional development of the mPFC also plays an important role for ongoing social development across adolescence. mPFC development, however, seems to affect social decisions in a different way than nonsocial decisions. In line with our results, studies have found that adolescents are less flexible when switching between their own and another person's perspective (Choudhury et al., 2006) and that social interactions are more effortful, which is why adolescents tend to incur performance deficits compared with adults (Mills et al., 2015).
The U-shaped relationship between age and learning rates on our preference task further suggests that conservative updating may be specific to adolescence. Mid-adolescents showed the most conservative updating behavior. This finding cannot be explained by differences in preference rigidity between the preference profiles or OP ratings of the two age groups: on average, preference distributions for both preference profiles and OPs did not differ between groups. Adolescent-specific performance peaks or valleys have been observed in a number of decision-making contexts, ranging from risk taking to reversal learning and feedback processing (van der Schaaf et al., 2011; Casey and Caudle, 2013; Crowley et al., 2014; Jones et al., 2014). These extremes likely reflect nonlinear changes in the neurobiological mechanisms underlying task performance (Giedd et al., 1999; Luciana, 2013; Braams et al., 2015). For instance, longitudinal analyses confirmed that quadratic age patterns for nucleus accumbens activity to rewards coincided with the same quadratic pattern for risk taking in adolescents (Braams et al., 2015).
Consistent with the notion that adolescent neurodevelopment supports changes in social learning, we find that the extent to which variables of the winning learning model are encoded in brain activity differs between age groups. The bilateral fusiform cortex, which showed greater PE sensitivity in adolescents compared with adults, has been repeatedly implicated in encoding PEs in nonsocial and social contexts (Garrison et al., 2013; Gu et al., 2016). The observation of larger PE signals in the adolescent group concurs with previous reports that adolescents' performance is more strongly influenced by worse-than-expected feedback compared with better-than-expected feedback (van Duijvenvoorde et al., 2008; Hauser et al., 2015). In our study, negative feedback constitutes feedback that strongly deviates from the initial rating, thereby producing a large PE. That is, participants' goal was to accurately predict the other persons' preferences; therefore, both higher and lower predictions indicated undesired inaccuracy. In line with the literature, we find ongoing nonlinear neural development across adolescence (Jones et al., 2014). Specifically, PE encoding in the fusiform cortex varied nonlinearly with adolescents' age, whereby mid-adolescents showed less PE encoding than their younger and older peers. This nonlinear relationship between neural encoding of PEs and age paralleled the U-shaped relationship between adolescents' learning rates and age. These adolescent-specific findings suggest that adolescence, in particular mid-adolescence, may be a unique developmental period with respect to social cognition.
Our approach of combining computational modeling and neuroimaging sensitively detected nonlinear, adolescent-specific neurodevelopment. We acknowledge the relatively small sample sizes of adolescents and adults, who were recruited from the Yale University area. We also acknowledge the possibility that untested models could account for social learning about other persons' preferences. While the sample sizes and the model selection strategies are typical of most current studies, future studies should replicate and further fine-tune the behavioral and neural models of social learning in larger and more diverse samples.
Importantly, adolescents who showed increased neural responses to PEs were reported to have lower levels of social traits. This correlation could indicate that adolescents with higher levels of social traits do not need to rely on PEs as much as adolescents with lower levels of social traits. Alternatively, the correlation between neural encoding of PEs and social traits could suggest that overly strong PE sensitivity may sometimes hinder, rather than support, social learning.
Compared with adolescents, adults' neural responses in the mPFC were more closely related to their inferences about others' preferences. This perigenual part of the mPFC has been repeatedly implicated in ToM and strategic decision making, in particular in keeping track of another player's mental state (Behrens et al., 2009; Rosenblau et al., 2016). The fact that activity in the mPFC, which is one of the latest-maturing brain regions, correlated with adults' ratings of another person's mental state, but did not correlate significantly with ratings of adolescents, suggests that the mPFC is more attuned to higher-level social inferences in adults compared with adolescents.
In conclusion, we identify a computational model that describes ongoing development in updating inferences about another person. In contrast to nonsocial reward settings (Cohen et al., 2010), adolescents were overall more conservative; that is, they made smaller updates based on immediate social feedback. Instead, they averaged feedback over a longer time horizon, leading them to assume overly rigid preference structures. This developmental change was accompanied by a shift in encoding predictions and the errors thereof within the medial prefrontal and fusiform cortices. The fact that neural encoding of OPs and PEs scaled with parent-reported social traits in everyday settings provides external validity for our task design and the computational model. Future studies could profitably apply a similar approach to investigate social decision making in adolescents suffering from psychiatric conditions.
Footnotes
This work was supported by the Hilibrand Foundation to K.A.P. and G.R., the Carbonell Family to K.A.P., and National Institute of Mental Health Grant R01 MH100028. C.W.K. was supported by the SFB TRR 169. We thank Allison Jack, Daeyeol Lee, and Yael Niv for valuable discussions; Heidi Tsapelas for help with recruitment; Jessica Reed and Abigail Dutton for assistance with data acquisition; Megan Braconnier and Sebiha Abdullahi for help with data analysis.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr. Gabriela Rosenblau, Autism and Neurodevelopmental Disorders Institute, George Washington University and Children's National Health System, 2115 G Street NW, Washington, DC 20052. grosenblau@gwu.edu