Abstract
Human language learning differs substantially across individuals, both in the learning process and in ultimate attainment. Although decades of research exploring the neural substrates of language learning have identified distinct and overlapping neural networks subserving the learning of different components, the neural mechanisms that drive these large interindividual differences remain far from understood. Here we examine the extent to which the neural dynamics of multiple brain networks in men and women across training sessions explain individual differences in learning multiple linguistic components (i.e., vocabulary, morphology, and phrase and sentence structures) of an artificial language in a 7 d training and imaging paradigm with functional MRI. With machine-learning-based predictive modeling, neural activation patterns across training sessions were highly predictive of individual learning success profiles derived from the four components. We identified four neural learning networks (i.e., the Perisylvian, frontoparietal, salience, and default-mode networks) and examined their dynamic contributions to the prediction of learning success. Moreover, the robustness of the predictions changed systematically across networks depending on the training phase and the learning component. We further demonstrate that a subset of network nodes in the inferior frontal, insular, and frontoparietal regions increasingly represented newly acquired language knowledge, and that the multivariate connectivity between these representation regions was enhanced during learning for more successful learners. These findings help explain why learners differ and are the first to attribute not only the degree of success but also the pattern of language learning across components to neural fingerprints summarized from the dynamics of multiple neural networks.
SIGNIFICANCE STATEMENT Individual differences in learning a language are widely observed, not only within the same component of language but also across components. This study demonstrates that the dynamics of multiple brain networks across four imaging sessions of 7 d of artificial language training contribute to individual differences in learning-outcome profiles derived from four language components. With machine-learning predictive modeling, we identified four neural learning networks, including the Perisylvian, frontoparietal, salience, and default-mode networks, that contribute to predicting individual learning-outcome profiles, and we revealed language-component-general and component-specific prediction patterns across training sessions. These findings provide significant insights into the training-dependent neural dynamics underlying individual differences in learning success across language components.
Introduction
Learning is a dynamic process entailing the interaction of multiple neural systems (Zatorre, 2013). Although neuroscience research in recent decades has begun to shed light on learning-related neural dynamics across different domains (Costa et al., 2004; Kleim et al., 2004; Peach and Wong, 2004; Yin et al., 2009; Ettlinger et al., 2014; Mohebi et al., 2019), it has yet to characterize the neural dynamics underlying the tremendous individual differences in learning.
In recent years, one area of research where individual differences in both brain and behavior have been a focus is language and its various components (P. C. Wong et al., 2007; Price, 2010; F. C. Wong et al., 2011; Morgan-Short et al., 2012, 2015; Yang et al., 2015; Deng et al., 2016; Stamps, 2016; Kepinska et al., 2017a; Birdsong, 2018; Kidd et al., 2018; Feng et al., 2019). However, these previous studies did not capture neural dynamics, because assessments were limited to one or two time points (e.g., a pretraining and post-training design) and focused on a single language component. Learning different components of language is associated with partially shared and partially distinct cognitive and neural processes (Skehan, 2016; Saito, 2017; Tagarelli et al., 2019); therefore, each learner appears to have a unique "learning profile" when multiple learning components are considered simultaneously. Here we use the term learning profile to refer to each learner's pattern of learning attainment across language components. A comprehensive understanding of the neurocognitive architectures or "fingerprints" underlying individual language learning profiles requires assessing the responses of neural network systems to training over time and studying the learning process across components of language (Herholz, 2013; Zatorre, 2013; Herholz et al., 2016). The term fingerprint is used here specifically to highlight the intricate neurocognitive details unique to individual learners.
The main goal of this study is to identify the neurocognitive fingerprints underlying individual learning profiles. We examine the extent to which different neural networks contribute to the individual differences in learning across language components and learning phases. Adult learners learned an artificial language that includes sound-symbol associations (i.e., words), and morphologic, phrasal, and sentence word-order rules (Fig. 1A). BOLD data were collected during the process of learning across 4 d (days 1, 2, 3, and 7) of the 7 d training (Fig. 1B).
Our design enables us to identify neural fingerprints of individual learning profiles. We hypothesize that the neural fingerprints of individual learning profiles entail multiple neural networks that only partly overlap with the classic frontotemporal-hippocampal regions for word meanings and grammatical rules (Musso et al., 2003; Opitz and Friederici, 2003, 2004; Breitenstein et al., 2005; Newman-Norlund et al., 2006; Hauser et al., 2012; Tagarelli et al., 2019). This multiple-network hypothesis of individual learning profiles is supported by recent findings showing distributed neural systems engaging in learning and processing linguistic knowledge across language components (Ullman, 2004; Davis and Gaskell, 2009; Chandrasekaran et al., 2014a), which may include the following: (1) the Perisylvian language network (PSN), consisting of core regions in the inferior frontal and posterior temporal areas for lexicon and grammar learning (Tagarelli et al., 2019; Morgan-Short, 2020); (2) the reward-related corticostriatal salience network (SAN), which supports procedure-based aspects of language acquisition and automatization (Ullman, 2004); (3) the default-mode network (DMN), consisting of the mPFC, posterior cingulate cortex, angular gyrus, and anterior and medial temporal regions (Greicius et al., 2003), which contributes to resource allocation (Spreng, 2012) and memory consolidation (Xue, 2018); and (4) the frontoparietal network (FPN), consisting of the prefrontal cortex and inferior parietal lobule, which is flexibly connected to the above three networks because of its role in executive function, working memory, and attention (Braver and Barch, 2006; Cocchi et al., 2013; Cole et al., 2013). We examine whether the dynamics of these candidate networks across learning phases jointly contribute to the learning profile, and whether different networks come into play at different time points of learning and contribute differently to the learning profile prediction.
Materials and Methods
Participants
We recruited 33 healthy young adults (14 males; ages: 19-27 years, mean = 22.34 years) to participate in the fMRI training experiment. They were all native speakers of Mandarin. All participants were college students from South China Normal University and had formal learning experience with English as a second language (years of learning: mean = 15.15, SD = 2.00). None of the subjects had previously studied a Romance language or been immersed in a Romance language environment for more than 3 weeks, given that the artificial language was designed to be similar to Romance languages. One participant began the study but dropped out in the second session, leaving 32 participants in the final sample. None of the participants reported a history of hearing or neurologic disorders. All participants signed the consent form approved by the ethics review board of the School of Psychology at the South China Normal University and the Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committee before participating in the experiment.
Artificial language materials
The artificial language, Brocanto2 (Morgan-Short, 2007), has a productive structure that is consistent with natural language; that is, novel sentences can be generated, spoken, and understood within a meaningful context (Morgan-Short et al., 2012). There are 13 lexical items in Brocanto2: four nouns (pleck, neep, blom, vode), two adjectives (triose/o, neime/o), one article (li/u), four verbs (kiln, nim, yab, praz), and two adverbs (noyka, zayma). All Brocanto2 words were recorded in isolation, rather than as part of a phrase or sentence, and were subsequently concatenated to form phrases or sentences with 300 ms gaps between each word (for sample stimuli, see Table 1 and Fig. 1A). Each Brocanto2 sentence was exemplified by a move on a board game displayed on a computer screen (Fig. 1A, bottom). The four tokens were represented by distinct symbols, which correspond to the four nouns in Brocanto2. Each token was presented within a circle or a square background, corresponding to the two adjectives. The tokens can be moved, swapped, captured, and released, with each of these actions corresponding to a verb. The tokens can also move horizontally or vertically, corresponding to the adverbs. Each noun in Brocanto2 has a formal grammatical gender designation, either masculine or feminine. Both adjectives and articles appear post-nominally and are morphologically marked to agree with the grammatical gender of the noun. Brocanto2 uses a fixed subject-object-verb (SOV) word order. Adverbs, when used, appear at the end of the sentence, immediately following the verb.
Table 1. Examples of the grammatical and ungrammatical Brocanto2 stimuli for the three grammar learning tasks
The grammar training materials consist of 288 Brocanto2 phrases or sentences, half of which (i.e., 144) are grammatical and the other half ungrammatical. The 144 grammatical trials consist of three trial types: 48 noun phrases (NPs), 48 subject-verb (SV) sentences, and 48 SOV-adverb sentences. The ungrammatical trials, each derived from a grammatical trial, contain gender agreement, phrase order, or sentence structure violations (for violation samples, see Table 1). The violation type is always gender agreement for NPs, post-nominal modifier order for SV sentences, and word order for SOV sentences. For the grammar errors in NPs, either the article or the adjective does not agree with the gender of the noun. For the SV sentences, the word order within the subject NP was completely scrambled to generate violation variants, while the SV sentence order itself was never violated. To generate grammar errors in the SOV sentences, the verb, object noun, and adverb were scrambled, while the subject noun always occurred as the first word of the sentence. It is important to point out that, in the learning phase, the auditorily presented sentences (although half contained grammatical violations) always matched the game moves shown on the screen, whereas in the generalization tests, mismatches between the game moves and the meanings of the sentences were used to test subjects' ability to apply the grammar they had learned to novel sentences (for details, see below). The complete artificial language training paradigm consists of vocabulary exposure and recall tests, grammar learning, and two generalization tests (one testing the three grammatical rules and the other the semantics of the sentences).
Experimental design and statistical analysis
Overall training procedure
Participants were scheduled to take part in seven experimental language training sessions over 7 consecutive days (for the training and imaging schema, see Fig. 1B). During Session 1, participants first gave informed consent, filled out background questionnaires, and completed a cognitive battery, including tests of IQ, working memory, and declarative and procedural learning abilities. After completing the cognitive test battery, participants commenced the first session of the artificial language training. Each language training session consisted of a passive vocabulary exposure session, three grammar learning sessions, and a vocabulary test. On the last training day (i.e., Session 7), participants also completed generalization tests outside the scanner after completing the artificial language training. Each element of the training protocol is described in detail below.
Vocabulary and grammar training procedures
At the beginning of each language training session (i.e., day), participants first completed a vocabulary exposure session (i.e., the Vb learning task). Each Brocanto2 lexical item was presented auditorily, accompanied by a visual symbol that represented its meaning (Fig. 1B). Each item was randomly presented 4 times during the exposure phase, which lasted 7 min. A vocabulary test was administered outside the scanner after the completion of each session's language training. Participants were asked to say aloud the lexical item that corresponded to the visual symbol shown on the computer screen. Each symbol representing a lexical item was presented twice.
Following vocabulary exposure, participants began the grammar training tasks (Fig. 1B, bottom right). The grammar training was administered in the format of a grammaticality judgment task (GJT) with trial-by-trial corrective feedback. Learning was assumed to occur implicitly, and no explicit grammar rules or explanations regarding any aspect of the language were provided. Instead, participants were exposed to auditorily presented phrases and sentences of the language. As each phrase or sentence was presented, participants also viewed the corresponding game token or move. They were then asked to judge the acceptability of the phrases or sentences (i.e., the GJT). Participants were instructed to make a judgment based on their immediate intuitive impression (i.e., guessing based on "gut feeling").
For each grammar learning task (e.g., SOV), the trials were divided into two experimental blocks (i.e., scanning runs), with grammatical and their matched ungrammatical phrases or sentences occurring in different blocks. Thus, the grammar training phase contained six experimental blocks for each day, with two blocks of NPs, two blocks of SV sentences, and two blocks of SOV sentences. The participants always started with the NP blocks, followed by the SV sentence blocks, and lastly the SOV sentence blocks. Each block consisted of 48 phrases or sentences, of which half (i.e., 24) were grammatical and the other half ungrammatical. The presentation order of the stimuli within each block was randomized across participants. For each trial in the GJT, a fixation cross was first presented for 100 ms, followed by a 100 ms blank. Sentences or phrases were then presented auditorily, and the corresponding game tokens or moves were shown on the screen simultaneously. The stimulus presentation time differed by trial type: 2400 ms for NPs, 6300 ms for SV sentences, and 7600 ms for SOV sentences. A prompt asking for a grammaticality judgment ("?") appeared for 2000 ms after the final word of each sentence, during which participants responded. After a response was made, participants received corrective feedback indicating whether their response was correct or incorrect. To better estimate the brain responses related to stimulus and feedback presentation separately, a randomly jittered interval (0-4000 ms) was added between each response and the feedback presentation. The feedback was shown on the screen for 1000 ms. After the feedback screen, a blank screen of variable duration (jittered from 2000 to 6000 ms; i.e., the intertrial interval) was shown before participants moved on to the next trial.
Generalization tests
On the last training day (i.e., Session 7), participants were tested on their ability to apply the grammar they had learned to novel Brocanto2 sentences in two generalization tests: a GJT and a picture-sentence matching task. For the GJT, participants judged the acceptability of 144 unseen Brocanto2 phrases or sentences (i.e., stimuli not used in the training phase), consisting of 72 grammatical and 72 ungrammatical trials. As in the training phase, the ungrammatical phrases or sentences were derived from their corresponding grammatical phrases or sentences, containing violations of gender agreement in NPs and word order in SV and SOV sentences. Each ungrammatical sentence contained only one type of grammatical violation. In this test, the game tokens and moves always matched the meaning of the sentences; therefore, participants only needed to detect grammatical violations. The procedure of the test was identical to that of the GJT used in the fMRI training phase. For the picture-sentence matching task, participants had to judge whether the Brocanto2 sentences that they heard correctly described the game tokens and moves displayed on the screen. Subjects were instructed to focus on the correspondence between sentences and game tokens/moves. The test comprised 144 novel Brocanto2 sentences (used neither in the training phase nor in the generalization GJT), of which 72 correctly described the game moves and 72 did not. The 72 incorrect trials were each derived from a correct trial and can be classified into three types: (1) the sentence contained a grammatical violation, but the game moves matched the meaning of the sentence; (2) the sentence was grammatically correct, but the game moves did not match the meaning of the sentence (e.g., the word vode was presented but the visual symbol for pleck was shown on the screen); and (3) the sentence was ungrammatical, and the game moves did not match the sentence meaning. The procedure was identical to that of the GJT. All the experimental learning tasks and generalization tests were created and presented with E-Prime 2.0 (Psychology Software Tools).
Cognitive assessments
Before the language training, we administered a battery of standardized cognitive tests, including an IQ assessment (Test of Nonverbal Intelligence, fourth edition), working memory tests (Digit Span Backward and Reading Span tests), and declarative and procedural learning ability tests, including the vocabulary learning subtest of the LLAMA Language Aptitude Test, the Continuous Visual Memory Task (CVMT), and the Weather Prediction Task (WPT). A description of each of the five tests is provided below.
Digit Span Backward
This test assesses verbal working memory (Wechsler, 1997) and involves the processing and storage of digits. Participants were asked to repeat the digits that they heard in reverse order. The test consisted of seven blocks (two trials per block) of increasing length (from 2 to 8 digits). If a participant correctly repeated one or both trials of a block, she/he moved on to the next block. If a participant failed both trials of a block, the test was discontinued. An individual's digit span was defined as the longest series of digits that she/he could repeat (even if only once).
Reading Span
This test measures working memory capacity and involves the processing and storage of sentences and words. A Chinese version of the auditory reading span task was used, which was modeled after the original version (Daneman and Carpenter, 1980). In this task, participants had to judge whether the sentence statement was correct and to remember the last word of each sentence for later recall. When a sentence was presented on the screen, participants judged the sentence by pressing the “correct” or “wrong” key on the keyboard. Once a judgment was made, the next sentence was presented. After all sentences in a trial had been presented, the participant had to recall the last word of each sentence in order. Participants' responses were recorded by the experimenter. There were three trials in each block and five blocks in total. The number of sentences for one trial increased from two to three, and so on to six, across the five blocks. The test was discontinued if a participant failed to recall the last words for two trials within a block. A participant's reading span was defined as the largest number of last words that she/he could recall for a single trial.
LLAMA Language Aptitude Test
The LLAMA test (Meara, 2005) was developed based on the standardized Modern Language Aptitude Test (MLAT) (Carroll and Sapon, 1959), which incorporates four separate elements: vocabulary learning, phonetic memory, sound-symbol correspondence, and grammatical inferencing. The subtest of vocabulary learning was used in this study as a measure of verbal declarative learning ability. Subjects were asked to learn 20 word-object association pairs that consisted of pseudo-Kurdish words and their corresponding objects within 2 min in a computer interface. All 20 objects were displayed simultaneously on the screen and an object's name was shown by clicking on the object. Subjects were permitted to click on the objects as many times as they wished within the 2-minute learning phase. After the learning phase, each of the 20 objects' names was presented in turn, and for each name, the subject had to click on the corresponding object on the screen. Five points were scored for each object correctly identified, with a maximum score of 100.
CVMT
The CVMT was used to measure nonverbal declarative memory (Trahan and Larrabee, 1988). Participants viewed a series of complex, abstract designs and indicated whether each design was novel (“new”) or had appeared previously (“old”). The “old” items consisted of seven target designs, presented 7 times interspersed among 63 “new” distractor items. All items were presented in a randomized order, which was constant for all participants. Participants' responses were used to calculate a CVMT d′ score, with a higher score indicating better declarative learning ability.
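The exact d′ computation is not spelled out in the text; as a minimal illustrative sketch, the standard signal-detection formula (z-transformed hit rate minus z-transformed false-alarm rate) could be computed as below. The log-linear correction for extreme rates and the example counts are assumptions, not taken from the study.

```python
import numpy as np
from scipy.stats import norm

def cvmt_dprime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection d' from old/new recognition counts.

    A log-linear correction (add 0.5 to each cell) guards against infinite
    z-scores when the hit or false-alarm rate is 0 or 1 (an assumption here).
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical example: 40 of the 49 "old" target presentations recognized,
# 10 false alarms on the 63 "new" distractor items
print(cvmt_dprime(hits=40, misses=9, false_alarms=10, correct_rejections=53))
```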
WPT
The WPT is an implicit, probability-based task where participants predict the weather (“sunshine” or “rain”) based on the patterns of four different “tarot cards” presented on a computer screen (Foerde et al., 2006). Each combination of cards represents a different probability for “sunshine” or “rain.” For example, a screen showing a card of squares, a card of circles, and a card of pentagons represents a 75% chance of rain. A total of 320 trials were divided into eight blocks. Neither the sunshine nor rain stimulus occurred more than 4 times in a row. After a response was given, the correct response was shown on the screen. The weather prediction accuracy on the final block was used for analyses.
Learning progress estimation
The learning progress of each subject in each learning task was estimated by three metrics: (1) initial learning performance (IP), (2) learning rate (LR), and (3) learning outcome (LO). IP is defined as the learning performance (i.e., the word recall test accuracy for Vb and the GJT accuracies for the three grammar learning tasks) on the first day of training and reflects the initial gain of training. To estimate learning gains between days 2 and 7, we calculated LR for each subject by fitting each learning curve with a quadratic function (Eq. 1):
accuracy = a × day² + b × day + c,
where a determines the curvature of the parabola, b indicates the slope, and c is the y intercept. Because both a and b relate to the shape of the learning trajectory, we combined the two parameters into a single index: LR = |a| × (1 + |b|). Thus, higher LRs indicate faster and larger increases in learning accuracy between days 2 and 7 of the training. Finally, LO is defined as the generalization test performance for the three grammatical rules (i.e., gender agreement in NPs, post-nominal modifier order in SV sentences, and SOV order in SOV sentences) and the day 7 vocabulary test score for the Vb learning task. IP, LR, and LO are complementary measures describing the pattern of learning trajectories.
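As a minimal sketch of the LR computation described above, the quadratic fit and the LR = |a| × (1 + |b|) combination can be implemented as follows. The day range used in the fit and the accuracy values are hypothetical, and the exact fitting routine used by the authors is not specified.

```python
import numpy as np

def learning_rate(days, accuracies):
    """Fit the quadratic learning curve accuracy = a*day^2 + b*day + c (Eq. 1)
    and combine curvature and slope into LR = |a| * (1 + |b|)."""
    a, b, c = np.polyfit(days, accuracies, deg=2)
    return abs(a) * (1 + abs(b))

# Hypothetical GJT accuracies for one learner over days 2-7
days = np.array([2, 3, 4, 5, 6, 7])
acc = np.array([0.55, 0.63, 0.70, 0.74, 0.78, 0.80])
print(learning_rate(days, acc))
```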
Imaging acquisition
MRI data were acquired using a Siemens 3T Tim Trio MRI system with a 12-channel head coil in the Brain Imaging Center at South China Normal University. The functional images were recorded by a T2*-weighted gradient EPI pulse sequence (TR = 2000 ms, TE = 30 ms, flip angle = 90°, 37 slices, FOV = 224 × 224 mm2, in-plane resolution = 3.5 × 3.5 mm2, slice thickness = 3.5 mm with 0.7 mm gap, acceleration factor = 2). T1-weighted high-resolution structural images were acquired using an MPRAGE sequence (176 slices, TR = 1900 ms, TE = 2.53 ms, flip angle = 9°, voxel size = 1 × 1 × 1 mm3). Imaging data were collected during the language training on days 1, 2, 3, and 7 for each participant (for the imaging schema, see Fig. 1B). Resting-state fMRI and diffusion tensor imaging were also collected for every imaging session.
Univariate activation analysis
All functional images were preprocessed using SPM12 (Wellcome Department of Imaging Neuroscience; www.fil.ion.ucl.ac.uk/spm/) following a pipeline described in previous studies (Feng et al., 2015, 2021b). The functional images were corrected for head movement. The high-resolution anatomic images were registered to the mean functional image and further normalized into the MNI space using the segmentation-normalization procedure. The realigned functional images were spatially smoothed using a Gaussian kernel (FWHM = 4 mm) and were entered into a GLM for univariate activation analysis. Specifically, for the subject-level analysis, a GLM with a design matrix including two regressors of interest (i.e., stimulus and feedback presentations) was constructed for each grammar learning task and imaging session. For the Vb learning task, only the stimulus regressor was included because no feedback was provided. The regressors corresponding to the onsets and durations of the trials from each task were convolved with the canonical HRF. Low-frequency drifts were removed by a temporal high-pass filter (cutoff at 128 s). The six head-movement parameters and the session mean were added to the models as nuisance regressors. The gray-matter image generated from the segmentation procedure was converted into a binary inclusive mask to define voxels of interest for each participant. For the group-level analysis, one-sample t tests were used; each statistical brain map was initially thresholded at voxel-wise p = 0.001, and all reported brain regions were corrected at the cluster level (p = 0.05) using the family-wise error rate approach as implemented in the SPM package.
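The analysis above was run in SPM12. Purely as an illustrative sketch (not the authors' pipeline), a comparable subject-level model with stimulus and feedback regressors, the canonical HRF, a 128 s high-pass filter, 4 mm smoothing, and motion confounds could be set up in Python with nilearn as follows; the file names and onsets are hypothetical.

```python
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

# Events for one grammar-learning run: stimulus and feedback regressors
events = pd.DataFrame({
    "onset":      [10.0, 18.9, 24.0, 33.1],   # hypothetical onsets (s)
    "duration":   [6.3,  1.0,  6.3,  1.0],
    "trial_type": ["stimulus", "feedback", "stimulus", "feedback"],
})
# Six realignment (head-movement) parameters as nuisance confounds
motion = pd.read_csv("sub-01_run-01_motion.txt", sep=r"\s+", header=None)

model = FirstLevelModel(
    t_r=2.0,                 # matches the EPI TR
    hrf_model="spm",         # canonical HRF
    high_pass=1.0 / 128,     # 128 s high-pass filter
    smoothing_fwhm=4.0,      # 4 mm Gaussian kernel
)
model = model.fit("sub-01_run-01_bold.nii.gz", events=events, confounds=motion)

# t-map for stimulus presentation vs implicit baseline
stim_tmap = model.compute_contrast("stimulus", stat_type="t", output_type="stat")
```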
Predictive modeling analysis
To determine whether the brain activations related to stimulus presentation during learning are significantly predictive of LO profiles, we used multioutput Least Squares Support Vector Regression (LS-SVR) as the prediction algorithm and a 10-fold cross-validation (CV) procedure to train and validate prediction models. The brain activations (i.e., t statistics of stimulus vs baseline) obtained from each learning task and imaging session were used as predictive features for training and testing prediction models. Brain activation maps from all subjects were combined into an S × F matrix, where S is the number of subjects and F is the number of features (i.e., collapsing voxels across tasks and training sessions). Each value in the matrix represents the level of activation of a voxel in a specific learning task and training session. We used a nested 10-fold CV procedure for feature selection and fusion, model construction, and validation (for a graphical illustration of the procedure, see Fig. 2). This CV procedure avoids overfit models with many noisy features and ensures that the trained models can be tested with unseen data (i.e., model generalization ability) (Feng et al., 2018b). The nested CV procedure consisted of two levels of nesting (i.e., inner and outer) for feature selection, fusion, and model validation. At the inner level, we used a Pearson's correlation-based feature selection procedure to remove irrelevant (uninformative) features based on each training set (i.e., 90% of the subjects) with a cutoff threshold of p = 0.01; features showing significant correlations with LOs in at least one grammar task were selected. Therefore, the predictive power of the models reflects how well those selected voxels performed in predicting learners' LO profiles. Different feature selection thresholds (i.e., p = 0.01 and the top 10% of total features) were tested to assess the consistency and stability of the models' predictive power. To reduce the dimensionality of the data and fuse the surviving features across the four tasks and sessions, we conducted principal component analysis on the selected features and further selected the outcome-correlated principal components (p < 0.05) for model training. It is important to note that the feature selection and fusion procedures were conducted only with the training sets, which were independent of the outer-level model testing. In other words, 90% (i.e., nine folds) of the data were used for model training while the held-out 10% were used for testing, repeated 10 times (i.e., 10-fold CV). This CV procedure ensures accurate estimation of the model prediction. The LS-SVR algorithm with default parameters (i.e., C = 1, γ = 1/number of features) was used to assess the multivariate predictive power of those neural features. We used functions from the MATLAB package LIBSVM (Chang and Lin, 2011) in combination with in-house scripts to conduct the predictive modeling analyses. We examined the predictive power by calculating Pearson's correlation between the predicted and observed scores (i.e., r[observed, predicted]). The statistical significance of the predictions was evaluated using a nonparametric permutation procedure.
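The authors implemented the multioutput LS-SVR in MATLAB with LIBSVM and in-house scripts. The sketch below re-expresses the same cross-validated logic in Python with scikit-learn, using epsilon-SVR as a stand-in for LS-SVR and omitting the outcome-correlated PC selection step, so it is an approximation of the published pipeline rather than a reproduction; the variable names, gamma setting, and fallback rule are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import KFold
from sklearn.decomposition import PCA
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

def predict_outcome_profiles(X, Y, n_splits=10, p_thresh=0.01, seed=0):
    """X: subjects x features (voxel activations across tasks and sessions).
    Y: subjects x 4 learning outcomes (Vb, NP, SV, SOV)."""
    preds = np.zeros_like(Y, dtype=float)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        # 1) Correlation-based feature selection on the training subjects only
        #    (keep voxels related to at least one grammar outcome; avoids leakage).
        keep = np.zeros(X.shape[1], dtype=bool)
        for j in range(1, Y.shape[1]):              # grammar outcomes (columns 1-3)
            pvals = np.array([pearsonr(X[train, f], Y[train, j])[1]
                              for f in range(X.shape[1])])
            keep |= pvals < p_thresh
        if not keep.any():                          # fallback for a sparse fold
            keep[:] = True
        # 2) Fuse the surviving features with PCA fitted on the training set.
        pca = PCA(n_components=min(len(train) - 1, int(keep.sum())))
        Ztr = pca.fit_transform(X[train][:, keep])
        Zte = pca.transform(X[test][:, keep])
        # 3) Multioutput support vector regression (epsilon-SVR here, standing in
        #    for the LS-SVR used in the study), with C = 1 as in the defaults.
        model = MultiOutputRegressor(SVR(C=1.0, gamma="scale")).fit(Ztr, Y[train])
        preds[test] = model.predict(Zte)
    # Predictive power: r(observed, predicted) for each outcome
    return [pearsonr(Y[:, j], preds[:, j])[0] for j in range(Y.shape[1])]
```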
To test whether the predictive power of each model could have occurred by chance, we used a nonparametric permutation procedure to generate a null distribution of predictive scores by fully shuffling the features (i.e., predictors) and LOs across learners for each CV. Each feature and LO was permuted across participants independently to generate a fully randomized data matrix. The 10-fold CV was then conducted on the randomized dataset. The data randomization and CV procedures were repeated 10,000 times, and the 95th percentile of each distribution was used as the critical value for a one-tailed nonparametric test against the null hypothesis at p = 0.05. To further test the stability of the prediction, we used a bootstrapping procedure in which all learners were randomly divided into 10 folds and the 10-fold CV procedure was repeated for 10,000 iterations. Each CV prediction differs slightly because the composition of the training and testing subjects differs across iterations.
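A minimal sketch of this permutation procedure, assuming a cross-validated prediction function such as predict_outcome_profiles from the previous sketch:

```python
import numpy as np

def permutation_null(X, Y, predict_fn, n_perm=10000, seed=0):
    """Chance distribution of predictive scores: each feature column and each
    outcome column is shuffled across learners independently, then the full
    10-fold CV pipeline (predict_fn) is rerun on the randomized data."""
    rng = np.random.default_rng(seed)
    null_scores = np.zeros(n_perm)
    for i in range(n_perm):
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        Yp = np.column_stack([rng.permutation(col) for col in Y.T])
        null_scores[i] = np.mean(predict_fn(Xp, Yp))   # mean r(observed, predicted)
    return null_scores

# One-tailed test at p = 0.05: compare the observed score with the 95th
# percentile of the null distribution, e.g.,
#   critical = np.percentile(null_scores, 95)
#   p_value = np.mean(null_scores >= observed_score)
```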
We conducted a power analysis to estimate the sample size required for the predictive modeling. Our power analysis was conducted based on effect sizes reported by previous studies that examined the correlation between individual differences in language learning and neural responses. Eight studies that reported Pearson's correlation coefficients (r values) between learning and neural activation (task- and/or resting-state fMRI responses) were included (Musso et al., 2003; Finn et al., 2013; Morgan-Short et al., 2015; Deng et al., 2016; Weber et al., 2016; Barbeau et al., 2017; Kepinska et al., 2017b; Nevat et al., 2017). These r values ranged from 0.40 to 0.66 (mean = 0.54, SD = 0.08), with sample sizes between 8 and 40 (mean = 18.89, SD = 9). The mean r value was used in our power calculation, which resulted in a sample size of 28 at an α level of 0.05. Because there is no standard way of conducting power analysis for constructing multivariate predictive models as we conducted here and correlational approaches often overestimate the true effect size of prediction, we conservatively assumed a power of 95% and arbitrarily increased our sample by 15% in our final estimation.
Multivoxel pattern classification (MVPC) analysis for grammaticality decoding
We used MVPC to examine the extent to which neural representations of the newly learned grammar rules emerged during training. Both searchlight- and ROI-based MVPC were conducted. To perform MVPC, we first estimated single-trial brain activities with the least-squares-single (LSS) GLM approach (Mumford et al., 2012). The LSS approach models single-trial brain activity for each target event while controlling for the variance of other covarying events in the same block (i.e., fMRI run). Specifically, for each trial, a design matrix was constructed with a regressor of interest targeting the stimulus presentation. A regressor of no interest consisting of all other events (including the feedback of the target trial and both the feedback and stimulus presentations of the other trials), six head-movement regressors, and a session-mean regressor were also included for each block individually. The LSS GLM analyses were performed on the functional images following realignment but without normalization or smoothing. Therefore, 288 subject-level GLM models (i.e., 96 trials per grammar task) were constructed and estimated for each subject and imaging session. This computationally intensive analysis required approximately one week on a 16-core Intel Xeon processor for all participants. The t statistic brain images were calculated by contrasting the target regressor with baseline and were then used for MVPC. The t statistic was used because it weights the effect size by the error variance and is therefore less affected by highly variable single-trial estimates than β estimates (Misaki et al., 2010).
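The LSS models were estimated in SPM. As a hedged illustration of the same idea (not the authors' code), one trial-wise loop in Python with nilearn could look like the following, where each stimulus trial is modeled as its own "target" condition and all remaining events are collapsed into an "other" regressor; smoothing is omitted, consistent with the description above, and the inputs are hypothetical.

```python
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

def lss_trial_tmaps(bold_img, events, confounds, t_r=2.0):
    """Least-squares-single estimation: one GLM per trial, with the target
    stimulus modeled alone and every other event collapsed into 'other'."""
    tmaps = []
    stim_idx = events.index[events["trial_type"] == "stimulus"]
    for i in stim_idx:
        ev = events.copy()
        ev["trial_type"] = "other"          # feedback + all other stimulus trials
        ev.loc[i, "trial_type"] = "target"  # the single trial of interest
        model = FirstLevelModel(t_r=t_r, hrf_model="spm", high_pass=1.0 / 128)
        model = model.fit(bold_img, events=ev, confounds=confounds)
        tmaps.append(model.compute_contrast("target", stat_type="t",
                                            output_type="stat"))
    return tmaps   # one t-map per stimulus trial, later stacked for MVPC
```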
The searchlight algorithm (Kriegeskorte et al., 2006), implemented in the CoSMoMVPA toolbox (Oosterhof et al., 2016), was used to identify brain areas whose local activation patterns can be used to classify trials based on their grammaticality. At each voxel, stimulus-induced activation patterns (t values) within a spherical searchlight (3-voxel-radius sphere, containing ∼90 voxels on average) were extracted for all items in the NP, SV, and SOV blocks separately. Different sphere sizes (e.g., a 4-voxel-radius sphere) were also tested to ensure that the classification accuracies did not depend substantially on the size chosen. Therefore, in each spherical searchlight, a V × I × B matrix was generated, where V refers to the number of voxels, I refers to the number of stimulus items, and B refers to the number of blocks (e.g., 90 × 48 × 2). Finally, this matrix was entered into a linear support vector machine (SVM) classifier implemented in the LIBSVM toolbox (Chang and Lin, 2011) for model training and testing with a leave-one-block-out CV procedure, in which the SVM classifier was trained with data from one block and tested with data from the other block, repeated twice. For the searchlight-based MVPC, the mean classification accuracy was calculated and mapped back to the voxel at the center of each sphere. We conducted the same MVPC procedure across all voxels of interest in the brain and generated classification accuracy maps for each imaging session and subject. For the group-level analysis, the classification accuracy maps were first normalized to the MNI space using the parameters estimated from the segmentation-normalization procedure, and then entered into a one-sample t test (against chance accuracy). In addition to the searchlight MVPC, we conducted an ROI-based MVPC analysis by calculating the mean MVPC accuracy across voxels within each ROI for each grammar task, training session, and subject. A linear mixed-effects regression analysis was used to evaluate four fixed effects (i.e., main effects of learning task, training day, and outcome, and the day × outcome interaction).
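The searchlight itself was run with CoSMoMVPA and LIBSVM. The sketch below illustrates only the core classification step for the patterns within one sphere (or one ROI), using scikit-learn's linear SVC and the two-fold leave-one-block-out scheme; the data are random placeholders, so the accuracy should hover around chance.

```python
import numpy as np
from sklearn.svm import SVC

def grammaticality_decoding(patterns, labels, blocks):
    """patterns: trials x voxels matrix of single-trial t values in one sphere/ROI.
    labels:   1 = grammatical, 0 = ungrammatical.
    blocks:   block (run) index per trial; train on one block, test on the other."""
    accs = []
    for held_out in np.unique(blocks):
        train, test = blocks != held_out, blocks == held_out
        clf = SVC(kernel="linear", C=1.0).fit(patterns[train], labels[train])
        accs.append(clf.score(patterns[test], labels[test]))
    return np.mean(accs)   # mapped back to the sphere's center voxel

# Hypothetical example: 96 trials (48 per block, 24 grammatical each), ~90 voxels
rng = np.random.default_rng(1)
acc = grammaticality_decoding(rng.standard_normal((96, 90)),
                              np.tile([0, 1], 48),
                              np.repeat([1, 2], 48))
print(acc)   # near chance (0.5) for random data
```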
Interregional representational similarity (RS) analysis and LO prediction
To further examine the interregional interactions between each pair of predictive regions and the extent to which such interactions contribute to individual differences in learning success, we calculated interregional RSs and used them as predictive features to predict individual LOs. RS is an index reflecting the similarity between two regions' neural representational dissimilarity matrices (nRDMs). Higher similarity between nRDMs indicates greater similarity in the multivoxel representation of the training items, which also reflects the degree of information sharing between regions. To calculate RS, we first computed the nRDM for each target region from its multivoxel activation patterns using a 1 minus Pearson's correlation approach. nRDMs (i.e., 96 × 96 matrices; 96 trials per training session for each grammar task) were thus generated for each ROI, imaging session, grammar learning task, and participant. To control for the effects of trial duration, number of words, presentation block, grammaticality, and type of grammar violation inherent in the stimulus sets, we used partial Pearson's correlations to calculate the interregional RSs between each pair of nRDMs while controlling for the variance of those factors. These interregional RSs were then entered into the 10-fold cross-task CV predictive modeling to assess the overall predictive power of the RSs and to identify the significantly predictive edges between ROIs.
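A minimal sketch of the RS computation, assuming the stimulus covariates are expressed as trial-pair (model) dissimilarity matrices and partialled out by residualization, which is one common implementation of the partial-correlation step described above and not necessarily the authors' exact approach:

```python
import numpy as np

def neural_rdm(patterns):
    """patterns: trials x voxels -> 1 - Pearson correlation RDM (trials x trials)."""
    return 1.0 - np.corrcoef(patterns)

def partial_rs(rdm_a, rdm_b, covariate_rdms):
    """Partial Pearson correlation between two ROIs' RDMs, controlling for
    covariate model RDMs (e.g., trial duration, block, grammaticality),
    computed on the vectorized upper triangles via residualization."""
    iu = np.triu_indices_from(rdm_a, k=1)
    a, b = rdm_a[iu], rdm_b[iu]
    # Design matrix: intercept plus vectorized covariate RDMs
    C = np.column_stack([np.ones(a.size)] + [c[iu] for c in covariate_rdms])
    resid = lambda y: y - C @ np.linalg.lstsq(C, y, rcond=None)[0]
    return np.corrcoef(resid(a), resid(b))[0, 1]
```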
Results
Behavioral learning performances
Four language components of Brocanto2 were learned by a group of adult participants across 7 d (for the sample stimuli and experimental procedure, see Materials and Methods; Fig. 1). Learning performance in the four learning tasks was assessed by the offline vocabulary recall test (for the vocabulary [Vb] learning task) and the online GJTs (for the NP, SV, and SOV grammatical rule learning tasks), respectively. Behavioral performance on the vocabulary recall test and GJTs increased significantly over training sessions (for individual learning curves, see Fig. 3A). For the vocabulary test score, a significant main effect of training day (F(1,191) = 366, p < 2.20 × 10−16, ηp2 = 0.66, 95% CI = [0.58, 0.71]) was revealed using linear mixed-effects regression (LMER). For the GJT accuracy (ACC) and reaction time (RT), we constructed LMER models with training day (1-7) and grammar learning task (i.e., NP, SV, and SOV) as two fixed factors, and subject as a random factor. For ACC, we observed a significant main effect of training day (F(1,191) = 328.21, p < 2.2 × 10−16, ηp2 = 0.63, 95% CI = [0.55, 0.69]) and a significant main effect of learning task (F(2,267) = 11.30, p = 1.95 × 10−5, ηp2 = 0.08, 95% CI = [0.02, 0.14]). No significant day × task interaction was found (F(2,382) = 1.29, p = 0.28, ηp2 = 0.007, 95% CI = [0.00, 0.03]). Planned comparisons were conducted between grammar tasks, with p values Bonferroni-corrected. Performance in the SOV task was significantly better than in the NP (t(62) = 6.59, p < 0.0001, Cohen's d = 1.07) and SV (t(62) = 6.05, p < 0.0001, d = 0.98) tasks, whereas NP and SV did not differ significantly (t(62) = 0.55, p = 0.85, d = 0.08) in overall GJT scores. For RT, learners responded increasingly faster over training sessions (main effect of training day: F(1,191) = 75.75, p = 1.49 × 10−15, ηp2 = 0.28, 95% CI = [0.18, 0.38]). We also found a significant main effect of task (F(2,297) = 35.81, p = 1.17 × 10−14, ηp2 = 0.19, 95% CI = [0.12, 0.27]). No significant day × task interaction was observed (F(2,382) = 1.44, p = 0.24, ηp2 = 0.008, 95% CI = [0.00, 0.03]). Planned comparisons showed that RTs in the NP task were significantly longer than those in the SV (t(62) = 6.10, p < 0.0001, d = 0.91) and SOV (t(62) = 11.53, p < 0.0001, d = 1.72) tasks, and RTs in the SV task were significantly longer than those in the SOV task (t(62) = 5.43, p < 0.0001, d = 0.81). These results indicate that group-level learning performance differed between learning tasks.
Artificial language training procedure, experimental design, and fMRI schema. A, Language elements, sample stimuli, and grammar rules. Bottom, Sample game token moves exemplifying the meaning of a Brocanto2 sentence. In this example, the spoken sentence was presented together with the visual symbol moves, meaning "The square pleck is captured vertically by the round vode." B, The 7 d training and imaging schema and the fMRI experiment procedure for the vocabulary and grammar learning tasks. Learners were passively exposed to the vocabulary stimuli (BI). A feedback-based training paradigm was used for the three grammar learning tasks (BII). The same fMRI experiment procedure was applied in all four imaging sessions.
Predictive modeling for LO profile prediction with a CV procedure. The 10-fold CV procedure was used to construct and validate prediction models. Initial feature selection was applied with Pearson's correlation to remove noninformative neural features for each imaging session and task, respectively. All the selected features from different days and tasks in the training set (90%) were fused using principal component analysis (PCA) to reduce data dimensionality. Multioutput linear SVR was used to model the relationship between the PCs and LO profiles in the training set. The held-out 10% of unseen testing data was used to evaluate predictive performance. This process was repeated 10 times so that each testing fold was used to evaluate the trained models. A final predictive score was calculated using Pearson's correlation between the predicted and observed outcomes. This 10-fold CV procedure was repeated 10,000 times with a bootstrapping approach to evaluate the stability of the predictive power. A permutation procedure was applied to generate a null (chance) distribution to assess the statistical significance of the predictions.
Behavioral learning performance and individual variability for each learning task. A, Day-by-day behavioral learning performance for the Vb, NP, SV, and SOV tasks. Individual learning curves are shown for each task over training days. G, Generalization score. The mean learning curve is shown in red for each task. B, The intersubject variability in LOs for each task. The SD of the outcomes was calculated for each learning task separately based on the bootstrapping approach (randomly selecting 50% of the subjects) with 10,000 iterations. Error bar indicates SD. **p < 0.001, nonparametric test.
LOs were defined as the word recall accuracy on day 7 for the Vb task and the generalization GJT scores for the three grammar learning tasks, respectively. LOs were close to ceiling for Vb for most of the learners. For the grammar LOs, a significant main effect of learning task was found (F(2,62) = 21.19, p = 9.70 × 10−8, ηp2 = 0.41, 95% CI = [0.21, 0.55]). Planned comparisons between tasks showed that LOs in the NP task (mean ACC = 65.1%) were significantly worse than those in the SV (mean ACC = 77.3%; t(62) = 5.49, p < 0.0001, d = 1.37) and SOV (mean ACC = 77.9%; t(62) = 5.78, p < 0.0001, d = 1.45) learning tasks. LOs in the SOV task were not significantly better than those in the SV task (t(62) = 0.29, p = 0.95, d = 0.07). Moreover, significantly larger interindividual variability in LO was observed for the grammar learning tasks than for the Vb task (p values < 0.001, bootstrapping nonparametric test; see Fig. 3B).
Individual learning profiles and day-by-day learning-related brain activations
We characterized individual learning profiles by simultaneously considering the LOs across the four language components (Fig. 4A). There were large interindividual differences in the LO profile (Fig. 4A, right; for the learning profile of each learner, see also Extended Data Fig. 4-1). To examine how individual differences in learning trajectory were associated with individual differences in LO, we estimated two indices reflecting learning trajectory (IP and LR), which summarized test scores across the 7 training days. IP and LR are two complementary measures describing individual learning trajectories. IP represents the learning gains on the first day of training, whereas LR was estimated by fitting a quadratic equation to each learning curve (Fig. 4B), reflecting learning gains between days 2 and 7. This quadratic learning-curve model outperformed simple linear and power-function fits in goodness of fit across the four tasks (p values < 0.01). LRs in the Vb task were significantly higher than those in the grammar learning tasks (main effect of task: F(3,93) = 33.70, p = 7.84 × 10−15, ηp2 = 0.52, 95% CI = [0.38, 0.62]). LRs of the three grammar tasks did not differ significantly from each other (Bonferroni-corrected p values > 0.1).
Individual learning profiles, learning trajectory modeling, and brain activations across training days and learning tasks. A, Spider plots reveal individual learning profiles derived from the outcomes of the four learning tasks. Left panels, Three representative learners' profiles. Black represents an overall successful learner. Blue represents a learner with good vocabulary but poor grammar acquisition. Red represents a learner with good word-order grammar but poor vocabulary and gender-agreement acquisition. Right panels, LO profiles for all learners (for each learner's profile, see Extended Data Figure 4-1). B, LR modeling and IP. LRs were estimated using quadratic curve fitting for individual learning curves across the 7 training days. Higher LRs reflect faster and larger gains in performance between days 2 and 7, whereas higher IPs reflect larger gains in performance on day 1. C, Pearson's correlations between individual differences in learning trajectory measures (i.e., IP and LR) and LOs within and across learning tasks. *Uncorrected p < 0.01. D, Univariate brain activations during stimulus presentation for each task across the four imaging sessions. Unthresholded whole-brain maps are displayed for visualization purposes. Warm red represents more activation during stimulus presentation than baseline. Cool blue represents the reverse contrast.
Figure 4-1
The LO profile for each of the learners. Each dimension of the spider plot represents the LO of a task based on rank. The LOs were converted from raw accuracy to ranks so that they are comparable across learning tasks (i.e., Vb, NP, SV, and SOV).
The measures of learning trajectory (i.e., IP and LR) were closely related to LOs both within and across learning tasks (Fig. 4C). Higher IPs in the Vb task were moderately associated with better LOs across grammar learning tasks, indicating that better initial vocabulary gains may facilitate grammar acquisition. However, higher LRs in the Vb task were associated with poorer LOs in the grammar learning tasks, especially for the NP task. Within and across the three grammar tasks, higher LRs were associated with better LOs, suggesting that faster learners during training were often more successful at learning grammatical rules. To further demonstrate the relationships between the learning trajectory patterns summarized from the four tasks and LO profiles, we calculated the dissimilarities (i.e., Euclidean distances) between each pair of learners (i.e., interindividual differences) based on each learner's learning scores (i.e., IP, LR, and outcome, respectively) in the four learning tasks. We then generated a 32 × 32 interlearner dissimilarity matrix for each of the learning indices. We found that the interlearner differences in LO profile were significantly associated with interlearner differences in IP (r = 0.189, p = 2.36 × 10−5) and LR (r = 0.321, p = 2.29 × 10−13). These behavioral correlation results demonstrate a close relationship between learning trajectory patterns and LO profiles.
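As a minimal sketch of this interlearner comparison, the Euclidean distance matrices and their correlation can be computed as follows; the permutation scheme behind the reported p values is not shown, and the data below are random placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

def interlearner_correlation(scores_a, scores_b):
    """scores_*: learners x 4 tasks (e.g., LO profile vs LR).
    Correlates the two interlearner Euclidean dissimilarity structures
    (pdist returns the vectorized upper triangle of the 32 x 32 matrix)."""
    d_a = pdist(scores_a, metric="euclidean")
    d_b = pdist(scores_b, metric="euclidean")
    return pearsonr(d_a, d_b)

# Hypothetical data for 32 learners and 4 tasks
rng = np.random.default_rng(0)
lo, lr = rng.random((32, 4)), rng.random((32, 4))
print(interlearner_correlation(lo, lr))
```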
Brain activation patterns during online learning of the four components across the four imaging sessions of the 7 training days are illustrated in Figure 4D. Distributed frontal, parietal, temporal, and occipital visual regions related to language, auditory, and visual processing were activated similarly across learning tasks and imaging sessions. By visual inspection, greater activation of these regions was observed for the three grammar learning tasks than for the Vb learning task. The activation patterns were also consistent across imaging sessions, with a minor decrease in activation over the course of training. In addition, these stimulus-related activations were distinct from the feedback-processing-related activations for the three grammar learning tasks (for the feedback-related activation maps, see Fig. 5).
Feedback-processing-related activation maps in each grammar task and training day. Online feedback was provided for the grammar learning tasks only. The feedback-related activation patterns were distinct from the language-stimulus-related activations shown in Figure 4D. Unthresholded whole-brain maps are displayed for visualization purposes.
Neural dynamics of multiple networks predict individual learning profiles
To identify the neurocognitive fingerprints underlying individual learning profiles, we used the 10-fold CV procedure with the multioutput LS-SVR algorithm (for the predictive modeling procedure, see Fig. 2) to construct and evaluate prediction models with cognitive and neural measures (i.e., predictive features). The multioutput SVR can be used to model the relationships between multivariate predictive features (e.g., activation patterns across days and tasks) and multiple LOs derived from our four learning tasks simultaneously.
We used the non-neural measures (i.e., cognitive and memory measures) and neural measures (i.e., activation patterns) to build separate prediction models of LO profiles. We then evaluated the model performances and identified the significant neural predictive features (i.e., voxels on a specific training day) contributing to the model prediction. Bootstrapping and permutation procedures were used to determine the stability and statistical significance of the models (see Materials and Methods). We found that neural activation patterns derived from the four learning tasks and across the four imaging sessions were highly predictive of LO profiles overall (Fig. 6A, red distribution: median r[observed, predicted] = 0.352, p = 0.0001, permutation test). By contrast, neither cognitive nor memory measures were predictive of learning profiles (Fig. 6A, gray distributions: p values > 0.1, permutation test). The prediction model with neural measures outperformed the models with cognitive and memory measures in profile prediction (p values < 0.001). For LO prediction in each learning task, we found that the neural activation patterns were significantly predictive of LOs in the NP (median r[observed, predicted] = 0.444, p = 0.003; permutation test), SV (median r[observed, predicted] = 0.611, p = 0.0001), and SOV (median r[observed, predicted] = 0.563, p = 0.0002) tasks but not in the Vb task (median r[observed, predicted] = −0.209, p = 0.846) (Fig. 6B). The nonsignificant prediction for Vb was mainly due to the ceiling-level outcomes and small interindividual variability (see Fig. 3). Moreover, to further identify the predictive contribution of the neural patterns in each imaging session for each task, we conducted outcome predictive modeling with neural activation patterns derived from each imaging session and learning task separately. Activation patterns derived from different imaging sessions showed varying degrees of predictive power across the four learning tasks (Fig. 6C). No significant outcome prediction was found across imaging sessions for the Vb task. For the NP task, the neural patterns were significantly predictive of outcomes only on the second day of training, whereas significant predictions were found on days 1, 2, and 7 for the SV task and on days 1, 3, and 7 for the SOV task. These findings not only demonstrate the general LO profile prediction but also reveal the fluctuations in outcome prediction across learning tasks and training phases.
Learning profile prediction performance and outcome prediction for each learning task and training day. A, Learning profile predictions with neural and non-neural cognitive and memory measures using the 10-fold CV with bootstrapping and permutation procedures (for the predictive modeling procedure, see Fig. 2). Neural measures were significantly better than chance and non-neural measures in predicting learning profiles. Perm, Permutation-based chance prediction distribution. B, LO prediction performances were estimated for each learning task separately and compared with their corresponding permutation-based chance distributions to derive statistical significance. C, Outcome prediction performances for each learning task and imaging session. Error bar indicates SD. *p < 0.05; **p < 0.01; nonparametric permutation test. Bootstrapping, Bootstrapping-based prediction distributions; Perm, permutation-based chance prediction distributions.
To further determine which brain regions on which training days significantly contributed to the LO profile prediction, we estimated the prediction contribution of each voxel in each imaging session by calculating the statistical significance of the voxel-selection rate (by comparing it to the selection-rate distributions derived from the permutation-based predictions) using the nonparametric permutation test procedure. Across the 4 d of imaging sessions, we identified distributed brain regions that significantly contributed to learning-profile prediction, including the lateral prefrontal, middle temporal, inferior parietal, insular, and subcortical striatal regions (Fig. 7A). These regions can be classified into two broad categories based on the relationships between their responses and outcomes: positive and negative predictive regions (Fig. 7A, left and right). Greater activation in the positive predictive regions was associated with better outcomes (Fig. 7A, left), whereas less activation or more deactivation in the negative predictive regions was associated with better outcomes (Fig. 7A, right). We further categorized these regions into four brain networks based on previous neuroimaging studies on language learning and resting-state connectivity studies (Yeo et al., 2011; Kepinska et al., 2017a) as well as their correlation patterns as follows: (1) PSN, including the bilateral inferior frontal gyrus (IFG) and left middle temporal gyrus (LMTG); (2) FPN, including the bilateral middle frontal gyrus (MFG), bilateral inferior parietal lobule (IPL), and bilateral inferior temporal cortices (ITC); (3) SAN, including the bilateral insula (Ins), left middle cingulate cortex (MCC), bilateral ventral striatum (Vstr; including the bilateral caudate nuclei and left putamen), thalamus (Tha), supplementary motor area (SMA), and cerebellum; and (4) DMN, including the posterior cingulate cortex (PCC), mPFC, left angular gyrus (LAG), left anterior temporal lobe (LATL), and left hippocampus (LHip).
Brain networks that contributed strongly to the learning profile prediction. A, Conjunction brain maps summarizing all regions that significantly contributed to the profile prediction across imaging sessions. These regions were identified by the voxel-wise permutation test at a threshold of p = 0.001 and were categorized into four networks based on their response patterns and previous literature. B, Prediction contribution maps for each imaging session. The response patterns of these brain networks contributed to the profile prediction differently across training days. Pos, Positive predictive features; Neg, negative predictive features; R, right hemisphere. Voxel-level p = 0.001, cluster-level FWE-corrected p < 0.05.
The four predictive brain networks showed distinct prediction contributions across training sessions. Voxel-wise predictive contributions were projected back onto each training day separately (Fig. 7B). The contributions of the PSN regions were prominent at the early stage of training (i.e., days 1 and 2), whereas those of the SAN regions were more salient around the halfway point (i.e., days 2 and 3). The contributions of the FPN regions were sustained across the training sessions but were slightly more prominent in the first 3 d, whereas those of the DMN were most prominent at the end of training.
These dynamic patterns in predictive contributions were further validated by the grammar-general outcome prediction procedure (for a graphical schema, see Fig. 8A). To further quantify the time-dependent contributions of those networks to outcome prediction across the grammar learning tasks, we trained predictive models with data from two grammar tasks (e.g., NP and SV) in 90% of the learners and validated the trained model with the held-out task (e.g., SOV) in the unseen 10% of learners. This grammar-task-general 10-fold CV procedure enabled us to test the generalization ability of the predictive models across learners and grammar learning tasks. We found that different brain networks showed distinct but systematic time-dependent predictive contributions (Fig. 8B): the contribution of the PSN decreased over training sessions, with the most prominent effects observed at the early stage of training (i.e., the first 3 d); sustained contributions of the FPN were identified across all training days; and an inverted-U contribution pattern was found for the SAN, with the most prominent contributions in the middle of training (i.e., day 3). The contribution of the DMN increased over training sessions, with the most prominent effects observed at the end of training (i.e., day 7). Figure 8C illustrates in greater detail the prediction fluctuations for each network node across the four imaging sessions. Together, these results suggest that these language-learning-related neural networks change dynamically and systematically during learning, and that these changes contribute to learning success.
Grammar-general outcome prediction for each network and network node across training days. A, Grammar-general outcome prediction procedure. SVR models were trained on two of the grammar tasks (e.g., NP and SV) and validated on data from the held-out task (e.g., SOV), repeated 3 times (see Materials and Methods). This cross-task CV procedure ensures that the predictive models are generalizable across tasks. B, Day-by-day grammar-general predictive performances summarized for the four brain networks. Bootstrapping, Bootstrapping-based prediction distributions; Perm, permutation-based chance distributions. Error bars indicate quantiles. *Permutation-based FDR-corrected q < 0.05. **FDR-corrected q < 0.01. C, Regional predictive powers for each network node and each of the 4 training days. The four colors in the circular graphs represent the four brain networks. Each region is labeled within the circular graphs. The height of each bar represents the predictive power (r[obs, pred]) of that region, and the color of the bar represents the FDR-corrected q value. D2, day 2; D3, day 3; D7, day 7.
We also examined the prediction patterns of the brain networks for each grammar learning task across imaging sessions (Fig. 9). Distinct contributions to outcome prediction were found across networks, especially for the PSN, FPN, and SAN, as well as across the three tasks and four imaging sessions (Fig. 9A). For example, for the PSN, similar decreasing prediction patterns were found for the SV and SOV tasks, whereas a flat contribution pattern across the training days was observed for the NP task. The predictive powers of individual regions for each task are shown in Figure 9B. Distinct contribution patterns across the network nodes and training days were found among the three grammar learning tasks, suggesting that the four neural networks dynamically reconfigured in response to learning different grammatical rules over time.
LO prediction performance for each imaging session, grammar learning task, and brain network. A, The prediction distribution for each grammar task and imaging session. Perm, Permutation-based prediction distributions. Error bars indicate the 95% percentile intervals. *Predictive powers were significantly different across tasks (i.e., a main effect of learning task) at the threshold of p = 0.05 (permutation test). B, Regional predictive powers for each task and imaging session. D1, day 1; D2, day 2; D3, day 3; D7, day 7. The FDR approach was used for multiple-comparison correction (FDR, q < 0.05).
To further examine whether the activation patterns derived from these networks also carry learning-task- or grammar-specific information, we conducted a task decoding (i.e., classification of the three tasks) analysis based on the network response patterns of the 23 ROIs selected by the outcome prediction procedure. The task decoding accuracies were all significantly better than chance (33.3%) across training days (day 1 median accuracy = 61.5%, p < 0.001, permutation test with 10,000 iterations; day 2 median accuracy = 58.3%, p < 0.001; day 3 median accuracy = 47.9%, p = 0.0067; day 7 median accuracy = 62.5%, p < 0.001). These results indicate that the network responses not only underlie individual differences in LO profiles but also reflect language-component- or task-specific learning dynamics in addition to general language learning patterns.
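For illustration, the three-way task decoding with a permutation-based significance test could be sketched as follows; the stratified cross-validation scheme and linear kernel shown here are assumptions, not necessarily the authors' exact choices.

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def task_decoding(roi_responses, task_labels, n_perm=10000, seed=0):
    # roi_responses: (n_samples, 23) response pattern across the
    #   selected ROIs for each sample
    # task_labels: (n_samples,) grammar task codes (NP / SV / SOV)
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    clf = SVC(kernel='linear')
    observed = cross_val_score(clf, roi_responses, task_labels, cv=cv).mean()
    # permutation-based chance distribution (theoretical chance = 1/3)
    null = np.array([cross_val_score(clf, roi_responses,
                                     rng.permutation(task_labels),
                                     cv=cv).mean()
                     for _ in range(n_perm)])
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p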
Neural representations of grammatical knowledge emerged during training
We conducted a multivariate grammaticality decoding analysis to examine to what extent the local activation patterns of these network nodes came to represent newly acquired grammar knowledge during learning and contributed to individual learning success. Single-trial brain responses were estimated using the least-squares single approach (Mumford et al., 2012; Feng et al., 2018a, 2021b). Grammaticality decoding (i.e., grammatical vs ungrammatical trials) was conducted using a support vector machine classifier (SVM-C) with a leave-one-block-out CV procedure (Feng et al., 2018a). ROI-based decoding analyses were performed separately for each grammar task, imaging session, and learner, using the 23 ROIs derived from the profile predictive modeling. An LMER model was used to model the ROI-based decoding accuracies with three fixed effects: training day (i.e., days 1, 2, 3, and 7), LO, and the day × outcome interaction. We found that the LIFG from the PSN, the bilateral MFG and IPL from the FPN, and the bilateral Ins and SMA from the SAN showed significant main effects of training day, with grammaticality decoding accuracies increasing over training sessions, suggesting that the multivoxel patterns of these regions increasingly encode newly acquired grammar knowledge (Fig. 10). The four FPN regions also showed significant day × outcome interactions, suggesting that the time-dependent changes in the robustness of the neural representations of grammaticality differed across learners with different LO profiles. For visualization purposes, we split the learners into two groups (i.e., successful and less successful learners) based on the median outcome scores (N = 16 per group) and displayed the decoding accuracies across the four imaging sessions (Fig. 10, right, line graphs). More successful learners displayed increasingly robust neural representations of grammaticality (i.e., better classification of grammatical vs ungrammatical trials).
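A compact sketch of the ROI-wise grammaticality decoding is shown below; the leave-one-block-out scheme follows the description above, while the linear kernel and variable names are assumptions for illustration.

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

def grammaticality_decoding(trial_patterns, labels, blocks):
    # trial_patterns: (n_trials, n_voxels) single-trial (least-squares
    #   single) response estimates for one ROI, task, and session
    # labels: (n_trials,) 1 = grammatical, 0 = ungrammatical
    # blocks: (n_trials,) block/run index defining the CV folds
    accuracies = []
    for train_idx, test_idx in LeaveOneGroupOut().split(trial_patterns,
                                                        labels, blocks):
        clf = SVC(kernel='linear').fit(trial_patterns[train_idx],
                                       labels[train_idx])
        accuracies.append(clf.score(trial_patterns[test_idx],
                                    labels[test_idx]))
    # mean leave-one-block-out accuracy for this ROI/session/task/learner
    return float(np.mean(accuracies))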
Neural decoding of grammaticality across training days for the ROIs derived from the predictive modeling. The linear mixed-effects regression model was used to evaluate three fixed effects (i.e., the main effects of day and outcome and the day × outcome interaction). D, main effect of day; I, day × outcome interaction. *q < 0.05; **q < 0.01; FDR correction. Error bar indicates SEM.
Because these frontoparietal grammaticality-representation regions have previously been associated with decision-making processes, we further examined whether they also showed increasing representations of response time (RT) following training. Trials were split into two classes (i.e., fast and slow trials) based on the median RT within each imaging session and learning task. We did not find any ROI that showed a significant main effect of day, nor did we find a day × outcome interaction effect on RT-decoding accuracy (Fig. 11). However, the decoding accuracies in these fronto-temporo-parietal regions were significantly better than chance (i.e., 50%) across sessions, suggesting that these regions also encode response speed, but that such representations did not change as a function of training or learning success. Together, the network nodes in the PSN and FPN that showed strong predictive powers at the early stage of training also showed significantly increasing grammaticality representations over the training sessions. These findings suggest that neural responses to training stimuli at the early stage of training may be associated with the later neural encoding of the acquired grammar knowledge.
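The RT control analysis amounts to relabeling the same trials by a median split before decoding; reusing the sketch above (variable names again hypothetical):

import numpy as np
# rt: (n_trials,) response times for the same trials used above
rt_labels = (rt >= np.median(rt)).astype(int)  # 1 = slow trial, 0 = fast trial
rt_accuracy = grammaticality_decoding(trial_patterns, rt_labels, blocks)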
Neural decoding of response time across the 4 imaging sessions for the ROIs selected from the outputs of predictive modeling. Trials were split into two classes (i.e., fast and slow trials) based on the median RT within each imaging session and learning task. We did not find any ROI that showed a significant main effect of day, nor did we find a day × outcome interaction effect. This control analysis demonstrates that the regions showing emerging representations of grammaticality (see Fig. 10) did not show emerging representations of decision time following training. Error bar indicates SEM.
RSs between the grammar-representation regions contribute to grammar learning success
To further examine whether interregional coupling of activation patterns underlies individual differences in learning success, we calculated the interregional RS between each pair of predictive ROIs for each learner, imaging session, and grammar learning task as an informational connectivity measure (for the RS analysis procedure and an average RS matrix, see Fig. 12A). On average, higher RSs were found between the LIFG and other PSN regions (Fig. 12A), suggesting similar representational structures and close functional coupling between these regions. We then used these RSs as predictive features to predict LOs with 10-fold CV and 10,000-iteration permutation and bootstrapping procedures. Both positively and negatively predictive RSs contributed significantly to predicting individual LOs across tasks (Fig. 12B). We further found that the FPN regions showing significant increases in grammar representations were widely connected to the other networks (i.e., higher RSs), and the RSs between these FPN regions and the other network nodes significantly contributed to predicting LOs (Fig. 12C). The RSs between the SAN regions (i.e., bilateral insula and SMA) and the other network nodes contributed the most to the LO prediction. The RSs of these connections were positively associated with LOs, indicating that greater similarity in representational structure between these regions relates to better LOs. By contrast, the RSs between the DMN core regions (especially the mPFC, PCC, LAG, and LHip) and the other three networks were negatively predictive of LOs, suggesting that less similar neural representations between these regions are associated with better outcomes. These findings demonstrate that not only regional multivoxel patterns but also interregional coupling or similarity in neural representations subserves individual differences in learning success.
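Following the procedure detailed in the Figure 12A legend, the RS computation itself reduces to a few lines; this is a sketch under the stated definitions, with function names of our choosing rather than the authors' code.

import numpy as np

def neural_rdm(item_patterns):
    # item_patterns: (n_items, n_voxels) spatial activation patterns of
    #   one ROI for all stimulus items
    # nRDM: 1 - Pearson correlation between every pair of item patterns
    return 1.0 - np.corrcoef(item_patterns)

def interregional_rs(rdm_a, rdm_b):
    # representational similarity between two ROIs: Pearson correlation
    # of the lower triangles of their nRDMs
    tri = np.tril_indices_from(rdm_a, k=-1)
    return np.corrcoef(rdm_a[tri], rdm_b[tri])[0, 1]

# the resulting RS values (one per ROI pair, learner, session, and task)
# would then serve as predictive features in the 10-fold CV outcome
# prediction described above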
Interregional RSs contribute to LO prediction across the three grammar learning tasks. A, Interregional RS analysis procedure. Local spatial activation patterns were extracted from the ROIs (e.g., LIFG and RIFG) for all stimulus items. For each ROI, dissimilarity (i.e., 1 – Pearson's correlation coefficient) was calculated between the spatial activation patterns of each pair of stimulus items to construct an nRDM. The RS (i.e., Pearson's correlation) between every pair of nRDMs was then calculated to create an interregional RS matrix for each learner. These RSs were used as predictive features for the LO predictions. The RS matrix displayed in the right panel was summarized across the three grammar tasks, the four imaging sessions, and all learners. B, Interregional RSs were significantly predictive of LOs for each of the three grammar learning tasks. Color-scale violin plots represent bootstrapping-based prediction distributions; grayscale violin plots represent permutation-based prediction distributions. *p < 0.01, permutation test. C, Positive predictive edges: interregional RSs between the grammar-representation regions that were positively associated with grammar LOs across tasks. D, Negative predictive edges: decreased RSs between the DMN core regions (e.g., mPFC, PCC, and AG) and the other network nodes were associated with more successful grammar learning. Colored lines within the circular graphs indicate significantly predictive RSs at a threshold of p < 0.001 (permutation test); gray lines indicate RSs at a threshold of p < 0.01. Gray bars outside the color rings represent the number of significant predictive edges connecting to a network node.
Discussion
By combining a multiday training protocol with fMRI and leveraging predictive modeling and multivariate pattern analyses, we demonstrate the spatiotemporal neural network dynamics underlying a phenomenon that is ubiquitous yet far from understood in language learning: extensive individual differences in LOs within and across components of language. Neural dynamics derived from four imaging sessions across 7 training days served as fingerprints predicting individual LO profiles summarized across language components. Four neural networks (PSN, FPN, SAN, and DMN) were identified, and their responses dynamically contributed to LO predictions across training tasks and sessions. The learning prediction fluctuated over training sessions in distinct ways across networks: (1) an early, decreasing predictive contribution of the PSN; (2) a sustained contribution of the FPN; (3) a robust contribution of the SAN emerging around the halfway point of the training (i.e., an inverted-U pattern); and (4) a late, increasing involvement of the DMN. A subset of network nodes in the frontoparietal regions, insula, and supplementary motor areas represented newly acquired linguistic knowledge with increasing accuracy, and the robustness of these representations made a marked contribution to differentiating successful from less successful learners. These representational regions became interconnected during training through shared local multivoxel representational patterns, and the degree of interregional representational similarity was highly predictive of learning success. These findings reveal the neural sources of individual differences in language learning by attributing not only the degree of success in learning one language component but also the profiles of LOs across components to time-dependent neural dynamics.
The dynamic contributions of multiple networks to explaining individual differences in language learning attainment profiles emphasize the close relationship between time-dependent neural dynamics and learning success across aspects of language. The prediction performance of the neural models exceeded that of models based on cognitive and memory measures. Cognitive and memory measures may reflect only static characteristics of the learners and may not capture essential information about individual differences in LOs that are tightly linked to the learning process and neural dynamics. We further found that the four networks play different roles in predicting individual learning profiles across training phases. The PSN, consisting of the bilateral IFG (BA44), posterior middle temporal gyrus, and the functional and anatomic connections between these regions, has previously been associated with language comprehension, production, and complex linguistic processes (Saur et al., 2008; Fedorenko et al., 2011; Friederici, 2011; Price, 2012; Feng et al., 2015, 2016). These Perisylvian regions, especially the left IFG, play an important role in learning rule-based linguistic structures (i.e., grammar acquisition) (Tagarelli et al., 2019). Moving beyond these previous findings, we demonstrate that the dynamic responses of the PSN significantly contributed to predicting individual differences in learning success across components of grammar learning. The prediction models were generalizable across grammar tasks, which demonstrates that increased involvement of the PSN regions is associated with more successful grammar learning in general. The contributions of the PSN were most prominent during the first 3 d of training, and its predictive power gradually decreased as training progressed. At this early learning stage, most of the learners were still novices (mean GJT accuracy across grammar tasks on day 3 was 62%). Increased activation in the PSN in response to training at this early stage was associated with enhanced LOs, which may reflect early neuromarkers of later learning attainment. These findings also suggest that the initial engagement of the PSN in processing and learning to distinguish grammatical from ungrammatical stimuli is critical to grammar acquisition.
The FPN contributed significantly to the profile prediction across training sessions, more prominently in the first 3 d of training. An active training paradigm, as used in this study, may require attentional and executive resources for item memorization and rule abstraction. FPN regions are dynamically and flexibly involved in demanding active tasks, for example, when participants categorize confusable speech categories (Feng et al., 2018a), perform difficult linguistic and nonlinguistic tasks (Cocchi et al., 2013; Waskom et al., 2014), effortfully process a non-native language (Abutalebi and Green, 2008; Feng et al., 2015), or switch between tasks (Cole et al., 2013) and languages (Abutalebi et al., 2007). FPN regions are also highly connected to PSN regions when participants engage in language tasks (Cole et al., 2013; Feng et al., 2015). Our findings demonstrate that early involvement of FPN and PSN regions during online grammar learning (rather than offline processes) in an active judgment task contributes to successful learning. In the memory literature, engagement of the frontoparietal regions during the encoding phase is predictive of subsequent success in item recognition and recall (Xue, 2018). Thus, increased responses of both FPN and PSN regions during training may play critical roles in online learning and contribute to future learning attainment.
The SAN showed an inverted-U pattern of prediction across training sessions, with its predictive contribution reaching a plateau in the middle of training and decreasing at the end. Most of its predictive regions lie in the corticostriatal pathway, including the bilateral insula, left putamen, right caudate, and left middle cingulate cortex. These regions have been related to reward-related learning, including syntactic acquisition (Ullman, 2001, 2004, 2016) and non-native speech learning (Chandrasekaran et al., 2014b; Feng et al., 2019, 2021a). In our paradigm, learners rely on trial-by-trial feedback to abstract grammar rules from the training items and update their internal representations of those rules. Individual differences in updating neural representations with feedback may be closely associated with individual differences in future learning success. Increased corticostriatal responses may facilitate the formation of item-to-rule representations in the cortex for efficient and automatized grammaticality judgments. In contrast, decreased DMN responses were associated with enhanced outcomes. The DMN has previously been linked to stimulus monitoring and cognitive resource allocation (Chadick and Gazzaley, 2011; Spreng, 2012), and successful DMN suppression (i.e., less activation) has been associated with better task performance (Daselaar et al., 2004; Anticevic et al., 2012). Thus, DMN activation may not relate directly to the online learning process per se but rather to post-training language processes. More efficient DMN suppression at the late stage of learning, as observed among more successful learners, may reflect the inhibition of internal self-referential processing, which in turn facilitates the processing of acquired linguistic knowledge.
Multivoxel representations of grammaticality emerged during training and were associated with individual differences in learning success, especially in the left IFG from the PSN, the frontoparietal FPN regions, and the bilateral insula and SMA from the SAN. Previous studies showed that IFG activity increased throughout learning in response to dependency violations of linguistic structures and that learners with higher proficiency showed greater activation (Tettamanti et al., 2002; Sakai et al., 2004), which suggests that the IFG may assist ungrammaticality detection (Opitz and Friederici, 2007; Hauser et al., 2012). Moreover, the robustness of the grammaticality representations in the frontoparietal regions was increasingly associated with the degree of learning success, more prominently in the late stage. These FPN regions were also involved in the early learning stage and contributed to learning success. We speculate that the level of frontoparietal activation in the early training stage may reflect the degree of effective learning, whereas the multivoxel representations of grammaticality that emerged at the late stage may reflect the neural encoding of acquired grammar knowledge. These two neural measures may be closely related: increased neural activation in response to the learning items may result in better neural encoding of those items and may therefore be associated with better LOs. Future studies should systematically test this possibility.
Interregional communication between these representational regions also contributed to individual learning success. We calculated the interregional RSs between each pair of regions based on their multivoxel patterns; the RS represents the degree of interregional information sharing or communication. The grammar-representation regions were highly interconnected during learning, and their RSs were predictive of LOs, which suggests that not only regional responses but also interregional RSs subserve individual differences in learning success. The robustness of the RSs among the PSN, FPN, and SAN nodes was associated with learning success, whereas increased RSs between the representation regions and DMN regions related to decreased outcomes. Previous studies have demonstrated that functional couplings among language-learning-related regions are related to the acquisition of different aspects of an artificial or foreign language (Kepinska et al., 2017a, 2018; Feng et al., 2019; Qi et al., 2019). Our findings build on these results and demonstrate that increased RSs among PSN, FPN, and SAN regions during learning are associated with future learning success.
The neural responses of the four networks, especially the PSN, FPN, and SAN, and their reconfiguration during training subserve individual differences in learning profiles. This study represents a pioneering exploration of the neural fingerprints underlying individual learning profiles across components of language and training phases. Future research will involve larger samples and the development of a priori hypotheses concerning the specific contribution patterns of these networks and how they jointly evolve to predict language learning profiles holistically.
Footnotes
P.C.M.W. is a founder of a company in Hong Kong supported by a Hong Kong SAR government startup scheme for universities. The remaining authors declare no competing financial interests.
This work was supported by grants from the Research Grants Council of Hong Kong 14619518 to G.F. and 34000118 to P.C.M.W.; and the Direct Grant for Research from the Chinese University of Hong Kong 4051137 to G.F.
Correspondence should be addressed to Gangyi Feng at g.feng{at}cuhk.edu.hk or Patrick C. M. Wong at p.wong{at}cuhk.edu.hk