Abstract
Language comprehension requires the rapid retrieval and integration of contextually appropriate concepts (“semantic cognition”). Current neurobiological models of semantic cognition are limited by the spatial and temporal restrictions of single-modality neuroimaging and lesion approaches. This is a major impediment given the rapid sequence of processing steps that have to be coordinated to accurately comprehend language. Through the use of fused functional magnetic resonance imaging and electroencephalography analysis in humans (n = 26 adults; 15 females), we elucidate a temporally and spatially specific neurobiological model for real-time semantic cognition. We find that semantic cognition in the context of language comprehension is supported by trade-offs between widespread neural networks over the course of milliseconds. Incorporation of spatial and temporal characteristics, as well as behavioral measures, provides convergent evidence for the following progression: a hippocampal/anterior temporal phonological semantic retrieval network (peaking at ∼300 ms after the sentence final word); a frontotemporal thematic semantic network (∼400 ms); a hippocampal memory update network (∼500 ms); an inferior frontal semantic syntactic reappraisal network (∼600 ms); and nodes of the default mode network associated with conceptual coherence (∼750 ms). Additionally, in typical adults, mediatory relationships among these networks are significantly predictive of language comprehension ability. These findings provide a conceptual and methodological framework for the examination of speech and language disorders, with additional implications for the characterization of cognitive processes and clinical populations in other cognitive domains.
SIGNIFICANCE STATEMENT The present study identifies a real-time neurobiological model of the meaning processes required during language comprehension (i.e., “semantic cognition”). Using a novel application of fused magnetic resonance imaging and electroencephalography in humans, we found that semantic cognition during language comprehension is supported by a rapid progression of widespread neural networks related to meaning, meaning integration, memory, reappraisal, and conceptual cohesion. Relationships among these systems were predictive of individuals' language comprehension efficiency. This study is the first to use fused neuroimaging analysis to elucidate language processes. In so doing, it provides a new conceptual and methodological framework in which to characterize language processes and guide the treatment of speech and language deficits/disorders.
Introduction
The ability to rapidly extract meaningful information from language is a fundamental human skill needed to navigate the social world. Adequate language comprehension (LC) ability in the context of complex language (i.e., beyond single-word processing) requires the engagement of multiple brain networks responsible for accessing and combining appropriate concepts (i.e., “semantic cognition”). The breakdown of this complex process is a key clinical marker across a range of neurologic disorders (Mueller et al., 2018; Smith and Caplan, 2018; O'Sullivan et al., 2019), but data-driven neurobiological characterization of LC and LC ability is limited. Evidence from functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) studies identifies spatial and temporal patterns of semantic cognition, respectively. Theories that span both of these modalities agree to some extent that meaning processes stem from rapid interactions between word retrieval processes in the temporal lobe and combinatorial meaning processes in the frontal lobe (Friederici and Gierhan, 2013; Hagoort, 2013). However, despite numerous studies of complex language processes beyond single words, the “interplay [of these networks] in the service of language understanding remains to be specified” (Friederici, 2015). Underspecification of these processes is due in large part to the spatial and temporal limitations of single-modality neuroimaging that prevent a real-time elucidation of the brain networks that support LC (Osterhout et al., 2012; Dronkers et al., 2017). This is a major impediment given the rapid sequence of processing steps that have to be coordinated to accurately comprehend language.
Here we use joint independent component analysis (jICA; Calhoun et al., 2006a) of fMRI and EEG to track the rapid (millisecond) interactions of widespread cortical networks necessary for semantic cognition beyond the single word, and to identify which networks and network interactions are most critical for LC ability.
Previous fMRI and lesion studies have highlighted a number of frontal, temporal, and parietal brain regions that support semantic cognition (i.e., where semantic cognition occurs), though the specific nature of the contribution of each region is still debated (Friederici, 2011; Friederici and Gierhan, 2013; Hagoort, 2013; Jefferies, 2013). Neuroimaging studies point to a semantic memory network centered in temporal structures, such as the middle temporal gyrus (MTG) and anterior temporal lobe (ATL), which support the retrieval of word meaning from long-term memory (Hagoort, 2013; Jefferies, 2013). However, both regions have also been implicated in context-dependent meaning access (i.e., semantic control; Ferstl et al., 2008; Whitney et al., 2012; Davey et al., 2016), and it is unclear whether the MTG or ATL is the primary retrieval hub. Lesion and neuroimaging work has suggested that the MTG may act as a connection point between the ATL-centered retrieval network and the frontotemporal semantic control network, primarily comprising the MTG and the inferior frontal gyrus (IFG). The IFG is also consistently linked to a more general role in combinatorial processes that unify individual words into a multiword context (Hagoort, 2005; Friederici, 2011; Davey et al., 2016). Hagoort (2013) suggests that the left IFG contains a gradation of functionality related to the binding of information, with ventral IFG supporting semantic combinatorial processes, and dorsal IFG supporting syntactic processes.
At the highest level of comprehension, studies have found a consistent set of domain-general processing areas in the default mode network (DMN) whose interregional correlations are associated with sentence-level and story-level information (Ferstl et al., 2008; Mar, 2011; Simony et al., 2016; Baldassano et al., 2017), with a specific role for the posterior midline [precuneus (PCU)/posterior cingulate cortex (PCC)] in building the coherence of ideas (Ferstl et al., 2008; Whitney et al., 2009). Collectively, fMRI studies reveal that adequate semantic cognition requires interactions between temporal and frontal language areas that support the access and integration of meaning. However, the limited temporal resolution of fMRI prevents characterization of when and how these networks interact.
In a largely separate body of literature, event-related potential (ERP) studies measure rapid synchronized firing of neuronal ensembles that occur after a stimulus of interest (i.e., when semantic cognition occurs). Peak amplitudes of the corresponding waveforms vary based on semantic cognition demands. This task sensitivity allows insight into the underlying cognitive functions of ERP signals. To date, ERP work has demonstrated that meaning access and integration occur ∼300–800 ms after word onset (Kutas and Federmeier, 2011). ERP studies have focused on characterizing cognition related to waveforms within this time window to develop a model of subprocesses that result in adequate semantic cognition. This includes studies on the canonical N400 effect, a negative waveform whose amplitude reflects the difficulty of meaning access (Kutas and Federmeier, 2011). Recent attention has also been given to two positive waveforms, the P300 and P600, implicated in semantic memory and combinatorial processes, respectively, but their role in semantic cognition is heavily debated (Osterhout, 1997; Kuperberg, 2007; Brouwer et al., 2012; Riby and Orme, 2013; DeLong et al., 2014). While ERP work provides a critical temporal framework in which to examine semantic cognition, limitations in source localization have prevented more complete, data-driven spatiotemporal models of semantic cognition, particularly in the context of sentence comprehension (Friederici, 2015).
The respective temporal and spatial limitations of fMRI and EEG have prevented a comprehensive, data-driven model of semantic cognition and, consequently, restricted our understanding of language-related disorders. In the present study, we address the single-modality limitations of fMRI and ERP through a novel application of fused fMRI and ERP analysis, called jICA, to track both where and when semantic cognition occurs during sentence comprehension and, via this approach, to identify the neural progression that underpins typical language comprehension ability (Calhoun et al., 2006b; Mijović et al., 2012). jICA takes advantage of cross-subject variability to simultaneously estimate independent components from subjects' fMRI spatial maps and ERP time courses. The outputs are joint components (JCs) that include a spatial map and corresponding time course, and subject-specific loadings that indicate how similar a subject's brain signatures are to the component. This approach allows us to identify network activation changes on the millisecond timescale (Mijović et al., 2012). Here, we identify (1) the progression of rapid brain network exchanges needed to extract meaning from language, and (2) which of these brain signals contributes to LC ability.
Materials and Methods
Participants and behavioral metrics
Participants.
Thirty right-handed male and female adult participants were recruited from the community. All participants were native English speakers with normal or corrected-to-normal vision, no history of major psychiatric illness, and no contraindication to MRI. To ensure that subjects had an IQ within the normal range and did not have dyslexia, we administered the Kaufman Brief Intelligence Test—matrices subtest (Bain and Jaspers, 2010) and the Woodcock Reading Mastery Tests (WRMT)-III—Letter Word Identification (LWID) and Word Attack (WA) subtests (McGrew et al., 2007). Behavioral metrics confirmed that subjects had IQ within the normal range [minimum > 85 standard score (SS); mean = 111.52 ± 8.48] and basic reading (BR) ability (minimum > 85 SS; mean = 105.94 ± 7.9). Of the original subject pool, n = 4 were excluded because of motion artifacts (n = 3) and the inability to complete the two sessions (n = 1). The final analysis included 26 adults (mean age = 25.36 ± 3.69 years; 15 females). Participants gave written informed consent at the beginning of the study, with procedures conducted in accordance with the Vanderbilt University Institutional Review Board (IRB). Participants received compensation for behavioral and neuroimaging testing as per the study IRB.
Behavioral metrics.
To assess subject LC ability, we administered the WRMT-III Passage Comprehension subtest, which requires subjects to read a sentence and fill in a missing word. To determine BR ability, scores on the WRMT-III LWID and WA subtests were averaged and converted to z scores. To parse the individual contributions of P600 subcomponents, we additionally collected behavioral metrics on vocabulary, and syntactic and conceptual integration ability. Receptive vocabulary was measured using the Peabody Picture Vocabulary Test (Dunn and Dunn, 2007); syntactic ability was measured using the Woodcock-Johnson IV—Sentence Fluency subtest (Schrank et al., 2014). Working memory was assessed using the Wechsler Adult Intelligence Scale (Drozdick et al., 2012), backward and sequential digit span subtests. Conceptual integration was measured using an in-house paradigm based on previous work (Graves et al., 2010) in which subjects viewed a grid of four pictures and were instructed to combine two of the pictures to create a compound word (e.g., a picture of a house and a picture of a tree would be identified as a “treehouse”). Only two of the pictures could form a compound word, with the other two pictures acting as distractors. Distractor pictures were target pictures from other trials in the test. A total of 23 subjects completed all behavioral tests, and regression analyses were run in this subset (see Statistical approach).
Experimental design and statistical analysis
Stimulus construction.
Stimuli were created to capture both word-level and sentence-level congruence effects, but only sentence congruence effects are within the scope of the present study. Specifically, we used a novel 2 × 2 sentence reading design that manipulated lexical congruency (i.e., whether or not embedded word pairs were semantically related to one another) and sentence congruence (i.e., whether or not the sentence “made sense”; Fig. 1). Stimuli were constructed in sets of two sentence frames (4–11 words). Though not examined in the present study, the sentence final critical word was either primed by a preceding word (≥0.07 association strength; South Florida Association Norms; Nelson et al., 2004), or was not primed (<0.07 association strength), and either made the sentence congruent or incongruent in meaning (Fig. 1). All critical words were included in both congruent and incongruent sentence conditions. Incongruent sentences contained either information contradictory to known world properties (e.g., “The bird spread its fingers”) or to world experience (e.g., “Amy got in trouble for getting dirt on her mud”), or contained internally contradictory information (e.g., “The glasses did not work and made her eyes see”). A task probe on whether the sentence made sense (see below) revealed that subjects were able to distinguish incongruent sentences from congruent sentences with a high degree of accuracy (>90%). The paradigm resulted in the following four conditions: (1) congruent word pairs, congruent sentence (CWCS); (2) incongruent word pairs, congruent sentence (IWCS); (3) congruent word pairs, incongruent sentence (CWIS); and (4) incongruent word pairs, incongruent sentence (IWIS). Word congruency effects were nonsignificant and outside the scope of the present article. In the analytical pipeline (described below), sentence congruency effects were determined by comparing congruent sentence conditions (CWCS and IWCS) to incongruent sentence conditions (CWIS and IWIS).
The methodological approach ensured that congruent and incongruent sentences were syntactically identical, with only the final word differing across the comparison.
Stimulus presentation.
Two separate lists of stimuli were constructed. Each list contained a total of 192 sentences (48 sentences/condition) presented across four runs (duration = ∼6 min/run). To minimize repetition effects related to repeated sentence frames, sentence presentation order was randomized within the list. Lists were counterbalanced across fMRI and EEG per subject. The order of fMRI and EEG administration was counterbalanced across subjects (see below). During each session, sentences were presented one word at a time (Fig. 2). Words were presented as white letters in Comic Sans MS font (size 32), centered on a black background, at a visual angle of 25°. Each word was presented for 500 ms, with a 100 ms pause between words. To ensure task attention and sentence comprehension, subjects were probed after the end of each sentence about whether the sentence did or did not make sense (i.e., sentence congruency measure). The sentence terminal word was followed by a 1000 ms break (indicated by a plus sign), then a probe of “yes/no,” during which the subject had 1250 ms to respond on a button box. All subjects had high accuracy (>90%) for the sentence probe, confirming that incongruent sentences were highly identifiable, and that subjects stayed on task. Because of high performance on the probe, all sentences were included in the final analysis.
fMRI/EEG acquisition.
To counteract any learning effects related to the task, the fMRI and EEG sessions were separated by an average of 6.1 ± 3.5 months (range: 3 d to 1.19 years), and fMRI/EEG administration order was counterbalanced across subjects (n = 13 subjects performed the EEG session first). Subjects were additionally counterbalanced on which of the two stimuli lists they received for their first session, as well as the response hand.
fMRI data acquisition and preprocessing.
All fMRI scans were acquired at Vanderbilt University Institute of Imaging Sciences on one of two Philips Achieva 3 T MR scanners with a 32-channel head coil. Functional images were acquired using a gradient echoplanar imaging sequence with 40 (3-mm-thick) slices with no gap and consisted of four runs (single run duration = 6 min; 160 dynamics/run). Slices were collected parallel to the anterior commissural–posterior commissural plane. Additional imaging parameters for functional images included the following: TE, 30 ms; FOV, 240 × 240 × 120 mm; flip angle, 75°; TR, 2200 ms; and 3 mm³ voxels. Image processing was completed using MATLAB R2018b and SPM12 (Friston et al., 1994). We used an event-related design, with the events timed to the sentence final critical word. Preprocessing included slice-timing correction (corrected to central slice; n = 20), realignment of volumes to the mean functional image, coregistration of the T1 to the MNI template, segmentation-based normalization of functional images to a standardized space, smoothing (kernel, 8 mm³), and motion correction using artifact detection tools (ART; https://www.nitrc.org/projects/artifact_detect/). Subjects with >20% motion outliers (defined with a z threshold of 9) were excluded from the analysis (n = 1). For each subject, contrast maps of the sentence final word were generated per condition (CWCS, IWCS, CWIS, IWIS) versus an explicitly modeled plus sign baseline. These subject-level contrasts were input into the jICA pipeline.
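The outlier-exclusion rule can be sketched as follows. This is a simplified, hypothetical illustration: a robust z-score on a toy global-signal trace stands in for ART's actual detection criteria (which also incorporate composite motion), and the signal values are invented.

```python
import numpy as np

def flag_outlier_volumes(global_signal, z_thresh=9.0):
    """Flag volumes whose global signal deviates strongly from the run
    median; a robust (median/MAD) z-score stands in for ART's criteria."""
    med = np.median(global_signal)
    mad = np.median(np.abs(global_signal - med))
    z = (global_signal - med) / (1.4826 * mad)
    return np.abs(z) > z_thresh

def exclude_subject(flags, max_fraction=0.20):
    """Exclude a subject when >20% of volumes are flagged as outliers."""
    return flags.mean() > max_fraction

# Toy run: 160 volumes (one run at TR = 2200 ms) with two signal spikes
rng = np.random.default_rng(0)
signal = rng.normal(100.0, 1.0, size=160)
signal[[10, 50]] += 20.0
flags = flag_outlier_volumes(signal)
print(int(flags.sum()), bool(exclude_subject(flags)))
```

Two flagged volumes out of 160 (1.25%) fall well under the 20% exclusion threshold, so this toy subject would be retained.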
EEG data acquisition and preprocessing.
All EEG data were acquired at the Vanderbilt Kennedy Center, using a 128-channel geodesic sensor net (EGI). Data were sampled at 250 Hz with filters set to 0.1–30 Hz. The vertex was used as the reference during data acquisition. Data processing was completed using NetStation and MATLAB. EEG data were segmented into epochs of 1000 ms, starting 100 ms before the onset of the critical word. For all conditions, the critical word was in the sentence final position. Recordings were rereferenced to an average reference. Ocular and muscle artifacts were identified through automated and manual artifact identification processes; contaminated electrodes in each trial were rejected, and trials with >10 rejected electrodes were excluded from analysis. To be included in the statistical analysis, individual condition ERPs were based on a minimum of 20 trials; n = 2 subjects were excluded because of excessive motion artifacts. Confirmatory analysis of the waveforms revealed expected N400 effects across sentence conditions. The N400 was defined as the mean voltage in a 300–600 ms latency window when compared with the 100 ms prestimulus baseline, pulled from centroparietal electrodes (electrodes 54, 55, 62, 80, 81, 32, 7, 107; Kutas and Federmeier, 2011). Preprocessed time signals for each condition were averaged within subject, then entered into a grand average across subjects per condition. These grand averages for each of the four conditions were input into the jICA pipeline (i.e., the same conditions as the fMRI jICA inputs). Difference waves were generated for incongruent–congruent words, and incongruent–congruent sentences, and the maximum negative peak within the N400 time window (300–600 ms) per subject was input into a one-sample t test. The N400 effect was significant for word congruence (t(25) = −4.19; p < 0.001; d = −1.68) and sentence congruence (t(25) = −12.12; p < 0.001; d = −4.85) manipulations.
These findings confirm that our paradigm captured expected EEG patterns (Fig. 1b).
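The difference-wave analysis above can be sketched numerically. The simulated amplitudes, noise levels, and N400 shape below are illustrative placeholders, not our recorded data; only the windowing and test logic follow the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_samples = 26, 250        # 1000 ms epoch at 250 Hz
t = np.arange(n_samples) * 4 - 100     # ms; epoch starts 100 ms prestimulus

# Simulated per-subject ERPs (µV): incongruent sentences carry an extra
# negativity peaking near 400 ms (illustrative values only)
n400_shape = -3.0 * np.exp(-((t - 400) / 80.0) ** 2)
congruent = rng.normal(0, 0.5, (n_subjects, n_samples))
incongruent = congruent + n400_shape + rng.normal(0, 0.5, (n_subjects, n_samples))

# Difference wave and maximum negative peak in the 300-600 ms window
diff = incongruent - congruent
window = (t >= 300) & (t <= 600)
peaks = diff[:, window].min(axis=1)

# One-sample t test on the per-subject negative peaks
t_stat, p = stats.ttest_1samp(peaks, 0.0)
print(t_stat < 0, p < 0.001)
```

With a genuine negativity in the incongruent condition, the per-subject peaks are reliably below zero and the test is significant, mirroring the sentence congruence effect reported above.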
fMRI ERP jICA.
Fusion analysis was performed using the Fusion ICA Toolbox (FIT) in MATLAB, and followed processing protocols established by Calhoun et al. (2006b) and Mijović et al. (2012; but see Edwards et al., 2012; Ouyang et al., 2015), which were developed for parallel fMRI/EEG acquisition (notably, parallel acquisition has been found to be better suited to this approach than simultaneous acquisition; Calhoun et al., 2006b; Mijović et al., 2012). In jICA, independent components for fMRI and EEG are simultaneously estimated. Compared with other multimodal analysis approaches, jICA allows the spatial and temporal components of EEG and fMRI, respectively, to influence each other, and is therefore considered a truly “fused” data analysis approach (Mijović et al., 2012). In jICA, the spatial fMRI maps and the ERP component time course are concatenated into a subject × data input matrix (the ERP time course is upsampled using a cubic spline interpolation so that it is the same dimensionality as the spatial fMRI vector; Mijović et al., 2012). The fMRI and ERP data are the first-level contrast map and the grand average time course (averaged across centroparietal electrodes), respectively, for one condition. Consequently, the only within-subject factor in the pipeline is condition. The model assumes that ERP peaks and BOLD responses change in a similar way across subjects. This approach provides robust, high-quality data decompositions (Mijović et al., 2012) that have been validated across a number of cognitive substrates and populations (Calhoun et al., 2006b; Calhoun and Adali, 2009; Edwards et al., 2012; Ouyang et al., 2015). The jICA algorithm outputs group-level, joint independent components that include information for each modality (i.e., one component includes both an ERP time course and a spatial map). Condition-specific maps and time courses are back-reconstructed to allow identification of how each condition contributes to the cross-condition components.
The strength of this contribution is reflected by a subject and condition-specific scalar parameter loading (i.e., a measure of how “strong” the component signal is within that subject and condition), which can be used to statistically identify condition differences per component. This means that each subject had four scalar loadings (one per condition) that could be included in statistical models. A limitation of this ICA stacking method is that it assumes each condition has a similar underlying signal that only differs in magnitude. However, in the present study, a measurement of spatial divergence (Renyi divergence) revealed that the average divergence across conditions was very low (divergence, <2). As voxel-by-voxel statistical tests were not run in the spatial maps, multiple-comparison correction was not used. For display, the spatial maps are normalized and voxels with a value of z > 2.6 were displayed, which identifies the voxels that are the highest contributors to the component (i.e., >2.6 SDs above the mean contribution, equivalent to p < 0.005). This is consistent with the procedures of ICA in MRI (Himberg et al., 2004; Calhoun et al., 2006b, 2009; Edwards et al., 2012; Mijović et al., 2012).
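A minimal numerical sketch of this stacking-and-decomposition logic is below. The cubic-spline upsampling and joint matrix construction follow the description above; the decomposition itself is a compact symmetric FastICA in NumPy, standing in for FIT's Infomax implementation. All sizes and signals are toy values, not our data.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def upsample_erp(erps, n_out):
    """Cubic-spline interpolation of ERP time courses to n_out samples."""
    x = np.linspace(0.0, 1.0, erps.shape[-1])
    return CubicSpline(x, erps, axis=-1)(np.linspace(0.0, 1.0, n_out))

def joint_matrix(fmri_maps, erps):
    """Stack each row's fMRI contrast vector with its upsampled ERP to
    build the (subject x condition)-by-feature matrix fed to jICA."""
    return np.hstack([fmri_maps, upsample_erp(erps, fmri_maps.shape[1])])

def fastica(X, k, n_iter=200, seed=0):
    """Compact symmetric FastICA (tanh contrast). X: mixtures x features.
    Returns loadings A and joint components S, with X ~ A @ S."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))            # PCA for whitening
    d, E = d[::-1][:k], E[:, ::-1][:, :k]
    Kw = (E / np.sqrt(d)).T                      # whitening matrix
    Z = Kw @ Xc
    W = rng.normal(size=(k, k))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W1 = (G @ Z.T) / Z.shape[1] - np.diag((1 - G**2).mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W1)
        W = u @ vt                               # symmetric decorrelation
    S = W @ Z                                    # joint spatial map + ERP
    A = np.linalg.pinv(W @ Kw)                   # per-row (subject) loadings
    return A, S

# Joint matrix: 26 rows, toy "voxels" (400) + upsampled ERP (250 -> 400)
rng = np.random.default_rng(3)
J = joint_matrix(rng.normal(size=(26, 400)), rng.normal(size=(26, 250)))

# Recovery demo: 26 subject-rows mixing two ground-truth joint sources
s_true = np.vstack([np.sin(np.linspace(0, 8 * np.pi, 600)),
                    np.sign(np.sin(np.linspace(0, 5 * np.pi, 600)))])
X = rng.normal(size=(26, 2)) @ s_true + 0.01 * rng.normal(size=(26, 600))
A_hat, S_hat = fastica(X, 2)
print(J.shape, A_hat.shape, S_hat.shape)
```

The recovered rows of `S_hat` match the ground-truth sources up to sign, order, and scale, while the columns of `A_hat` play the role of the subject-specific loadings described below.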
ICA parameters.
The Infomax algorithm was used to identify joint components. To determine the ideal number of components, we followed protocols established by Artoni et al. (2014) and Himberg et al. (2004). First, we used ICASSO to identify the number of stable components. ICASSO iteratively runs an ICA to determine the stability of generated components. As recommended by Himberg et al. (2004), we set the component number, k, to the subject number (k = 26), and performed 50 ICASSO iterations. Final component number was determined using the stability index (Iq), which reflects the internal stability of a component. k = 15 was the smallest component number to meet a high Iq threshold (mean Iq, >0.95; minimum Iq, >0.90; Himberg et al., 2004; Turner et al., 2012; James et al., 2014). Qualitative follow-up examination revealed that adjacent component amounts (e.g., k = 14 and k = 16) resulted in nearly identical findings, while examinations of (1) low total components (e.g., k = 4) showed a merging of significant subcomponents (e.g., the P600 was a single component, whereas higher component amounts subdivided the P600 into subcomponents that significantly contributed to the raw data); and (2) high total components (e.g., k = 21) showed a division of relevant subcomponents so that they did not meet contribution thresholds.
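The stability screening can be illustrated with a simplified sketch of the Iq logic. Real ICASSO clusters components across all runs jointly; here, for brevity, each component of one reference run is matched to its best correlate (maximum absolute correlation) in every other run and the similarities are averaged. All signals are simulated.

```python
import numpy as np

def stability_index(runs):
    """Simplified ICASSO-style Iq: match each component of the first run
    to its best correlate in every other run and average the similarities.
    (Real ICASSO clusters components across all runs jointly.)"""
    ref = runs[0]
    iq = np.zeros(ref.shape[0])
    for i in range(ref.shape[0]):
        sims = [max(abs(np.corrcoef(ref[i], comp)[0, 1]) for comp in other)
                for other in runs[1:]]
        iq[i] = np.mean(sims)
    return iq

# Simulate 50 ICA runs as noisy, sign-flipped, shuffled copies of 3 components
rng = np.random.default_rng(7)
true = rng.normal(size=(3, 500))
runs = []
for _ in range(50):
    noisy = true + 0.1 * rng.normal(size=true.shape)
    signs = rng.choice([-1, 1], size=(3, 1))
    runs.append(signs * noisy[rng.permutation(3)])
iq = stability_index(runs)
print(np.round(iq, 3))
```

Because the simulated runs differ only by small noise, sign flips, and component order, all three Iq values land near 1, comfortably above the 0.90/0.95 thresholds used in the text.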
Component selection.
As performed in Calhoun et al. (2006b), we took advantage of the ERP signals to identify components that reflected noise versus true brain signals. We applied the following criteria: (1) components had to contribute >1 SD of variance to the grand mean of the EEG signal (n = 5 components removed; Edwards et al., 2012); (2) we used the findpeaks MATLAB function to identify any components with an excessive number of peaks (outliers were defined as >30 peaks; n = 1 component removed); and (3) after non-noisy components were identified, the remaining components were screened to identify positive or negative temporal waveforms that fell within the expected time window of semantic cognition (300–800 ms poststimulus; Pulvermüller, 2012; Hoedemaker and Gordon, 2017; Mollo et al., 2018). This screening resulted in five spatiotemporal components that shared characteristics with the P300 (n = 1 component), N400 (n = 1 component), and P600 (n = 3 components; referred to in the present article as P600a, P600b, and P600c, in order of peak latency). To determine the replicability of the results, we performed a split-halves validation analysis (Vabalas et al., 2019). Subjects were randomized into two subgroups of 13 subjects each (with four conditions per subject). Separate jICAs, identical to the full group analysis, were run for each group. We used the MATLAB findpeaks function to identify components with peaks within the range of the peak of each joint component from the full analysis, and resulting time courses were manually reviewed. This resulted in five JCs per subgroup that showed temporal characteristics of the P300, N400, P600a, P600b, and P600c. To determine whether spatial maps were replicated, first, MNI labels were generated from the full analysis at a reduced z-threshold (z > 1.5). Subgroup MNI labels were then compared with the full analysis labels.
For each label and subgroup, the subgroup received a value of 1 if it had activation in the full analysis region, and 0 if it did not (at z > 1.5). This resulted in binary vectors for each subgroup, which were then compared with one another using intraclass correlation coefficient (ICC). The subgroup jICA spatial maps showed fair (0.4–0.6) to good (>0.6) ICC values with one another (Mickela et al., 2022) with the exception of JC3 (see below). Whole-brain conjunction maps across subgroups (z > 1.5) for all components (JCs 1–5) also revealed overlap in the key language areas seen in the full analysis. Here we provide the ICC values for each spatial comparison (all significant at p < 0.005), the r values for each temporal comparison (all significant at p < 0.001), and whole-brain overlap in regions of interest within the language and comprehension network across subgroups (Table 1, notation of all replicated regions in whole-brain conjunction analysis): (1) JC1: ICC = 0.64, r = 0.40 [regions: bilateral ATL; medial prefrontal cortex (mPFC); left supramarginal gyrus (SMG)]; (2) JC2: ICC = 0.47, r = 0.91 (regions: left MTG, left ATL); (3) JC3: ICC = 0.30, r = 0.83 [regions: bilateral parahippocampal gyrus/hippocampus, superior temporal gyrus (STG)]; (4) JC4: ICC = 0.63, r = 0.38 (regions: left ventral IFG, left ATL); and (5) JC5: ICC = 0.58, r = 0.64 [regions: bilateral precuneus (PCU)].
Spatial activations for each joint component, along with the Brodmann area (BA), cluster volume in cubic centimeters, and maximum z value and its location for the left/right hemispheres
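The binary-overlap comparison described above can be sketched as follows. The region vector is illustrative, and the two-way random-effects ICC(2,1) formula is one common choice for absolute agreement between two raters (here, the two subgroups); it is an assumption standing in for the exact ICC variant used.

```python
import numpy as np

def icc_2_1(Y):
    """Two-way random-effects, absolute-agreement ICC(2,1).
    Y: targets x raters matrix (here: region labels x subgroups)."""
    n, k = Y.shape
    mean = Y.mean()
    row_m = Y.mean(axis=1, keepdims=True)
    col_m = Y.mean(axis=0, keepdims=True)
    ssr = k * ((row_m - mean) ** 2).sum()          # between-target
    ssc = n * ((col_m - mean) ** 2).sum()          # between-rater
    sse = ((Y - row_m - col_m + mean) ** 2).sum()  # residual
    msr, msc, mse = ssr / (n - 1), ssc / (k - 1), sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Toy replication check: 1 = subgroup shows activation (z > 1.5) in a
# region identified by the full analysis (region list is illustrative)
g1 = np.array([1, 1, 0, 1, 1, 0, 1, 1, 1, 0])
g2 = np.array([1, 1, 0, 1, 0, 0, 1, 1, 1, 1])
icc = icc_2_1(np.column_stack([g1, g2]).astype(float))
print(round(icc, 2))
```

With agreement on 8 of 10 toy regions, the ICC lands at 0.55, i.e., in the "fair" range used in the text.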
Statistical approach.
Once peaks were identified, confirmatory one-sided t tests were run to ensure that the joint component loadings showed expected significant effects related to semantic cognition demands [incongruent vs congruent sentences; all p values were Benjamini–Hochberg false discovery rate (FDR) corrected for five tests]. To characterize behavioral correlates of the joint components, two ANCOVAs per component were run to ascertain the relationship between the JC loadings and the behavioral measures. The first model per component included language-related metrics (basic reading, vocabulary, syntax, and conceptual integration, controlling for condition). The second model per component included working memory metrics (digit span backward and sequential, controlling for condition). The dependent variable was the component loading per condition across subjects, and the independent variables were the behavioral metrics, which allowed for the control of covariance across behavioral measures and allowed us to specifically isolate independent behavioral predictors of the component loading. Conditions were treated as repeated measures (i.e., there were four component loading values per subject), and, as there were no significant interactions across analyses with condition, condition was included as a control variable. These regressions revealed significant associations between the JCs and behavior. All p values across variables in the 10 tests were corrected together using Benjamini–Hochberg FDR correction.
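The Benjamini–Hochberg correction applied throughout can be sketched as adjusted p-values (p multiplied by m/rank, with a running minimum enforced from the largest p downward); the raw values below are illustrative, not our test results.

```python
import numpy as np

def bh_fdr(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)     # p * m / rank
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

# e.g., raw p-values from five one-sided component tests (illustrative)
print(bh_fdr([0.001, 0.012, 0.03, 0.04, 0.20]))
```

Note that adjusted values are monotone in the raw ranks: ties can appear (here, 0.05 twice) because the running minimum propagates downward.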
To ascertain which joint components were predictive of LC ability (as defined by standard scores from the WRMT-III Passage Comprehension subtest), five multiple regression analyses were run (one per component) with LC ability as the dependent variable, and the joint component loadings as the independent variable (with condition as a control). All p values were corrected using Benjamini–Hochberg FDR correction for five tests. Condition did not show any significant interactions with the component loadings in predicting LC ability, and so the condition interaction was removed from the models. As recommended by Luck (2005), separate analyses were run per component to see which of the independent components predicted LC ability. Three components were significantly predictive of LC ability (JC1, JC2, and JC5). Last, mediation analysis was run to determine whether, within the set of components significantly related to LC ability (JC1, JC2, and JC5), the relationships between early components and LC ability are mediated by later components (i.e., whether the effects of early components on LC ability “go through” later components). To test this, we used the Mediation Toolbox (https://github.com/canlab/MediationToolbox) to run pairwise mediation analyses in the order of the temporal sequence, with the later component in the time series as the mediator (as recommended for mediation of time series by Ager and De Boeck, 2017), with results FDR corrected. This resulted in the following three mediation analyses: (1) JC1 and LC ability, mediated by JC2; (2) JC1 and LC ability, mediated by JC5; and (3) JC2 and LC ability, mediated by JC5. All results were bootstrapped in the Mediation Toolbox with 10,000 iterations, and the resulting p values were corrected using Benjamini–Hochberg FDR correction for three tests.
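The path logic of these pairwise mediation models can be sketched as follows. This is a minimal single-mediator illustration with a percentile bootstrap on the indirect (a × b) effect, not the Mediation Toolbox's full implementation; the data and effect sizes are simulated.

```python
import numpy as np

def mediation(x, m, y, n_boot=2000, seed=0):
    """Single-mediator model: a = x->m slope; b = m->y slope controlling
    for x; indirect effect = a*b with a percentile bootstrap 95% CI."""
    rng = np.random.default_rng(seed)

    def ab(idx):
        xs, ms, ys = x[idx], m[idx], y[idx]
        a = np.polyfit(xs, ms, 1)[0]
        # b: coefficient on m from regression of y on [m, x, intercept]
        X = np.column_stack([ms, xs, np.ones_like(xs)])
        b = np.linalg.lstsq(X, ys, rcond=None)[0][0]
        return a * b

    n = x.size
    point = ab(np.arange(n))
    boots = np.array([ab(rng.integers(0, n, n)) for _ in range(n_boot)])
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, (lo, hi)

# Toy data with a true indirect path x -> m -> y (slopes 0.6 and 0.7)
rng = np.random.default_rng(42)
x = rng.normal(size=200)
m = 0.6 * x + rng.normal(scale=0.5, size=200)
y = 0.7 * m + rng.normal(scale=0.5, size=200)
ab_hat, ci = mediation(x, m, y)
print(ci[0] > 0)  # CI excluding zero indicates significant mediation
```

A bootstrap confidence interval that excludes zero is the usual criterion for a significant indirect effect; the Mediation Toolbox additionally reports the direct (c′) path and bootstrap p values.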
Neurosynth meta-analytic key word identification for JCs 1–5.
To provide additional evidence for the role of each JC, we used Neurosynth (www.neurosynth.org) to identify cognitive terms that are significantly associated with the primary nodes of each JC (Table 2). Neurosynth is a meta-analytic database with >14,000 studies, which can estimate the key terms most closely associated with a particular coordinate (search radius, 6 mm). We input peak coordinates for the five largest clusters of each JC into the Neurosynth localization software. Neurosynth performs an association test that runs an ANOVA to test whether a region is more consistently present in studies that mention a key term versus studies that do not mention that term. After excluding the following term categories: brain regions, clinical populations, clinical diagnoses, methods (e.g., “stimulation”), generic, noncognitive terminology (e.g., “task”), and close repeats of key words (i.e., “semantic” and “semantics”), we selected the top three key terms for each of the peak coordinates. In addition to providing a z score for the term to seed match, Neurosynth also provides an r value for the correlation between the meta-analytic functional connectivity map of a specific term (e.g., “semantic memory”) and the meta-analytic resting-state connectivity of the seed area (e.g., connectivity to the left MTG). The r value offers another view of how the network of the seed corresponds with the associated network of the term, and we included the term with the top r value for each seed region in Table 2.
Cognitive terms significantly related to each brain region per JC, for two measures of association: (1) the z score from the meta-analytic association test; and (2) the r score, giving the correlation between the meta-analytic connectivity map of the coordinates and the connectivity map of the associated term
Data availability
The stimuli and data that support the findings of this study are available on reasonable request from the corresponding author (K.S.A.). The data are not publicly available due to research participants not providing consent to share their data outside of Vanderbilt University and other institutions specified in the consent form. All figures have associated raw data.
Results
Adult participants (n = 26) read sentences during fMRI and, in a separate session, while EEG data were collected. Sentences were constructed to vary in semantic cognition demands, with the sentence final word determining whether the sentence was congruent or incongruent in overall meaning (i.e., whether the sentence did or did not make sense; see Materials and Methods; Fig. 1). Sentence frames were identical across congruent and incongruent conditions; thus, comparisons of incongruent versus congruent conditions allowed for isolation of semantic rather than syntactic processing (Fig. 1a). After each sentence, participants indicated whether the sentence made sense or not via button press.
a, Example stimulus for the 2 × 2 congruency design, in which word pairs embedded in a sentence were incongruent [first column; incongruent words (IW)] or congruent [second column; congruent words (CW)], and sentence pairs were implausible [top row; orange boxes; incongruent sentences (IS)] or plausible [bottom row; blue boxes; congruent sentences (CS)]. The current study examines CS versus IS. b, Grand average ERP time courses from centroparietal electrodes for incongruent sentences (IS; orange line) and congruent sentences (CS; blue line) show expected N400 and P600 effects in the 1 s after the sentence final word (significant at p < 0.05; see Materials and Methods).
Subject-level fMRI spatial maps and ERP time courses (averaged across trials) were input into the jICA pipeline (see Materials and Methods). The jICA approach allows for the identification of spatial maps and corresponding time courses related to a stimulus, in this case the sentence final word. As in previous studies, we used the time course of the ERP waveforms as the orienting framework for the joint components, so that the peak of each joint component's ERP time course indicates the temporal order of the corresponding spatial activations (Mijović et al., 2012). We then examined which joint components corresponded with subjects' written language comprehension ability (see Materials and Methods).
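The fusion step behind jICA can be pictured with a minimal numpy sketch (dimensions and data are illustrative; the actual pipeline follows Mijović et al., 2012 and operates on preprocessed subject-level maps and ERPs):

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels, n_timepoints = 26, 5000, 256  # illustrative sizes

fmri_maps = rng.standard_normal((n_subjects, n_voxels))      # one spatial map per subject
erp_waves = rng.standard_normal((n_subjects, n_timepoints))  # trial-averaged ERP per subject

# z-score each modality per subject so neither dominates the decomposition
def zscore(x):
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

# Fuse by horizontal concatenation: subjects x (voxels + timepoints)
joint = np.hstack([zscore(fmri_maps), zscore(erp_waves)])

# This joint matrix is then decomposed with ICA. Each resulting independent
# component has a spatial part (first n_voxels columns) and a temporal part
# (last n_timepoints columns) that share a single subject-loading vector,
# which is what links each spatial map to its ERP time course.
```

Because the two modalities share one loading vector per component, a component's ERP peak latency can be read as the timing of its paired spatial activation, which is the property the temporal ordering of JC1–JC5 rests on.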
Spatiotemporal progression of LC
The jICA approach revealed five spatiotemporal components within the canonical time window of semantic cognition (300–800 ms after the critical word; Fig. 3). Temporally, these signals reflected characteristics of the P300 (n = 1 joint component), N400 (n = 1), and P600 (n = 3). After FDR correction for multiple comparisons (Benjamini–Hochberg FDR correction for five comparisons), results revealed significant sentence congruence effects (congruent vs incongruent) for each of the five joint components, which we include here as confirmation of appropriate component identification, as follows: JC1, semantic retrieval network and the P300 (t(25) = 5.93; corrected p value (pcorr) ≤ 0.001; d = 2.37); JC2, semantic control network and the N400 (t(25) = −6.96; pcorr < 0.001; d = −2.78); JC3, memory network and P600a (t(25) = −1.75; pcorr = 0.046; d = −0.7); JC4, semantic–syntactic network and P600b (t(25) = −2.64; pcorr = 0.011; d = −1.06); and JC5, comprehension network and P600c (t(25) = −2.27; pcorr = 0.02; d = −0.91). To ascertain the distinct functions of each subpeak, we ran two ANCOVAs per component to identify which subject behaviors of (1) word reading, vocabulary, syntax, and conceptual integration; and (2) working memory ability best predicted jICA loadings on each component (see Materials and Methods; all p values Benjamini–Hochberg FDR corrected across the 10 tests).
Timing parameters for the sentence stimuli. Each stimulus screen was presented for 500 ms, with 100 ms between screens. After the sentence was finished, subjects saw a plus sign for 1000 ms, then a prompt of “No/Yes” for 1250 ms, during which time they pressed a button to indicate whether the sentence did or did not make sense.
jICA analysis resulted in five spatiotemporal components (JC1–JC5) that mapped onto the following: JC1, ATL; JC2, the canonical frontotemporal language network; JC3, hippocampus and right STG; JC4, bilateral ventral and left dorsal IFG; and JC5, the DMN. Spatial network changes corresponded, respectively, with a positive temporal component in the P300 time window, one negative component in the N400 time window, and three positive components in the P600 time window, referred to here as P600a, P600b, and P600c. Spatial and temporal units are arbitrary; temporal components are filtered for display. Images displayed at z > 2.6.
Joint component 1—semantic memory
JC1 localized to the hippocampus, bilateral/right ATL [Brodmann areas (BAs) 22/38], MTG, mPFC, left SMG, areas in the dorsal attention network (frontal eye fields, inferior parietal lobule), and the anterior cingulate cortex (ACC; Table 1, for all coordinates). The component had a positive peak that occurred at ∼300 ms post-stimulus onset, consistent with the P300 component. An ANCOVA revealed that JC1 was significantly positively associated with basic reading ability (t(86) = 3.96, pcorr = 0.002; d = 0.85) and working memory (t(88) = 3.72, pcorr = 0.003; d = 0.79; digit span sequential).
Joint component 2—semantic control
The second joint component localized to canonical language areas, including bilateral temporal regions (STG, MTG, and ATL), bilateral frontal areas (IFG and dorsolateral prefrontal cortex), cingulate gyrus, PCU, and the motor/cerebellar regions. This component temporally had a negative peak at ∼400 ms poststimulus, which fell into the N400 effect time window for our average ERP results (Fig. 1b). An ANCOVA revealed that JC2 was significantly negatively associated with syntax (t(86) = −4.20, pcorr = 0.001; d = −0.91; sentence fluency) and working memory (t(88) = −3.64, pcorr = 0.003; d = −0.78; digit span sequential).
Joint component 3—memory schema
The first P600 component corresponded with coactivation of language and memory regions, including bilateral STG (BA 22), parahippocampal areas, right SMG, ACC, PCU, premotor and sensory areas, insula, and the cerebellum. This component had a positive peak latency at ∼500 ms poststimulus, which fell into the early P600 effects window in the average ERP findings (Fig. 1b). The temporal component will henceforth be referred to as P600a. An ANCOVA revealed that JC3 was significantly negatively associated with vocabulary ability (t(86) = −2.62, pcorr = 0.034; d = −0.57) and working memory (t(88) = −3.03, pcorr = 0.012; d = −0.65; digit span backward). Syntactic ability was also associated with JC3; however, syntactic ability did not predict JC3 loadings alone, and its inclusion in the model decreased the R2 value of the model, revealing it as a suppressor variable.
Joint component 4—semantic–syntactic reappraisal
The second joint component within the P600 range (P600b) corresponded with coactivated clusters in bilateral ventral and left dorsal IFG, as well as activation in bilateral STG, parahippocampal and fusiform gyri, PCC, sensory areas, and the cerebellum. This component exhibited classic spatiotemporal properties of the “syntactic” P600, with a positive peak latency at ∼600 ms poststimulus. To ascertain the specificity of this component to semantic processes, we ran an ANCOVA to examine the relationship of this component to semantic and syntactic ability. JC4 had significant positive associations with both vocabulary ability (t(86) = 2.60, pcorr = 0.034; d = 0.56) and syntax (t(86) = 3.08; pcorr = 0.011; d = 0.66; sentence fluency), each contributing independent variance, with a trending relationship with working memory (t(88) = 2.23, pcorr = 0.076; d = 0.47; digit span sequential).
Joint component 5—conceptual coherence
The final joint component within the P600 range (P600c) had coactivation of regions within the DMN, particularly in the PCU, but with smaller loadings in the mPFC, left angular gyrus, and PCC. Additional coactivations could also be seen in bilateral STG, sensory areas, the insula, and the cerebellum. This component had a late positive peak latency at ∼750 ms poststimulus. An ANCOVA revealed that JC5 was significantly positively predicted by conceptual integration (t(86) = 3.36, pcorr = 0.005; d = 0.72).
Spatiotemporal patterns of LC ability
In the second set of analyses, we aimed to identify how individual differences in LC ability are related to the spatiotemporal signals described above. We found that global LC ability (defined by standard scores from the WRMT-III passage comprehension subtest, controlling for condition) was positively related to JC1 (P300; bilateral ATL; t(89) = 3.13; pcorr = 0.006; d = 0.66); negatively related to JC2 (N400; left IFG and left MTG; t(89) = −3.11; pcorr = 0.006; d = −0.66),1 and positively related to JC5 (P600c; DMN; t(89) = 2.58; pcorr = 0.019; d = 0.55; Benjamini–Hochberg FDR correction for five comparisons). This suggests that during sentence comprehension, efficient LC is associated with greater reliance on earlier automated retrieval processes centered in the ATL (BAs 22/38) and hippocampus, and late comprehension processes in the DMN, with decreased reliance on activation within the extended semantic control network in the N400 time window (Fig. 4a). We were next interested in whether the effect of earlier components on LC ability would be mediated by (i.e., “go through”) later components. Specifically, we anticipated that the effect of JC1 on LC ability would be mediated by JC2 and JC5, and the effect of JC2 on LC ability would be mediated by JC5. We found all three analyses had trending or significant mediations (all p values Benjamini–Hochberg FDR corrected for three comparisons). JC2 had a trending mediation effect on the relationship of JC1 with LC ability (t(89) = 1.52; b = 5.10; d = 0.32; pcorr = 0.077), and JC5 significantly partially mediated the relationship of JC1 with LC ability (t(89) = 1.55; b = 2.57; d = 0.33; pcorr = 0.040). Additionally, JC5 had a trending mediation effect on the relationship of JC2 with LC ability (t(89) = −1.69; b = −2.51; d = −0.36; pcorr = 0.073; Fig. 4b).
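The logic of these bootstrap mediation tests can be sketched in Python (all data below are simulated stand-ins; the actual analysis used the CANlab Mediation Toolbox in MATLAB with 10,000 bootstrap iterations):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 26  # matches the study's sample size; the data below are simulated

# Simulated stand-ins for JC1 loadings (predictor), JC5 loadings (mediator),
# and LC ability (outcome); effect sizes are invented for illustration
jc1 = rng.standard_normal(n)
jc5 = 0.6 * jc1 + 0.5 * rng.standard_normal(n)
lc = 0.5 * jc5 + 0.2 * jc1 + 0.5 * rng.standard_normal(n)

def indirect_effect(x, m, y):
    """Product-of-coefficients indirect (mediated) effect a*b."""
    a = np.polyfit(x, m, 1)[0]                     # path a: predictor -> mediator
    X = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(X, y, rcond=None)[0][2]    # path b: mediator -> outcome, controlling for x
    return a * b

# Percentile-bootstrap confidence interval for the indirect effect
boots = []
for _ in range(2000):
    idx = rng.integers(0, n, n)                    # resample subjects with replacement
    boots.append(indirect_effect(jc1[idx], jc5[idx], lc[idx]))
ci = np.percentile(boots, [2.5, 97.5])             # 95% CI excluding 0 -> significant mediation
```

A 95% bootstrap CI on a*b that excludes zero is the standard criterion for a significant indirect effect, which is the test reported for the JC1 → JC5 → LC ability path above.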
This provides initial evidence that in addition to the impact of individual components on LC ability, there may be dependencies between brain networks that support LC ability, including a potential neural path between early semantic phonological brain areas and later comprehension areas.
a, A comparison of single subject grand averages of ERP data for two representative subjects with high (red) and low (blue) LC scores. The group spatial components and related time windows (indicated by box length) for P300 (light blue), N400 (orange), and P600c (green) are overlaid onto the subject ERP responses, demonstrating the dynamic relationship among the significant components, and how those dynamics contribute to LC ability (the P300, N400, and P600). b, Mediation analysis revealed that DMN comprehension areas activated ∼750 ms poststimulus (JC5; green box) partially mediated the relationship between early semantic phonological areas activated at ∼300 ms poststimulus (JC1; blue box) and LC ability.
Discussion
We applied a novel fused MRI/EEG approach to elucidate language processes and to identify a temporally and spatially specified neurobiological model for LC ability. Our combined results suggest that semantic cognition during language processing involves a system of rapid sequential engagements of multiple cortical networks (Table 3). Specifically, semantic cognition during sentence reading is first marked by (1) early signals centered in the hippocampus and bilateral ATL (peak, ∼300 ms; corresponding with the P300 component), closely followed by and overlapping with (2) ATL coupling with the broader frontotemporal language network (peak, ∼400 ms; corresponding with the N400 component). These early language network patterns, respectively mapping to regions associated with semantic retrieval and control (Whitney et al., 2011; Davey et al., 2015), are followed by three networks that fall within the P600 waveform and have the following distinct spatial, temporal, and behavioral characteristics: (3) a hippocampal and right STG network corresponding with sequential working memory ability (peak, ∼500 ms), (4) a ventral and dorsal IFG network corresponding with vocabulary and syntactic ability (peak, ∼600 ms), and, last, (5) the DMN (particularly the posterior midline) corresponding with conceptual integration ability (peak, ∼750 ms). The particular mappings for each component reflect and extend previous psycholinguistic theories of sentence comprehension (Friederici, 2002; Fengler et al., 2016).
Summary table of findings for each of the joint components, including task, ERP component, top fMRI brain regions, behavioral correlates, top Neurosynth terms for the fMRI findings (not in order), and our interpretation based on these combined findings
To better contextualize the potential functional contributions of each joint component, we used the meta-analytic platform Neurosynth (www.neurosynth.org), which provides information on the functions most associated with a given brain region (Table 2, meta-analytic term associations with each component). Although this approach relies on reverse inference rather than experimental manipulation, it provides a data-driven framework for interpreting the possible functions associated with the temporal flow of information observed in the present study.
Our earliest components, JC1 and JC2, map to phonological/semantic memory and semantic integration areas, respectively. Interestingly, they also have overlapping time windows with peaks at ∼300 and ∼400 ms for JC1 and JC2, respectively (i.e., they are independent, but co-occur). For JC1 (P300), the mappings to the hippocampus, temporal lobes, and mPFC areas are consistent with previously described memory circuits, specifically automated spreading activations during the retrieval of dominant semantic characteristics of an item (Davey et al., 2016). These areas are coactivated with frontal and temporal regions associated in Neurosynth with phonological processing. Covariate analyses revealed that JC1 is positively related to basic word reading processes (i.e., sounding out words) and working memory. The combined findings suggest that JC1 may support phonological–semantic binding processes related to word reading. The sensitivity of JC1 to sentence congruence (with greater loadings for congruent sentences) suggests that these word-level processes are assisted by prior semantic context (i.e., JC1 interacts with predictive processing before the sentence final word). This interpretation is consistent with broader theories of the P300 (Azizian et al., 2006; Polich, 2007). Integrative theories suggest that P300 sources support template-matching processes in which information that matches an internal representation (e.g., automated semantic memory processes facilitated by sentences that are congruent with a person's real-world experience) results in a larger P300 amplitude than nonmatched information (but see Table 2). Our findings suggest that these early activations in frontal and temporal areas work in parallel with language-processing areas in JC2 (N400). 
JC2 regions fall within the well described semantic control network (i.e., regions that constrain word meaning retrieval based on the thematic content of the previous text, also called thematic semantic processes; Whitney et al., 2012; Davey et al., 2015, 2016; Walenski et al., 2019). The negative relationship between JC2 and sentence fluency ability, working memory ability, and sentence congruence suggests that JC2 is more prevalent with increased difficulty, whether at the subject or stimulus level. This interpretation is consistent with theories suggesting that the N400 supports probabilistic, thematic interpretation of a word based on the preceding context, and traces to a wave of activation across frontotemporal language areas (Kutas and Federmeier, 2011). Combined, JC1 and JC2 appear to form a complex of memory and meaning processes that overlap in the left superior ATL. Of note, neither JC1 nor JC2 correlated with vocabulary ability, despite being sensitive to semantic congruence manipulations, and mapping to canonical semantic processing areas. It is possible that the functionalities of JC1 and JC2 are less related to the depth of vocabulary knowledge than to the predictive and contextual retrieval of semantic information. However, further study is needed to distinguish the precise form of semantic cognition related to these components.
The JC1/JC2 complex is followed by three components that all fall within the P600 time window, but have distinct origins. JC3 is the earliest peak in the P600 time window, peaking at ∼500 ms poststimulus, and its spatial map includes the hippocampus and the prefrontal and posterior midline areas related to recognition memory, working memory, and language (Table 2). JC3 was also found to be significantly negatively related to an individual's working memory ability. The combination of latency, localization, and behavior in JC3 provides evidence that this component shares similarities with the characteristics of the late positive component (LPC; sometimes also referred to as the P300b). The LPC has been linked to post hoc memory schema updates, in which unexpected information either triggers greater long-term memory retrieval processes than expected information already stored in working memory (Olichney et al., 2000) or requires greater reliance on updates to an existing memory schema (DeLong et al., 2014; Richter, 2019), resulting in a larger LPC amplitude. This effect is dependent on the ease of this process, and so individuals with more automated memory and language processes would be expected to have lower jICA loadings, as is the case in our findings. In the context of psycholinguistic theory, the joint component described here may reflect communication between semantic and syntactic binding areas (STG; Skeide et al., 2014; Frankland and Greene, 2015) and memory schema-encoding areas (Milivojevic et al., 2016) to support “structured semantic combinations” that reference previously stored structures (Frankland and Greene, 2015).
JC4 (P600b) shares spatial and temporal features with the canonical P600 effect, with a peak at ∼600 ms poststimulus and a spatial map centered on the left IFG. The P600 effect has been proposed to reflect syntactic-only reappraisal (Osterhout and Holcomb, 1992), semantic–syntactic reappraisal (Kuperberg, 2007), semantic integration (Brouwer et al., 2012), and/or domain-general processes (Burkhardt, 2007; Shen et al., 2016), among others. In the present study, our findings support the theory that the P600b effect is driven by both semantic and syntactic reappraisal processes during sentence comprehension. First, JC4 includes well known semantic and syntactic frontal areas (ventral and dorsal IFG, respectively; Hagoort, 2005), as well as having significant and independent positive associations with vocabulary and syntactic ability. Additionally, our paradigm argues against a syntax-only interpretation: the sentence frames in which only the final word determined congruency ensure that syntactic processes for congruent versus noncongruent sentences are identical, and consequently allow for the isolation of semantic processes. Our findings support previous suggestions that reassessment in the context of sentence comprehension (in typical adults) is a dynamic process that does involve the semantic system at ∼600 ms poststimulus (Kuperberg, 2007).
The last component, JC5 (P600c), has a spatial map that includes key nodes of the default mode network and peaks at ∼750 ms poststimulus (Table 2). This component is also significantly associated with a subject's conceptual integration ability (see Materials and Methods). The location, latency, and behavior of JC5 are consistent with a role in situation model processing (i.e., the updating of new information into the reader's internal representation of the text; Whitney et al., 2009; Aboud et al., 2016; Baldassano et al., 2017). Specifically, the primary loading in the cuneus/PCU directly overlaps with several studies on narrative processing, which have found that the PCU is specifically sensitive to event boundaries within stories (Whitney et al., 2009; Baldassano et al., 2017). Whitney et al. (2009) have proposed that this portion of the PCU acts to update situation models at key narrative moments (i.e., “narrative shifts”), which corresponds with the more general association of the region with episodic memory (Zhang and Li, 2012). Previous work has also shown that incongruent information within a story elicits stronger responses in the PCU. Recent ERP work found strong evidence for a relationship between a P600 at similar latency and situation model updating (Burkhardt, 2007). Our finding that JC5 is sensitive to incongruent sentences is the first to consolidate fMRI and ERP literature findings to provide joint evidence that situation model updating may occur at the tail end of the P600 effect in sentence processing.
Overall, our results provide evidence that semantic processing of sentences involves early phonological semantic and thematic semantic processes occurring in the hippocampus/ATL/SMG and frontotemporal language circuit, respectively. These early activations are followed by hippocampal-based schematic memory processes potentially related to semantic–syntactic schema in the STG (Skeide et al., 2014; Frankland and Greene, 2015); inferior frontal semantic–syntactic reappraisal; and, last, conceptual integration in the DMN. From this high-resolution examination of semantic cognition, we were next able to provide evidence for a neurobiological model of typical LC ability. Through regression and mediation analyses, we found that less efficient LC is marked by decreased early reliance on a focal phonological semantic memory network (JC1); this is followed by greater reliance on a wider language network related to the N400 (JC2), potentially via the left ATL hub shared by JC1 and JC2; this complex is then followed by decreased reliance on the comprehension areas of the DMN (JC5). Mediation analysis showed that the relationship between early phonological semantic areas and LC ability is mediated by DMN comprehension areas, with trending support for a progression of mediation effects in which the impact of early semantic phonological signals on LC ability is mediated by, or “goes through,” thematic semantic and comprehension networks. This suggests that the relationships across components, as well as the individual components themselves, are important in predicting LC ability. Because our methods do not provide evidence for directionality, future studies with causal designs should test the dependent relationships across components, including whether early semantic/phonological components drive comprehension networks that lead to LC ability, and/or whether later DMN comprehension areas gate the impact that early components have on LC ability.
Additional studies should also test whether joint components centered on different electrode groupings and captured at different time points in a sentence are able to identify networks that predict or interact with the ones described in the current manuscript. It is unclear from our findings why JC3 and JC4 are not related to LC ability. It is possible that the memory and linguistic binding processes in JC3 and JC4 are more relevant to global reading success in longer texts, or that these signals would interact with LC ability in different populations with atypical LC ability. Additional study is needed to ascertain in which contexts these components may interact with LC ability.
The present study demonstrates the feasibility of applying a fused MRI/EEG approach to disentangle the complex temporal unfolding of processes necessary for LC. However, the data-driven approach in the present study has a key limitation: because of the absence of experimental manipulations in the paradigm, it was not possible to specifically test the neurocognitive profiles for each joint component. Nevertheless, the current results lay the groundwork for future studies using the joint ICA pipeline to test a range of language paradigms and clinical populations. Future studies should examine how interventions such as noninvasive brain stimulation may impact these signals so that causal relationships between components and LC ability/outcomes can be established.
By fusing MRI and EEG to elucidate language processes and identify a temporally and spatially specified neurobiological model for LC ability, this approach provides a new conceptual and methodological paradigm in which to examine speech and language deficits/disorders, and may have additional implications for the examination and treatment of clinical populations in other cognitive domains.
Footnotes
This research was supported by the following funding sources: National Institutes of Health (NIH)/Eunice Kennedy Shriver National Institute of Child Health and Human Development Grants P20-HD-075443, R01-HD-044073, R01-HD-067254, and U54-HD-083211; and NIH/National Center for Advancing Translational Sciences Grants UL1-TR-000445, P30-HD-015052, and DP5-OD-031843.
The authors declare no competing financial interests.
1Of note, jICA loadings indicate how similar a subject's spatial/temporal data are to the group data, which is not dependent on the directionality of the component itself. As such, a negative correlation between LC ability and loadings on a negative component like the N400 indicates that higher LC ability is associated with less of an N400-like waveform in the subject's ERP time course.
Correspondence should be addressed to Katherine S. Aboud at Katherine.aboud@vanderbilt.edu