Abstract
Although it is uncontroversial that word meanings shift depending on their context, our understanding of contextualized lexical meaning remains poor. How is a contextualized semantic space organized? In this MEG study (27 human participants, 16 women, 10 men, 1 nonbinary), we manipulated the semantic and syntactic contexts of wordforms to query the organization of this space. All wordforms were noun/verb ambiguous and varied in the semantic distance between their noun and verb uses: unambiguous stems, polysemes with distinct but related meanings, and homonyms with completely unrelated meanings. The senses of each stem were disambiguated by a unique discourse sentence and the items were placed in syntactic contexts of varying sizes. Univariate results characterized syntactic context as a bilateral and distributed effect. A multivariate representational similarity analysis correlated one-hot models of the categorical factors and contextualized embedding-based models with MEG activity. Of all models representing ambiguity, only a model differentiating between syntactic categories across contexts correlated with the brain. An All-Embeddings model, where each contextualized word had a distinct representation, explained distributed neural activity across the left hemisphere. Finally, a Syntactic Context model and Within-Context-Stem model were significant in left occipitoparietal regions. While the noun versus verb contrast affected neural signals robustly, we saw no evidence of the homonym–polyseme–unambiguous contrast, over and above the evidence for fully itemized representations. These findings suggest that in contexts devoid of ambiguity, the neural representation of a word is mainly shaped by its syntactic category and its contextually informed, unique semantic representation.
- contextualized meaning
- magnetoencephalography
- representational similarity analysis
- semantic ambiguity
- semantic representation
- syntactic category
Significance Statement
A word's context can define its meaning. Context is an integral part of understanding language, yet the organization of the semantic space formed by words in context remains unclear. We used magnetoencephalography to investigate the dynamic interaction between contextualized semantic representations, syntactic categories, ambiguity, and local syntactic contexts. We find a left-lateralized network encoding a semantic space in which each contextualized instance of a word has a distinct neural representation, while syntactic category has a broad bilateral representation. Our study provides a link between naturalistic multivariate studies of item/word-level semantic processing and more traditional controlled factorial investigations of lexical meaning. These findings enrich our understanding of the neural underpinnings of words in context and highlight the role of syntactic context.
Introduction
Context not only enriches a word's meaning—pink table is more specific than table—but it can also alter the word's meaning. For example, in “set the table,” “delete the table,” and “to table a discussion,” the meaning of “table” ranges from a household object to ending a conversation, showing how the same wordform changes conceptually across contexts.
Language is naturally ambiguous and most words are like table. Previous work on lexical ambiguity has investigated the resolution of ambiguity during sentence processing (MacDonald, 1994; MacDonald et al., 1994), differences in processing between ambiguous and unambiguous items (Chan et al., 2004; Hoenig and Scheef, 2009; Mizrachi et al., 2024), and differences in degrees of sense relatedness in ambiguous words (Beretta et al., 2005; Rodd, 2018). However, the role of context in shaping lexical representations remains unknown.
Ultimately, understanding the dynamics of sentence processing will require an understanding of how local syntactic and semantic contexts impact neural representations of words. The question of how language is affected by context is not new. A growing use of Large Language Models (LLMs) in neuroscience has begun to shed light on the neural instantiation of contextualized meaning. Contextualized vector embeddings have been shown to linearly predict brain activity in both fMRI and MEG (Caucheteux and King, 2022; see Arana et al., 2023 for a review), to have semantic spaces whose geometry resembles that of neural activity (Goldstein et al., 2024), and even to serve as a model for the shared communicative space between speakers and listeners (Zada et al., 2023). Akin to instance models of semantic representation (Jamieson et al., 2018), the idea behind a contextualized meaning space is that it contains each and every contextual modification of table, from its use with a modifier to a change in its syntactic category.
Multivariate investigations of semantic representation have established the existence of a distributed semantic feature space (Frisby et al., 2023) such as the one suggested in Hagoort’s Memory, Unification and Control framework (Hagoort, 2013), or work by Gaskell and Marslen-Wilson (1997) and Vigliocco et al. (2004). These distributed patterns of cortical activity, when coactivated, form word-level representations. In hub and spoke models (Patterson and Lambon Ralph, 2016), these features are bound into individual conceptual or word-level representations in the anterior temporal lobe. Although distributed word representations are not the focus of this study, they have paved the way for the concept of a distributed semantic space (Huth et al., 2016) where contextualized words have their own distinct patterns of activity.
What is the fundamental organization of the meaning space activated by words in context? Lexical items can be grouped into morphemes linking multiple related senses into a shared abstract representation, a standard assumption in linguistics. However, since we know that word meanings adjust to their context, neural representations of the same item could be sufficiently altered by their local context such that there is no consistent pattern. Each context effectively creates a specific representation of a word. This is the most fine-grained and least abstract hypothesis and thus the least controversial one. Different theories of lexical and syntactic representation then offer many levels of abstraction away from a fully contextualized space (Fig. 1).
A, Stimuli-specific semantic space. The trees contain the actual stimuli presented to the participants for three target words (unambiguous “dreams,” polyseme “flies,” and homonym “dates”) organized into a hierarchy of representations. B, RSA models of different organizational hypotheses of the semantic space shown in A. Models are hierarchically organized based on the level of representation in A that the model represents. This figure only contains models that formulated hypotheses about the organization of semantic space. C, Experiment design. The table contains the discourse sentence and stimuli for three example words from each ambiguity type.
We probed the organization of the contextualized semantic space with magnetoencephalography (MEG) and noun/verb ambiguous items with different degrees of semantic distance between their senses. The target words were embedded in local syntactic contexts of varying sizes to reveal the systematic effect of increased contextual modulation. The sentences were additionally preceded by a discourse sentence to maximally distinguish the trials from each other. In addition to a univariate analysis, we used representational similarity analysis (RSA) to correlate brain activity with different organizational hypotheses using both one-hot encoding of our categorical factors as well as contextualized embeddings extracted from RoBERTa (Liu et al., 2019). This approach provided a fairly complete univariate and multivariate characterization of the influence of semantic ambiguity, syntactic categories, and contexts on semantic space.
Materials and Methods
Participants
Twenty-seven participants (16 women, 10 men, 1 nonbinary; mean age = 23 years) took part in the MEG experiment; after exclusions due to left-handedness, failure to return for the second session (n = 5), cognitive impairment, and excessive noisiness of the recorded data, 17 participants were entered into the analysis. We nevertheless collected a relatively large amount of data from each participant, approximately 1.5 h in total, with an average of 45 min of recording per session. To accomplish this, participants came to the lab twice, for Session 1 and Session 2, and were presented with half of the experimental materials during each session. All participants were native or near-native speakers of English, gave informed consent, and had normal or corrected-to-normal vision.
Design
The experiment was a 2 × 3 × 3 factorial design with factors Syntactic Category (Noun and Verb), Ambiguity Type (Unambiguous, Polysemous, Homonymous), and Syntactic Context (Null, Minimal, Embedded), as shown in Figure 1C. There were 1,008 trials (504 per session) with a total of 18 conditions. As each target word appeared in each syntactic context four times, we had a total of 56 trials per condition. Prior to the critical item and its local syntactic context (Null, Minimal, or Embedded), we presented a discourse sentence which served to bias the sense of the critical item. The lexical material of the discourse sentences and the local syntactic material preceding the critical item included no lexical repetitions across the experiment. This created a large amount of variability across the stimuli, which served our goal of enhanced ecological validity. Although ambiguity was a categorical factor, the primary purpose of this manipulation was to serve as an additional source of variance.
The target words were embedded in local syntactic contexts of varying sizes. In the Null context, the bare word was presented. In the Minimal context, the target word was embedded in a minimal phrase: either a verb phrase (VP) eliciting a verb use or a determiner phrase (DP) eliciting a noun use. In the Embedded context, the minimal phrase was in turn embedded within a six-word sentence in sentence-final position; the target word was thus always sentence final. Consequently, disambiguation was fullest in Embedded contexts and weakest in Null contexts. To increase naturalness, all other words preceding the embedded minimal phrase varied across sentences, which created a large amount of conceptual and syntactic variation in the Embedded contexts. We chose a design where stimuli, as a function of the context manipulation, varied in their naturalness to relate our results to a wider range of literatures and to fully leverage the properties of the semantic space we created.
In line with testing different hypotheses about the nature of lexical ambiguity and how differences in sense and meaning are represented (for review, see Falkum and Vicente, 2015), no further restrictions apart from plausibility were imposed on the stimuli. A discourse sentence was presented before the critical stimuli to create bias toward one sense over another. These discourse sentences were of varying lengths and additionally served as a discourse context that further added to the naturalness of the experiment. We aimed to have ∼4–5 words in the discourse sentences (mean = 4.6 words, SD = 1.12; mean Null = 4.45 words, SD = 1.05, mean Min = 4.37 words, SD = 1.10, mean Emb = 4.98, SD = 1.08).
Stimuli
Forty-two lexically ambiguous words were selected, 14 per ambiguity type (unambiguous, polyseme, homonym). For each word, two specific senses were chosen, one per syntactic category, drawn from the four most frequent WordNet senses for that category.
Words were classified as homonyms if their senses had separate entries in Wordsmyth (https://www.wordsmyth.net/), while polysemes were words with senses under the same entry. Both a categorical and a continuous measure of the relatedness between the two senses of each target word were normed through an online norming study. All target words had noun and verb senses. For words with multiple verb or noun senses, only one noun and one verb sense per target word were used throughout the stimuli. The ambiguity of the unambiguous stems was not semantic but rather due to the syntactic category of the stem in context. These words were further defined as stems whose noun and verb senses had a circular meaning: to dream is to have a dream, and a dream is something that is dreamt. We additionally checked in Wordsmyth that these senses shared the same lexical entry. For clarity, the term "stem" will be used when referring to the wordform (dream) and "sense" when referring to a specific contextualized meaning (dream_n, "a dream"; see Fig. 1A for all possible divisions of the semantic space).
Our target words can be seen as a highly controlled minilexicon with specific linguistic and semantic properties and thus as constituting their own semantic space. The target words were selected controlling for word length, frequency, concreteness (Brysbaert et al., 2014), and intransitivity using the Google Syntactic NGrams "English 1 Million" Corpus (wilcoxeg, 2022) in order to facilitate their compatibility with the stimulus sentence frames. All words were high frequency, with a Zipf value above four, and sampled from the SUBTLEX database (Brysbaert and New, 2009). Selection was further restricted by monitoring the dominant part of speech and its frequency relative to other parts of speech. This was determined by comparing frequencies of the WordNet synonym set associated with the chosen noun and verb senses. The dominant part of speech for each word was additionally indicated in the SUBTLEX corpus. Of the 42 target words chosen, 11 were noun-dominant, 13 were verb-dominant, and 14 had equally dominant senses. We also verified that the target words in a minimal context could be sentence final by using the Corpus of Contemporary American English (Davies, 2008). We searched the corpus with the target word preceded by a PRON tag marking a pronoun and followed by a period to mark the end of a sentence, e.g., PRON flies_n. All minimal phrases had a corpus frequency above zero.
Discourse-sentence bias ratings
In order to validate the biasing effect of the discourse sentences, we ran an online discourse-sentence bias experiment through Gorilla. Thirty-four participants were recruited using SONA, our university recruitment platform. All participants were native or near-native speakers of English without any language disorders and completed the Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007). Mean age was 19.2 years (SD = 1.11); there were 20 women, 10 men, and 1 nonbinary person; mean age of English acquisition was 1.45 years (SD = 3.04). Three participants declined to answer.
As in the sense-norming experiment, participants were presented with the biasing discourse sentence and, beneath it, the noun and verb versions of the target words with short definitions or synonyms underneath to pinpoint the intended senses. The stimuli were divided into eight blocks, and participants were asked to take as long as they needed to make a judgment. To maintain attention and exclude participants who were not attentively making judgments, simple math equations were presented before the breaks between blocks. Participants with reaction times more than 3 SD from the mean were excluded from analysis, as were participants whose distribution of responses on the Likert scale was more than 3 SD from the participant mean. Only three participants were excluded from analysis.
For each trial, participants were asked to rate the biasing strength of the discourse sentence on a 7-point Likert scale ranging from strongly noun to strongly verb. Crucially, the stimulus sentence was not included in the experiment. The task instructions were as follows: “Which meaning of the ambiguous word do you associate with this sentence? Rate the following sentences on a scale from 1-7 ranging from 1: strongly noun, 2: somewhat noun, 3: a little noun, 4: both, 5: a little verb, 6: somewhat verb, 7: strongly verb.” Since the noun–verb distinction was the only dimension that separated the two senses, we asked our participants to provide noun–verb biasing judgments. None of the participants rating the stimuli participated in the final MEG experiment or the sense-norming experiment.
Results
In order to understand the distance between the intended sense and the one accessed by our participants, we calculated a mean distance score between their responses and the “true” sense, which was assumed to be 1 for intended noun senses and 7 for intended verb senses. We expected that the homonyms and polysemes would be easier to differentiate than our unambiguous targets and that the distance scores would be lowest for the homonyms and highest for the unambiguous items. The mean distance score across all items was 1.82 (SD = 1.02).
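The distance score is simply the absolute difference between each rating and the intended pole of the scale; a minimal sketch of the computation (the arrays are toy data and the names hypothetical):

```python
# Minimal sketch of the distance-score computation; arrays are toy data.
import numpy as np

ratings = np.array([2, 1, 6, 7, 3])     # 1-7 Likert responses per trial
intended = np.array([1, 1, 7, 7, 1])    # 1 = intended noun, 7 = intended verb
distance = np.abs(ratings - intended)   # per-trial distance from the "true" sense
print(distance.mean(), distance.std())  # mean distance score and its SD
```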
As shown in Figure 2, the judgment data yielded generally low distance scores, showing that our biasing sentences successfully disambiguated our target words. As expected, the unambiguous targets elicited the highest distance scores (mean distance = 2.760, SD = 0.857), consistent with the fact that the noun and verb senses of our unambiguous targets were extremely close to each other and thus neither was grossly incompatible with the biasing sentence. Homonyms elicited low distance scores (mean distance = 1.22, SD = 0.596) and polysemes only slightly higher ones (mean distance = 1.47, SD = 0.825), indicating good disambiguation for both.
Discourse-sentence bias ratings. Results from a stimulus norming experiment assessing the sense disambiguating force of our discourse sentences. Subjects rated the noun–verb disambiguating impact of each discourse sentence on a Likert scale (1–7). The violin plot shows the distances between the subjects’ ratings and the intended category membership of each target. Since unambiguous targets had noun and verb senses that were extremely close to each other, their distance scores were expected to be higher, which is what is observed here. Polysemes and homonyms were effectively disambiguated by the discourse sentences, as evidenced by the low distance scores.
The effect of Ambiguity Type was significant (F(2,990) = 406.57, p < 0.001, η2 = 0.450) in a three-way ANOVA on the distance scores, which also assessed whether the distance scores differed as a function of Syntactic Category or Syntactic Context. This large effect of Ambiguity Type was driven by unambiguous items having higher distance scores than polysemes (p < 0.001) or homonyms (p < 0.001), confirming that ambiguous items are easier to disambiguate.
The main effect of Syntactic Category was significant (F(1,990) = 20.42, p < 0.001, η2 = 0.020), with verbs showing slightly better disambiguation (i.e., lower distance scores) than nouns (p = 0.016). The main effect of Syntactic Context was also significant (F(2,990) = 10.23, p < 0.001, η2 = 0.020), albeit small, with Minimal contexts showing stronger disambiguation than Embedded ones (p = 0.0195). This subtle effect was, however, limited to verbs, manifesting as an interaction between Syntactic Context and Syntactic Category (F(2,990) = 4.84, p = 0.008; η2 = 9.69 × 10−3). Overall, all effects apart from Ambiguity Type had small effect sizes, and the distance scores of our ambiguous targets were low, as desired.
Sense-norming experiment
To determine the semantic distances between the noun and verb forms of our stems, a sense-norming experiment was conducted online through Gorilla. Sixty-two participants were recruited through both social media and Prolific. All participants were native or near-native English speakers without any language disorders. They all completed the Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007). Mean age was 30.8 years (SD = 13.06); there were 25 women, 36 men, and 1 nonbinary person; mean age of English acquisition was 0.79 years (SD = 1.95).
Participants were presented with the two senses of a given stem with a short definition in parentheses underneath. For half of the trials, the participants were asked to give a categorical judgment about the relatedness of the two senses. In the other half, they were asked to rate the relatedness of the two senses on a Likert scale (1–7) ranging from 1: completely unrelated, to 7: share the same meaning. Which stems received each judgment was counterbalanced between participants. None of the participants norming the stimuli participated in the MEG experiment.
To mitigate against the common intuition that the noun and verb uses of a stem are different “words,” participants were warned that they would be presented with noun/verb ambiguous items. In the categorical judgment task, the general task instructions were, “For each noun-verb pair, please indicate if the two meanings are related.” Then for each trial, participants were asked, “Are these two meanings related?”. The task instructions for the continuous judgment task were, “For each noun-verb pair, please indicate how related the two meanings are”. For each trial, participants were asked “How close are these two meanings?”.
Unambiguous items had a mean Likert score of 6.239 (SD = 0.333) and all words were categorically judged as having related senses. Polysemes had a mean Likert score of 3.215 (SD = 1.086) and 10/14 words were judged as having related senses. Homonyms had a mean Likert score of 1.626 (SD = 0.516) and all words were judged as having unrelated senses. A one-way ANOVA (Fig. 3A) on the continuous judgments showed a significant effect of Ambiguity Type on the relatedness score between senses (F(2,39) = 148.2, p < 0.001, η2 = 0.610). A follow-up post hoc Tukey HSD test indicated that the relatedness score for unambiguous words was significantly greater than for polysemous words (p < 1 × 10−6) or homonyms (p < 1 × 10−6) and that the score for polysemous words was significantly greater than that of homonyms (p = 1.95 × 10−6).
Sense-norming experiment. Results from a sense-norming experiment which normed the ambiguity of the target words. Subjects characterized the sense relatedness of each word using both categorical yes–no judgments and ratings on a Likert scale (1–7). We observed a categorical distinction between ambiguity types for both categorical and continuous judgments. A, Violin plot of the Likert scores for each target word, showing the mean Likert score for each Ambiguity Type. Significant differences between types, evaluated with a post hoc Tukey HSD test, are marked with asterisks above the violin plots. B, Predicted probabilities and logistic regression plot of the categorical relatedness judgments. Error bars represent the 95% confidence interval for the predicted probabilities per category, estimated using the standard error.
A logistic regression model (Fig. 3B) was fit on the categorical judgment data to see if ambiguity type would predict a yes–no relatedness response. The model significantly predicted the data [LL = −425.3765 (df = 3), p < 0.001]. The predicted probabilities of a yes-related response per category were 0.095 for homonyms, 0.4067 for polysemes (OR = 6.76, 95% CI [4.56, 10.31]), and 0.988 for unambiguous words (OR = 789.32, 95% CI [354.01, 2118.53]).
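For illustration, a sketch of such a logistic regression using statsmodels on toy long-format data; the counts and column names below are hypothetical, not the experiment's actual judgments:

```python
# Sketch: logistic regression of yes-no relatedness on ambiguity type.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data standing in for the real judgments (counts are illustrative).
df = pd.DataFrame({
    "ambiguity": ["homonym"] * 20 + ["polyseme"] * 20 + ["unambiguous"] * 20,
    "related": [0] * 18 + [1] * 2 + [0] * 12 + [1] * 8 + [0] * 1 + [1] * 19,
})
fit = smf.logit("related ~ C(ambiguity, Treatment('homonym'))", data=df).fit()
print(fit.llf)               # log-likelihood of the fitted model
print(np.exp(fit.params))    # odds ratios relative to the homonym baseline
newdata = pd.DataFrame({"ambiguity": ["homonym", "polyseme", "unambiguous"]})
print(fit.predict(newdata))  # predicted probability of a "related" response
```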
Procedure
Before the experiment, each participant's head shape and positions of five marker coils and three fiducials were digitized using a Polhemus FastSCAN system. The positions of the marker coils were localized at the beginning and end of the experiment.
Participants were then led through a practice session with practice stimuli that were not included in the final experiment to familiarize them with the pace of presentation and the task. During each session, participants were instructed to restrict eye movements to the breaks between blocks and the start screens before each trial. Each word and discourse sentence was presented in white on a black screen in Futura font with a height of 3.5% of the presentation window. The screen was placed at a distance of 50 cm from the participants. The stimuli were presented using Rapid Serial Visual Presentation (RSVP), except for the discourse sentence, which was presented as a whole and immediately preceded the target sentence. Subjects initiated each trial with a button press. To reduce the length of the experiment, the four words preceding the target phrase were chunked into pairs for the Embedded context. The presentation rate of the target sentences (bare word, minimal phrase, or full sentence) was 250 ms on and 250 ms off. The trial structure is illustrated in Figure 4.
Trial structure and task. This is a sample trial for the target word flies. The target word, indicated here in bold, was always stimulus final. For the Embedded contexts, the four preceding words were presented on a single screen as a chunk to reduce experiment length. Screens displaying the target stimulus are indicated by thicker squares. The task, a forced-choice semantic relatedness judgment, was randomly presented after 30% of trials.
Trials were randomly followed by a task 30% of the time. Participants were presented with a forced choice semantic relatedness task where they had to choose which of two words shown on the screen better matched the overall meaning of the previous sentences. As the task was meant to ensure proper comprehension, one of the two choices was an obvious synonym of the target word and the other was randomly selected from a sample of the top 220,000 wordforms in the Corpus of Contemporary American English (Davies, 2008). All words included in the sample had a frequency higher than 200 and were not related to the target words.
Out of concern about the length of the experiment, the stimuli were divided so that half were presented during the first session and half during the second session. This division was the same across participants, and the stimuli were randomized within each session. Participants were in the MEG for 45 min on average per session.
MEG data acquisition and preprocessing
Magnetoencephalography data was recorded using a 157-channel axial gradiometer whole-head MEG system (Kanazawa Institute of Technology) at a sampling frequency of 1,000 Hz while the participants lay down in a dimly lit magnetically shielded room. A photodiode was used to estimate the delay between the onset of the visual stimulus and the triggers. The trials were corrected using this delay during preprocessing. Online 200 Hz low-pass and 0.1 Hz high-pass filters were used during recording. Because of excessive environmental noise, four subjects were recorded with a 0.3 Hz high-pass filter for Session 1 and eight subjects were recorded at 0.3 Hz for Session 2.
Since participants were recorded in two sessions, for each session, data blocks were concatenated and then noise reduced using the Continuously Adjusted Least-Squares Method available in MEG 160. The raw data was then preprocessed using MNE Python (Gramfort et al., 2013). Each session was preprocessed and subsequently analyzed separately.
The data was first filtered using a bandpass filter between 1 and 40 Hz (the high-pass filter was necessary given NYC environmental noise). The trials were then segmented into epochs with a total duration of 1 s, from 200 ms before to 800 ms after stimulus onset. Trials were demeaned across the whole epoch due to the varying size of the stimuli. Ocular and cardiac artifacts as well as radio noise and environmental artifacts were removed using independent component analysis (ICA). Trials with a peak-to-peak amplitude exceeding 3,000 fT after noise reduction were automatically rejected, and any bad channels were removed and interpolated (min = 1, max = 5). Ten participants were excluded from the final analysis due to excessive environmental noise, failure to participate in both sessions (n = 5), cognitive impairment, or left-handedness disclosed after data collection.
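A condensed MNE-Python sketch of this per-session pipeline; the file name, event extraction, and excluded component indices are placeholders:

```python
# Sketch of the per-session preprocessing; values follow the text above.
import mne

raw = mne.io.read_raw_kit("session1_NR.sqd", preload=True)  # noise-reduced data
raw.filter(l_freq=1.0, h_freq=40.0)                         # 1-40 Hz band-pass

ica = mne.preprocessing.ICA(n_components=0.95, random_state=0)
ica.fit(raw)
ica.exclude = [0, 1]             # ocular/cardiac/noise components, by inspection
ica.apply(raw)

events = mne.find_events(raw)
epochs = mne.Epochs(
    raw, events, tmin=-0.2, tmax=0.8,
    baseline=(None, None),       # demean across the whole epoch
    reject=dict(mag=3000e-15),   # drop trials exceeding 3,000 fT peak-to-peak
    preload=True,
)
epochs.interpolate_bads()        # repair channels marked bad during recording
```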
Source estimation was done in MNE Python. First, an average FreeSurfer brain was scaled and coregistered to match each participant's digitized headshape and fiducials. On this scaled surface, we generated a source space with 2,562 vertices per hemisphere. Forward solutions were then computed for each participant using a boundary element model. A channel noise-covariance matrix was generated per session using the individual epochs and baseline corrected using the 100 ms before target word onset. Finally, the inverse solution for each subject was computed using an SNR of 3 and applied to the evoked response for the univariate analysis, and with an SNR of 2 applied to each epoch for the multivariate analyses. The inverse solution was used to compute L2 minimum norm source estimates, which yielded noise-normalized dynamic statistical parameter maps (dSPM). For the univariate analysis, the data was averaged in sensor space per condition for each participant before source estimation. For the RSA, source estimates were computed for each trial.
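Continuing from the epochs above, the source-estimation steps might look as follows in MNE-Python; coregistration and scaling of fsaverage are omitted, and the "ico4" spacing yields the 2,562 vertices per hemisphere mentioned above:

```python
# Sketch of dSPM source estimation; the trans file is a placeholder.
import mne
from mne.minimum_norm import (apply_inverse, apply_inverse_epochs,
                              make_inverse_operator)

src = mne.setup_source_space("fsaverage", spacing="ico4")  # 2,562 per hemisphere
model = mne.make_bem_model("fsaverage", conductivity=(0.3,))
bem = mne.make_bem_solution(model)
fwd = mne.make_forward_solution(epochs.info, trans="subject-trans.fif",
                                src=src, bem=bem)
cov = mne.compute_covariance(epochs, tmin=-0.1, tmax=0.0)  # 100 ms baseline
inv = make_inverse_operator(epochs.info, fwd, cov)

# SNR 3 for the averaged evoked response (univariate analysis) ...
stc_evoked = apply_inverse(epochs.average(), inv, lambda2=1 / 3**2, method="dSPM")
# ... and SNR 2 applied to each epoch (multivariate analyses)
stcs = apply_inverse_epochs(epochs, inv, lambda2=1 / 2**2, method="dSPM")
```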
MEG data analyses
Whole-brain univariate analysis
The goal of the univariate analysis was to create a clear link to existing literature through the replication of known compositional and lexical effects, represented by the three factors of our experiment design. Classic theories of ambiguity posit that (contextual) ambiguity can be seen as a categorical factor (Persson, 1988). Words with the same ambiguity type should have similar structural relationships between their meanings and senses, which should translate into similar types of processing within each ambiguity level. On a univariate level, we tried to detect general categorical patterns in line with the hypothesis that ambiguity type influences patterns of neural activity.
A mixed-measures ANOVA with Ambiguity Type, Syntactic Category, and Syntactic Context as fixed factors and Session and Subject as random factors was conducted across the whole brain at each timepoint and each vertex from 0 to 650 ms after target onset (to avoid including the onset of task processing in the analysis). Spatiotemporal cluster-based permutation tests (Maris and Oostenveld, 2007) were conducted on uncorrected clusters of the resulting F values to test for the presence of each main effect and interaction in order to establish significance. Clusters had to have an F value corresponding to an uncorrected p value of 0.05, a minimum duration of 20 ms, and include at least 20 adjacent sources in order to be included in the analysis. Only main effects and interactions between fixed effects for the whole-brain analysis will be reported in detail.
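In eelbrain, such a test can be sketched as follows; the factor and variable names are hypothetical, and argument names vary slightly across eelbrain versions:

```python
# Sketch of the mass-univariate ANOVA with cluster-based permutation testing.
from eelbrain import testnd

res = testnd.ANOVA(
    "src",                                         # vertex-by-time source NDVar
    x="ambiguity * category * context * subject",  # subject as random factor
    data=ds,                                       # trial Dataset (hypothetical)
    tstart=0.0, tstop=0.65,                        # 0-650 ms analysis window
    samples=10000,                                 # permutations
    pmin=0.05,                                     # uncorrected cluster threshold
    mintime=0.02,                                  # clusters last >= 20 ms
    minsource=20,                                  # >= 20 adjacent sources
)
print(res.find_clusters(0.05))                     # table of significant clusters
```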
Representational similarity analysis
RSA (Kriegeskorte et al., 2008) was used to pursue our main goal of evaluating the structure and organization of the contextualized meaning space. In addition to addressing this general question, we investigated two other questions: can we abstract a representation of form that is independent of meaning, and is there a representation of syntactic category that is independent of meaning? In order to address the relationship between the representation of meaning and that of form and syntactic category, we used RSA to assess the representational similarity between the neural activity in each trial and representational dissimilarity models (RDMs). Each model tested an underlying hypothesis about the structure or organization of a contextualized semantic space, such as instance-like contextual representations or sense- versus form-level representations. As shown in Figure 1B, these hypotheses are not mutually exclusive but rather operate at varying hierarchical levels of organization. We have outlined each hypothesis below. Since this experiment explores different organizations of a contextualized semantic space, we focused on our multivariate analysis, which allowed for a more nuanced approach to differences in representation; our univariate analysis was run as an additional sanity check of the data. Though multiple model RDMs were tested, they were not created or evaluated with the goal of optimizing model fit but rather with that of testing different underlying hypotheses.
The data were first downsampled to 500 Hz in order to maximize computational efficiency. Single-trial dSPM source estimates were generated using the same window of analysis and method used for the univariate source estimates. The final analysis was a spatiotemporal searchlight using a 20 ms sliding window and a search area of 20 vertices. Each window and search area formed a spatiotemporal patch. For each patch, the pairwise dissimilarity between the neural activity for all trials was computed using 1 − Pearson correlation as the measure of dissimilarity. Then, a second-order Spearman rank correlation was computed between the source-estimate patches and the model RDMs. This formed a source time course of the overall correlation between the models and neural activity in each session. The RSA was performed using the MNE-RSA package (https://users.aalto.fi/∼vanvlm1/mne-rsa/api.html) in Python.
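A sketch of this searchlight using the mne-rsa package; argument names differ somewhat across package versions, and the model RDM is assumed to be precomputed:

```python
# Sketch of the spatiotemporal searchlight RSA over single-trial dSPM estimates.
import mne_rsa

rsa_stc = mne_rsa.rsa_stcs(
    stcs,                          # single-trial source estimates (one session)
    model_rdm,                     # model RDM (see the models described below)
    src=src,                       # source space, defines spatial adjacency
    temporal_radius=0.02,          # 20 ms sliding window
    spatial_radius=0.02,           # searchlight patch (paper: 20 vertices)
    stc_rdm_metric="correlation",  # neural RDM: 1 - Pearson correlation
    rsa_metric="spearman",         # model-to-brain: Spearman rank correlation
    n_jobs=-1,
)
```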
The correlation source time courses between each session and the chosen models were z-transformed using arctanh and morphed back to the average FreeSurfer brain for all subjects before statistical analysis. A one-sample t test against a mean of 0 was realized on all the morphed source estimates from both sessions to evaluate the significance of each model on a group level. The resulting t value map was used as input for a spatiotemporal cluster-based permutation test (Maris and Oostenveld, 2007) using 10,000 permutations. Clusters had to have a t value corresponding to an uncorrected p value of 0.05, a minimum duration of 40 ms and include at least 20 adjacent sources in order to be included in the analysis. All univariate and multivariate cluster-based permutation testing was performed using the eelbrain package in Python (Brodbeck et al., 2023).
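The group-level test can be sketched with eelbrain's one-sample cluster test, assuming a Dataset ds whose "rsa_z" entry holds the arctanh-transformed (Fisher z) correlation maps, one per participant and session:

```python
# Sketch of the group-level test on z-transformed, morphed correlation maps.
from eelbrain import testnd

res = testnd.TTestOneSample(
    "rsa_z", data=ds,   # one map per participant x session, against a mean of 0
    samples=10000,      # permutations
    pmin=0.05,          # uncorrected cluster-forming threshold
    mintime=0.04,       # clusters last >= 40 ms
    minsource=20,       # >= 20 adjacent sources
)
print(res.find_clusters(0.05))
```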
One-hot encoding models and associated hypotheses
The multivariate analysis proceeded in two steps. First, nine simple theoretical and behavioral models representing certain aspects of the factorial design were created. These aimed to replicate the univariate analysis in multivariate space and had the advantage of using a basic representational encoding of the data. Only the four one-hot models that are most relevant to the research question will be explained in detail. The following additional one-hot models are explained in Extended Data Table 6-1: Senses, Distance, Likert, Ambiguity, Three-way Ambiguity, and Composition.
The Syntactic Category model tests the hypothesis that words with the same syntactic category share an underlying neural representation regardless of meaning or local context. As such, the model encodes the difference between nouns and verbs (Vigliocco et al., 2011; Crepaldi et al., 2013) regardless of the context or target word associated with the trial: items were given a dissimilarity of 0 when compared with items of the same syntactic category and 1 when compared with words from the other category.
The Syntactic Context model tests the hypothesis that being embedded in a local syntactic context has a regular and linear effect on all words regardless of their meaning. It groups words based on the sheer amount of local context preceding them, represented simply as the number of words present in the target sentence of each trial: one for Null, two for Minimal, and six for Embedded. Importantly, it does not include the discourse sentence in its definition of local context. Dissimilarity between trials was computed as the Euclidean distance between the numbers of words in their contexts.
The Wordform model tests the hypothesis that items in the contextualized semantic space are grouped into wordform clusters such that words sharing the same form have more similar representations regardless of contextualized meaning. As such, instances of the same wordform (dates_n vs dates_v) were given a dissimilarity of 0 when compared with each other and a dissimilarity of 1 when compared with all other lexical items. All three of the aforementioned models encode representational aspects of the stimuli.
The Relatedness model is based on the polysemy advantage literature (Klepousniotou and Baum, 2007; for details about the other ambiguity models, see Extended Data Table 6-1). It tests the hypothesis that words with related senses (unambiguous words and polysemes) are grouped together regardless of individual word meaning or context. The model assumes that there are shared underlying representations between different related senses of a word. All trials with polysemous or unambiguous items were given a dissimilarity of 0 when compared with each other and a dissimilarity of 1 when compared with homonyms.
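For concreteness, these four one-hot RDMs can be built from trial metadata in a few lines; the arrays below are toy data, and setting homonym-homonym pairs to 0 in the Relatedness model is our assumption where the text leaves them unspecified:

```python
# Sketch of the one-hot model RDMs from toy trial metadata.
import numpy as np

category = np.array(["N", "V", "N", "V"])     # syntactic category per trial
n_words = np.array([1, 6, 2, 1])              # Null=1, Minimal=2, Embedded=6
stem = np.array(["dates", "dates", "flies", "dreams"])
ambiguity = np.array(["homonym", "homonym", "polyseme", "unambiguous"])

# Syntactic Category: 0 within a category, 1 across categories
rdm_category = (category[:, None] != category[None, :]).astype(float)

# Syntactic Context: Euclidean distance between local-context word counts
rdm_context = np.abs(n_words[:, None] - n_words[None, :]).astype(float)

# Wordform: 0 for trials sharing a stem, 1 otherwise
rdm_wordform = (stem[:, None] != stem[None, :]).astype(float)

# Relatedness: 1 only when a related-sense item is compared with a homonym
related = ambiguity != "homonym"
rdm_related = (related[:, None] != related[None, :]).astype(float)
```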
Vector-based models and associated hypotheses
The one-hot encoding models described above did not consider either item-level meaning or item-specific effects of context. However, there is no unique way to model word-level meaning or the difference between pairs such as her dreams versus her nightmares. There is additionally no unique way to quantify the semantic distance between dreams across contexts such as she has vivid dreams versus she hates bad dreams. If we assume that the brain has context-invariant representations that correspond to a single word's meaning, then context can be seen as a matrix applied to this distributed pattern of activity. Mathematically estimating this matrix is beyond the scope of this study; however, contextualized vector embeddings provide a possible estimate of it. In this work, we chose to model both item-level meaning and the effect of context through contextualized word embeddings extracted from a large language model.
As previously mentioned, the target words in this experiment, through the syntactic and discourse context manipulations, can be seen as representing a highly controlled semantic space. Unlike many approaches in which word representations are averaged across contexts, here each word is not only disambiguated but each sense representation is unique. Four multivariate RDMs implemented these differences and tested different representational hypotheses about the properties of the contextualized semantic space.
The Within-Context-Stem model tests the hypothesis that form-related identity is restricted by words' local syntactic context; it effectively tests a syntactic-context-specific version of the Wordform model. As such, the model focused on the semantic distance within a stem family: it computed the cosine distance only between instances of the same stem in the same context, while the dissimilarity with all other contexts and target words was set to 1.
The Stem-only model tests the same hypothesis as the Wordform model, but in a multivariate way. Here, contextualized words that share the same stem form clusters of representations in the contextualized semantic space. The model biased the representational space by computing the dissimilarity of the same stem across contexts as the cosine distance between vector embeddings for each trial containing the stem. The dissimilarity with all other words in the session was 1.
The All-Embeddings model tests the hypothesis that, akin to an instance model of semantic representation, each contextualized instance of a word has its own distinct representation. The model encoded the dissimilarity between the target words of each trial as the cosine distance between their contextualized embeddings.
The Sense Identity model is a multivariate version of the Sense model (Extended Data Table 6-1) and tests the hypothesis that the same sense across contexts has a shared underlying representation such that only shared senses are close in semantic space. The model represented the distance between target words sharing the same sense (meaning they shared the same stem and were of the same syntactic category) as the cosine distance between their contextualized embeddings. The dissimilarity when comparing with all other target words was 1.
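Using the same toy metadata as in the one-hot sketch above, the four vector-based RDMs reduce to masked cosine-distance matrices; emb stands in for the per-trial embedding array:

```python
# Sketch of the vector-based RDMs; `emb` stands in for per-trial embeddings.
import numpy as np
from scipy.spatial.distance import cdist

emb = np.random.default_rng(0).normal(size=(4, 768))  # placeholder embeddings
cos = cdist(emb, emb, metric="cosine")                # pairwise cosine distance

same_stem = stem[:, None] == stem[None, :]            # metadata from above
same_ctx = n_words[:, None] == n_words[None, :]
same_cat = category[:, None] == category[None, :]

rdm_all_embeddings = cos                              # every pair: cosine distance
rdm_stem_only = np.where(same_stem, cos, 1.0)         # cosine within a stem family
rdm_within_context_stem = np.where(same_stem & same_ctx, cos, 1.0)
rdm_sense_identity = np.where(same_stem & same_cat, cos, 1.0)
```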
Extracting the contextualized embeddings
We extracted contextualized embeddings from layers 6–8 of RoBERTa (Liu et al., 2019) using the Hugging Face library in Python. We chose the middle layers as these have been found to best predict brain activity (Toneva and Wehbe, 2019; Caucheteux and King, 2022; Lamprou et al., 2023). To extract the vector embeddings for each target word, first the discourse sentence and the target stimulus were concatenated with punctuation inserted to separate the two sentences. Then the raw transcript was tokenized using the RoBERTa pretrained tokenizer. The representations from the middle layers were averaged to obtain one vector representation for each of the words in the input. The final embedding was the 768-dimension vector obtained for the target word in each trial. The embedding for each trial was obtained separately from the others. Sample input for a minimal context has been provided below.
Input for noun sense: Ann does psychotherapy. Her dreams.
Input for verb sense: Veena went to bed. She dreams.
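A sketch of this extraction for the verb-sense input above; aligning the target word with its subword token is simplified here by taking the token immediately before the final period:

```python
# Sketch: contextualized embedding for the target word from RoBERTa layers 6-8.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base", output_hidden_states=True)
model.eval()

text = "Veena went to bed. She dreams."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# hidden_states[0] is the embedding layer; transformer layers 6-8 are
# hidden_states[6:9]. Average the three layers token-wise.
layers = torch.stack(out.hidden_states[6:9]).mean(dim=0)  # (1, seq_len, 768)

# " dreams" is the token before the final "." and the </s> marker.
target_embedding = layers[0, -3]                          # 768-dimension vector
```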
Results
Behavioral results
Participants had a mean accuracy of 0.953 (SD = 0.004) across both sessions with 95% accuracy in Session 1 and 94% accuracy in Session 2. As the sole aim of the behavioral task was to maintain participant attention, no further analyses were run.
Univariate MEG results
Univariate spatiotemporal analyses covering the whole brain tested the impact of our three stimulus factors Ambiguity Type, Syntactic Category, and Syntactic Context across both sessions (Fig. 5). Although the random factor of Session had widespread bilateral effects at 45–515 ms (p = 0.0001) in the left hemisphere and at 0–530 ms (p = 0.0001) in the right hemisphere, Session did not interact with the fixed factors.
Univariate results for fixed effects. Significant clusters were determined through a nonparametric spatiotemporal permutation test (10,000 permutations) on F values from a mixed-measures ANOVA (whole brain, 0–650 ms) with Ambiguity Type, Syntactic Category, and Syntactic Context as fixed factors and Session and Subject as random factors. Each cluster is color-coded, and the colors correspond to the colored rectangles on the time course shown between the right and left hemisphere Syntactic Context effects. Barplots show further pairwise testing on the mean activity of each condition within each cluster. The clusters are in chronological order.
For Syntactic Context, we found seven significant spatiotemporal clusters in the left hemisphere and five in the right hemisphere (Fig. 5). All clusters showed a significant difference between the Null context and both the Minimal and Embedded contexts, with either a significant increase or decrease in activity in the Null condition relative to the other two. Clusters in the left hemisphere included the inferior temporal gyrus, visual associative areas, and the angular and supramarginal gyri. The cluster in inferior parietal regions that included the angular gyrus (250–305 ms, p = 0.0033) showed a three-way distinction between the contexts, with activity inversely proportional to the number of words in the trial: activity decreased as the size of the local context increased. Clusters in the right hemisphere included the anterior cingulate cortex, the superior and middle temporal gyri, and visual associative areas.
No significant clusters were observed either for Ambiguity Type or Syntactic Category.
Multivariate MEG results
We designed two types of RSA models representing properties of our target stimuli either through simple one-hot encoding or more enriched vector representations. To establish where and when the models were significantly correlated with neural activity, a one-sample t test against a mean of 0 was performed. Cluster-based permutation testing (10,000 permutations) was used to establish a significant difference between the model correlations and 0. Significant results here mean that the models significantly correlated with neural activity not only across participants but also within participants across sessions. Figure 6B presents a summary of the spatial distributions of the significant clusters for the RDM models that were tested, and Figure 7 the detailed time courses. Because of the interrelationship between the hypotheses our RDMs tested, our models were correlated with each other to varying degrees (Fig. 8A). We have interpreted our results in light of these correlations, as detailed in the subsection "Degree of correlation between significant models" (Fig. 8B).
Summary for selected Representational Dissimilarity Models for three target words in the following order: dreams, flies, dates. Additional models are explained in Extended Data Table 6-1. A, Visual representation of RDMs with explanations of how they are calculated. B, Significant clusters showing the mean spatial distributions of model correlations with MEG activity. Cortical annotations of the Desikan–Killiany atlas are in white. Significance was determined through a nonparametric spatiotemporal permutation test (10,000 permutations) on a map of t statistics resulting from a one-sample t test against a mean of 0. A replication of the Syntactic Context results controlling for discourse length is presented in Extended Data Figure 6-2.
Table 6-1
Explanations of additional Representational Dissimilarity Models. Download Table 6-1, DOCX file.
Figure 6-2
Significant clusters replicating the mean spatial distributions of the partial correlation of the Syntactic Context model, controlling for discourse length, with MEG activity. Cortical annotations of the Desikan–Killiany atlas are in white. Significance was determined through a nonparametric spatiotemporal permutation test (10,000 permutations) on a map of t statistics resulting from a one-sample t test against a mean of 0. Download Figure 6-2, TIF file.
Spatiotemporal progression of significant RSA clusters. Significance of each cluster is presented below each model. The cortical annotations of the Desikan–Killiany atlas are in white. A color-coded time course of the temporal evolution of each cluster is located at the bottom of the figure.
A, Spearman correlations between RSA models. B, Significant Spearman correlations between RSA models (r > 0.4). Spearman correlations were thresholded based on whether the correlation was significant (p < 0.05) and high (r > 0.4). Models with significant RSA results are underlined in bold.
Since the significance of each model was evaluated independently, we used FDR correction to correct the p values of our significant clusters for the number of models (n = 13) that we tested. Only corrected p values have been reported below; one model (Wordform) did not survive correction.
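The correction itself is the standard Benjamini-Hochberg procedure; a minimal sketch with illustrative p values (not the reported ones):

```python
# Sketch of FDR (Benjamini-Hochberg) correction across the 13 model tests.
from statsmodels.stats.multitest import fdrcorrection

cluster_pvals = [0.001, 0.004, 0.009, 0.012, 0.030]  # illustrative values
rejected, p_fdr = fdrcorrection(cluster_pvals, alpha=0.05)
print(list(zip(cluster_pvals, p_fdr, rejected)))     # raw p, corrected p, verdict
```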
One-hot models
The results of the Syntactic Category model were the most widespread both spatially and temporally, with a left hemisphere cluster from 144 to 548 ms (pFDR = 0.0268) and a right hemisphere cluster from 20 to 398 ms (pFDR = 0.0268). Figure 7 depicts the spatiotemporal progression of all significant clusters. The correlation between the Syntactic Category model and neural activity begins at 20 ms post-word onset in the right superior parietal lobe and somatosensory cortex. It then spreads frontally and engages frontoparietal regions (including middle frontal and inferior frontal regions such as BA 44 and 45) as well as parts of the temporal pole and superior temporal gyrus from 60 to 185 ms. The correlation is observed in a sustained fashion in superior frontal regions, cingulate cortex, and inferior temporal regions for the duration of the entire cluster. The left hemisphere cluster began at 144 ms in the occipital lobe before becoming more widespread at 200–300 ms, progressing to the somatosensory cortex, inferior and middle frontal gyri, supramarginal gyrus, and the anterior temporal lobe. The cluster ends with activity in the temporal lobe and inferior frontal gyrus at 400–480 ms, as well as activity in visual associative regions, the angular gyrus, and the supramarginal gyrus from 480 ms onward.
The Syntactic Context model also had bilateral clusters, at 20–244 ms in the left hemisphere (pFDR = 0.0388) and at 20–258 ms in the right hemisphere (pFDR = 0.0134). The model significantly correlated with visual and visual associative regions in the occipital and inferior parietal lobes (precuneus, V1, angular gyrus) in both hemispheres as well as the posterior cingulate cortex. Unlike the left hemisphere cluster, the right hemisphere cluster also included posterior and inferior temporal regions. We additionally ran a partial correlation between the Syntactic Context model and a model representing the length of the discourse sentence in number of words to verify that this effect was driven by the number of words in the local syntactic context. Our Syntactic Context results replicated (uncorrected pRH = 0.0029, pLH = 0.0165; Extended Data Fig. 6-2) while the discourse model was not significant (p > 0.9), suggesting that the length of the discourse sentences did not affect our Syntactic Context results.
The Wordform model correlated with activity in the same regions as the Syntactic Context model; however, both the extent of the cluster and the strength of the correlation were smaller. The model only correlated with activity in the left hemisphere from 86 to 310 ms (pFDR = 0.0581), specifically in lateral occipital and inferior parietal regions. This result was only marginally significant after correction for multiple comparisons.
The Relatedness model showed no significant clusters.
Vector-based models
The Within-Context-Stem model tested form-related identity restricted by local syntactic context, and the Stem-Only model tested form-related identity across contexts. Despite the similarity between the two models, only the Within-Context-Stem model yielded a significant cluster, showing a correlation with neural activity in the left hemisphere (pFDR = 0.0026) at 82–352 ms. The cluster can be broken down into four main correlation peaks. The first peak was at ∼114–140 ms in the primary visual cortex (BA 17). The cluster then spread to medial occipital areas until 190 ms. From 190 to 230 ms, it spread from pericalcarine cortex to superior parietal and superior lateral occipital regions. From 280 ms onward, the cluster was mostly located in superior parietal regions.
The All-Embeddings model tested the hypothesis that contextualized words have their own discrete representations. The model correlated with a widespread albeit sparse network of activity for a large portion of the epoch, from 280 to 632 ms (pFDR = 0.0134). All significant correlations were in the left hemisphere. Breaking down its progression, the cluster began in the pars orbitalis and supramarginal gyrus at 280 ms. The first correlation peak was in the angular gyrus from 316 to 330 ms. The cluster then spread to somatosensory and motor cortices, more specifically to Brodmann areas 2 and 4 as well as BA 44, from 450 to 488 ms. Significant correlation then spread to the middle temporal lobe and quickly passed to the ventromedial prefrontal cortex from 520 to 540 ms. The last peak was at 540–580 ms in BAs 37, 39, and 20, areas associated with visual and word processing.
No significant clusters were found for the Sense Identity model, which represented context as having a similar effect on the representations of specific senses of the ambiguous words (such as the verb sense of fly).
Degree of correlation between significant models
We ran pairwise Spearman correlations between our RDMs, filtered the results based on which correlations were significant, and set a correlation threshold of 0.4 in order to identify which models were highly correlated with each other (Fig. 8B) and then exclude them from separate interpretation. As illustrated in Figure 1B, the hypotheses we tested were not independent, so we expected significant correlations between the models testing them. We also recognize that by excluding models that are highly correlated, we risk ruling out the right organizational hypothesis. Of the main models we tested (Fig. 8B, bold and underlined), the Within-Context-Stem model showed a substantial correlation with the Wordform model (r = 0.5180) and with the Stem-Only model (r = 0.5210). This is to be expected given that both the Within-Context-Stem and Stem-Only models are multivariate versions of the Wordform model and test very similar hypotheses. We may therefore not be able to distinguish between all three models in our data; indeed, there is virtually no difference between the spatial distributions or source time courses of the Within-Context-Stem and Wordform models. Additionally, the Sense Identity model was highly correlated with the Wordform (r = 0.6701) and Stem-Only (r = 0.6720) models. Our most straightforward results come from the All-Embeddings and Syntactic Category models. While correlated to some degree with the other matrices, these models did not exceed our cutoff for high correlations. Because of this, our discussion of the results will largely focus on the All-Embeddings and Syntactic Category models.
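This model-comparison step amounts to correlating the condensed (upper-triangle) forms of the RDMs; a sketch reusing RDMs from the earlier sketches:

```python
# Sketch: pairwise Spearman correlations between model RDMs, thresholded.
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

rdms = {"Category": rdm_category, "Context": rdm_context,
        "All-Embeddings": rdm_all_embeddings}    # from the earlier sketches
iu = np.triu_indices(len(next(iter(rdms.values()))), k=1)
for a, b in combinations(rdms, 2):
    r, p = spearmanr(rdms[a][iu], rdms[b][iu])
    if p < 0.05 and r > 0.4:                     # keep only "high" correlations
        print(f"{a} ~ {b}: r = {r:.3f}")
```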
Context-specific word representations
We further assessed whether our All-Embeddings result was purely due to the use of vector embeddings to represent our stimuli or whether the model's context-sensitive word representations made an additional contribution, in contrast to a non-context-sensitive model like the Wordform model. To do this, we first ran a follow-up control analysis using contextless Word2Vec embeddings: we correlated neural activity with a model that represented the dissimilarity between trials as the cosine distance between the Word2Vec embeddings of their target words. The Word2Vec model was relatively uncorrelated with all our previously run RSA models; none of the correlations were significant or exceeded our previously defined threshold.
The Word2Vec model was correlated with neural activity from 326 to 582 ms (uncorrected p = 0.0082); however, both the laterality and directionality of the results differed from the rest of our RDMs. Specifically, the cluster was purely right-lateralized (in contrast to our previous left-lateralized and bilateral results) across frontal, parietal, and temporal regions and these regions were negatively correlated with the model for the extent of the cluster duration. As such, the Word2Vec model's contribution is clearly qualitatively different from our other results: the more similar the Word2Vec embeddings were between trials (including trials that had the same target word and thus the same embedding), the more dissimilar their neural representation both within participants across sessions and across participants. Thus, the mere use of vector embeddings was unlikely to drive the success of our All-Embeddings model.
We then probed the specific contribution of our significant embedding-based models (All-Embeddings and Within-Context-Stem), the Word2Vec model, and our one-hot Wordform model to the overall variance explained across these four models. We focused on these models as they represent both context-sensitive and context-insensitive individual word representations (as opposed to the Syntactic Category model which does not make any hypotheses about word representation). We squared the correlation values of each model and then averaged across participants in order to obtain the amount of variance explained (ρ2) from 0 to 650 ms.
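A sketch of this computation; rho is a hypothetical mapping from model name to a participants × time array of correlations, with 325 samples corresponding to 0–650 ms at 500 Hz:

```python
# Sketch: share of variance explained per model across the analysis window.
import numpy as np

rng = np.random.default_rng(0)
models = ["All-Embeddings", "Word2Vec", "Within-Context-Stem", "Wordform"]
rho = {m: rng.uniform(-0.1, 0.3, size=(17, 325)) for m in models}  # toy data

rho2 = {m: np.mean(r**2, axis=0) for m, r in rho.items()}  # mean rho^2 over subjects
total = sum(rho2.values())                                 # total across models
share = {m: v / total for m, v in rho2.items()}            # proportion per model
```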
As seen in Figure 9, the All-Embeddings model explained by far the most variance across the entire time window. On average, this model explained 57% of the total variance, the Word2Vec model explained ∼23% of the total variance, and the Within-Context-Stem and Wordform models both explained ∼10%.
Time course of the mean variance explained by significant context-sensitive and contextless RSA models across participants. All models were significant and made hypotheses about the neural representation of individual words. The total variance explained across the compared models is plotted in gray.
While this established the All-Embeddings model as the winning hypothesis about how the semantic space is organized, we wanted to see whether the effect varied as a function of our experimental conditions. Since our univariate analysis only showed a significant effect of Syntactic Context, we ran a secondary analysis in which we first grouped our trials by Syntactic Context and then correlated the All-Embeddings model with each group (Null, Minimal, Embedded) using the same spatiotemporal RSA searchlight described above. For this analysis, we focused on the left hemisphere during a smaller time window of 250–640 ms, as this was the region and time window in which the All-Embeddings model was significant. We then ran a mixed-measures one-way ANOVA across participants, using the same searchlight and permutation testing as outlined in our univariate methods section, to test for an effect of Syntactic Context with Session and Subject as random effects.
None of our factors were significant.
Overall, our results support the following hypotheses: words from the same syntactic category share an underlying representation, and stems that share a wordform share an underlying representation that is sensitive to the size of the local context surrounding it. Furthermore, context can be modeled as a linear transformation of words' representations, regardless of their individual meanings. Finally, we found strong evidence for an instance-like model of contextualized meaning in which, once a word appears in context, it receives its own singular neural representation. We did not find evidence that words cluster by ambiguity type or by shared sense.
Discussion
To investigate the neural representation of words in context, we correlated RSA models with MEG responses to noun/verb ambiguous items in varied contexts. Our RSA models tested a wide range of hypotheses about the organization of a contextualized semantic space; many of these hypotheses were not supported. Crucially, we did find evidence for an instance-like organization that is sensitive to syntactic category and not specific to a given context. Our model representing a fully contextualized space, in which each trial had a distinct neural pattern, accounted for the most variance of all the embedding-based models. This model correlated first with neural signals in frontoparietal regions starting at ∼300 ms, then in temporal and somatosensory areas at ∼500 ms, and finally in posterior and inferior temporal cortices. We found no significant results for our categorical ambiguity factor (homonym vs polyseme vs unambiguous); however, a model representing syntactic category (noun vs verb) across all contexts showed extensive bilateral effects. Finally, occipital and occipitoparietal regions correlated with models representing syntactic context and wordforms, likely reflecting syntactically based form predictions for upcoming material.
Fully contextualized lexical representations dominated over categorical semantic distinctions
Prior studies employing multivariate analyses of semantic processing with audiobook stimuli have found extensive bilateral networks. Huth et al. (2016) identified semantic maps across most of the cortical surface, Toneva and Wehbe (2019) observed a network of fronto-temporo-parietal regions predicted by an embedding-trained encoding model, and Jain and Huth (2018) similarly predicted fMRI responses from LSTM embeddings across broad cortical areas.
The network identified by our All-Embeddings model is sparser and strictly left-lateralized, and it was identified by directly testing a representational space, as opposed to using encoding/decoding techniques to measure sensitivity to item-level representations. This suggests that item-level semantic processing in non-narrative contexts may be less distributed. This is consistent with prior work showing that, for the encoding of semantic features, narratives engage a larger number of semantically selective voxels than sentences and single words do (Deniz et al., 2023). Finally, the left lateralization of our results could be tied to our use of MEG rather than fMRI, which dominates the prior literature. For example, Leonardelli and Fairhall (2022) evaluated joint MEG-fMRI representations of the semantic system and found only a small cluster of regions shared across both techniques. Further, supraword embeddings have been found not to predict MEG activity even though they predict fMRI activity (Toneva et al., 2022).
While our lateralization differs from the naturalistic studies mentioned above, it aligns with established neurolinguistic models of sentence processing and with neuropsychological findings. A recent meta-analysis by Jackson (2021) identifies a largely left-dominant network involved in semantic cognition and control, consistent with our findings. In our All-Embeddings cluster, the prevalence of the angular gyrus at 300 and 530 ms aligns with its proposed role in automatic semantic retrieval (Davey et al., 2015). Given this region's connectivity and structural proximity to visual areas, it may play a key role in interfacing between the outputs of visual/form-based prediction and more complex, contextually informed semantic processing.
Following the angular gyrus, the recruitment of somatosensory and motor cortices, along with BA 44, by our All-Embeddings model resonates with the embodied/grounded cognition hypothesis (Barsalou, 2008; Jain and Huth, 2018; Zwaan, 2014), which proposes that cognition is grounded in perceptual processing, with linguistic semantics drawing on sensorimotor representations. Sensorimotor areas have likewise been found in a left-dominant network sensitive to a statistical model of abstraction (Hultén et al., 2021).
The categorical processing of lexical ambiguity has been widely studied, with results showing significant differences in left frontoparietal activation between unambiguous and ambiguous words and higher left inferior frontal gyrus (LIFG) activation for homonyms than polysemes (Lukic et al., 2019). Sensitivity to ambiguity, syntactic category, and their interaction in a minimal disambiguating context has also been evidenced in the LIFG and left posterior middle temporal gyrus (Mollo et al., 2018). It has also been shown that the activation difference between ambiguous words is modulated by the distance between their meanings (Grindrod et al., 2014). Here, however, we did not observe effects of the homonym–polyseme–unambiguous contrast either in our univariate or multivariate analyses. What could underlie this difference?
In fact, recent approaches treating ambiguity as a continuous variable have found no categorical differences between ambiguity types (Beekhuizen et al., 2021), viewing ambiguity instead as a continuum of sense complexity. Theoretical models have also conceptualized lexical ambiguity as a graded, not categorical, phenomenon (Nerlich and Clarke, 2003). Task demands can influence semantic representations (Patterson et al., 2007), thus affecting the processing of ambiguous items. In line with the literature emphasizing the continuous nature of ambiguity, our results suggest that categorical semantic differences may stem more from task effects on processing than from inherent representational differences. A crucial difference between the current design and most prior work is that our ambiguous items were disambiguated well before their occurrence, mimicking natural language, in which the broader discourse context usually aligns with one meaning of an ambiguous item (e.g., in a baseball context, mention of the animal bat is unlikely). Therefore, our study indicates that, in a multivariate semantic space, robust representation arises from individualized, context-specific representations rather than from categorical distinctions among ambiguity classes of lexical items. Crucially, this instance-like organization of the contextualized semantic space does not appear to be specific to contexts of a given size.
Multivariate evidence of widely distributed representations of syntactic category
Of all our models, the Syntactic Category model correlated with neural activity most extensively in both time and space, yet we found no univariate effects of syntactic category. The existence of functional and representational differences between different syntactic categories has, in fact, been highly contested in imaging and neuropsychological data. Although nouns and verbs have dissociated in aphasic behavior (Crepaldi et al., 2006), atrophy location (Lukic et al., 2021), and activation patterns in the healthy brain (Tyler et al., 2004; see Vigliocco et al., 2011 for further review; Hauptman et al., 2021), a meta-analysis of 36 neuroimaging studies found no evidence of functional dissociations between noun and verb processing (Crepaldi et al., 2013).
Our multivariate analysis suggests that syntactic category representation in rich contexts is broadly distributed. We observed correlation peaks in frontal and anterior temporal regions at 200–300 ms in the left hemisphere and at 60–150 ms in the right, in addition to broader frontal, parietal, and anterior temporal engagement. These regions overlap with aphasia results showing noun/verb processing dissociations (Lukic et al., 2021). However, we did not find any correlation between the Syntactic Category model and activity in the left posterior temporal cortex, which has been hypothesized to be a central site for syntactic processing (Flick and Pylkkänen, 2020; Matchin and Hickok, 2020; Matchin and Wood, 2020; Matar et al., 2021). Thus, our results suggest that sensitivity to syntactic category is not manifested (at least not strongly enough to be observed here) in the region that shows the strongest signals for the combination of words into syntactic phrases but is instead widely distributed across language cortex.
Our syntactic category results began early (20 ms in the RH, ∼140 ms in the LH), likely indicating preactivation of syntactic category representations due to the disambiguating contexts. These categorical representations preceded and were more widespread than item-specific information, as evidenced by the All-Embeddings model. Earlier and longer-lasting activation of categorical information, compared with item-specific representations, is also consistent with the finding that superordinate semantic categories become active earlier than individual lexical items (Dirani and Pylkkänen, 2023). In sum, in a syntactically disambiguated context, the representation of the anticipated syntactic category was a highly distributed brain state, active earlier than the representation of more specific lexical information.
Finally, what effect might our semantic relatedness task, which followed 30% of the trials, have had on the neural signals elicited by the target words? It is possible that this task, which required participants to identify words that were semantically related to the preceding trial sentences, elicited specific focus on certain types of meanings associated with nouns versus verbs, potentially confounding our syntactic category manipulation. However, our task was unpredictable in its occurrence, and a third of our targets were in the "unambiguous" category (i.e., their noun and verb meanings were extremely close to each other), making it quite unlikely that our category effect was meaningfully impacted by some type of strategic semantic anticipation. Nevertheless, the possibility that some type of task-induced focus on specific meanings partially amplified our syntactic category effects cannot be fully excluded.
Possible role of the dorsal visual stream in context-driven form predictions
Our findings unveiled correlations not only with syntactic category and contextualized meaning but also with two other models: the Syntactic Context model, representing the number of preceding words, and the Within-Context-Stem model, representing wordform identity in specific local syntactic contexts.
Our multivariate Syntactic Context results were localized to occipital and adjacent temporoparietal cortex (Fig. 6). This contrasted with the univariate analysis of Syntactic Context, which showed a distributed bilateral effect across visual and visual association areas, the left inferior temporal gyrus, and the left angular gyrus. The more focal results of the multivariate analysis suggest that the univariate results may have been driven by global changes in activation levels across the brain that are not specific to any particular pattern or configuration of activity (see Wang and Kuperberg, 2023 for further discussion of differences between univariate and multivariate analyses).
The Within-Context-Stem model correlated with occipital and parietal activity, similar to the Syntactic Context model. These areas belong to the dorsal stream of visual processing, thought to process bottom-up, "where"-based information in preparation for action (Milner and Goodale, 2008). They are connected with prefrontal and posterior temporal areas through the inferior fronto-occipital fasciculus (IFOF), which has been linked with semantic processing (Almairac et al., 2015) and with the ventral processing stream in language (Friederici and Gierhan, 2013). Thus, our results are observed in bottom-up visual processing regions that connect to associative areas (Richardson et al., 2011), suggesting that early contextual and/or structural information may bias bottom-up processing of visual representations.
To conclude, we delved into the neural representation of words in context, discovering that, of the many syntactic and semantic properties of our target words, their neural representations predominantly reflected syntactic category and contextually shaped semantics. The left lateralization of our semantic results contrasts with prior studies that used narrative stimuli and found an extensive bilateral representation of the semantic space. This suggests that the contribution of the right hemisphere may be specifically linked to narrative-level processing.
Footnotes
This work was supported by Grant G1001 from the NYUAD Institute, New York University Abu Dhabi (L.P.). We thank Alec Marantz for reviewing earlier versions of the manuscript and for helpful comments and suggestions throughout the writing and editing process. This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise.
The authors declare no competing financial interests.
- Correspondence should be addressed to Aline-Priscillia Messi at Alinepriscillia.messi@nyu.edu.