Abstract
Different people listening to the same story may converge upon a largely shared interpretation while still developing idiosyncratic experiences atop that shared foundation. What linguistic properties support this individualized experience of natural language? Here, we investigate how the “concrete–abstract” axis—the extent to which a word is grounded in sensory experience—relates to within- and across-subject variability in the neural representations of language. Leveraging a dataset of human participants of both sexes who each listened to four auditory stories while undergoing functional magnetic resonance imaging, we demonstrate that neural representations of “concreteness” are both reliable across stories and relatively unique to individuals, while neural representations of “abstractness” are variable both within individuals and across the population. Using natural language processing tools, we show that concrete words exhibit similar neural representations despite spanning larger distances within a high-dimensional semantic space, which potentially reflects an underlying representational signature of sensory experience—namely, imageability—shared by concrete words but absent from abstract words. Our findings situate the concrete–abstract axis as a core dimension that supports both shared and individualized representations of natural language.
Significance Statement
The meaning of spoken language is often ambiguous. As a result, people may form different interpretations despite being presented with the same information. What properties of language does the brain leverage to form this diverse, individual experience? Analyses of functional magnetic resonance imaging data demonstrated that “concreteness,” the extent to which language is related to sensory experience, evoked reliable neural patterns that were unique to individual subjects and allowed us to identify individuals solely based on their neural data. Application of machine learning methods showed that sets of concrete concepts, but not abstract concepts, exhibit stable neural patterns, potentially due to a sensory signature: imageability. Overall, this study characterizes concreteness as a central property supporting the individualized experience of real-world language.
Introduction
The success of language as a means of communication relies on a shared understanding of the meanings of words as links to mental concepts (Elman, 2004; Stolk et al., 2016; Thompson et al., 2020). While people generally converge in how they understand and represent language (Fedorenko and Thompson-Schill, 2014; Malik-Moraleda et al., 2022), the conceptual associations evoked by a given word can also be highly individualized and informed by experience (Elman, 2009; Yee and Thompson-Schill, 2016). What linguistic properties scaffold common conceptual knowledge while also providing the foundation for idiosyncratic representations?
A large body of empirical and theoretical work has suggested that human knowledge is organized along an axis that moves from concrete, sensory-based representations to abstract, language-derived representations (Paivio, 1991; Bedny and Caramazza, 2011; Borghi et al., 2017; Bi, 2021). Within this framework, “concrete” words are experienced directly through senses or actions (e.g., dog, table), while “abstract” words have meanings dependent on language (e.g., idea, plan). Together, concreteness and abstractness represent ends of a continuum of “sensory grounding,” where a given word can be placed along this axis based on the degree to which it can be experienced directly through one's senses. Accordingly, each word is assumed to share this property with other words at a similar position along the axis, irrespective of their meanings. Theories of “grounded cognition” (Barsalou, 2008; Binder and Desai, 2011) propose that concrete words benefit from being jointly represented across both sensory and linguistic domains and, as a result, exhibit more stable representations than abstract words. Recent findings from human neuroimaging provide support for these theories, demonstrating close topographical and functional correspondence between representations of sensory and linguistic information (Huth et al., 2016; Deniz et al., 2019; Popham et al., 2021). In turn, the concreteness of words benefits behavior: concrete words are processed faster (Paivio and Begg, 1971; Kroll and Merves, 1986; Schwanenflugel et al., 1988; Roxbury et al., 2014), are more imageable (Altarriba et al., 1999; Tuckute et al., 2018), and are more easily recalled than abstract words (Gorman, 1961; Walker and Hulme, 1999; M. Hamilton and Rajaram, 2001; Romani et al., 2008; Aka et al., 2021).
While studies often highlight population-level commonalities in how people process and represent concrete versus abstract words, researchers have also identified differences in how individuals organize and represent concrete versus abstract language (X. Wang and Bi, 2021). Specifically, concrete words are more similar both across (X. Wang and Bi, 2021) and within subjects (Musz and Thompson-Schill, 2015) in both their conceptual organization (as measured behaviorally with a semantic distance task) and neural representations. However, the extent to which representations of the concrete–abstract axis itself, rather than individual words along that axis, are stable across experiences and unique to each person remains unclear. On one hand, representations of individual concrete words may be more stable because each word's unique sensory grounding stabilizes its representation and distinguishes it from other words. On the other hand, the property of concreteness may provide a shared structure that supports the representation of each individual word, elevating the similarity among all concrete words as a class despite differences in sensory grounding and word meaning. Taken together, these considerations complicate the interpretation of previous findings: across subjects, the low similarity of abstract word representations may result from not only variability in individual word representations but also variability in representing the property of “abstractness” more generally (Fig. 1C).
One possibility is that representations of abstractness might be highly individualized—in other words, both unique to the individual and shared across distinct abstract words within that individual. Such individual-specific representations would be evidenced by high within-subject similarity across exposures to different abstract words, despite low across-subject similarity. Another possibility is that low similarity results from unstable representations of abstractness. In this case, representations would show low similarity both within and across subjects, potentially resulting from high variability in how abstractness is represented across contexts. Yet, without evaluating the reliability of representations within subjects and across words, the low similarity of abstract word representations across subjects is difficult to interpret.
Here, we aimed to understand how the concrete–abstract axis provides a foundation for individual differences in the neural representation of language. We investigated this question within a large dataset of subjects who listened to four naturalistic auditory stories during functional magnetic resonance imaging (fMRI). Unlike many previous investigations that used isolated single-word or otherwise simplified paradigms (Friederici et al., 2000; West and Holcomb, 2000; Binder et al., 2005; Roxbury et al., 2014; Musz and Thompson-Schill, 2015; X. Wang and Bi, 2021; Fernandino et al., 2022; Vignali et al., 2023), these data allowed us to characterize neural representations of the concrete–abstract axis within contextualized speech, as language is used in everyday life (L. S. Hamilton and Huth, 2020). We tested not only the extent to which neural representations of concreteness and abstractness are consistent across subjects but also the degree to which these representations are reliable within and unique to a given subject across stories. Then, by leveraging tools from natural language processing (NLP), we related our findings on concreteness and abstractness to prior work on word meanings by taking sets of similar concepts as a proxy for repeated words across stories. Specifically, we examined how the organization of words within a high-dimensional semantic space relates to the differential reliability of neural representations of concreteness versus abstractness in the human brain.
Materials and Methods
Participants
We used a subset of data from the publicly available Narratives dataset (Nastase et al., 2021). Specifically, we used data from 45 subjects (N = 33 female; mean age, 23.3 ± 7.4 years) who each listened to four auditory stories (“Running from the Bronx,” 8:56 min; “Pie Man (PNI),” 6:40 min; “I Knew You Were Black,” 13:20 min; “The Man Who Forgot Ray Bradbury,” 13:57 min) during fMRI scans at the Princeton Neuroscience Institute (Fig. 1A). All stories were presented within the same testing session, with each story presented in a separate run. Across participants, the order of stories was pseudorandomized such that “Bronx” and “Pie Man (PNI)” were always presented in the first half of the session, while “Black” and “Forgot” were presented in the second half of the session. The order of the stories within each half of the session was then randomized, resulting in four possible presentation orders across participants. All participants provided written informed consent, were screened for MRI safety, and reported fluency in English, normal hearing, and no history of neurological disorders. The study was approved by the Princeton University Institutional Review Board.
MRI data acquisition and preprocessing
Functional and anatomical images were collected on a 3T Siemens Magnetom Prisma with a 64-channel head coil. Whole-brain images were acquired (48 slices per volume, 2.5 mm isotropic resolution) in an interleaved fashion using a gradient-echo EPI (repetition time, 1.5 s; echo time, 31 ms; flip angle, 67°) with a multiband acceleration factor of 3 and no in-plane acceleration. A total of 1,717 volumes were collected for each participant across four separate scan runs, where a single story was presented within each run.
We used preprocessed data provided by Nastase et al. (2021). In brief, data were preprocessed using fMRIPrep (Esteban et al., 2019), including coregistration, slice-time correction, and nonlinear alignment to the MNI152 template brain. Time series were detrended with regressors for motion, white matter, and cerebrospinal fluid and smoothed with a 6 mm FWHM Gaussian kernel. For more information about data acquisition and preprocessing, please refer to Nastase et al. (2021).
As an additional preprocessing step, we performed functional alignment on these data using a shared response model (SRM; Chen et al., 2015) as implemented in BrainIAK (Kumar et al., 2021). Previous work has demonstrated better functional alignment by fitting an SRM within each parcel (Bazeille et al., 2021). Accordingly, we restricted our analyses to the neocortex and used the 200-parcel Schaefer parcellation (Schaefer et al., 2018) and removed any parcel without at least 75% coverage across all participants and stories (total parcels removed, 9/200 or 4.5%). Within each remaining parcel, we then fit a model to capture reliable responses to all stories across participants in a lower-dimensional feature space (number of features, 50). We then inverted the parcel-wise models to reconstruct the individual voxel-wise time courses for each participant and each story (Yates et al., 2021). This procedure served as an additional denoising step to improve the consistency of stimulus-driven spatiotemporal patterns across participants. All analyses were conducted in volume space and projected to surface space (fsaverage) using nilearn (Abraham et al., 2014) for visualization purposes only.
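The deterministic core of a within-parcel SRM fit can be sketched in a few lines of numpy. This is a simplified illustration, not the probabilistic BrainIAK implementation; the function and variable names are our own, and the alternating Procrustes/averaging updates are only the skeleton of the approach.

```python
import numpy as np

def fit_srm(subject_data, n_features=50, n_iter=10, seed=0):
    """Simplified shared response model for one parcel.

    subject_data: list of (n_voxels, n_timepoints) arrays, one per subject,
    time-locked to the same stimulus. Returns per-subject orthonormal bases
    W_i (n_voxels, n_features) and a shared response S (n_features, n_timepoints).
    """
    rng = np.random.default_rng(seed)
    n_time = subject_data[0].shape[1]
    S = rng.standard_normal((n_features, n_time))
    Ws = [None] * len(subject_data)
    for _ in range(n_iter):
        # Procrustes update: orthonormal W_i minimizing ||X_i - W_i S||_F
        for i, X in enumerate(subject_data):
            U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
            Ws[i] = U @ Vt
        # Shared response as the mean of each subject's back-projected data
        S = np.mean([W.T @ X for W, X in zip(Ws, subject_data)], axis=0)
    return Ws, S

def reconstruct(W, S):
    # Project the shared response back into a subject's voxel space
    # (the denoising step described above)
    return W @ S
```

In this sketch, inverting the parcel-wise model simply means multiplying each subject's basis by the shared time course, which retains only the stimulus-driven variance common across participants.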
Stimulus preprocessing
Each story was originally transcribed and aligned to the audio file using the Gentle forced-alignment algorithm by the authors of Nastase et al. (2021). We applied additional preprocessing to the transcripts using the Natural Language Toolkit (Bird et al., 2009). First, we obtained parts of speech and word lemmas—the base form of a word (e.g., “go” is the lemma for “going,” “gone,” and “went”)—for each word and excluded stop-words (uninformative, common words) such as “the,” “a,” and “is.”
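In outline, this step reduces each transcript to a list of content-word lemmas. The tiny stop-word list and lemma table below are hypothetical stand-ins for NLTK's part-of-speech tagger and lemmatizer, included only to make the pipeline concrete:

```python
# Illustrative stand-in for the NLTK preprocessing step.
# STOP_WORDS and LEMMAS here are toy examples, not the full resources used.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to"}
LEMMAS = {"going": "go", "gone": "go", "went": "go",
          "dogs": "dog", "tables": "table"}

def preprocess(transcript):
    """Lowercase, drop stop-words, and map remaining tokens to lemmas."""
    tokens = transcript.lower().split()
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]
```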
To address our hypotheses, we leveraged an existing corpus of human ratings of word concreteness (Brysbaert et al., 2014). In this study, online participants rated a total of 40,000 English word lemmas on a five-point Likert scale from abstract (lower) to concrete (higher). Each word was rated by at least 25 participants. Participants were instructed to consider a word as more concrete if it refers to something that exists in reality and can be experienced directly through senses or actions and, in contrast, to consider a word as more abstract if its meaning depends on language and cannot be experienced directly through senses or actions. Henceforth, we use “concrete–abstract axis” to refer to this general linguistic dimension and “concreteness” as a word's specific position on this axis.
For each word in each story, we assigned a value of concreteness using the average human rating for that word's lemma if it was present in the concreteness corpus (Fig. 1B). In addition to our critical predictor (concreteness), we included three other linguistic properties as controls: frequency (Brysbaert and New, 2009; Brysbaert et al., 2019), a measure of how often a word occurs in language, and two affective properties, valence and arousal (Warriner et al., 2013). Word frequency was derived objectively by calculating the number of occurrences of a word per million words (51 million total words), while valence and arousal were derived from human ratings analogous to the concreteness ratings described above. Previous research investigating word frequency effects has demonstrated that less frequent words drive stronger neural responses within the language network (Fiebach et al., 2002; Schuster et al., 2016). A separate set of studies investigating affect has demonstrated that valence and arousal contribute to representations of language within areas related to emotion processing and memory (Kensinger and Schacter, 2006; Brooks et al., 2016). While the selected control properties are not a definitive list, including them as “competition” allows us to make inferences that are more specific to the concrete–abstract axis. Our analysis was then constrained to the set of words with a value for any of the four properties (i.e., the union), resulting in 97.7% of content words sampled on average across stories (2,449 words of the possible 2,500 content words). We were able to model the majority of these content words within each linguistic predictor (concreteness, 96.4%; frequency, 97.7%; valence, 83%; arousal, 83%).
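The lemma-to-rating assignment is a simple lookup against the norm corpus. The rating values below are hypothetical placeholders for the Brysbaert et al. (2014) norms (five-point scale, 1 = abstract to 5 = concrete):

```python
# Hypothetical norm values for illustration; the actual ratings come from
# the Brysbaert et al. (2014) concreteness corpus.
CONCRETENESS = {"dog": 4.85, "table": 4.90, "idea": 1.61, "plan": 2.07}

def assign_concreteness(lemmas):
    """Return (lemma, rating) pairs for lemmas present in the norm corpus;
    lemmas absent from the corpus are simply skipped."""
    return [(w, CONCRETENESS[w]) for w in lemmas if w in CONCRETENESS]
```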
Importantly, collinearity between the critical regressor, concreteness, and other linguistic properties varied, showing a moderate relationship with word frequency and weak relationships with all other properties (average Pearson's r across stories, arousal, −0.10; frequency, −0.30; valence, −0.05).
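The collinearity check amounts to pairwise Pearson correlations between the per-word property vectors within each story; a numpy sketch (with illustrative names):

```python
import numpy as np

def collinearity(properties):
    """Pairwise Pearson correlations between word-property vectors.

    properties: dict mapping property name -> 1-D array of per-word values
    (one entry per modeled word in a story). Returns a dict keyed by each
    unordered pair of property names.
    """
    names = list(properties)
    vals = np.vstack([properties[n] for n in names])
    R = np.corrcoef(vals)  # rows are variables (properties)
    return {(a, b): R[i, j] for i, a in enumerate(names)
            for j, b in enumerate(names) if i < j}
```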
fMRI analysis
Modeling representations of word properties
For each story and participant, we used a general linear model (GLM) to estimate BOLD responses for each linguistic property (concrete–abstract axis, frequency, valence, arousal), plus a low-level auditory feature regressor (loudness, the root mean square of the auditory waveform). We collectively refer to these linguistic and sensory properties as “word properties.”
To construct a continuous, amplitude-modulated regressor, each word property was assigned a value at each timepoint of the story time-series based on the word(s) spoken at that timepoint. We then modeled the BOLD signal as a function of these regressors using AFNI (Cox, 1996). The model yields a map of beta values corresponding to the response to each property, where higher and lower beta values indicate stronger responses to higher and lower values of that property, respectively (e.g., higher, more concrete; lower, more abstract). As all word properties were included in the same model, the resulting beta values represent the BOLD response to a given property while controlling for all other properties.
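The regressor construction and model fit were performed with AFNI; a simplified numpy sketch conveys the idea. The double-gamma response function here is a generic stand-in, not AFNI's exact hemodynamic model, and all names are illustrative:

```python
import numpy as np

def hrf(tr=1.5, duration=24.0):
    # Generic double-gamma hemodynamic response (stand-in, not AFNI's model)
    t = np.arange(0, duration, tr)
    peak = t**5 * np.exp(-t)
    under = t**15 * np.exp(-t)
    h = peak / peak.max() - 0.35 * under / under.max()
    return h / np.abs(h).sum()

def build_regressor(onset_trs, amplitudes, n_trs, tr=1.5):
    """Amplitude-modulated regressor: the property value of each word is
    placed at its onset TR, then convolved with the response function."""
    stick = np.zeros(n_trs)
    for t, a in zip(onset_trs, amplitudes):
        stick[t] += a
    return np.convolve(stick, hrf(tr))[:n_trs]

def fit_glm(bold, regressors):
    """OLS betas for an intercept-plus-regressors design matrix."""
    X = np.column_stack([np.ones(len(bold))] + list(regressors))
    betas, *_ = np.linalg.lstsq(X, bold, rcond=None)
    return betas[1:]  # drop the intercept
```

Fitting all property regressors in one design matrix is what makes each beta a response to that property controlling for the others.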
Using the outputs from these models, we first examined group-level univariate responses to each word property using a linear mixed-effects model. At each voxel, the model predicts BOLD activity from the fixed effects of each property plus the random effects of subject and story. The model therefore yields a map of beta values that describes consistent neural responses to each property across stories and subjects. All voxel-wise results are shown following correction for multiple comparisons (qFDR < 0.05).
Evaluating the reliability of representations of the concrete–abstract axis and other word properties
To understand whether word properties elicit reliable representations during story listening (Fig. 1C), we examined the within- and across-subject multivariate pattern similarity of evoked responses for each property across stories. We first divided the cortex into 200 parcels using the Schaefer parcellation (Schaefer et al., 2018). Then, within each parcel, we correlated the multivoxel pattern of beta values between all pairs of participants, repeating this process for each unique pair of stories (six total pairs). Lastly, we averaged across all story-pair matrices to obtain a subject similarity matrix for each parcel (denoted as M within the following equations). We repeated this procedure for each property to understand the similarity of neural representations across stories both within and across subjects. See Figure 1D for a schematic of this analysis.
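Concretely, building M for one parcel and one word property can be sketched in numpy as follows (array shapes and names are illustrative):

```python
import numpy as np

def subject_similarity(betas):
    """Subject-by-subject pattern similarity, averaged over story pairs.

    betas: array (n_subjects, n_stories, n_voxels) of one parcel's beta
    patterns for one word property. For each unique story pair, correlate
    every subject's pattern in one story with every subject's pattern in
    the other; averaging over pairs yields M (n_subjects, n_subjects),
    whose diagonal holds within-subject similarity across stories.
    """
    n_sub, n_story, _ = betas.shape
    pair_mats = []
    for s1 in range(n_story):
        for s2 in range(s1 + 1, n_story):
            # corrcoef stacks the two blocks of rows; keep the cross-block
            R = np.corrcoef(betas[:, s1, :], betas[:, s2, :])
            pair_mats.append(R[:n_sub, n_sub:])
    return np.mean(pair_mats, axis=0)
```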
We evaluated two multivariate signatures of these neural representations (Fig. 1D). Our first method, reliability, assesses the similarity of a subject's representations to themselves across stories compared with the similarity of their representations to those of other subjects. Specifically, reliability is calculated as the difference between the similarity of a subject to themselves (within-subject similarity) and the average pairwise similarity of a subject to all other subjects (across-subject similarity) as follows:

reliability(s) = M(s, s) − (1 / (N − 1)) × Σ_{s′ ≠ s} M(s, s′),

where N denotes the number of subjects. Our second method, identifiability, assesses whether a subject can be identified from their representations alone: a subject is correctly identified when their within-subject similarity exceeds their similarity to every other subject [i.e., M(s, s) > M(s, s′) for all s′ ≠ s], and identifiability is the percentage of subjects who are correctly identified.
For both reliability and identifiability analyses, the statistical significance was evaluated via permutation testing. Specifically, for each parcel, we permuted the rows of the subject similarity matrix and recalculated reliability and identifiability values. This process was repeated 10,000 times, and observed values were tested against this null distribution. Resulting p values for each signature were corrected for multiple comparisons across 200 parcels using the Benjamini–Hochberg method (qFDR < 0.05). To evaluate reliability and identifiability at a whole-brain level, for each signature, we used a linear mixed-effects model to predict reliability/identifiability from the fixed effect of word property while controlling for the random effect of parcel in both models and a random effect of subject within the reliability model. We tested for significant differences between word properties by conducting pairwise statistical tests between model fits to each property.
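The two signatures and the row-permutation null can be sketched in numpy on a synthetic similarity matrix (the paper's exact null construction and FDR step are not reproduced here):

```python
import numpy as np

def reliability(M):
    """Within-subject similarity (diagonal) minus each subject's mean
    similarity to all other subjects (off-diagonal row mean)."""
    n = M.shape[0]
    off = (M.sum(axis=1) - np.diag(M)) / (n - 1)
    return np.diag(M) - off

def identifiability(M):
    """Fraction of subjects whose within-subject similarity exceeds their
    similarity to every other subject."""
    return np.mean(np.argmax(M, axis=1) == np.arange(M.shape[0]))

def permutation_p(M, stat, n_perm=10000, seed=0):
    """One-sided p value from a null built by permuting the rows of M and
    recomputing the statistic each time."""
    rng = np.random.default_rng(seed)
    observed = np.mean(stat(M))
    null = np.array([np.mean(stat(M[rng.permutation(M.shape[0])]))
                     for _ in range(n_perm)])
    return (np.sum(null >= observed) + 1) / (n_perm + 1)
```

Permuting rows breaks the pairing between a subject's row and its diagonal entry, so the null captures the reliability/identifiability expected if representations carried no subject-specific signal.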
To understand what was driving observed reliability—i.e., high within-subject consistency, low across-subject similarity, or both—we compared within-subject similarity with across-subject similarity. Specifically, we calculated across-subject similarity in two ways: (1) in the same stories and (2) across different stories. For each word property, we used one-sample tests to assess the significance of similarity of representations for each form of similarity. Then, we used a linear mixed-effects model to evaluate whether within-subject similarity was higher than both forms of across-subject similarity. All tests were two-tailed, tested at alpha p < 0.05, and corrected for multiple comparisons using FDR correction.
Disentangling the reliability of representations of concreteness versus abstractness
We next aimed to understand whether concreteness and abstractness differentially contribute to the reliability of neural representations of the concrete–abstract axis. To this end, within each story, we limited our analysis to nouns (as verbs were more prevalent at the abstract end) and dichotomized the concrete–abstract axis by selecting the top 30% of concrete and top 30% of abstract words (Fig. 4A). Specifically, we asked if and where representations of concreteness are more reliable than representations of abstractness or vice versa.
We used a GLM to estimate separate BOLD response patterns for concreteness and abstractness (using regressors defined based on the top 30% of words at each end). Within this model, we specified concreteness and abstractness as event regressors, discarding the amplitude component and treating all words of a given property as contributing equally to the model of BOLD response. The regressors for concreteness and abstractness each contained a total of 187 words aggregated across stories, resulting in a total of 374 words modeled across stories (black, 94 words; bronx, 92 words; piemanpni, 68 words; forgot, 120 words). We also included two amplitude-modulated regressors, word frequency and loudness, to control for differences in low-level linguistic and sensory features. We then repeated our analysis of reliability and identifiability (described above) on the beta maps of concreteness and abstractness separately.
For each parcel, we contrasted the reliability of concreteness and abstractness within each subject by applying Fisher's z-transformation and taking the difference between the reliability scores (concrete minus abstract), limiting our analysis to parcels that showed significant reliability for either concreteness or abstractness. Then, within each parcel, we conducted paired t-tests to identify parcels that significantly differed in their reliability of concreteness and abstractness representations. All tests were two-tailed, tested at alpha p < 0.05, and corrected for multiple comparisons using FDR correction.
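Per parcel, the contrast amounts to a Fisher z-transformation followed by a paired comparison; a numpy sketch with illustrative inputs (the full analysis also applies FDR correction across parcels):

```python
import numpy as np

def fisher_z(r):
    """Fisher z-transformation (arctanh) stabilizes correlation variance
    before averaging or differencing."""
    return np.arctanh(r)

def paired_t(x, y):
    """Paired t statistic for the per-subject difference x - y."""
    d = np.asarray(x) - np.asarray(y)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

def contrast(rel_concrete, rel_abstract):
    """Concrete-minus-abstract reliability contrast for one parcel."""
    zc, za = fisher_z(rel_concrete), fisher_z(rel_abstract)
    return zc - za, paired_t(zc, za)
```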
Evaluating the stability of representations of concrete versus abstract concept clusters
In light of the finding that representations of concreteness are more reliable than those of abstractness (compare Fig. 4B), we asked whether this higher reliability is driven by closer and more stable semantic relationships between words at the concrete end of the spectrum. To define semantic relationships between words, we used an NLP model (GloVe; Pennington et al., 2014) to embed each word in both the top 30% concrete and top 30% abstract word sets, aggregated across stories, within a high-dimensional semantic space (Fig. 5A). We then applied spectral clustering (Shi and Malik, 2000) over the concrete and abstract word embeddings to obtain clusters for each end of the spectrum (k = 3 each for the concrete and abstract ends, so six in total) composed of semantically similar words, which we refer to as “concept clusters.” While we selected k = 3 clusters because this value of k yielded the most balanced number of words in each cluster, similar results were obtained at both k = 2 and k = 4 clusters. These clusters grouped concrete and abstract words into sets of related concepts—such as a food-related concrete cluster containing the words “bread” and “cheese”—that were visually distinct when projected into a two-dimensional space using Uniform Manifold Approximation and Projection (UMAP; Fig. 5B; McInnes et al., 2020). Importantly, words within each concept cluster could come from within the same story or from different stories.
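The clustering pipeline can be approximated compactly in numpy: cosine affinities between embeddings, the bottom eigenvectors of the normalized graph Laplacian, and a lightweight k-means in that spectral space. This is a sketch of the Shi and Malik (2000) approach with a deterministic farthest-point initialization, not a drop-in for a library implementation, and all names are illustrative:

```python
import numpy as np

def spectral_cluster(E, k=3):
    """Minimal spectral clustering of word embeddings (rows of E)."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    A = np.clip(En @ En.T, 0, None)          # nonnegative cosine affinities
    np.fill_diagonal(A, 0)
    d = A.sum(axis=1)
    Dinv = np.diag(1 / np.sqrt(d + 1e-12))
    L = np.eye(len(A)) - Dinv @ A @ Dinv     # normalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, :k]                          # k smallest eigenvectors
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    # Deterministic farthest-point seeding, then a small k-means
    centers = [V[0]]
    for _ in range(k - 1):
        d2 = np.min([((V - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(V[int(np.argmax(d2))])
    centers = np.array(centers)
    for _ in range(50):
        labels = np.argmin(((V[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = V[labels == j].mean(axis=0)
    return labels
```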
In addition to visualizing the qualitative organization of concept clusters, we also formally tested the semantic similarity of words in the same or in different clusters, within and between ends of the concrete–abstract axis. Importantly, because the clustering itself was done on semantic distances, we expected that distances would be lower between words in the same versus different clusters, but this analysis also allowed us to quantify if and how semantic spread across clusters is greater at one end of the concrete–abstract axis than the other. Specifically, we calculated the cosine similarity between all pairs of words embedded within the semantic space. We then grouped these pairwise similarity values into the following categories: (1) pairs of words within the same cluster, (2) pairs of words in different clusters at the same end of the concrete–abstract axis (i.e., either concrete or abstract), and (3) pairs of words at different ends of the concrete–abstract axis, which were (by definition) in different clusters. To compare these groups of similarity values, we used a linear mixed-effects model to evaluate how the end of the property spectrum (concrete vs abstract), cluster membership (within vs between), and the interaction between these two features relate to the semantic similarity of cluster words while controlling for the random effect of word. To help interpret any resulting differences, we also conducted follow-up pairwise statistical tests. All tests were two-tailed, tested at alpha p < 0.05, and corrected for multiple comparisons using FDR correction.
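Grouping the pairwise cosine similarities by cluster relationship can be sketched as follows (the embeddings, cluster ids, and end labels below are illustrative; cluster ids are assumed unique across the two ends):

```python
import numpy as np

def grouped_similarities(E, cluster_labels, end_labels):
    """Mean cosine similarity for the three pair categories described above.

    E: (n_words, dim) embeddings; cluster_labels: cluster id per word
    (unique across ends); end_labels: 'concrete'/'abstract' per word.
    """
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    S = En @ En.T  # pairwise cosine similarities
    groups = {"same_cluster": [], "same_end": [], "different_end": []}
    n = len(E)
    for i in range(n):
        for j in range(i + 1, n):
            if cluster_labels[i] == cluster_labels[j]:
                groups["same_cluster"].append(S[i, j])
            elif end_labels[i] == end_labels[j]:
                groups["same_end"].append(S[i, j])
            else:
                groups["different_end"].append(S[i, j])
    return {k: float(np.mean(v)) for k, v in groups.items()}
```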
Next, we used a GLM to estimate BOLD responses to words within each concept cluster and evaluated both within- and across-subject similarity of these neural concept–cluster representations across stories. Similar to our analysis of semantic space, we calculated (1) the similarity of neural representations of the same cluster across stories, (2) the similarity of neural representations of different clusters at the same end of the spectrum (e.g., concrete clusters to other concrete clusters), and (3) the similarity of neural representations between concrete clusters and abstract clusters. Crucially, all analyses of cluster similarity, both within and across subjects, are calculated as the similarity of clusters across stories; this allowed us to evaluate the stability and uniqueness of concept–cluster representations across distinct presentations and contexts.
Using two separate linear mixed-effects models, we examined how the end of the property spectrum (concrete vs abstract), cluster membership (within vs between), and specific cluster relationship (e.g., within concrete, between concrete, etc.) differentially contribute to whole-brain similarity of neural representations while controlling for random effects of subject and parcel. Our first model predicted similarity from the fixed effects of end of the property spectrum and cluster membership and evaluates their main effects as well as their interaction. Then, in a separate model, we predicted similarity from the fixed effect of specific cluster relationship, specifying each cluster relationship as a separate level of the fixed effect. Using this second model, we tested for significant differences between cluster relationships by conducting pairwise statistical tests. All tests were two-tailed, tested at alpha p < 0.05, and corrected for multiple comparisons using FDR correction.
Results
We aimed to understand how neural representations of the concrete–abstract axis vary within individuals and across the population during naturalistic story listening. Using a dataset of subjects (N = 45) that listened to four stories each, we replicated previous findings that univariate neural responses to the concrete–abstract axis show group-level consistency. Complementing this consistency, we also found idiosyncratic multivariate representations of this axis that were unique to individuals and stable across stories, allowing us to identify subjects with a high degree of accuracy. Furthermore, by placing words within a high-dimensional semantic space, we demonstrated that neural representations of concrete words are particularly stable and stereotyped and that this consistency primarily drives the reliability of the concrete–abstract axis, while representations of abstract words are more variable both within and across subjects.
Consistent group-level activations to the concrete–abstract axis
We first sought to replicate prior work demonstrating group-level consistency of univariate activity to the concrete–abstract axis. For each subject and story, we modeled brain activity as a function of the time-varying concreteness level of its content (as given by word-level norms provided by a separate set of human raters). Our model also included time-varying regressors for other linguistic properties—namely, frequency, valence, and arousal—plus loudness, a low-level sensory control.
All properties, both sensory and linguistic, demonstrated univariate neural responses that were consistent across both subjects and stories (Fig. 2; qFDR < 0.05). For example, as expected, loudness evoked responses in the bilateral primary auditory cortex. Critically, the concrete–abstract axis evoked neural responses across a wide swath of the cortex: more concrete words drove higher responses in regions including the bilateral angular gyrus, bilateral parahippocampal cortex, and bilateral inferior frontal gyrus, while more abstract words drove responses in regions such as bilateral superior temporal gyrus and bilateral anterior temporal lobe. These results align with previous research that has reported similar cortical regions engaged in processing concrete and abstract concepts (J. Wang et al., 2010; Montefinese, 2019). Importantly, all linguistic properties exhibited responses that agree with prior research: frequency modulation in the left inferior frontal gyrus (Schuster et al., 2016), valence in the right temporoparietal junction (Tamir et al., 2016), and arousal in the posterior cingulate (Maddock and Buonocore, 1997) and ventromedial prefrontal cortex (Kensinger and Schacter, 2006).
Representations of the concrete–abstract axis are reliable within individuals
Having shown that the concrete–abstract axis drives consistent univariate activity at the group level, we next investigated the stability of multivariate representations of this axis, as well as other word properties, across stories. Representations were operationalized as multivoxel patterns of activity within each cortical parcel evoked by a given property in a given story. Specifically, we compared representations both within and across individuals, allowing us to understand the extent to which representations of these common linguistic dimensions are shared versus individualized.
We found that representations of all word properties except valence exhibited individual reliability across stories in at least some brain regions (Fig. 3A; n = 10,000 permutations, all qFDR < 0.05), where reliability was defined as the difference between within-subject and average across-subject similarity. Importantly, while the low-level sensory property of loudness showed the highest average reliability across parcels (r = 0.11), the concrete–abstract axis showed the second highest average reliability (r = 0.09) and was significantly more reliable than all other linguistic (i.e., non-sensory) properties (frequency, r = 0.04; β = 0.01; t(42,967) = 8.71; valence, r = −0.002; β = 0.05; t(42,967) = 41.83; arousal, r = 0.02; β = 0.03; t(42,967) = 23.74; all ps < 0.001).
We next disentangled the separate contributions of within- and across-subject similarity in driving reliability of individual representations. In theory, high individual reliability of representations across stories could result from (1) highly similar representations within subjects, (2) highly dissimilar representations across subjects, or (3) a combination of the two. Accordingly, for each word property, we calculated the within- and across-subject similarity of representations. Specifically, we calculated the similarity of across-subject representations both within the same stories and across different stories. We compared the similarity of within-subject representations with both forms of across-subject similarity. Importantly, this comparison ensured that any observed differences in reliability stemmed from individualized representations (within-subject similarity) above and beyond characteristics of the presented stories.
For all word properties with significant reliability (i.e., all except valence), participants’ representations were significantly similar to themselves across different stories (Fig. 3B; one-sample t-tests; all ps < 0.001). Critically, participants were significantly more similar to themselves than to other participants, even when across-subject representations were compared within the same story (LME range of β = −0.01 to 0.03; all ps < 0.001).
We then examined whether there was a relationship between within- and across-subject similarity of word property representations. By correlating within- and across-subject similarity values across parcels, we found that brain areas with word property representations that were more similar within subjects also showed higher similarity in representations across subjects (loudness, r = 0.874; concrete–abstract, r = 0.784; frequency, r = 0.797; valence, r = 0.428; arousal, r = 0.599; all ps < 0.001). This finding recapitulates a seemingly paradoxical phenomenon previously shown in functional connectivity fingerprinting: brain states that make individuals more similar to others also make them more similar to themselves (Finn et al., 2017).
Individuals are identifiable from their representations of the concrete–abstract axis
The previous analyses revealed that individuals’ representations of the concrete–abstract axis are stable across stories, but how unique are these representations? High reliability does not necessarily imply uniqueness: low average across-subject similarity could be due to high variability in across-subject similarity. In other words, certain pairs of subjects may have highly similar representations of the concrete–abstract axis, despite most of the group exhibiting low similarity. To test the extent to which word property representations are unique to each individual, we evaluated our ability to identify subjects from their representations of each word property.
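The identification procedure follows the fingerprinting logic: match each subject's pattern from one story against all subjects' patterns from another story and score a hit when the best match is the subject themselves. A minimal numpy sketch, with illustrative function and variable names rather than the paper's code:

```python
import numpy as np

def identification_rate(database, target):
    """database, target: (n_subjects, n_voxels) arrays of one parcel's
    word-property representations from two different stories.
    A subject is correctly identified when their target pattern
    correlates most highly with their own database pattern."""
    hits = 0
    for s in range(len(target)):
        r = [np.corrcoef(target[s], d)[0, 1] for d in database]
        hits += int(np.argmax(r) == s)
    return hits / len(target)  # chance level = 1 / n_subjects
```

With 45 subjects, chance accuracy is 1/45 ≈ 2.22%, matching the chance level reported below.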
Across cortical parcels, we were able to identify subjects from representations of both sensory response (loudness) and all four linguistic properties across much of the brain (Fig. 3C; n = 10,000 permutations; all qFDR < 0.05). Of note, the average identification rates across cortical parcels were low in an absolute sense but still significantly above chance (chance, 2.22%; Fig. 3D). Overall, representations of loudness provided the best ability to identify subjects (22.1%), demonstrating significantly higher identification rates, on average, than the concrete–abstract axis (16.5%; β = 10.41; t(948) = 14.77; p < 0.001). However, representations of the concrete–abstract axis enabled significantly higher identification accuracy than representations of other linguistic properties (frequency, 8.8%; β = 2.9; t(948) = 4.11; valence, 4.4%; β = 7.24; t(948) = 10.27; arousal, 6.6%; β = 5.08; t(948) = 7.2; all ps < 0.001).
We then applied a winner-takes-all approach to identifiability maps to understand the cortical parcels where concrete–abstract axis representations showed the highest accuracy out of all word properties. We found that the concrete–abstract axis enabled the highest identification of subjects—even higher than loudness—within regions including the left anterior temporal lobe, left inferior frontal gyrus, and bilateral retrosplenial cortex. These results dovetail with previous studies that have shown that areas within the left-lateralized language network and multimodal cortex are important in representing concrete and abstract concepts (Binder et al., 2005; J. Wang et al., 2010; Roxbury et al., 2014; Zhang et al., 2020).
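The winner-takes-all step itself reduces to a per-parcel argmax over the property-wise accuracy maps; a short sketch under the assumption of one accuracy array per property (names illustrative):

```python
import numpy as np

def winner_takes_all(acc_maps):
    """acc_maps: dict mapping word property -> (n_parcels,) array of
    identification accuracies. Returns the winning property per parcel."""
    props = sorted(acc_maps)                        # fixed key ordering
    stacked = np.stack([acc_maps[p] for p in props])
    return [props[i] for i in stacked.argmax(axis=0)]
```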
Representations of concreteness are more reliable than representations of abstractness and drive individual identifiability
Thus far, we have shown that representations of the concrete–abstract axis are reliable within and unique to individual subjects across experiences. Yet it remains unclear whether both ends of this continuum—concreteness and abstractness—contribute equally to this reliability and uniqueness.
On one hand, representations of concreteness may be more reliable than those of abstractness due to greater associations with sensory experience. On the other hand, representations of abstractness may be more idiosyncratic, as uniquely language-based representations could depend more heavily on individual experience to create meaning. While prior work suggests that representations of abstract words exhibit lower similarity across individuals than concrete words, disentangling the source of this difference requires (1) evaluating the stability of concreteness and abstractness as classes and (2) assessing similarity within the same individual across experiences.
To understand the differential contributions of concreteness and abstractness in driving reliability, we dichotomized the continuous concrete–abstract axis and estimated reliability separately for each end of the spectrum. Specifically, we first limited our analysis to nouns to avoid confounds associated with different parts of speech, as verbs are more prevalent at the abstract end of the axis. We then separated the top 30% of words at each end of the concrete–abstract axis into two classes representing “concreteness” and “abstractness.” Lastly, we used a GLM to estimate separate BOLD response patterns for “concreteness” and “abstractness.”
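The dichotomization step can be sketched as follows, assuming a list of nouns with behavioral concreteness ratings; the 30% cutoff follows the text, while the function and variable names are illustrative:

```python
import numpy as np

def split_concrete_abstract(nouns, ratings, frac=0.30):
    """Return the top `frac` most concrete and most abstract nouns.
    nouns: list of words; ratings: concreteness ratings (higher =
    more concrete). Middle-rated words are excluded from both classes."""
    order = np.argsort(ratings)               # ascending: abstract -> concrete
    k = int(np.ceil(frac * len(nouns)))
    abstract = [nouns[i] for i in order[:k]]
    concrete = [nouns[i] for i in order[-k:]]
    return concrete, abstract
```

The resulting word classes can then serve as separate regressors in the GLM to yield one response pattern per class.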
We observed that representations of concreteness and abstractness each demonstrated significant reliability across stories in several brain regions (Fig. 4B; n = 10,000 permutations; both qFDR < 0.05). By contrasting the reliability maps, we found that many cortical parcels (36% or 72/200) exhibited more reliable responses to concreteness than abstractness. On the other hand, no parcels showed greater reliability for representations of abstractness over concreteness. We then repeated our identifiability analysis (see Materials and Methods) to understand whether these representations of concreteness and abstractness were unique enough to discriminate individual subjects from one another. Across the majority of parcels, we were able to identify individuals based on their representations of both concreteness and abstractness significantly above chance (Fig. 4C; n = 10,000 permutations; both qFDR < 0.05). However, at a whole-brain level, representations of concreteness showed a significantly higher rate of identification compared with representations of abstractness (Fig. 4D; concreteness, 14%; abstractness, 6.4%; β = 3.83; t(190) = 12.79; p < 0.001). Together, these findings suggest that representations of concreteness primarily drive reliable responses of the concrete–abstract axis and are more individualized than representations of abstractness, extending previous, population-level findings to individual patterns of neural responses (West and Holcomb, 2000; Binder et al., 2005; Roxbury et al., 2014; X. Wang and Bi, 2021; Tong et al., 2022).
Concrete concepts share an underlying representational signature that drives reliability of representations across experiences
Why might neural representations of the concrete end of the spectrum be more reliable than representations of the abstract end? One potential explanation is that concrete words share the property of imageability, which carries its own representational signature that undergirds the representations of individual concrete words despite their differences in meaning. This representational signature could serve to stabilize the representations of individual concrete words across different contexts and in relation to other concrete words. Because the stimuli were naturalistic, the same words were not necessarily repeated across stories; however, we can use NLP techniques to group words into clusters of semantically related words and use these clusters to help understand why representations of concreteness are more reliable than those of abstractness, even when generalizing over individual words and concepts.
Numerous recent studies have demonstrated parallels in language representation between humans and NLP models (Huth et al., 2016; Schrimpf et al., 2021; Caucheteux and King, 2022; Goldstein et al., 2022; Tuckute et al., 2024). Here, we used a word-embedding NLP model (GloVe; Pennington et al., 2014) to understand how the semantic relationships among concrete and abstract words relate to the reliability of representations of the concrete–abstract axis. Specifically, we embedded concrete and abstract words within a high-dimensional semantic space and clustered words based on their semantic similarity. We then analyzed the similarity of these “concept clusters” in semantic space and, analogously, the similarity of neural responses to each cluster across stories using linear mixed-effects models (see Materials and Methods).
The semantic-embedding analysis confirmed that words within the same concept cluster were more similar to each other than to words in different clusters (Fig. 5C; β = 0.03; t(610) = 14.71; p < 0.001), a pattern of results consistent across both concrete and abstract clusters (pairwise comparisons; concrete, t(306) = 10.76; abstract, t(306) = 10.03; both ps < 0.001). This was expected given that the clustering was performed on semantic distances but still served as a useful check on the appropriateness of the cluster solution. Here, we also observed a somewhat puzzling result: within semantic space, abstract clusters were generally more similar to one another than concrete clusters were to one another (β = 0.03; t(610) = 5.87; p < 0.001). This finding was particularly surprising given the results from the previous analysis (compare Fig. 4B) that showed that neural representations of concreteness are more reliable than representations of abstractness. Why might the concrete end of the spectrum, which encompasses more variability in (i.e., spans more of) semantic space, show less variability in its neural representations?
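The cluster-similarity comparison in semantic space reduces to contrasting mean pairwise cosine similarity within versus between clusters. A numpy sketch, assuming word embeddings (e.g., GloVe vectors) and cluster labels are already in hand (variable names illustrative):

```python
import numpy as np

def cluster_similarity(embeddings, labels):
    """embeddings: (n_words, dim) array; labels: (n_words,) cluster ids.
    Returns mean cosine similarity for same-cluster versus
    different-cluster word pairs."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T                                  # pairwise cosine similarity
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)    # exclude self-pairs
    within = sim[same & off_diag].mean()
    between = sim[~same].mean()
    return within, between
```

The same comparison, applied to parcel-wise neural response patterns per cluster instead of embeddings, corresponds to the neural analysis that follows.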
We next turned to analyze within-subject neural representations of concrete and abstract concept clusters. Echoing the results in semantic space, representations of words within the same cluster were more similar across stories than representations of words in different clusters (Fig. 5D; β = 0.007; t(34,373) = 20.04; p < 0.001), and this was true for both the concrete and abstract ends of the spectrum (concrete, z = 4.36; abstract, z = 23.99; both ps < 0.001). In contrast to the similarity of clusters in semantic space (Fig. 5C), neural representations of concrete clusters exhibited greater similarity than abstract clusters regardless of semantic distance (same or different clusters; β = 0.01; t(34,373) = 29.45; p < 0.001; Fig. 5D).
Critically, there was also an interaction such that the similarity advantage for same- over different-cluster representation was smaller for concrete clusters than for abstract clusters (β = −0.005; t(34,373) = −13.88; p < 0.001). Strikingly, neural representations of different concrete clusters were more similar within subjects across stories than neural representations of the same abstract cluster (Fig. 5D; mean difference, 0.007; z = 7.12; p < 0.001). Furthermore, this pattern of results persisted when analyzing similarity across subjects (within > across, β = 0.002; t(34,373) = 24.11; concrete > abstract, β = 0.001; t(34,373) = 17.07; interaction, β = −0.001; t(34,373) = −13.27; all ps < 0.001; data not shown), suggesting that a consistent principle drives how concreteness is represented across similar words, within individuals, and across the population.
Considered together, neural representations of semantically similar concrete words were more alike than those of semantically similar abstract words, despite concrete words spanning greater distances within semantic space than abstract words. This divergence between the NLP model and the neural data suggests that concrete words share a representational signature beyond their linguistic content, one rooted in sensory associations that may stem from the integration of visual information into their neural representations.
Discussion
Word meanings vary across both people and contexts, often informed by conceptual associations specific to the individual as well as different situations in which the word is used. What linguistic properties provide a stable foundation for conceptual knowledge while simultaneously supporting unique, individual experience? Here, we found that the concrete–abstract axis provides a basis for both population stability and individual variability in the representation of natural language.
Many studies have demonstrated that while both concrete and abstract words evoke responses within the language network (Friederici et al., 2000; Binder et al., 2005; Moseley and Pulvermüller, 2014; Del Maschio et al., 2021), concrete words exhibit stronger and longer-lasting responses (West and Holcomb, 2000; Barber et al., 2013; Vignali et al., 2023) and also engage multimodal cortices, such as bilateral angular gyrus, posterior cingulate, and precuneus, more than abstract words (Binder et al., 2005; J. Wang et al., 2010; Roxbury et al., 2014; Zhang et al., 2020). In our study, we assessed whether reliability exists uniformly across the concrete–abstract axis, enabling us to understand if previously observed variability in abstract word representations can be explained by variability in representations of abstractness itself. We found reliable representations of the concrete–abstract axis within regions related to the language network and within the multimodal cortex that were unique to individual subjects across diverse, naturalistic stories. Critically, representations of the concrete–abstract axis were more reliable than representations of other linguistic properties (i.e., frequency, valence, arousal), and this effect was driven primarily by the stable representations of the concrete end of the axis. Together, our results suggest that word representations are stabilized by consistent representations of concreteness more so than abstractness, potentially due to the engagement of multimodal areas known to integrate sensory and linguistic information.
Traditionally, neural representations of language have been probed by presenting participants with single words, sentences, and short paragraphs (Bookheimer, 2002; Hagoort, 2019). These studies have revealed neural territory specific to language (Fedorenko et al., 2011; Malik-Moraleda et al., 2022) that closely interacts with other networks involved in cognitive control and theory of mind (Fedorenko and Thompson-Schill, 2014; Paunov et al., 2019, 2022). In contrast to these carefully controlled experiments, everyday language is dynamic and contextualized, such that the meanings of words and sentences are informed by larger narrative structure (L. S. Hamilton and Huth, 2020; Willems et al., 2020). It is therefore crucial to evaluate the degree to which findings of carefully controlled studies extend to naturalistic language perception (Nastase et al., 2020). Within the present study, participants were presented with naturalistic auditory narratives representative of how language is used in day-to-day life. Importantly, we found that representations of abstractness, as well as clusters of related abstract words, were more variable both within and across subjects than representations of concrete words.
The finding of higher across-subject variability for abstractness aligns with another recent study that used a single-word paradigm to study abstract words (X. Wang and Bi, 2021); the authors of that study interpreted this heightened variability as reflecting individual differences in meaning of abstract words in particular. However, the appeal to individual differences implies a stability of representations within the same subject over time, which was not tested. Our study differs from this previous work in two ways: first, we examined neural representations of the concrete–abstract axis across words within distinct, naturalistic stories, and second, we evaluated the reliability of representations within subjects across stories to understand if abstractness is idiosyncratically represented. We found that compared with representations of concreteness, representations of abstractness were more variable not only across subjects but also within the same individual across distinct experiences. This suggests that variability in abstract words stems less from individual differences in meaning and more from a general instability of representations of abstractness.
Recent developments in NLP models have provided researchers with tools to better investigate how the human brain organizes and processes natural language (Huth et al., 2016; Schrimpf et al., 2021; Caucheteux and King, 2022; Goldstein et al., 2022; Tuckute et al., 2024). These computational models not only capture semantic relationships between words but also contain rich knowledge regarding how words relate within various contexts (Erk, 2012). Importantly, the contextual relationships between concrete words—that a fish and a whale may be semantically similar in terms of “wetness” but different in terms of “size”—closely correspond to human judgments of the same categories (Grand et al., 2022). Yet, within our study, we found that clusters of concrete words were less similar than clusters of abstract words within an NLP model but more similar in the human brain. This dissociation supports theories of grounded cognition that suggest representations of concreteness carry additional information beyond pure linguistic representation (Altarriba et al., 1999; Tuckute et al., 2018). Indeed, recent computational work has demonstrated that visual grounding is essential for linguistic representations to capture human ratings of the concrete–abstract axis (Zhang et al., 2021). While prior work has revealed subsets of abstract words that also exhibit sensory associations (Barsalou and Wiemer-Hastings, 2005; Ghio et al., 2013; Kiefer and Harpaintner, 2020), the lower similarity of abstract words even within a concept cluster suggests that the representational signature of sensory experience may be weaker or not present for abstract words. Together, these findings suggest that concrete words, but not abstract words, carry a shared signature of sensory grounding that stabilizes their neural representations both within and across subjects.
Though our work aligns with and extends past work on the concrete–abstract axis, it has some limitations. First, it is possible that we have underestimated the extent to which neural representations of the other properties (valence, arousal, frequency) are also idiosyncratic. In the current study, we leveraged preexisting human ratings of these properties, but these behavioral ratings were collected by presenting participants with individual words out of context. Similarly, we leveraged an NLP model that does not incorporate contextual information into the word-level representations. Some of these other properties, especially valence and arousal, may be more context-dependent and require ratings specific to a given story or individual to understand the idiosyncrasies in neural representations. In addition, the moderate negative relationship between the concrete–abstract axis and word frequency in our dataset also leaves open the possibility that some effects attributed to concreteness may be shared with (inverse) frequency. Second, due to the diversity of content across the auditory narratives, we were limited in our ability to compare representations of the same words across stories. We addressed this by comparing the neural representations of clusters of similar words across stories, extending prior work on single words to the organization of broader concepts in semantic space. Future work could select stories that contain the same words but vary in narrative content to understand the stability of both specific words and semantic organization more generally across experiences.
In sum, our work establishes the concrete–abstract axis as a critical dimension for promoting both shared and individualized representations of language. In particular, these findings disentangle the sources of individual variability in concrete and abstract word representations and reveal a representational signature of sensory experience specific to concrete words that boosts their representational stability. Our results underscore the importance of considering within-subject variability when identifying underlying drivers of common versus idiosyncratic processing of natural language.
Footnotes
The authors declare no competing financial interests.
- Correspondence should be addressed to Thomas L. Botch at tlb.gr@dartmouth.edu or Emily S. Finn at emily.s.finn@dartmouth.edu.