Research Articles, Behavioral/Cognitive

MEG Evidence That Modality-Independent Conceptual Representations Contain Semantic and Visual Features

Julien Dirani and Liina Pylkkänen
Journal of Neuroscience 3 July 2024, 44 (27) e0326242024; https://doi.org/10.1523/JNEUROSCI.0326-24.2024
Author affiliations: Julien Dirani, Department of Psychology, New York University, New York, New York 10003; Liina Pylkkänen, Departments of Psychology and Linguistics, New York University, New York, New York 10003, and NYUAD Research Institute, New York University Abu Dhabi, Abu Dhabi 129188, United Arab Emirates.

Abstract

The semantic knowledge stored in our brains can be accessed from different stimulus modalities. For example, a picture of a cat and the word “cat” both engage similar conceptual representations. While existing research has found evidence for modality-independent representations, their content remains unknown. Modality-independent representations could be semantic, or they might also contain perceptual features. We developed a novel approach combining word/picture cross-condition decoding with neural network classifiers that learned latent modality-independent representations from MEG data (25 human participants, 15 females, 10 males). We then compared these representations to models representing semantic, sensory, and orthographic features. Results show that modality-independent representations correlate both with semantic and visual representations. There was no evidence that these results were due to picture-specific visual features or orthographic features automatically activated by the stimuli presented in the experiment. These findings support the notion that modality-independent concepts contain both perceptual and semantic representations.

  • concepts
  • lexical
  • MEG
  • modality
  • semantic
  • visual

Significance Statement

This study sheds light on how the human brain stores semantic knowledge across different stimulus modalities (pictures and written words). We developed a method that allowed us to investigate the content of conceptual representations in the brain independently of the stimulus modality that was perceived by participants. Results showed that modality-independent representations contain both semantic and visual features. We found no evidence that these results are due to picture-specific visual or orthographic features activated by the stimuli presented in the experiment.

Introduction

How does the human brain represent semantic knowledge? A fundamental aspect of the brain's capacity to encode meaning lies in its ability to abstract away from an initial input modality. For example, reading the word “cat” and looking at the picture of a cat both activate similar conceptual content. While extensive research has shown that different tasks and stimulus modalities (such as pictures and words) activate common brain areas and shared representations of concepts (Meyer et al., 2010; Simanova et al., 2010; Akama et al., 2012; Devereux et al., 2013; Fairhall and Caramazza, 2013; Dirani and Pylkkänen, 2023), the content of those representations remains unknown. Neurobiological accounts of semantic knowledge show an increasing reliance on feature spaces distributed across the cerebral cortex (Binder and Desai, 2011), with evidence supporting the existence of a cross-modal conceptual hub in the anterior temporal lobe (Ralph et al., 2017). Yet, a major challenge lies in the difficulty of distinguishing between the processes by which semantic information is retrieved and the actual content of the representations (Poeppel and Idsardi, 2022). Few theories specify the set of distributed features that contribute to invariant conceptual representations, that is, representations that are consistently active regardless of the stimulus modality (such as pictures or words). In this paper, we define modality-independent representations as any representations that are consistently activated across two stimulus modalities: pictures and written words. We use a novel representational learning approach to test the extent to which modality-independent representations contain semantic features, sensory components, or orthographic features of the words corresponding to the concept. Furthermore, we assess the extent to which the content of modality-independent representations dynamically changes over the milliseconds following the perception of a stimulus.

Prominent theories of word meaning and concepts, like the prototype and exemplar theories (Posner and Keele, 1968; Medin and Schaffer, 1978), rely on semantic features as their foundational element of representation (e.g., is an animal, can fly, has feathers). Semantic feature representations are also employed in numerous models of semantic memory, object recognition, and word recognition (Collins and Loftus, 1975; Hinton and Shallice, 1991; Plaut and Shallice, 1993; Plaut, 2002; Harm and Seidenberg, 2004). Given the well-established predictive power of semantic feature models across multiple modalities, the most intuitive hypothesis is that they are the content of modality-independent representations. Alternatively, modality-independent representations could contain sensory-level representations, such as visual or auditory representations. For example, reading the word “cat” could activate visual representations related to cats. In fact, several studies have shown that the brain engages sensory–motor features in the representation of concepts, such as visual shapes, sounds, and motor representations (Binder and Desai, 2011; Binder et al., 2016; Ralph et al., 2017). However, it remains unknown whether these representations are modality-independent, that is, activated for all input modalities.

To address these hypotheses, we combined a cross-condition decoding approach (King and Dehaene, 2014) with representational similarity analysis (RSA; Kriegeskorte et al., 2008) using MEG. Twenty-five participants viewed pictures and words while making a binary animacy judgment. We used deep neural network classifiers which learned latent representations during training. If these classifiers trained on one modality (e.g., words) successfully generalize to another modality (pictures), they would have learned latent representations of concepts that are independent of the training and testing modalities. The RSA allowed us to compare these latent modality-independent representations with models representing conceptual, sensory, and linguistic aspects of the concepts (Fig. 1).

An experimental paradigm featuring repeated presentations of pictures and their corresponding words opens up the possibility that encountering a concept in one modality could trigger the automatic activation of the stimulus expressing that concept in the other modality. For example, the word “dog” could trigger mental imagery of a previously presented picture of a dog. To address this type of interpretation for any potential modality-independent representation, we also compared the latent modality-independent representations to the stimulus-specific visual representations captured by a pretrained ResNet convolutional neural network (He et al., 2016), as well as orthographic representations of the written words captured by Levenshtein distance (Levenshtein, 1966).

Materials and Methods

Participants

Twenty-five native English speakers were paid to participate in the study [15 females; age, M = 23.70; standard deviation (SD) = 4.92]. Participants were recruited by posting advertisements about the study throughout the Washington Square campus of New York University (NYU), although recruitment was not limited to NYU students. All participants reported normal or corrected-to-normal vision and no history of neurological or language disorders. The study received ethical approval from the institutional review board at NYU.

Experimental design

Participants were asked to perform a binary animacy judgment while their brain activity was recorded using magnetoencephalography. The stimuli consisted of 16 exemplars that were presented as pictures and as words, with each exemplar repeated 40 times per modality, resulting in a total of 1,280 trials. Each trial started with a fixation cross that appeared on the screen for 300 ms, followed by a blank screen for 300 ms; the target picture or word then appeared on the screen until a response was given using a button box. All responses were given using the left index finger for animate exemplars and the left middle finger for inanimate exemplars. Participants were instructed to respond as quickly and as accurately as they could. The interstimulus intervals were randomly sampled from a uniform distribution with a range of 200–700 ms. Stimuli were presented using PsychoPy 2020.1.2 (Peirce et al., 2019). Animate and inanimate words were matched for length, lexical frequency, number of morphemes, number of phonemes, number of phonographic neighbors, number of orthographic neighbors, number of phonological neighbors, and average bigram count (Balota et al., 2007).
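
The trial structure described above maps directly onto a PsychoPy script. Below is a minimal sketch of a single trial's timing; it is not the authors' actual experiment code, and the window settings, stimulus, and response keys are illustrative placeholders.

```python
# Minimal sketch of one trial (fixation 300 ms, blank 300 ms, target until response,
# jittered 200-700 ms interstimulus interval). Stimulus and key names are placeholders.
import random
from psychopy import visual, core, event

win = visual.Window(fullscr=True, color="grey", units="deg")
fixation = visual.TextStim(win, text="+")
target = visual.TextStim(win, text="cat")        # word trial; picture trials would use visual.ImageStim

fixation.draw(); win.flip(); core.wait(0.300)    # fixation cross, 300 ms
win.flip(); core.wait(0.300)                     # blank screen, 300 ms
target.draw(); win.flip()                        # target stays on screen until a button-box response
keys = event.waitKeys(keyList=["1", "2"])        # e.g., 1 = animate (index), 2 = inanimate (middle)
core.wait(random.uniform(0.200, 0.700))          # jittered interstimulus interval
win.close()
```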

MEG acquisition and preprocessing

Continuous MEG was recorded with a 157-channel axial gradiometer system (Kanazawa Institute of Technology) at a sampling rate of 1,000 Hz with an online bandpass filter of 0.1–200 Hz. The raw data were noise-reduced with the continuously adjusted least-squares method (Adachi et al., 2001) using the MEG Laboratory software 2.004A (Yokogawa Electric and Eagle Technology). The data were then low-pass filtered offline at 40 Hz; bad channels were identified by visual inspection, and the data for those channels were estimated using interpolation (Perrin et al., 1989). An independent component analysis was then fitted to the data using the “FastICA” method, retaining the number of components that explained 95% of the cumulative variance. Components related to eyeblinks, saccades, and heartbeats were then rejected manually. Epochs from −100 to 600 ms relative to the target onset were extracted, and baseline correction was done using the 100 ms preceding the target onset. Time-locking of the epochs to the MEG triggers was adjusted using a photodiode. Evoked responses were created by averaging random sets of five repeats of each exemplar, resulting in a total of 128 averaged epochs per modality and per participant. The resulting evoked responses were downsampled by averaging nonoverlapping bins of 5 ms.
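
The paper names the noise-reduction software but not the toolbox used for the remaining steps. The sketch below reproduces the described pipeline (40 Hz low-pass, bad-channel interpolation, FastICA retaining 95% cumulative explained variance, epoching from −100 to 600 ms with baseline correction, and ~5 ms bins), assuming MNE-Python; file names, bad-channel labels, and excluded ICA components are placeholders.

```python
# Hedged MNE-Python sketch of the preprocessing described above (assumed toolbox).
import mne

raw = mne.io.read_raw_kit("subject01_noise_reduced.sqd", preload=True)  # 157-channel KIT/Yokogawa recording
raw.filter(l_freq=None, h_freq=40.0)                 # offline 40 Hz low-pass

raw.info["bads"] = ["MEG 014"]                       # placeholder: channels flagged by visual inspection
raw.interpolate_bads(reset_bads=True)                # estimate bad channels by interpolation

ica = mne.preprocessing.ICA(n_components=0.95, method="fastica")  # keep 95% cumulative variance
ica.fit(raw)
ica.exclude = [0, 1]                                 # placeholder: blink/saccade/heartbeat components (picked manually)
ica.apply(raw)

events = mne.find_events(raw)
epochs = mne.Epochs(raw, events, tmin=-0.100, tmax=0.600,
                    baseline=(-0.100, 0.0), preload=True)
epochs.resample(200)                                 # 5 ms steps, approximating the averaged 5 ms bins
```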

Statistical analyses

Cross-condition decoding

For each of the picture and word modalities and at each timepoint, a unique feedforward neural network classifier was used to discriminate the MEG response patterns associated with each of the 16 unique basic-level concepts from the 157 sensors of the MEG data. The data were first scaled so that the mean activity at each sensor was 0 with an SD of 1; this was done independently at each timepoint. The network architecture consisted of an input layer with an input size of 157, corresponding to the MEG channels, followed by three hidden layers containing 100, 50, and 10 neurons, respectively. The output layer size was 16, corresponding to the 16 unique exemplars. A rectified linear unit (ReLU) activation function was used for all hidden layers. The network was optimized with the Adam solver (Kingma and Ba, 2014). The learning rate was set to a constant value of 0.001. Regularization was implemented through an L2 regularization term (alpha) set to 0.0001. Training aimed to maximize classification accuracy using a cross-entropy loss, with 10% of the training data reserved as a validation set. The network architecture was selected empirically through an iterative process of experimenting with multiple configurations. Final accuracy scores were obtained at each timepoint using fivefold cross-validation. This procedure was done separately for each subject; average accuracy scores across subjects are reported at each timepoint and plotted in Figure 2A,B. This first step served as a sanity check that the 16-way classification returned above-chance accuracy scores within each condition. To investigate whether and when modality-independent representations of semantic categories occur, we then assessed the extent to which classifiers trained on one modality could generalize when tested on the other modality. This was done for all pairs of timepoints; for example, the classifier trained at 100 ms on the word MEG data was tested on all timepoints from −100 to 600 ms of the picture data, following the condition generalization approach of King and Dehaene (2014). This procedure was done separately for each subject.
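
The architecture and optimizer settings above correspond directly to options available in scikit-learn's MLPClassifier. Below is a minimal sketch of the per-timepoint classifier and of the word-to-picture temporal generalization loop; the arrays are synthetic placeholders standing in for the real averaged epochs (128 per modality, 16 labels), and the within-condition accuracies of Figure 2A,B would additionally use fivefold cross-validation.

```python
# Sketch of the per-timepoint classifier and cross-condition generalization,
# assuming scikit-learn; the data arrays are synthetic placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_epochs, n_sensors, n_times = 128, 157, 140           # 128 averaged epochs, 157 channels, 5 ms bins
X_word = rng.standard_normal((n_epochs, n_sensors, n_times))
X_pic = rng.standard_normal((n_epochs, n_sensors, n_times))
y_word = y_pic = np.repeat(np.arange(16), 8)            # 16 exemplars x 8 averaged repeats

def fit_timepoint(X, y, t):
    """Fit the 157 -> 100 -> 50 -> 10 -> 16 network on one timepoint's sensor pattern."""
    Xt = StandardScaler().fit_transform(X[:, :, t])     # z-score each sensor at this timepoint
    clf = MLPClassifier(hidden_layer_sizes=(100, 50, 10), activation="relu",
                        solver="adam", learning_rate="constant", learning_rate_init=0.001,
                        alpha=0.0001, early_stopping=True, validation_fraction=0.1)
    return clf.fit(Xt, y)

scores = np.zeros((n_times, n_times))                   # rows: t_train on words, cols: t_test on pictures
for t_train in range(n_times):
    clf = fit_timepoint(X_word, y_word, t_train)
    for t_test in range(n_times):
        Xt = StandardScaler().fit_transform(X_pic[:, :, t_test])
        scores[t_train, t_test] = clf.score(Xt, y_pic)  # chance = 1/16 = 0.0625
```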

Figure 1.

Analysis pipeline. (1) Cross-condition decoding in which neural network classifiers trained on one modality were tested on the other modality, for all pairs of timepoints. This allowed us to map the clusters of timepoints (ttrain, ttest) where modality-independent representations of basic-level concepts were activated. (2) Classifiers with successful cross-condition generalization were assumed to have learned latent representations of the semantic space that were modality-independent. We investigated the content of those representations using RSA and compared them to three hypothesis spaces. We also compared them to the ResNet embeddings of the picture stimuli and the orthographic features of the words to test whether the shared representations between pictures and words merely resulted from an automatic reactivation of stimulus-specific representations.

Figure 2.

The activation timing of modality-independent representations of basic-level concepts. A,B, Accuracy scores at each timepoint for classifiers trained and tested within each modality. The shaded regions indicate timepoints where classifier accuracy was above chance at the group level. C, Cross-condition decoding results where models trained on the MEG data from the words were tested on MEG data from the pictures for all pairs of timepoints (tword, tpicture). The contour plot indicates the cluster of timepoints with accuracy scores significantly above chance. Modality-independent representations were active at ∼250 ms and sustained until ∼600 ms after the stimulus onset. The part of the cluster that is off-diagonal indicates that representations that were active earlier in the pictures (∼100–300 ms) were delayed in the words (∼400–600 ms). Models trained on MEG data from the pictures and tested on the words did not significantly surpass chance-level accuracy.

Extracting brain-based modality–independent representations of concepts

The cross-condition decoding allowed us to assess the pairs of timepoints (ttrain, ttest) where a classifier trained on one modality generalized to the other modality. For a classifier to be modality-independent, it must have learned to extract from the MEG data representations that allow it to perform above-chance classification regardless of the input modality. The primary goal of this study was to assess the nature of modality-independent representations. Thus, by examining the latent representations captured in the last hidden layer of modality-independent classifiers, we aimed to gain insight into the nature of those representations. For each classifier trained at time ttrain, if the classifier generalized at time ttest, we extracted the activation of its last hidden layer when the test data were fed forward through it. We then averaged those representations across repeated exemplars, resulting in a single vector embedding for each exemplar at each pair of timepoints (ttrain, ttest) where the classifier generalized. Overall, this method offers two advantages. First, it enables the extraction of features from the MEG signal that are relevant to classifying or recognizing exemplars and shared across stimulus modalities. In other words, the hidden layers of the model represent the features that facilitate this classification task, narrowing down the signal of interest in the MEG data. Second, this method allows us to directly investigate the learned modality-independent feature space and its alignment with the hypothesis models, beyond the alternative approach of comparing separate results obtained for each modality.
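
A scikit-learn MLPClassifier does not expose hidden activations directly, but they can be recovered from its learned weights. The sketch below, continuing from the decoding sketch above (reusing clf, X_pic, and y_pic), forward-propagates the test data through the hidden layers and averages over repeats to obtain one 10-dimensional embedding per exemplar; that the authors used this exact mechanism is an assumption.

```python
# Sketch: recover the last hidden layer (10 units) of a fitted MLPClassifier.
import numpy as np
from sklearn.preprocessing import StandardScaler

def last_hidden_activations(clf, X):
    """Forward pass through the hidden layers only (ReLU), skipping the output layer."""
    a = X
    for W, b in zip(clf.coefs_[:-1], clf.intercepts_[:-1]):
        a = np.maximum(a @ W + b, 0.0)                  # ReLU, as in the trained network
    return a                                            # shape (n_epochs, 10)

t_test = 70                                             # placeholder: a picture timepoint where the classifier generalized
Xt = StandardScaler().fit_transform(X_pic[:, :, t_test])
h = last_hidden_activations(clf, Xt)
embeddings = np.vstack([h[y_pic == k].mean(axis=0)      # average over repeats of each exemplar
                        for k in range(16)])            # shape (16, 10)
```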

Representational similarity analysis

RSA was used to investigate the content of the latent representations that were learned by the modality-independent classifiers. The RSA involved computing a Pearson's correlation between representational dissimilarity matrices (RDMs) capturing pairwise cosine distances of the modality-independent representations and hypothesis RDMs. These hypothesis RDMs, in turn, depict the pairwise distances of exemplars as predicted by a hypothesis or computational model (Kriegeskorte et al., 2008). In other words, RSA allows us to compare the structure of the stimuli under some hypothesis space to the structure of the modality-independent space captured by the classifiers. Thus, for each pair of timepoints (ttrain, ttest) where the classifier generalized across conditions, we computed a modality-independent RDM and compared it to our hypothesis RDMs.
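
Concretely, each RDM here is a 16 × 16 matrix of pairwise cosine distances, and the comparison is a Pearson correlation between the off-diagonal entries of the brain-derived RDM and a hypothesis RDM. A minimal sketch, reusing the exemplar embeddings from the previous sketch (the hypothesis RDMs themselves are sketched in the following subsections):

```python
# Sketch of the RSA comparison between a brain-derived RDM and a hypothesis RDM.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import pearsonr

def rdm(vectors):
    """Pairwise cosine distances between row vectors, as a full 16 x 16 matrix."""
    return squareform(pdist(vectors, metric="cosine"))

brain_rdm = rdm(embeddings)                          # embeddings: one last-hidden-layer vector per exemplar
iu = np.triu_indices(16, k=1)                        # off-diagonal upper triangle
r, _ = pearsonr(brain_rdm[iu], hypothesis_rdm[iu])   # hypothesis_rdm: e.g., the semantic RDM sketched below
```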

Hypothesis RDMs

Semantic features: We operationalized each exemplar as the set of features that define it, based on feature production norms by 725 participants (McRae et al., 2005). In the original paper, participants were given written words and were asked to list as many properties as they could for the concept that the word refers to. These properties were not constrained and could include internal and external features, functional properties, and taxonomic properties (for details see McRae et al., 2005). The final hypothesis RDM was constructed by first reducing the feature space of the entire dataset from 2,526 unique semantic features to 300 principal components which captured 87% of the variance and then calculating the pairwise cosine distance of all 16 unique exemplars.
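
A sketch of how this semantic hypothesis RDM could be assembled, assuming the McRae norms are available as a concept-by-feature matrix; full_norms and exemplar_rows are synthetic placeholders, and the 87% figure is a property of the authors' fit, not something this code guarantees.

```python
# Sketch: reduce the 2,526-dimensional feature space to 300 principal components,
# then take pairwise cosine distances between the 16 exemplars' component scores.
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
full_norms = rng.random((541, 2526))       # placeholder for the full McRae concept-by-feature matrix
exemplar_rows = np.arange(16)              # placeholder indices of the 16 stimulus concepts

pca = PCA(n_components=300).fit(full_norms)                          # PCA over the entire norms dataset
semantic_rdm = squareform(pdist(pca.transform(full_norms[exemplar_rows]),
                                metric="cosine"))                    # 16 x 16 hypothesis RDM
```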

Sensory features: An alternative hypothesis posits that modality-independent representations capture sensory features that are consistently recruited for the representation of concepts. To test this hypothesis, we operationalized the exemplars based on their visual and auditory components as measured by the experiential feature norms from Binder et al. (2016), where participants were given a set of semantic components (e.g., visual, auditory, social) and had to rate the extent to which a target word was associated with each component. We extracted the visual and auditory features only and built two hypothesis RDMs that captured the structure of the exemplars based on their visual and auditory features.

ResNet embeddings: Modality-independent representations that were captured by the classifier could be a reactivation of the other modality's specific visual components. For example, when participants read the word “dog,” they might reactivate the mental image of the dog that was previously encountered during the experiment. While the visual features described above capture the extent to which visual representations contribute to the representation of a concept, these features are not specific to the picture stimulus of that concept that was used in the current experiment. In order to operationalize the visual representation of the specific stimuli used here, we processed the JPEG files containing pictures and word stimuli through a pretrained convolutional neural network (ResNet; He et al., 2016) and extracted the last hidden layer for each exemplar and each modality.
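
A hedged sketch of that extraction, assuming a torchvision pipeline and a ResNet-50 (the paper cites the ResNet architecture but does not state the depth); the image path is a placeholder, and dropping the classification head yields the pooled activations of the last hidden layer.

```python
# Sketch: pretrained ResNet embeddings for each stimulus image (assumed torchvision pipeline).
import torch
from torchvision import models, transforms
from PIL import Image

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()                  # drop the classifier head: output = last hidden layer (2048-d)
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    img = preprocess(Image.open("stimuli/dog.jpg").convert("RGB")).unsqueeze(0)  # placeholder path
    embedding = resnet(img).squeeze(0).numpy()   # one vector per stimulus; the RDM is built as for the other hypotheses
```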

Orthographic features: Similarly, it is possible for the orthographic representations of the words to be automatically activated when viewing their picture equivalent during the experiment. To test this possibility, we operationalized the RDM of orthographic representations using the pairwise Levenshtein distance between the words. Given that lexical frequency has consistently been used as a measure of lexical processing with electrophysiological data, we also included it as a supplemental test, which returned no significant correlations with modality-independent representations extracted from MEG data.
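
The orthographic RDM is simply the matrix of pairwise edit distances between the 16 word forms. A minimal sketch, assuming the python-Levenshtein package and a placeholder word list:

```python
# Sketch: orthographic hypothesis RDM from pairwise Levenshtein (edit) distances.
import numpy as np
from Levenshtein import distance

words = ["cat", "dog", "apple", "hammer"]        # placeholder subset of the 16 word stimuli
n = len(words)
ortho_rdm = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ortho_rdm[i, j] = distance(words[i], words[j])
```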

To ensure that each hypothesis captured a unique aspect of the representation space, we confirmed that all pairwise Pearson's correlation coefficients between hypotheses remained under 0.3.

Group-level statistical analyses

To evaluate classifier accuracy at the group level, for each pair of timepoints (ttrain, ttest), a t value was computed using a one-tailed one-sample t test assessing the group-level average accuracy score against chance (0.0625). The resulting t value map was then thresholded at a t value corresponding to an uncorrected p value of 0.05. Clusters were formed based on direct adjacency in time (minimum of two timepoints per cluster), and the sum of all t values (Σt) was computed for each resulting cluster. This procedure was then repeated with the data randomly permuted 10,000 times in order to obtain a null distribution. The Monte Carlo p value for each cluster in the original t map was computed as the proportion of random permutations in which the permuted Σt was at least as large as the observed Σt. We retained clusters whose Monte Carlo p value was smaller than or equal to 0.05 (Maris and Oostenveld, 2007). To assess the group-level significance of the RSA, for each pair of timepoints (ttrain, ttest), a t value was computed using a two-tailed one-sample t test assessing the group-level average Pearson's correlation coefficient against 0. A procedure similar to the one described above was then performed to compute a null distribution, with the exception that we retained clusters whose Monte Carlo p value was smaller than or equal to 0.05 after correcting for multiple comparisons using the false discovery rate (Benjamini and Yekutieli, 2001) across all five hypotheses.
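
The cluster-forming and permutation steps described above follow the Maris and Oostenveld (2007) procedure, for which MNE-Python provides an equivalent implementation. The sketch below applies it to a placeholder group accuracy map rather than re-implementing the authors' own code.

```python
# Sketch of the group-level cluster-based permutation test (assumed MNE-Python implementation).
import numpy as np
from scipy import stats
from mne.stats import permutation_cluster_1samp_test

rng = np.random.default_rng(0)
acc = rng.uniform(0.0, 0.15, size=(25, 140, 140))       # placeholder: subjects x t_train x t_test accuracy maps
chance = 1.0 / 16

t_thresh = stats.t.ppf(1 - 0.05, df=acc.shape[0] - 1)   # cluster-forming threshold, one-tailed uncorrected p < .05
t_obs, clusters, cluster_pv, _ = permutation_cluster_1samp_test(
    acc - chance, threshold=t_thresh, n_permutations=10000, tail=1)
significant = [c for c, p in zip(clusters, cluster_pv) if p <= 0.05]  # FDR across hypotheses then applied for the RSA maps
```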

Results

Modality-independent representations of basic-level concepts are activated at ∼250 ms

For each timepoint ttrain of the MEG data of the word modality, we trained a classifier to decode the exemplar that was presented on the screen. We then tested the classifier at all timepoints ttest of the MEG data of the picture modality. This allowed us to find the pairs of timepoints (ttrain, ttest) where modality-independent representations of concepts are activated. The results showed that for both pictures and words, modality-independent representations were active at ∼250 ms and sustained until ∼600 ms after the onset of the stimulus. Some representations that were active early in the pictures (∼100–300 ms) appeared to also be activated later in the words (∼400–600 ms), as depicted by the off-diagonal cluster on the matrix of scores (Fig. 2C; p < 0.05). Notably, the converse approach of training on the picture data and testing on the words did not yield above-chance decoding accuracies, suggesting that models trained on pictures may have relied on picture-specific representations that are not activated when processing words. This discrepancy likely stems from the fact that MEG data from pictures potentially carry a stronger signal reflecting low-level visual features, which does not readily generalize to words. When classifiers are trained on words, they might more effectively pick up on semantic representations, since little to no information about the exemplars is present in the visual features of the word stimuli. These classifiers would then generalize more effectively to the picture recognition process. Finally, it is worth noting that the within-condition decoding accuracies for the pictures (Fig. 2A) appear higher than those for the words (Fig. 2B). This is in line with prior work showing worse classification performance on MEG data from words compared with MEG data from pictures (Simanova et al., 2010; Dirani and Pylkkänen, 2023).

Modality-independent conceptual representations correlate with the semantic and visual hypotheses

To assess the content of modality-independent representations, we compared the latent representation space learned by successful modality-independent classifiers to our hypotheses using RSA. This was done at each pair of timepoints (tword, tpicture) where modality-independent representations were identified. Each hypothesis was formulated as an RDM, representing the structure of the stimulus space through pairwise distances of all exemplars. The semantic feature hypothesis was operationalized using human-normed semantic features (McRae et al., 2005), while the visual and auditory sensory hypotheses were operationalized using brain-based experiential norms (Binder et al., 2016). The results showed that modality-independent representations significantly correlated with both the semantic features and the visual features (Fig. 3A,B). There was no evidence that modality-independent representations correlated with orthographic features or with auditory features (Fig. 3C,D). The semantic feature hypothesis correlated with modality-independent representations at virtually all timepoints (tword, tpicture) where modality-independent representations were observed, illustrating the widespread presence of semantic feature representations in the encoding of modality-independent conceptual representations. Visual representations followed a similar temporal extent, although qualitative differences can be observed: the effect appears constrained to a cluster falling around the diagonal of the accuracy matrix (Fig. 3B). This suggests that the modality-independent representations that correlated with visual features evolved at relatively concurrent timepoints in the pictures and words, from ∼250 ms until 600 ms. To test whether our positive results for the presence of visual features in modality-independent representations may simply reflect reactivation of a specific picture previously encountered in the experiment, we processed the picture stimuli through a pretrained convolutional neural network (ResNet; He et al., 2016) and extracted the activation of the last hidden layer for each unique exemplar. This revealed no evidence that the modality-independent representations identified by the classifier reflected reactivation of specific picture stimuli when reading their corresponding words during the experimental task (Fig. 3E).

Figure 3.

RSA results investigating the content of modality-independent representations. For each pair of timepoints (tword, tpicture) where modality-independent representations were identified (see cluster result in Fig. 2), we investigated their content using RSA. The contour plots indicate clusters of timepoints where modality-independent representations significantly correlated with the hypothesis. The gray area represents points outside the cluster of modality-independent representations and contains no data. A,B, Modality-independent representations significantly correlated with the semantic features and the visual features hypothesis. Semantic features had a widespread correlation over most of the cluster, while visual representations appear qualitatively constrained to the part of the cluster falling around the diagonal. C, Modality-independent representations did not correlate with auditory features. D,E, Modality-independent representations did not correlate with ResNet embeddings or orthographic features, suggesting that shared representations between pictures and words did not merely result from an automatic reactivation of stimulus-specific representations.

Discussion

Modality-independent conceptual representations contain both semantic and visual features

While previous work has shown evidence that the brain stores modality-independent representations of concepts (Ralph et al., 2017), to date, the content of those representations has remained unknown. Here we used an innovative approach combining cross-condition decoding with RSA to directly investigate the nature of modality-independent representations. We found that modality-independent representations most strongly align with the semantic feature hypothesis, suggesting a shared semantic space across modalities. In line with expectations, the brain recruits a semantic feature space that serves to define the content of the concept irrespective of the initial stimulus modality (e.g., a dog: it is an animal; it barks; it has a tail). But this does not mean that modality-independent representations must be purely amodal, that is, devoid of any sensory–motor content. Semantic features often themselves constitute complex concepts (such as “has wings”), which means they may themselves have an internal structure containing, say, visual or motor representations. Consistent with this, our RSA results indicated that modality-independent representations shared a similar structure to the visual hypothesis. This suggests that the semantic content that is consistently activated for both words and pictures contains visual representations in addition to semantic representations. For example, both reading the word “cat” and seeing a cat would activate some visual representations related to cats, as expected under theories in which concepts are neurally represented at least to some extent using the perceptual systems in which they are experienced (L. W. Barsalou, 1999; Binder and Desai, 2011; Kiefer and Pulvermüller, 2012; Meteyard et al., 2012; Binder et al., 2016). In contrast, we found no evidence that modality-independent representations contain auditory representations. Of course, both our words and pictures were presented visually. This and the use of an animacy judgment task could have encouraged the activation of visual representations, in a way that other stimulus modalities (e.g., speech) and tasks would not. We addressed this possibility by showing that stimulus-specific visual representations captured by ResNet did not correlate with modality-independent representations. Still, were the words presented auditorily or the concepts as auditory objects (such as a dog barking), the recruited sensory representations might have been more auditory in nature. In fact, prior work on areas thought to be modality-independent suggests that they encode a range of modality-specific representations in a graded fashion (Ralph et al., 2017), potentially including auditory representations. This can be straightforwardly tested with our novel method in the future.

Finally, modality-independent representations did not correlate with the orthographic features of the words, suggesting that the word forms were not necessarily activated when participants were looking at their corresponding pictures. Nevertheless, this remains a null finding, and it does not rule out the possibility that orthographic representations either contribute to or are coactivated with modality-independent representations.

Overall, these results demonstrate that modality-independent representations consist not only of semantic features but also of visual representations. In other words, the subsets of features that are active regardless of the stimulus modality are not limited to semantic features alone. While these results seem to support a hybrid theory involving perceptual representations and an amodal conceptual hub (Ralph et al., 2017), a purely embodied position could also align with these findings. This can be achieved by assuming that semantic features are in fact supported by perceptual representations, for example, the semantic feature “has wings” containing visual representations (see Limitations). Conversely, purely amodal views could argue that the engagement of perceptual representations is the result of spreading activation. In fact, we defined modality-independent representations as any representations that are coactivated in picture and word recognition, with the assumption that these shared representations reflect a core part of the conceptual representations. However, they could also be a result of spreading activation from one modality-specific representation (e.g., semantic) to another modality-specific representation (e.g., visual), with neither corresponding to shared modality-independent representations. Disentangling these two possibilities has long been difficult, and drawing a clear distinction between them remains a challenge (Mahon and Caramazza, 2008; Hauk and Tschentscher, 2013; Mahon, 2015). Nevertheless, here we propose a novel method that, at the very least, provides insight into the content of those representations that are systematically activated across pictures and words, indicating that they engage perceptual processes.

The timing of activation of modality-independent representations

Here we trained classifiers of basic-level concepts on MEG data from the words and tested these classifiers on the MEG data from the pictures. By systematically examining all pairs of timepoints (tword, tpicture), we mapped out the time-course of activation of modality-independent conceptual representations (King and Dehaene, 2014). Our findings revealed that modality-independent representations of basic-level concepts are activated at ∼250 ms for both the pictures and the words. This novel finding complements prior MEG evidence that modality-independent representations of higher-level categories (animacy) are simultaneously activated at 150 ms for pictures and words (Dirani and Pylkkänen, 2023). The temporal pattern aligns with prior research in object recognition, showing that superordinate category representations are active within the first 150 ms (Fabre-Thorpe, 2011; Cichy et al., 2014; Clarke et al., 2015), followed by basic-level representations at ∼200 ms (Martinovic et al., 2008; Schendan and Maher, 2009; Peelen and Caramazza, 2012; Quiroga, 2012; Clarke et al., 2013; Clarke and Tyler, 2015). Crucially, our results extend this temporal pattern to the activation time-course of modality-independent representations, indicating that they follow a hierarchical process in which superordinate representations are activated first, followed by basic-level representations.

Finally, regarding the early ∼75 ms onset of decoding in words, it is possible that this early effect represents ultrarapid semantic processing, as observed in prior work (Hauk et al., 2012; MacGregor et al., 2012; Amsel et al., 2013). This would imply that the later cross-condition decoding reflects a second stage of semantic processing that follows early modality-specific representations, in line with embodied theories that distinguish early semantic retrieval from later “situated simulation” (L. Barsalou, 2003). This could explain our observed correlation of modality-independent representations with visual representations. However, another interpretation is that early word decoding reflects low-level features such as the size of the stimulus on the screen; in fact, within-condition decoding starts at ∼75 ms for both the pictures and words. Cross-condition decoding is valuable in this context because models that generalize across conditions presumably do not rely on such low-level features. Thus, later modality-independent representations may not denote a second stage but rather the earliest representations not reliant on low-level features, which generalize to picture processing. In fact, the 250 ms onset of cross-condition decoding aligns with semantic processing and lexical access timings (Indefrey and Levelt, 2004). In practice, it is possible that both interpretations explain these results, with the early modality-specific decoding relying on both low-level features and an initial stage of semantic processing, followed by modality-independent semantic representations.

Limitations

One limitation concerns normative models of semantic features (McRae et al., 2005) in which the features constituting a concept can themselves be complex concepts (e.g., “has wings”), potentially incorporating other representations such as nested semantic features (e.g., “has feathers”) or perceptual representations (Binder et al., 2016). Similarly, visual features could also contain semantic representations, for example, capturing high-level semantic categories. This implies that the semantic/perceptual distinction can be viewed as a continuum, in a way that could make it impossible to fully separate them. Despite ensuring that each hypothesis contributed to a unique aspect of the representation space (with all pairwise correlations between hypothesis models <0.3), this limitation could explain the overlap observed in the clusters of correlations with visual features and semantic features (Fig. 3A,B). Nevertheless, it does not undermine the positive finding that modality-independent representations correlated with visual representations, supporting the conclusion that perceptual processes are engaged in the encoding of modality-independent concepts.

It is also worth considering the extent to which the neural network classifiers can alter the representations encoded in the MEG data during training. Artificial neural networks aim to identify the signal from MEG data that would facilitate the classification task, potentially distorting or inflating the relevance of some features while undermining others. This emphasizes the need to consider that the current method might fail to capture some features that in fact do contribute to modality-independent representations.

Future directions

Our results pertain to the representation of concrete concepts. These results might not necessarily generalize to the representation of abstract concepts, which could rely on different types of features, potentially placing less emphasis on visual representations. Future research should aim to extend our findings to abstract concepts. Finally, in this paper, we took advantage of the high temporal resolution of MEG to investigate the temporal evolution of representations, with the trade-off that the spatial dimension was not explored. Assessing representations across the whole brain could inflate the salience of some features (e.g., highly distributed features) at the expense of others. Thus, future work should expand on the present findings and integrate them with an investigation of the spatial dimension for a more comprehensive understanding of conceptual representations and their dynamics across the entire brain.

Conclusion

Our investigation into the content of modality-independent conceptual representations revealed that they contain not only semantic features but also visual representations. This is consistent with theories in which perceptual processes play a fundamental role in the encoding of modality-independent representations.

Footnotes

  • This work was supported by the New York University Abu Dhabi Research Institute (Grant G1001) and The William Orr Dingwall Dissertation Fellowship.

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Julien Dirani at julien.dirani@nyu.edu.

SfN exclusive license.

References

  1. Adachi Y, Shimogawara M, Higuchi M, Haruta Y, Ochiai M (2001) Reduction of non-periodic environmental magnetic noise in MEG measurement by continuously adjusted least squares method. IEEE Trans Appl Supercond 11:669–672. https://doi.org/10.1109/77.919433
  2. Akama H, Murphy B, Na L, Shimizu Y, Poesio M (2012) Decoding semantics across fMRI sessions with different stimulus modalities: a practical MVPA study. Front Neuroinform 6:24. https://doi.org/10.3389/fninf.2012.00024
  3. Amsel BD, Urbach TP, Kutas M (2013) Alive and grasping: stable and rapid semantic access to an object category but not object graspability. Neuroimage 77:1–13. https://doi.org/10.1016/j.neuroimage.2013.03.058
  4. Balota D, Yap M, Cortese M, Hutchison K, Kessler B, Loftis B, Neely J, Nelson D, Simpson G, Treiman R (2007) The English lexicon project. Behav Res Methods 39:445–459. https://doi.org/10.3758/BF03193014
  5. Barsalou LW (1999) Perceptual symbol systems. Behav Brain Sci 22:577–660. https://doi.org/10.1017/S0140525X99002149
  6. Barsalou L (2003) Situated simulation in the human conceptual system. Lang Cogn Process 18:513–562. https://doi.org/10.1080/01690960344000026
  7. Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165–1188. http://www.jstor.org/stable/2674075
  8. Binder JR, Conant LL, Humphries CJ, Fernandino L, Simons SB, Aguilar M, Desai RH (2016) Toward a brain-based componential semantic representation. Cogn Neuropsychol 33:130–174. https://doi.org/10.1080/02643294.2016.1147426
  9. Binder JR, Desai RH (2011) The neurobiology of semantic memory. Trends Cogn Sci 15:527–536. https://doi.org/10.1016/j.tics.2011.10.001
  10. Cichy RM, Pantazis D, Oliva A (2014) Resolving human object recognition in space and time. Nat Neurosci 17:455–462. https://doi.org/10.1038/nn.3635
  11. Clarke A, Devereux BJ, Randall B, Tyler LK (2015) Predicting the time course of individual objects with MEG. Cereb Cortex 25:3602–3612. https://doi.org/10.1093/cercor/bhu203
  12. Clarke A, Taylor KI, Devereux B, Randall B, Tyler LK (2013) From perception to conception: how meaningful objects are processed over time. Cereb Cortex 23:187–197. https://doi.org/10.1093/cercor/bhs002
  13. Clarke A, Tyler LK (2015) Understanding what we see: how we derive meaning from vision. Trends Cogn Sci 19:677–687. https://doi.org/10.1016/j.tics.2015.08.008
  14. Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychol Rev 82:407. https://doi.org/10.1037/0033-295X.82.6.407
  15. Devereux BJ, Clarke A, Marouchos A, Tyler LK (2013) Representational similarity analysis reveals commonalities and differences in the semantic processing of words and objects. J Neurosci 33:18906–18916. https://doi.org/10.1523/JNEUROSCI.3809-13.2013
  16. Dirani J, Pylkkänen L (2023) The time course of cross-modal representations of conceptual categories. Neuroimage 277:120254. https://doi.org/10.1016/j.neuroimage.2023.120254
  17. Fabre-Thorpe M (2011) The characteristics and limits of rapid visual categorization. Front Psychol 2:243. https://doi.org/10.3389/fpsyg.2011.00243
  18. Fairhall SL, Caramazza A (2013) Brain regions that represent amodal conceptual knowledge. J Neurosci 33:10552–10558. https://doi.org/10.1523/JNEUROSCI.0051-13.2013
  19. Harm MW, Seidenberg MS (2004) Computing the meanings of words in reading: cooperative division of labor between visual and phonological processes. Psychol Rev 111:662. https://doi.org/10.1037/0033-295X.111.3.662
  20. Hauk O, Coutout C, Holden A, Chen Y (2012) The time-course of single-word reading: evidence from fast behavioral and brain responses. Neuroimage 60:1462–1477. https://doi.org/10.1016/j.neuroimage.2012.01.061
  21. Hauk O, Tschentscher N (2013) The body of evidence: what can neuroscience tell us about embodied semantics? Front Psychol 4:50. https://doi.org/10.3389/fpsyg.2013.00050
  22. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778.
  23. Hinton GE, Shallice T (1991) Lesioning an attractor network: investigations of acquired dyslexia. Psychol Rev 98:74. https://doi.org/10.1037/0033-295X.98.1.74
  24. Indefrey P, Levelt WJ (2004) The spatial and temporal signatures of word production components. Cognition 92:101–144. https://doi.org/10.1016/j.cognition.2002.06.001
  25. Kiefer M, Pulvermüller F (2012) Conceptual representations in mind and brain: theoretical developments, current evidence and future directions. Cortex 48:805–825. https://doi.org/10.1016/j.cortex.2011.04.006
  26. King J-R, Dehaene S (2014) Characterizing the dynamics of mental representations: the temporal generalization method. Trends Cogn Sci 18:203–210. https://doi.org/10.1016/j.tics.2014.01.002
  27. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  28. Kriegeskorte N, Mur M, Bandettini PA (2008) Representational similarity analysis—connecting the branches of systems neuroscience. Front Syst Neurosci 2:4. https://doi.org/10.3389/neuro.06.004.2008
  29. Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10:707–710.
  30. MacGregor LJ, Pulvermüller F, Van Casteren M, Shtyrov Y (2012) Ultra-rapid access to words in the brain. Nat Commun 3:1–7. https://doi.org/10.1038/ncomms1715
  31. Mahon BZ (2015) What is embodied about cognition? Lang Cogn Neurosci 30:420–429. https://doi.org/10.1080/23273798.2014.987791
  32. Mahon BZ, Caramazza A (2008) A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. J Physiol Paris 102:59–70. https://doi.org/10.1016/j.jphysparis.2008.03.004
  33. Maris E, Oostenveld R (2007) Nonparametric statistical testing of EEG- and MEG-data. J Neurosci Methods 164:177–190. https://doi.org/10.1016/j.jneumeth.2007.03.024
  34. Martinovic J, Gruber T, Hantsch A, Müller MM (2008) Induced gamma-band activity is related to the time point of object identification. Brain Res 1198:93–106. https://doi.org/10.1016/j.brainres.2007.12.050
  35. McRae K, Cree GS, Seidenberg MS, McNorgan C (2005) Semantic feature production norms for a large set of living and nonliving things. Behav Res Methods 37:547–559. https://doi.org/10.3758/BF03192726
  36. Medin DL, Schaffer MM (1978) Context theory of classification learning. Psychol Rev 85:207. https://doi.org/10.1037/0033-295X.85.3.207
  37. Meteyard L, Cuadrado SR, Bahrami B, Vigliocco G (2012) Coming of age: a review of embodiment and the neuroscience of semantics. Cortex 48:788–804. https://doi.org/10.1016/j.cortex.2010.11.002
  38. Meyer K, Kaplan JT, Essex R, Webber C, Damasio H, Damasio A (2010) Predicting visual stimuli on the basis of activity in auditory cortices. Nat Neurosci 13:667–668. https://doi.org/10.1038/nn.2533
  39. Peelen MV, Caramazza A (2012) Conceptual object representations in human anterior temporal cortex. J Neurosci 32:15728–15736. https://doi.org/10.1523/JNEUROSCI.1953-12.2012
  40. Peirce JW, Gray JR, Simpson S, MacAskill MR, Höchenberger R, Sogo H, Kastman E, Lindeløv J (2019) PsychoPy2: experiments in behavior made easy. Behav Res Methods 51:195–203. https://doi.org/10.3758/s13428-018-01193-y
  41. Perrin F, Pernier J, Bertrand O, Echallier JF (1989) Spherical splines for scalp potential and current density mapping. Electroencephalogr Clin Neurophysiol 72:184–187. https://doi.org/10.1016/0013-4694(89)90180-6
  42. Plaut DC (2002) Graded modality-specific specialisation in semantics: a computational account of optic aphasia. Cogn Neuropsychol 19:603–639. https://doi.org/10.1080/02643290244000112
  43. Plaut DC, Shallice T (1993) Deep dyslexia: a case study of connectionist neuropsychology. Cogn Neuropsychol 10:377–500. https://doi.org/10.1080/02643299308253469
  44. Poeppel D, Idsardi W (2022) We don't know how the brain stores anything, let alone words. Trends Cogn Sci 26:1054–1055. https://doi.org/10.1016/j.tics.2022.08.010
  45. Posner MI, Keele SW (1968) On the genesis of abstract ideas. J Exp Psychol 77:353. https://doi.org/10.1037/h0025953
  46. Quiroga RQ (2012) Concept cells: the building blocks of declarative memory functions. Nat Rev Neurosci 13:587–597. https://doi.org/10.1038/nrn3251
  47. Ralph MAL, Jefferies E, Patterson K, Rogers TT (2017) The neural and computational bases of semantic cognition. Nat Rev Neurosci 18:42–55. https://doi.org/10.1038/nrn.2016.150
  48. Schendan HE, Maher SM (2009) Object knowledge during entry-level categorization is activated and modified by implicit memory after 200 ms. Neuroimage 44:1423–1438. https://doi.org/10.1016/j.neuroimage.2008.09.061
  49. Simanova I, Van Gerven M, Oostenveld R, Hagoort P (2010) Identifying object categories from event-related EEG: toward decoding of conceptual representations. PLoS One 5:e14465. https://doi.org/10.1371/journal.pone.0014465