Abstract
The human medial temporal lobe (MTL) plays a crucial role in recognizing visual objects, a key cognitive function that relies on the formation of semantic representations. Nonetheless, it remains unknown how visual information about general objects is translated into semantic representations in the MTL. Furthermore, the long-standing debate about whether the human MTL is involved in perception remains unresolved. To address these questions, we investigated three distinct models of neural object coding—semantic coding, axis-based feature coding, and region-based feature coding—in each subregion of the human MTL, using high-resolution fMRI in two male and six female participants. Our findings revealed the presence of semantic coding throughout the MTL, with a higher prevalence observed in the parahippocampal cortex (PHC) and perirhinal cortex (PRC), while axis coding and region coding were primarily observed in the earlier regions of the MTL. Moreover, we demonstrated that voxels exhibiting axis coding supported the transition to region coding and contained information relevant to semantic coding. Together, by providing a detailed characterization of neural object coding schemes and offering a comprehensive summary of visual coding information for each MTL subregion, our results not only emphasize a clear role of the MTL in perceptual processing but also shed light on the translation of perception-driven representations of visual features into memory-driven representations of semantics along the MTL processing pathway.
- amygdala
- entorhinal cortex
- hippocampus
- medial temporal lobe
- neural object coding
- parahippocampal cortex
- perirhinal cortex
Significance Statement
In this study, we delved into the mechanisms underlying visual object recognition within the human medial temporal lobe (MTL), a pivotal region known for its role in the formation of semantic representations crucial for memory. In particular, how visual information is translated into semantic representations within the MTL has remained unclear, and the debate regarding the involvement of the human MTL in perception has endured. To address these questions, we comprehensively examined distinct neural object coding models across each subregion of the MTL, leveraging high-resolution fMRI. We also showed the transition of information between object coding models and across MTL subregions. Our findings advance our understanding of the intricate pathway underlying visual object coding.
Introduction
The human medial temporal lobe (MTL) is critical for memory-related behaviors. A hallmark of how the MTL supports this function is the formation of a highly sparse code for individuals and objects at the level of single neurons (Quian Quiroga et al., 2005; Quian Quiroga, 2012). However, it remains largely unknown how such a sparse coding property emerges. In particular, the MTL is only a few synapses downstream of the high-level visual cortex, where recent primate studies have shown a feature-based coding that differs greatly from that found in the MTL (Chang and Tsao, 2017; Bashivan et al., 2019; Ponce et al., 2019; Bao et al., 2020). It remains elusive how the brain translates information about visual stimuli from the distributed representations of features in the higher visual cortex to the sparse representations of semantics in the MTL.
A recent study has suggested a possible mechanism (Cao et al., 2020b): neurons in the human amygdala and hippocampus (HP) encode a region (i.e., receptive field) in the high-level feature space, whose axes are encoded by neurons in the higher visual cortex, and these neurons become selective only to the identities and objects that fall into this region, demonstrating a sparse coding property. Therefore, neurons showing a region coding can extract information from the distributed representations of features and abstract toward sparse representations of semantics, forming an intermediate code bridging the feature coding in the higher visual cortex and sparse coding in the MTL. However, given the limited spatial coverage of human single-neuron recordings, a detailed delineation of this visual processing pathway is missing. Specifically, it remains unclear how the information flows within the MTL, and there is a lack of a detailed characterization of coding properties in different subregions of the MTL.
In this study, we set out to address this important question by using a high-resolution fMRI covering the entire human MTL with a large set of natural scene stimuli (Allen et al., 2022). We characterized three forms of neural coding of objects. First, the axis coding model predicts that the activity of neurons/voxels parametrically correlates with visual features along the specific axes in the feature space. It is a specific form of a general feature-coding model positing that object representations are encoded by a broad and distributed population of neurons (Freeman, 1975; Hinton, 1984; Rolls et al., 1997; Churchland and Sejnowski, 2016), and recognizing a particular individual or object requires access to many neurons, with each neuron responding to many different faces/objects that share specific visual features (Turk and Pentland, 1991; Freiwald et al., 2009). Axis coding has been found in the nonhuman primate inferotemporal (IT) cortex (Chang and Tsao, 2017; Bashivan et al., 2019; Ponce et al., 2019; Bao et al., 2020) and perirhinal cortex (PRC) (She et al., 2021) using single-neuron recordings and in the human visual cortex using fMRI (Loffler et al., 2005; Carlin and Kriegeskorte, 2017; Cao et al., 2020a). It has also been shown that neurons in the human amygdala parametrically encode degrees of facial expressions of emotion (Wang et al., 2017), and neurons in the human amygdala and HP parametrically encode degrees of various social traits of faces (i.e., encoding axes of a social trait space; Cao et al., 2022b,c). Second, the region-coding model is another specific form of feature coding and posits that neurons encode a receptive field in the high-dimensional feature space. Such neurons have been observed in the human amygdala and HP (Cao et al., 2020b). Third, the sparse coding model posits that explicit object representations in the brain are formed by the highly selective (sparse) but at the same time highly visually invariant neurons (Barlow, 1972; Valentine, 1991; Quian Quiroga et al., 2005; Quian Quiroga, 2012). Neurons that selectively respond to many different images of a specific person's face or a landmark embody sparse coding properties. Such neurons exist in the human MTL (Quian Quiroga et al., 2005; Quian Quiroga, 2012) and are thought to be part of the building blocks for episodic memories (Quian Quiroga, 2012). Notably, in line with sparse coding, MTL neurons demonstrate strong visual selectivity and tuning to different categories of visual stimuli (Kreiman et al., 2000; Rutishauser et al., 2015; Wang et al., 2018).
Importantly, the present study also addresses an ongoing debate about whether the MTL is involved in perceptual processing. Classical studies considered that the MTL is only involved in memory but not perceptual processing (Squire et al., 2004; Eichenbaum et al., 2007); however, other studies argued that the relationship between the visual cortices and MTL is more gradual (Bussey et al., 2005; Murray et al., 2007). This perspective is significant in that it proposes that the MTL facilitates both perceptual and mnemonic behaviors (Murray and Bussey, 1999). To examine these contrasting arguments, extensive experimental data collected over many years have been used to determine whether the MTL plays a role in perception (Suzuki and Baxter, 2009; Murray and Wise, 2012). A recent study integrating lesion, electrophysiological, and behavioral results within a shared computational framework has shown that the MTL, especially the PRC, enables visual discrimination behaviors not supported by the canonical visual cortex alone and is thus critically involved in perceptual processing (Bonnen et al., 2021). Furthermore, a face patch has been revealed in the PRC that represents the sensory percept of faces using a distributed axis code (Sliwa and Freiwald, 2017; She et al., 2021). By undertaking a detailed examination of neural object coding schemes and providing a comprehensive summary of the visual coding information for each distinct MTL subregion, our present study resolves this long-standing debate and elucidates the involvement of the MTL in perceptual processing, specifically how it encodes visual features. Critically, our results further shed light on the intricate process of translating perception-driven representations of visual features into memory-driven representations of semantics along the MTL processing pathway, reconciling these competing accounts.
Materials and Methods
Data
The data were from the Natural Scenes Dataset (NSD) (Allen et al., 2022), which contained high-resolution (1.8 mm isotropic resolution with whole-brain coverage) BOLD-fMRI responses to thousands of images from Microsoft's COCO database from eight participants (two males and six females). In this study, we used the single-trial betas of the upsampled 1.0 mm high-resolution preparation of the NSD data (Allen et al., 2022). We only analyzed the images that were seen three times, and we averaged the single-trial β across the three repetitions to create our voxel responses. We removed all trials whose β was below 0 from further analysis.
To ensure that each object category had a sufficient and balanced number of images, we included only the 20 object categories that had more than 100 images each, and we randomly selected 100 images per category for analysis. In total, we used 2,000 images for analysis for each participant. Although different participants viewed different sets of images, we used the same object categories for all the participants. The object categories were person, bear, bench, book, bowl, cat, cow, dog, giraffe, train, airplane, bed, bird, bottle, car, chair, clock, elephant, toilet, and zebra.
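As a minimal sketch, the preparation above can be expressed as follows (Python/NumPy here; all variable names are illustrative, and the exclusion of trials with β < 0 is omitted because the text does not specify its granularity):

```python
import numpy as np

rng = np.random.default_rng(0)

def average_repetitions(betas, image_ids):
    """betas: (n_trials, n_voxels) single-trial betas; image_ids: (n_trials,).
    Returns one averaged response per image that was seen exactly three times."""
    responses, kept = [], []
    for img in np.unique(image_ids):
        trials = betas[image_ids == img]
        if len(trials) == 3:                       # only images seen three times
            responses.append(trials.mean(axis=0))  # average across repetitions
            kept.append(img)
    return np.asarray(responses), np.asarray(kept)

def sample_categories(labels, n_per_cat=100, n_cats=20):
    """Keep the 20 categories with more than 100 images and sample 100 each."""
    eligible = [c for c in np.unique(labels) if np.sum(labels == c) > n_per_cat]
    picked = [rng.choice(np.flatnonzero(labels == c), n_per_cat, replace=False)
              for c in sorted(eligible)[:n_cats]]
    return np.concatenate(picked)                  # indices of the 2,000 images
```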
Regions of interest (ROIs)
We analyzed the following subregions (i.e., ROIs) of the MTL: parahippocampal cortex (PHC), PRC, entorhinal cortex (ERC), HP (including CA1, CA3, and DG), and amygdala. These ROIs were provided by the NSD (Allen et al., 2022). Specifically, the amygdala ROI was derived using the FreeSurfer (version 6.0) automatic volumetric segmentation (aseg.presurf.mgz file), and the other ROIs were derived using an established manual protocol in the volume space (Berron et al., 2017).
Feature extraction and construction of feature spaces
We used a deep neural network (DNN) based on the ResNet convolutional architecture (He et al., 2016) to extract features for each image. To assess the model's ability to discriminate between different categories and to ensure its adequacy as a feature extractor, we carried out a fine-tuning procedure. We first adjusted the output layer of our model to have 20 units, corresponding to the number of categories in the dataset, and then fine-tuned the fully connected layer exclusively with the images (n = 2,000) used in the present study, while all other layers remained frozen. For training, we divided the stimuli into a training set containing two-thirds of the stimuli and a testing set comprising the remaining stimuli. The Adam optimizer was used with an initial learning rate of 5 × 10−4 for the training process, which consisted of 10 epochs in total. To facilitate the convergence of the loss function, we applied a learning rate scheduler after each epoch, setting the γ value to 0.9. During fine-tuning, we updated the weights by calculating the cross-entropy loss on random batches of four object images, which were scaled to 224 × 224 pixels, for backpropagation. The model achieved an accuracy of approximately 71%, which aligns with the fine-tuning results reported in other studies (Kornblith et al., 2019). The accuracy in category classification suggested that the network effectively extracted relevant features and was suitable for further analysis.
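The paper does not specify the training framework or the exact ResNet variant; the following PyTorch sketch assumes ResNet-50 (consistent with the layer name res5b used below) and uses a placeholder dataloader in place of the real two-thirds training split:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Load a pretrained ResNet-50 and freeze every layer except a new 20-way head
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 20)   # 20 object categories

optimizer = torch.optim.Adam(model.fc.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
criterion = nn.CrossEntropyLoss()

# Stand-in for the training split (random tensors; real images are 224 x 224)
train_loader = DataLoader(
    TensorDataset(torch.randn(32, 3, 224, 224), torch.randint(0, 20, (32,))),
    batch_size=4, shuffle=True)

for epoch in range(10):                          # 10 epochs in total
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # cross-entropy on batches of 4
        loss.backward()
        optimizer.step()
    scheduler.step()                             # decay learning rate by 0.9
```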
Based on the same DNN features, we subsequently applied two approaches to reduce the high-dimensional features and construct feature spaces. It is worth noting that neither feature extraction nor the construction of feature spaces involved any neural responses; instead, they were solely based on the stimuli. Subsequently, after extracting the features and constructing the feature spaces, we employed them to fit the neural responses.
First, to investigate the region-based feature coding, we applied the t-distributed stochastic neighbor embedding (t-SNE) method to transform high-dimensional features into a two-dimensional feature space. The region-based feature coding aimed to identify a receptive field in the feature space, and t-SNE facilitated the creation of such a feature space. Specifically, the feature space constructed using t-SNE exhibited an organized structure, with images of the same object category clustered together. Similar images appeared adjacent to each other, and the feature dimensions represented meaningful variations from one object category to another. These properties resembled the functional organization found in the human MTL. We applied t-SNE with the cost function parameter (Perp) representing the perplexity of the conditional probability distribution induced by a Gaussian kernel. Because a sparse distribution of objects could lead to a larger tuning region, we adjusted the distribution of objects using the t-SNE perplexity parameter so that the objects were distributed approximately homogeneously (Extended Data Fig. 4-1A). Notably, we conducted an analysis of robustness by varying the perplexity parameter and examining the resulting number of region-coding voxels (Extended Data Fig. 4-1). We found that around the choice of our perplexity parameter, the distribution of objects was similar across perplexity parameters (Extended Data Fig. 4-1A), and the pairwise distance between objects was correlated between perplexity parameters (Extended Data Fig. 4-1B). As a consequence, the number of selected region-coding voxels in the layer res5b was stable across perplexity parameters (Extended Data Fig. 4-1C), confirming that our results were robust to the perplexity parameter. It is worth noting that we did not use this analysis to select the perplexity parameter leading to the highest number of voxels, which could have biased the selection process.
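A minimal sketch of this construction and its robustness check, assuming a scikit-learn t-SNE implementation and illustrative perplexity values (the text does not report the parameter actually used):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.manifold import TSNE

features = np.random.randn(500, 2048)   # placeholder for the res5b features
                                        # (2,000 images in the real analysis)

def build_space(perplexity):
    """Embed the DNN features into a 2-D object feature space."""
    return TSNE(n_components=2, perplexity=perplexity,
                random_state=0).fit_transform(features)

space_2d = build_space(30.0)            # the 2-D feature space for region coding

# Robustness check (cf. Extended Data Fig. 4-1B): correlate pairwise object
# distances between spaces built with different perplexity parameters
d_ref = pdist(space_2d)
for perp in (10, 20, 50):
    rho = spearmanr(d_ref, pdist(build_space(perp)))[0]
    print(f"perplexity {perp}: rho = {rho:.2f}")
```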
Second, although a feature space was not necessary for investigating axis-based feature coding [see below for details on our use of partial least squares (PLS) regression with DNN feature maps to select axis-coding voxels], we conducted a principal component analysis (PCA) to reduce high-dimensional features. This allowed us to visualize the distribution of visual features (as illustrated in Fig. 3A,B) and was in line with the methodology employed in prior studies (Bao et al., 2020). Additionally, we correlated the axis-coding voxels with PCA features to examine the specific features encoded by each voxel (Fig. 3H). PCA was conducted using the “pca” function in MATLAB, employing the singular value decomposition algorithm.
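The analysis above used MATLAB's "pca" with the SVD algorithm; an equivalent sketch in Python, with placeholder features and voxel responses, is:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA

features = np.random.randn(2000, 2048)   # placeholder DNN features
voxel = np.random.randn(2000)            # placeholder voxel responses

pcs = PCA(svd_solver="full").fit_transform(features)  # SVD-based, as in MATLAB

# Which feature PCs does an axis-coding voxel track? (cf. Fig. 3H)
r_per_pc = [pearsonr(voxel, pcs[:, k])[0] for k in range(10)]
```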
The MTL and an overview of the neural object coding models. A, Schematic illustration of the MTL of a single participant. Color coding shows the subregions of the MTL. PHC, parahippocampal cortex; PRC, perirhinal cortex; ERC, entorhinal cortex; HP, hippocampus; Amyg, amygdala. B, Number of voxels in each subregion of the MTL. Error bars indicate SEM across participants. C, Schematic illustration of the processing stages in the MTL (Squire et al., 2004). D, Schematic illustration of different neural object coding models. In the semantic-coding model (i.e., sparse coding model), voxels selectively respond to different images of a specific object category. In the axis-coding model, the activity of voxels parametrically correlates with visual features along specific axes in feature space. In the region-coding model, voxels encode a receptive field in the high-dimensional feature space. E, Task. Participants observed a series of colorful natural scene images and determined whether each image had been presented before (Allen et al., 2022). Participants were required to keep their gaze fixed at the center. F, Object categories and sample stimuli. The scenes, sourced from Microsoft's COCO dataset, are extensively annotated with object details.
Semantic coding. A–D, Distribution of voxels showing semantic coding in the MTL. A, Proportion of all category-selective voxels. B, Proportion of SC-selective voxels. C, Proportion of MC-selective voxels. D, Percentage of SC voxels in all category-selective voxels. Each dot represents a participant. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers. The colored asterisks for each subregion indicate a significantly above-chance (5%; dashed line) proportion of voxels using a two-tailed one-sample t test. The black asterisks between subregions indicate a significant difference in the proportion of category-selective voxels using a two-tailed paired t test. *P < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001. E, Summary of encoded object categories. The size of the text is proportional to the number of voxels that most prefer that object category. The lower panel shows the proportion of voxels that most prefer the object category. Error bars indicate SEM across participants. F, DOS index. The asterisks indicate a significant difference using a two-tailed paired t test. *P < 0.05 and **p < 0.01. G, Ordered average responses from the most- to the least-preferred category for each MTL subregion. Error bars indicate SEM across participants. H, Difference in response ratio between the most-preferred and second most-preferred categories.
Axis coding. A, B, The object feature space constructed by PCA. A, For clarification, only 200 images (10 images per object category) are plotted. B, Each dot represents an image. Color coding indicates the object category. C, D, An example voxel from the PHC showing axis coding. E, F, An example voxel from the PRC showing axis coding. Both voxels had a significant PLS regression with DNN feature maps. We show the correlation between neural response and the first/second PC of the feature map. D, F, Each dot represents an object image, and the gray line denotes the linear fit. Color coding indicates the object category. G, Proportion of voxels showing axis coding in each subregion of the MTL. Legend conventions as in Figure 2. H, Percentage of axis-coding voxels that significantly correlated with each PC. Error bars indicate SEM across participants. The asterisks indicate a significant difference between the PHC and PRC using a two-tailed paired t test. *P < 0.05. I, Axis-coding voxels from an example participant, superimposed on anatomical sections of the standardized MNI T1-weighted brain template. The contours delineate the MTL subregions. Color coding indicates the strength of axis coding (i.e., model predictability), which was assessed using Pearson’s correlation between the predicted and actual neural response in the test dataset. J, Correlation between PHC and PRC DMs. The left matrix shows the average PHC DM across participants, and the right matrix shows the average PRC DM across participants. Color coding shows dissimilarity values (1−r). For each participant, we calculated the Spearman correlation coefficient between PHC and PRC DMs. For group results, we compared the correlation coefficients across participants to 0 and obtained statistical significance using a two-tailed one-sample t test. The order of category indices is the same as in Figure 2E. See Extended Data Figure 3-1 for replication of axis coding using the other layers of the ResNet and the AlexNet. See Extended Data Figure 3-2 for RSA for axis coding.
Figure 3-1
Control analysis for axis coding. (A-F) Proportion of voxels showing axis coding using features from different ResNet layers. (G-J) Proportion of voxels showing axis coding using features extracted from the AlexNet. It is worth noting that the fc6 layer of the AlexNet was used to show axis coding in the primate inferotemporal (IT) cortex (Bao et al., 2020). (K-M) Linear regression with the two dimensions of the t-SNE feature space. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers. Asterisks for each subregion indicate a significantly above-chance (5%; dashed line) proportion of voxels using two-tailed one-sample t-test. **: P < 0.01, ***: P < 0.001, and ****: P < 0.0001. Download Figure 3-1, TIF file.
Figure 3-2
Representational similarity analysis for axis coding. (A) Dissimilarity matrix (DM) for each MTL subregion. Each DM was averaged across participants. Color coding shows dissimilarity values (1−r). (B) Correspondence between each pair of MTL subregions. For each participant, we calculated the Spearman correlation coefficient between DMs. For group results, we compared the correlation coefficients across participants to 0 and obtained statistical significance using a two-tailed one-sample t-test. (C) Comparison of correspondence (i.e., Spearman’s ρ) between pairs of MTL subregions. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers. Asterisks indicate a significant difference using two-tailed paired t-test. +: P < 0.1, *: P < 0.05, **: P < 0.01, ***: P < 0.001, and ****: P < 0.0001. Download Figure 3-2, TIF file.
It is worth noting that we did not employ the same feature space to study both axis-based and region-based feature coding. On the one hand, t-SNE provided better dimension reduction than PCA for region-based coding because the first two principal components (PCs) in PCA did not capture enough variance to cluster objects from the same category. On the other hand, we were able to replicate our results using the two dimensions of the t-SNE space for axis-based coding (Extended Data Fig. 3-1K–M) as well as for the transition from axis-based coding to region-based coding (Extended Data Fig. 6-2). Since our study did not specifically focus on comparing these models but rather aimed to survey them, we applied the conventional method best suited for each model (i.e., PLS/PCA for axis coding and t-SNE for region coding) without unifying them. Furthermore, the analysis of model transitions did not require the use of the same feature space; in fact, models constructed using different feature spaces may suggest better generalizability.
Selection of semantic-coding voxels
To select the semantic-coding (i.e., category-selective) voxels, we first used a one-way ANOVA to identify the voxels with a significantly unequal response to different object categories. We next imposed an additional criterion to identify the selected categories: the neural response to a category had to be at least 1.5 standard deviations (SD) above the mean of neural responses from all categories. The identified object categories, whose responses stood out from the global mean, were the encoded object categories. We refer to the voxels that encoded a single object category as single-category (SC) voxels, and we refer to the voxels that encoded multiple categories as multiple-category (MC) voxels. Our previous study has shown that this procedure is effective in identifying category-selective neurons (Cao et al., 2020b).
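A minimal sketch of this two-step selection, assuming the SD is taken across the per-category mean responses (the text does not state the normalization explicitly):

```python
import numpy as np
from scipy.stats import f_oneway

def encoded_categories(voxel, labels, alpha=0.05, sd_thresh=1.5):
    """voxel: (n_images,) responses; labels: (n_images,) category labels.
    Returns the encoded categories ([] if the voxel is not category selective)."""
    cats = np.unique(labels)
    groups = [voxel[labels == c] for c in cats]
    if f_oneway(*groups).pvalue >= alpha:         # step 1: unequal responses?
        return []
    means = np.array([g.mean() for g in groups])  # mean response per category
    # step 2: categories at least 1.5 SD above the grand mean
    # (SD across category means is an assumption of this sketch)
    above = means >= means.mean() + sd_thresh * means.std()
    return list(cats[above])                      # 1 category -> SC; >1 -> MC
```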
Selection of axis-coding voxels
To identify the axis-coding voxels (i.e., voxels encoding a linear combination of visual features), we employed a PLS regression with DNN feature maps. The PLS method has been demonstrated to effectively fit neural responses with high-dimensional features (Yamins et al., 2014; Ponce et al., 2019). We implemented the PLS regression using the “plsregress” function in MATLAB based on SIMPLS, which extracts PLS factors that maximize the covariance from the original variables (de Jong, 1993). For PLS, we selected the number of components (n = 4) with a 10-fold cross-validation procedure, ensuring that they explained at least 80% of the variance; this procedure minimizes the prediction error when fitting the model. For both approaches, we used a permutation test with 100 runs to determine whether a voxel encoded a significant visual model (i.e., whether the voxel encoded the dimensions of the feature space). Note that in one participant we confirmed that a permutation test with 1,000 runs yielded results consistent with those from 100 runs, ensuring that the 100-run test was reliable. In each run, we randomly shuffled the object labels and used 50% of the objects as the training dataset. We used the training dataset to construct a model (i.e., deriving regression coefficients), predicted responses using this model for each object in the remaining 50% of objects (i.e., test dataset), and computed Pearson’s correlation between the predicted and actual response in the test dataset. The distribution of correlation coefficients computed with shuffling (i.e., null distribution) was then compared with the one computed without shuffling (i.e., observed response). If the correlation coefficient of the observed response was greater than 95% of the correlation coefficients from the null distribution, the visual model was considered significant. This procedure has been shown to be very effective in selecting neurons with significant face models (Chang and Tsao, 2017). The correlation coefficient also indexes the model's predictability and can thus be compared between voxels.
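A minimal sketch of this selection procedure, with scikit-learn's PLSRegression standing in for MATLAB's "plsregress" and a single fixed 50/50 split for clarity:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

def fit_and_score(X, y, train, test):
    """Fit a 4-component PLS model on the training objects and return the
    Pearson correlation between predicted and actual test-set responses."""
    pls = PLSRegression(n_components=4).fit(X[train], y[train])
    return pearsonr(pls.predict(X[test]).ravel(), y[test])[0]

def is_axis_coding(X, y, n_perm=100):
    """X: (n_images, n_features) DNN feature maps; y: (n_images,) voxel responses."""
    idx = rng.permutation(len(y))
    train, test = idx[: len(y) // 2], idx[len(y) // 2:]
    observed = fit_and_score(X, y, train, test)
    # null distribution: refit after shuffling the object labels in each run
    null = [fit_and_score(X, rng.permutation(y), train, test)
            for _ in range(n_perm)]
    return observed > np.percentile(null, 95)   # significant visual model
```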
Selection of region-coding voxels
To select the region-coding voxels, we first estimated a continuous response density map in the feature space by smoothing the neural response map using a 2D Gaussian kernel (kernel size = feature dimension range × 0.2; SD = 4). We then estimated the statistical significance for each pixel by permutation testing: in each of the 1,000 runs, we randomly shuffled the labels of objects. We calculated the p-value for each pixel by comparing the observed response density value to those from the null distribution derived from the permutation. We applied a mask to exclude pixels from the edges and corners of the response density map where there were no objects because these regions were susceptible to false positives given our procedure. We lastly selected the region with significant pixels (permutation p < 0.01, cluster size >2.5% of the pixels within the mask). If a voxel had a region with significant pixels, the voxel was defined as a “region-coding voxel” and demonstrated “region-based feature coding.” Our previous study has shown that this procedure is effective in identifying the region-coding neurons (Cao et al., 2020b).
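A simplified sketch of this procedure is given below; the grid size, the handling of the kernel size, and the omission of the edge/corner mask are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

def response_density(coords, response, grid=64, sigma=4):
    """coords: (n_images, 2) t-SNE positions scaled to [0, grid); response: (n_images,)."""
    dmap = np.zeros((grid, grid))
    for (x, y), r in zip(coords.astype(int), response):
        dmap[y, x] += r
    return gaussian_filter(dmap, sigma=sigma)     # 2-D Gaussian smoothing

def tuning_region(coords, response, n_perm=1000, p=0.01, min_frac=0.025):
    """Return a boolean mask of the voxel's tuning region (empty if none)."""
    rng = np.random.default_rng(0)
    observed = response_density(coords, response)
    null = np.stack([response_density(coords, rng.permutation(response))
                     for _ in range(n_perm)])
    pvals = (null >= observed).mean(axis=0)       # per-pixel permutation p-value
    sig = pvals < p                               # significant pixels
    labeled, _ = label(sig)                       # contiguous clusters
    sizes = np.bincount(labeled.ravel())[1:]
    # keep clusters covering > 2.5% of the pixels (the edge mask is omitted here)
    keep = [i + 1 for i, s in enumerate(sizes) if s > min_frac * sig.size]
    return np.isin(labeled, keep)
```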
Depth of selectivity (DOS) index
To summarize the response of category-selective voxels, we quantified the DOS for each voxel as DOS = [n − (Σi ri)/rmax]/(n − 1), where n is the number of object categories (n = 20), ri is the mean response to category i, and rmax is the maximal mean response across categories. DOS ranges from 0 (equal responses to all categories) to 1 (exclusive response to a single category).
Response ratio
The response ratio was calculated for each voxel by ranking the object categories from the most preferred to the least preferred and dividing the response of each category by the response of the most-preferred category. The response ratio of the most-preferred category is thus 1. A steeper decline from the best to the worst category indicates stronger category selectivity.
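A minimal sketch of both summary measures, computed from per-category mean responses (the DOS formula follows the definition given above):

```python
import numpy as np

def dos_index(cat_means):
    """cat_means: (n_categories,) mean response per category."""
    n, r_max = len(cat_means), cat_means.max()
    return (n - cat_means.sum() / r_max) / (n - 1)  # 0 = unselective, 1 = one category

def response_ratio(cat_means):
    """Ranked responses divided by the most-preferred category's response."""
    ranked = np.sort(cat_means)[::-1]            # most- to least-preferred
    return ranked / ranked[0]                    # most-preferred ratio = 1

cat_means = np.array([5.0, 1.2, 0.9, 0.8, 0.5])  # toy example with 5 categories
print(dos_index(cat_means), response_ratio(cat_means))
```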
Representational similarity analysis (RSA) for axis coding
For RSA (Kriegeskorte et al., 2008) of axis-coding voxels, dissimilarity matrices (DMs) are symmetrical matrices representing the dissimilarity between all pairs of object categories. In a DM, larger values indicate greater dissimilarity between pairs, with the smallest possible value being the similarity of a condition to itself (dissimilarity of 0). We used Pearson’s correlation to calculate DMs and the Spearman correlation, which does not assume a linear relationship (Stolier and Freeman, 2016), to determine the correspondence between these DMs. Fisher’s z-transformation was applied to Pearson's r to ensure that the sample distribution approximated normality. To mitigate the impact of consistency between object images for the same category on the correspondence between DMs, we averaged neural responses across object images for each object category and computed the DM between object categories. We assessed the correspondence between DMs for each participant and compared these values across participants using a two-tailed one-sample t test against 0 and a two-tailed two-sample t test between pairs of MTL subregions. It is worth noting that this analysis exclusively considered axis-coding voxels.
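A minimal sketch of the DM construction and the correspondence measure (Fisher's z-transformation and the group-level tests are omitted here):

```python
import numpy as np
from scipy.stats import spearmanr

def dissimilarity_matrix(responses, labels):
    """responses: (n_images, n_voxels) from axis-coding voxels;
    labels: (n_images,) category labels."""
    cats = np.unique(labels)
    # average across images of each category before computing the DM
    means = np.stack([responses[labels == c].mean(axis=0) for c in cats])
    return 1 - np.corrcoef(means)               # (n_cats, n_cats), 1 - Pearson's r

def dm_correspondence(dm_a, dm_b):
    """Spearman's rho between the unique category pairs of two DMs."""
    iu = np.triu_indices_from(dm_a, k=1)        # upper triangle, no diagonal
    return spearmanr(dm_a[iu], dm_b[iu])[0]
```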
Correlation between coding models using pairwise distance in the object space
We employed a pairwise distance metric (Grossman et al., 2019) to explore the relationship between voxels with different coding models across the various brain areas. To diminish the impact of image consistency within the same category, we averaged the neural responses across object images for each object category, focusing on between-category distance rather than within-category distance. For each pair of object categories, we employed the dissimilarity value (1−Pearson's r) (Kriegeskorte et al., 2008) as our distance metric. This metric was computed between neural responses of voxels with a specific coding model (i.e., axis coding or region coding; note that our analysis was not restricted to voxels exhibiting a single coding model) within a given brain area. Subsequently, we correlated the pairwise neural distance metrics between different coding models across distinct brain areas.
Population decoding of object categories
We pooled all voxels from a group into a large pseudo-population. Neural responses were z-scored individually for each voxel to give equal weight to each voxel. We used a maximal correlation coefficient (MCC) classifier as implemented in the MATLAB neural decoding toolbox (Meyers, 2013). The MCC estimates a mean response template for each class from the training data and assigns each test trial the class whose template is maximally correlated with the trial's response pattern. We randomly selected 75% of the trials as the training set, while the remaining trials were used as the validation data for assessing the model's accuracy. This process was repeated 1,000 times. The statistical significance of the decoding performance for each group of voxels against chance was estimated by calculating the percentage of runs (1,000 in total) that had an accuracy below chance (i.e., 5%). This decoding approach is consistent with our prior studies (Wang et al., 2019; Cao et al., 2020b) and has proven to be highly effective in the study of neural population activity.
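A minimal sketch of the MCC classifier (the toolbox's resampling scheme is simplified here to a single illustrative 75/25 split):

```python
import numpy as np
from scipy.stats import zscore

def mcc_decode(train_X, train_y, test_X):
    """train_X/test_X: (n_trials, n_voxels); train_y: (n_trials,) class labels."""
    classes = np.unique(train_y)
    # mean response template per class from the training trials
    templates = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    # correlate each test trial with each template; pick the best-matching class
    r = np.corrcoef(np.vstack([templates, test_X]))[len(classes):, :len(classes)]
    return classes[r.argmax(axis=1)]

# usage sketch: z-score per voxel, split 75/25, repeat 1,000 times in practice
X = zscore(np.random.randn(400, 120), axis=0)   # placeholder pseudo-population
y = np.repeat(np.arange(20), 20)                # 20 trials per category
rng = np.random.default_rng(0)
idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]
acc = np.mean(mcc_decode(X[train], y[train], X[test]) == y[test])
```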
Results
Overview
We investigated the following brain regions in the human MTL: PHC, PRC, ERC, HP (including CA1, CA3, and DG), and amygdala (Fig. 1A,B). Classical MTL pathways suggest that the PHC and PRC receive information from the neocortices and provide inputs to the ERC, which in turn provides inputs to the HP and amygdala (Squire and Zola-Morgan, 1991; Squire et al., 2004; Wixted and Squire, 2011) (Fig. 1C). We investigated three neural coding models for each voxel (Fig. 1D): (1) If a voxel exhibited an elevated response to a category or several categories of objects, it showed semantic coding (Fig. 1D left). (2) If a voxel's response correlated parametrically with the visual features of objects, it showed axis coding (Fig. 1D middle). (3) If a voxel's response was elevated for a specific region in the feature space (note that the encoded region can be independent of semantic categories, as a voxel can encode a subset of stimuli from a category or multiple categories, as long as the encoded stimuli are adjacent in the feature space), it showed region coding (Fig. 1D right). To test these neural object coding models (note that these models are not mutually exclusive; see Discussion), we employed an established dataset (Allen et al., 2022) where eight participants viewed 9,407 ± 522.9 (mean ± SD) natural scene images while their brain responses were recorded using 7T-fMRI (Fig. 1E). We selected 20 categories of images that had at least 100 images per category for further analysis, and we analyzed 2,000 images for each participant (Fig. 1F; see Materials and Methods). Although each participant viewed a different set of images, we used the same 20 categories for all participants.
Semantic coding in the MTL
We first analyzed how different subregions of the MTL encoded visual categories, a hallmark of the human MTL (Kreiman et al., 2000). To select category-selective voxels, we first used a one-way (1 × 20) ANOVA to identify voxels with a significantly unequal response to different object categories (p < 0.05). We imposed a second criterion to identify which object category a voxel was selectively responding to (selected categories): the neural response to such an object category was required to be at least 1.5 SD above the mean of the neural responses to all object categories. We found that all five subregions of the MTL had a significantly above-chance number of category-selective (i.e., semantic-coding) voxels (two-tailed one-sample t test against 5%; Fig. 2A), suggesting that all five subregions were involved in coding object categories. The PHC and PRC had the highest percentage of category-selective voxels (Fig. 2A; two-tailed paired t test between subregions: p < 0.001), whereas the ERC had the lowest percentage of category-selective voxels (Fig. 2A). Of the category-selective voxels, some responded to a single object category only (referred to here as SC voxels; Fig. 2B), and others responded to multiple object categories (referred to here as MC voxels; Fig. 2C). A similar pattern of results was observed: the percentage of category-selective voxels decreased from the PHC and PRC to the ERC but increased from the ERC to the amygdala (Fig. 2A–C). Furthermore, the percentage of SC voxels in all category-selective voxels was higher in the PHC and amygdala (Fig. 2D). Interestingly, different subregions of the MTL preferentially encoded different object categories, although there was also consistency across subregions (Fig. 2E). For example, “train” and “zebra” were two preferred categories from the PHC to the HP, while “bed” was a preferred category from the PRC to the amygdala.
We next investigated the sparseness in category coding across MTL subregions. We used a DOS index (Fig. 2F) and ordered responses from the most- to the least-preferred categories (Fig. 2G) to quantify the sparseness in category coding. We found that the amygdala had the highest DOS (Fig. 2F) and the steepest ordered responses from the most- to the least-preferred categories (Fig. 2G). This was also captured by the larger decrease in neural response between the most-preferred and the second most-preferred stimuli in the amygdala (Fig. 2H). Together, these results indicate that the sparseness and category selectivity are highest at the apex of the processing pathway in the MTL.
Together, our results suggest that the MTL is involved in semantic coding and that its subregions have distinct coding properties.
Axis coding in the MTL
We next investigated whether subregions in the MTL exhibited axis coding. We used the ResNet (He et al., 2016), a DNN pretrained for image recognition, to extract the visual features from each image [Fig. 3A,B; see Extended Data Fig. 3-1 for replication of our results using the other layers of the ResNet as well as using the AlexNet (Krizhevsky et al., 2012); see also Extended Data Fig. 3-1 for a linear regression with the two dimensions of the t-SNE feature space]. We used DNN features to fit the neural response of each voxel (see Materials and Methods for details). We identified voxels whose response parametrically varied as a function of visual features. For example, a voxel from the PHC encoded the change from food items to outdoor natural scenes (see Fig. 3C,D for illustration), and a voxel from the PRC encoded the change from outdoor animals to indoor scenes (see Fig. 3E,F for illustration). However, we found that only the PHC and PRC had a significant percentage of voxels showing axis coding (Fig. 3G,I). The HP and amygdala were not involved in axis coding (Fig. 3G,I), consistent with our previous work (Cao et al., 2020b, 2022c). Furthermore, although both the PHC and PRC encoded more of the first few PCs of visual features [similar to Bao et al. (2020) in the visual cortex], the PHC had a higher percentage of voxels encoding the first two PCs compared with the PRC (Fig. 3H).
We next investigated whether the PHC and PRC represented the same axis coding information using RSA (Kriegeskorte et al., 2008; see Materials and Methods). We found that the PHC and PRC had a similar representational structure between object categories (Fig. 3J), suggesting that similar axis-coding information was shared between these subregions. Notably, in comparison with other pairs of MTL subregions that lacked axis coding (considered as baseline conditions), the correspondence between the PHC and PRC was significantly higher (Extended Data Fig. 3-2).
Together, our results suggest that the PHC and PRC subregions of the MTL are involved in axis coding.
Region coding in the MTL
To study region coding, we used the same visual features extracted by the pretrained ResNet DNN (He et al., 2016) for axis coding. We constructed a two-dimensional object feature space using t-SNE for dimension reduction of these DNN features (see Materials and Methods; see Extended Data Fig. 4-1 for an analysis of the robustness of the t-SNE perplexity parameter). The dimensions (or axes) of the object feature space represented the major variations in images that led to the successful recognition of the object categories by the DNN. The feature space demonstrated an organized structure (Fig. 4A). For example, images of the same object category were clustered, and Feature Dimension 1 represented the change from indoor to outdoor scenes. Importantly, regardless of object categories, visually similar images appeared adjacent to each other (see Extended Data Fig. 4-2 for a quantification based on DNN features). Note that this object feature space was derived solely from the images without using any neural responses.
Region coding. A, The visual feature space constructed by t-SNE for the DNN layer res5b. The stimuli analyzed for Participant 1 are shown in this space. B, Proportion of voxels showing region coding in each subregion of the MTL. Legend conventions as in Figure 2. C, The number of objects encoded by region-coding voxels in each subregion of the MTL. D, The number of categories encoded by region-coding voxels in each subregion of the MTL. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers. E–J, Two example voxels that show region coding. E–G, Amygdala. H–J, PHC. E, H, Projection of neural responses onto the object feature space. Each dot represents an image. Each color represents a different object category. The size of the dot indicates the neural response. F, I, Estimate of the response density in the feature space. By comparing observed versus permuted responses, we could identify a region where the observed neuronal response was significantly higher in the feature space. This region was defined as the tuning region of a voxel. G, J, Neural responses to 20 object categories. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers. Color coding indicates the object category. See Extended Data Figure 4-1 for an analysis of the robustness of the t-SNE perplexity parameter. See Extended Data Figure 4-2 for analysis of DNN feature distance between objects.
Figure 4-1
Analysis of the robustness of the t-SNE perplexity parameter. (A) Res5b feature spaces constructed using different t-SNE perplexity parameters. The number of region-coding voxels selected within each feature space is shown in the title. The parameter used in the analysis of neural response is labeled in red. PerP: perplexity parameter. Each dot represents an object image and each color represents an object category. The magenta contour delineates the boundary of a mask that excludes pixels from the edges and corners. (B) Spearman’s correlation of pairwise distance of objects between feature spaces constructed using different perplexity parameters. (C) Number of region-coding voxels selected as a function of t-SNE perplexity parameter in the layer res5b (illustrated using the data from the PHC). The parameter used in the analysis of neural response is shown in red. The dashed line shows the chance level of significant voxels (5% of all voxels). Download Figure 4-1, TIF file.
Figure 4-2
Deep neural network (DNN) feature distance between objects. We calculated a distance matrix of features (1−Pearson’s r) between each object, grouped by object categories. It is worth noting that in earlier layers, DNN features from objects of the same category were not highly correlated, suggesting that these objects were not grouped together but distributed in the feature space. In later layers, DNN features from objects of the same category were highly correlated (i.e., short distance), suggesting that these objects were clustered in the feature space. Download Figure 4-2, TIF file.
Using our established methods to study region coding (Cao et al., 2020b; Wang et al., 2022), we found that while all subregions of the MTL had a significant number of voxels showing region coding (see Fig. 4E–J for examples and Fig. 4B–D for group summary), region coding primarily appeared in the PHC and PRC (Fig. 4B). The mean number of objects (Fig. 4C) and object categories (Fig. 4D) decreased from the PHC to the other subregions of the MTL, suggesting that the size of the tuning regions decreased along the MTL processing stream. Together, our results show that the MTL is involved in region coding.
Summary of coding models
In summary, we found semantic coding across all five subregions of the MTL (Fig. 2A), axis coding only in the PHC and PRC (Fig. 3G), and region coding primarily in the PHC and PRC (Fig. 4B). Notably, the relative proportion of coding voxels changed across MTL subregions (Fig. 5A). Specifically, we found that the proportion of voxels with semantic coding increased, while the proportion of voxels with axis coding decreased from the PHC to the amygdala (Fig. 5A). The coding voxels in the amygdala were primarily semantic-coding only, whereas the PHC contained a substantial proportion of feature-coding voxels (i.e., voxels with axis and/or region coding; Fig. 5A,B). We further showed that the ratio between the number of semantic-coding and axis-coding voxels increased monotonically from the PHC to the amygdala (Fig. 5C). Together, our results suggest that the MTL evolves from more feature-based coding toward more abstract semantic coding along its processing pathway.
Summary of coding models. A, Pie charts show the relative proportion of voxels for each type of coding. The numbers in the pie charts indicate the absolute number of voxels as well as the percentage of all voxels in each subregion. B, Distribution of two types of coding voxels (semantic only vs feature coding [semantic and axis and region]) from an example participant, superimposed on anatomical sections of the standardized MNI T1-weighted brain template. The contours delineate the MTL subregions. Color coding indicates the coding model. C, The ratio between the number of semantic-coding voxels and axis-coding voxels. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers.
Transition between coding models
Lastly, we investigated how axis coding translated into region coding. Using a pairwise distance metric (see Materials and Methods) extensively employed in studying the correspondence between neural coding models and brain areas (Grossman et al., 2019; Cao et al., 2020b), we observed that voxels exhibiting axis coding had a similar representational structure to those showing region coding (Fig. 6A,B). Importantly, voxels with region coding across all MTL subregions shared information with voxels demonstrating axis coding (Fig. 6A,B). This correspondence between region coding and axis coding supports our hypothesis that neurons/voxels with axis coding form the basis of the feature space, where neurons/voxels with region coding encode a receptive field.
The transition from axis coding to region coding. A, B, Correlation of pairwise distance between axis-coding and region-coding voxels across different subregions of the MTL. A, Correlation with axis-coding voxels from the PHC. B, Correlation with axis-coding voxels from the PRC. The thickness of the lines indicates the correlation strength. The correlation coefficients (mean ± SD) are labeled above the lines. The asterisks indicate a significant correlation using a two-tailed one-sample t test against 0. *P < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001. C, Comparison of DNN full feature distance in region-coding voxels. Feature distance was calculated between categories using the average DNN full feature of images from each category. D, Comparison of axis neural distance in region-coding voxels. Neural distance was calculated between categories using the response of axis-coding voxels from the PHC and PRC. S-S, selective–selective category pairs with both categories falling within a coding region; S-NS, selective–nonselective (S-NS) category pairs with only one category falling within a coding region. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers. We first averaged the feature/neural distance of all S-S category pairs and all S-NS category pairs for each voxel and participant, and then we compared the feature/neural distance between S-S and S-NS category pairs across participants. The asterisks indicate a significant difference between S-S and S-NS pairs using a two-tailed one-sample t test. *P < 0.05, **p < 0.01, and ****p < 0.0001. E, F, Population decoding of object category. E, Decoding using category-selective voxels from each MTL subregion. F, Decoding using axis-coding voxels and region-coding voxels. The classifier was trained using axis-coding voxels from the PHC and PRC and tested on region-coding voxels from each MTL subregion. The asterisks indicate a significant above-chance decoding performance using a two-tailed one-sample t test against chance (5%; dashed line). **P < 0.01, ***p < 0.001, and ****p < 0.0001. See Extended Data Figure 6-1 for decoding with a classifier trained on region-coding voxels and tested on axis-coding voxels. See Extended Data Figure 6-2 for analyses using axis-coding and region-coding voxels both selected from the t-SNE feature space. See Extended Data Figure 6-3 for comparisons of alignment with DNN feature space between axis-coding and region-coding voxels.
Figure 6-1
Decoding with a classifier trained on region-coding voxels and tested on axis-coding voxels in each MTL subregion. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers. Asterisks indicate a significant above-chance decoding performance using two-tailed one-sample t-test against chance (5%; dashed line). **: P < 0.01 and ****: P < 0.0001. Download Figure 6-1, TIF file.
Figure 6-2
Transition from axis coding to region coding using voxels both selected from the t-SNE feature space. (A, B) Correlation of pairwise distance between axis coding and region-coding voxels across different subregions of the medial temporal lobe (MTL). (A) Correlation with axis-coding voxels from the parahippocampal cortex (PHC). (B) Correlation with axis-coding voxels from the perirhinal cortex (PRC). The thickness of the lines indicates the correlation strength. The correlation coefficients (mean ± SD) are labeled above the lines. Asterisks indicate a significant correlation using two-tailed one-sample t-test against 0. *: P < 0.05, **: P < 0.01, ***: P < 0.001, and ****: P < 0.0001. (C) Comparison of axis neural distance in region-coding voxels. Neural distance was calculated between categories using the response of axis-coding voxels from the PHC and PRC. S-S: selective-selective category pairs with both categories falling within a coding region. S-NS: selective-non-selective (S-NS) category pairs with only one category falling within a coding region. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers. We first averaged the feature/neural distance of all S-S category pairs and all S-NS category pairs for each voxel and participant, and then we compared feature/neural distance between S-S vs. S-NS category pairs across participants. Asterisks indicate a significant difference between S-S and S-NS pairs using two-tailed one-sample t-test. *: P < 0.05, **: P < 0.01, and ****: P < 0.0001. (D) Decoding using axis-coding voxels and region-coding voxels. The classifier was trained using axis-coding voxels from the PHC and PRC and tested on region-coding voxels from each MTL subregion. Asterisks indicate a significant above-chance decoding performance using two-tailed one-sample t-test against chance (5%; dashed line). **: P < 0.01, ***: P < 0.001, and ****: P < 0.0001. Download Figure 6-2, TIF file.
Figure 6-3
Alignment with the deep neural network (DNN) feature space for each medial temporal lobe (MTL) subregion using representational similarity analysis. Dissimilarity matrix (DM) was calculated separately for axis-coding voxels and region-coding voxels from each MTL subregion and then correlated with the DNN DM. Correspondence (i.e., Spearman’s ρ) with the DNN DM (i.e., DNN feature space) was compared between axis-coding voxels and region-coding voxels. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points the algorithm considers to be not outliers, and the crosses denote the outliers. For all comparisons, we did not observe a significant difference between axis-coding and region-coding voxels using two-tailed paired t-test. n.s.: not significant. Download Figure 6-3, TIF file.
We next directly tested whether the neural axes encoded by the axis-coding voxels formed a feature space for region coding. We first confirmed that, in the DNN feature space, the objects encoded by region-coding voxels (i.e., objects falling in the coding regions) had a shorter DNN feature distance than the nonencoded objects (Fig. 6C), as expected. Notably, the encoded objects also had a shorter neural distance calculated from the PHC and PRC axis-coding voxels (Fig. 6D), suggesting that the PHC and PRC axis-coding voxels formed the axes of the neural object space for region-coding voxels.
How do axis and region coding contribute to semantic coding? We employed a population decoding approach to answer this question. Using this approach, we first showed a significant above-chance decoding of object categories in category-selective voxels from each MTL subregion (Fig. 6E), as expected. Importantly, we found that a classifier trained using PHC and PRC axis-coding voxels could predict object category in region-coding voxels (Fig. 6F; see also Extended Data Fig. 6-1 for above-chance decoding with a classifier trained on region-coding voxels and tested on axis-coding voxels in each MTL subregion). This cross-model decoding not only suggests that axis-coding and region-coding voxels share object category information but also indicates that axis coding and region coding contribute to semantic coding.
It is worth noting that we replicated the above results using axis-coding and region-coding voxels both selected from the t-SNE feature space (Extended Data Fig. 6-2). Furthermore, we demonstrated that axis-coding and region-coding voxels from each MTL subregion did not exhibit significantly different alignment with the DNN feature space (Extended Data Fig. 6-3), indicating that the change in neural representations could not be attributed to differences in alignment with the DNN feature space.
Discussion
In this study, we investigated how the human MTL encodes general objects by examining three different neural coding models. Using high-resolution fMRI, we demonstrated that semantic-based category coding was present throughout the MTL, with a higher prevalence observed in the PHC and PRC. Notably, the earlier subregions of the MTL, such as the PHC and PRC, encoded objects based on visual features. Specifically, the PHC and PRC exhibited both axis-based feature coding, which has been observed in the IT cortex in nonhuman primates (Chang and Tsao, 2017; Bao et al., 2020), and region-based feature coding, as proposed in our recent studies (Cao et al., 2020b; Wang et al., 2022). Our results also suggested a transition from axis coding in the PHC and PRC to region and semantic coding in other parts of the MTL. Together, our findings not only provide a detailed characterization of visual object coding schemes but also delineate the transitions between these coding schemes.
Visual object processing pathway
While extensive studies have highlighted the critical role of the ventral visual stream, which includes the occipital and the ventral temporal cortex (VTC), in primate object perception (Schwarzlose et al., 2008; DiCarlo et al., 2012; Duchaine and Yovel, 2015; Bao et al., 2020), emerging evidence supports the involvement of the MTL in this process (Murray et al., 2007; Bonnen et al., 2021; Bonnen et al., 2023). First, the presence of direct connections between various MTL structures and VTC subregions provides a possible neural pathway for object processing (Kravitz et al., 2013). Specifically, the anterior inferior temporal cortex sends inputs to the PRC, PHC, and amygdala, and these regions, in turn, project back to multiple areas in the VTC (Baizer et al., 1993; Middleton and Strick, 2000; Kondo et al., 2003). The PRC and PHC also project to the ERC (Insausti and Amaral, 2008; Qin et al., 2016) and the HP (Leonard et al., 1995), although direct projections from the VTC to the HP have also been identified (Kravitz et al., 2011). Second, despite the long-standing debates regarding the mnemonic (Squire et al., 2004; Eichenbaum et al., 2007) versus perceptual (Suzuki and Baxter, 2009; Murray and Wise, 2012) function of the MTL, more recent studies have suggested that both the PHC and PRC are recruited for visuospatial processing (Baumann and Mattingley, 2016; Bonnen et al., 2021; Bonnen et al., 2023). Our present results provide new evidence for the functional involvement of the MTL in visual object perception by demonstrating that visual features are encoded in the earlier regions of the MTL. Furthermore, we observed a decrease in feature coding but an increase in semantic coding in the later areas of the MTL, which supports the expanded framework for neural object processing (Kravitz et al., 2011; Kravitz et al., 2013).
Neural object coding models
Understanding the neural codes for objects has been a focus of cognitive neuroscience over the past decade (Tsao et al., 2003; Freiwald and Tsao, 2010; DiCarlo et al., 2012; Quiroga, 2017). Three distinct coding models have been proposed across species. The axis-based coding model, observed in the higher visual cortices of monkeys, suggests that a given face or object is represented by multiple neurons, each linearly tracking visual variations along specific feature axes (e.g., texture, shape; Chang and Tsao, 2017; Ponce et al., 2019). The semantic-coding model, proposed in the human MTL, suggests that each neuron responds selectively and invariantly to one or a few exemplars regardless of visual features (Kreiman et al., 2000; Quian Quiroga et al., 2005). The region-based coding model, proposed in our recent work, suggests that neurons in the human amygdala and HP are tuned to images falling within their “receptive field,” which encompasses stimuli sharing similar visual features (Cao et al., 2020b).
In this study, we conducted a detailed analysis of three neural object coding models within subregions of the human MTL. First, we identified axis coding within the MTL, supporting the notion that the human MTL plays a role in feature-based coding. It is worth noting that axis coding was predominantly observed in the earlier areas of the MTL, specifically the PHC and PRC, which directly receive inputs from the VTC (Kravitz et al., 2013). Second, we observed semantic coding in all MTL subregions, in line with the sparse coding of semantics at the single-neuron level (Kreiman et al., 2000; Quian Quiroga et al., 2005). Third, we confirmed the novel region-coding model, in which neurons establish receptive fields within a high-dimensional feature space, aligning with the recently proposed manifold coding framework (Gallego et al., 2017). Interestingly, we observed that the PHC and PRC had a higher percentage of voxels exhibiting all three coding models, indicating their more substantial involvement in visual object processing.
It is important to note the distinctions between these neural object coding models. First, semantic coding does not require the encoded objects to be visually similar, that is, adjacent in the visual feature space: voxels exhibiting semantic coding can encode objects scattered across the feature space, including multiple object categories that are not visually similar. Second, region coding requires the encoded objects to be visually similar and adjacent in the feature space, forming a local hotspot. It does not assume that the encoded stimuli all belong to the same semantic category; the hotspot can encompass a mixture of stimuli from different categories that happen to share similar visual features, in which case region coding does not give rise to semantic coding. Region coding is therefore not synonymous with semantic coding. Third, axis coding requires only that responses track global changes in visual features along certain dimensions of the feature space.
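The dissociation between region coding and semantic coding can be illustrated with a toy simulation (a schematic of our own with made-up stimuli and category labels, not an analysis from this study): a region-coding unit is driven by all stimuli inside a feature-space hotspot regardless of category, whereas a semantic-coding unit is driven by one category wherever its exemplars fall in the space.

```python
# Toy illustration (made-up stimuli; not the study's analysis): region coding
# selects feature-adjacent but category-mixed stimuli, whereas semantic coding
# selects category-pure but feature-dispersed stimuli.
import numpy as np

rng = np.random.default_rng(1)
n_stim = 600
features = rng.uniform(-1, 1, size=(n_stim, 2))   # toy 2-D feature space
category = rng.integers(0, 3, size=n_stim)        # three arbitrary categories

# Region-coding unit: driven by any stimulus inside a local hotspot.
in_region = np.linalg.norm(features - np.array([0.5, 0.5]), axis=1) < 0.3
# Semantic-coding unit: driven by one category wherever it sits in the space.
is_cat0 = category == 0

for name, driven in [("region unit", in_region), ("semantic unit", is_cat0)]:
    sel = features[driven]
    spread = np.linalg.norm(sel - sel.mean(0), axis=1).mean()      # feature dispersion
    purity = np.bincount(category[driven]).max() / driven.sum()    # category purity
    print(f"{name}: feature spread = {spread:.2f}, category purity = {purity:.2f}")
```

In this toy example, the region unit shows a small feature spread but mixed categories, whereas the semantic unit is category-pure but dispersed across the feature space.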
On the other hand, these neural object coding models are not mutually exclusive or orthogonal to one another. For instance, semantic coding overlaps with region coding when the encoded object categories share similar visual features and are closely distributed in the object feature space, and region coding for stimuli in a corner of the feature space may also manifest as axis coding (see the sketch below). Consistent with this, we observed a substantial overlap between the coding models (Fig. 5A). Despite their interconnectedness, however, each model contributes valuable insights to our understanding of MTL functioning in perceptual cognition. Specifically, both the axis-coding and region-coding models serve as indicators of perceptual coding within the MTL. This not only addresses the long-standing debate about whether the MTL is involved in perceptual processing but also elucidates how the MTL is involved in perceptual processing and encodes visual features.
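The corner case can be checked with a small simulation (again our own illustrative construction): when a Gaussian receptive field is centered outside the cloud of sampled stimuli, its distance-based response varies almost monotonically along the direction toward the center, so the same unit also passes a linear, axis-coding test.

```python
# Toy check (illustrative assumption): a region-coding unit whose receptive
# field sits in a corner of the sampled feature space also behaves like an
# axis-coding unit under a linear test.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
features = rng.standard_normal((500, 5))
corner = np.full(5, 3.0)   # receptive-field center far outside the stimulus cloud
resp = np.exp(-np.linalg.norm(features - corner, axis=1)**2 / (2 * 4.0**2))
r2 = LinearRegression().fit(features, resp).score(features, resp)
print(f"corner region unit: linear R^2 = {r2:.2f}")   # high despite region coding
```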
Together, these coding models complement one another, and a detailed examination of them allows us to better understand the intricacies of perceptual coding within the MTL.
Transformation of object coding in the MTL
We investigated the transition of neural coding models from one to another to probe how semantic representations of objects are formed. This is a key question for understanding how humans can recognize objects despite variations in visual appearance, such as color, shape, and viewpoint. This “many-to-one” problem is believed to be solved through dimensionality reduction, whereby the high-dimensional visual variations represented in the visual cortex (i.e., encoding of visual features) are transformed into invariant low-dimensional representations (i.e., encoding of semantics; Palmeri and Tarr, 2008). Invariant representations have been primarily observed in MTL neurons, which exhibit selectivity for one or a few identities and are believed to serve as the building blocks for episodic memory (Quian Quiroga, 2012; Rutishauser et al., 2021). Importantly, our results suggest that feature-based coding in earlier regions of the MTL contributes to semantic coding throughout the MTL, providing a possible neural mechanism for forming visually invariant object representations, which in turn form the neural basis for object recognition and memory.
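The following sketch gives this “many-to-one” intuition a concrete form (a schematic of our own with simulated identities and views, not data from this study): different views of the same object scatter around an identity-specific location in feature space, and a region-coding unit whose receptive field covers that location responds invariantly across views of that object while remaining silent to others.

```python
# Schematic sketch (simulated identities and views; illustrative only):
# a region-coding unit implements a many-to-one, view-invariant mapping.
import numpy as np

rng = np.random.default_rng(3)
n_dim, n_views = 20, 40
centroids = 3 * rng.standard_normal((3, n_dim))       # three object identities
views = np.stack([c + 0.4 * rng.standard_normal((n_views, n_dim))
                  for c in centroids])                # views scatter around each identity

def region_unit(x, center=centroids[0], sigma=2.0):
    """Gaussian receptive field centered on identity 0's location."""
    return np.exp(-np.linalg.norm(x - center, axis=-1) ** 2 / (2 * sigma ** 2))

for i in range(3):
    r = region_unit(views[i])
    print(f"identity {i}: mean response = {r.mean():.2f}, std across views = {r.std():.2f}")
```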
Limitations of the current study
Although the current study employed ultrahigh-resolution 7T fMRI (Allen et al., 2022), which provides an isotropic resolution of 1.8 mm, the response within a single voxel may still pool the activity of more than one million neurons. Therefore, semantic coding can be observed with fMRI only when neurons within a voxel exhibit relatively consistent responses; neural codes for certain object semantic information may be distributed across diverse neurons and may not produce a response discernible at the voxel level. Additionally, the selectivity of semantic voxels is inherently weaker than that observed in single neurons (Cao et al., 2020b; Rey et al., 2020). Moreover, the current study could not address temporal dynamics owing to the limited temporal resolution of fMRI, and different coding mechanisms may emerge at different latencies. Future studies analyzing neural responses at the single-neuron level are therefore necessary to address these limitations.
Different object categories may elicit distinct valence or arousal responses, and the amygdala's response to visual objects, and thus semantic coding within the amygdala, may be partly influenced by these factors. However, the vast majority of stimuli used in the present study were emotionally neutral object images (Fig. 1F). Moreover, neurons in the human amygdala demonstrate category selectivity and semantic coding for people (Kreiman et al., 2000; Rutishauser et al., 2015; Wang et al., 2018) and animals (Mormann et al., 2011; Rutishauser et al., 2015); notably, these neurons respond similarly to different individuals (Kreiman et al., 2000; Rutishauser et al., 2015; Wang et al., 2018) and animals (Mormann et al., 2011; Rutishauser et al., 2015), even when those individuals and animals elicit varying valence or arousal. This suggests that semantic coding cannot simply be explained by the valence or arousal associated with people and animals. Furthermore, distinct sets of amygdala neurons likely encode emotions and visual categories: for example, neurons encoding social attributes (a broader category that encompasses and correlates with emotions) are distinct from those encoding identities, which are visual categories (Cao et al., 2022a,c). Therefore, the semantic coding observed in the present study is likely distinct from the encoding of emotion and arousal. Nevertheless, future studies with detailed measurements of the valence and arousal associated with each stimulus are needed to fully assess and discount their influence on visual object coding.
Data Availability
All data that support the findings of this study are publicly available (Allen et al., 2022).
Code Availability
The source code for this study is publicly available on Open Science Framework (https://osf.io/f9kvu/).
Footnotes
This research was supported by the McDonnell Center for Systems Neuroscience, AFOSR (FA9550-21-1-0088), NSF (BCS-1945230, IIS-2114644), and NIH (R01MH129426). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
*Y.W. and R.C. contributed equally to this work.
The authors declare no competing financial interests.
Correspondence should be addressed to Yue Wang at yue.w@wustl.edu or Shuo Wang at shuowang@wustl.edu.