Research Articles, Behavioral/Cognitive

Visual Recognition Memory of Scenes Is Driven by Categorical, Not Sensory, Visual Representations

Ricardo Morales-Torres, Erik A. Wing, Lifu Deng, Simon W. Davis and Roberto Cabeza
Journal of Neuroscience 22 May 2024, 44 (21) e1479232024; https://doi.org/10.1523/JNEUROSCI.1479-23.2024
Author affiliations: Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708 (R. Morales-Torres, L. Deng, S. W. Davis, R. Cabeza); Rotman Research Institute, Baycrest Health Sciences, Toronto, Ontario M6A 2E1, Canada (E. A. Wing); Department of Neurology, Duke University School of Medicine, Durham, North Carolina 27708 (S. W. Davis)

Abstract

When we perceive a scene, our brain processes various types of visual information simultaneously, ranging from sensory features, such as line orientations and colors, to categorical features, such as objects and their arrangements. Whereas the role of sensory and categorical visual representations in predicting subsequent memory has been studied using isolated objects, their impact on memory for complex scenes remains largely unknown. To address this gap, we conducted an fMRI study in which female and male participants encoded pictures of familiar scenes (e.g., an airport picture) and later recalled them, while rating the vividness of their visual recall. Outside the scanner, participants had to distinguish each seen scene from three similar lures (e.g., three airport pictures). We modeled the sensory and categorical visual features of multiple scenes using both early and late layers of a deep convolutional neural network. Then, we applied representational similarity analysis to determine which brain regions represented stimuli in accordance with the sensory and categorical models. We found that categorical, but not sensory, representations predicted subsequent memory. In line with the previous result, only for the categorical model, the average recognition performance of each scene exhibited a positive correlation with the average visual dissimilarity between the item in question and its respective lures. These results strongly suggest that even in memory tests that ostensibly rely solely on visual cues (such as forced-choice visual recognition with similar distractors), memory decisions for scenes may be primarily influenced by categorical rather than sensory representations.

  • episodic memory
  • recognition memory
  • representational similarity analysis
  • scene memory

Significance Statement

Our memory for real-world scenes often comprises a tableau of complex visual features, but recent findings challenge the view that our memories of such stimuli rely on purely visual information. Instead, it appears that our memory for scenes is heavily influenced by higher-level categorical information. Analyzing cortical representations in regions responsive to both categorical and sensory features, we discovered that only the former can reliably predict memory outcomes. Moreover, the distinctiveness of a scene's categorical features relative to similar exemplars is positively associated with our ability to accurately recognize previously encountered scenes. In essence, this study sheds light on how our brains rely on categorical information to recognize natural scenes.

Introduction

Humans have an astonishing capacity to remember scenes (Standing, 1973), which are multielement organizations of visual features (Henderson and Hollingworth, 1999; Epstein and Baker, 2019). Even after exposure to numerous scenes, individuals can retain sufficient detail to later distinguish between seen and similar unseen exemplars of the same scene category, such as two photographs of similar airports (Konkle et al., 2010). However, it is unclear whether this capacity is driven by neural representations of sensory information (the color, size, or orientation of the planes) or by neural representations of categorical information, that is, the knowledge structure describing which objects, and which spatial arrangements of them, are most likely to be encountered within a layout (Henderson and Hollingworth, 1999; Epstein and Baker, 2019). For example, seeing an airplane next to an air traffic control tower allows an observer to categorize a scene as an airport rather than an aircraft museum. While the impact of different features on scene perception has been carefully investigated (for reviews, see Epstein and Baker, 2019; Castelhano and Krzyś, 2020), their specific influence on subsequent scene memory remains largely unknown and is the focus of the current study.

To identify the neural correlates of different visual features, functional magnetic resonance imaging (fMRI) researchers are increasingly relying on a combination of voxel-based representational similarity analyses (RSA) and stimulus models based on deep convolutional neural networks (DNNs). RSA allows for the quantification of the representational structure of cortical activation patterns by relating the similarity in voxel activation and model values across items (Kriegeskorte and Diedrichsen, 2019). DNNs, in turn, offer ready resources for such model parameters, such that early DNN layers quantify simple visual features while late layers describe more complex categorical features. This hierarchical organization mirrors the properties of the human visual system, making DNNs an important tool in this research domain (Yamins et al., 2014; Kriegeskorte, 2015; Peters and Kriegeskorte, 2021). By employing this combined approach, researchers have discovered that during scene processing, the neural representations of early DNN layers (henceforward sensory visual features) emerge rapidly across the visual cortex (Greene and Hansen, 2020) and influence the subsequent categorical processing (Dima et al., 2018; Kaiser et al., 2020). In turn, only the neural representations of late DNN layers (henceforward categorical visual features) have been shown to correlate with participants’ behavior when categorizing or rating the similarity between different scenes (Groen et al., 2018; King et al., 2019; Greene and Hansen, 2020).

Although our current knowledge regarding the specific impact of neural representations of scene features on memory is limited, evidence emphasizes the importance of categorical visual features in the formation of memories. Unlike sensory features, categorical visual features have been shown to be predictive of scene memorability (Isola et al., 2014), defined as the likelihood of a viewer recognizing a previously seen scene. Additionally, mnemonic interference between scenes is primarily influenced by similarities in categorical rather than sensory visual features (Konkle et al., 2010; Mikhailova et al., 2023). Given the prominent role of categorical features in both scene memory and the neural representations of scenes, we hypothesize that a stronger encoding of categorical visual features will be associated with enhanced memory performance.

To investigate the role of visual features in scene memory, we employed a widely used DNN (VGG-16) to quantify both sensory and categorical properties of complex scenes. By modeling both types of features, we aimed to isolate the influence of each feature type and examine its relationship to memory. To assess whether a stronger encoding of categorical scene features was associated with memory performance, we utilized RSA to identify brain regions representing both types of visual features and examined how these feature representations were linked to scene memory. Thus, our approach seeks to characterize the influence of multiple levels of visual information on scene processing and to address how that information contributes to the formation of lasting memories.

Materials and Methods

Participants

A total of 22 adults (12 women; mean age, 23.5 years; SD = 3.0) took part in the experiment, all of whom were healthy, right-handed English speakers with normal or corrected-to-normal vision. Prior to their participation, informed consent was obtained from each participant, and the consent process was approved by the Duke University Institutional Review Board. One participant's data were excluded from analysis due to a technical error during the encoding phase.

Paradigm and procedure

As illustrated by Figure 1, the experiment consisted of three phases: encoding (three fMRI runs), recall vividness (three fMRI runs), and forced-choice recognition (outside the scanner). There was also a short practice session prior to scanning. During each encoding run, participants viewed 32 images of scenes (96 scenes in total), each presented for 4 s and accompanied by a label indicating its general category (such as "Airport" or "Movie Set"). Participants were asked to rate the degree to which each image represented its label (e.g., "Is this a good picture of an airport?"). An 8 s interval separated each image presentation, during which participants made even/odd judgments on a series of digits ranging from 1 to 9. During each recall vividness run, participants read the labels of encoded scenes and tried to recall the corresponding scenes as vividly as possible, rating vividness from 1 (least amount of detail) to 4 (highly detailed memory). Immediately following the scan session, participants completed a four-alternative forced-choice test of all 96 encoded scenes. In each trial, the target image was presented together with three scene lures from the same scene category as the target (e.g., three airport scenes) in the four quadrants of the screen. Participants selected the scene they believed they had seen during the encoding phase and then rated their confidence in the choice (1 = guess, 4 = very confident).

Figure 1.

Experimental design. During the encoding phase participants saw 96 pictures of scenes, with a descriptive label. During the recall vividness test, descriptive labels for the previously encoded scenes were presented and participants were asked to first recall the encoded scene and then rate how vivid/detailed their mental image of the scene was. During the four-alternative forced-choice test, participants were asked to choose the specific scene they believed was presented during encoding and then rate their confidence about the decision.

The results of some analyses on the current fMRI dataset were previously reported by Wing et al. (2015), who performed encoding-retrieval similarity analyses to investigate the reactivation of encoding information during the recall vividness test. In contrast, the current study uses RSA to examine if and how sensory and categorical representations predict performance in recall vividness and forced-choice recognition tests.

fMRI acquisition and preprocessing

The fMRI acquisition has been previously described in Wing et al. (2015). Briefly, data were collected with a 3 T GE scanner, using a SENSE spiral-in sequence (repetition time, 2 s; echo time, 30 ms; field of view, 24 cm). The anatomical image comprised 96 axial slices parallel to the AC–PC plane with voxel dimensions of 0.9 × 0.9 × 1.9 mm. The data were preprocessed in SPM12 (http://www.fil.ion.ucl.ac.uk/spm/) using several steps, including discarding the first six functional images (to allow the scanner to reach equilibrium), slice-time correction to the first slice, realignment to the first scan, motion correction, and unwarping of the images. The functional images were then coregistered to the skull-stripped anatomical image and normalized into MNI space using DARTEL (Ashburner, 2007). The DRIFTER toolbox was used to denoise the images (Särkkä et al., 2012). The preprocessed functional images were not smoothed, and functional voxels measured 3.75 × 3.75 × 3.8 mm.

To obtain beta coefficients for each scene, a single trial model was conducted using the least squares-separate approach developed by Mumford et al. (2012). The model consisted of a regressor modeling the activity of the trial of interest and a regressor modeling the activity of all other trials within that run. The events were modeled using a stick function convolved with a standard hemodynamic response function. Additionally, the six raw motion regressors, a composite motion parameter generated by the Artifact Detection Tools, outlier time points (scan-to-scan motion, >2.0 mm or degrees; scan-to-scan global signal change, >9.0 z score; derived from the composite motion parameter), white matter, and cerebrospinal fluid time series were included in each first-level model. A high-pass temporal filter with a 128 s cutoff was applied to the data.

DNN model

As a model of visual scene perception, we used the Keras implementation of the VGG-16 model (https://github.com/GKalliatakis/Keras-VGG16-places365). This implementation was pretrained on the Places365-Standard dataset, a subset of the Places365 dataset (Zhou et al., 2018), which contains ∼1.8 million images from 365 scene categories. The architecture of the network consists of 13 convolutional layers, five max-pooling layers, and three fully connected layers. Each layer is composed of multiple units; the units of the first layer correspond to the pixels of the input image (224 × 224 pixels), and the number of units progressively decreases across layers until the last layer, a vector of 365 units representing the scene categories.

To obtain a model of the visual features of scenes, we fed the network the 96 images (resized to 224 × 224 pixels) used in our experiment. To model sensory and categorical representations, we followed O'Connell and Chun (2018) and Xu and Vaziri-Pashkam (2021) and extracted the values from the first and last max-pooling layers. The max-pooling layers pool information processed by the convolutional layers and pass it to the next section of the network, marking the end of a processing stage. Because the first block of convolutional layers is sensitive to low-level visual features, such as frequency, boundary, and color information (Kriegeskorte, 2015; Eickenberg et al., 2017; Peters and Kriegeskorte, 2021), we used the first max-pooling layer as a model of the sensory features of scenes. The last block of convolutional layers contains information still defined by the visual features of scenes, but abstract enough that feeding its activity to the final block of the network (the fully connected layers) allows the model to categorize the images. Therefore, we used the max-pooling layer at the end of the last convolutional block to model categorical visual features (see Fig. 2B for a representation of how these models group images).
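For concreteness, the sketch below shows one way to extract these two feature sets; it is not the authors' code. It assumes the vgg16_places_365 module and VGG16_Places365 constructor from the linked repository, standard Keras VGG-16 layer names ('block1_pool', 'block5_pool'), and a hypothetical scenes/ directory; input preprocessing (mean subtraction, channel order) is omitted for brevity.

```python
# Minimal sketch: flattened activations of the first and last max-pooling
# layers of VGG-16 trained on Places365, one feature vector per scene.
import glob
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from vgg16_places_365 import VGG16_Places365  # assumed module/constructor name

base = VGG16_Places365(weights='places')
sensory_net = Model(base.input, base.get_layer('block1_pool').output)      # first max-pooling layer
categorical_net = Model(base.input, base.get_layer('block5_pool').output)  # last max-pooling layer

def layer_features(net, image_paths):
    """Flattened max-pooling activations, one row per scene image."""
    imgs = np.stack([img_to_array(load_img(p, target_size=(224, 224)))
                     for p in image_paths])
    return net.predict(imgs).reshape(len(image_paths), -1)

scene_paths = sorted(glob.glob('scenes/*.jpg'))                   # hypothetical folder
sensory_feats = layer_features(sensory_net, scene_paths)          # shape (96, n_units)
categorical_feats = layer_features(categorical_net, scene_paths)
```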

Figure 2.

A, Overview of the RSA approach. We used the DNN VGG-16 as a model of the visual processing of scenes. We selected the first and last max-pooling layers as models of the sensory and categorical features of scenes (note that we depict a simplified version of VGG-16, not its actual architecture). We generated RDMs from our sensory and categorical models, which we correlated with the multivariate activity of several brain regions. By correlating the rows of the sensory and categorical model RDMs with the brain APMs, we obtained a measure of the sensory and categorical representations of the scenes. B, Multidimensional scaling (MDS) plots of the sensory and categorical models. The MDS plots qualitatively show that the sensory model groups images according to their color (orange scenes on the left and blue scenes on the right), while the categorical model groups images according to their category (indoor scenes on the left and outdoor scenes on the right). These visualizations provide insight into how the model represents scenes at different levels of abstraction.

Regions of interest

To examine the influence of sensory and categorical features of images on brain representations, we defined bilateral ROIs using all bilateral cortical (excluding the cerebellum) and subcortical regions from the AAL3 brain atlas (Rolls et al., 2020).

Representational similarity analysis

In order to obtain the representations of the categorical and sensory features of the images, we employed a representational similarity analysis approach (Kriegeskorte and Kievit, 2013; Kriegeskorte and Diedrichsen, 2019).

As a first step, we generated representational dissimilarity matrices (RDMs) of the sensory and categorical features of scenes by feeding the 96 scenes to the DNN. To generate the sensory RDM, we vectorized the values of the first max-pooling layer for each scene and then calculated the pairwise dissimilarity between all scenes, computed as 1 minus the Pearson's correlation coefficient. Using this procedure, we generated a 96 × 96 matrix, with each row representing how dissimilar an image is to all other images according to the sensory features of the scenes. To obtain the categorical RDM, we used the same procedure but vectorized the activity of the last max-pooling layer.

We then created an activity pattern matrix (APM) for all bilateral ROIs. To construct an APM, we first vectorized all voxels within an ROI, repeating this process across the 96 trials. This resulted in 96 vectors representing the voxel patterns for a given brain region during the presentation of the 96 scenes. We calculated the dissimilarity between these 96 vectors, defined as 1 minus the Pearson’s correlation coefficient, obtaining a 96 × 96 matrix. The dimensions of the APM match those of our sensory and categorical RDMs.
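A minimal sketch of this step follows; the same helper builds both the model RDMs and the ROI APMs, since each is simply 1 minus the row-wise Pearson correlation matrix. The placeholder arrays stand in for the flattened layer activations and the single-trial beta patterns of one ROI.

```python
import numpy as np

def dissimilarity_matrix(features):
    """96 x 96 matrix: 1 minus the Pearson correlation between row vectors."""
    return 1 - np.corrcoef(features)

# usage with placeholder arrays (real inputs: flattened DNN layer activations
# and single-trial beta patterns for one AAL3 ROI)
layer_feats = np.random.randn(96, 3000)   # e.g., last max-pooling activations
roi_betas = np.random.randn(96, 500)      # e.g., voxels of one ROI across trials
model_rdm = dissimilarity_matrix(layer_feats)
brain_apm = dissimilarity_matrix(roi_betas)
```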

In our last step, the objective was to identify which ROIs represented stimuli in a manner consistent with how the model represented the sensory and/or categorical features of scenes. To achieve this, we applied an item-wise procedure developed by Davis et al. (2021). This procedure yields a value that serves as an index of the fit between the neural and the DNN representations of a specific scene, obtained as a second-order correlation between the rows of the two matrices. We computed the Spearman correlation between each row of the APM and the corresponding row of the sensory RDM and normalized the correlation coefficient using the Fisher transformation. Each row of the APM and RDMs represents how dissimilar a specific scene is from all other scenes (according to an ROI or a DNN layer, respectively). By correlating the APM row, which reflects the dissimilarity in voxel activity estimates, with the RDM row, which reflects the dissimilarity in DNN layer activation values, we measure how similarly a brain region represents a particular scene compared with how the model does. This process yields 96 values, with higher values indicating greater consistency between a specific ROI's stimulus representation and the type of feature the model represents (Fig. 2A presents a graphical example of this approach). We used the same procedure to obtain the representational strength of the categorical features of scenes.
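The sketch below illustrates this item-wise fit, reusing the matrices from the previous sketch. Dropping each row's self-comparison cell before correlating is our assumption; the text does not spell out how the diagonal is handled.

```python
import numpy as np
from scipy.stats import spearmanr

def itemwise_brain_model_fit(apm, rdm):
    """Fisher-z Spearman correlation between corresponding rows of a brain APM
    and a model RDM; one brain-model fit value per scene."""
    n = apm.shape[0]
    fits = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # drop the self-comparison cell (assumption)
        rho, _ = spearmanr(apm[i, keep], rdm[i, keep])
        fits[i] = np.arctanh(rho)                     # Fisher transformation
    return fits

# brain_apm and model_rdm come from the previous sketch
sensory_fit = itemwise_brain_model_fit(brain_apm, model_rdm)  # 96 values
```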

As in Davis et al. (2021), we interpret the item-wise brain-model fit values as an index of the sensitivity of specific regions to the particular type of feature being modeled by a DNN layer. While this approach differs from classical RSA, where the two similarity matrices are correlated (Kriegeskorte and Kievit, 2013; Dimsdale-Zucker and Ranganath, 2018; Popal et al., 2019), by calculating the brain-model fit for each scene individually, we can directly relate the sensitivity of an ROI to a specific feature and its relationship with the memory for an individual scene.

Behavioral analysis

To evaluate the relationship between the two memory tests, we utilized Yule's Q, a statistic that measures the level of association or dependence between the outcomes of the two tests. Yule's Q provides a value that can be interpreted in a manner similar to a correlation coefficient (Kahana, 2000), offering insights into the degree of dependency between the recall vividness and forced-choice recognition tasks.

For the recall vividness test, as in previous studies, we grouped vividness ratings of 1 and 2 as “low vividness” and ratings of 3 and 4 as “high vividness” (Kuhl and Chun, 2014; Lee et al., 2019). In the recognition memory test, as we are only interested in memory performance driven by recollection, rather than memory performance driven by familiarity, we considered only those trials recognized with a high confidence rating as accurate (Kim and Cabeza, 2009; Lee et al., 2019).
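For illustration, a minimal implementation of this contingency measure is sketched below with toy data. The binarization follows the rules described above (vividness ratings of 3 or 4 count as high vividness; only high-confidence correct choices count as recognized), and the variable names are hypothetical.

```python
import numpy as np

def yules_q(vivid, recognized):
    """Yule's Q for one participant: Q = (ad - bc) / (ad + bc),
    computed from the 2 x 2 contingency of two binary outcome vectors."""
    vivid = np.asarray(vivid, bool)
    recognized = np.asarray(recognized, bool)
    a = np.sum(vivid & recognized)    # high vividness, recognized
    b = np.sum(vivid & ~recognized)   # high vividness, not recognized
    c = np.sum(~vivid & recognized)   # low vividness, recognized
    d = np.sum(~vivid & ~recognized)  # low vividness, not recognized
    return (a * d - b * c) / (a * d + b * c)

# toy per-scene ratings for one participant
vividness_rating = np.array([4, 2, 3, 1, 4, 3])
choice_correct = np.array([1, 0, 1, 1, 1, 0])
confidence = np.array([4, 2, 4, 3, 4, 1])
q = yules_q(vividness_rating >= 3, (choice_correct == 1) & (confidence == 4))
```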

Statistical approach

To test the influence of sensory and categorical scene representations on subsequent memory, we generated two types of statistical models. The first type was designed to determine whether our ROIs represented the sensory and categorical features derived from the DNN. We employed an item-wise, hierarchical mixed-effects model with the brain-model fit values for each scene in each region as the outcome variable. To account for potential autocorrelation between different layers of the DNN (Bone et al., 2020), and to isolate the influence of a specific type of feature, we included the brain-model fit for the feature type that was not the outcome variable as a control variable. Subject and scene were included as random effects. Including each scene as a random effect allows our findings to be generalized to other scenes, thus addressing the "stimulus as a fixed effect" fallacy (Raaijmakers, 2003). Consequently, the statistical models were formulated as follows:

Brain-Model Fit_Sensory ~ ROI + Brain-Model Fit_Categorical + (1 | Subject) + (1 | Scene),  (1)

Brain-Model Fit_Categorical ~ ROI + Brain-Model Fit_Sensory + (1 | Subject) + (1 | Scene).  (2)

We then constructed a second type of model to examine the impact of sensory and categorical representations on subsequent memory. The objective of this model was to determine which type of representation could predict whether a scene would be remembered. We included the brain-model fit values of both feature types as predictors. By incorporating both features in the same model, we allowed them to "compete" with each other, each attempting to demonstrate its unique contribution to explaining variation in the dependent variable while controlling for the influence of the other representation type. Because the dependent variable is binary, this second model took the form of a mixed-effects logistic regression, with subject and scene as random effects:

Memory (1/0) ~ Brain-Model Fit_Sensory + Brain-Model Fit_Categorical + (1 | Subject) + (1 | Scene).  (3)

To test the significance of the logistic regression coefficients, we employed a permutation approach, which better controls false positives and makes fewer assumptions than standard parametric approaches (Winkler et al., 2014). In the first step, we computed the coefficients of the mixed-effects logistic regression. Following previous approaches to randomization in mixed-effects models (Manly, 1986; Anderson and Legendre, 1999), in the second step we randomly shuffled, within each subject, the values of the dependent variable (which scenes were remembered or forgotten) and recomputed the coefficients. We repeated this step 5,000 times and compared the real coefficients from the first step with the distribution of the 5,000 coefficients obtained during randomization. A p value of < 0.05 means that the real coefficient is more extreme than at least 95% of the coefficients obtained under randomization, which represents the distribution of values expected if the null hypothesis were true.
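The sketch below illustrates the within-subject permutation scheme only. The authors fit mixed-effects models in R with lme4; here, to keep the example short, a plain fixed-effects logistic regression from statsmodels stands in for the mixed model, and the data-frame columns (remembered, fit_sensory, fit_categorical, subject) are hypothetical names.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def permutation_pvalues(df, n_perm=5000, seed=0):
    """Within-subject permutation test for logistic regression coefficients.
    `df` has one row per trial with columns 'remembered' (0/1),
    'fit_sensory', 'fit_categorical', and 'subject'."""
    rng = np.random.default_rng(seed)
    formula = 'remembered ~ fit_sensory + fit_categorical'
    observed = smf.logit(formula, df).fit(disp=0).params

    null = np.zeros((n_perm, len(observed)))
    shuffled = df.copy()
    for i in range(n_perm):
        # shuffle which scenes were remembered/forgotten within each subject
        shuffled['remembered'] = (df.groupby('subject')['remembered']
                                    .transform(lambda y: rng.permutation(y.values)))
        null[i] = smf.logit(formula, shuffled).fit(disp=0).params.values

    # two-sided p value: proportion of null coefficients at least as extreme
    return pd.Series((np.abs(null) >= np.abs(observed.values)).mean(axis=0),
                     index=observed.index)
```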

Statistical analysis was done in RStudio (RStudio Team, 2020), and graphs were generated using the ggplot2 package (Wickham, 2011). Mixed-effects models were performed using the lme4 package (Bates et al., 2015). The p values for the brain-model fit coefficients of each AAL3 bilateral ROI were obtained using the lmerTest package (Kuznetsova et al., 2017).

Target–lure dissimilarity analysis

Our last analysis examined which type of scene feature is used when subjects must recognize an image among visually similar lures. The assumption of this analysis is that forced-choice recognition accuracy is better when the target is more different from its lures. Following that rationale, we generated a metric, named mean target–lure dissimilarity (MTLD), that summarizes the relationship between a target scene and its lure scenes with respect to a specific type of scene feature.

We computed MTLD values for each scene following five steps: (1) We fed the target scene and its three lures of each trial to the DNN. (2) We extracted the activation values from the first and last max-pooling layers of the DNN. (3) For both the sensory and categorical features, we calculated the dissimilarity value (1 minus the Pearson’s correlation) between each target and its three lures. (4) We then averaged these three dissimilarity values separately for sensory and categorical features to obtain each trial's MTLD value. (5) We computed the z score of the MTLD value, using the mean and standard deviation of the dissimilarity between the target scene and the other 95 scenes.
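A minimal sketch of this computation is given below. It assumes the layer activations have already been extracted (e.g., with the earlier VGG-16 sketch) and is run once with first max-pooling (sensory) features and once with last max-pooling (categorical) features.

```python
import numpy as np

def mtld_z(scene_feats, target_idx, lure_feats):
    """z-scored mean target-lure dissimilarity (1 - Pearson r) for one trial.
    scene_feats: (96, n_units) layer activations for the studied scenes;
    lure_feats:  (3, n_units) activations for the trial's three lures."""
    def d(a, b):
        return 1 - np.corrcoef(a, b)[0, 1]

    target = scene_feats[target_idx]
    mtld = np.mean([d(target, lure) for lure in lure_feats])          # steps 3-4

    # step 5: z-score against the target's dissimilarity to the other 95 scenes
    others = [j for j in range(len(scene_feats)) if j != target_idx]
    ref = np.array([d(target, scene_feats[j]) for j in others])
    return (mtld - ref.mean()) / ref.std()
```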

Results

Behavioral results

The behavioral analysis tested to what degree the sequentially administered recall vividness and forced-choice recognition tests were interrelated and thus to what extent their mnemonic processes draw upon common information (Tulving and Wiseman, 1975). The recall vividness test has as outcomes high-vividness (1) or low-vividness (0) trials, while the forced-choice recognition test has as outcomes recognized (1) or forgotten (0) trials. Our contingency analyses considered the number of scenes rated as high vividness and later recognized (1/1), rated as high vividness but not recognized (1/0), rated as low vividness but recognized (0/1), and rated as low vividness and not recognized (0/0).

We calculated Yule's Q for each participant, averaged this value across subjects, and then compared it against 0 using a t test. Our results showed that recall vividness and forced-choice recognition were moderately dependent (Yule's Q = 0.44; SD = 0.26; t(20) = 7.62; p < 0.001; Cohen's d = 1.66). A moderate Yule's Q suggests that the recall vividness and forced-choice recognition tests are sensitive to similar information; therefore, it is reasonable to expect that visual representations could have a similar impact on both tests.

Sensory and categorical representations regardless of subsequent memory

Functional MRI data first revealed which regions showed evidence of representing sensory and categorical features of scenes (Fig. 3). We estimated the brain-model fit for each ROI and for each feature model using a mixed-effects regression. We then contrasted these coefficients against 0 and corrected the p values for all comparisons using the false discovery rate (FDR) correction. Several ROIs spanning frontal, occipital, ventral temporal, and parietal cortices represented both types of scene features: the dorsolateral prefrontal cortex (categorical: t(125.8) = 3.69, p < 0.001; sensory: t(64.46) = 3.35, p < 0.001), parahippocampal cortex (categorical: t(125.8) = 3.16, p = 0.004; sensory: t(64.46) = 4.40, p < 0.001), inferior temporal cortex (categorical: t(125.8) = 4.19, p < 0.001; sensory: t(64.46) = 3.31, p = 0.006), precuneus (categorical: t(125.8) = 3.97, p < 0.001; sensory: t(64.46) = 3.46, p = 0.005), fusiform cortex (categorical: t(125.9) = 5.44, p < 0.001; sensory: t(64.46) = 5.59, p < 0.001), superior occipital cortex (categorical: t(125.8) = 6.92, p < 0.001; sensory: t(64.46) = 3.91, p = 0.001), middle occipital cortex (categorical: t(125.98) = 8.26, p < 0.001; sensory: t(64.51) = 7.81, p < 0.001), inferior occipital cortex (categorical: t(125.98) = 7.52, p < 0.001; sensory: t(64.49) = 6.00, p < 0.001), lingual gyrus (categorical: t(125.83) = 6.77, p < 0.001; sensory: t(64.46) = 4.12, p < 0.001), and the calcarine fissure and surrounding cortex (categorical: t(125.92) = 10.05, p < 0.001; sensory: t(64.54) = 6.1, p < 0.001).

Figure 3.

A, Brain-model fit values across ROIs: we calculated the brain-model fit values for all bilateral ROIs in the AAL3 atlas. For both types of features, the brain-model fit showed higher values in occipital and temporal regions. B, Regions representing both types of scene features: 10 ROIs across the cortex represent both types of scene features. Asterisks denote differences in brain-model fit between feature types in a specific ROI. ***p < 0.001, **p < 0.01, *p < 0.05 (FDR-corrected p values).

For each of the 10 ROIs showing consistent brain-model fit with both sensory and categorical scene features, we compared the coefficients of these feature types. This involved calculating the differences between the z score-transformed coefficients to determine if any region exhibited a stronger fit for one feature type over the other. Following the FDR correction for multiple comparisons, we observed that the superior occipital cortex (Zdiff = 3.00; p = 0.007), lingual gyrus (Zdiff = 2.65; p = 0.01), and the calcarine fissure and surrounding cortex (Zdiff = 3.95; p < 0.001) ROIs more strongly represented the categorical features of scenes. No other significant differences were found across ROIs (all p > 0.16).

The previous analysis revealed that several brain regions demonstrated significant brain-model fits for both categorical and sensory scene features. However, to preclude the potential confound of the visual model capturing objects within the scenes rather than scenes as a whole, we conducted a follow-up analysis. This analysis tested whether a model trained on object classification would perform similarly to one trained on scene classification. For this, we calculated the brain-model fit of sensory and categorical scene features using a version of VGG-16 trained for object classification. We then compared the brain-model fit estimates of the 42 bilateral ROIs previously calculated with a DNN trained on scene classification against the brain-model fit estimates obtained with the same DNN trained on object classification.

Our results indicated that the brain-model fits of the sensory features did not differ significantly between the model trained for scene classification (M = 0.011; SD = 0.009) and the model trained for object classification (M = 0.009; SD = 0.008; t(41) = 1.85; p = 0.07; Cohen's d = 0.23). In contrast, for the representations of categorical features, the brain-model fits obtained using the scene classification model (M = 0.015; SD = 0.009) were significantly higher than those obtained using the object classification model (M = 0.006; SD = 0.005; t(41) = 8.63; p < 0.001; Cohen's d = 1.22). These results reveal that while the early stages of models trained for scene and object classification capture the representational structure of the brain comparably well, the later stages of the scene-trained network are significantly more effective in capturing that structure during scene perception. This suggests that the processing of scenes cannot be reduced to the processing of the objects within them.

Only categorical representations predict subsequent scene memory

To test and compare the influence of sensory and categorical feature representations on scene memory, we conducted a mixed-effects logistic regression using the brain-model fit of both types of features as predictors and memory as the outcome variable, as well as including subject and scene as random factors. The inclusion of both types of feature representations within the same model enabled a direct comparison of their influence on scene memory, while also accounting for shared information between them. Since the Yule's Q analysis revealed a significant relationship between both types of memory tests, we classified trials that were recalled with high vividness and subsequently recognized as “remembered” and coded them as “1” for the mixed-effects logistic regression. All other trials were coded as “0.”

We applied the mixed-effects model to the 10 ROIs that demonstrated a significant fit for both types of feature representations. Following the adjustment for multiple comparisons using the FDR correction, we identified that, among these 10 ROIs, associations between memory and brain-model fit were significant in three ROIs. In the precuneus, we observed a positive association with subsequent memory exclusively for categorical feature representations (β = 0.15; SE = 0.05; p = 0.011), while sensory feature representations (β = −0.04; SE = 0.05; p = 0.77) did not show a significant link to memory. A similar pattern was found in the inferior temporal cortex, where categorical feature representations (β = 0.12; SE = 0.05; p = 0.018) were positively associated with subsequent memory, while sensory feature representations (β = 0.02; SE = 0.05; p = 0.69) did not exhibit a significant relationship with memory outcomes. Lastly, within the superior occipital cortex, we found that only categorical feature representations (β = 0.13; SE = 0.05; p = 0.017) displayed a positive association with subsequent memory, whereas sensory feature representations (β = −0.03; SE = 0.05; p = 0.77) did not exhibit a significant connection to memory performance.

To validate our results in these three ROIs, we directly compared the influence of each type of feature representation in explaining subjects' memory by employing a likelihood ratio test. This test compares the fit of nested statistical models in predicting the outcome variable and thus directly assesses the models' ability to account for the data (Nie, 2006). For each of the three ROIs, we compared three models: the first contained only the intercept and the random effects of subjects and scenes; the second added the sensory brain-model fit values for each scene; and the third added both the sensory and categorical brain-model fit values. This stepwise approach allowed us to directly compare the ability of sensory and categorical feature representations to explain subsequent memory.
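The sketch below shows the structure of this stepwise comparison. As in the earlier permutation sketch, plain logistic regressions stand in for the mixed-effects models (which the authors fit with lme4 in R), and the data-frame columns are hypothetical.

```python
import statsmodels.formula.api as smf
from scipy.stats import chi2

def stepwise_lr_tests(df):
    """Nested-model likelihood ratio tests; `df` as in the permutation sketch."""
    m0 = smf.logit('remembered ~ 1', df).fit(disp=0)                              # intercept only
    m1 = smf.logit('remembered ~ fit_sensory', df).fit(disp=0)                    # + sensory fit
    m2 = smf.logit('remembered ~ fit_sensory + fit_categorical', df).fit(disp=0)  # + categorical fit

    def lr(small, big):
        stat = 2 * (big.llf - small.llf)          # likelihood ratio statistic
        dof = big.df_model - small.df_model
        return stat, chi2.sf(stat, dof)

    return {'sensory_vs_intercept': lr(m0, m1),
            'categorical_vs_sensory': lr(m1, m2)}
```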

The likelihood ratio test revealed that, in comparison with a model devoid of any form of feature representations, sensory feature representations in the precuneus (χ2(1) = 0.12; p = 0.91), inferior temporal cortex (χ2(1) = 0.69; p = 0.40), and superior occipital cortex (χ2(1) = 0.04; p = 0.85) did not make a significant contribution to the model's capacity to explain the behavioral data. Conversely, a model that incorporated categorical feature representations displayed a notably improved fit to the data in the precuneus (χ2(1) = 7.85; p = 0.005), inferior temporal cortex (χ2(1) = 4.66; p = 0.03), and superior occipital cortex (χ2(1) = 5.65; p = 0.017), when compared with a model using only sensory feature representations as predictors of memory.

In summary, from the group of ROIs that showed both sensory and categorical representations, only categorical representations in the precuneus, inferior temporal cortex, and superior occipital cortex predicted scene memory. Consistent with our hypothesis, stronger encoding of categorical scene features was associated with better memory performance. To further investigate the effect of categorical representations on memory recognition, we conducted a follow-up analysis on the discrimination between forced-choice recognition targets and lures (Fig. 4).

Figure 4.

Stronger categorical representations predict memory performance. Model coefficients of the relationship between subsequent memory and brain-model fit (±standard error of the mean) are displayed as a function of representation type and ROI. Across both tasks, only the strength of categorical representations is associated with memory performance. *p < 0.05, all p values were FDR corrected.

Participants rely on categorical features to discriminate forced-choice recognition targets from lures

To confirm that categorical, but not sensory, representations were associated with memory performance, we conducted a follow-up analysis to examine which type of features participants rely on when discriminating targets versus lures. To do this, we generated MTLD values for each scene (the procedure is depicted in Fig. 5A) and correlated such values (excluding three scenes with z scores greater than 3 or less than −3) with the average forced-choice recognition memory for each scene. To control for the influence of the alternate type of information, we conducted two partial correlations: the first correlating sensory MTLD with memory while controlling for categorical MTLD and the second correlating categorical MTLD with memory while controlling for sensory MTLD.
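A simple way to compute such a partial correlation is to residualize both variables on the covariate, as sketched below; the per-scene arrays named in the commented usage lines are hypothetical (93 scenes after excluding the three outliers).

```python
import numpy as np

def partial_corr(x, y, covar):
    """Pearson correlation between x and y after regressing covar out of both."""
    def residualize(v):
        design = np.column_stack([np.ones_like(covar), covar])
        beta, *_ = np.linalg.lstsq(design, v, rcond=None)
        return v - design @ beta
    return np.corrcoef(residualize(x), residualize(y))[0, 1]

# hypothetical per-scene arrays:
# r_cat = partial_corr(categorical_mtld, mean_recognition, sensory_mtld)
# r_sen = partial_corr(sensory_mtld, mean_recognition, categorical_mtld)
```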

Figure 5.

Only the categorical features of scenes impact forced-choice recognition memory performance. A, We investigated the impact of sensory and categorical features of scenes on forced-choice recognition performance. Specifically, we examined the relationship between each target image (e.g., the airport on the left) and its three lures (e.g., the three airports in the center). For each target image we calculated the dissimilarity (1 – r) between the target image and its three lures, according to the sensory (ds) and categorical (dc) models. As the differing values of ds and dc show, the model perceived the relationship between the target and lures differently depending on whether sensory or categorical features were considered. B, We found a significant partial correlation between memory performance and the average dissimilarity between the target and the lures only for categorical features, while controlling for dissimilarity of sensory features. The plus sign represents the average memory for the airport as a function of the target–lure dissimilarity, for the two types of features. **p < 0.01.

As illustrated by Figure 5B, we found that forced-choice recognition performance was positively correlated with the target–lure dissimilarity values of the categorical features of images (r(90) = 0.33; p = 0.002), but not with the sensory features of images (r(90) = 0.15; p = 0.15). These results confirm that participants rely on categorical features when recognizing previously encoded scenes.

To directly compare the influences of categorical and sensory MTLD on recognition memory, we employed a likelihood ratio test. Similar to our approach with the fMRI data, we initially created a model with only the intercept. The second model included sensory MTLD values for each scene, and the third model incorporated both sensory and categorical MTLD values. These models were then compared sequentially. The likelihood ratio test showed that sensory MTLD (χ2(1) = 1.35; p = 0.25) did not significantly contribute to the model's explanatory power for scene recognition, compared with a model with only the intercept. Conversely, the model with categorical MTLD values demonstrated a significant fit to the data, compared with the sensory MTLD-only model (χ2(1) = 10.4; p = 0.001).

In sum, our follow-up analysis on target–lure discrimination converges with the conclusion from the mixed-effects model on RSA data that categorical, but not sensory, representations predict forced-choice recognition memory.

Discussion

In the current study, we investigated the contributions of sensory and categorical visual representations to scene memory. Our research yielded two primary findings. First, we demonstrated that the encoding of categorical scene representations was linked to memory performance. Second, we identified a positive correlation between memory accuracy and the categorical distinctiveness of a scene relative to its distractors. However, no significant relationship was found for sensory features. We discuss these findings in greater detail below.

Memory performance was predicted only by categorical scene representations

Consistent with our hypothesis, stronger encoding of categorical features was associated with better memory performance. Previous research on scene perception has shown that the neural representation of categorical scene features explains participants' behavior in scene classification tasks (Groen et al., 2018; King et al., 2019), while memory research has shown that the categorical features of scenes are related to scene memory (Konkle et al., 2010; Isola et al., 2014). Our results build upon these previous findings, demonstrating not only the significant role of categorical features in scene processing but also how the neural representation of these features supports subsequent memory for scenes.

The influence of higher-level features on subsequent memory performance has also been demonstrated in a recent study by Liu et al. (2021). In their research, they utilized intracortical electroencephalographic recordings while subjects viewed objects and subsequently underwent memory testing. By employing DNNs to model visual and semantic features, the authors found that greater representational transformations from sensory to categorical features correlated with improved memory for objects. To our knowledge, no study has directly compared the transformation of sensory with semantic representations of scene features and the impact of this transformation for memory. Nonetheless, considering that higher-level categorical information in scenes has been shown to emerge as early as the first 100 ms of visual processing (Lowe et al., 2018), one might speculate that scenes are a type of visual stimulus undergoing a rapid transformation from sensory to categorical representations, which could explain their mnemonic advantage (Standing, 1973).

In our current study, we have demonstrated that categorical representations within regions spanning the occipital, parietal, and temporal lobes predict subsequent memory for naturalistic stimuli. Our findings are in line with those of a recent study conducted by Hebscher et al. (2023). In their research, they showed that within the same regions, more similar neural representations of naturalistic stimuli (recordings of subjects performing different actions) were correlated with enhanced memory performance. While the authors used DNNs to model the sensory and categorical features of their stimuli and identified the type of information encoded in these ROIs, they did not link the brain-model fit values to subsequent memory. To gain a comprehensive understanding of how neural representations influence memory for naturalistic stimuli, future studies should strive to establish links between both types of representational information: feature representation and neural similarity between events.

Our current findings provide additional insights into the role of memory representations, building upon earlier research from our group. In their exploratory study, Davis et al. (2021) found that in the inferior temporal gyrus, both types of representations predicted object memory, while only sensory representations predicted subsequent memory in the precuneus. Contrasting with these earlier findings, our current study reveals a distinct pattern: in both the inferior temporal gyrus and the precuneus, only the representations of categorical features were correlated with scene memory. This different pattern of results could be explained by how scene context influences object processing (Biderman and Mudrik, 2018; Furtak et al., 2022), which, in turn, affects the neural representations of those objects, rendering these representations context dependent (Lowe et al., 2017; Wang et al., 2018). Therefore, while further studies are necessary to confirm this hypothesis, scene context could provide additional semantic support for individual objects, making their sensory characteristics less significant for subsequent memory performance.

In direct contrast to our findings, Bone et al. (2020) reported the opposite pattern: the reactivation of early visual features of images and their corresponding cortical sensory representations was predictive of subsequent forced-choice recognition. One possible explanation for the opposite direction of these results is that the distractors used in the two experiments differed in one critical factor: their visual dissimilarity from the target image. We discuss this critical factor in the context of our second main finding in the next section.

The effect of lure dissimilarity confirms the role of categorical representations on forced-choice recognition memory

Our second main finding was that the categorical dissimilarity between the target image and its lures predicted forced-choice recognition memory, whereas sensory dissimilarity had no such influence on scene memory. Previous research on memory for scenes has shown that encoding scenes that are more similar in their categorical, but not sensory, features negatively influences memory performance (Konkle et al., 2010). Our results extend this finding by showing that only the categorical (dis)similarity between the target and distractor images predicts forced-choice recognition memory.

To the best of our knowledge, this is the first study to model the relationship between a target image and its distractors using different types of visual features to investigate their unique contributions to forced-choice recognition memory. This relationship can help explain the difference between the results of Bone et al. (2020) and ours. We hypothesized that the distractors used in our study were more dissimilar from their targets in categorical features (e.g., lures depicting different models of airplanes), leading participants to rely more on those features to recognize the targets, whereas the distractors used by Bone et al. (2020) were more dissimilar in sensory features (e.g., a picture of the same statue with a differently colored background), leading participants to rely more on sensory features during the memory test.

Limitations and further considerations

One limitation of our study that could have influenced our results relates to the nature of the encoding task. We asked participants to rate how well each scene image represented its category, a task that directs attention toward the categorical features of scenes rather than toward their more basic sensory features. Because participants attended more to the categorical features of the images, they may have encoded these features more strongly. An important follow-up experiment would be to have participants focus on the basic sensory aspects of the scenes and to test how this manipulation influences the encoding of sensory features.

Given the rapid development of neural network models of vision, it is important to acknowledge a potential limitation of our study. While DNNs have been used as models of visual processing (Peters and Kriegeskorte, 2021; Jozwik et al., 2023), increasing attention has been paid to the limits of such computational models in explaining human visual processing (Xu and Vaziri-Pashkam, 2021). While we recognize this as a limitation, it is worth noting that we employed a DNN architecture widely used in memory research (Bone et al., 2020; Davis et al., 2021), trained on a database that has been shown to generate accurate models of brain activity during visual scene processing (Cichy et al., 2017; Groen et al., 2018; Greene and Hansen, 2020). Nevertheless, it would be an important step forward to test whether more biologically plausible DNNs, such as architectures with a recurrent structure (Kubilius et al., 2018), offer an improved model for studying the influence of visual representations on memory.

Although not the central focus of our study, an intriguing avenue for future research emerges from our result where early stages of visual processing displayed higher brain-model fit values for categorical features compared with those for sensory features. This outcome, while unexpected, aligns with several studies that have reported higher brain-model fit values for later DNN layers as opposed to early ones in early visual regions (Devereux et al., 2018; Groen et al., 2018; Davis et al., 2021). Considering the reciprocal connections between early and late regions within the visual cortex hierarchy (Bastos et al., 2012; Muckli et al., 2015), it may not be surprising that early visual regions show sensitivity to visual features modeled by the later layers of a DNN, particularly when taking into account the integration of neural signals recorded through the slow acquisition time of fMRI. Nonetheless, further investigation utilizing more time-sensitive measures is required to fully explore this phenomenon.

Conclusions

By integrating computational models of scene processing with fMRI data, our study has yielded evidence supporting the role of categorical scene representations in memory. Firstly, our findings demonstrate that robust encoding of categorical features contributes to scene memory. Secondly, we established that categorical representations also play an important role in scene recognition. Furthermore, our results suggest that, when confronted with stimuli that encompass multiple layers of visual details, the brain effectively utilizes pre-existing knowledge (captured by categorical models) to perceive and recall these complex environments. In summary, our study advances our understanding of the specific visual information that underlies our ability to recall and navigate our surrounding visual environment.

Footnotes

  • This study was supported by the National Institutes of Health, RF1-AG066901 and R01-AG075417.

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Roberto Cabeza at cabeza@duke.edu.

SfN exclusive license.

References

  1. Anderson MJ, Legendre P (1999) An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. J Stat Comput Simul 62:271–303. https://doi.org/10.1080/00949659908811936
  2. Ashburner J (2007) A fast diffeomorphic image registration algorithm. NeuroImage 38:95–113. https://doi.org/10.1016/j.neuroimage.2007.07.007
  3. Bastos AM, Usrey WM, Adams RA, Mangun GR, Fries P, Friston KJ (2012) Canonical microcircuits for predictive coding. Neuron 76:695–711. https://doi.org/10.1016/j.neuron.2012.10.038
  4. Bates D, Mächler M, Bolker B, Walker S (2015) Fitting linear mixed-effects models using lme4. J Stat Softw 67:1–48. https://doi.org/10.18637/jss.v067.i01
  5. Biderman N, Mudrik L (2018) Evidence for implicit—but not unconscious—processing of object-scene relations. Psychol Sci 29:266–277. https://doi.org/10.1177/0956797617735745
  6. Bone MB, Ahmad F, Buchsbaum BR (2020) Feature-specific neural reactivation during episodic memory. Nat Commun 11:1945. https://doi.org/10.1038/s41467-020-15763-2
  7. Castelhano MS, Krzyś K (2020) Rethinking space: a review of perception, attention, and memory in scene processing. Annu Rev Vis Sci 6:563–586. https://doi.org/10.1146/annurev-vision-121219-081745
  8. Cichy RM, Khosla A, Pantazis D, Oliva A (2017) Dynamics of scene representations in the human brain revealed by magnetoencephalography and deep neural networks. NeuroImage 153:346–358. https://doi.org/10.1016/j.neuroimage.2016.03.063
  9. Davis SW, Geib BR, Wing EA, Wang W-C, Hovhannisyan M, Monge ZA, Cabeza R (2021) Visual and semantic representations predict subsequent memory in perceptual and conceptual memory tests. Cereb Cortex 31:974–992. https://doi.org/10.1093/cercor/bhaa269
  10. Devereux BJ, Clarke A, Tyler LK (2018) Integrated deep visual and semantic attractor neural networks predict fMRI pattern-information along the ventral object processing pathway. Sci Rep 8:10636. https://doi.org/10.1038/s41598-018-28865-1
  11. Dima DC, Perry G, Singh KD (2018) Spatial frequency supports the emergence of categorical representations in visual cortex during natural scene perception. NeuroImage 179:102–116. https://doi.org/10.1016/j.neuroimage.2018.06.033
  12. Dimsdale-Zucker HR, Ranganath C (2018) Representational similarity analyses. In: Handbook of behavioral neuroscience (Manahan-Vaughan D, ed), Vol 28, pp 509–525. Elsevier. https://doi.org/10.1016/B978-0-12-812028-6.00027-6
  13. Eickenberg M, Gramfort A, Varoquaux G, Thirion B (2017) Seeing it all: convolutional network layers map the function of the human visual system. NeuroImage 152:184–194. https://doi.org/10.1016/j.neuroimage.2016.10.001
  14. Epstein RA, Baker CI (2019) Scene perception in the human brain. Annu Rev Vis Sci 5:373–397. https://doi.org/10.1146/annurev-vision-091718-014809
  15. Furtak M, Mudrik L, Bola M (2022) The forest, the trees, or both? Hierarchy and interactions between gist and object processing during perception of real-world scenes. Cognition 221:104983. https://doi.org/10.1016/j.cognition.2021.104983
  16. Greene MR, Hansen BC (2020) Disentangling the independent contributions of visual and conceptual features to the spatiotemporal dynamics of scene categorization. J Neurosci 40:5283–5299. https://doi.org/10.1523/JNEUROSCI.2088-19.2020
  17. Groen II, Greene MR, Baldassano C, Fei-Fei L, Beck DM, Baker CI (2018) Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior. Elife 7:e32962. https://doi.org/10.7554/eLife.32962
  18. Hebscher M, Bainbridge WA, Voss JL (2023) Neural similarity between overlapping events at learning differentially affects reinstatement across the cortex. NeuroImage 277:120220. https://doi.org/10.1016/j.neuroimage.2023.120220
  19. Henderson JM, Hollingworth A (1999) High-level scene perception. Annu Rev Psychol 50:243–271. https://doi.org/10.1146/annurev.psych.50.1.243
  20. Isola P, Xiao J, Parikh D, Torralba A, Oliva A (2014) What makes a photograph memorable? IEEE Trans Pattern Anal Mach Intell 36:1469–1482. https://doi.org/10.1109/TPAMI.2013.200
  21. Jozwik KM, Kietzmann TC, Cichy RM, Kriegeskorte N, Mur M (2023) Deep neural networks and visuo-semantic models explain complementary components of human ventral-stream representational dynamics. J Neurosci 43:1731–1741. https://doi.org/10.1523/JNEUROSCI.1424-22.2022
  22. Kahana MJ (2000) Contingency analyses of memory. In: The Oxford handbook of memory (Tulving E, Craik FIM, eds), pp 59–72. New York, NY: Oxford University Press.
  23. Kaiser D, Häberle G, Cichy RM (2020) Cortical sensitivity to natural scene structure. Hum Brain Mapp 41:1286–1295. https://doi.org/10.1002/hbm.24875
  24. Kim H, Cabeza R (2009) Common and specific brain regions in high- versus low-confidence recognition memory. Brain Res 1282:103–113. https://doi.org/10.1016/j.brainres.2009.05.080
  25. King ML, Groen IIA, Steel A, Kravitz DJ, Baker CI (2019) Similarity judgments and cortical visual responses reflect different properties of object and scene categories in naturalistic images. NeuroImage 197:368–382. https://doi.org/10.1016/j.neuroimage.2019.04.079
  26. Konkle T, Brady TF, Alvarez GA, Oliva A (2010) Scene memory is more detailed than you think: the role of categories in visual long-term memory. Psychol Sci 21:1551–1556. https://doi.org/10.1177/0956797610385359
  27. Kriegeskorte N (2015) Deep neural networks: a new framework for modeling biological vision and brain information processing. Annu Rev Vis Sci 1:417–446. https://doi.org/10.1146/annurev-vision-082114-035447
  28. Kriegeskorte N, Diedrichsen J (2019) Peeling the onion of brain representations. Annu Rev Neurosci 42:407–432. https://doi.org/10.1146/annurev-neuro-080317-061906
  29. Kriegeskorte N, Kievit RA (2013) Representational geometry: integrating cognition, computation, and the brain. Trends Cogn Sci 17:401–412. https://doi.org/10.1016/j.tics.2013.06.007
  30. Kubilius J, Schrimpf M, Nayebi A, Bear D, Yamins DLK,
    6. DiCarlo JJ
    (2018) CORnet: modeling the neural mechanisms of core object recognition. Neuroscience [Preprint]:1–9. https://doi.org/10.1101/408385
  31. ↵
    1. Kuhl BA,
    2. Chun MM
    (2014) Successful remembering elicits event-specific activity patterns in lateral parietal cortex. J Neurosci 34:8051–8060. https://doi.org/10.1523/JNEUROSCI.4328-13.2014 pmid:24899726
    OpenUrlAbstract/FREE Full Text
  32. ↵
    1. Kuznetsova A,
    2. Brockhoff PB,
    3. Christensen RHB
    (2017) Lmertest package: tests in linear mixed effects models. J Stat Softw 82:1–26. https://doi.org/10.18637/jss.v082.i13
    OpenUrlCrossRefPubMed
  33. ↵
    1. Lee H,
    2. Samide R,
    3. Richter FR,
    4. Kuhl BA
    (2019) Decomposing parietal memory reactivation to predict consequences of remembering. Cereb Cortex 29:3305–3318. https://doi.org/10.1093/cercor/bhy200
    OpenUrlCrossRefPubMed
  34. ↵
    1. Liu J,
    2. Zhang H,
    3. Yu T,
    4. Ren L,
    5. Ni D,
    6. Yang Q,
    7. Lu B,
    8. Zhang L,
    9. Axmacher N,
    10. Xue G
    (2021) Transformative neural representations support long-term episodic memory. Sci Adv 7:eabg9715. https://doi.org/10.1126/sciadv.abg9715 pmid:34623910
    OpenUrlCrossRefPubMed
  35. ↵
    1. Lowe MX,
    2. Rajsic J,
    3. Ferber S,
    4. Walther DB
    (2018) Discriminating scene categories from brain activity within 100 milliseconds. Cortex 106:275–287. https://doi.org/10.1016/j.cortex.2018.06.006
    OpenUrlCrossRefPubMed
  36. ↵
    1. Lowe MX,
    2. Rajsic J,
    3. Gallivan JP,
    4. Ferber S,
    5. Cant JS
    (2017) Neural representation of geometry and surface properties in object and scene perception. NeuroImage 157:586–597. https://doi.org/10.1016/j.neuroimage.2017.06.043
    OpenUrlCrossRefPubMed
  37. ↵
    1. Manly BFJ
    (1986) Randomization and regression methods for testing for associations with geographical, environmental and biological distances between populations. Popul Ecol 28:201–218. https://doi.org/10.1007/BF02515450
    OpenUrl
  38. ↵
    1. Mikhailova A,
    2. Lightfoot S,
    3. Santos-Victor J,
    4. Coco MI
    (2023) Differential effects of intrinsic properties of natural scenes and interference mechanisms on recognition processes in long-term visual memory. Cogn Process 25:173–187. https://doi.org/10.1007/s10339-023-01164-y
    OpenUrl
  39. ↵
    1. Muckli L,
    2. De Martino F,
    3. Vizioli L,
    4. Petro LS,
    5. Smith FW,
    6. Ugurbil K,
    7. Goebel R,
    8. Yacoub E
    (2015) Contextual feedback to superficial layers of V1. Curr Biol 25:2690–2695. https://doi.org/10.1016/j.cub.2015.08.057 pmid:26441356
    OpenUrlCrossRefPubMed
  40. ↵
    1. Mumford JA,
    2. Turner BO,
    3. Ashby FG,
    4. Poldrack RA
    (2012) Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. NeuroImage 59:2636–2643. https://doi.org/10.1016/j.neuroimage.2011.08.076 pmid:21924359
    OpenUrlCrossRefPubMed
  41. ↵
    1. Nie L
    (2006) Strong consistency of the maximum likelihood estimator in generalized linear and nonlinear mixed-effects models. Metrika 63:123–143. https://doi.org/10.1007/s00184-005-0001-3
    OpenUrl
  42. ↵
    1. O’Connell TP,
    2. Chun MM
    (2018) Predicting eye movement patterns from fMRI responses to natural scenes. Nat Commun 9:5159. https://doi.org/10.1038/s41467-018-07471-9 pmid:30514836
    OpenUrlCrossRefPubMed
  43. ↵
    1. Peters B,
    2. Kriegeskorte N
    (2021) Capturing the objects of vision with neural networks. Nat Hum Behav 5:1127–1144. https://doi.org/10.1038/s41562-021-01194-6
    OpenUrl
  44. ↵
    1. Popal H,
    2. Wang Y,
    3. Olson IR
    (2019) A guide to representational similarity analysis for social neuroscience. Soc Cogn Affect Neurosci 14:1243–1253. https://doi.org/10.1093/scan/nsz099 pmid:31989169
    OpenUrlCrossRefPubMed
  45. ↵
    1. Raaijmakers JGW
    (2003) A further look at the “language-as-fixed-effect fallacy”. Can J Exp Psychol 57:141–151. https://doi.org/10.1037/h0087421
    OpenUrlPubMed
  46. ↵
    1. Rolls ET,
    2. Huang C-C,
    3. Lin C-P,
    4. Feng J,
    5. Joliot M
    (2020) Automated anatomical labelling atlas 3. NeuroImage 206:116189. https://doi.org/10.1016/j.neuroimage.2019.116189
    OpenUrlCrossRefPubMed
  47. ↵
    RStudio Team (2020). RStudio: Integrated development environment for R. RStudio. Inc., Boston, MA, 14. Available at: http://www.rstudio.com/.
  48. ↵
    1. Särkkä S,
    2. Solin A,
    3. Nummenmaa A,
    4. Vehtari A,
    5. Auranen T,
    6. Vanni S,
    7. Lin FH
    (2012) Dynamic retrospective filtering of physiological noise in BOLD fMRI: DRIFTER. Neuroimage 60:1517–1527. https://doi.org/10.1016/j.neuroimage.2012.01.067 pmid:22281675
    OpenUrlCrossRefPubMed
  49. ↵
    1. Standing L
    (1973) Learning 10000 pictures. Q J Exp Psychol 25:207–222. https://doi.org/10.1080/14640747308400340
    OpenUrlCrossRefPubMed
  50. ↵
    1. Tulving E,
    2. Wiseman S
    (1975) Relation between recognition and recognition failure of recallable words. Bull Psychon Soc 6:79–82. https://doi.org/10.3758/BF03333153
    OpenUrl
  51. ↵
    1. Wang W,
    2. Brashier NM,
    3. Wing EA,
    4. Marsh EJ,
    5. Cabeza R
    (2018) Neural basis of goal-driven changes in knowledge activation. Eur J Neurosci 48:3389–3396. https://doi.org/10.1111/ejn.14196 pmid:30290029
    OpenUrlPubMed
  52. ↵
    1. Wickham H
    (2011) Ggplot2. Wiley Interdiscip Rev Comput Stat 3:180–185. https://doi.org/10.1002/wics.147
    OpenUrlCrossRef
  53. ↵
    1. Wing EA,
    2. Ritchey M,
    3. Cabeza R
    (2015) Reinstatement of individual past events revealed by the similarity of distributed activation patterns during encoding and retrieval. J Cogn Neurosci 27:679–691. https://doi.org/10.1162/jocn_a_00740 pmid:25313659
    OpenUrlCrossRefPubMed
  54. ↵
    1. Winkler AM,
    2. Ridgway GR,
    3. Webster MA,
    4. Smith SM,
    5. Nichols TE
    (2014) Permutation inference for the general linear model. NeuroImage 92:381–397. https://doi.org/10.1016/j.neuroimage.2014.01.060 pmid:24530839
    OpenUrlCrossRefPubMed
  55. ↵
    1. Xu Y,
    2. Vaziri-Pashkam M
    (2021) Limits to visual representational correspondence between convolutional neural networks and the human brain. Nat Commun 12:2065. https://doi.org/10.1038/s41467-021-22244-7 pmid:33824315
    OpenUrlCrossRefPubMed
  56. ↵
    1. Yamins DLK,
    2. Hong H,
    3. Cadieu CF,
    4. Solomon EA,
    5. Seibert D,
    6. DiCarlo JJ
    (2014) Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci U S A 111:8619–8624. https://doi.org/10.1073/pnas.1403112111 pmid:24812127
    OpenUrlAbstract/FREE Full Text
  57. ↵
    1. Zhou B,
    2. Lapedriza A,
    3. Khosla A,
    4. Oliva A,
    5. Torralba A
    (2018) Places: A 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell 40:1452–1464. https://doi.org/10.1109/TPAMI.2017.2723009
    OpenUrlCrossRefPubMed
Keywords

  • episodic memory
  • recognition memory
  • representational similarity analysis
  • scene memory
