Research Articles, Behavioral/Cognitive

Deep Neural Networks and Visuo-Semantic Models Explain Complementary Components of Human Ventral-Stream Representational Dynamics

Kamila M. Jozwik, Tim C. Kietzmann, Radoslaw M. Cichy, Nikolaus Kriegeskorte and Marieke Mur
Journal of Neuroscience 8 March 2023, 43 (10) 1731-1741; https://doi.org/10.1523/JNEUROSCI.1424-22.2022
Kamila M. Jozwik
1Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom
Tim C. Kietzmann
2Institute of Cognitive Science, University of Osnabrück, 49069 Osnabrück, Germany
Radoslaw M. Cichy
3Department of Education and Psychology, Freie Universität Berlin, 14195 Berlin, Germany
Nikolaus Kriegeskorte
4Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York 10027
Marieke Mur
5Department of Psychology, Western University, London, Ontario N6A 3K7, Canada
6Department of Computer Science, Western University, London, Ontario N6A 3K7, Canada

Abstract

Deep neural networks (DNNs) are promising models of the cortical computations supporting human object recognition. However, despite their ability to explain a significant portion of variance in neural data, the agreement between models and brain representational dynamics is far from perfect. We address this issue by asking which representational features are currently unaccounted for in neural time series data, estimated for multiple areas of the ventral stream via source-reconstructed magnetoencephalography data acquired in human participants (nine females, six males) during object viewing. We focus on the ability of visuo-semantic models, consisting of human-generated labels of object features and categories, to explain variance beyond the explanatory power of DNNs alone. We report a gradual reversal in the relative importance of DNN versus visuo-semantic features as ventral-stream object representations unfold over space and time. Although lower-level visual areas are better explained by DNN features starting early in time (at 66 ms after stimulus onset), higher-level cortical dynamics are best accounted for by visuo-semantic features starting later in time (at 146 ms after stimulus onset). Among the visuo-semantic features, object parts and basic categories drive the advantage over DNNs. These results show that a significant component of the variance unexplained by DNNs in higher-level cortical dynamics is structured and can be explained by readily nameable aspects of the objects. We conclude that current DNNs fail to fully capture dynamic representations in higher-level human visual cortex and suggest a path toward more accurate models of ventral-stream computations.

SIGNIFICANCE STATEMENT When we view objects such as faces and cars in our visual environment, their neural representations dynamically unfold over time at a millisecond scale. These dynamics reflect the cortical computations that support fast and robust object recognition. Deep neural networks (DNNs) have emerged as a promising framework for modeling these computations but cannot yet fully account for the neural dynamics. Using magnetoencephalography data acquired in human observers during object viewing, we show that readily nameable aspects of objects, such as 'eye', 'wheel', and 'face', can account for variance in the neural dynamics over and above DNNs. These findings suggest that DNNs and humans may in part rely on different object features for visual recognition and provide guidelines for model improvement.

  • categories
  • features
  • object recognition
  • recurrent deep neural networks
  • source-reconstructed MEG data
  • vision

Introduction

When we view objects in our visual environment, the neural representation of these objects dynamically unfolds over time across the cortical hierarchy of the ventral visual stream. In brain recordings from both humans and nonhuman primates, this dynamic representational unfolding can be quantified from neural population activity, showing a staggered emergence of ecologically relevant object information such as facial features, followed by object categories, and then the individuation of these inputs into specific exemplars (Sugase et al., 1999; Hung et al., 2005; Meyers et al., 2008; Carlson et al., 2013; Clarke et al., 2013; Cichy et al., 2014; Ghuman et al., 2014; Isik et al., 2014; Hebart et al., 2018; Kietzmann et al., 2019b). These neural reverberations are thought to reflect the cortical computations that support object recognition.

Deep neural networks (DNNs) have recently emerged as a promising computational framework for modeling these cortical computations (Kietzmann et al., 2019a; Doerig et al., 2022). DNNs explain significant amounts of variance in neural data obtained from visual cortex in both humans and nonhuman primates (Khaligh-Razavi and Kriegeskorte, 2014; Yamins et al., 2014; Güçlü and van Gerven, 2015; Cichy et al., 2017; Bankson et al., 2018; Bonner and Epstein, 2018; Groen et al., 2018; Jozwik et al., 2018; Schrimpf et al., 2018; Zeman et al., 2020; Storrs et al., 2020b). The transformation of object representations from shallower to deeper layers of feedforward DNNs roughly matches the transformation of object representations observed in visual cortex as neural responses unfold over space and time (Khaligh-Razavi and Kriegeskorte, 2014; Güçlü and van Gerven, 2015; Cichy et al., 2017; Bankson et al., 2018; Jozwik et al., 2018; Zeman et al., 2020). Furthermore, DNNs that incorporate dynamics through recurrent processing provide additional explanatory power, possibly by better approximating the dynamic computations that the brain relies on for perceptual inference (O'Reilly et al., 2013; Liao and Poggio, 2016; Spoerer et al., 2017; Kubilius et al., 2018; Tang et al., 2018; Kar et al., 2019; Kietzmann et al., 2019a, b; Rajaei et al., 2019; Spoerer et al., 2020). However, DNNs still leave substantial amounts of variance in brain responses unexplained (Groen et al., 2018; Schrimpf et al., 2018; Bracci et al., 2019; Kietzmann et al., 2019b), and differences among feedforward architectures are small (Jozwik et al., 2019a, b), even after training and fitting (Storrs et al., 2020b). This raises the question of what representational features are left unaccounted for in the dynamic neural data.

To address this question, we enriched our modeling strategy with visuo-semantic object information. By “visuo-semantic”, we mean nameable properties of visual objects. Our visuo-semantic models consist of object labels generated by human observers, describing lower-level object features such as 'green', higher-level object features such as 'eye', and categories such as 'face'. The visuo-semantic labels can be interpreted as vectors in a space defined by humans at the behavioral level. In contrast to DNNs, our visuo-semantic models are not image computable. However, they provide unique benchmarks for comparison with image-computable models. Prior work indicates that visuo-semantic labels explain significant amounts of response variance in higher-level primate visual cortex (Tanaka, 1996; Kanwisher et al., 1997; Epstein and Kanwisher, 1998; Downing et al., 2001; Haxby et al., 2001; Kriegeskorte et al., 2008; Yamane et al., 2008; Freiwald et al., 2009; Huth et al., 2012; Issa and DiCarlo, 2012; Mur et al., 2012; Jozwik et al., 2016, 2018). Moreover, visuo-semantic models outperform DNN architectures (AlexNet, Krizhevsky et al., 2012; VGG, Simonyan and Zisserman, 2014) at predicting perceived object similarity in humans (Jozwik et al., 2017). In addition, a recent functional magnetic resonance imaging (fMRI) study showed that combining DNNs with a semantic feature model is beneficial for explaining visual object representations at advanced processing stages of the ventral visual stream (Devereux et al., 2018). Given these findings, we hypothesized that visuo-semantic models capture representational features in ventral-stream neural dynamics that DNNs fail to account for.

We tested this hypothesis on temporally resolved magnetoencephalography (MEG) data, which can capture representational dynamics at a millisecond timescale. Human brain data acquired at this rapid sampling rate provide rich information about temporal dynamics and, by extension, about the underlying neural computations. For example, in an MEG study that used source reconstruction to localize time series to distinct areas of the ventral visual stream, time series analyses revealed temporal interdependencies between areas suggestive of recurrent information processing (Kietzmann et al., 2019b).

In this work, we used representational similarity analysis (RSA) to test both DNNs and visuo-semantic models for their ability to explain representational dynamics observed across multiple ventral-stream areas in the human brain. As DNNs, we used the feedforward CORnet-Z (Zero) and the locally recurrent CORnet-R (Recurrent) models, which are inspired by the anatomy of monkey visual cortex (Kubilius et al., 2018). As visuo-semantic models, we used existing human-generated labels of object features and categories (Jozwik et al., 2016). We analyzed previously published source-reconstructed MEG data acquired in healthy human participants while they were viewing object images from a range of categories (Cichy et al., 2014; Kietzmann et al., 2019b). We investigated three distinct stages of processing in the ventral cortical hierarchy: lower-level visual areas (V1–V3), intermediate visual areas (V4t/lateral occipital cortex [LO]), and higher-level visual areas (inferior temporal cortex [IT]/parahippocampal cortex [PHC]) (Glasser et al., 2016). At each stage of processing, we tested both model classes for their ability to explain variance in the temporally evolving representations. This strategy allowed us to test what visuo-semantic object information is unaccounted for by DNNs as ventral-stream processing unfolds over space and time.

Materials and Methods

Stimuli

Stimuli were 92 colored images of real-world objects spanning a range of categories, including humans, nonhuman animals, natural objects, and manmade objects (12 human body parts, 12 human faces, 12 animal bodies, 12 animal faces, 23 natural objects, and 21 manmade objects). Objects were segmented from their backgrounds (Fig. 1a) and presented to human participants and models on a gray background.

Visuo-semantic models

Visuo-semantic models have been described in Jozwik et al. (2016, 2017), where further details can be found.

Definition of visuo-semantic models

To create visuo-semantic models, human observers generated feature labels (e.g., eye) and category labels (e.g., animal) for the 92 images (Jozwik et al., 2016). The visuo-semantic models are schematically represented in Figure 1, b and c. Feature labels were divided into colors, textures, shapes, and object parts, whereas category labels were divided into subordinate, basic, and superordinate categories. Labels were obtained in two experiments. In experiment 1, a group of 15 human observers (mean age, 26 years; 11 females) generated feature and category labels for the object images. Human observers were native English speakers and had normal or corrected-to-normal vision. In the instruction, we defined features as visible elements of the shown object, including colors, textures, shapes, and object parts. We defined a category as a group of objects that the shown object is an example of. The instruction contained two example images, not part of the 92 object-image set, with feature and category descriptions. We asked human volunteers to list a minimum of five descriptions, both for features and for categories. The 92 images were shown, in random order, on a computer screen using a Web-based implementation, with text boxes next to each image for human observers to type feature or category descriptions. We subsequently selected, for features and categories separately, those descriptions that were generated by at least three of the 15 human observers. This threshold corresponds to the number of human observers who on average mentioned a particular feature or category for a particular image. The threshold is relatively lenient, but it allows the inclusion of a rich set of descriptions, which were further pruned in experiment 2. We subsequently removed descriptions that were either inconsistent with the instructions or redundant. Observers generated 212 feature labels and 197 category labels. These labels are the model dimensions.

In experiment 2, a separate group of 14 human observers (mean age, 28 years; 7 females) judged the applicability of each model dimension to each image, thereby validating the dimensions generated in experiment 1 and providing, for each image, its value (present or absent) on each of the dimensions. Human observers were native English speakers and had normal or corrected-to-normal vision. During the experiment, the object images and the descriptions, each in random order, were shown on a computer screen using a Web-based implementation. The object images formed a column, whereas the descriptions formed a row; together they defined a matrix with one entry, or checkbox, for each possible image-description pair. We asked the human observers to judge for each description whether it correctly described each object image and, if so, to tick the associated checkbox. A description was considered to apply to an image if at least 75% of the human observers from experiment 2 agreed; the resulting image values on the validated model dimensions define the model. To increase the stability of the models during subsequent fitting, we iteratively merged binary vectors that were highly correlated (r > 0.9), alternately computing pairwise correlations between the vectors and averaging highly correlated vector pairs, until all pairwise correlations were below threshold. The final feature and category models consisted of 119 and 110 dimensions, respectively.
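
The iterative merging step can be sketched as follows. This is a minimal Python/NumPy illustration written for this description, not the original analysis code; it assumes a binary image × label matrix as input and merges one vector pair per iteration, whereas the original procedure may have merged several pairs per pass.

```python
import numpy as np

def merge_correlated_dimensions(labels, threshold=0.9):
    """Iteratively average pairs of label vectors whose correlation exceeds threshold.

    labels : (n_images, n_dimensions) array of (initially binary) label vectors.
    Returns a reduced (n_images, n_merged_dimensions) array in which all pairwise
    correlations are below threshold.
    """
    vecs = [labels[:, d].astype(float) for d in range(labels.shape[1])]
    while True:
        # Correlations between dimensions (constant vectors are treated as uncorrelated)
        corr = np.nan_to_num(np.corrcoef(np.array(vecs)), nan=0.0)
        np.fill_diagonal(corr, 0.0)
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] <= threshold:
            break
        vecs[i] = (vecs[i] + vecs[j]) / 2.0  # average the highly correlated pair
        del vecs[j]
    return np.array(vecs).T
```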

Construction of the visuo-semantic representational dissimilarity matrices

To compare the models to the measured brain representations, the models and the data should reside in the same representational space. This motivates transforming our models to representational dissimilarity matrix (RDM) space. For each model dimension, we computed, for each pair of images, the squared difference between their values on that dimension. The squared difference reflects the dissimilarity between the two images in a pair. Given that a specific feature or category can either be present or absent in a particular image, image dissimilarities along a single model dimension are binary: they are zero if a feature or category is present in both images or absent in both images, and one if it is present in one image but absent in the other. The dissimilarities were stored in an RDM, yielding as many RDMs as model dimensions. The full visuo-semantic model consists of 229 RDM predictors (119 feature predictors and 110 category predictors).
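
For illustration, the construction of the single-dimension RDMs can be written compactly in Python/NumPy; the variable names below are ours and the random labels stand in for the human-generated ones.

```python
import numpy as np

def single_dimension_rdms(labels):
    """Compute one binary RDM per visuo-semantic model dimension.

    labels : (n_images, n_dimensions) binary array; entry [i, d] is 1 if
             dimension d (a feature or category label) applies to image i.
    Returns an (n_dimensions, n_images, n_images) array of squared-difference
    RDMs; each entry is 0 if both images share the label status and 1 otherwise.
    """
    n_images, n_dims = labels.shape
    rdms = np.empty((n_dims, n_images, n_images))
    for d in range(n_dims):
        v = labels[:, d]
        rdms[d] = (v[:, None] - v[None, :]) ** 2  # squared difference = binary mismatch
    return rdms

# Example with random binary labels for 92 images and 229 dimensions
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(92, 229))
rdms = single_dimension_rdms(labels)
print(rdms.shape)  # (229, 92, 92)
```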

Deep neural networks

CORnet-Z and CORnet-R architectures have been described in Kubilius et al. (2018), where further details can be found.

Architecture and training

We used feedforward (CORnet-Z) and locally recurrent (CORnet-R; Kubilius et al., 2018) models in our analyses. The architectures of the two DNNs are schematically represented in Figure 1b. The architecture of the CORnets is inspired by the anatomy of monkey visual cortex. Each processing stage in the model is thought to correspond to a cortical visual area, so the four model layers correspond to areas V1, V2, V4, and IT, respectively (Kubilius et al., 2018). The output of the last model layer is mapped to the behavioral choices of the model using a linear decoder. We chose the two CORnets because they have similar architectures, but one is purely feedforward and the other is feedforward plus locally recurrent; they are among the best-performing models for predicting visual responses in monkey and human IT (Schrimpf et al., 2018; Jozwik et al., 2019a, b); and their architectures are relatively simple compared with other DNNs.

Each visual area in CORnet-Z consists of a single convolution, followed by a rectified linear unit (ReLU) nonlinearity and max pooling. CORnet-R introduces local recurrent dynamics within an area; the recurrence occurs only within an area, and there are no bypass or feedback connections between areas. For each area, the input is downscaled twofold, and the number of channels is increased twofold by passing the input through a convolution, followed by group normalization (Wu and He, 2018) and a ReLU nonlinearity. The internal state of the area (initially zero) is added to the result and passed through another convolution, again followed by group normalization and a ReLU nonlinearity, resulting in the new internal state of the area. At time step t0, there is no input to V2 and beyond, and as a consequence no image-elicited activity is present beyond V1. From time step t1 onward, image-elicited activity is present in all visual areas, as the output of the previous area is immediately propagated forward. CORnet-R was trained using five time steps (t0–t4). Both DNNs were trained on 1.2 million images from the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) database (Russakovsky et al., 2015). The ILSVRC database provides annotations that contain a category label for each image, assigning the object in an image to one of 1000 categories, for example, 'daisy', 'macaque', and 'speedboat'. The task of the networks is to classify each object image into one of the 1000 categories.
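
To make the local recurrence concrete, the following is a simplified PyTorch-style sketch of a single CORnet-R-like area. It is not the reference implementation (see Kubilius et al., 2018); kernel sizes, channel counts, and the number of normalization groups are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentArea(nn.Module):
    """Simplified CORnet-R-style area: downscale the input and update a local internal state."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Input convolution: downscale spatially (stride 2) and increase the channel count
        self.conv_input = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                                    stride=2, padding=1)
        self.norm_input = nn.GroupNorm(num_groups=8, num_channels=out_channels)
        # State convolution: produces the new internal state of the area
        self.conv_state = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                                    stride=1, padding=1)
        self.norm_state = nn.GroupNorm(num_groups=8, num_channels=out_channels)

    def forward(self, x, state=None):
        x = F.relu(self.norm_input(self.conv_input(x)))
        if state is None:            # internal state is initially zero
            state = torch.zeros_like(x)
        state = F.relu(self.norm_state(self.conv_state(x + state)))
        return state                 # new internal state, also the area's output

# One area processed over a few time steps with a constant input image
area = RecurrentArea(in_channels=3, out_channels=64)
image = torch.randn(1, 3, 224, 224)
state = None
for t in range(5):                   # CORnet-R was trained with five time steps
    state = area(image, state)
print(state.shape)                    # torch.Size([1, 64, 112, 112])
```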

Construction of the DNN representational dissimilarity matrices

DNN representations of the 92 images were computed from the layer activations of CORnet-Z and CORnet-R. For CORnet-Z, we included the decoder layer and the final processing stage (output) from each visual area layer, which resulted in five layers. For CORnet-R, we included the decoder layer and the final processing stage from each visual area layer for each time step, which resulted in 21 layers. For each layer of CORnet-Z and CORnet-R, we extracted the unit activations in response to the images and converted these into one activation vector per image. For each pair of images, we computed the dissimilarity (1 minus Spearman's correlation) between the activation vectors. This yielded an RDM for each DNN layer. The resulting RDMs capture which stimulus information is emphasized and which is de-emphasized by the DNNs at different stages of processing. The full DNN model consists of 26 RDM predictors (five CORnet-Z predictors and 21 CORnet-R predictors).
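
For illustration, the layer RDMs can be computed from an activation matrix (one row per image) as sketched below; this is not the original analysis code, and the random activations are placeholders.

```python
import numpy as np
from scipy.stats import spearmanr

def layer_rdm(activations):
    """RDM (1 minus Spearman's correlation) from layer activations.

    activations : (n_images, n_units) array, one activation vector per image.
    Returns an (n_images, n_images) dissimilarity matrix.
    """
    rho, _ = spearmanr(activations, axis=1)  # pairwise rank correlations between rows
    return 1.0 - rho

# Example: 92 images, 4096 units in a hypothetical layer
rng = np.random.default_rng(1)
acts = rng.standard_normal((92, 4096))
rdm = layer_rdm(acts)
print(rdm.shape)  # (92, 92)
```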

MEG source-reconstructed data

Acquisition and analysis of the MEG data have been described in Cichy et al. (2014), where further details can be found. The source reconstruction of the MEG data has been described in Kietzmann et al. (2019b), where further details can be found.

Participants

Sixteen healthy human volunteers participated in the MEG experiment (mean age, 26 years; 10 females). MEG source reconstruction analyses were performed for the subset of 15 participants for whom structural and functional MRI data were acquired. Participants had normal or corrected-to-normal vision. Before scanning, the participants received information about the procedure of the experiment and gave written informed consent. The experiment was approved by the Institutional Review Board of the Massachusetts Institute of Technology and conducted in accordance with the Declaration of Helsinki.

Experimental design and task

Stimuli were presented at the center of the screen for 500 ms while participants performed a paper clip detection task. Stimuli were overlaid with a light gray fixation cross and displayed at a width of 2.9° visual angle. Participants completed 10–14 runs. Each image was presented twice in every run in random order. Participants were asked to press a button and blink their eyes in response to a paper clip image shown randomly every three to five trials. These trials were excluded from further analyses. Each participant completed two MEG sessions.

MEG data acquisition and preprocessing

MEG signals were acquired from 306 channels (204 planar gradiometers, 102 magnetometers) using an Elekta Neuromag TRIUX system at a sampling rate of 1000 Hz. The data were bandpass filtered between 0.03 and 330 Hz, cleaned using spatiotemporal filtering, and downsampled to 500 Hz. Baseline correction was performed using a time window of 100 ms before stimulus onset.
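
A minimal sketch of comparable preprocessing steps using the MNE-Python API is given below. The file name, event extraction, and epoch window are placeholders, and the original pipeline (described in Cichy et al., 2014) may differ in tooling, ordering, and parameter details.

```python
import mne

# Placeholder file name; the original recordings were acquired on an Elekta Neuromag system.
raw = mne.io.read_raw_fif("sub-01_task-objects_meg.fif", preload=True)

raw.filter(l_freq=0.03, h_freq=330.0)                       # bandpass filter 0.03-330 Hz
raw = mne.preprocessing.maxwell_filter(raw, st_duration=10.0)  # spatiotemporal (tSSS) cleaning; parameters illustrative
raw.resample(500.0)                                         # downsample to 500 Hz

events = mne.find_events(raw)
epochs = mne.Epochs(raw, events,
                    tmin=-0.1, tmax=0.7,                    # include 100 ms prestimulus period
                    baseline=(None, 0.0),                   # baseline correction on the prestimulus window
                    preload=True)
```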

MEG source reconstruction

The source reconstructions were performed using the minimum norm estimation (MNE) Python toolbox (Gramfort, 2013). We used structural T1 scans of individual participants to obtain volume conduction estimates using single-layer boundary element models (BEMs) based on the inner skull boundary. Rather than deriving the BEMs with the FreeSurfer watershed algorithm used by default in the MNE Python toolbox, we extracted them using FieldTrip software, because the default method yielded poor reconstruction results. The source space consisted of 10,242 source points per hemisphere, positioned along the gray/white matter boundary as estimated via FreeSurfer. We defined source orientations as surface normals with a loose orientation constraint. MEG/MRI alignment was initialized using fiducials and then refined with an iterative closest point procedure based on fiducials and digitizer points along the head surface. We estimated the sensor noise covariance matrix from the baseline period (100–0 ms before stimulus onset) and regularized it according to the Ledoit–Wolf procedure (Ledoit and Wolf, 2004). We projected source activations onto the surface normal, obtaining one activation estimate per point in source space and time. Source reconstruction allowed us to estimate temporal dynamics in specific brain regions; note, however, that it provides an estimate of where in the brain the signal originates rather than a direct measurement of representations in different brain regions (Hauk et al., 2022).
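
The main steps can be illustrated with the MNE-Python API as sketched below. Subject and coregistration file names are placeholders, the loose-orientation value is illustrative, the BEM surfaces here come from FreeSurfer rather than the FieldTrip-derived surfaces used in the study, and the `epochs` object is reused from the preprocessing sketch above.

```python
import mne
from mne.minimum_norm import make_inverse_operator, apply_inverse

subject, subjects_dir = "sub-01", "/data/freesurfer"   # placeholders

# Single-layer BEM based on the inner skull boundary
surfs = mne.make_bem_model(subject, ico=4, conductivity=(0.3,), subjects_dir=subjects_dir)
bem = mne.make_bem_solution(surfs)

# Source space along the gray/white matter boundary ('ico5' ~ 10,242 points per hemisphere)
src = mne.setup_source_space(subject, spacing="ico5", subjects_dir=subjects_dir)

# Forward model using a previously computed MEG/MRI coregistration (placeholder file)
fwd = mne.make_forward_solution(epochs.info, trans="sub-01-trans.fif", src=src, bem=bem)

# Noise covariance from the prestimulus baseline, regularized with the Ledoit-Wolf procedure
noise_cov = mne.compute_covariance(epochs, tmax=0.0, method="ledoit_wolf")

# Minimum norm estimate with a loose orientation constraint, projected onto the surface normal
inv = make_inverse_operator(epochs.info, fwd, noise_cov, loose=0.2)
stc = apply_inverse(epochs.average(), inv, method="MNE", pick_ori="normal")
```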

Definition of regions of interest

We used a multimodal brain atlas (Glasser et al., 2016) to define regions of interest (ROIs). We defined three ROIs covering lower-level (V1–V3), intermediate (V4t/LO1–3), and higher-level visual areas (IT/PHC, consisting of TE1-2p, fusiform face complex [FFC], ventral visual complex [VVC], ventromedial visual area [VMV]2–3, parahippocampal area [PHA]1–3). We converted the atlas annotation files to fsaverage coordinates (Fischl et al., 1999) and mapped them to each individual participant using spherical averaging.

Construction of the MEG representational dissimilarity matrices

We computed temporally changing RDM movies from the source-reconstructed MEG data for each participant, ROI, hemisphere, and session. We first extracted a trial-average multivariate source time series for each stimulus. We then computed an RDM at each time point by estimating the pattern distance between all pairs of images using correlation distance (1 minus Pearson's correlation). The RDM movies were averaged across hemispheres and sessions, resulting in one RDM movie for each participant and ROI.

Evaluating and comparing model performance

To assess performance of the models at explaining variance in the source-reconstructed MEG data, we performed first- and second-level model fitting as described below. Model fitting within the RSA framework has been described in Khaligh-Razavi and Kriegeskorte (2014), Jozwik et al. (2016, 2017), Kietzmann et al. (2019b), Storrs et al. (2020a), and Kaniuth and Hebart (2021), where further details can be found.

First-level model fitting: obtaining cross-validated model predictions

We could predict the brain representations under the assumption that each model dimension, that is, each visuo-semantic object label or each DNN layer, contributes equally to the representation. Our visuo-semantic models use the squared Euclidean distance as the representational dissimilarity measure, which is the sum across dimensions of the squared response difference for a given pair of stimuli. Because the squared differences simply sum across dimensions, the model prediction would be the sum of the single-dimension model RDMs. A similar reasoning applies to our DNN model, which uses the correlation distance as the representational dissimilarity measure; the correlation distance is proportional to the squared Euclidean distance between normalized patterns. However, we do not expect all model dimensions to contribute equally to brain representations. To improve model performance, we therefore linearly combined the different model dimensions to yield an object representation that best predicts the source-reconstructed MEG data. Because the squared differences sum across dimensions in the squared Euclidean distance, weighting the dimensions and computing the RDM is equivalent to a weighted sum of the single-dimension RDMs: when a dimension is multiplied by weight w, the squared differences along that dimension are multiplied by w². We can therefore perform the fitting on the RDMs.

We performed model fitting for the DNN model (26 predictors), the visuo-semantic model (229 predictors), and the following visuo-semantic submodels: color (10 predictors), texture (12 predictors), shape (15 predictors), object parts (82 predictors), subordinate categories (38 predictors), basic categories (67 predictors), and superordinate categories (5 predictors). We included a constant term in each model to account for homogeneous changes in dissimilarity across the whole RDM. For each model, we estimated the model weights using regularized (L2) linear regression, implemented in MATLAB using the Glmnet package (https://hastie.su.domains/glmnet_matlab/). We standardized the predictors before fitting and constrained the weights to be non-negative. To prevent model predictions from being biased by overfitting to the images, predictions were estimated by cross-validation to a subset of the images held out during fitting. For each cross-validation fold, we randomly selected 84 of the 92 images as the training set and eight images as the test set, with the constraint that the test images had to contain four animate objects (two faces and two body parts) and four inanimate objects. We used the pairwise dissimilarities of the training images to estimate the model weights. The model weights were then used to predict the pairwise dissimilarities of the eight held-out images. This procedure was repeated many times until predictions were obtained for all pairwise dissimilarities. For each cross-validation fold, we determined the best regularization parameter (i.e., the one with the minimum squared error between prediction and data) using nested cross-validation to held-out images within the training set. We performed the first-level fitting procedure for each participant, ROI, and time point.
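
A schematic Python sketch of one cross-validation fold is shown below. The published analysis used the Glmnet package in MATLAB with nested cross-validation to select the regularization parameter; here the regularization parameter is fixed, the non-negative L2-regularized fit is implemented via non-negative least squares on an augmented design matrix, and only pairs among the held-out images are predicted. Variable names are ours, not from the original analysis code.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import nnls

def nonneg_ridge(X, y, lam):
    """Non-negative L2-regularized least squares via NNLS on an augmented system.
    Solves min_w ||Xw - y||^2 + lam * ||w||^2  subject to  w >= 0."""
    n = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(n)])
    y_aug = np.concatenate([y, np.zeros(n)])
    w, _ = nnls(X_aug, y_aug)
    return w

def vectorize_pairs(rdms, image_idx):
    """Stack the dissimilarities of all image pairs within image_idx, one column per RDM."""
    pairs = list(combinations(image_idx, 2))
    return np.array([[rdm[i, j] for rdm in rdms] for (i, j) in pairs])

def crossval_fold(model_rdms, data_rdm, train_idx, test_idx, lam=1.0):
    """One fold: fit weights on training-image pairs, predict held-out image pairs."""
    X_train = vectorize_pairs(model_rdms, train_idx)
    y_train = vectorize_pairs([data_rdm], train_idx).ravel()
    X_test = vectorize_pairs(model_rdms, test_idx)

    # Standardize predictors (statistics from training pairs) and add a constant term
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    X_train = np.column_stack([np.ones(len(X_train)), (X_train - mu) / sd])
    X_test = np.column_stack([np.ones(len(X_test)), (X_test - mu) / sd])

    w = nonneg_ridge(X_train, y_train, lam)
    return X_test @ w   # predicted dissimilarities for held-out image pairs

# Toy usage with a few random single-dimension model RDMs
rng = np.random.default_rng(0)
model_rdms = rng.random((5, 92, 92))
data_rdm = rng.random((92, 92))
train_idx, test_idx = np.arange(84), np.arange(84, 92)
pred = crossval_fold(model_rdms, data_rdm, train_idx, test_idx, lam=1.0)
print(pred.shape)   # (28,) predicted dissimilarities among the 8 held-out images
```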

Second-level model fitting: estimating model performance

We estimated model performance using a second-level general linear model (GLM) approach. We used the cross-validated RDM predictions from the first-level model fitting as GLM predictors. We included a constant term in the GLM to account for homogeneous changes in dissimilarity across the whole RDM. We fit the GLM predictors to the source-reconstructed MEG data using non-negative least squares. We first estimated the variance explained by each individual model when fit in isolation (reduced GLM). We next estimated the variance explained by the visuo-semantic and DNN models when fit simultaneously (full GLM). We then computed the unique variance explained by each model by subtracting the variance explained by the reduced GLMs from the variance explained by the full GLM. For example, to compute the unique variance explained by the visuo-semantic model, we subtracted the variance explained by the DNN model from the variance explained by the full GLM. This approach allowed us to address whether visuo-semantic models capture representational features in ventral-stream dynamics that DNNs fail to account for and vice versa. We also estimated the unique variance explained in the source-reconstructed MEG data for visuo-semantic submodels in the presence of the DNN model, again by fitting a full GLM (all models included) and reduced GLMs (excluding the model of interest). We performed the second-level GLM fitting procedure for each participant, ROI, and time point.
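
A minimal sketch of the second-level GLM and the unique-variance computation is shown below, using non-negative least squares. The variance-explained metric shown here (R² of the GLM fit) is our assumption about the exact quantity computed, and the toy inputs stand in for the cross-validated first-level predictions.

```python
import numpy as np
from scipy.optimize import nnls

def explained_variance(predictors, data):
    """Fit a GLM with non-negative least squares and return the proportion of
    variance in the vectorized data RDM explained by the predictors.

    predictors : (n_pairs, n_predictors) cross-validated RDM predictions
    data       : (n_pairs,) vectorized data RDM for one participant/ROI/time point
    """
    X = np.column_stack([np.ones(len(data)), predictors])  # constant term
    beta, _ = nnls(X, data)
    fit = X @ beta
    ss_res = np.sum((data - fit) ** 2)
    ss_tot = np.sum((data - data.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy example: predictions from two model classes and a data RDM vector
rng = np.random.default_rng(0)
n_pairs = 92 * 91 // 2
dnn_pred = rng.random(n_pairs)          # cross-validated DNN RDM prediction
vs_pred = rng.random(n_pairs)           # cross-validated visuo-semantic RDM prediction
data = rng.random(n_pairs)              # vectorized MEG RDM

r2_full = explained_variance(np.column_stack([dnn_pred, vs_pred]), data)
r2_dnn_only = explained_variance(dnn_pred[:, None], data)
unique_vs = r2_full - r2_dnn_only       # unique variance explained by the visuo-semantic model
```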

Statistical inference on model performance

To evaluate the significance of the (unique) variance explained by each model across participants, we first subtracted an estimate of the prestimulus baseline in each participant and then performed a one-sided Wilcoxon signed-rank test against zero. The prestimulus baseline was defined as the average (unique) variance explained between 200 and 0 ms before stimulus onset. We also tested whether and when the (unique) variance explained differed between the visuo-semantic and DNN models using a two-sided Wilcoxon signed-rank test. We controlled the expected false discovery rate at 0.05 across time points for each model evaluation, model comparison, and ROI. We used a continuity criterion (minimally 10 consecutive significant time points sampled every 2 ms = 20 ms) to report significant time points in the manuscript text. For completeness, Figures 2 and 3 show significant time points both before and after applying the continuity criterion. Lines shown in Figures 2, 3, and 4 were low-pass filtered at 80 Hz (Butterworth IIR filter, order 6) for better visibility. Statistical inference is based on unsmoothed data.
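
The inference procedure can be sketched as follows, assuming a Benjamini-Hochberg procedure for FDR control (the specific FDR procedure is not named above) and the 2 ms sampling step reported in the text.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

def significant_timepoints(var_explained, times, min_duration_ms=20, step_ms=2):
    """Baseline-correct, test against zero across participants, FDR-correct across
    time points, and apply the continuity criterion.

    var_explained : (n_participants, n_times) (unique) variance explained
    times         : (n_times,) time in ms relative to stimulus onset
    """
    # Subtract each participant's average prestimulus baseline (-200 to 0 ms)
    baseline = var_explained[:, (times >= -200) & (times < 0)].mean(axis=1, keepdims=True)
    corrected = var_explained - baseline

    # One-sided Wilcoxon signed-rank test against zero at each time point
    pvals = np.array([wilcoxon(corrected[:, t], alternative="greater").pvalue
                      for t in range(corrected.shape[1])])

    # Control the expected false discovery rate at 0.05 across time points (BH assumed)
    sig, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

    # Continuity criterion: keep only runs of at least min_duration_ms of consecutive
    # significant time points (10 consecutive points at a 2 ms sampling step)
    min_run = min_duration_ms // step_ms
    out = np.zeros_like(sig)
    run_start = None
    for i, s in enumerate(np.append(sig, False)):
        if s and run_start is None:
            run_start = i
        elif not s and run_start is not None:
            if i - run_start >= min_run:
                out[run_start:i] = True
            run_start = None
    return out
```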

Data availability

The datasets and code generated during the current study are available from the corresponding authors on request.

Results

DNNs better explain lower-level visual representations, visuo-semantic models better explain higher-level visual representations

We first evaluated the overall ability of the DNN and visuo-semantic models to explain the time course of information processing along the human ventral visual stream. We hypothesized that visuo-semantic models capture representational features in neural data that DNNs may fail to account for. Figure 1 shows an overview of our approach. We computed RDM movies from the source-reconstructed MEG data to characterize how the ventral-stream object representations evolved over time in each participant. We computed an RDM movie for each participant and ROI and explained variance in the movies using a DNN model and a visuo-semantic model. The DNN model consisted of internal object representations in layers of CORnet-Z, a purely feedforward model, and CORnet-R, a locally recurrent variant (Kubilius et al., 2018), to account for both feedforward and locally recurrent computations. The visuo-semantic model consisted of human-generated labels of object features (e.g., brown, furry, round, ear; 119 labels) and categories (e.g., Great Dane, dog, organism; 110 labels) for the object images presented during the MEG experiment (Jozwik et al., 2016). We computed model predictions by linearly combining either all DNN layers or all visuo-semantic labels to best explain variance in the RDM movies across time. We evaluated the model predictions on data for images left out during fitting. For each model, we tested whether and when the variance explained in the RDM movies exceeded the prestimulus baseline using a one-sided Wilcoxon signed-rank test. We also tested whether and when the amounts of explained variance differed between the two models using a two-sided Wilcoxon signed-rank test. We controlled the expected false discovery rate at 0.05 across time points. We applied a continuity criterion (20 ms) for reporting results in the text.

Figure 1.

Schematic overview of approach: stimulus set, models, data, and model fitting. a, Stimulus set. Stimuli are 92 colored images of real-world objects spanning a range of categories, including humans, nonhuman animals, natural objects, and manmade objects. b, Visuo-semantic models and DNNs. Visuo-semantic models consist of human-generated labels of object features and categories for the 92 images. Example labels are shown for the dog face encircled in a. DNNs are feedforward and locally recurrent CORnet architectures trained with category supervision on the ILSVRC database. These architectures are inspired by the processing stages of the primate ventral visual stream from V1 to IT. c, Object representations for each model. We characterized object representations by computing RDMs. We computed one RDM per model dimension, that is, one for each visuo-semantic label or DNN layer. For each visuo-semantic model dimension, RDMs were computed by extracting the value for each image on that dimension and computing pairwise dissimilarities (squared difference) between the values. For each CORnet-Z and CORnet-R layer, RDMs were computed by extracting an activity pattern across model units for each image and computing pairwise dissimilarities (1 minus Spearman's r) between the activity patterns. d, Human source-reconstructed MEG data for an example participant. MEG data were acquired in 15 healthy adult human participants while they were viewing the 92 images (stimulus duration, 500 ms). We analyzed source-reconstructed data from three ROIs, V1–V3, V4t/LO, and IT/PHC. We computed an RDM for each participant, region, and time point. RDMs were computed by extracting an activity pattern for each image and computing pairwise dissimilarities (1 minus Pearson's r) between the activity patterns. e, Schematic overview of model fitting. We tested two model classes, a visuo-semantic model consisting of all category and feature RDMs and a DNN model consisting of all CORnet-Z and CORnet-R layer RDMs. The RDMs serve as model predictors. We first fit each model to the MEG RDMs for each participant, region, and time point, using cross-validated regularized linear regression. The cross-validated model predictions were then used in a second-level GLM approach to estimate the variance explained by each model separately and by both models combined.

For lower-level visual cortex (V1–V3), the DNN model explained significant amounts of variance between 60 and 638 ms and between 818 and 884 ms after stimulus onset, whereas the visuo-semantic model did so between 118 and 660 ms after stimulus onset (118–142, 146–178, 194–256, 264–414, 430–458, 486–520, 570–598, 608–660 ms; Fig. 2a). The DNN model explained more variance than the visuo-semantic model during the early (66–128 ms) as well as the late (422–516, 520–544, 820–844 ms) phases of the response. For intermediate visual cortex (V4t/LO), the DNN model explained variance predominantly between 62 and 610 ms after stimulus onset (62–562, 590–610, 820–848, 854–874, 952–976 ms), whereas the visuo-semantic model explained variance predominantly between 110 and 562 ms after stimulus onset (110–478, 482–562, 832–854 ms; Fig. 2a). The amount of explained variance did not significantly differ between the two models. The results for lower-level visual cortex indicate that the DNN model outperformed the visuo-semantic model at explaining object representations during the early phase of the response (<128 ms after stimulus onset), as well as the late phase of the response (>422 ms after stimulus onset). In contrast, for higher-level visual cortex (IT/PHC), the visuo-semantic model outperformed the DNN model. The DNN model explained variance only between 182 and 270 ms after stimulus onset (Fig. 2a). The visuo-semantic model explained variance during a longer time window, between 96 and 658 ms after stimulus onset (96–464, 468–500, 542–578, 606–658 ms; Fig. 2a). Furthermore, the visuo-semantic model explained more variance than the DNN model between 146 and 488 ms after stimulus onset (146–188, 196–234, 326–344, 348–402, 412–464, 468–488 ms). In summary, the results across the ventral-stream regions show a reversal in which model best explains variance in the RDM movies, from the DNN model in lower-level visual cortex, starting at 66 ms after stimulus onset, to the visuo-semantic model in higher-level visual cortex, starting at 146 ms after stimulus onset.

Figure 2.

DNNs better explain lower-level visual representations, and visuo-semantic models better explain higher-level visual representations. a, Variance explained by the DNNs (green) and visuo-semantic models (blue) in the source-reconstructed MEG data. Top, Significant variance explained is indicated by green and blue points (one-sided Wilcoxon signed-rank test, p < 0.05 corrected). Significant differences between models in variance explained are indicated by gray points (two-sided Wilcoxon signed-rank test, p < 0.05 corrected). Lighter colors indicate individually significant time points, and darker colors indicate time points that additionally satisfy a continuity criterion (minimally 20 ms of consecutive significant time points). The shaded area around the lines shows the SEM across participants. The x-axis shows time relative to stimulus onset. The gray horizontal bar on the x-axis indicates the stimulus duration. b, Unique variance explained by the DNNs and visuo-semantic models in the source-reconstructed MEG data. To estimate the unique variance explained by each model class, we used a second-level GLM approach. Unique variance explained was computed by subtracting the variance explained by a reduced GLM (excluding the model class of interest) from the variance explained by a full GLM (including both model classes). Conventions are the same as in a.

Visuo-semantic models explain unique variance in higher-level visual representations

Our results suggest that DNNs and visuo-semantic models explain complementary components of human ventral-stream representational dynamics. To explicitly test this hypothesis, we assessed the unique contributions of the two models. For this, we first computed the best RDM predictions for each model class and then used the resulting cross-validated RDM predictions in a second-level GLM in which we combined the two model classes. We computed the unique contribution of a model class by subtracting the variance explained by the reduced model (i.e., the GLM without the model class of interest) from the variance explained by the full model (including both model classes). For lower-level visual cortex (V1–V3), the DNN model explained unique variance between 60 and 638 ms and between 818 and 884 ms after stimulus onset, whereas the visuo-semantic model did so between 124 and 654 ms after stimulus onset (124–142, 148–170, 228–246, 298–364, 368–412, 612–654 ms; Fig. 2b). For intermediate visual cortex (V4t/LO), the DNN model explained unique variance predominantly between 62 and 610 ms after stimulus onset (62–558, 590–610, 820–848, 952–976 ms), whereas the visuo-semantic model did so predominantly between 118 and 546 ms after stimulus onset (118–478, 490–546, 832–854 ms; Fig. 2b). These results indicate that the DNN and visuo-semantic models each explained a significant amount of unique variance in lower-level and intermediate visual cortex compared with the baseline period. However, for lower-level visual cortex, the DNN model explained more unique variance than the visuo-semantic model during the early (66–128 ms) as well as the late phases of the response (422–516, 520–544, 820–844 ms). For intermediate visual cortex, the unique variance explained did not significantly differ between the two models. For higher-level visual cortex (IT/PHC), only the visuo-semantic model explained unique variance, between 104 and 640 ms after stimulus onset (104–464, 468–500, 542–578, 608–640 ms). Furthermore, the visuo-semantic model explained significantly more unique variance than the DNN model between 146 and 488 ms after stimulus onset (146–188, 196–234, 326–344, 348–402, 412–464, 468–488 ms; Fig. 2b). These results indicate that, in the context of a visuo-semantic predictor, the tested DNNs explain unique variance at lower-level but not higher-level stages of visual processing, which instead show a unique contribution of visuo-semantic models. Visuo-semantic models appear to explain components of the higher-level visual representations that DNNs fail to fully capture, starting at 146 ms after stimulus onset.

Object parts and basic categories contribute to the unique variance explained by visuo-semantic models in higher-level visual representations

To better understand which components of the visuo-semantic model contribute to explaining unique variance in the higher-level visual representations, we repeated our analyses separately for subsets of object features and subsets of categories. We grouped the visuo-semantic labels into the following subsets: color, texture, shape, and object parts among the object features, and subordinate, basic, and superordinate categories among the categories (Fig. 1b). The dimensionality of the submodels was naturally smaller than that of the full visuo-semantic model, which consisted of 229 object labels. The number of dimensions for the submodels was as follows: color (10), texture (12), shape (15), object parts (82), subordinate categories (38), basic categories (67), and superordinate categories (5). Some of the submodels explained a similar amount of variance as the full visuo-semantic model (Fig. 3a), which indicates that including fewer dimensions did not necessarily reduce model performance. A more in-depth understanding of the relationship between model dimensionality and performance remains an important objective for future study. Here, we found that, among the object features, only object parts explained variance in higher-level visual cortex (IT/PHC; Fig. 3a). Furthermore, object parts explained unique variance in higher-level visual cortex, whereas the DNN model did not (Fig. 3b). Among the categories, subordinate and basic categories explained variance in higher-level visual cortex (Fig. 3a). Furthermore, each of these models explained unique variance in higher-level visual cortex, whereas the DNN model did not (Fig. 3b). We next evaluated the three best predictors among the object features and categories together in the context of the DNN predictor. Although object parts, subordinate categories, basic categories, and DNNs all explained variance in higher-level visual cortex, only object parts and basic categories explained unique variance (Fig. 3a,b).

Figure 3.

Object parts and basic categories contribute to the unique variance explained by visuo-semantic models in higher-level visual representations. a, Variance explained by the object features (color, texture, shape, object parts), categories (subordinate, basic, superordinate), and deep neural networks in the source-reconstructed MEG data. Conventions are the same as in Figure 2a. b, Unique variance explained by the object features, categories, and deep neural networks in the source-reconstructed MEG data. Conventions are the same as in Figure 2b.

Discussion

Neural representations of visual objects dynamically unfold over time as we are making sense of the visual world around us. These representational dynamics are thought to reflect the cortical computations that support human object recognition. Here, we show that DNNs and human-derived visuo-semantic models explain complementary components of representational dynamics in the human ventral visual stream, estimated via source-reconstructed MEG data. We report a gradual reversal in the importance of DNN and visuo-semantic features from lower- to higher-level visual areas. DNN features explain variance over and above visuo-semantic features in lower-level visual areas V1–V3 starting early in time (at 66 ms after stimulus onset). In contrast, visuo-semantic features explain variance over and above DNN features in higher-level visual areas IT/PHC starting later in time (at 146 ms after stimulus onset). Among the visuo-semantic features, object parts and basic categories drive the advantage over DNNs. Our results suggest that a significant component of the variance unexplained by DNNs in higher-level visual areas is structured and can be explained by relatively simple, readily nameable aspects of the images. Figure 4 shows a visual summary of our results. Consistent with our hypothesis, our findings suggest that current DNNs fail to fully capture the visuo-semantic features represented in higher-level human visual cortex and suggest a path toward more accurate models of ventral-stream computations.

Figure 4.

DNNs and visuo-semantic models explain complementary components of human ventral-stream representational dynamics. To summarize our findings, we computed a model difference score based on the results shown in Figure 2b. We subtracted the unique variance explained by the visuo-semantic models from that explained by the DNNs in the dynamic ventral-stream representations. Difference scores are shown for each ROI during the first 600 ms of stimulus processing. Results show a gradual reversal in the relative importance of DNN versus visuo-semantic features in explaining the visual representations as they unfold over space and time. Between 66 and 128 ms after stimulus onset, DNNs outperform visuo-semantic models in lower-level areas V1–V3 (gray line, positive deflection). This early time window is thought to be dominated by feedforward and local recurrent processing. In contrast, starting 146 ms after stimulus onset, visuo-semantic models outperform DNNs in higher-level visual areas IT/PHC (red line, negative deflection). The same pattern of complementary contributions of DNNs and visuo-semantic models seems to reappear during the late phase of the response, starting ∼400 ms after stimulus onset, when responses may reflect interactions between visual areas. These results show that DNNs fail to account for a significant component of variance in higher-level cortical dynamics, which is instead accounted for by visuo-semantic features, in particular object parts and basic categories. The peak of visuo-semantic model performance in higher-level areas (red vertical line) precedes the peak in intermediate areas (blue vertical line). This sequence of events aligns with the timing of possible feedback information flow from higher-level to intermediate areas (light gray rectangle and arrow) as reported in Kietzmann et al. (2019b). The shaded area around the lines shows the SEM across participants.

Our finding that DNNs outperform visuo-semantic models at explaining lower-level cortical dynamics replicates and extends prior fMRI work, which showed that DNNs explain response variance across all stages of the ventral stream, whereas visuo-semantic models predominantly explain response variance in higher-level visual cortex (Huth et al., 2012; Khaligh-Razavi and Kriegeskorte, 2014; Güçlü and van Gerven, 2015; Devereux et al., 2018; Jozwik et al., 2018). Using source-reconstructed MEG data, we show that the advantage of DNNs over visuo-semantic models in V1–V3 emerges early in time, starting within 70 ms after stimulus onset. The early advantage lasts for ∼60 ms. During this early time window, the response is likely dominated by feedforward and local recurrent processing as opposed to top-down feedback signals from higher-level areas (Isik et al., 2014; Kietzmann et al., 2019b). DNNs also outperform visuo-semantic models in V1–V3 late in time, starting around 420 ms after stimulus onset. The late advantage lasts for ∼120 ms. Prior analysis of the same source-reconstructed MEG data showed a relative increase in the explanatory power of lower-level visual features (GIST model; Oliva and Torralba, 2006) and face clustering in V1–V3 during this late time window (Kietzmann et al., 2019b). These effects were observed in the presence of a slightly elevated noise ceiling. During the late time window, the response may reflect an interplay between bottom-up stimulus processing and top-down feedback signals. Our results show the importance of analyzing temporally resolved neuroimaging data for revealing when in time competing models account for the rapid dynamic unfolding of human ventral-stream representations.

Our findings show that DNNs, despite reaching human-level performance on large-scale object recognition tasks (Schrimpf et al., 2018), fail to fully capture visuo-semantic features represented in higher-level human visual cortex, in particular object parts and basic categories. Higher-level visual representations in dynamic MEG data instead more closely resemble human perceptual judgements of object properties. In line with our results, prior fMRI work showed that DNNs only adequately accounted for higher-level visual representations after adding new representational features (Khaligh-Razavi and Kriegeskorte, 2014; Devereux et al., 2018; Storrs et al., 2020a, b). The new features were either explicit semantic features (Devereux et al., 2018) or were created by linearly combining DNN features to emphasize categorical divisions observed in the higher-level visual representations, including the division between faces and nonfaces and between animate and inanimate objects (Khaligh-Razavi and Kriegeskorte, 2014; Storrs et al., 2020a). Our results show that visuo-semantic models start outperforming DNNs in higher-level visual areas ∼150 ms after stimulus onset. This timeline coincides with the emergence of animate clustering in these areas (Kietzmann et al., 2019b) as well as with the emergence of conceptual object representations as reported in prior MEG work (Bankson et al., 2018). Our results are also consistent with an earlier MEG study that showed that adding semantic features to a simpler HMAX (Hierarchical Model and X) was beneficial for modeling object representations in visual cortex starting ∼200 ms after stimulus onset (Clarke et al., 2015). DNNs may, at least in part, use different object features for object recognition than humans do. This conclusion is consistent with prior reports that DNNs rely more strongly on lower-level image features such as texture for object categorization (Geirhos et al., 2019).

Although we refer to both DNNs and visuo-semantic object labels as models, there are substantial differences between the two. DNNs are image computable, which means that they can compute a representation for any image. In contrast, visuo-semantic object labels are generated by human observers. How the human brain computes these labels remains unknown. This can be considered a disadvantage relative to DNNs, which are computationally explicit. We have full knowledge of their computational units and of the transformations applied to the image at each processing stage. However, it is challenging to pinpoint what these processing stages represent and how they may differ from those in humans. Visuo-semantic object labels, on the other hand, are easy to interpret. By comparing DNNs and visuo-semantic models in their ability to capture human ventral-stream representational dynamics, we can identify features in the data that DNNs fail to account for and use outcomes to guide model improvement.

Our results are consistent with theories that propose an integral role for feedback in visual perception (Rao and Ballard, 1999; Bar, 2003; Ahissar and Hochstein, 2004). As summarized in Figure 4, within the first 120 ms of stimulus processing, we observe a peak in the relative contribution of DNNs in lower-level and intermediate visual cortex, followed by a peak in the relative contribution of visuo-semantic models in higher-level visual cortex. These peaks may reflect a feedforward sweep of initial stimulus processing, which is thought to support perception of the gist of the visual scene and initial analysis of category information (Oliva and Torralba, 2006; Lowe et al., 2018; Kirchner and Thorpe, 2006; Liu et al., 2009). The initial peaks are followed by a visuo-semantic peak in intermediate visual cortex ∼150 ms after stimulus onset, which appears after a period of possible feedback information flow from higher-level to intermediate visual cortex (Kietzmann et al., 2019b), and additional fluctuations in relative model performance as time unfolds. These fluctuations include a reappearance of the advantage of DNNs over visuo-semantic models in lower-level visual cortex ∼420 ms after stimulus onset. The observed sequence of events is consistent with the reverse hierarchy theory of visual perception, which proposes an initial feedforward analysis for vision at a glance followed by explicit feedback signaling for vision with scrutiny (Ahissar and Hochstein, 2004). Future research should study visual perception under challenging viewing conditions, including occlusion and clutter, which are expected to strongly engage feedback signals and recurrent computation (Lamme and Roelfsema, 2000; O'Reilly et al., 2013; Spoerer et al., 2017; Tang et al., 2018; Kar et al., 2019; Rajaei et al., 2019; Kietzmann et al., 2019b).

Our study makes several important contributions to the existing body of work on modeling ventral-stream computations with DNNs. First, our results suggest that introducing locally recurrent connections to DNNs, to more closely match the architecture of the ventral visual stream, is not sufficient to fully capture the representational dynamics observed in higher-level human visual cortex. Second, our results tie together space and time through analysis of source-reconstructed MEG data. We show that DNNs outperform visuo-semantic models in lower-level visual areas V1–V3 starting at 66 ms after image onset, whereas visuo-semantic models outperform DNNs in higher-level visual areas IT/PHC starting at 146 ms after image onset. Third, we show that a significant component of the unexplained variance in higher-level cortical dynamics is structured and can be explained by readily nameable aspects of object images, specifically object parts and basic categories. In prior behavioral work using the same image set and visuo-semantic labels, we showed that category labels, but not object parts, outperformed DNNs at explaining object similarity judgments (Jozwik et al., 2017). These results suggest that, compared with responses in ventral visual cortex, behavioral similarity judgments may more strongly emphasize semantic object information (Mur et al., 2013; Jozwik et al., 2017; Groen et al., 2018). Future studies should extend this work to richer stimulus and model sets.

To build more accurate models of human ventral-stream computations, we need to provide DNNs with a more human-like learning experience. Two important areas for improvement are visual diet and learning objectives. Each of these shapes the internal object representations that develop during visual learning. Humans have a rich visual diet and learn to distinguish between ecologically relevant categories at multiple levels of abstraction, including faces, humans, and animals (Mur et al., 2013; Jozwik et al., 2016). DNNs have a more constrained visual diet and are trained on category divisions that do not entirely match the ones that humans learn in the real world. For example, the most common large-scale image dataset for training DNNs with category supervision (Russakovsky et al., 2015; Khaligh-Razavi and Kriegeskorte, 2014; Güçlü and van Gerven, 2015; Cichy et al., 2017; Kubilius et al., 2018; Schrimpf et al., 2018; Jozwik et al., 2019b; Storrs et al., 2020a, b), the ILSVRC 2012 dataset (Russakovsky et al., 2015), contains subordinate categories that most humans would not be able to distinguish, including dog breeds such as Schipperke and Groenendael, and lacks some higher-level categories relevant to humans, including face and animal. The path forward is unfolding along two main directions. The first is enrichment of the visual diet of DNNs by better matching the visual variability present in the real world, for example, by increasing variability in viewpoint or by training on videos instead of static images (Barbu et al., 2019; Zhuang et al., 2019). The second is to more closely match human learning objectives, for example, by introducing more human-like category objectives or unsupervised objectives (Higgins et al., 2020; Konkle and Alvarez, 2022; Mehrer et al., 2021; Zhuang et al., 2021). Training DNNs on more human-like visual diets and learning objectives may give rise to representational features that more closely match the visuo-semantic features represented in human higher-level visual cortex.

Footnotes

  • This work was supported by Wellcome Trust Grant 206521/Z/17/Z to K.M.J.; European Research Council Grant ERC-StG-2022-101039524 (TIME) to T.C.K.; German Research Council Grants CI241/1-1, CI241/3-1, and CI241/7-1 to R.M.C.; European Research Council Grant ERC-StG-2018-803370 to R.M.C.; and Natural Sciences and Engineering Research Council of Canada Discovery Grant RGPIN-2019-06741 to M.M. We thank Martin Schrimpf for input on the CORnets methods. We thank Taylor Schmitz for helpful comments on the manuscript.

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Kamila M. Jozwik at jozwik.kamila@gmail.com or Marieke Mur at mmur@uwo.ca

SfN exclusive license.

References

1. Ahissar M, Hochstein S (2004) The reverse hierarchy theory of visual perceptual learning. Trends Cogn Sci 8:457–464. https://doi.org/10.1016/j.tics.2004.08.011
2. Bankson B, Hebart M, Groen I, Baker C (2018) The temporal evolution of conceptual object representations revealed through models of behavior, semantics and deep neural networks. Neuroimage 178:172–182. https://doi.org/10.1016/j.neuroimage.2018.05.037
3. Bar M (2003) A cortical mechanism for triggering top-down facilitation in visual object recognition. J Cogn Neurosci 15:600–609. https://doi.org/10.1162/089892903321662976
4. Barbu A, Mayo D, Alverio J, Luo W, Wang C, Gutfreund D, Tenenbaum J, Katz B (2019) ObjectNet: a large-scale bias-controlled dataset for pushing the limits of object recognition models. Paper presented at the meeting of Advances in Neural Information Processing Systems, Vancouver, Canada, November.
5. Bonner MF, Epstein RA (2018) Computational mechanisms underlying cortical responses to the affordance properties of visual scenes. PLoS Comput Biol 14:e1006111.
6. Bracci S, Ritchie JB, Kalfas I, Op de Beeck HP (2019) The ventral visual pathway represents animal appearance over animacy, unlike human behavior and deep neural networks. J Neurosci 39:6513–6525. https://doi.org/10.1523/JNEUROSCI.1714-18.2019
7. Carlson T, Tovar DA, Kriegeskorte N (2013) Representational dynamics of object vision: the first 1000 ms. J Vis 13(10):1, 1–19.
8. Cichy RM, Pantazis D, Oliva A (2014) Resolving human object recognition in space and time. Nat Neurosci 17:455–462.
9. Cichy RM, Khosla A, Pantazis D, Torralba A (2017) Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Sci Rep 6:27755.
10. Clarke A, Taylor KI, Devereux B, Randall B, Tyler LK (2013) From perception to conception: how meaningful objects are processed over time. Cereb Cortex 23:187–197. https://doi.org/10.1093/cercor/bhs002
11. Clarke A, Devereux BJ, Randall B, Tyler LK (2015) Predicting the time course of individual objects with MEG. Cereb Cortex 3602–3612.
12. Devereux BJ, Clarke A, Tyler LK (2018) Integrated deep visual and semantic attractor neural networks predict fMRI pattern-information along the ventral object processing pathway. Sci Rep 8:10636. https://doi.org/10.1038/s41598-018-28865-1
13. Doerig A, Sommers R, Seeliger K, Richards B, Ismael J, Lindsay G, Kording K, Konkle T, Van Gerven MAJ, Kriegeskorte N, Kietzmann TC (2022) The neuroconnectionist research programme. arXiv:2209.03718.
14. Downing PE, Jiang Y, Shuman M, Kanwisher N (2001) A cortical area selective for visual processing of the human body. Science 293:2470–2473. https://doi.org/10.1126/science.1063414
15. Epstein R, Kanwisher N (1998) A cortical representation of the local visual environment. Nature 392:598–601.
16. Fischl B, Sereno MI, Tootell RBH, Dale AM (1999) High-resolution intersubject averaging and a coordinate system for the cortical surface. Hum Brain Mapp 8:272–284.
17. Freiwald WA, Tsao DY, Livingstone MS (2009) A face feature space in the macaque temporal lobe. Nat Neurosci 12:1187–1196.
18. Geirhos R, Rubisch P, Michaelis C, Bethge M, Wichmann FA, Brendel W (2019) ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv:1811.12231. https://doi.org/10.48550/arXiv.1811.12231
19. Ghuman AS, Brunet NM, Li Y, Konecky RO, Pyles JA, Walls SA, Destefino V, Wang W, Richardson RM (2014) Dynamic encoding of face information in the human fusiform gyrus. Nat Commun 5:5672. https://doi.org/10.1038/ncomms6672
20. Güçlü U, van Gerven MJ (2015) Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. J Neurosci 35:10005–10014.
21. Glasser MF, Coalson TS, Robinson EC, Hacker CD, Harwell J, Yacoub E, Ugurbil K, Andersson J, Beckmann CF, Jenkinson M, Smith SM, Van Essen DC (2016) A multi-modal parcellation of human cerebral cortex. Nature 536:171–178. https://doi.org/10.1038/nature18933
22. Gramfort A (2013) MEG and EEG data analysis with MNE-Python. Front Neurosci 7:267.
23. Groen II, Greene MR, Baldassano C, Fei-Fei L, Beck DM, Baker CI (2018) Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior. Elife 7:e32962.
24. Hauk O, Stenroos M, Treder MS (2022) Towards an objective evaluation of EEG/MEG source estimation methods—the linear approach. Neuroimage 255:119177. https://doi.org/10.1016/j.neuroimage.2022.119177
25. Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, Pietrini P (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293:2425–2430.
26. Hebart MN, Bankson BB, Harel A, Baker CI, Cichy RM (2018) The representational dynamics of task and object processing in humans. Elife 7:e32816.
27. Higgins I, Chang L, Langston V, Hassabis D, Summerfield C, Tsao D, Botvinick M (2020) Unsupervised deep learning identifies semantic disentanglement in single inferotemporal neurons. Nat Commun 12:6456.
28. Hung CP, Kreiman G, Poggio T, DiCarlo JJ (2005) Fast readout of object identity from macaque inferior temporal cortex. Science 310:863–866. https://doi.org/10.1126/science.1117593
29. Huth AG, Nishimoto S, Vu AT, Gallant JL (2012) A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron 76:1210–1224. https://doi.org/10.1016/j.neuron.2012.10.014
30. Isik L, Meyers EM, Leibo JZ, Poggio T (2014) The dynamics of invariant object recognition in the human visual system. J Neurophysiol 111:91–102. https://doi.org/10.1152/jn.00394.2013
31. Issa EB, DiCarlo JJ (2012) Precedence of the eye region in neural processing of faces. J Neurosci 32:1529–2401.
32. Jozwik KM, Kriegeskorte N, Mur M (2016) Visual features as stepping stones toward semantics: explaining object similarity in IT and perception with non-negative least squares. Neuropsychologia 83:201–226. https://doi.org/10.1016/j.neuropsychologia.2015.10.023
33. Jozwik KM, Kriegeskorte N, Storrs KR, Mur M (2017) Deep convolutional neural networks outperform feature-based but not categorical models in explaining object similarity judgments. Front Psychol 8:1726. https://doi.org/10.3389/fpsyg.2017.01726
34. Jozwik KM, Kriegeskorte N, Cichy RM, Mur M (2018) Deep convolutional neural networks, features, and categories perform similarly at explaining primate high-level visual representations. Paper presented at the Conference on Cognitive Computational Neuroscience, Philadelphia, September.
35. Jozwik KM, Schrimpf M, Kanwisher N, DiCarlo JJ (2019a) To find better neural network models of human vision, find better neural network models of primate vision. bioRxiv 688390. https://doi.org/10.1101/688390
36. Jozwik KM, Lee M, Marques T, Schrimpf M, Bashivan P (2019b) Large-scale hyperparameter search for predicting human brain responses in the Algonauts challenge. bioRxiv 689844. https://doi.org/10.1101/689844
37. Kaniuth P, Hebart MN (2021) Feature-reweighted RSA: a method for improving the fit between computational models, brains, and behavior. Neuroimage 257:119294.
38. Kanwisher N, McDermott J, Chun MM (1997) The fusiform face area: a module in human extrastriate cortex specialized for face perception. J Neurosci 17:4302–4311. https://doi.org/10.1523/JNEUROSCI.17-11-04302.1997
39. Kar K, Kubilius J, Schmidt K, Issa EB, DiCarlo JJ (2019) Evidence that recurrent circuits are critical to the ventral stream's execution of core object recognition behavior. Nat Neurosci 22:974–983.
40. Khaligh-Razavi SM, Kriegeskorte N (2014) Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput Biol 10:e1003915.
41. Kietzmann TC, McClure P, Kriegeskorte N (2019a) Deep neural networks in computational neuroscience. In: Oxford research encyclopedia of neuroscience. Oxford: Oxford UP.
42. Kietzmann TC, Spoerer CJ, Sörensen LKA, Cichy RM, Hauk O, Kriegeskorte N (2019b) Recurrence required to capture the dynamic computations of the human ventral visual stream. arXiv:1903.05946. https://doi.org/10.48550/arXiv.1903.05946
43. Kirchner H, Thorpe SJ (2006) Ultra-rapid object detection with saccadic eye movements: visual processing speed revisited. Vision Res 46:1762–1776. https://doi.org/10.1016/j.visres.2005.10.002
44. Konkle T, Alvarez GA (2022) A self-supervised domain-general learning framework for human ventral stream representation. Nat Commun 13:491.
45. Kriegeskorte N, Mur M, Ruff D, Kiani R, Bodurka J, Esteky H, Tanaka K, Bandettini P (2008) Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60:1126–1141. https://doi.org/10.1016/j.neuron.2008.10.043
46. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Commun ACM 60:84–90.
47. Kubilius J, Schrimpf M, Nayebi A, Bear D, Yamins DLK, DiCarlo JJ (2018) CORnet: modeling the neural mechanisms of core object recognition. bioRxiv 408385. https://doi.org/10.1101/408385
48. Lamme V, Roelfsema PR (2000) The distinct modes of vision offered by feedforward and recurrent processing. Trends Neurosci 23:571–579. https://doi.org/10.1016/s0166-2236(00)01657-x
49. Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. J Multivar Anal 88:365–411.
50. Liao Q, Poggio T (2016) Bridging the gaps between residual learning, recurrent neural networks and visual cortex. arXiv:1604.03640. https://doi.org/10.48550/arXiv.1604.03640
51. Liu H, Agam Y, Madsen JR, Kreiman G (2009) Timing, timing, timing: fast decoding of object information from intracranial field potentials in human visual cortex. Neuron 62:281–290. https://doi.org/10.1016/j.neuron.2009.02.025
52. Lowe MX, Rajsic J, Ferber S, Walther DB (2018) Discriminating scene categories from brain activity within 100 milliseconds. Cortex 106:275–287. https://doi.org/10.1016/j.cortex.2018.06.006
53. Mehrer J, Spoerer CJ, Jones EC, Kriegeskorte N, Kietzmann TC (2021) An ecologically motivated image dataset for deep learning yields better models of human vision. Proc Natl Acad Sci U S A 118:e2011417118.
54. Meyers EM, Freedman DJ, Kreiman G, Miller EK, Poggio T (2008) Dynamic population coding of category information in inferior temporal and prefrontal cortex. J Neurophysiol 100:1407–1419. https://doi.org/10.1152/jn.90248.2008
55. Mur M, Ruff DA, Bodurka J, De Weerd P, Bandettini PA, Kriegeskorte N (2012) Categorical, yet graded—single-image activation profiles of human category-selective cortical regions. J Neurosci 32:8649–8662.
56. Mur M, Meys M, Bodurka J, Goebel R, Bandettini P, Kriegeskorte N (2013) Human object-similarity judgments reflect and transcend the primate-IT object representation. Front Psychol 4:128. https://doi.org/10.3389/fpsyg.2013.00128
57. Oliva A, Torralba A (2006) Building the gist of a scene: the role of global image features in recognition. Prog Brain Res 155:23–36.
58. O'Reilly RC, Wyatte D, Herd S, Mingus B, Jilk DJ (2013) Recurrent processing during object recognition. Front Psychol 4:124.
59. Rajaei K, Mohsenzadeh Y, Ebrahimpour R, Khaligh-Razavi SM (2019) Beyond core object recognition: recurrent processes account for object recognition under occlusion. PLoS Comput Biol 15:e1007001.
60. Rao RPN, Ballard DH (1999) Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci 2:79–87.
61. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115:211–252.
62. Schrimpf M, Kubilius J, Hong H, Issa EB, Kar K, Prescott-Roy J, Rajalingham R, Yamins DLK, DiCarlo JJ (2018) Brain-Score: which artificial neural network is most brain-like? bioRxiv 407007. https://doi.org/10.1101/407007
63. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556
64. Spoerer CJ, McClure P, Kriegeskorte N (2017) Recurrent convolutional neural networks: a better model of biological object recognition. Front Psychol 8:1551. https://doi.org/10.3389/fpsyg.2017.01551
65. Spoerer CJ, Kietzmann TC, Mehrer J, Charest I, Kriegeskorte N (2020) Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision. PLoS Comput Biol 16:e1008215.
66. Storrs KR, Khaligh-Razavi SM, Kriegeskorte N (2020a) Noise ceiling on the crossvalidated performance of reweighted models of representational dissimilarity: addendum to Khaligh-Razavi and Kriegeskorte (2014). bioRxiv 003046. https://doi.org/10.1101/2020.03.23.003046
67. Storrs KR, Kietzmann TC, Walther A, Mehrer J, Kriegeskorte N (2020b) Diverse deep neural networks all predict human IT well, after training and fitting. J Cogn Neurosci 33:2044–2064.
68. Sugase Y, Yamane S, Ueno S, Kawano K (1999) Global and fine information coded by single neurons in the temporal visual cortex. Nature 400:869–873. https://doi.org/10.1038/23703
69. Tanaka K (1996) Inferotemporal cortex and object vision. Annu Rev Neurosci 19:109–139.
70. Tang H, Schrimpf M, Lotter W, Moerman C, Paredes A, Ortega Caro J, Hardesty W, Cox D, Kreiman G (2018) Recurrent computations for visual pattern completion. Proc Natl Acad Sci U S A 115:8835–8840.
71. Wu Y, He K (2018) Group normalization. arXiv:1803.08494. https://doi.org/10.48550/arXiv.1803.08494
72. Yamane Y, Carlson ET, Bowman KC, Wang Z, Connor CE (2008) A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nat Neurosci 11:1352–1360.
73. Yamins DLK, Hong H, Cadieu CF, Solomon E, Seibert D, DiCarlo JJ (2014) Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci U S A 111:8619–8624. https://doi.org/10.1073/pnas.1403112111
74. Zeman AA, Ritchie JB, Bracci S, Op de Beeck H (2020) Orthogonal representations of object shape and category in deep convolutional neural networks and human visual cortex. Sci Rep 10:2453. https://doi.org/10.1038/s41598-020-59175-0
75. Zhuang C, Andonian A, Yamins D (2019) Unsupervised learning from video with deep neural embeddings. arXiv:1905.11954. https://doi.org/10.48550/arXiv.1905.11954
76. Zhuang C, Yan S, Nayebi A, Schrimpf M, Frank MC, DiCarlo JJ, Yamins DLK (2021) Unsupervised neural network models of the ventral visual stream. Proc Natl Acad Sci U S A 118:e2014196118.

Keywords

  • categories
  • features
  • object recognition
  • recurrent deep neural networks
  • source-reconstructed MEG data
  • vision
