Research Articles, Behavioral/Cognitive

Distinct Portions of Superior Temporal Sulcus Combine Auditory Representations with Different Visual Streams

Gabriel Fajardo, Mengting Fang and Stefano Anzellotti
Journal of Neuroscience 5 November 2025, 45 (45) e1188242025; https://doi.org/10.1523/JNEUROSCI.1188-24.2025
Gabriel Fajardo
1Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467
2Department of Psychology, Columbia University, New York, New York 10027
Mengting Fang
3Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Stefano Anzellotti
1Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467

Abstract

In humans, the superior temporal sulcus (STS) combines auditory and visual information. However, the extent to which it relies on visual information from the ventral or dorsal stream remains uncertain. To address this, we analyzed open-source functional magnetic resonance imaging data collected from 15 participants (6 females and 9 males) as they watched a movie. We used artificial neural networks to investigate the relationship between multivariate response patterns in auditory cortex, the two visual streams, and the rest of the brain, finding that distinct portions of the STS combine information from the two visual streams with auditory information.

  • audio-visual
  • multivariate statistical dependence
  • neural networks
  • superior temporal sulcus

Significance Statement

The superior temporal sulcus (STS) combines auditory and visual inputs. However, visual information is processed along a ventral and a dorsal stream, and the extent to which these streams contribute to audio-visual combination is poorly understood. Is auditory information combined with visual information from both streams in a single centralized hub? Or do separate regions combine auditory information with ventral visual regions on one hand and with dorsal visual regions on the other? To address this question, we employed a multivariate connectivity method based on artificial neural networks. Our findings reveal that information from the two visual streams is combined with auditory information in distinct portions of STS, offering new insights into the neural architecture underlying multisensory perception.

Introduction

The human brain is adept at integrating visual and auditory information to create a coherent perception of the external world. Audio-visual integration contributes to sound localization (Zwiers et al., 2003) and plays a key role in emotion recognition (Piwek et al., 2015) as well as speech perception (Gentilucci and Cattaneo, 2005). Several phenomena demonstrate that the integration of visual and auditory cues shapes perceptual experience. In the McGurk effect, simultaneous presentation of a phoneme with a mismatched face video results in a distorted perception of the phoneme (McGurk and MacDonald, 1976). Similarly, presentation of mismatched auditory and visual stimuli can alter emotion recognition (Fagel, 2006), even when participants are explicitly instructed to focus only on one stimulus modality and ignore the other (Collignon et al., 2008), suggesting that audio-visual integration is automatic.

Audio-visual integration requires combining auditory information represented in the superior temporal gyrus with visual information encoded in occipitotemporal areas. Therefore, identifying brain regions that combine auditory and visual information is key to understanding the neural bases of audio-visual integration. Previous work found that the presentation of congruent audio-visual stimuli leads to supra-additive responses in the superior temporal sulcus (STS) compared with unimodal visual and auditory stimuli, whereas the presentation of incongruent audio-visual stimuli leads to sub-additive responses (Calvert et al., 2000). In addition, participants’ susceptibility to the McGurk effect correlates with the strength of STS responses (Nath and Beauchamp, 2012). Furthermore, response patterns in the STS encode information about emotions and identity that generalizes across visual and auditory modalities (Peelen et al., 2010; Anzellotti and Caramazza, 2017). These studies indicate that the STS plays a pivotal role in combining auditory and visual information.

However, little is known about the precise visual representations that are involved. Visual information is processed by multiple streams: a ventral and a dorsal stream (Ungerleider, 1982). The ventral stream originates in ventral area V3 (V3v) and area V4, and the dorsal stream in dorsal area V3 (V3d) and area V5 (Felleman and Van Essen, 1987; Fig. 1a). Area V5 is associated with motion perception, featuring a large number of direction-selective neurons (Born and Bradley, 2005). In contrast, many neurons in V4 show sensitivity to color (Schein and Desimone, 1990). Correspondingly, a large number of neurons in the dorsal part of V3 respond to motion, and a large number of neurons in the ventral portion of V3 are tuned for color processing (Felleman and Van Essen, 1987). The existence of these different visual streams prompts questions about their relative contributions to the combination of visual and auditory information.

Figure 1.

a, Visual and auditory regions of interest (ROIs). b, Responses in a combination of visual (e.g., early dorsal visual stream; a, middle panel) and auditory regions were used to predict responses in the rest of the brain using MVPN. c, In order to identify brain regions that combine responses from auditory and visual regions, we identified voxels where predictions generated using the combined patterns from auditory regions and one set of visual regions jointly (as shown in b) are significantly more accurate than predictions generated using only auditory regions or only that set of visual regions.

Auditory information could be combined with visual information from both streams or with visual information from only one of the streams. If it is combined with visual information from both streams, auditory information could be combined with information from both visual streams in a single hub, or distinct regions could combine auditory information with each visual stream separately. To investigate this, we used artificial neural networks to model the relationship between patterns of response in auditory brain regions, in the initial segments of the ventral and dorsal visual streams, and in the rest of the brain (Fig. 1b), following a strategy that has been recently adopted to investigate the combination of information from multiple category-selective regions (Fang et al., 2023). Functional magnetic resonance imaging (fMRI) data collected while participants viewed rich audio-visual stimuli (Hanke et al., 2016) were analyzed with multivariate pattern dependence networks (MVPNs; Anzellotti et al., 2017; Fang et al., 2022). Searching for brain regions where responses are better predicted using a combination of auditory responses and responses in different visual streams than using auditory or visual responses in isolation revealed two distinct portions of STS that combine information between auditory regions and the two visual streams.

Materials and Methods

Experimental design and statistical analyses

Experimental paradigm

The blood oxygen level-dependent (BOLD) functional magnetic resonance imaging (fMRI) data were obtained from the StudyForrest dataset (https://www.studyforrest.org; Hanke et al., 2016; Sengupta et al., 2016). Functional data were acquired while participants watched the movie Forrest Gump. The movie was divided into eight segments, each ∼15 min long, which were presented to subjects in chronological order in eight separate scanner runs.

Data acquisition parameters

Fifteen right-handed subjects (6 females; age range, 21–39 years; mean, 29.4 years), whose native language was German, were scanned in a 3 T Philips Achieva dStream MRI scanner equipped with a 32-channel head coil. Functional MRI data were acquired with a T2*-weighted echoplanar imaging sequence [gradient echo, 2 s repetition time (TR), 30 ms echo time, 90° flip angle, 1,943 Hz/Px bandwidth, parallel acquisition with sensitivity encoding (SENSE) reduction factor]. Scans captured 35 axial slices in ascending order, with 80 × 80 voxels (measuring 3.0 × 3.0 mm) of in-plane resolution within a 240 mm field-of-view, using an anterior-to-posterior phase encoding direction and a 10% gap between slices. The dataset also includes root mean square (RMS) annotations that measure the loudness of the film's audio track.

Preprocessing

Data were first preprocessed using fMRIPrep (https://fmriprep.readthedocs.io/en/latest/index.html; Esteban et al., 2019), a robust pipeline for preprocessing a wide range of fMRI data. Anatomical MRI images were skull-stripped using ANTs (http://stnava.github.io/ANTs/; Avants et al., 2009), and FSL FAST was used for tissue segmentation. Functional MRI images were corrected for head movement using FSL MCFLIRT (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/MCFLIRT; Greve and Fischl, 2009) and were then coregistered with the anatomical scans using FSL FLIRT (Jenkinson et al., 2002). Data were denoised with CompCor using five principal components extracted from the union of cerebrospinal fluid and white matter (Behzadi et al., 2007). The raw data of one subject could not be preprocessed with the fMRIPrep pipeline; the remaining 14 subjects’ data were used for the rest of the study.

ROI definition

Two sets of visual regions were identified by creating anatomical masks using Probabilistic Maps of Visual Topography in Human Cortex (Wang et al., 2015). This atlas provides probabilistic maps in MNI space of the likelihood that a voxel is a part of a certain brain region. The early ventral stream ROI was created by choosing the 80 voxels with the highest probability to be in the ventral parts of V3 (V3v) and V4 (Fig. 1a, top panel), and the early dorsal stream ROI was created by choosing the 80 voxels with the highest probability to be in the dorsal parts of V3 (V3d) and V5 (Fig. 1a, middle panel).

Since the anatomical location of auditory brain regions is more variable across subjects than that of visual brain regions (Rademacher et al., 2001), auditory ROIs were defined individually for each subject by identifying voxels whose responses are parametrically modulated by the loudness of the auditory stimuli. To this end, standard univariate GLM analyses were conducted using FSL FEAT (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FEAT; Woolrich et al., 2001), with root mean square (RMS) loudness levels as the predictor. The 80 voxels with the highest t-scores were selected individually for each subject (an example of one subject's auditory ROI mask is shown in Fig. 1a, bottom panel). To ensure that the remaining analyses were independent of the ROI selection, we used only data from the first fMRI run for auditory ROI selection, and this run was not used in the remaining analyses (which were therefore conducted on the remaining seven runs). There were no overlapping voxels between the ROIs.
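
For illustration, the following is a minimal sketch of the top-80 voxel selection used for both the atlas-based visual ROIs and the subject-specific auditory ROIs. It is not the released analysis code, and the file names are hypothetical placeholders for a probability map or a t-statistic map.

```python
# Minimal sketch of top-k voxel selection for ROI definition.
# File names below are hypothetical placeholders, not part of the released pipeline.
import numpy as np
import nibabel as nib

def top_k_mask(stat_img_path, k=80):
    """Return a boolean mask selecting the k voxels with the highest map values."""
    img = nib.load(stat_img_path)
    data = img.get_fdata()
    flat = data.ravel()
    top_idx = np.argpartition(flat, -k)[-k:]   # indices of the k largest values
    mask = np.zeros(flat.shape, dtype=bool)
    mask[top_idx] = True
    return mask.reshape(data.shape), img.affine

# Early ventral stream ROI: highest atlas probability for V3v/V4 (placeholder file).
ventral_mask, affine = top_k_mask("wang2015_V3v_V4_prob.nii.gz", k=80)

# Subject-specific auditory ROI: highest t-scores from the run-1 RMS-loudness GLM.
auditory_mask, _ = top_k_mask("sub-01_run-1_rms_tstat.nii.gz", k=80)

nib.save(nib.Nifti1Image(ventral_mask.astype(np.uint8), affine), "roi_ventral.nii.gz")
```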

Additionally, a group-average gray matter mask was created using the gray matter probability maps that were generated during preprocessing. This gray matter mask had a total of 53,539 voxels and was used as the target of prediction in the multivariate pattern dependence analyses, explained in the following section.

MVPN: multivariate pattern dependence network

Recent research has taken advantage of the flexibility and computational power of artificial neural networks (ANNs) to analyze brain connectivity (Fang et al., 2022, 2023). The multivariate pattern dependence network (MVPN) method, an extension of MVPD (Anzellotti et al., 2017), uses ANNs to model the multivariate relationships between neural response patterns. It is important to note that MVPN measures the statistical relationship between response patterns in different regions but cannot detect the direction of information flow. We implemented MVPN in PyTorch, and the neural networks were trained on Tesla V100 graphics processing units (GPUs). In this study, we used five-layer dense neural networks with 100 nodes per hidden layer. This architecture was selected based on prior work (Fang et al., 2022), which systematically compared different network architectures and found the five-layer dense network to yield the highest overall predictive accuracy when using two different seed regions (FFA and PPA) to predict responses across the rest of the brain. The networks were optimized using stochastic gradient descent (SGD) with a mean squared error (MSE) loss function, a learning rate of 0.001, and a momentum of 0.9. The models were trained for 5,000 epochs with a batch size of 32, and batch normalization was applied to each layer's inputs. The ANNs were given as input the multivariate response patterns in one or more sets of brain regions (Fig. 1): auditory regions, ventral visual regions (V3v and V4), dorsal visual regions (V3d and V5), and all pairwise combinations. The ANNs were trained to predict the patterns of responses in all gray matter voxels.
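
For concreteness, the following is a minimal PyTorch sketch of a dense network with the hyperparameters described above (100 units per hidden layer, batch normalization on each layer's inputs, SGD with an MSE loss, learning rate 0.001, momentum 0.9). It is not the PyMVPD implementation itself; in particular, reading "five-layer" as five fully connected layers, and the example input and output sizes, are assumptions.

```python
# Minimal sketch of an MVPN-style dense network (not the released PyMVPD code).
# Assumption: "five-layer" is read as five fully connected layers.
import torch
import torch.nn as nn

class DenseMVPN(nn.Module):
    def __init__(self, n_predictor, n_target, n_hidden=100):
        super().__init__()
        sizes = [n_predictor] + [n_hidden] * 4          # four hidden layers of 100 units
        layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            layers += [nn.BatchNorm1d(n_in), nn.Linear(n_in, n_out), nn.ReLU()]
        layers += [nn.BatchNorm1d(n_hidden), nn.Linear(n_hidden, n_target)]  # fifth (output) layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):                               # x: (batch, n_predictor)
        return self.net(x)

# Example: two 80-voxel predictor ROIs (160 inputs) predicting all gray matter voxels.
model = DenseMVPN(n_predictor=160, n_target=53539)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
loss_fn = nn.MSELoss()
# A standard mini-batch loop (batch size 32, 5,000 epochs) would then fit the model
# on the training runs and generate predictions for the left-out run.
```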

More precisely, the MVPN method works as follows. Consider an fMRI experiment with $m$ experimental runs. We label the multivariate time courses in a predictor region as $X_1, \ldots, X_m$. Each matrix $X_i$ is of size $n_X \times T_i$, where $n_X$ is the total number of voxels in the predictor region and $T_i$ is the number of timepoints in the $i$th experimental run. Similarly, let $Y_1, \ldots, Y_m$ be the multivariate time courses in the target region, where $Y_i$ is an $n_Y \times T_i$ matrix and $n_Y$ is the total number of voxels in the target region.

The neural networks were trained with a leave-one-run-out procedure to learn a function $f$ such that

$$Y_{\mathrm{train}} = f(X_{\mathrm{train}}) + E_{\mathrm{train}},$$

where $X_{\mathrm{train}}$ and $Y_{\mathrm{train}}$ are the data in the predictor region and the target region, respectively, during training, and $E_{\mathrm{train}}$ is the error term. Formally, for the $i$th experimental run, the data from the remaining runs formed the training set

$$D_{\setminus i} = \{(X_1, Y_1), \ldots, (X_{i-1}, Y_{i-1}), (X_{i+1}, Y_{i+1}), \ldots, (X_m, Y_m)\},$$

while the dataset $D_i = \{(X_i, Y_i)\}$ served as the test set for the left-out run $i$.

We used the proportion of variance explained between the predictor region and all other voxels in the gray matter mask to measure multivariate statistical dependence. For each target voxel $j$, the variance explained $\mathrm{varExpl}_i(j)$ was calculated as

$$\mathrm{varExpl}_i(j) = \max\left\{0,\ 1 - \frac{\mathrm{var}\big(Y_i(j) - f_j(X_i)\big)}{\mathrm{var}\big(Y_i(j)\big)}\right\},$$

where $X_i$ is the time course in the predictor region for the $i$th run and $f_j(X_i)$ is the MVPN prediction for the $j$th voxel. The values $\mathrm{varExpl}_i(j)$ obtained for the different runs $i = 1, \ldots, m$ were then averaged, yielding $\overline{\mathrm{varExpl}}(j)$.
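
The following is a minimal NumPy sketch of the leave-one-run-out variance-explained computation defined above; `train_and_predict` is a hypothetical stand-in for fitting the MVPN model on the training runs and predicting the left-out run.

```python
# Minimal sketch of the leave-one-run-out variance-explained computation.
# `train_and_predict` is a hypothetical stand-in for MVPN fitting and prediction.
import numpy as np

def mean_var_explained(runs_X, runs_Y, train_and_predict):
    """
    runs_X: list of (T_i, n_X) predictor time courses, one per run
    runs_Y: list of (T_i, n_Y) target time courses, one per run
    Returns varExpl averaged across left-out runs, one value per target voxel.
    """
    m = len(runs_X)
    per_run = []
    for i in range(m):
        train_X = np.vstack([runs_X[j] for j in range(m) if j != i])
        train_Y = np.vstack([runs_Y[j] for j in range(m) if j != i])
        pred_Y = train_and_predict(train_X, train_Y, runs_X[i])
        resid_var = np.var(runs_Y[i] - pred_Y, axis=0)
        total_var = np.var(runs_Y[i], axis=0)
        per_run.append(np.maximum(0.0, 1.0 - resid_var / total_var))
    return np.mean(per_run, axis=0)
```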

Combined-minus-max whole-brain analysis

In order to identify brain regions that depend on the combination of auditory and visual response patterns, we analyzed the StudyForrest dataset with a novel approach we introduced in a recent study (Fang et al., 2023): the “combined-minus-max” approach, described in the following paragraphs. Since run 1 was used to functionally localize auditory regions (see above, ROI definition), to prevent circularity in the analysis, we used experimental runs 2 through 8 for the combined-minus-max analysis (a total of 7 runs).

In the combined-minus-max approach, first, we used MVPN to calculate the variance explained in each gray matter voxel using individual ROIs as predictors (early dorsal stream, early ventral stream, auditory stream). Then, we used pairs of these ROIs as joint inputs of the MVPN model in order to predict the neural responses of each gray matter voxel (Fig. 1b). We tested all pairs of the three streams: (1) posterior dorsal stream and auditory stream, (2) posterior ventral stream and auditory stream, and (3) posterior ventral stream and posterior dorsal stream.

If a voxel only encodes information from one of the streams, using the responses from multiple streams as predictors should not improve the variance explained. On the contrary, if the responses in the voxel are better predicted by a neural network including multiple streams combined than by a single stream, we can conclude that the voxel combines information from multiple streams. Therefore, we searched for voxels that combine information from multiple streams by computing an index given by the difference between the proportion of variance explained by a model using two streams jointly (the “combined” model) and the proportion of variance explained by a model using the best predicting stream among the two (the “max” model). This procedure is illustrated in Figure 1c.

Formally, for each voxel $j$, we compute the variance explained by MVPN using as input the responses from a pair of ROIs, $\mathrm{varExpl}_{\mathrm{pair}}(j)$, and the variance explained using as input the responses from the best predicting individual ROI, $\mathrm{varExpl}_{\mathrm{max}}(j)$. For each voxel $j$, the difference in variance explained is then calculated as

$$\Delta\mathrm{varExpl}(j) = \mathrm{varExpl}_{\mathrm{pair}}(j) - \mathrm{varExpl}_{\mathrm{max}}(j).$$

This $\Delta\mathrm{varExpl}(j)$ provides a multistream dependence (MSD) index for each voxel, which allowed us to identify candidate brain regions that jointly combine information from different streams. We calculated the statistical significance of $\Delta\mathrm{varExpl}$ values across subjects using statistical nonparametric mapping, utilizing the SnPM extension for SPM (http://nisox.org/Software/SnPM13/; Nichols and Holmes, 2002).
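
As an illustration, a minimal sketch of the per-voxel combined-minus-max index follows, assuming the variance-explained maps have already been computed; the group-level inference itself is performed with SnPM and is not reproduced here.

```python
# Minimal sketch of the combined-minus-max (MSD) index for one pairing of ROIs.
import numpy as np

def msd_index(ve_combined, ve_roi_a, ve_roi_b):
    """
    ve_combined: (n_voxels,) variance explained using both ROIs jointly
    ve_roi_a, ve_roi_b: (n_voxels,) variance explained by each ROI alone
    Returns Delta varExpl = combined - max(single-ROI models) per voxel.
    """
    return ve_combined - np.maximum(ve_roi_a, ve_roi_b)

# Example (hypothetical arrays): auditory + dorsal pairing for one subject.
# delta_map = msd_index(ve_aud_dorsal, ve_aud_only, ve_dorsal_only)
# Per-subject delta maps are then entered into SnPM for nonparametric group inference.
```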

Control analysis

When using the combined-minus-max approach, there is still the possibility that the better predictive accuracy of the combined model might be due to the larger number of voxels used as predictors. To control for this possibility, we conducted a control analysis using voxels from the primary motor cortex (M1) as predictors (see Fang et al., 2023 for an analogous approach). In this analysis, we randomly selected three nonoverlapping groups of 80 voxels in M1 (this number was chosen to match the number of voxels selected from each of the three streams: posterior ventral, posterior dorsal, and auditory). We then used the responses from the three groups of M1 voxels to run a control analysis following the same procedure as the combined-minus-max analysis, and we computed the statistical significance of ΔvarExpl for each gray matter voxel across subjects. Regions showing significant effects in this control analysis (p < 0.05, FWE-corrected with SnPM) were attributed to the larger number of voxels in the combined model rather than to multistream information combination, and were therefore excluded from the multistream dependence (MSD) analysis described above.

Face-selective ROI analysis

Face perception requires the combination of both static and dynamic information (Dobs et al., 2014). In addition, some face-selective regions have been found to represent identity during the perception of both visual and auditory stimuli (Anzellotti and Caramazza, 2017). Therefore, we applied the combined-minus-max approach to investigate the MSD effect in face-selective regions (Kanwisher et al., 2002; Yovel, 2016).

We used the first run of the category localizer to identify three face-selective ROIs: the occipital face area (OFA), the fusiform face area (FFA), and the face-selective posterior superior temporal sulcus (STS). Data were modeled with a standard GLM using FSL FEAT (Woolrich et al., 2001). Each seed ROI was defined as a sphere with a 9 mm radius centered on the peak for the contrast faces > bodies, artifacts, scenes, and scrambled images. Data from the left and right hemispheres were combined for each ROI, and the 80 voxels showing the highest z-values for the contrast were selected. Visualizations of these ROIs can be found in Figure 3a. We then analyzed the variance explained for each voxel in these face-selective ROIs across our three pairings (posterior dorsal stream and auditory stream, posterior ventral stream and auditory stream, and posterior dorsal stream and posterior ventral stream).

Code/software accessibility

The code to implement the analysis can be obtained at https://github.com/sccnlab/PyMVPD. A description of the code can be found in Fang et al. (2022).

Results

STS combines information from auditory regions with information from different visual streams

To identify brain regions that jointly encoded information from different streams, we calculated the MSD index for each voxel. This index was computed as the difference between the proportion of variance explained by the combined model and that of the max model (see Materials and Methods section for a detailed explanation of the “combined-minus-max” approach). Group-level analyses were used to identify voxels with MSD indices significantly greater than zero. These voxels were considered as candidate MSD brain regions. Clusters with peaks having p < 0.05 (FWE corrected) were included.

To ensure that the combined model's predictive accuracy was not merely due to the larger number of voxels used in comparison with the max analysis, we conducted a control analysis. In the control analysis, we used three nonoverlapping groups of 80 voxels from the primary motor cortex (M1) as predictors, matching the number of voxels used from the auditory cortex and two visual streams in the main analyses. We then ran the combined-minus-max analysis with these M1 voxel groups and obtained statistical significance for each gray matter voxel across subjects.

The control analysis showed significant effects in the sensorimotor cortex (peak MNI coordinates = [0, −21, 64], [33, −42, 67], [−39, −18, 41]), premotor cortex (peak MNI coordinates = [−57, −9, 44], [57, 12, 31]), the bilateral intraparietal sulcus (peak MNI coordinates = [30, −69, 54], [−24, −72, 50]), and the angular gyrus (peak MNI coordinates = [−45, −69, 37]). Importantly, the control analyses did not show significant effects in ventral and lateral occipitotemporal regions. Therefore, significant findings in these regions in the main analysis could not be explained just by a difference between the number of predictor voxels in the combined analysis and the max analysis. Voxels that yielded significant effects in the control analysis (p < 0.05, FWE corrected) were excluded before calculating the MSD indices in the main analysis.

Combining response patterns from auditory regions and the early dorsal stream revealed significant effects in the bilateral STS (peak MNI coordinates = [−66, −42, 4], [45, −57, 18]) and within the posterior cingulate cortex (peak MNI coordinates = [15, −27, 41]; Table 1; p < 0.05, FWE corrected). Combining responses from auditory regions and the early ventral stream also revealed effects in the right STS (Table 2; p < 0.05, FWE corrected), but in a more posterior portion (peak MNI coordinates = [48, −57, 8]), at the boundary with the occipital lobe (Fig. 2a).

Figure 2.

a, Voxels showing significant effects (p < 0.05, FWE corrected) for the combination of auditory responses with responses in V3d and V5 (red) and auditory responses with responses in V3v and V4 (green). b, Voxels showing significant effects for the combination of responses in V3v and V4 with responses in V3d and V5 (blue). c, Fisher transformed Pearson’s correlation values between the auditory + dorsal and auditory + ventral combined-minus-max models, computed across the top 50 voxels in the STS (left) and the top 100 voxels across the whole brain (right) showing the greatest change in variance explained across both models. d, Pearson’s correlation values between combined-minus-max effect patterns from the auditory + dorsal and auditory + ventral models within an STS ROI. We computed these correlations across 500 splits of the participants into two equal groups, comparing pattern similarity within the same model across splits (e.g., AUD + dorsal and AUD + dorsal) to the similarity of patterns between different models across splits (e.g., AUD + dorsal in split 1 to AUD + ventral in split 2: “AD1/AV2”).

Table 1.

Regions combining responses between auditory regions and V3d and V5 showing significant t values (p < 0.01, FWE corrected) computed from the combined-minus-max analysis

Table 2.

Regions combining responses between auditory regions and V3v and V4 showing significant t values (p < 0.01, FWE corrected) computed from the combined-minus-max analysis

These findings indicate that auditory information is not combined with information from both visual streams within one single STS hub. Instead, distinct portions of STS combine information from auditory regions and information from ventral and dorsal visual regions, respectively.

Robustness of the results across different data splits

In order to further evaluate the robustness of the results, we defined a broad bilateral STS region of interest using the “Superior Temporal Gyrus” map from the WFU PickAtlas. We then extracted the patterns of the combined-minus-max effects across voxels as vectors. For each split of the participants into two equal groups (see Fig. 2d), this procedure yielded a vector for the auditory + dynamic combined-minus-max results and another vector for the auditory + static combined-minus-max results. The robustness of the patterns across the two halves of the data was assessed by computing Pearson’s correlation between the vectors for the two halves. The correlation for vectors from the same analysis (e.g., between the first and second halves of the auditory + dynamic analysis) was compared with the correlation for vectors from different analyses (e.g., between the first half of the auditory + dynamic analysis and the second half of the auditory + static analysis), following a procedure inspired by prior work (Haxby et al., 2001). If the results are robust across different splits of the data, we would expect higher correlations between patterns from the same analysis across splits than between patterns from two different analyses. The results were in line with this prediction: correlations between vectors from the same analysis were higher than correlations between vectors from different analyses across splits (Fig. 2d).
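
A minimal sketch of this split-half comparison is given below, assuming per-subject combined-minus-max maps restricted to the STS ROI; the number of splits (500) follows the description in Figure 2d.

```python
# Minimal sketch of the split-half robustness comparison (assumed array shapes).
import numpy as np

rng = np.random.default_rng(0)

def split_half_correlations(delta_dorsal, delta_ventral, n_splits=500):
    """
    delta_dorsal, delta_ventral: (n_subjects, n_roi_voxels) combined-minus-max maps
    Returns same-analysis and cross-analysis Pearson correlations across random splits.
    """
    n_subj = delta_dorsal.shape[0]
    same, cross = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n_subj)
        g1, g2 = perm[: n_subj // 2], perm[n_subj // 2:]
        ad1, ad2 = delta_dorsal[g1].mean(0), delta_dorsal[g2].mean(0)
        av2 = delta_ventral[g2].mean(0)
        same.append(np.corrcoef(ad1, ad2)[0, 1])    # e.g., AUD + dorsal vs AUD + dorsal
        cross.append(np.corrcoef(ad1, av2)[0, 1])   # e.g., AUD + dorsal vs AUD + ventral
    return np.array(same), np.array(cross)
```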

Quantifying distinct spatial distributions of auditory + ventral and auditory + dorsal effects

Using the STS ROI introduced in the previous section, for each subject individually, we retrieved the 50 voxels with the highest ΔvarExpl across both models (auditory + dorsal combined-minus-max and auditory + ventral combined-minus-max). We then computed Pearson’s correlation between the ΔvarExpl values of the two models across these voxels, using a strategy inspired by previous work (Peelen et al., 2006). Since correlations are bounded between −1 and 1 and therefore violate the normality assumption, correlation values were Fisher transformed and submitted to a two-tailed t test across subjects to probe for spatial correlations between the two effects (auditory + dorsal and auditory + ventral; Fig. 2c). This revealed a significantly negative correlation (t(13) = −2.16, p < 0.05), suggesting that the combination of auditory information with the two visual streams involves spatially distinct neural substrates. To extend this investigation to other regions, we ran the analysis again at the whole-brain level, this time using the 100 voxels with the highest ΔvarExpl across both models. The t test on the Fisher z transformed values again revealed a significantly negative correlation (t(13) = −4.54, p < 0.001). These findings indicate that dorsal and ventral areas do indeed contribute to spatially distinct effects of combination with auditory cortex.
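
A minimal sketch of this spatial-correlation test follows, assuming per-subject ΔvarExpl maps for the two pairings within the ROI; selecting the top voxels by the voxel-wise maximum across the two models is an assumption about how “highest across both models” is operationalized.

```python
# Minimal sketch of the spatial-correlation test between the two pairings.
import numpy as np
from scipy import stats

def spatial_correlation_test(delta_dorsal, delta_ventral, top_k=50):
    """
    delta_dorsal, delta_ventral: (n_subjects, n_voxels) combined-minus-max maps
    For each subject: pick the top_k voxels with the highest delta across both models
    (here: the voxel-wise maximum, an assumption), correlate the two models' deltas
    over those voxels, Fisher-transform, and t-test the z values against zero.
    """
    z_values = []
    for d_dor, d_ven in zip(delta_dorsal, delta_ventral):
        ranking = np.maximum(d_dor, d_ven)
        top_idx = np.argsort(ranking)[-top_k:]
        r = np.corrcoef(d_dor[top_idx], d_ven[top_idx])[0, 1]
        z_values.append(np.arctanh(r))              # Fisher z transform
    return stats.ttest_1samp(z_values, 0.0)         # two-tailed t test across subjects
```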

Ventral temporal cortex combines information from different visual streams

These results raise the question of whether and where information from early dorsal (V3d and V5) and ventral (V3v and V4) visual regions is combined. We adopted the same strategy to test this, searching for voxels that are better predicted by both visual streams jointly than by either stream in isolation. This analysis identified regions in the calcarine sulcus (V1 and V2), located upstream of V3, V4, and V5, and regions in ventral occipitotemporal cortex, located downstream (peak MNI coordinates = [21, −102, 1]; Table 3; p < 0.05, FWE corrected; Fig. 2b). Notably, no effects for the combination of the two visual streams were observed in the STS. This is consistent with the finding that the combination of auditory information with different visual streams involves distinct cortical regions: if it happened in a single STS subregion, we would also expect to observe effects in that subregion for the combination of the two visual streams.

Table 3.

Regions combining responses between V3v and V4, and V3d and V5, showing significant t values (p < 0.01, FWE corrected) computed from the combined-minus-max analysis

Combination of visual and auditory information outside the STS

Our results also suggest the involvement of brain regions outside of the STS in combining audio-visual information. The combined-minus-max analysis for the combination of auditory and the early dorsal visual stream responses also identified brain regions in the anterior temporal lobe (ATL; peak MNI coordinates = [−54, −6, −15]), the primary somatosensory cortex (S1; peak MNI coordinates = [3, −42, 61]), the supramarginal gyrus (peak MNI coordinates = [−54, −6, −15]), and the retrosplenial cortex (peak MNI coordinates = [30, −54, 4]).

The combined-minus-max analysis of auditory and early ventral visual stream responses revealed brain regions in the intraparietal sulcus (IPS; peak MNI coordinates = [−39, −51, 57]), retrosplenial cortex (peak MNI coordinates = [6, −42, 4]), caudate nucleus (peak MNI coordinates = [15, −9, 24]), and the lingual gyrus (peak MNI coordinates = [−27, −57, 4]).

The combined-minus-max analysis for the pairing of the posterior dorsal and posterior ventral visual stream responses identified a distinct set of brain regions compared with the previous two analyses. The largest cluster was located in V1 (peak MNI coordinates = [21, −102, 1]; Fig. 2b). Other brain regions included the bilateral parahippocampal place area (PPA; peak MNI coordinates = [−30, −48, −9], [30, −51, −9]) and the cerebellum (peak MNI coordinates = [−24, −78, −25]).

Inspecting the combined-minus-max maps of individual participants in search of other regions that might show these effects, and that might not appear in the second-level analyses due to greater topographic variability across individuals, did not reveal other clear candidate regions. This does not rule out that additional regions might be identified in the future using more powerful data acquisition and analysis methods.

Overlap between the auditory + ventral and auditory + dorsal effects was observed in the posterior cingulate and in the pulvinar in some individual participants, but these effects were variable across participants; further work will be needed to establish whether these regions combine auditory information with both ventral and dorsal representations. To probe for three-way combination effects across auditory, ventral, and dorsal regions, we conducted a combined-minus-max analysis of the three regions combined minus the maximum across the three different pairs (auditory + dynamic combined, auditory + static combined, dynamic + static combined). Statistical nonparametric analysis did not reveal any effects surpassing the threshold (FWE corrected p < 0.05); future work with more sensitive methods or greater statistical power might reveal such effects.

Combination of information from auditory regions and different visual streams within face-selective ROIs

Considering the importance of combining facial information with auditory information for the recognition of speech and emotions (Gentilucci and Cattaneo, 2005; Piwek et al., 2015), we studied the combination of auditory and visual representations from different streams within functionally localized face-selective regions (Fig. 3a). In the face-selective STS, the effect of combining auditory and dorsal responses was significantly greater than that of combining auditory and ventral responses (t(13) = 3.82, p < 0.05) and than that of combining ventral and dorsal responses (t(13) = 4.55, p < 0.01; Fig. 3b, top panel). This finding could be due to the type of visual information encoded in V3d and V5: previous work has shown that these regions respond to motion (Felleman and Van Essen, 1987; Born and Bradley, 2005). Combining information about visual motion with auditory information might support audio-visual integration during speech perception and emotion recognition.

Figure 3.

a, Face-selective ROIs: STS, FFA, and OFA. b, Box plots depicting the difference in variance explained between the “combined” and “max” analyses across subjects in different face-selective ROI voxels. * signifies p < 0.05, ** signifies p < 0.01, and *** signifies p < 0.001. Significantly higher combined-minus-max effects were observed in the face-selective STS for the combination of the auditory and posterior dorsal stream than for the other pairings. No significant differences were observed in the FFA across the different pairings. Significantly higher combined-minus-max effects were observed in the OFA for the combination of the posterior dorsal and posterior ventral streams than for the other pairings.

Unlike the face-selective STS, the fusiform face area (FFA) did not show significant differences between the pairwise combinations (Fig. 3b, middle panel). In the occipital face area (OFA), the effect of combining information from the two visual streams was significantly stronger than combining auditory and dorsal visual responses (t(13) = 5.11, p < 0.01) and than combining auditory and ventral visual responses (t(13) = 6.73, p < 0.001; Fig. 3b, bottom panel).

Discussion

Audio-visual integration is a fundamental process that allows for the unified perception of everyday experiences. Given that distinct visual streams encode different kinds of representations, this study sought to uncover which visual representations are combined with auditory information during audio-visual integration and which brain regions support the combination of responses from auditory regions and the different visual streams. The results demonstrate that both ventral and dorsal visual information is combined with auditory information, but that distinct portions of posterior STS combine auditory information with the visual information encoded in each of the two streams. The topography of combined-minus-max effects observed in the STS could be related to the types of features encoded in dorsal and ventral visual regions. Importantly, however, these results are only possible in the presence of audio-visual combination effects. If posterior STS encoded visual features that are well predicted by ventral visual regions in isolation, and anterior STS encoded visual features that are well predicted by dorsal visual regions in isolation, subtracting the max in the combined-minus-max analysis would remove these effects.

What specific factors drive the observed topography of STS effects remains an open question. Meta-analyses suggest that different portions of posterior STS play different functional roles, including audio-visual integration, biological motion perception, theory of mind, and face processing (Hein and Knight, 2008). Meta-analyses, however, make it difficult to assess the degree of overlap between areas engaged in different functions: since different functions are probed in different participants, variability in response locations due to different functions is confounded with variability arising from individual differences. More recently, the investigation of multiple stimulus types within the same participants led to a more precise characterization of the distinct portions of the STS responsible for processing language, theory of mind, faces, voices, and biological motion (Deen et al., 2015). Relevant to the present results, Deen et al. (2015) analyzed posterior-to-anterior changes in functional specialization in posterior STS, observing greater responses for theory of mind tasks in more posterior portions, followed by biological motion, and ultimately greater responses to faces and voices in anterior portions. The posterior-to-anterior organization observed in the present study, therefore, could indicate that different visual inputs are combined with auditory representations to serve the needs of distinct functional subsystems that occupy adjacent areas within STS. In order to study the relationship between the topography of the effects we identified in the present work and other functional subdivisions of STS, it will be necessary to perform both sets of analyses within the same group of participants.

Previous research on ventral stream representations suggests a possible functional role for the more posterior of the two STS hubs identified in this study. Effects for the combination of auditory information and the ventral visual stream were observed in a more posterior portion of the STS, and previous research has implicated the ventral visual stream in the recognition of the identity of objects (Ungerleider, 1982). Posterior portions of the STS that combine information from ventral visual regions and auditory regions might contribute to encoding the typical sounds produced by different kinds of objects, associating dogs with barking, cars with vrooming, and so on. In contrast, more anterior portions might encode the way different movements are associated with sounds, even when the identity or category of an object is held constant. For example, in face perception, the relationship between lip movements and phonemes is known to involve audio-visual integration mechanisms that lead to phenomena such as the McGurk effect (McGurk and MacDonald, 1976). In many other instances, sounds are produced by the dynamic interactions between multiple objects. Experiments with tailored designs, which include distinct conditions that distinguish between these different kinds of audio-visual information, will be needed to test this hypothesis. As an alternative hypothesis, the organization of the combination of auditory and visual information into two distinct portions of posterior STS might not be due to their engagement in supporting different functions, but to unique computational requirements of integrating auditory representations with different kinds of visual representations.

Focusing on face-selective regions of interest, we found that the combination of audio-visual information in the face-selective STS relies disproportionately on visual information encoded in dorsal visual regions. This is consistent with the observation that effects for the combination of auditory information with visual information from dorsal regions were located in more anterior portions of posterior STS in our whole-brain analyses and with the previous studies indicating that face responses also peak in more anterior portions of posterior STS (Deen et al., 2015). The latter finding could be due to the type of visual information encoded in V3d and V5: previous work has shown that these regions contain neurons that respond to motion (Felleman and Van Essen, 1987; Born and Bradley, 2005). Combining information about visual motion with auditory information might support audio-visual integration during speech perception. It will be interesting to test whether the effects for the combination of auditory information and dorsal visual representations reported here are localized to the same voxels showing an association with individual differences in susceptibility to the McGurk effect reported in previous work (Nath and Beauchamp, 2012).

Finally, the combination of visual information from the two visual streams was observed in ventral occipitotemporal cortex, and ROI analyses showed that the extent of these effects includes the OFA. Classical work has proposed the importance of motion to identify and segment objects (Spelke, 1990), leading to recent computational models of motion-based segmentation (Chen et al., 2022). We hypothesize that the combination of information from the two visual streams within occipitotemporal cortex could support motion-based segmentation. Considering the anatomical location of the effects that are colocalized with the earliest stages of category-selectivity (e.g., OFA), we hypothesize that motion-based segmentation might provide the basis for category-selectivity.

Our findings also implicate brain regions beyond the STS. Regarding the candidate MSD sites that were statistically dependent on information from the auditory and posterior ventral streams, the intraparietal sulcus (IPS) was the region with the highest t value. This region has been implicated in audio-visual integration in prior work (Lewis et al., 2000; Calvert et al., 2001).

Methodologically it is worth noting that the results obtained from the MVPN combined-minus-max analyses only establish correlational relationships. To establish causality between the joint responses from the auditory and different visual streams in MSD sites, future research could employ techniques that infer causality, such as transcranial magnetic stimulation-fMRI (TMS-fMRI). Further, our method shows that two regions jointly contribute to predict responses in a third region (i.e., statistical dependence), but we cannot determine precisely whether and how this information is integrated into a multimodal representation. In addition, we used a five-layer dense neural network to model multivariate pattern dependence across all ROI sets tested in this study. However, it is possible that the optimal model architecture for capturing brain interactions may differ depending on the specific set of predictor regions. Future work using different neural network architectures may potentially uncover additional effects. Despite these limitations, the results reveal a novel aspect of the large-scale topography of STS and provide insights into the neural architecture that supports our unified perception of the world.

The present work provides evidence for distinct portions of the multisensory posterior STS: a more posterior portion characterized by the combination of auditory and ventral representations and a more anterior portion characterized by the combination of auditory and dorsal representations. Clarifying the functional and causal contributions of these subdivisions of STS to behavior will require additional work, including importantly studies with causal methodologies.

Footnotes

  • We thank Wei Qiu for technical support. We also thank the StudyForrest researchers for sharing their data. This work was supported by a startup grant from Boston College and by National Science Foundation grant 19438672 to S.A.

  • *G.F. and M.F. contributed equally to this work.

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Stefano Anzellotti at stefano.anzellotti@bc.edu.

SfN exclusive license.

References

  1. Anzellotti S, Caramazza A (2017) Multimodal representations of person identity individuated with fMRI. Cortex 89:85–97. https://doi.org/10.1016/j.cortex.2017.01.013
  2. Anzellotti S, Caramazza A, Saxe R (2017) Multivariate pattern dependence. PLoS Comput Biol 13:e1005799. https://doi.org/10.1371/journal.pcbi.1005799
  3. Avants BB, Tustison N, Song G (2009) Advanced normalization tools (ANTS). Insight J 2:1–35. https://doi.org/10.54294/uvnhin
  4. Behzadi Y, Restom K, Liau J, Liu TT (2007) A component based noise correction method (CompCor) for BOLD and perfusion based fMRI. Neuroimage 37:90–101. https://doi.org/10.1016/j.neuroimage.2007.04.042
  5. Born RT, Bradley DC (2005) Structure and function of visual area MT. Annu Rev Neurosci 28:157–189. https://doi.org/10.1146/annurev.neuro.26.041002.131052
  6. Calvert GA, Campbell R, Brammer MJ (2000) Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr Biol 10:649–657. https://doi.org/10.1016/S0960-9822(00)00513-3
  7. Calvert GA, Hansen PC, Iversen SD, Brammer MJ (2001) Detection of audio-visual integration sites in humans by application of electrophysiological criteria to the BOLD effect. Neuroimage 14:427–438. https://doi.org/10.1006/nimg.2001.0812
  8. Chen Y, Mancini M, Zhu X, Akata Z (2022) Semi-supervised and unsupervised deep visual learning: a survey. IEEE Trans Pattern Anal Mach Intell 46:1327–1347. https://doi.org/10.1109/TPAMI.2022.3201576
  9. Collignon O, Girard S, Gosselin F, Roy S, Saint-Amour D, Lassonde M, Lepore F (2008) Audio-visual integration of emotion expression. Brain Res 1242:126–135. https://doi.org/10.1016/j.brainres.2008.04.023
  10. Deen B, Koldewyn K, Kanwisher N, Saxe R (2015) Functional organization of social perception and cognition in the superior temporal sulcus. Cereb Cortex 25:4596–4609. https://doi.org/10.1093/cercor/bhv111
  11. Dobs K, Bülthoff I, Breidt M, Vuong QC, Curio C, Schultz J (2014) Quantifying human sensitivity to spatio-temporal information in dynamic faces. Vision Res 100:78–87. https://doi.org/10.1016/j.visres.2014.04.009
  12. Esteban O, et al. (2019) fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat Methods 16:111–116. https://doi.org/10.1038/s41592-018-0235-4
  13. Fagel S (2006) Emotional McGurk effect. In: Proceedings of the International Conference on Speech Prosody, Vol. 1.
  14. Fang M, Poskanzer C, Anzellotti S (2022) PyMVPD: a toolbox for multivariate pattern dependence. Front Neuroinform 16:835772. https://doi.org/10.3389/fninf.2022.835772
  15. Fang M, Aglinskas A, Li Y, Anzellotti S (2023) Angular gyrus responses show joint statistical dependence with brain regions selective for different categories. J Neurosci 43:2756–2766. https://doi.org/10.1523/JNEUROSCI.1283-22.2023
  16. Felleman DJ, Van Essen DC (1987) Receptive field properties of neurons in area V3 of macaque monkey extrastriate cortex. J Neurophysiol 57:889–920. https://doi.org/10.1152/jn.1987.57.4.889
  17. Gentilucci M, Cattaneo L (2005) Automatic audiovisual integration in speech perception. Exp Brain Res 167:66–75. https://doi.org/10.1007/s00221-005-0008-z
  18. Greve DN, Fischl B (2009) Accurate and robust brain image alignment using boundary-based registration. Neuroimage 48:63–72. https://doi.org/10.1016/j.neuroimage.2009.06.060
  19. Hanke M, Adelhöfer N, Kottke D, Iacovella V, Sengupta A, Kaule FR, Nigbur R, Waite AQ, Baumgartner F, Stadler J (2016) A studyforrest extension, simultaneous fMRI and eye gaze recordings during prolonged natural stimulation. Sci Data 3:1–15. https://doi.org/10.1038/sdata.2016.92
  20. Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, Pietrini P (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293:2425–2430. https://doi.org/10.1126/science.1063736
  21. Hein G, Knight RT (2008) Superior temporal sulcus—it's my area: or is it? J Cogn Neurosci 20:2125–2136. https://doi.org/10.1162/jocn.2008.20148
  22. Jenkinson M, Bannister P, Brady M, Smith S (2002) Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage 17:825–841. https://doi.org/10.1006/nimg.2002.1132
  23. Kanwisher N, McDermott J, Chun MM (2002) The fusiform face area: a module in human extrastriate cortex specialized for face perception.
  24. Lewis JW, Beauchamp MS, DeYoe EA (2000) A comparison of visual and auditory motion processing in human cerebral cortex. Cereb Cortex 10:873–888. https://doi.org/10.1093/cercor/10.9.873
  25. McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746–748. https://doi.org/10.1038/264746a0
  26. Nath AR, Beauchamp MS (2012) A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion. Neuroimage 59:781–787. https://doi.org/10.1016/j.neuroimage.2011.07.024
  27. Nichols TE, Holmes AP (2002) Nonparametric permutation tests for functional neuroimaging: a primer with examples. Hum Brain Mapp 15:1–25. https://doi.org/10.1002/hbm.1058
  28. Peelen MV, Wiggett AJ, Downing PE (2006) Patterns of fMRI activity dissociate overlapping functional brain areas that respond to biological motion. Neuron 49:815–822. https://doi.org/10.1016/j.neuron.2006.02.004
  29. Peelen MV, Atkinson AP, Vuilleumier P (2010) Supramodal representations of perceived emotions in the human brain. J Neurosci 30:10127–10134. https://doi.org/10.1523/JNEUROSCI.2161-10.2010
  30. Piwek L, Pollick F, Petrini K (2015) Audiovisual integration of emotional signals from others' social interactions. Front Psychol 6:137846. https://doi.org/10.3389/fpsyg.2015.00611
  31. Rademacher J, Morosan P, Schormann T, Schleicher A, Werner C, Freund HJ, Zilles K (2001) Probabilistic mapping and volume measurement of human primary auditory cortex. Neuroimage 13:669–683. https://doi.org/10.1006/nimg.2000.0714
  32. Schein SJ, Desimone R (1990) Spectral properties of V4 neurons in the macaque. J Neurosci 10:3369–3389. https://doi.org/10.1523/JNEUROSCI.10-10-03369.1990
  33. Sengupta A, Kaule FR, Guntupalli JS, Hoffmann MB, Häusler C, Stadler J, Hanke M (2016) A studyforrest extension, retinotopic mapping and localization of higher visual areas. Sci Data 3:1–14. https://doi.org/10.1038/sdata.2016.93
  34. Spelke ES (1990) Principles of object perception. Cogn Sci 14:29–56. https://doi.org/10.1207/s15516709cog1401_3
  35. Ungerleider LG (1982) Two cortical visual systems. In: Analysis of visual behavior (Ingle DJ, Goodale MA, Mansfield RJW, eds), chapter 18, pp 549. Cambridge, MA: MIT Press.
  36. Wang L, Mruczek RE, Arcaro MJ, Kastner S (2015) Probabilistic maps of visual topography in human cortex. Cereb Cortex 25:3911–3931. https://doi.org/10.1093/cercor/bhu277
  37. Woolrich MW, Ripley BD, Brady M, Smith SM (2001) Temporal autocorrelation in univariate linear modeling of FMRI data. Neuroimage 14:1370–1386. https://doi.org/10.1006/nimg.2001.0931
  38. Yovel G (2016) Neural and cognitive face-selective markers: an integrative review. Neuropsychologia 83:5–13. https://doi.org/10.1016/j.neuropsychologia.2015.09.026
  39. Zwiers MP, Van Opstal AJ, Paige GD (2003) Plasticity in human sound localization induced by compressed spatial vision. Nat Neurosci 6:175–181. https://doi.org/10.1038/nn999