Research Articles, Systems/Circuits

Beyond Divisive Normalization: Scalable Feedforward Networks for Multisensory Integration Across Reference Frames

Arefeh Farahmandi, Parisa Abedi Khoozani and Gunnar Blohm
Journal of Neuroscience 8 October 2025, 45 (41) e0104252025; https://doi.org/10.1523/JNEUROSCI.0104-25.2025
Centre for Neuroscience Studies, Queen’s University, Kingston, Ontario K7L 3N6, Canada

Abstract

The integration of multiple sensory inputs is essential for human perception and action in uncertain environments. This process includes reference frame transformations, as different sensory signals are encoded in different coordinate systems. Studies have shown that multisensory integration (MSI) in humans is consistent with Bayesian optimal inference. However, the neural mechanisms underlying this process are still debated. Different population coding models have been proposed to implement probabilistic inference, including a recent suggestion that explicit divisive normalization accounts for the empirical principles of MSI. However, whether and how divisive operations are implemented in the brain is not well understood. Indeed, all existing models suffer from the curse of dimensionality and thus fail to scale to real-world problems. Here, we propose an alternative model for MSI that approximates Bayesian inference: a multilayer feedforward neural network for MSI across different reference frames, trained on the analytical Bayesian solution. This model displays all empirical principles of MSI and produces behavior similar to that reported in ventral intraparietal neurons. The model achieved this without the neatly organized, regular connectivity structure between contributing neurons that explicit divisive normalization requires. Overall, we show that simple feedforward networks of purely additive units can approximate optimal inference across different reference frames through parallel computing principles. This suggests that the brain need not use explicit divisive normalization to achieve multisensory integration.

  • Bayesian inference
  • cue combination
  • neural networks
  • population code
  • probabilistic inference
  • reference frame transformations

Significance Statement

This research presents an alternative to divisive normalization models of multisensory integration (MSI) in the brain. Our study demonstrates that a feedforward neural network can achieve optimal MSI across different reference frames without explicitly implementing divisive operations, challenging the long-held assumption that such operations are necessary for MSI. The model displays all the empirical principles of MSI, producing behavior similar to that reported in ventral intraparietal neurons. This work offers insights into the putative neural computations underlying multisensory processing.

Introduction

Uncertainty is tightly linked to human sensory perception and action (Atkins et al., 2001). For example, our visual information is less reliable on a foggy day than on a sunny day. Uncertainty arises both from the information provided by the different sensory modalities and from the natural noisiness of spiking neurons (Faisal et al., 2008). Behavioral studies showed that the brain integrates different sensory information in an optimal way to compensate for these uncertainties (Hillis et al., 2004). Multisensory integration (MSI) in the nervous system is akin to a Bayesian inference process (Meredith and Stein, 1986; Wolpert et al., 1995; Knill et al., 1996; Merfeld et al., 1999; Landy and Kojima, 2001; Ernst and Banks, 2002; Battaglia et al., 2003; Ernst and Bülthoff, 2004; Knill and Pouget, 2004; Körding and Wolpert, 2006; Stein and Stanford, 2008). Since each sensory modality encodes information in a different coordinate system, reference frame transformations are necessary to achieve coherent multisensory integration (Duhamel et al., 1997; Deneve and Pouget, 2004; Avillac et al., 2005; Schlack et al., 2005). Despite abundant indications that the brain performs near-optimal probabilistic inference, its neural implementation is a subject of debate.

Previous studies have proposed that the brain can perform statistical inference by marginalization over variables using explicit divisive normalization (Beck et al., 2011; Pitkow and Angelaki, 2017). Specifically, optimal cue integration is considered an example of statistical inference in which different sensory signals are corrupted by uncertainty. Ohshiro et al. (2011) showed that a neural network implementing explicit divisive normalization at the multisensory level develops the neuronal behavior reported in brain areas involved in MSI. Additionally, the authors compared the predictions of their model with a model based on subtractive inhibition and showed that divisive normalization provides a better prediction of the observed neuronal patterns (Ohshiro et al., 2017). Other neurophysiological studies, combined with computational modeling, provided evidence for the functional role of divisive normalization throughout the cortex, suggesting that explicit divisive normalization is a canonical cortical computation (Carandini and Heeger, 2011).

Despite the established role of divisive normalization, how the brain implements such processes is a puzzle, because explicit divisive normalization requires intractable division and multiplication operations, making such an implementation physiologically infeasible (Pitkow and Angelaki, 2017). Moreover, Ghosh et al. (2024) showed that such linear integration models perform sub-optimally and even fail in complicated multisensory tasks. Therefore, current models that utilize divisive normalization face limitations including the need for a preconfigured connectivity structure, explicitly matched population codes, and/or an unrealistically large number of neurons (Deneve et al., 2001; Ma et al., 2006; Beck et al., 2011; Ohshiro et al., 2011; Beck et al., 2012). Consequently, how divisive normalization can be implemented in a biologically feasible manner is unknown.

In this study, we chose an MSI task across reference frames to investigate this issue. Specifically, the task was to estimate hand position and its variability across different eye positions using retinal and proprioceptive sensory information. Unlike previous networks requiring aligned population codes, our task merged retinal and proprioceptive hand positions with different neural coding schemes. While divisive normalization has been suggested for both cue integration and coordinate transformations, its explicit form requires different neurons for each task (Orhan and Ma, 2017). Here, we investigated whether divisive normalization is inherently performed when all network units perform the same neuronal operations (Beck et al., 2012).

We trained a multilayer feedforward neural network to perform an MSI task with standard error-based feedback; the network achieved near-optimal probabilistic inference without quadratic or divisive operations. This network reproduced empirical principles of MSI observed in ventral intraparietal (VIP) neurons, such as inverse effectiveness, cross-modal enhancement, and suppression. We also observed modulation of neural activity in our network by varying cue reliability, similar to the dorsal medial superior temporal area (MSTd). We compare our model with VIP and MSTd findings because these areas demonstrate canonical MSI principles. These results show that simple feedforward networks can implement MSI in the brain without explicit divisive normalization.

Materials and Methods

Task

The goal of our model is to perform MSI across reference frames. The task is to estimate the position and associated variability of the hand across different eye positions when visual and proprioceptive sensory information of hand position and proprioceptive information of eye position are available. Many previous studies showed that humans integrate information coming from different sensory modalities to decrease their uncertainty (Landy et al., 1995; Atkins et al., 2001; Landy and Kojima, 2001; Ernst and Banks, 2002; Ernst and Bülthoff, 2004; Kersten et al., 2004; Knill and Pouget, 2004; Körding and Wolpert, 2004; Stein and Stanford, 2008). Furthermore, varying eye orientation biases retinal information, demanding a reference frame transformation to compensate for eye orientation (Crawford et al., 2004; Avillac et al., 2005; Schlack et al., 2005; Blohm and Crawford, 2007). Therefore, to analytically combine visual and proprioceptive hand position, the retinal hand position is first updated based on the eye position and then optimally integrated with proprioception. This sequential analytical decomposition serves purely as a mathematical description for deriving the optimal Bayesian solution, which provides the target outputs for network training. We do not suggest or impose this serial order as a constraint on the actual neural implementation. Our network is free to learn any computational solution without predetermined sequential constraints. The overall steps of our simple one-dimensional proof-of-concept task are illustrated in Figure 1C. This task was designed to demonstrate the computational principles of our approach rather than to model a specific neural pathway. The use of visual and proprioceptive inputs with different neural coding schemes (Gaussian vs linear monotonic) allows us to test whether our model can integrate fundamentally different representations, a capability that would generalize to other sensory modalities and brain regions.

Figure 1.

Network architecture and task. A, The network consists of four layers: input layer, two hidden layers [sensory input layer (SIL) and multisensory layer (MSL)], and a linear read-out layer. B, We used a probabilistic spatial code to encode visual information and a probabilistic joint code to encode proprioceptive information of both hand position and eye orientation. C, The task is to estimate the hand position by combining the visual and proprioceptive sensory inputs across different eye orientations.

To generate the relevant input/output training data for the network, we assumed that all the representations have Gaussian distributions with a mean and variance [visual hand position: $(\mu_v, \sigma_v^2)$, proprioceptive hand position: $(\mu_p, \sigma_p^2)$, and eye position: $(\mu_e, \sigma_e^2)$] and are statistically independent. Therefore, the mean and variance of the updated visual hand position $(\mu_{v,e}, \sigma_{v,e}^2)$ can be derived as follows:

$$\mu_{v,e} = \mu_v + \mu_e \quad (1)$$

$$\sigma_{v,e}^2 = \sigma_v^2 + \sigma_e^2 \quad (2)$$

After transforming the retinal hand position into spatial hand position, the hand position is estimated by integrating the visual and proprioceptive information of hand position. Since both signals have Gaussian distributions and are independent, the mean and variance of the combined signal $(\mu_o, \sigma_o^2)$ can be derived as follows (naive Bayesian integration):

$$\mu_o = \sigma_o^2 \left( \frac{\mu_{v,e}}{\sigma_{v,e}^2} + \frac{\mu_p}{\sigma_p^2} \right) \quad (3)$$

$$\sigma_o^2 = \frac{\sigma_{v,e}^2 \, \sigma_p^2}{\sigma_{v,e}^2 + \sigma_p^2} \quad (4)$$

By substituting Equations 1 and 2 into Equations 3 and 4, we can find the relationship between the estimated hand position (our network output) and the sensory information (our network inputs):

$$\mu_o = \sigma_o^2 \left( \frac{\mu_v + \mu_e}{\sigma_v^2 + \sigma_e^2} + \frac{\mu_p}{\sigma_p^2} \right) \quad (5)$$

$$\sigma_o^2 = \frac{(\sigma_v^2 + \sigma_e^2) \, \sigma_p^2}{\sigma_v^2 + \sigma_e^2 + \sigma_p^2} \quad (6)$$
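For concreteness, the closed-form solution above can be expressed in a few lines of code. The following is an illustrative Python sketch (the paper's implementation is in MATLAB, and the function name here is ours):

```python
def bayes_hand_estimate(mu_v, var_v, mu_p, var_p, mu_e, var_e):
    """Optimal hand-position estimate from visual, proprioceptive,
    and eye-position signals (Eqs. 1-6)."""
    # Eqs. 1-2: shift the retinal estimate into spatial coordinates;
    # the reference frame transformation adds the eye-position uncertainty.
    mu_ve = mu_v + mu_e
    var_ve = var_v + var_e
    # Eqs. 3-4: reliability-weighted (naive Bayesian) integration
    # of the transformed visual signal with proprioception.
    var_o = var_ve * var_p / (var_ve + var_p)
    mu_o = var_o * (mu_ve / var_ve + mu_p / var_p)
    return mu_o, var_o
```

Note that the combined variance is always smaller than either input variance, which is the behavioral signature of optimal integration.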

Network architecture

Our proposed network consisted of four layers: an input layer, two hidden layers, and a read-out layer (Fig. 1A). To generate the input layer, we had three groups of sensory input nodes: visual and proprioceptive information of hand position and proprioceptive information of eye position. The units in both input and read-out layers had linear transfer functions. There were two hidden layers in our network, each containing 64 units with sigmoid transfer function. The hidden layers included a sensory input layer (SIL) that all the sensory nodes projected to and a multisensory layer (MSL) which was designed to receive the input from the SIL and estimate the position of the hand using population coding. Then, the read-out layer mapped the population code to the desired mean and variance of the theoretical estimated hand position based on Bayesian integration (see above Task section). The network has both negative and positive weights between different layers. In the following, we provide technical details of the information coding.

Input layer

We have three different sensory inputs for our network: visual and proprioceptive information of hand position as well as proprioceptive information of eye position. Therefore, our network consists of three groups of sensory input units coding for visual, proprioceptive, and eye positions. Visual information was encoded using visual-like tuning curves, while proprioceptive information of both hand and eye was coded using muscle-like tuning (Fig. 1B).

The visual information of hand position was encoded using a receptive field-like population coding scheme. We assumed that visual information is sampled from a normal distribution with mean $x_v$ and variance $\sigma_v^2$. Each node in the visual group had a Gaussian tuning curve:

$$f_{v,i}(x) = \exp\left( -\frac{(x_v - x_{r,i})^2}{2\sigma_{rf}^2} \right) \quad (7)$$

where $f_{v,i}(x)$ is the firing rate of the $i$th neuron in the visual sensory group, $x_{r,i}$ is the center of the neuron's receptive field, and $\sigma_{rf}^2 = 10^\circ$ is the width of the receptive field. We assumed that the receptive fields (RFs) of all neurons in each sensory input group have the same width. Neurons were distributed uniformly along the visual field $(-75^\circ, 75^\circ)$, and therefore the distance $\Delta x$ between neurons' response fields was derived as follows:

$$\Delta x = \frac{\text{Visual Field}}{N_v - 1} \quad (8)$$

in which $N_v$ is the number of neurons in the visual sensory group. In addition, we assumed that the activation of each neuron was modulated by the reliability of the visual information, and therefore the activity $a_i$ of each neuron was multiplied by a gain factor (in which $K = 50$ is a constant):

$$a_i = \frac{K}{\sigma_v^2} \times f_{v,i}(x) \quad (9)$$

We also used random Poisson noise to include biologically realistic trial-to-trial firing rate variability; the final activation of each node was therefore sampled from a Poisson distribution with $\lambda = a_i$. Similar coding was used in previous neural network studies (Zipser and Andersen, 1988; Blohm et al., 2009; Murdison et al., 2015).
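This visual encoding stage (Eqs. 7-9) can be sketched in Python with NumPy. The population size `N_V` is an assumption for illustration (the text does not fix the number of input units per group), and `encode_visual` is our name:

```python
import numpy as np

rng = np.random.default_rng(0)

N_V = 64       # number of visual input units (assumed for illustration)
K = 50.0       # gain constant from Eq. 9
VAR_RF = 10.0  # receptive-field width sigma_rf^2 (Eq. 7)

# Receptive-field centres spaced uniformly over the visual field (Eq. 8).
centers = np.linspace(-75.0, 75.0, N_V)

def encode_visual(x_v, var_v):
    """Noisy population response to visual hand position x_v with variance var_v."""
    tuning = np.exp(-(x_v - centers) ** 2 / (2.0 * VAR_RF))  # Eq. 7
    rate = (K / var_v) * tuning                              # Eq. 9: reliability gain
    return rng.poisson(rate)                                 # Poisson spiking noise
```

The reliability-dependent gain `K / var_v` is what lets downstream layers weight this cue: less reliable input (larger variance) yields proportionally lower firing rates.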

During static position coding, tuning curves of neurons in motor cortex can be simplified to a linear, monotonic function (Paninski et al., 2004). Therefore, we used a simple linear tuning curve for coding both hand and eye positions. Furthermore, both positive and negative slopes were used to map position uniquely into neural activity via a push–pull mechanism (Xing and Andersen, 2000; Blohm et al., 2009; Murdison et al., 2019):

$$f_{h/e,i}(\theta_{h/e}) = \left[ \text{intercept}_{h/e,i} + \text{slope}_{h/e,i} \times \theta_{h/e} \right]_+ \quad (10)$$

in which $f_{h/e,i}(\theta_{h/e})$ is the firing rate of the $i$th neuron for the proprioceptive hand (h) or eye (e) sensory input. Intercepts and slopes were chosen randomly from $(-1, 1)$ and $(-10, 10)$, respectively, for each unit. Joint angle and eye orientation were then coded linearly in the activity of these units. As with the visual information, we applied reliability-based gain modulation (Eq. 9) and Poisson noise to generate the final neuronal activity of these units.
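The rectified-linear push–pull encoder of Eq. 10 can be sketched as follows (again a Python sketch; the population size and function name are our assumptions, and the reliability gain and Poisson noise of Eq. 9 are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)

N_P = 64  # number of proprioceptive units per group (assumed)

# Random intercepts in (-1, 1) and slopes in (-10, 10); the mix of positive
# and negative slopes implements the push-pull mapping described in the text.
intercepts = rng.uniform(-1.0, 1.0, N_P)
slopes = rng.uniform(-10.0, 10.0, N_P)

def encode_proprioceptive(theta):
    """Rectified linear population response to a joint or eye angle (Eq. 10)."""
    return np.maximum(intercepts + slopes * theta, 0.0)  # [.]_+ rectification
```

Because slopes of both signs are present, any angle drives some units up and others down to zero, which makes the mapping from angle to population activity unambiguous.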

Hidden layers

Our network contained two hidden layers: the SIL, to which all the sensory units project, and the MSL, which receives input from the previous layer and contains the estimated hand position in population codes. We used a sigmoid transfer function for both hidden layers to mimic the nonlinear transfer function of real neurons (Naka and Rushton, 1966). Specifically, we used the following equation to characterize the input ($x$) to output $a(x)$ relationship:

$$a(x) = \frac{1}{1 + e^{-x}} \quad (11)$$

Read out

As mentioned before, the goal of our network was to estimate hand position by integrating visual and proprioceptive information while accounting for different eye orientations. In the Task section, we provided the theoretical relationship between the input information and the estimated hand position $(\mu_o, \sigma_o^2)$, which is in accordance with behavioral (psychophysics) studies. Our network (presumably) generated the estimated hand position in population coding format at the MSL hidden layer. To translate the population code into its behavioral (psychophysics) counterpart, we added a linear layer to our network with two units, one for the mean and one for the variance of the estimated hand position. The logic was that if the MSL represented correct estimates in a distributed fashion, then we should be able to read them out directly (as other brain areas would). Therefore, our network automatically learns to map the population code to the desired mean and variance outputs.
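Putting the pieces together, the forward pass through the four layers can be sketched as below. The weights here are random placeholders standing in for an untrained network, purely to show the data flow; in the actual model they are learned by back-propagation (see Data generation and network training), and the layer sizes follow the text (64 units per hidden layer, three sensory groups assumed to have 64 units each):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    """Hidden-layer transfer function (Eq. 11)."""
    return 1.0 / (1.0 + np.exp(-x))

N_IN, N_HID = 3 * 64, 64  # three sensory groups of 64 units each (assumed sizes)

# Placeholder (untrained) weights; training would fit these.
W_sil = rng.normal(0.0, 0.1, (N_HID, N_IN))   # input -> SIL
W_msl = rng.normal(0.0, 0.1, (N_HID, N_HID))  # SIL -> MSL
W_out = rng.normal(0.0, 0.1, (2, N_HID))      # MSL -> [mean, variance] readout

def forward(inputs):
    sil = sigmoid(W_sil @ inputs)  # sensory input layer
    msl = sigmoid(W_msl @ sil)     # multisensory layer (population code)
    return W_out @ msl             # linear readout: estimated mean and variance
```

Note that every operation is a weighted sum followed by a sigmoid; there are no explicit division or multiplication stages between units, which is the architectural point of the paper.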

Data generation and network training

All the sensory input signals (visual, proprioceptive, and eye) were generated using the random number generator in MATLAB 2024 (randn.m). Specifically, we used normal distributions N(0,15), N(0,18), and N(0,20) for the visual, proprioceptive, and eye positions, respectively. Similarly, the visual, proprioceptive, and eye variances were randomly sampled from the ranges (1,40), (1,64), and (1,64), respectively. The output was generated based on the theoretical framework provided in the Task section. We generated 50,000 trials for training the network. A resilient back-propagation method (Riedmiller and Braun, 1993) was used to train the network.
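The data-generation step can be sketched in Python (the original used MATLAB's randn; we read N(0, s) here as mean 0 and standard deviation s, which is an assumption on our part, and the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def make_dataset(n_trials):
    """Sample training inputs and analytical Bayesian targets (Eqs. 5-6)."""
    # Sensory means, as in the text.
    mu_v = rng.normal(0.0, 15.0, n_trials)
    mu_p = rng.normal(0.0, 18.0, n_trials)
    mu_e = rng.normal(0.0, 20.0, n_trials)
    # Sensory variances, uniformly sampled from the stated ranges.
    var_v = rng.uniform(1.0, 40.0, n_trials)
    var_p = rng.uniform(1.0, 64.0, n_trials)
    var_e = rng.uniform(1.0, 64.0, n_trials)
    # Analytical targets from Eqs. 5-6.
    var_o = (var_v + var_e) * var_p / (var_v + var_e + var_p)
    mu_o = var_o * ((mu_v + mu_e) / (var_v + var_e) + mu_p / var_p)
    return (mu_v, mu_p, mu_e, var_v, var_p, var_e), (mu_o, var_o)
```

Each trial's sensory draws would then be passed through the input encoders before training; the (mu_o, var_o) pair is the supervised target for the two readout units.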

Data analysis

We performed several analyses to assess the extent to which our network model and specifically our hidden layer units (HLUs) replicated the reported neuronal activities associated with MSI and reference frame transformations. These analyses were similar to those used in previous works (Avillac et al., 2005, 2007; Morgan et al., 2008; Blohm et al., 2009; Ohshiro et al., 2011; Murdison et al., 2019).

Network performance

A quantitative analysis was performed to assess how well the network estimated the position of the hand and the associated uncertainty (variance). To do so, we simulated our network with 5,000 randomly generated inputs that were not included in the training set. Then, we performed linear regression of the read-out values (mean and variance of hand position) against the values predicted by our analytical model to examine how well our network was able to marginalize over eye orientation and estimate hand position.

MSI empirical findings

In this work, we compared the activity of the units in the hidden layer of our network to the reported neuronal activity associated with MSI. To examine to what extent our network’s implementation is comparable to an explicit divisive normalization network, we used the same parameters as previous studies (Avillac et al., 2007; Morgan et al., 2008; Ohshiro et al., 2011) to quantify the activity of HLUs: additivity index (AI), response additivity (RA), and response enhancement (RE). These three parameters have been used in the literature to quantify the effect of MSI at the neuronal level. All parameters quantify how bimodal activity is different compared to the activity when each stimulus is presented alone. We selected these parameters to be able to compare our results with the current literature.

  1. AI: this index compares the bimodal response to the sum of the responses to the unimodal stimuli. AI > 1 indicates the extent to which the bimodal response is enhanced beyond the sum of the unimodal responses; AI < 1 indicates the extent to which the bimodal response is suppressed below that sum:

$$AI = \frac{R_{\text{bimodal}}}{R_{\text{unimodal},1} + R_{\text{unimodal},2}} \quad (12)$$

  2. RE or amplification index: it has been shown that neurons in multisensory areas generally respond most strongly to one of the stimuli (Avillac et al., 2007). This parameter compares how the response of a unit to its most effective stimulus changes when both stimuli are presented; in other words, it evaluates to what extent presenting the non-effective stimulus suppresses or enhances the response to the effective stimulus. A positive value represents multisensory enhancement, meaning that the second stimulus enhanced the neuron's response; a negative value represents multisensory suppression, meaning that the second stimulus suppressed it:

$$RE = \frac{R_{\text{bimodal}} - \max(R_{\text{unimodal},1}, R_{\text{unimodal},2})}{R_{\text{bimodal}} + \max(R_{\text{unimodal},1}, R_{\text{unimodal},2})} \quad (13)$$

  3. RA: this parameter compares the bimodal activity to the arithmetic sum of the unimodal activities. A positive value of RA represents superadditivity, meaning the bimodal response exceeds the sum of the responses to the individual stimuli; a negative value represents subadditivity, meaning the bimodal response falls below that sum:

$$RA = \frac{R_{\text{bimodal}} - (R_{\text{unimodal},1} + R_{\text{unimodal},2})}{R_{\text{bimodal}} + (R_{\text{unimodal},1} + R_{\text{unimodal},2})} \quad (14)$$
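The three indices are straightforward to compute from a unit's bimodal and unimodal responses; a minimal sketch (the function name is ours):

```python
def msi_indices(r_bi, r_uni1, r_uni2):
    """Additivity index, response enhancement, and response additivity
    (Eqs. 12-14) for one unit's bimodal and unimodal responses."""
    ai = r_bi / (r_uni1 + r_uni2)                                  # Eq. 12
    r_max = max(r_uni1, r_uni2)
    re = (r_bi - r_max) / (r_bi + r_max)                           # Eq. 13
    ra = (r_bi - (r_uni1 + r_uni2)) / (r_bi + (r_uni1 + r_uni2))   # Eq. 14
    return ai, re, ra
```

For example, a bimodal response of 30 against unimodal responses of 10 and 10 gives AI = 1.5, RE = 0.5, and RA = 0.2: superadditive enhancement.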

Code accessibility

Code is available on Zenodo (https://zenodo.org/records/14213670).

Results

In this study, we propose that a feedforward neural network trained to perform MSI across reference frames is functionally similar to an explicit divisive normalization model. To support our hypothesis, we first examine whether our network is capable of performing the MSI task across reference frames. In the next step, we compare the activity of our network units in the hidden layers with the reported neuronal activities from multisensory areas in the brain. Specifically, we investigate whether our network's predictions are comparable with those of an explicit divisive normalization model (Ohshiro et al., 2011). Finally, we explore the possible mechanisms our network might use to perform MSI.

Network performance

First, we evaluated the performance of our network to confirm that it had learned to perform the relevant aspects of the task before analyzing the activity of the hidden units. To do so, we compared the predicted mean and variance of the hand position with the analytical solution of Bayesian integration (Fig. 2). The regression analysis of the position revealed that the predicted values of our network match the desired values with good precision (Fig. 2A, slope = 0.99 for the regression fit and R² = 0.89). Additionally, we calculated the error in hand position estimation and observed that the mean error was relatively small (Fig. 2B: μ ≈ 7°, σ ≈ 6°). Similar results were observed for the variance of the hand position estimate (Fig. 2C,D). These results indicate that our network accounted for the added uncertainty due to coordinate transformations and incorporated this uncertainty into the integration process.

Figure 2.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 2.

Network performance. A, Hand position predicted by the network compared to the desired position derived from the analytical solution. B, Distribution of the position estimation errors produced by the network for a random set of test inputs (visual and proprioceptive hand positions and proprioceptive eye orientations). C, Predicted uncertainty associated with the hand position estimate compared to the desired value predicted by the analytical solution. D, Distribution of the uncertainty estimation errors produced by the network for a random set of test inputs.

After establishing that our network is able to perform MSI, we examined whether the units of the network mimic the neural activations reported in brain regions involved in MSI (e.g., VIP and MSTd).

Inverse effectiveness

Numerous studies showed that neurons in multisensory areas show enhanced responses to bimodal inputs (multisensory enhancement) versus unimodal inputs (Meredith and Stein, 1986; Perrault et al., 2003; Stanford et al., 2005; Stein and Stanford, 2008). In other words, the multisensory response is stronger than the response to each individual stimulus. However, this enhancement is not linear and follows the inverse effectiveness principle. Inverse effectiveness is a reported phenomenon in multisensory areas of the brain whereby multisensory enhancement becomes stronger when both stimuli are weak and decreases with increasing input intensity (Stanford et al., 2005; Stein and Stanford, 2008). Ohshiro et al. (2011) quantified this phenomenon through their divisive normalization model. They calculated the additivity index by dividing the bimodal response by the arithmetic sum of unimodal responses and showed that AI is larger for weak bimodal inputs. This means that multisensory enhancement of weak inputs exceeds the sum of activities measured from single inputs (superadditivity). For strong inputs, however, this enhancement disappears and activity is equal to (additivity) or lower than (subadditivity) the arithmetic sum of activities produced by single inputs (Ohshiro et al., 2011).

To assess whether the units of our network demonstrated similar behavior, we varied the intensity (i.e., reliability in this paper) of our visual and proprioceptive inputs by varying their variances (see Materials and Methods for further details). Following Ohshiro et al. (2011), we calculated the AI for each unit in the multisensory layer by dividing the multisensory response by the sum of the two unimodal responses and show the results for example units in Figure 3. In agreement with the divisive normalization model (Ohshiro et al., 2011), as the reliability of the unimodal inputs increases, the AI decreases, showing less multisensory enhancement for more reliable inputs. This inverse effectiveness phenomenon can be seen in Figure 3 in units with different bimodal response behavior.

Figure 3.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 3.

Inverse effectiveness in the multisensory layer (MSL). A, Additivity index for sample units in the MSL. We calculated the additivity index of each unit for different intensities (i.e., reliabilities) of our visual and proprioceptive inputs by varying their variances. In our model, reliability directly modulates neural response strength: lower reliability results in lower response magnitude. These units display superadditivity (AI > 1) only when the reliability of both stimuli is very low, which demonstrates the principle of inverse effectiveness. B, Corresponding bimodal responses of sample units with different input reliabilities. The inverse effectiveness phenomenon can be seen in units with different bimodal response behavior.

Our model displays superadditivity (AI > 1) only when the reliability of both stimuli is very low (Fig. 3A). As the reliability of both stimuli increases, the multisensory response becomes additive or sub-additive (AI ≤ 1). This observation in the MSL of our network demonstrates the principle of inverse effectiveness, which aligns with empirical findings reported from multisensory areas of the brain such as the superior colliculus (Perrault et al., 2005; Alvarado et al., 2007). Note that this holds only for the MSL; the units in the SIL of our model do not produce inverse effectiveness. Similar to the explicit divisive normalization model (Ohshiro et al., 2011), our network reveals inverse effectiveness regardless of whether weak inputs (inputs with low reliability) produce superadditivity, as shown in Figure 3B with different bimodal response behaviors (Perrault et al., 2005; Stanford et al., 2005).

In addition, we used a complementary approach, similar to Avillac et al. (2007), to further assess inverse effectiveness. Avillac et al. (2007) calculated an amplification index, the percentage of response enhancement for integrative neurons in the VIP region. They plotted this index as a function of the dominant unimodal response to the same input for units with enhanced multisensory responses (Fig. 4B) and observed that as the dominant unimodal response increases, the amplification index decreases, an indicator of the inverse effectiveness principle. We calculated the same index for all enhanced units in the MSL (where each unit is represented by a unique combination of color and marker) across different input conditions and plotted it against the dominant unimodal response (Fig. 4A). The black line represents a regression fit across all units (slope = −35.33, R² = 0.44), showing a clear negative trend consistent with inverse effectiveness. To better illustrate unit-specific behaviors, we added individual regression lines (gray) for each unit's responses. These unit-specific trends further support that our model captures the inverse effectiveness principle observed in VIP neurons, showing that the amplification index consistently decreases as the dominant unimodal response increases.

Figure 4.
  • Download figure
  • Open in new tab
  • Download powerpoint
Figure 4.

Inverse effectiveness principle in our model and data (VIP). A, Amplification index plotted as a function of the dominant unimodal response for integrative units in our multisensory layer (MSL) showing enhancive responses (n = 33 of 64 total units). Each unique combination of color and marker represents one unit, with multiple data points reflecting responses across different reliability conditions. Population regression fit (black line: slope = −35.33, R2 = 0.44) and individual unit regression fits (gray lines) both demonstrate inverse effectiveness. B, Corresponding data for integrative neurons in area VIP with enhancive multisensory responses (replotted with permission from Avillac et al., 2007).

One additional finding is that the distribution of units in Figure 4A shows a dichotomous pattern, with units clustering into two groups. Upon closer examination, some degree of clustering is also present in the VIP data (Fig. 4B), though it appears more continuous due to having only one data point per neuron versus our multiple data points per unit across different reliability conditions. The more pronounced dichotomy in our model might reflect broader parameter space sampling, as our computational analysis examines responses across much larger variance ranges than typically accessible in experimental studies. To test this hypothesis, we trained a network on narrower variance ranges (approximately 0.25–9 deg²) and found more continuous distributions (slope = −24.98, R² = 0.69) resembling VIP data, while our original network trained on broader variance ranges (0.25–50 deg²) maintained the dichotomous clustering pattern even when tested with narrow variance values (slope = −13.83, R² = 0.25). Importantly, all empirical principles of MSI reported in this study were maintained in the narrow-range trained network, confirming that our computational findings are robust across different parameter regimes.

Cross-modal enhancement and suppression in bimodal and unimodal responses

Neurophysiological recordings in multisensory regions [such as the superior colliculus (SC) and area VIP] have shown that neurons exhibit both multisensory enhancement and suppression (Meredith and Stein, 1986; Avillac et al., 2007). Among these studies, Avillac et al. (2007) quantified neuronal behavior in area VIP of monkeys using both response enhancement (RE) and response additivity (RA; see Materials and Methods for further details). Using these two parameters, Avillac et al. (2007) showed that VIP neurons act heterogeneously, exhibiting both cross-modal enhancement and suppression along with nonlinear super- and sub-additivity (Fig. 5C). To examine whether our model replicates these behaviors, we simulated the trained network performing MSI while varying the retinal and proprioceptive information of the hand for a fixed eye position and eye-signal reliability. We selected 11 uniformly distributed positions for both the retinal and proprioceptive inputs within the range of trained positions. We then calculated both RE and RA for all possible combinations of retinal and proprioceptive hand-position information, as shown in Figure 5A,B for both the MSL and SIL layers. As Figure 5A illustrates, units in the SIL layer show only cross-modal suppression and subadditivity. In contrast, units in the MSL layer replicated the variability reported for VIP neurons (Fig. 5B). To investigate MSL units further, we plotted the enhancement and additivity of bimodal responses as a function of the reliability of the retinal and proprioceptive hand information. The average of both the enhancement and additivity indices decreases for more reliable sensory information, which is also consistent with inverse effectiveness (Fig. 5D).
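The precise definitions of RE and RA are given in Materials and Methods; as an illustrative sketch only, the two indices in the form commonly used in this literature (following Avillac et al., 2007; function and variable names are ours) can be written as:

```python
def response_enhancement(bimodal, vis, prop):
    """Percent change of the bimodal response relative to the strongest
    unimodal response (positive = cross-modal enhancement,
    negative = cross-modal suppression). Illustrative formulation only."""
    best_unimodal = max(vis, prop)
    return 100.0 * (bimodal - best_unimodal) / best_unimodal

def additivity_index(bimodal, vis, prop):
    """Bimodal response relative to the sum of the unimodal responses
    (>1 super-additive, <1 sub-additive). Illustrative formulation only."""
    return bimodal / (vis + prop)

# A unit whose bimodal response exceeds its best unimodal response but
# falls short of the unimodal sum: enhanced yet sub-additive.
print(response_enhancement(12.0, 10.0, 6.0))  # 20.0
print(additivity_index(12.0, 10.0, 6.0))      # 0.75
```

With these two axes, a unit can land in any quadrant of Figure 5A–C: enhanced and super-additive, enhanced and sub-additive, or suppressed and sub-additive.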

Figure 5.

Multisensory enhancement and suppression in our model. A, Units in the SIL layer of our network showed only cross-modal suppression. B, Units in the MSL layer showed behavior comparable to recordings from area VIP and to predictions of divisive normalization. C, Summary of the reported data from area VIP (Avillac et al., 2007). D, Response enhancement and additivity plotted as a function of stimulus reliability; the average of both indices decreases with increasing stimulus reliability in MSL units. E, Recordings from sample units of our MSL layer revealed units responsive to both stimuli or to only one stimulus (i.e., bimodal and unimodal units). F, A comparison between sample units of our MSL layer and neurons in area VIP (Avillac et al., 2007) showing different responsiveness to the visual stimulus; presenting the proprioceptive stimulus alongside the visual stimulus resulted in suppressed bimodal activity (cross-modal suppression) or in cross-modal enhancement. V, P: visual and proprioceptive unimodal responses; VP: bimodal response; V+P: sum of unimodal responses.

In addition to the observed population heterogeneity, Avillac et al. (2007) found that neurons in area VIP can be categorized as bimodal or unimodal: bimodal neurons respond to either stimulus when presented alone, whereas unimodal neurons respond to only one. Interestingly, unimodal neurons showed multisensory enhancement and suppression similar to bimodal neurons. As shown in Figure 5B, the MSL of our network replicated the population behavior of VIP neurons; however, that population behavior could, in principle, arise from bimodal units alone. We therefore analyzed individual units in the MSL of our simulated network to unravel each unit’s behavior (Fig. 5E). Figure 5E illustrates that the MSL layer contains some units behaving like unimodal neurons as well as others behaving like bimodal neurons; for example, Figure 5E, top right, shows a unimodal unit, and Figure 5E, top left, shows a bimodal unit. Moreover, unimodal units exhibit both cross-modal suppression and cross-modal enhancement (Fig. 5F). In general, the MSL units of our network replicate both the population and individual neuronal activity reported in area VIP (Avillac et al., 2007).

MSI and cue reliability

Extensive work has shown that MSI at the behavioral level can be explained within a Bayesian framework (Ernst and Banks, 2002). In the special case where both modalities have independent Gaussian distributions, cue combination can be modeled as a weighted sum of the two stimuli, with the weights determined by stimulus reliability (Landy et al., 1995; Atkins et al., 2001; Landy and Kojima, 2001; Ernst and Banks, 2002; Ernst and Bülthoff, 2004; Kersten et al., 2004; Knill and Pouget, 2004; Körding and Wolpert, 2004; Stein and Stanford, 2008). That is, the stimulus with higher reliability receives a higher weight (equivalently, a larger contribution) in the integration. Interestingly, Morgan et al. (2008) established that MSI in the macaque visual cortex depends on stimulus reliability and showed that modulating cue reliability changes the activity of multisensory neurons in area MSTd. More specifically, they showed that neuronal responses to bimodal stimuli are well explained by a weighted sum of the responses to unimodal stimuli, with weights determined by cue reliability: lower reliability of one stimulus results in a lower weight in the integration (Morgan et al., 2008).
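The reliability-weighted combination rule described above can be sketched directly for the independent-Gaussian case (function and variable names are ours):

```python
def combine_cues(mu_vis, var_vis, mu_prop, var_prop):
    """Maximum-likelihood fusion of two independent Gaussian cues:
    each cue is weighted by its reliability (inverse variance)."""
    r_vis, r_prop = 1.0 / var_vis, 1.0 / var_prop
    w_vis = r_vis / (r_vis + r_prop)
    w_prop = r_prop / (r_vis + r_prop)
    mu = w_vis * mu_vis + w_prop * mu_prop
    var = 1.0 / (r_vis + r_prop)   # fused variance never exceeds either cue's
    return mu, var

# Equal variances -> equal weights; the fused estimate is the simple mean.
mu, var = combine_cues(10.0, 4.0, 14.0, 4.0)
print(mu, var)  # 12.0 2.0
```

Making one cue more reliable (smaller variance) pulls the fused estimate toward it, which is the behavioral signature that Morgan et al. (2008) traced to the neuronal level.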

To examine whether such a rule also holds among individual units in our network, we simulated the trained network while varying the reliability of one sensory stimulus (here, vision) and keeping the other fixed. We fixed eye position for this simulation, selected four levels of visual reliability (from low to high), and fixed the proprioceptive reliability at the medium level. Morgan et al. (2008) showed that the activity of neurons in area MSTd changes as visual coherence (i.e., reliability) decreases. Figure 6A,B shows the RF (defined here as the pattern of neuronal response across the space of sensory inputs) of a sample unit from the SIL layer of our network for low and high visual reliability, respectively. The response of this sample unit changed with the reliability of the visual information, and a sample unit in the MSL of our network displayed similar behavior (Fig. 6C,D). In other words, the activity of units in our network is modulated by varying the reliability of visual information. This is evidence that different sensory inputs are combined through gain modulation to perform reference frame transformations in MSI tasks (Blohm and Crawford, 2009; Blohm, 2012). To quantify cue reliability at the population level, we fitted the bimodal response of each network unit with a weighted sum of the two unimodal responses:

R_bimodal(vis, prop) = w_vis × R_vis + w_prop × R_prop. (15)

The visual and proprioceptive weights were calculated for each visual reliability. The linear fit approximated the responses of the network units well, with average R² values of 0.68, 0.67, 0.69, and 0.70 for the four simulated reliabilities, respectively. Similar to Morgan et al. (2008), increasing the visual reliability increased the visual weights (Fig. 6E) and correspondingly decreased the proprioceptive weights (Fig. 6F). These results are comparable to reported data from area MSTd and to the predictions of the explicit divisive normalization model (Ohshiro et al., 2011).
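The weighted-sum fit of Equation 15 amounts to ordinary least squares over matched samples of unimodal and bimodal responses. The sketch below (function and variable names are ours) recovers the weights from synthetic responses with known weights:

```python
import numpy as np

def fit_bimodal_weights(r_vis, r_prop, r_bimodal):
    """Fit R_bimodal ~ w_vis * R_vis + w_prop * R_prop (Eq. 15)
    by least squares; return the two weights and the fit's R^2."""
    X = np.column_stack([np.ravel(r_vis), np.ravel(r_prop)])
    y = np.ravel(r_bimodal)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w
    r2 = 1.0 - resid.var() / y.var()
    return w[0], w[1], r2

# Synthetic unit: bimodal response built from known weights 0.7 and 0.3.
rng = np.random.default_rng(0)
r_vis = rng.random(200)
r_prop = rng.random(200)
r_bimodal = 0.7 * r_vis + 0.3 * r_prop
w_vis, w_prop, r2 = fit_bimodal_weights(r_vis, r_prop, r_bimodal)
print(round(w_vis, 2), round(w_prop, 2))  # 0.7 0.3
```

Repeating such a fit at each visual-reliability level yields the weight curves of Figure 6E,F.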

Figure 6.

Cue reliability modulates multisensory integration responses. A–D, Receptive fields of sample units in the SIL (A, B) and MSL (C, D) layers for two different visual reliabilities. The activity of units in both layers is modulated by varying the reliability of visual information. E, F, Increasing visual reliability resulted in increased visual weights (E) and decreased proprioceptive weights (F).

Gain modulation and RF shifts in MSI

Previous studies have suggested that gain modulation could serve as a general-purpose mechanism to implement MSI across different reference frames (Andersen et al., 1985; Duhamel et al., 1997; Salinas and Sejnowski, 2001; Deneve and Pouget, 2004; Avillac et al., 2005; Schlack et al., 2005; Ma et al., 2006; Blohm and Crawford, 2009; Blohm et al., 2009; Chang et al., 2009; Blohm, 2012; Murdison et al., 2015). Gain fields are defined as a multiplicative factor that scales the RF of a neuron up or down (Blohm and Crawford, 2009). Having observed cue-reliability effects in units of our network, we can evaluate gain modulation and compute gain indices for our units. To do so, we calculated the RF of each unit, as in Figure 6A–D, by varying the retinal and proprioceptive hand positions for different eye rotations along with the associated variabilities. We observed that units in both the SIL and the MSL are modulated by different visual or proprioceptive positions, thus mimicking RF-like behavior (Fig. 7A–J, top panels). Next, we computed the gain modulation index across the whole RF (Blohm, 2012):

Gain Index = [max(R_bimodal) − min(R_bimodal)] / [max(R_bimodal) + min(R_bimodal)]. (16)

Figure 7A–J, bottom panels, shows the distribution of gain indices for all units while varying different parameters, such as eye orientation, in both the SIL and MSL layers. This observed gain modulation could potentially explain how our network performs MSI across different reference frames.
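Equation 16 reduces to a few lines; a minimal sketch (function name is ours), applied to the sampled RF of a unit:

```python
import numpy as np

def gain_index(rf):
    """Gain modulation index over a receptive field (Eq. 16):
    (max - min) / (max + min), from 0 (no modulation) to 1 (full)."""
    r = np.asarray(rf, dtype=float)
    return (r.max() - r.min()) / (r.max() + r.min())

print(gain_index([2.0, 4.0, 6.0]))   # 0.5
print(gain_index([5.0, 5.0, 5.0]))   # 0.0
```

Because it is normalized by the response range itself, the index compares modulation depth across units with very different firing scales.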

Figure 7.

Gain modulation is characterized as up-/down-regulation of an RF by a secondary input without shifting the RF location. Top panels show example units; bottom panels show population responses. Units displayed gain modulation in both layers of the network; additionally, units in the MSL showed a combination of RF shifts and gain modulation. A–D, Varying proprioceptive and eye position information caused gain modulation of the visual RFs of units in both the SIL and the MSL. E–J, Varying the variability of visual, proprioceptive, and eye orientation information caused gain modulation of visual RFs of units throughout the network.

A behavioral study combined with single-cell recordings by Zeng et al. (2023) investigated RF shifts in neurons within area VIP. It found that visual RF shifts often do not align with perceptual shifts, exhibiting opposite patterns, whereas vestibular RF shifts tend to align with perceptual shifts. While our study lacks a behavioral component, we examined RF shifts within our network to explore how varying eye position influences visual and proprioceptive RF shifts across layers.

Our results revealed distinct patterns of RF shifts in the SIL and MSL layers, as shown in Figure 8A,B. RF shifts were quantified as the slope of a linear regression fitted to unit responses as a function of eye position (ranging from −45° to 45°), while keeping the other sensory input, either proprioceptive or visual, fixed. Figure 8C illustrates two units, showing that the shifts in proprioceptive and visual RFs due to changes in eye position can occur in alignment or in opposite directions. Furthermore, Figure 8A,B presents the distribution of RF shifts across all units in both layers. In particular, RF shifts were not significant within the SIL layer (Wilcoxon test; p > 0.05), confirming that such shifts are a hallmark of multisensory processing and do not occur at non-multisensory stages. Interestingly, we observed various combinations of RF shifts in the MSL layer: some units displayed opposing shifts, while others shifted in the same direction. These findings suggest that the joint distribution of RF shifts emerges as an inherent property of MSI through gain modulation within our network, consistent with the experimental observations of Zeng et al. (2023).
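The slope-based quantification of RF shifts can be sketched as follows (variable names are ours; in the actual analysis, the network's measured RF centers would replace the synthetic ones):

```python
import numpy as np

def rf_shift_slope(eye_positions, rf_centers):
    """Slope of RF center versus eye position (deg/deg).
    A slope near 0 means no shift; 1 means the RF moves with the eye."""
    slope, _intercept = np.polyfit(eye_positions, rf_centers, 1)
    return slope

eye = np.array([-45.0, -22.5, 0.0, 22.5, 45.0])
# A synthetic RF that shifts half as far as the eye moves:
centers = 0.5 * eye + 3.0
print(round(rf_shift_slope(eye, centers), 3))  # 0.5
```

Computing this slope separately for the visual and proprioceptive RFs of each unit gives the pair of shift values whose joint distribution is shown in Figure 8A,B.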

Figure 8.

Visual and proprioceptive RF shifts induced by varying eye position in the layers of our network. A, RF shifts in the sensory input layer and, B, the multisensory layer. C, Visual and proprioceptive RF centers for different eye positions in two sample units of the multisensory layer, showing that the RF shift is not always in the same direction for visual and proprioceptive inputs; opposing shifts can be observed in some multisensory-layer units.

Discussion

We proposed a distributed neuronal computation model to perform statistical inference. As a proof of concept, we trained a feedforward neural network to implement MSI across different coordinate frames and showed that the network performs the inference task with a small number of units and without the neatly organized, regular connectivity structure required by explicit divisive normalization models. Furthermore, analyzing the emergent properties of our network revealed similarities with reported neurophysiological behaviors of neurons in multisensory areas (SC, MSTd, and VIP), reproducing inverse effectiveness as well as cross-modal enhancement and suppression. Although we compared our model to data from VIP and MSTd (which primarily process self-motion rather than hand movements), this comparison validates that our network captures general computational principles of MSI; future work could adapt our approach to data from areas more directly involved in reach planning, such as the parietal reach region or premotor cortex. Overall, this provides an interpretable explanation for distributed MSI across reference frames.

Critical comparison with other models

Many previous works have treated MSI and coordinate transformation as examples of statistical inference (Deneve and Pouget, 2004; Ma et al., 2006; Beck et al., 2011). These studies suggest that populations of neurons with probabilistic population codes (PPCs) represent probability distributions of stimuli rather than deterministic values. This particular class of neural networks, basis function networks with multidimensional attractors, can implement probabilistic inference for many tasks. Later work established that for a specific class of neuronal variability, i.e., independent Poisson noise, there is a closed-form solution for tasks such as cue integration (Deneve et al., 2001; Ma et al., 2006): cue integration can be implemented as a simple neuron-by-neuron summation of sensory inputs. Similarly, we deployed PPCs to code sensory information and assumed that populations of neurons encode probability distributions rather than single values. However, we did not impose representations; instead, we allowed our network to learn its feedforward weights and, consequently, its own basis functions, resulting in fundamental differences between our model and basis function networks. The first key distinction is that our network can perform the task with significantly fewer units and connections, resulting in lower computational complexity. For instance, the networks proposed by Deneve et al. (2001) and Beck et al. (2011) require N^(d·s) = 10^12 units for combining s = 2 cues in 3-D space (d = 3), assuming each dimension is spanned by N = 100 units. In contrast, our network requires only N·d·s ≈ 600 units to perform the same task, albeit with slightly lower accuracy. Furthermore, unlike basis function networks, our network supports different population coding representations for sensory inputs and outputs.
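The exponential-versus-linear unit-count comparison can be checked directly:

```python
# Basis-function networks need units for every joint combination of
# dimensions (d) and cues (s), each spanned by N units: N**(d*s).
# A feedforward scheme needs only one population per cue per
# dimension: N*d*s.
N, d, s = 100, 3, 2
basis_function_units = N ** (d * s)
feedforward_units = N * d * s
print(basis_function_units)  # 1000000000000 (10^12)
print(feedforward_units)     # 600
```

The gap widens rapidly with each added cue or spatial dimension, which is the scalability argument made above.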
Finally, while basis function networks (Pouget and Snyder, 2000; Deneve et al., 2001; Deneve and Pouget, 2004) depend on strictly constrained connectivity structures between neurons, our network operates effectively without such restrictions.

As an alternative class of models for MSI, explicit divisive normalization networks have been proposed (Ohshiro et al., 2011). Unlike the PPC models discussed above, these models replicate many reported empirical principles of MSI, including inverse effectiveness and cross-modal suppression, at the neuronal level. However, divisive normalization models, like PPC models, require precise connectivity configurations and a substantial number of synaptic connections for the normalization pool. More generally, explicit implementation of divisive normalization results in intractable computations with virtually infinite synaptic connectivity (Pitkow and Angelaki, 2017). Compared with explicit divisive normalization, our model goes further by integrating inputs with fundamentally different neural coding schemes and reproduces the reported multisensory empirical principles without any explicit divisive operations and with considerably lower complexity (i.e., fewer required units and connections).

A third category of MSI models uses dynamic neural networks with lateral inhibition, subtractive normalization, and sigmoidal neuronal transfer functions to reproduce both inverse effectiveness and cross-modal suppression (Magosso et al., 2008; Ursino et al., 2009). In such networks, inverse effectiveness results mostly from the sigmoidal nonlinearity of the neurons, while cross-modal suppression results from lateral inhibition in the MSL. Our network is similar, but we show that feedforward connectivity is sufficient to produce MSI while still replicating the reported cross-modal suppression. Similarly, Ghosh et al. (2024) proposed a spiking neural network with nonlinear transfer functions demonstrating that nonlinear fusion is optimal for multisensory detection tasks. While their findings align with our work, their model focused on multimodal detection tasks and assumed identical representations across sensory channels. This contrasts with neuroscientific evidence showing that different modalities employ distinct representational coding schemes across reference frames (Zipser and Andersen, 1988; Crawford et al., 2004).

Model implications

Our network replicates the empirical principles of MSI exhibited by single neurons and is comparable to a neural network implementing explicit divisive normalization (Ohshiro et al., 2011). This implies that divisive normalization can be seen as an emergent functional characteristic of a system performing distributed statistical inferences with noisy inputs, without explicit divisive normalization. One of the main aspects of inferences in complex environments is marginalizing out the nuisance parameters. For example, in our model, eye orientation does not directly affect the cue combination; however, failing to account for eye orientation results in mis-estimation of hand position. This observation might explain the discrepancies between the PPC framework (Ma et al., 2006) and the explicit divisive normalization model (Ohshiro et al., 2011). Specifically, linear summation across individual sensory coding neurons in the PPC framework fails to explain the empirical principles of MSI (i.e., non-linearities) at the neuronal level. Meanwhile, explicit divisive normalization can successfully replicate the empirical principles at the single neuron level. Based on our model, the observed non-linearities are emergent properties of the marginalization operations that are crucial for cue integration. Our network implicitly implements divisive normalization with purely additive, feedforward computations whose weights are adapted during learning (Chalk et al., 2018). This resulted in fixed decoding (read-out layer) and context-dependent dynamic RFs in our network (gain modulation in Fig. 7 and cue reliability in Fig. 6). Intriguingly, such dynamic RFs have been reported in different areas of the brain (Kabara and Bonds, 2001; Yeh et al., 2009; Fournier et al., 2011; Trott and Born, 2015). This suggests marginalization through implicit divisive normalization might be a neuronal mechanism that makes the population activity decodable by downstream areas. 
This could explain the prevalence of apparent divisive normalization in many areas of the brain, such as the primary visual cortex, the olfactory system, and the hippocampus, and supports the speculation that it acts as a canonical operation throughout the cortex (Carandini and Heeger, 2011).

Model limitations and potential future work

Our model has several limitations that can be the subject of future work. First, we purposefully used a very simple network as a proof of concept: all units have the same sigmoid transfer function, and the network contains only feedforward connections. Future extensions of the model might shed light on the functional role of heterogeneity in connectivity structures and cell types (Burkhalter and Bernardo, 1989; Darmanis et al., 2015; Harris and Shepherd, 2015). Furthermore, our model only considered a specific class of PPCs in which neuronal activity (with Poisson variability) scales proportionally with sensory reliability. Reliability was defined as the inverse variance of sensory inputs to be comparable with experimental studies (Fetsch et al., 2011). While this class of population coding is reported in many brain areas (Seung and Sompolinsky, 1993; Salinas and Abbott, 1994; Sanger, 1996; Snippe, 1996; Zemel et al., 1998; Wu et al., 2001), other classes of population coding have also been suggested (Krekelberg et al., 2006; Morgan et al., 2008; Fetsch et al., 2011). Orhan and Ma (2017) implemented several statistical inferences with generative neural networks and showed their network performs the inferences without any constraint on the type of coding mechanisms used for sensory information. Similarly, we expect that changing the population coding should not affect the network’s ability to perform the task. However, it is not clear how different input and output coding schemes influence the specific solutions that a network generates to perform the MSI task.

Lastly, we trained our network using back-propagation methods. There is a long-lasting debate in the literature on the feasibility of physiological systems to implement back-propagation (Grossberg, 1987; Crick, 1989), mainly due to its nonlocality characteristics. However, recent work provided evidence that back-propagation can be estimated using more physiologically feasible functions (Bengio et al., 2015; Lillicrap et al., 2016). It would be valuable to investigate how much the results obtained depend on the exact training algorithm. However, we do not expect fundamental differences in the network’s performance, as previous studies have shown that different optimization methods tend to converge to similar solutions when reaching the global minimum (Blohm et al., 2009).

Conclusion

The results of this study reveal that MSI can be achieved in simple feedforward networks of purely additive units without the requirement of explicit divisive normalization. This is in line with the current view that the observed neuronal activities can be explained as emergent properties of the optimization processes required for statistical inferences.

Footnotes

  • This work was supported by the Natural Sciences and Engineering Research Council of Canada and the Canada Foundation for Innovation.

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Arefeh Farahmandi at 21afna@queensu.ca.

SfN exclusive license.

References

  1. Alvarado JC, Vaughan JW, Stanford TR, Stein BE (2007) Multisensory versus unisensory integration: contrasting modes in the superior colliculus. J Neurophysiol 97:3193–3205. https://doi.org/10.1152/jn.00018.2007
  2. Andersen RA, Essick GK, Siegel RM (1985) Encoding of spatial location by posterior parietal neurons. Science 230:456–458. https://doi.org/10.1126/science.4048942
  3. Atkins JE, Fiser J, Jacobs RA (2001) Experience-dependent visual cue integration based on consistencies between visual and haptic percepts. Vision Res 41:449–461. https://doi.org/10.1016/S0042-6989(00)00254-6
  4. Avillac M, Denève S, Olivier E, Pouget A, Duhamel JR (2005) Reference frames for representing visual and tactile locations in parietal cortex. Nat Neurosci 8:941–949. https://doi.org/10.1038/nn1480
  5. Avillac M, Hamed SB, Duhamel JR (2007) Multisensory integration in the ventral intraparietal area of the macaque monkey. J Neurosci 27:1922–1932. https://doi.org/10.1523/JNEUROSCI.2646-06.2007
  6. Battaglia PW, Jacobs RA, Aslin RN (2003) Bayesian integration of visual and auditory signals for spatial localization. J Opt Soc Am A Opt Image Sci Vis 20:1391. https://doi.org/10.1364/JOSAA.20.001391
  7. Beck JM, Latham PE, Pouget A (2011) Marginalization in neural circuits with divisive normalization. J Neurosci 31:15310–15319. https://doi.org/10.1523/JNEUROSCI.1706-11.2011
  8. Beck JM, Ma WJ, Pitkow X, Latham PE, Pouget A (2012) Not noisy, just wrong: the role of suboptimal inference in behavioral variability. Neuron 74:30–39. https://doi.org/10.1016/j.neuron.2012.03.016
  9. Bengio Y, Lee DH, Bornschein J, Lin Z (2015) Towards biologically plausible deep learning. Preprint, arXiv:1502.04156.
  10. Blohm G (2012) Simulating the cortical 3D visuomotor transformation of reach depth. PLoS One 7:e41241. https://doi.org/10.1371/journal.pone.0041241
  11. Blohm G, Crawford JD (2007) Computations for geometrically accurate visually guided reaching in 3-D space. J Vis 7:1–22. https://doi.org/10.1167/7.5.4
  12. Blohm G, Crawford JD (2009) Fields of gain in the brain. Neuron 64:598–600. https://doi.org/10.1016/j.neuron.2009.11.022
  13. Blohm G, Keith GP, Crawford JD (2009) Decoding the cortical transformations for visually guided reaching in 3D space. Cereb Cortex 19:1372–1393. https://doi.org/10.1093/cercor/bhn177
  14. Burkhalter A, Bernardo KL (1989) Organization of corticocortical connections in human visual cortex. Proc Natl Acad Sci U S A 86:1071–1075. https://doi.org/10.1073/pnas.86.3.1071
  15. Carandini M, Heeger DJ (2011) Normalization as a canonical neural computation. Nat Rev Neurosci 13:51–62. https://doi.org/10.1038/nrn3136
  16. Chalk M, Marre O, Tkačik G (2018) Toward a unified theory of efficient, predictive, and sparse coding. Proc Natl Acad Sci U S A 115:186–191. https://doi.org/10.1073/pnas.1711114115
  17. Chang SW, Papadimitriou C, Snyder LH (2009) Using a compound gain field to compute a reach plan. Neuron 64:744–755. https://doi.org/10.1016/j.neuron.2009.11.005
  18. Crawford JD, Medendorp WP, Marotta JJ (2004) Spatial transformations for eye-hand coordination. J Neurophysiol 92:10–19. https://doi.org/10.1152/jn.00117.2004
  19. Crick F (1989) The recent excitement about neural networks. Nature 337:129–132. https://doi.org/10.1038/337129a0
  20. Darmanis S, Sloan SA, Zhang Y, Enge M, Caneda C, Shuer LM, Gephart MG, Barres BA, Quake SR (2015) A survey of human brain transcriptome diversity at the single cell level. Proc Natl Acad Sci U S A 112:7285–7290. https://doi.org/10.1073/pnas.1507125112
  21. Deneve S, Pouget A (2004) Bayesian multisensory integration and cross-modal spatial links. J Physiol Paris 98:249–258. https://doi.org/10.1016/j.jphysparis.2004.03.011
  22. Deneve S, Latham PE, Pouget A (2001) Efficient computation and cue integration with noisy population codes. Nat Neurosci 4:826–831. https://doi.org/10.1038/90541
  23. Duhamel JR, Bremmer F, BenHamed S, Graf W (1997) Spatial invariance of visual receptive fields in parietal cortex neurons. Nature 389:845–848. https://doi.org/10.1038/39865
  24. Ernst MO, Banks MS (2002) Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415:429–433. https://doi.org/10.1038/415429a
  25. Ernst MO, Bülthoff HH (2004) Merging the senses into a robust percept. Trends Cogn Sci 8:162–169. https://doi.org/10.1016/j.tics.2004.02.002
  26. Faisal AA, Selen LP, Wolpert DM (2008) Noise in the nervous system. Nat Rev Neurosci 9:292–303. https://doi.org/10.1038/nrn2258
  27. Fetsch CR, Pouget A, Deangelis GC, Angelaki DE (2011) Neural correlates of reliability-based cue weighting during multisensory integration. Nat Neurosci 15:146–154. https://doi.org/10.1038/nn.2983
  28. Fournier J, Monier C, Pananceau M, Frégnac Y (2011) Adaptation of the simple or complex nature of V1 receptive fields to visual statistics. Nat Neurosci 14:1053–1060. https://doi.org/10.1038/nn.2861
  29. Ghosh M, Béna G, Bormuth V, Goodman DF (2024) Nonlinear fusion is optimal for a wide class of multisensory tasks. PLoS Comput Biol 20:e1012246. https://doi.org/10.1371/journal.pcbi.1012246
  30. Grossberg S (1987) Competitive learning: from interactive activation to adaptive resonance. Cogn Sci 11:23–63. https://doi.org/10.1111/j.1551-6708.1987.tb00862.x
  31. Harris KD, Shepherd GM (2015) The neocortical circuit: themes and variations. Nat Neurosci 18:170–181. https://doi.org/10.1038/nn.3917
  32. Hillis JM, Watt SJ, Landy MS, Banks MS (2004) Slant from texture and disparity cues: optimal cue combination. J Vis 4:967–992. https://doi.org/10.1167/4.12.1
  33. Kabara JF, Bonds AB (2001) Modification of response functions of cat visual cortical cells by spatially congruent perturbing stimuli. J Neurophysiol 86:2703–2714. https://doi.org/10.1152/jn.2001.86.6.2703
  34. Kersten D, Mamassian P, Yuille A (2004) Object perception as Bayesian inference. Annu Rev Psychol 55:271–304. https://doi.org/10.1146/annurev.psych.55.090902.142005
  35. Knill DC, Kersten D, Yuille A (1996) Introduction: a Bayesian formulation of visual perception. In: Perception as Bayesian inference, pp 1–21. Cambridge: Cambridge University Press.
  36. Knill DC, Pouget A (2004) The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci 27:712–719. https://doi.org/10.1016/j.tins.2004.10.007
  37. Körding KP, Wolpert DM (2004) Bayesian integration in sensorimotor learning. Nature 427:244–247. https://doi.org/10.1038/nature02169
  38. Körding KP, Wolpert DM (2006) Bayesian decision theory in sensorimotor control. Trends Cogn Sci 10:319–326. https://doi.org/10.1016/j.tics.2006.05.003
  39. Krekelberg B, Van Wezel RJ, Albright TD (2006) Interactions between speed and contrast tuning in the middle temporal area: implications for the neural code for speed. J Neurosci 26:8988–8998. https://doi.org/10.1523/JNEUROSCI.1983-06.2006
  40. Landy MS, Kojima H (2001) Ideal cue combination for localizing texture-defined edges. J Opt Soc Am A Opt Image Sci Vis 18:2307–2320. https://doi.org/10.1364/JOSAA.18.002307
  41. Landy MS, Maloney LT, Johnston EB, Young M (1995) Measurement and modeling of depth cue combination: in defense of weak fusion. Vision Res 35:389–412. https://doi.org/10.1016/0042-6989(94)00176-M
  42. Lillicrap TP, Cownden D, Tweed DB, Akerman CJ (2016) Random synaptic feedback weights support error backpropagation for deep learning. Nat Commun 7:13276. https://doi.org/10.1038/ncomms13276
  43. Ma WJ, Beck JM, Latham PE, Pouget A (2006) Bayesian inference with probabilistic population codes. Nat Neurosci 9:1432–1438. https://doi.org/10.1038/nn1790
  44. Magosso E, Cuppini C, Serino A, Di Pellegrino G, Ursino M (2008) A theoretical study of multisensory integration in the superior colliculus by a neural network model. Neural Netw 21:817–829. https://doi.org/10.1016/j.neunet.2008.06.003
  45. Meredith MA, Stein BE (1986) Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. J Neurophysiol 56:640–662. https://doi.org/10.1152/jn.1986.56.3.640
    OpenUrlCrossRefPubMed
  46. ↵
    1. Merfeld DM,
    2. Zupan L,
    3. Peterka RJ
    (1999) Humans use internal models to estimate gravity and linear acceleration. Nature 398:615–618. https://doi.org/10.1038/19303
    OpenUrlCrossRefPubMed
  47. ↵
    1. Morgan ML,
    2. DeAngelis GC,
    3. Angelaki DE
    (2008) Multisensory integration in macaque visual cortex depends on cue reliability. Neuron 59:662–673. https://doi.org/10.1016/j.neuron.2008.06.024
    OpenUrlCrossRefPubMed
  48. ↵
    1. Murdison TS,
    2. Leclercq G,
    3. Lefèvre P,
    4. Blohm G
    (2015) Computations underlying the visuomotor transformation for smooth pursuit eye movements. J Neurophysiol 113:1377–1399. https://doi.org/10.1152/jn.00273.2014
    OpenUrlCrossRefPubMed
  49. ↵
    1. Murdison TS,
    2. Blohm G,
    3. Bremmer F
    (2019) Saccade-induced changes in ocular torsion reveal predictive orientation perception. J Vis 19:10–10. https://doi.org/10.1167/19.11.10
    OpenUrlCrossRefPubMed
  50. Naka KI, Rushton WA (1966) An attempt to analyse colour reception by electrophysiology. J Physiol 185:556–586. https://doi.org/10.1113/jphysiol.1966.sp008002
  51. Ohshiro T, Angelaki DE, DeAngelis GC (2011) A normalization model of multisensory integration. Nat Neurosci 14:775–782. https://doi.org/10.1038/nn.2815
  52. Ohshiro T, Angelaki DE, DeAngelis GC (2017) A neural signature of divisive normalization at the level of multisensory integration in primate cortex. Neuron 95:399–411. https://doi.org/10.1016/j.neuron.2017.06.043
  53. Orhan AE, Ma WJ (2017) Efficient probabilistic inference in generic neural networks trained with non-probabilistic feedback. Nat Commun 8:138. https://doi.org/10.1038/s41467-017-00181-8
  54. Paninski L, Fellows MR, Hatsopoulos NG, Donoghue JP (2004) Spatiotemporal tuning of motor cortical neurons for hand position and velocity. J Neurophysiol 91:515–532. https://doi.org/10.1152/jn.00587.2002
  55. Perrault TJ, Vaughan JW, Stein BE, Wallace MT (2003) Neuron-specific response characteristics predict the magnitude of multisensory integration. J Neurophysiol 90:4022–4026. https://doi.org/10.1152/jn.00494.2003
  56. Perrault TJ, Vaughan JW, Stein BE, Wallace MT (2005) Superior colliculus neurons use distinct operational modes in the integration of multisensory stimuli. J Neurophysiol 93:2575–2586. https://doi.org/10.1152/jn.00926.2004
  57. Pitkow X, Angelaki DE (2017) Inference in the brain: statistics flowing in redundant population codes. Neuron 94:943–953. https://doi.org/10.1016/j.neuron.2017.05.028
  58. Pouget A, Snyder LH (2000) Computational approaches to sensorimotor transformations. Nat Neurosci 3 Suppl:1192–1198. https://doi.org/10.1038/81469
  59. Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In: IEEE International Conference on Neural Networks, San Francisco, CA, USA, Vol 1, pp 586–591. https://doi.org/10.1109/ICNN.1993.298623
  60. Salinas E, Abbott LF (1994) Vector reconstruction from firing rates. J Comput Neurosci 1:89–107. https://doi.org/10.1007/BF00962720
  61. Salinas E, Sejnowski TJ (2001) Gain modulation in the central nervous system: where behavior, neurophysiology, and computation meet. Neuroscientist 7:430–440. https://doi.org/10.1177/107385840100700512
  62. Sanger TD (1996) Probability density estimation for the interpretation of neural population codes. J Neurophysiol 76:2790–2793. https://doi.org/10.1152/jn.1996.76.4.2790
  63. Schlack A, Sterbing-D’Angelo SJ, Hartung K, Hoffmann KP, Bremmer F (2005) Multisensory space representations in the macaque ventral intraparietal area. J Neurosci 25:4616–4625. https://doi.org/10.1523/JNEUROSCI.0455-05.2005
  64. Seung HS, Sompolinsky H (1993) Simple models for reading neuronal population codes. Proc Natl Acad Sci U S A 90:10749–10753. https://doi.org/10.1073/pnas.90.22.10749
  65. Snippe HP (1996) Parameter extraction from population codes: a critical assessment. Neural Comput 8:511–529. https://doi.org/10.1162/neco.1996.8.3.511
  66. Stanford TR, Quessy S, Stein BE (2005) Evaluating the operations underlying multisensory integration in the cat superior colliculus. J Neurosci 25:6499–6508. https://doi.org/10.1523/JNEUROSCI.5095-04.2005
  67. Stein BE, Stanford TR (2008) Multisensory integration: current issues from the perspective of the single neuron. Nat Rev Neurosci 9:255–266. https://doi.org/10.1038/nrn2331
  68. Trott AR, Born RT (2015) Input-gain control produces feature-specific surround suppression. J Neurosci 35:4973–4982. https://doi.org/10.1523/JNEUROSCI.4000-14.2015
  69. Ursino M, Cuppini C, Magosso E, Serino A, di Pellegrino G (2009) Multisensory integration in the superior colliculus: a neural network model. J Comput Neurosci 26:55–73. https://doi.org/10.1007/s10827-008-0096-4
  70. Wolpert DM, Ghahramani Z, Jordan MI (1995) An internal model for sensorimotor integration. Science 269:1880–1882. https://doi.org/10.1126/science.7569931
  71. Wu S, Nakahara H, Amari SI (2001) Population coding with correlation and an unfaithful model. Neural Comput 13:775–797. https://doi.org/10.1162/089976601300014349
  72. Xing J, Andersen RA (2000) Models of the posterior parietal cortex which perform multimodal integration and represent space in several coordinate frames. J Cogn Neurosci 12:601–614. https://doi.org/10.1162/089892900562363
  73. Yeh CI, Xing D, Williams PE, Shapley RM (2009) Stimulus ensemble and cortical layer determine V1 spatial receptive fields. Proc Natl Acad Sci U S A 106:14652–14657. https://doi.org/10.1073/pnas.0907406106
  74. Zemel RS, Dayan P, Pouget A (1998) Probabilistic interpretation of population codes. Neural Comput 10:403–430. https://doi.org/10.1162/089976698300017818
  75. Zeng F, Zaidel A, Chen A (2023) Contrary neuronal recalibration in different multisensory cortical areas. Elife 12:e82895. https://doi.org/10.7554/eLife.82895
  76. Zipser D, Andersen RA (1988) A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature 331:679–684. https://doi.org/10.1038/331679a0
Keywords

  • Bayesian inference
  • cue combination
  • neural networks
  • population code
  • probabilistic inference
  • reference frame transformations

Copyright © 2025 by the Society for Neuroscience.
JNeurosci Online ISSN: 1529-2401