Research Articles, Systems/Circuits

Recurrent Neural Circuits Overcome Partial Inactivation by Compensation and Re-learning

Colin Bredenberg, Cristina Savin and Roozbeh Kiani
Journal of Neuroscience 17 April 2024, 44 (16) e1635232024; https://doi.org/10.1523/JNEUROSCI.1635-23.2024
Colin Bredenberg
1Center for Neural Science, New York University, New York, NY 10003
Cristina Savin
1Center for Neural Science, New York University, New York, NY 10003
2Center for Data Science, New York University, New York, NY 10011
Roozbeh Kiani
1Center for Neural Science, New York University, New York, NY 10003
3Department of Psychology, New York University, New York, NY 10003

Abstract

Technical advances in artificial manipulation of neural activity have precipitated a surge in studying the causal contribution of brain circuits to cognition and behavior. However, complexities of neural circuits challenge interpretation of experimental results, necessitating new theoretical frameworks for reasoning about causal effects. Here, we take a step in this direction, through the lens of recurrent neural networks trained to perform perceptual decisions. We show that understanding the dynamical system structure that underlies network solutions provides a precise account for the magnitude of behavioral effects due to perturbations. Our framework explains past empirical observations by clarifying the most sensitive features of behavior, and how complex circuits compensate and adapt to perturbations. In the process, we also identify strategies that can improve the interpretability of inactivation experiments.

  • circuit perturbation
  • compensation
  • functional integrity
  • re-learning
  • recurrent neural networks
  • redundant architecture

Significance Statement

Neuroscientists heavily rely on artificial perturbations of neural activity to understand the function of brain circuits. Current interpretations of experimental results often suggest that the magnitude of a behavioral effect following a perturbation indicates the degree of involvement of the perturbed circuit in the behavior. We model a variety of neural networks with controlled levels of complexity, robustness, and plasticity, showing that perturbation experiments can yield counter-intuitive results when networks are complex enough to allow unperturbed pathways to compensate for the perturbed neurons, or plastic enough to allow continued learning from feedback during perturbations. To rein in these complexities, we develop a functional integrity index that captures alterations in network computations and predicts how strongly a perturbation disrupts behavior.

Introduction

Artificial manipulations of neural circuits are vital tools for investigating the neural computations that underlie behavior. However, their results are often challenging to interpret. Beyond sensory bottlenecks, the distributed nature of computations for higher brain functions makes attribution of a single function to a single circuit challenging, because multiple areas may jointly contribute to a function, with built-in redundancies that allow them to mutually compensate for inactivity in other regions (Wolff and Ölveczky, 2018). The capacity of neural circuits to compensate for inactivation is well documented (Vaidya et al., 2019; Jeurissen et al., 2022). This implies that other neurons in the circuit, or areas of the brain that are not normally causal in producing a particular behavior, can adapt to play an important role. Transient manipulations may be more difficult to adapt to, but there is evidence for compensation even for transient optogenetic inactivation (Fetsch et al., 2018).

Given the potential complexities of perturbation effects, modeling becomes instrumental for reasoning about experimental outcomes. In particular, artificial recurrent neural networks (RNNs) can implement key computations for cognitive tasks (Rigotti et al., 2010; Mante et al., 2013; Yang et al., 2019; Driscoll et al., 2022), accommodate various architectures and training objectives, and provide direct access to their inner-workings with arbitrarily precise observations and causal manipulations. Critically, the contribution of network elements to the output of RNNs is, by construction, known. These features make RNNs a powerful framework for constructing numerical “thought experiments” that test the ability of causal interventions to reveal the role that neurons play in the distributed computations performed in a complex network.

As an example of the complexities involved in causal manipulation of neural circuits, we focus on the integration of sensory evidence for discriminating the direction of random dot kinematograms (Shadlen and Newsome, 2001). The computations involved in this task engage multiple frontoparietal (Kim and Shadlen, 1999; Roitman and Shadlen, 2002; Kiani et al., 2014b; Mochol et al., 2021) and subcortical areas (Horwitz and Newsome, 1999; Ratcliff et al., 2011; Ding and Gold, 2013). Causal studies targeting the posterior parietal regions have yielded contradictory outcomes (Hanks et al., 2006; Erlich et al., 2015; Katz et al., 2016; Licata et al., 2017; Zhou and Freedman, 2019a), reflecting a distributed process whose complexity challenges standard task designs for causal interventions. In contrast, some of the biggest success stories in explaining brain computation using RNNs center specifically on perceptual decision-making (Mante et al., 2013; Pagan et al., 2022). RNNs trained on evidence integration tasks exhibit robust low-dimensional activity dynamics that are well understood computationally (Goldman et al., 2003; Wong and Wang, 2006; Cain et al., 2013) and replicate key features of recorded neural activity (Roitman and Shadlen, 2002; Huk and Shadlen, 2005; Churchland et al., 2011; Pagan et al., 2022).

Here, we use RNNs to systematically study causal manipulations in the direction discrimination task. Trained RNNs exhibit near-optimal evidence integration by constructing a low-dimensional attractor, whose structure provides a direct route to investigating the computational integrity of the circuit. The intact network also recapitulates neural activity and behavioral responses observed in macaques performing evidence integration tasks. Using a phase plane characterization of the RNN, we provide a precise quantification of the functional integrity of any given network. We show that the inactivation of a subset of neurons affects behavior by damaging the network’s underlying low-dimensional attractor, with larger perturbations to the attractor dynamics having a greater impact on both accuracy and reaction times.

Additionally, we show that inactivation effects are variable in architectures with built-in redundancy, but this variability is explained by the integrity of the network’s attractors. In networks trained to perform multiple tasks, inactivation can have surprisingly inconsistent behavioral effects across tasks. It may affect one task and not the other, even though the circuit is designed to be necessary for both. Lastly, we demonstrate that RNNs that retain plasticity—and continue learning—after the inactivation reconfigure themselves to regain the accuracy they had prior to inactivation. The diversity of inactivation effects that we observe here, even under simplistic variations in architecture, learning, and task structure assumptions, serves as a collection of cautionary tales for interpreting causal experiments in real neural circuits. It also suggests concrete ways to improve experimental design.

Methods

Cognitive functions depend on interactions of neurons in large, recurrent networks. To explore the utility and limitations of inactivation and lesion studies for discovering the flow of information and causal interactions in these networks, we simulated RNN models with different degrees of complexity and selectively inactivated subpopulations of neurons within the simulated networks. The models were trained to perform simple perceptual decisions, commonly used for investigating cortical and subcortical neural responses and their causal contributions to behavior (Hanks et al., 2015; Katz et al., 2016; Fetsch et al., 2018; Zhou and Freedman, 2019b). Our simulations and theoretical exploration focus on an abstract analog of the direction discrimination task with random dots (Newsome and Pare, 1988; Roitman and Shadlen, 2002) as a canonical example of perceptual decision-making.

Evidence integration task

We simulated RNNs performing a simple version of the random dots task. A univariate input corresponding to momentary sensory evidence was sampled i.i.d. from a Gaussian distribution, s(t) ∼ N(kC, 1), at each step t. Trial-specific parameter C is the motion strength (coherence), and k is a sensitivity parameter that translates motion strength to sensory evidence. In our simulations, the sensitivity was k = 0.4, and C was drawn uniformly from the discrete set [−0.512, −0.256, −0.128, −0.064, −0.032, 0, 0.032, 0.064, 0.128, 0.256, 0.512] on each trial. Positive and negative motion strengths indicate rightward and leftward directions, respectively. The network was trained to discriminate the two motion directions based on input evidence, as explained below.
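
For concreteness, the stimulus statistics above can be sketched in a few lines of Python (function and variable names are ours, not from the authors' code):

```python
import numpy as np

# Minimal sketch of the stimulus statistics described above; all names are ours.
COHERENCES = np.array([-0.512, -0.256, -0.128, -0.064, -0.032, 0.0,
                       0.032, 0.064, 0.128, 0.256, 0.512])
K = 0.4  # sensitivity translating motion strength into mean momentary evidence

def sample_trial_stimulus(n_steps, rng):
    """Draw a coherence C for the trial and i.i.d. momentary evidence s(t) ~ N(K*C, 1)."""
    C = rng.choice(COHERENCES)                         # trial-specific motion strength
    s = rng.normal(loc=K * C, scale=1.0, size=n_steps)
    return C, s

rng = np.random.default_rng(0)
C, s = sample_trial_stimulus(n_steps=500, rng=rng)     # e.g., a maximal-duration trial
```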

To integrate these inputs through time, we defined the task loss to be the mean-squared error (MSE) between the network output and an integrated decision variable (DV) given by
$$\mathrm{DV}(t) = \min\!\left(\max\!\left(a\sum_{i=0}^{t} s(i),\, -b\right),\, b\right), \quad (1)$$
with the rescaling factor a = 0.025 used to keep the integrated variable within the dynamic range of the network output, and b = 0.5 acting as a bound on integration. This yields a loss function L of the form
$$L = \left[\sum_{t=0}^{T}\left(\mathrm{DV}(t) - o(t)\right)^{2}\right]_{\mathrm{trials}}, \quad (2)$$
where o(t) is the network's output at time t (see details below) and the square brackets denote expectation over trials. To ensure convergence to an optimal set of weight parameters, we trained the RNNs in PyTorch (Paszke et al., 2019) using backpropagation through time (BPTT) and Adam (Kingma and Ba, 2014) with a learning rate of 2 × 10−6. Each network was trained over 25,000 trials and tested on a separate group of 1,500 trials for investigating network computations, task performance, and susceptibility to various activity perturbations. During training, the length of each trial was selected randomly from a mean-shifted and truncated exponential distribution, T ∼ 100 + exprand(200), with a maximum duration of 500 time steps. During testing, we used the maximum stimulus duration to give all trials the chance to reach the decision bounds, enabling us to determine both choice and decision time on each trial.
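
A minimal sketch of the target decision variable and trial loss of Equations 1 and 2, written in NumPy for brevity (the actual training used PyTorch tensors and BPTT):

```python
import numpy as np

A, B = 0.025, 0.5   # rescaling factor a and integration bound b from Eq. 1

def decision_variable(s):
    """Target DV(t): rescaled running sum of the evidence, clipped at +/- B (Eq. 1)."""
    return np.clip(A * np.cumsum(s), -B, B)

def trial_loss(dv, o):
    """Squared error between the target DV(t) and the network output o(t),
    summed over time; averaging this quantity over trials gives the loss of Eq. 2."""
    return np.sum((dv - o) ** 2)
```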

RNN architectural variations

The RNN dynamics take the general form
$$r(t) = f\!\left(W r(t-1) + W_{\mathrm{in}} s(t) + \eta(t)\right), \quad (3)$$
where W denotes the recurrent weight matrix, W_in is the input weight matrix, η is i.i.d. normal noise N(0, 0.01), and f( · ) is a tanh nonlinearity. The network output is given by
$$o(t) = D\, r(t), \quad (4)$$
where D is a 1 × N linear decoder, and N is the number of neurons in the network.
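
For reference, a single update of Equations 3 and 4 could be sketched as follows (a simplified PyTorch illustration; the tensor shapes and names are our assumptions):

```python
import torch

def rnn_step(r_prev, s_t, W, W_in, noise_std=0.1):
    """Eq. 3: r(t) = tanh(W r(t-1) + W_in s(t) + eta(t)).
    r_prev: (N,) state, s_t: scalar input, W: (N, N), W_in: (N,) input weights.
    noise_std = 0.1 corresponds to i.i.d. Gaussian noise with variance 0.01."""
    eta = noise_std * torch.randn_like(r_prev)
    return torch.tanh(W @ r_prev + W_in * s_t + eta)

def readout(r_t, D):
    """Eq. 4: scalar output o(t) = D r(t), with D a length-N decoder vector."""
    return D @ r_t
```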

For the first network architecture, we trained a two-population network, where the first population (P1) receives the evidence input s(t) via a linear projection to its neurons, and a linear decoder (Eq. 4) reads out the integrated input from the second population (P2). Each population is fully recurrently connected, while the connections between the two populations are feedforward, with a connection probability of 0.3. For the sake of simplicity, there were no feedback connections from P2 to P1. The first population had 30 neurons, and the second had 60. These two populations fulfill the roles of a low-level sensory region that relays input information (P1) and a higher-order region that integrates the information for a decision (P2). This network is much simpler than the circuit that underlies sensory decisions in a mammalian brain, where motion selective sensory neurons (e.g., middle temporal visual neurons in the primate brain) pass information about the sensory stimulus (Newsome and Pare, 1988; Salzman et al., 1990; Britten et al., 1992) to a large network of association cortex, motor cortex, and subcortical areas that form the decision (Horwitz and Newsome, 1999; Kim and Shadlen, 1999; Roitman and Shadlen, 2002; Ding and Gold, 2013; Mante et al., 2013; Kiani et al., 2014a). However, our circuit lends itself to mathematical analysis, can be trained without adding structural complications, and can be used for systematic exploration of inactivation effects.
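
A rough sketch of how this block-structured connectivity could be initialized is shown below; the population sizes and the 0.3 feedforward connection probability come from the text, while the random initialization and the 1/√N weight scale are our assumptions:

```python
import numpy as np

N1, N2, P_FF = 30, 60, 0.3     # P1 and P2 sizes, feedforward connection probability
rng = np.random.default_rng(0)

W = np.zeros((N1 + N2, N1 + N2))
# fully recurrent within each population
W[:N1, :N1] = rng.normal(scale=1 / np.sqrt(N1), size=(N1, N1))
W[N1:, N1:] = rng.normal(scale=1 / np.sqrt(N2), size=(N2, N2))
# sparse feedforward P1 -> P2 (probability 0.3), no P2 -> P1 feedback
ff_mask = rng.random((N2, N1)) < P_FF
W[N1:, :N1] = ff_mask * rng.normal(scale=1 / np.sqrt(N1), size=(N2, N1))

W_in = np.zeros(N1 + N2)
W_in[:N1] = rng.normal(size=N1)        # only P1 receives the evidence input
D = np.zeros(N1 + N2)
D[N1:] = rng.normal(size=N2) / N2      # only P2 is read out by the decoder
```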

To assess the role of connection sparsity on various properties of the network, we modified the network architecture using four different connection probability values, [1.0, 0.8, 0.6, 0.4], with 1.0 indicating no connection sparsity, and 0.4 indicating highly sparse connectivity for decision subpopulation P2, with all other parameters as above.

In the distributed network that subserves perceptual decision-making, multiple circuits could operate in parallel, performing similar computations. To investigate the impact of this redundancy, we consider a second network architecture, in which two unconnected populations (30 neurons each) independently integrated evidence from the same input stream and jointly contributed to the decision output. Note that our simulation is not meant to capture the full complexity of the equivalent brain circuits but rather to offer a minimalist design that captures a key and commonly observed feature of brain networks: parallel processing of sensory information in a variety of frontal and parietal cortical regions (e.g., lateral intraparietal, frontal eye fields, and lateral and medial prefrontal areas of the monkey brain).

Modeling inactivation

To emulate lesion or inactivation experiments, a group of neurons within the decision-making subpopulation (P2 in the first architecture) was randomly selected as targets of inactivation. The fraction of inactivated neurons in this subpopulation was determined by the strength of the inactivation, ranging from weak (5–10%) to medium (20%) and strong (30%). To model inactivation, we set all input, recurrent, and decoder weights of the inactivated neurons to 0, completely decoupling them from both the network dynamics and the output. In lesion or long-term inactivation experiments [e.g., muscimol injections or designer receptors exclusively activated by designer drugs (DREADDs)], the connections remained affected throughout all trials within a testing block. In inactivation experiments with fast timescales (e.g., optogenetic perturbations), the connections were affected for a random subset of trials intermixed with other trials in which all connections and neurons functioned normally. In the second architecture, we inactivated all neurons in one of the two subpopulations and analyzed the effect of this intervention on network responses.
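
A minimal sketch of this inactivation procedure (NumPy; function names are ours, and the example indices assume the hypothetical layout sketched earlier, with P2 occupying the last 60 positions):

```python
import numpy as np

def inactivate(W, W_in, D, idx):
    """Return copies of the weights with the neurons in `idx` fully decoupled:
    no recurrent input or output, no external input, and no contribution to the decoder."""
    W, W_in, D = W.copy(), W_in.copy(), D.copy()
    W[idx, :] = 0.0     # recurrent inputs to inactivated neurons
    W[:, idx] = 0.0     # recurrent outputs from inactivated neurons
    W_in[idx] = 0.0     # external input weights
    D[idx] = 0.0        # decoder weights
    return W, W_in, D

# e.g., a "medium" inactivation silencing 20% of the 60 P2 neurons (hypothetical indices)
rng = np.random.default_rng(1)
idx = 30 + rng.choice(60, size=12, replace=False)
```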

Analysis of neural responses

After training, we applied several analyses to characterize the nature of the network computations and the effects of perturbations. Because our training framework has no explicit reaction time, we set symmetric decision boundaries on the network output o(t) as a proxy. We quantified the time until o(t) reached one of the boundaries on each trial. The crossed boundary dictated the choice on the trial, and the time to bound determined the decision time. Formally, the reaction time is the first time the output magnitude exceeds the bound,
$$\mathrm{RT} = \min\{\, t : |o(t)| > 0.4 \,\}, \quad (5)$$
and the choice is given by
$$\mathrm{choice} = \operatorname{sign}\!\left(o(\mathrm{RT})\right). \quad (6)$$
For each trained RNN, we constructed a psychometric function by measuring the proportion of "left" and "right" motion choices, and we fit the psychometric function using the following logistic regression:
$$p(\mathrm{right}) = \frac{1}{1 + \exp(b_{0} + b_{1} C)}, \quad (7)$$
where p(right) is the proportion of "right" choices and b_i are regression coefficients: b_0 reflects the choice bias, and b_1 the sensitivity of choices to changes in motion strength.
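
The bound-crossing rule of Equations 5 and 6 can be sketched as follows (an illustration with the 0.4 threshold from the text; names are ours):

```python
import numpy as np

THRESHOLD = 0.4   # decision bound on the network output (Eqs. 5-6)

def decision_from_output(o):
    """Return (decision time, choice) for an output trace o(t).
    The decision time is the first step at which |o(t)| exceeds the bound;
    the choice is the sign of the output at that crossing.
    Returns (None, None) if the bound is never reached within the trial."""
    crossed = np.flatnonzero(np.abs(o) > THRESHOLD)
    if crossed.size == 0:
        return None, None
    rt = int(crossed[0])
    return rt, float(np.sign(o[rt]))
```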

We constructed chronometric functions by stratifying the mean decision times as a function of motion strength. For simplicity, we fit the chronometric functions with nonlinear regression using the following bell-shaped function:
$$\mathrm{RT} = b_{0} + b_{1} \exp\!\left(-\left(C / b_{2}\right)^{2}\right), \quad (8)$$
where RT is the network's decision time.

To explore the dynamics of neural responses, we performed principal component analysis (PCA) on the network activity over time and trials. We analyzed neural trajectories associated with each choice by averaging neural firing rates within the output population for choices to the left, for each motion strength. A perfect integrator would have a mean response linearly increasing through time, and the slope of that linear increase would vary linearly with changes in motion strength (Shadlen et al., 2006). To verify this, we fit the mean output o(t) through time by linear regression, and analyzed the slope of this fit as a function of coherence. Given that a perfect integrator would show a linear increase of variance over time, we also measured the variability of the network responses at each time point and for each motion strength by quantifying the empirical variance of the network output across test trials with the same motion strength.

One-dimensional approximate dynamics and pitchfork bifurcation

Given the low-dimensional structure of the trained RNN dynamics, we can provide a one-dimensional approximation of our RNN by projecting the network dynamics along its first principal component. Let V be the eigenvectors corresponding to distinct eigenvalues of the covariance matrix of the network activity, obtained by combining trials across time and stimuli. These normalized vectors define an orthonormal basis, with the first and n-th axis corresponding to the direction of maximum and minimum variance, respectively. The activity of the network in this rotated coordinate system becomes $r_{\mathrm{rot}}(t) = V^{\top} r(t)$. Using Equation 3, this leads to the dynamics
$$V^{\top} r(t) = V^{\top} f\!\left(W V V^{\top} r(t-1) + W_{\mathrm{in}} s(t) + \eta(t)\right), \quad (9)$$
where we have used the fact that the basis is orthonormal, i.e., $V V^{\top} = I$. Substituting our definition for $r_{\mathrm{rot}}$, we have
$$r_{\mathrm{rot}}(t) = V^{\top} f\!\left(W V r_{\mathrm{rot}}(t-1) + W_{\mathrm{in}} s(t) + \eta\right). \quad (10)$$
Focusing on the first dimension, along the axis of maximum variance, yields
$$r_{\mathrm{rot}}^{1}(t) = v_{1}^{\top} f\!\left(\sum_{k=1}^{n} r_{\mathrm{rot}}^{k}(t-1)\, W v_{k} + W_{\mathrm{in}} s(t) + \eta\right), \quad (11)$$
where $v_k$ denotes the k-th eigenvector, and $r_{\mathrm{rot}}^{k}(t)$ is the k-th entry of $r_{\mathrm{rot}}(t)$. Assuming that the system is largely one dimensional, the expression for the dynamics can be further simplified as
$$r_{\mathrm{rot}}^{1}(t) \approx v_{1}^{\top} f\!\left(r_{\mathrm{rot}}^{1}(t-1)\, W v_{1} + W_{\mathrm{in}} s(t) + \eta\right). \quad (12)$$
This approximation effectively discards the contribution of the remaining dimensions, under the assumption that their effect on the network dynamics is minimal, i.e., $\mathbb{E}\!\left[\sum_{k=2}^{N} r_{\mathrm{rot}}^{k}(t-1)\, W v_{k}\right] \approx 0$, which holds empirically for our trained networks.

Having derived a one-dimensional dynamical system approximation to the RNN activity, we can use phase plane methods to determine the nature of the learned dynamics. We are interested in the geometry of the solution our network finds, which leads us to assess its fixed point dynamics in the absence of input and noise ($s(t) = 0$, $\eta(t) = 0$). Finding these fixed points involves finding the solutions of the equation
$$\Delta r_{\mathrm{rot}} = v_{1}^{\top} f\!\left(r_{\mathrm{rot}}(t-1)\, W v_{1}\right) - v_{1}^{\top} r_{\mathrm{rot}}(t-1)\, v_{1} = 0, \quad (13)$$
where $\Delta r_{\mathrm{rot}} = r_{\mathrm{rot}}(t) - r_{\mathrm{rot}}(t-1)$, and we have used the fact that the eigenvectors are normalized, i.e., $v_{1}^{\top} v_{1} = 1$. Furthermore, using a Taylor approximation of tanh about 0, $f(x) \approx x - x^{3}$, and rearranging the terms simplifies the equation to
$$\Delta r_{\mathrm{rot}} \approx \gamma\, r_{\mathrm{rot}}(t-1) - \beta\, r_{\mathrm{rot}}(t-1)^{3} \propto \frac{\gamma}{\beta}\, r_{\mathrm{rot}}(t-1) - r_{\mathrm{rot}}(t-1)^{3}, \quad (14)$$
where $\gamma = v_{1}^{\top}(W - I)\, v_{1}$, $\beta = v_{1}^{\top}(W v_{1})^{\circ 3}$ is empirically positive, and $(\,\cdot\,)^{\circ 3}$ denotes an element-wise cube. The resulting equation is cubic, meaning its fixed point equation ($\Delta r(t) = 0$) has up to three solutions. This generally results in a topology with two stable fixed points separated by one unstable fixed point. These points coalesce into a single stable fixed point, $r_{\mathrm{rot}}(t) = 0$, when the coefficient of $r_{\mathrm{rot}}(t-1)$ changes from positive to negative, with the system undergoing a supercritical pitchfork bifurcation (Strogatz, 2018).

For the network to work properly, it needs to be in the regime with two stable (if shallow) attractors; function degrades abruptly once the system reaches the critical point of the phase transition. For our approximate dynamics, this transition occurs once
$$\gamma / \beta < 0. \quad (15)$$
For this reason, for all of our experiments, we refer to the value $\alpha = \gamma / (\epsilon + \beta)$ as the "bifurcation criterion", where we have included $\epsilon = 5 \times 10^{-3}$ in the denominator to prevent ill conditioning caused by dividing by $\beta$ values close to zero. We will use this value as a functional integrity index to quantify the impact of inactivations on neural circuits.
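
The functional integrity index can be estimated directly from network activity and the recurrent weights. A sketch under the assumptions above (activity stacked over time and trials, with the first principal axis taken as v1; names are ours):

```python
import numpy as np

def functional_integrity_index(R, W, eps=5e-3):
    """Estimate alpha = gamma / (eps + beta), following Eqs. 13-15.
    R: activity matrix of shape (time * trials, N); W: recurrent weight matrix (N, N).
    Assumes the dynamics are well captured by the first principal component."""
    Rc = R - R.mean(axis=0)
    _, _, Vt = np.linalg.svd(Rc, full_matrices=False)
    v1 = Vt[0]                                    # first principal axis (unit norm)
    gamma = v1 @ (W - np.eye(W.shape[0])) @ v1    # linear coefficient of Eq. 14
    beta = v1 @ (W @ v1) ** 3                     # cubic coefficient, element-wise cube
    return gamma / (eps + beta)
```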

Re-learning with feedback after perturbation

To investigate whether perturbations have a lasting impact on the performance of a network with plastic neurons, we allowed the network (first architecture) to be trained following inactivation. Two training regimes were used to emulate different experimental techniques with slow and fast timescales for inactivation. In both regimes, we silenced a fraction of neurons in the P2 population and allowed the connection weights of the remaining neurons to change through re-learning. The first retraining regime was designed to emulate lesion and pharmacological inactivation studies, which affect the circuit for an extended time, ranging from a whole experimental session to permanent. In this regime, the affected neurons remained inactive throughout the retraining period. The second regime was designed to emulate optogenetic or other techniques with faster timescales, which allow interleaving perturbations with unperturbed trials. In this regime, we silenced the affected neurons in a random half of retraining trials and allowed them to function in the other half; synapses were modified in all trials.

To assess the efficacy of retraining in restoring the network performance, we used the state of synapses at various times during retraining to simulate 1,500 test trials and calculate the percentage of correct responses. Connection weights were kept constant in the test trials. Additionally, we calculated the projection of the network activation onto its first principal component following the initial training, after the inactivation and prior to retraining, and at various times during retraining. Finally, we calculated the corresponding functional integrity index α (Eq. 15).

Task-dependence of inactivation effects
Context-dependent integration

One way in which context can gate information processing is when the network performs the same kind of computation (e.g., integration), but on different input streams in each context. To examine the effects of inactivation on networks performing multiple similar tasks, we trained a single population model (N = 100) to perform contextual evidence integration, in which the network was required to integrate one of its two inputs (s(1) or s(2), with the same statistics as above) depending on a separate context signal, η (one-hot encoded), also provided as input to the network (Mante et al., 2013). The objective function for training was
$$L = \left[\eta \sum_{t=0}^{T}\left(\mathrm{DV}^{(1)}(t) - o(t)\right)^{2} + (1 - \eta)\sum_{t=0}^{T}\left(\mathrm{DV}^{(2)}(t) - o(t)\right)^{2}\right]_{\eta,\ \mathrm{trials}}, \quad (16)$$
where η ∼ Bernoulli(0.5) is the context indicator, drawn independently for each trial with 50% probability, and DV(1) and DV(2) are given by Equation 1 for stimuli s(1) and s(2), respectively. We trained the network on 300,000 trials, so that it reached good performance on both tasks. We then applied weak (5% of neurons) and medium-strength (10% of neurons) inactivations to the network. Here we have redefined the "weak" and "medium" inactivation magnitudes because the inactivation is applied to an entire 100-neuron network, rather than selectively to one subpopulation.

We calculated the psychometric and chronometric functions, as well as the sensitivity and bias, as in section Analysis of neural responses, but we also calculated the psychometric function with stimulus strength defined by the off-context input instead of the on-context input. This allowed us to see whether the off-context inputs affected network behavior in any way.

Context-dependent task switching

Another way in which context can gate information processing is by requiring different computations on the same input stream. As a simple instantiation of this idea, we used a single population architecture with a single input stream. Depending on the context cue, η, the network had to switch between traditional integration and a simpler replication task, in which the network output had to reproduce its input. The corresponding loss for this task is
$$L = \left[\eta \sum_{t=0}^{T}\left(\mathrm{DV}(t) - o(t)\right)^{2} + (1 - \eta)\sum_{t=0}^{T}\left(s(t) - o(t)\right)^{2}\right]_{\eta,\ \mathrm{trials}}. \quad (17)$$
The two contexts each had 50% probability, drawn independently on each trial. Lesion experiments took the same form as for the context-dependent integration described above.

Biologically plausible learning

It is highly unlikely that a neural system could receive detailed feedback about the difference between a DV and an integrated target trajectory. There is no supervised signal for this target trajectory, and if a neural system were able to construct the target, why not just use it to solve the task? Instead, an animal is much more likely to use reward feedback at the end of each trial to guide learning. To verify that our results hold in this situation, we adapted our training procedure to use a cross-entropy loss function, which is more amenable to a biological implementation:
$$L(t) = -\left[c \log \hat{c}(t) + (1 - c)\log\!\left(1 - \hat{c}(t)\right)\right]_{\mathrm{trials}}, \quad (18)$$
where c = sign(C) and ĉ(t) = σ(o(t)), where σ( · ) is a sigmoid nonlinearity. For the simulations presented here, we evaluated the loss at every time step, though we achieved qualitatively similar results with only end-of-trial evaluation.

BPTT (Werbos, 1990) is well established as a biologically implausible learning algorithm, but several approximations or alternative formulations of BPTT are arguably biologically plausible (Marschall et al., 2020). To verify that our results still hold for a biologically plausible learning algorithm, we selected Random Feedback Local Online learning (RFLO; Murray, 2019), the simplest approximation that still takes into account statistical dependencies over time. Specifically, recurrent weight updates are given by
$$\Delta w_{ij}(t) = -\lambda\, \frac{dL(t)}{dr_{i}(t)}\, e_{ij}^{w}(t), \quad (19)$$
$$e_{ij}^{w}(t) = f'\!\left(W r(t-1) + W_{\mathrm{in}} s(t) + \eta\right)\left(w_{ii}\, e_{ij}^{w}(t-1) + r_{j}(t-1)\right), \quad (20)$$
where λ = 0.001 is the learning rate, and the second equation defines an "eligibility trace", which is updated continuously at each synapse and requires only local information available at the pre- and post-synaptic sites. The resulting learning algorithm has the form of a three-factor plasticity rule (Frémaux and Gerstner, 2016), where a reward signal, dL(t)/dr_i(t), is fed back and combined with pre-synaptic and post-synaptic Hebbian coactivation to produce the weight update. In our case, we allowed the feedback weights to be given by direct differentiation of the objective function, but for added biological realism, these weights could be learned (Akrout et al., 2019) or random (Lillicrap et al., 2016; Murray, 2019) and still achieve good performance.

The updates for the input weights are analogous:
$$\Delta w_{ij}^{\mathrm{in}}(t) = -\lambda\, \frac{dL(t)}{dr_{i}(t)}\, e_{ij}^{\mathrm{in}}(t), \quad (21)$$
$$e_{ij}^{\mathrm{in}}(t) = f'\!\left(W r(t-1) + W_{\mathrm{in}} s(t) + \eta\right)\left(w_{ii}\, e_{ij}^{\mathrm{in}}(t-1) + s_{j}(t)\right), \quad (22)$$
while the updates for the decoder are given by $\Delta D_{1j}(t) = -\lambda\, dL(t)/dD_{1j}$.
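
A sketch of one RFLO update for the recurrent weights (Eqs. 19, 20), written in NumPy with our own variable names; the feedback signal dL(t)/dr is assumed to be supplied by the readout pathway:

```python
import numpy as np

def rflo_recurrent_update(e_w, r_prev, s_t, W, W_in, dL_dr, lr=1e-3, eta=0.0):
    """One step of Eqs. 19-20.
    e_w: eligibility-trace matrix (N, N); r_prev: previous state (N,);
    dL_dr: feedback signal dL(t)/dr_i(t), shape (N,)."""
    pre = W @ r_prev + W_in * s_t + eta          # pre-activation, as in Eq. 3
    dphi = 1.0 - np.tanh(pre) ** 2               # f'(.) for f = tanh
    # eligibility trace: purely local (post-synaptic gain, self-weight, pre-synaptic rate)
    e_w = dphi[:, None] * (np.diag(W)[:, None] * e_w + r_prev[None, :])
    dW = -lr * dL_dr[:, None] * e_w              # three-factor update (Eq. 19)
    return dW, e_w
```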

For the sake of computational efficiency, we decreased trial length in these simulations to a fixed duration of 30 time steps and trained on 10,000 trials, and increased the signal-to-noise ratio of individual stimuli by taking s ∼ N(kC, 0.1) for k = 0.4. Furthermore, we only modeled the P2 population (N = 60) with all-to-all connectivity, without the sensory sub-network. We also trained our networks with a larger amount of intrinsic noise (σ = 0.6) to verify that our results hold for noisier neurons.

Because these simulations had modified parameters and a different objective function, we had to redefine our decision threshold to achieve qualitatively similar psychometric functions. We set the threshold for decisions in these simulations to 1: we arrived at this value by requiring near-perfect choice accuracy for strong coherence stimuli, and a chronometric function whose mean response times peak at 0 coherence. These features clearly need not be achievable for any coherence if the task has not been well-learned, but we found in practice that a threshold value of 1 gives psychometric and chronometric functions similar to experimental data.

Because our networks had a different number of neurons and a different objective function, we also recalibrated the magnitude of inactivations. The “weak” inactivation targeted 40% of neurons, and the “strong” inactivation targeted 75% of neurons in this case. The method of performing inactivation was identical to the previous section.

Results

Hierarchical recurrent networks approximate linear integration for simple sensory decisions

We begin with the simplest hierarchical recurrent network architecture (Fig. 1a) consisting of a sensory-like population (P1) and an integrating population (P2). Neurons in each population have dense recurrent connections between them, while the sensory population projects sparsely to the integrating population. The P2 population roughly corresponds to the collection of the recurrently connected cortical and subcortical neurons involved in the decision-making process; however, it does not reflect the precise anatomy of brain networks. The stimulus in each trial randomly fluctuates around a mean stimulus strength level, akin to the dynamic random dots stimuli in direction discrimination tasks (Newsome et al., 1989; Roitman and Shadlen, 2002) where motion energy fluctuates around a mean dictated by the coherence of random dots. This fluctuating stimulus input is received by the sensory population P1, and relayed to the integrating population P2. The network is trained such that a linear readout of the activity of population P2 at each moment matches the integral of the stimulus input up to that time. All connections in the network are plastic during training and modified by BPTT (see Methods). After learning, the sensory population shows coherence tuning (see an example response profile in Fig. 1h), while the integration population develops response profiles—ramping activity—similar to those reported in posterior parietal and prefrontal cortex (Fig. 1i and j).

Figure 1.

A two-stage hierarchical RNN performing linear integration of noisy inputs for a sensory decision-making task. a, Network schematic. b, Network learning throughout time in units of mean-squared error. c, Mean activity of the output unit after training for different stimulus strengths (motion coherence). Model outputs (solid points) increase linearly over time up to a saturating level implemented in the training procedure (see Methods). Lines are fits to the data points over the time range [0 50], measured in arbitrary network time units. d, The slope of changes in model output as a function of stimulus strength. e, Variance of model output as a function of time for different stimulus strengths. The linear increase is expected for integration of noisy inputs over time. The late saturation and decline of the variance, especially for stronger stimuli, is caused by the bound on the integration process. f, Psychometric function of a trained model. Data points show the probability of choosing right for different stimulus strengths ranging from strong leftward motion (negative coh) to strong rightward (positive coh). The gray curve is a logistic fit. Error bars show ±1 s.e.m. g, Chronometric function of a trained model. Data points show the mean time the model output takes to reach the decision bounds (decision time). The gray curve is a Gaussian function fit to the data points. h, Example stimulus-conditioned mean neuron responses from P1. i,j, Example stimulus-conditioned mean neuron responses from P2.

The network is trained within a couple of thousand trials (Fig. 1b), similar to training schedules for nonhuman primates (Gold et al., 2010). After training, the P2 output closely approximates the integral of the stimulus input over time. Two hallmarks of temporal integration are linear scaling with time (Fig. 1c) and with stimulus strength (Fig. 1c and d; Gold and Shadlen, 2007): if a network is receiving a constant-mean stimulus input, the integrated output will be a linear function of time, with slope equal to the mean of the stimulus. The model output represents linear integration for a wide range of inputs and times but saturates for very large values of the integral. Because of the limited dynamic range of the neural responses, the curtailed range of the integral improves the precision of the representation of the integrated evidence for weaker stimuli, where network precision matters most for the accuracy of choices. Another hallmark of temporal integration of noisy inputs is linear growth of the variance of the integral over time. The motion energy of the random dots stimulus at each time is an independent sample from a normal distribution, so its sum over time—integration—should have a variance that scales linearly with time (Roitman and Shadlen, 2002; Churchland et al., 2011). Our network output captures this hallmark of optimal integration (Fig. 1e).

Since the network’s integration process stops when the network output reaches a fixed decision bound, the model provides trial-by-trial estimates of the network’s decision time and choice. The time to bound is the decision time, and the sign of the network output at the time of bound crossing determines the choice (right or leftward motion). Our model decision times are in units of the network time steps. The resulting psychometric and chronometric functions of the model show profiles qualitatively similar to experimental results, in particular, faster, more accurate responses for stronger motion stimuli (Fig. 1f and g; Roitman and Shadlen, 2002; Palmer et al., 2005; Kiani et al., 2014a).

Behavioral effects of inactivation grow with the size of the inactivated population

We explored the effects of inactivation on this circuit by selectively silencing a proportion of neurons in the integrating population (Fig. 2a) and analyzing the inactivation effects on the output of the model. For a particular trained network, we measured the change in the psychometric and chronometric functions after perturbation as a means to characterize the effects of inactivation. We found that decision times are strongly sensitive to inactivation. Weak inactivations (5–10% of the population) moderately increase the decision time of the network, and medium and strong perturbations (20% and 30% of the population, respectively) cause a much larger increase (Fig. 2d and g).

Figure 2.

Inactivation of the integrating circuit reduces sensitivity and increases decision times, with larger effects when a larger portion of neurons are silenced. a, Inactivation schematic. Colored ovals indicate the affected proportion of the P2 network. b,c, Psychometric functions of example networks after training (gray) and after a weak (5–10%, b) or strong (30%, c) inactivation. d, Chronometric function of the same network in panel b following weak inactivation. e, Changes of sensitivity (slope of psychometric function) across 10 trained networks for various inactivation sizes. f, Effects of inactivation on bias (shift of the mid-point of the psychometric function from 0% coherence) across the trained networks. g, Changes of mean decision times across the trained networks. Error bars show s.e.m. Error bars in panels e–f are computed over 10 trained networks. Maximal trial duration set to 500 steps.

The effect of perturbation on choice was more variable and complex. We quantified these effects by extracting measures of the sensitivity and bias for the psychometric functions, and calculating the change in these measures with weak, medium, and strong inactivation. The sensitivity of the psychometric function decreased as more of the neurons were affected, with a corresponding decrease in the average sensitivity with inactivation size (Fig. 2b, c, and e). The magnitude of the bias, however, minimally changed across inactivation levels (Fig. 2f), suggesting that the primary loss of function caused by increasing the perturbation magnitude is a loss of sensitivity. The small change of absolute bias with lesion size in our simulations is statistically insignificant (p > 0.05), and likely caused by underestimation of biases from the flatter psychometric functions associated with larger lesions. Overall, in our basic network architecture, even weak perturbations decreased sensitivity and substantially increased reaction time, with the magnitude of these effects increasing with the magnitude of inactivation.

Functional integrity index: connecting circuit perturbations to behavioral outcomes through changes in activity dynamics

The optimal solution for random dots motion discrimination involves integration along a one-dimensional axis (Wald and Wolfowitz, 1950; Shadlen et al., 2006; Drugowitsch et al., 2012; Khalvati et al., 2021), so the dynamics of the trained network are likely to lie on a low-dimensional manifold (Ganguli and Sompolinsky, 2012). Indeed, simple dimensionality reduction using PCA shows that the circuit dynamics are approximately one dimensional, with the first principal axis explaining about 70% of the neural response variance (Fig. 3a). The low-dimensional structure of the neural activity enables us to mathematically analyze the dynamical features of the trained network that enable it to perform evidence integration.

Figure 3.

Integrating network implements a shallow bistable attractor, whose disruption determines the magnitude of the behavioral effects of inactivation. a, Fraction of explained variance as a function of the number of latent dimensions of the network responses to test stimuli. b–d, Schematic of a pitchfork bifurcation. The red dashed line shows the phase portrait for a line attractor implementing optimal evidence integration. For α > 0, the network has two stable attractors (b, arrows indicate sign of Δr), for α = 0, a saddle point (c), and for α < 0, a single stable attractor (d). e, Phase plot for the reduced network before (gray) and after (colors) perturbation. Shaded regions indicate s.e.m. across network realizations. f, Fraction of correct responses as a function of the bifurcation criterion estimated for each network.

In the absence of a stimulus, the network activity can be approximated as one-dimensional population dynamics of the general functional form
$$\Delta r_{t} \approx \left(\alpha - r_{t-1}^{2}\right) r_{t-1}, \quad (23)$$
where Δr_t is the change in population activity across time, and the values of the parameter α and the constant of proportionality depend on the trained network weights (see Methods). Different settings of α change the dynamical properties of the system and its ability to solve the evidence integration task. This property is illustrated in the phase plane in Figure 3b, which describes the relationship between Δr_t and r_{t−1}. When α is positive, the dynamics exhibit three fixed points (corresponding to values of r_t for which Δr_t is zero; Fig. 3b). Two of these are attracting ($r_{t} = \pm\sqrt{\alpha}$), separated by an unstable fixed point at r_t = 0: when starting from a positive value of activity r, the network will eventually converge to the positive fixed point, and similarly a negative starting condition will converge to the negative fixed point. Sensory drive to the network will push the dynamics toward one or the other, eventually converging to a final binary decision. This is similar to the phase plane of previous circuit models of evidence integration based on bistable attractor dynamics (Wong and Wang, 2006).
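
To make this fixed-point structure explicit, the reduced dynamics of Equation 23 can be probed with a few lines of code (an illustrative sketch, not taken from the authors' analysis code):

```python
import numpy as np

def delta_r(r, alpha):
    """Reduced dynamics of Eq. 23: change in the 1D population activity."""
    return (alpha - r ** 2) * r

def fixed_points(alpha):
    """Zeros of delta_r: for alpha > 0, stable points at +/- sqrt(alpha) flank an
    unstable point at 0; for alpha <= 0, only the stable point at 0 remains."""
    return [-np.sqrt(alpha), 0.0, np.sqrt(alpha)] if alpha > 0 else [0.0]

print(fixed_points(0.04))    # shallow bistable attractor: [-0.2, 0.0, 0.2]
print(fixed_points(-0.01))   # past the bifurcation: single fixed point at 0
```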

Ideal evidence integration sums all incoming inputs, $r_{t} = \sum_{i=0}^{t} s_{i}$, and follows slightly different dynamics. In the phase plane, ideal integration means that r_t changes at every time step as
$$\Delta r_{t} = s_{t}. \quad (24)$$
Hence, the ideal solution for evidence integration involves a line attractor, where no change in activity occurs in the absence of input (Δr_t = 0 whenever s_t = 0; red dashed line in Fig. 3b–d). The trained RNN approximates this solution when α is close to zero (Fig. 3c). Once α becomes negative, the network starts to behave qualitatively differently. In this regime, the network has only one stable fixed point at the origin (Fig. 3d). Changes in network activity caused by new stimuli rapidly relax to this fixed point, causing the network to lose its ability to integrate. Formally, this sensitivity of the dynamics to the value of α corresponds to a pitchfork bifurcation (Strogatz, 2018), which is why we refer to α as the bifurcation criterion.

The value of the bifurcation criterion, α, which we derive directly from the RNN activity, captures the key dynamic properties of the model networks and predicts a given network’s ability to perform evidence integration. Indeed, trained networks generally have a small positive α, corresponding to a shallow bistable attractor and close-to-ideal evidence integration (Fig. 3e).

Different forms of causal interventions, such as inactivation, will result in altered population dynamics, with a corresponding change in the bifurcation criterion α. In particular, our inactivation experiments push the network past the bifurcation point. As the magnitude of the inactivation increases, α values become increasingly negative (Fig. 3e and f) and the remaining fixed point at zero (Fig. 3d) leads to forgetting past inputs and correspondingly poor performance (Fig. 3f). Furthermore, the weak perturbations that do not meaningfully alter the bifurcation criterion have minimal impact on the function of the integration circuit. In contrast, other weak perturbations that shift the bifurcation criterion to more negative values have a substantial impact, even though the number of inactivated neurons is similar across all weak perturbations (5%). Therefore, the key predictor of the behavioral impact of inactivation is the bifurcation criterion, not the number of affected neurons. For the integration circuit, the bifurcation criterion constitutes a “functional integrity index” that connects circuit perturbations to behavioral outcomes through changes in activity dynamics.

Overall, these results establish that our network approximates integration within a bounded region of state space via a shallow bistable attractor, and that the loss of function caused by perturbations is due to the loss of this attractor structure. This dynamical systems analysis paints a more refined picture of causal interventions in the random dots motion discrimination task: inactivations that disrupt the computational structure embedded in the network (i.e., the bistable attractor) will produce behavioral impairments, while those that leave the attractor unaffected will not. The functional integrity index captures this relationship and can be used both to quantify and to predict the impact of circuit perturbations.

In distributed architectures, inactivation effects can be variable

Exploring the effects of inactivation in a unitary circuit performing integration reveals a qualitatively similar picture across networks and effect sizes. In a mammalian brain, however, sensory decisions are enabled by a distributed network consisting of multiple interacting circuits (Shadlen and Kiani, 2013; Waskom et al., 2019). Past electrophysiological studies have found neurons that represent integration of sensory evidence in the parietal cortex (Shadlen and Newsome, 2001; Churchland et al., 2008), lateral frontal cortex (Kim and Shadlen, 1999; Mante et al., 2013; Mochol et al., 2021), motor and premotor cortex (Thura and Cisek, 2014; Hanks et al., 2015; Chandrasekaran et al., 2017; Peixoto et al., 2021), the basal ganglia (Ding and Gold, 2013; Yartsev et al., 2018), superior colliculus (Horwitz and Newsome, 1999; Basso et al., 2021), and cerebellum (Deverett et al., 2018). Additionally, the hemispheric organization of the brain creates redundancies. When the subject decides between making a rightward and a leftward saccade in the direction discrimination task, increased activity of neurons in the right hemisphere signals both increased evidence for the rightward saccade and decreased evidence for the leftward saccade. Therefore, DVs for both actions can be decoded from the activity in either hemisphere. The distributed nature of the computation across brain regions and hemispheres makes inactivation studies difficult to interpret. This is especially true when inactivation of a subcircuit in the network fails to produce measurable changes of behavior. Other nodes of the network could change their activity in response to the inactivation, compensating for its effects (Li et al., 2016). Furthermore, there are a variety of more complex scenarios compatible with negative results (Dunn, 2003; Murray and Baxter, 2006; Jonas and Kording, 2017; Yoshihara and Yoshihara, 2018).

Although a detailed exploration of the distributed network that underlies decisions in the brain is beyond the scope of this paper, we take a first step in assessing the effects of architecture on inactivation experiments. In particular, we replace the unitary network structure analyzed above with a parallel architecture, where sensory inputs drive the responses of two non-interacting populations that collectively shape the network output (Fig. 4a). We train this parallel network to perform the same sensory integration task, followed by inactivating all of the neurons in one of the two parallel nodes and assessing the behavioral outcomes of the manipulation across a range of network instances. We find that even in this minimal version of a distributed computation, the effects of inactivation can be quite variable in terms of performance.

Figure 4.

Distributing integration across multiple network nodes makes it resilient to disruptions in individual nodes. a, Schematic for a network with two parallel nodes (subcircuits) for integration of inputs. b, Psychometric curve for the parallel node network before and after a strong inactivation that takes out one of the subcircuits. The lines are logistic fits to the choices. Error bars are s.e.m. c, Approximate phase portrait for the intact node, indicating a shallow bistable attractor. d, Phase portrait for the inactivated node before and after inactivation. e, Proportion correct after inactivation as a function of the bifurcation criterion for the node that was not inactivated. f, Sensitivity for the psychometric function before and after inactivation. Each line shows an instance of the parallel node network with unique starting points and training history. Inactivation affected sensitivity in a minority of instances. g, Bias of the psychometric function before and after inactivation. h, Mean decision time before and after inactivation.

Some networks exhibit minimal changes in the psychometric function due to inactivation (Fig. 4b and e), paired with a marked increase in reaction times (Fig. 4h). This phenomenology tracks back to the dynamical system properties of the underlying network. When examining the one-dimensional approximate phase portrait for each node in the network, we found that both exhibit the shallow bistable attractor dynamics indicative of approximately optimal sensory integration (Fig. 4c and d). The overall network output, which determines the final choice and decision time, is constructed by linearly combining the activities of both integrating populations. This architecture subsumes more specific architectures in which populations with distinct choice preference (e.g., neurons in the two hemispheres) integrate evidence for their respective choices. The complete inactivation of one of the two subcircuits (P2) disrupts its attractor structure (Fig. 4d), but leaves the attractor in P1 intact (since they do not directly interact; Fig. 4c). Therefore, integration can still be performed using the intact subcircuit. Nonetheless, the activity component from P2 is missing; as a result, the output could be weaker and it may take longer for the integrated signal to reach the same decision threshold, leading to slower responses. However, if the only measure of behavior is the choice, one may not notice any change of behavior (Fig. 4).

A systematic investigation across networks with the same distributed architecture, but different trained connections, reveals that this inactivation-resistant solution is not universal: in some networks the choice sensitivity is largely unaffected, while others display a marked loss in sensitivity after inactivation (Fig. 4f). This variability traces back to the attractor structure of the individual solutions found via learning: networks exhibit robustness to inactivation only if the non-lesioned node has an attractor (Fig. 4e). A parallel network architecture solves the task in one of two ways: either both nodes develop attractor dynamics or only one does. In a network where only one node has an attractor, inactivating that node disrupts performance (quantified by changes in sensitivity and bias), indicating that the inactivated subcircuit is in fact involved in the network computation, while inactivating the subcircuit that lacks attractor structure leaves the output essentially unaffected. However, if both nodes have attractors, “negative results” at the level of behavior need to be interpreted with caution, because intact nodes can enable consistent behavior even without the participation of inactivated nodes. Though we have only shown these effects for a simple network with two parallel nodes, robustness to inactivation is likely to become even more prominent in systems with many more parallel nodes. These results demonstrate that absence of change in choice behavior following inactivation is insufficient to conclude that a certain network node lacks a functional role in the task.

Overall, this analysis reveals a nuanced picture of inactivation: disabling an individual node in a network will produce a loss of function only if no other node in the network is capable of compensating for its loss. Put differently, the outcome of an inactivation experiment informs about interactions within the network but not necessarily the function of the inactivated node in the intact circuit.

Inactivation effects depend on connection sparsity

One way to interpret the results of inactivation in the parallel architecture is that removing the connections between the two subpopulations changes the circuit’s overall robustness. It is thus natural to wonder how the sparsity of cortical connections (Wildenberg et al., 2021) may affect our results. To quantify the effects of sparsity on inactivation outcomes, we systematically varied the connection probability between neurons in P2 for the networks used in Figures 1–3. Intact network performance was not affected by the sparsification of recurrent interactions, likely because of the low-dimensional nature of the task. Nonetheless, sparsity did affect responses to lesions. Increased sparsity led to robustness to inactivation in terms of overall accuracy (Fig. 5a), while reaction times remained sensitive to the manipulation (Fig. 5b). Critically, the functional integrity index remained effective in predicting post-inactivation accuracy (Fig. 5c). Across the tested connectivity levels, connection sparsity had a monotonic effect on inactivation-induced changes in sensitivity (Fig. 5d), with variable, non-systematic effects on bias (Fig. 5e). As was the case for the parallel architecture, the effects of inactivation were strongly apparent for reaction times, even when the effects on accuracy were minimal (Fig. 5f).

Figure 5.

High connection sparsity increases the robustness of networks to inactivation. a, Psychometric function for a network with a recurrent connection probability of 0.4 after training (gray) and after a 20% (medium-size) inactivation (dark blue). b, Chronometric function before and after inactivation for the same network. c, Fraction of correct responses as a function of the functional integrity index, α, estimated for each network. d, Changes of sensitivity (slope of psychometric function) caused by a 20% inactivation across five trained networks for different sparsity levels. e, Effects of inactivation on bias across the trained networks. f, Changes of mean decision times across networks. Error bars show s.e.m. over five networks for lesions and all 20 networks for intact. Maximal trial duration set to 500 steps.

These results demonstrate that increasing independence within the neural circuit can increase robustness to inactivation, be it through explicitly enforced parallelism (Fig. 4) or through sparsity of connections (Fig. 5). Across both conditions, reaction times remain a more sensitive measure of the causal role of the affected neurons on the computation, with the bifurcation criterion providing a valuable metric for the magnitude of these effects.

The effects of inactivation are task specific

Many perturbation experiments include inactivation of a circuit in more than one task to achieve two goals: to establish the efficacy of the inactivation by showing altered behavior in at least one of the tasks, and to establish some form of dissociation in the involvement of the circuit across tasks. For example, inactivation of the lateral intraparietal cortex causes strong biases in a free choice task, where monkeys arbitrarily choose one of two targets that yield similar reward. But the same inactivation causes weaker or transient biases in perceptual tasks in which the choice is based on sensory stimuli (Katz et al., 2016; Zhou and Freedman, 2019b; Jeurissen et al., 2022). Such dissociations are often interpreted as the circuit being more involved in the computations necessary for one task than the other. But this interpretation is not unique, as the operations of a complex, nonlinear dynamical system (e.g., neural circuits) could substantially differ in the presence and absence of a perturbation (see Discussion).

Consider a circuit designed to implement multiple computations, abstracted as different patterns of population activity in different tasks (Yang et al., 2019; Driscoll et al., 2022). Given an inactivation that affects a subset of neurons, are all tasks affected equally? Are some tasks more robust to inactivation than others? Even though in this thought experiment the unperturbed circuit is known to underlie all the computations, its different activity patterns, and the corresponding dynamical features that implement the computations, could vary in their degree of sensitivity to different circuit perturbations. To approach this question more directly, we consider two complementary scenarios: one in which similar computations need to be performed on different input streams, requiring some form of input gating or context-dependent processing (Mante et al., 2013), and one in which different computations need to be performed on a single input stream in a context-dependent way.

In the first example, we trained a recurrent network to flexibly integrate one of two inputs, depending on a "context" cue (Fig. 6a). After learning, the network was able to integrate the cued input and ignore the other by embedding two distinct attractors in its dynamics, one for each context (Fig. 6b). When the network operates in context "1", the psychometric and chronometric functions for the appropriate input qualitatively match those seen for simple integration, while the same functions show near-zero sensitivity to the contextually irrelevant stimulus (Fig. 6c). Similarly, decision times show strong dependence on the contextually relevant input and no dependence on the irrelevant input (Fig. 6d). This phenomenology holds robustly across different network realizations (Fig. 6f). The vast majority of the trained networks show sensitivity only to the contextually relevant input; a small minority show mixed sensitivity to both inputs, a suboptimal solution. Furthermore, this context-dependent integration occurs without any evident bias (Fig. 6g). In short, the unperturbed trained networks behave as expected.
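A minimal sketch of how such context-cued trials can be generated, assuming the target is the running integral of the cued stream only; the function and variable names (`make_context_trial`, `noise_sd`) are hypothetical, and the actual stimulus statistics used in the paper are given in the Methods.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_context_trial(T=100, noise_sd=1.0):
    """Two noisy input streams plus a one-hot context cue; the target is the
    running integral of the cued stream only."""
    context = rng.integers(2)                       # 0 -> integrate stream 1, 1 -> stream 2
    means = rng.uniform(-0.5, 0.5, size=2)          # stimulus strength of each stream
    streams = means[None, :] + noise_sd * rng.normal(size=(T, 2))
    cue = np.zeros((T, 2))
    cue[:, context] = 1.0
    inputs = np.concatenate([streams, cue], axis=1)   # shape (T, 4): two streams + cue
    target = np.cumsum(streams[:, context])           # integral of the relevant stream
    return inputs, target, context
```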

Figure 6.

In a network trained to perform two similar computations (e.g., context-dependent integration of various inputs), effective inactivations influence performance in both tasks. a, Schematic of a network integrating either input 1 (green) or input 2 (pink) based on contextual inputs. b, Network steady-state activities projected onto the first two principal components, calculated across all trials of both contexts. Color indicates the input mean (stimulus strength). Circles indicate context 1, and triangles indicate context 2. c, Psychometric function for responses in context 1 for contextually relevant input 1 (green) and contextually irrelevant input 2 (pink). The network's output reflects the integral of the relevant input and is unaffected by the irrelevant input. d, Same as (c), but for the chronometric function. e, Psychometric function in context 1 pre- (gray) and post-inactivation (orange). f, Distribution of psychometric function sensitivities of the unperturbed network in context 1 for input 1 (green) and input 2 (pink). g, Same as (f), but for the absolute value of the psychometric function bias. h, Fraction of correct responses across both contexts as a function of the bifurcation criterion estimated for each network. Gray corresponds to unperturbed networks, yellow to 5% inactivation, and orange to 10% inactivation. i, Scatter plot of percent correct in context 1 versus context 2. j,k, Same as (i), but for psychometric function sensitivity and bias. Results reported use 20 trained networks.

When examining how inactivations affect performance on each task, an interesting picture emerges. Inactivating 10% of the neurons disrupted performance by decreasing the sensitivity of the network and by introducing a bias (Fig. 6e), in contrast to the previous simulations with a non-contextual integration task, where the primary loss-of-function was reduced sensitivity. Furthermore, different levels of inactivation resulted in correlated loss-of-function across contexts, in terms of the bifurcation criterion (Fig. 6h), choice accuracy (Fig. 6i), and psychometric sensitivity (Fig. 6j). Biases increased for many inactivated networks (Fig. 6k), but did not appear to increase jointly in both tasks. These results show that for this specific task (context-specific integration), with inactivation of a random subset of neurons, loss-of-function occurs jointly across task contexts. This joint loss of function likely arises from the similarity of the computations in the two tasks (integration) and from the fact that we did not impose any training constraint to orthogonalize the context-specific attractor dynamics. Depending on additional constraints, one could expect distinct results, possibly including independent changes of sensitivity in the two contexts. Different sensory modalities or salient features (Okazawa et al., 2021), explicit regularization (Duncker et al., 2019), or separate readout pathways could each prevent a circuit from learning to share computational structure across tasks.
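The sensitivity and bias values reported in these panels can be extracted from choice data with a standard logistic psychometric fit. The sketch below is a generic maximum-likelihood version of such a fit, not the paper's exact fitting routine; the slope parameter plays the role of sensitivity, and the offset term shifts the function away from the unbiased point.

```python
import numpy as np
from scipy.optimize import minimize

def fit_psychometric(stimulus, choice):
    """Fit P(choice = 1) = 1 / (1 + exp(-(slope * stimulus + offset))).
    stimulus: signed stimulus strengths; choice: 0/1 responses.
    Returns (slope, offset): slope is the sensitivity, offset captures the bias."""
    def nll(params):
        slope, offset = params
        p = 1.0 / (1.0 + np.exp(-(slope * stimulus + offset)))
        p = np.clip(p, 1e-9, 1 - 1e-9)          # avoid log(0)
        return -np.sum(choice * np.log(p) + (1 - choice) * np.log(1 - p))
    res = minimize(nll, x0=np.array([1.0, 0.0]))
    return res.x
```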

A second example highlights distinct effects of inactivation across tasks. We trained our simple network architecture (Fig. 7a) to perform a different computation in each of two tasks, depending on a context cue on each trial: integration of the stimulus, where the activity of the output neuron should match the time integral of the input, or a simple replication task, where the activity of the output neuron should match the network input (see Methods). We found that integration was more susceptible to inactivation. Across simulated networks with 5% and 10% of neurons inactivated, all networks showed degraded sensitivity in the integration task, whereas the replication task showed a high degree of variability. Furthermore, when the replication task was affected by the inactivation, errors tended to be small relative to the input variance (normalized error before inactivation: 0.029 ± 0.002 std; after a 5% inactivation: 0.075 ± 0.053 std; and after a 10% inactivation: 0.093 ± 0.059 std). This result demonstrates that the presence of an inactivation effect in one task and its absence in another do not imply that the circuit contributes to the computations of only one of them.
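For this second scenario, the target simply switches between a cumulative sum of the input and a copy of the input, depending on the context cue. A minimal sketch, under the same assumptions and hypothetical naming conventions as the earlier trial generator:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_dual_task_trial(T=100, noise_sd=1.0):
    """Single input stream; context 0 asks for its running integral,
    context 1 asks the output to reproduce the input itself."""
    context = rng.integers(2)
    mean = rng.uniform(-0.5, 0.5)
    stream = mean + noise_sd * rng.normal(size=T)
    cue = np.zeros((T, 2))
    cue[:, context] = 1.0
    inputs = np.concatenate([stream[:, None], cue], axis=1)   # shape (T, 3)
    target = np.cumsum(stream) if context == 0 else stream
    return inputs, target, context
```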

Figure 7.

In a network trained to perform two different computations, inactivation can differentially affect performance. a, Schematic for a network that either integrates its input (green) or reproduces the input (pink) depending on a contextual signal. b, Plot showing the network input and output in context 2 against the identity (black dashed line). Gray indicates the pre-inactivation network, orange indicates the network after a 10% inactivation. Inset: cumulative density of errors across time and trials before and after inactivation. c, Psychometric function for responses in context 1 for input 1 before (gray) and after (orange) inactivation. d, Psychometric sensitivity in context 1 against the network output error in context 2. Output error in context 2 is normalized by the variance of the sensory inputs for 10 trained networks. Gray, yellow, and orange indicate unperturbed networks, 5% inactivation, and 10% inactivation, respectively. e, Plot of the bias and variance of the network output in context 2. Error bars indicate ±1 s.e.m. Stars mark the example network shown in panels b–c.

Short periods of re-learning can compensate for inactivation

A key hidden assumption in our simulated experiments in the previous sections is that no additional task-specific learning can happen after the inactivation. This assumption is unlikely to be completely true, as plasticity and reinforcement mechanisms may continue to operate. In fact, past studies show many behaviors are only transiently affected following lesions (Schiller et al., 1979; Newsome and Pare, 1988; Rudolph and Pasternak, 1999; Murray and Baxter, 2006), a clear illustration of the brain’s remarkable capacity for learning through re-organization of its circuits. Similar recoveries may be expected in less severe experimental manipulations in which neurons are transiently inactivated, but the extent of additional learning required for adaptation to occur is less clear. To investigate the capacity of networks to adapt to inactivation and regain their performance through further task-specific learning, we modified our model to allow network connections to continue to change at test time and investigated two biologically relevant variants of transient inactivation: continuous and intermittent.

The first type of inactivation is implemented as a continuous disabling of the involved neurons, as used for all the previous sections. It is intended as an analog of experimental manipulations using pharmacological agents (e.g., muscimol, a GABAA agonist) or DREADDs (Wiegert et al., 2017), which affect target circuits for many minutes to days. In these cases, since the inactivation lasts for the majority of an experimental session (or multiple experimental sessions), circuits could eventually learn to compensate for the perturbation with sufficient additional task experience. What is remarkable in the context of our models is how little additional training is required.

Depending on the extent of the manipulation, a few hundred trials were sufficient to compensate for the inactivation, far fewer than the number of trials required for the initial training of the network. To characterize the trajectory of re-learning across networks, we measured the percentage of correct responses as a function of the number of retraining trials. Figure 8a shows an example run in which inactivation of 30% of the integrating population initially brought the network down to chance, performing as poorly as it did before learning. But the circuit robustly reached pre-inactivation performance with fewer than 500 retraining trials (Fig. 8b). This return to pre-inactivation performance was mirrored in the underlying bistable attractor, with the bifurcation criterion α returning to positive values on the same timescale (Fig. 8c), indicating that the network had reconstructed its shallow bistable attractor.
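The retraining protocol amounts to continuing gradient-based learning with the inactivation mask left in place while tracking accuracy over trials. The sketch below shows the shape of such a loop for a PyTorch-style model; `model`, `make_trial`, `evaluate_fn`, and the `unit_mask` keyword are placeholders for whatever interface a given implementation exposes, not the paper's API.

```python
import torch

def retrain_after_inactivation(model, silence, make_trial, evaluate_fn, optimizer,
                               n_trials=500, eval_every=50):
    """Continue gradient-based training with a subset of units silenced
    (the inactivation mask stays in place) and track accuracy over trials."""
    accuracy = []
    for t in range(n_trials):
        inputs, target = make_trial()
        output = model(inputs, unit_mask=silence)        # model applies the mask to its hidden units
        loss = torch.mean((output - target) ** 2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (t + 1) % eval_every == 0:
            accuracy.append(evaluate_fn(model, silence))  # e.g., fraction correct on probe trials
    return accuracy
```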

Figure 8.

Perturbed networks can learn to compensate for inactivation but the speed of recovery depends on the timescale of inactivation. a, Post-inactivation training can be much faster than the initial training. The lines show changes in the MSE of the hierarchical network of Figure 1a during the initial training (gray) and during training after inactivation of 30% of P2 neurons. b, Percent correct as a function of retraining trials. The gray dashed line indicates the average network performance across all stimulus strengths after initial training. c, The stability criterion α as a function of the number of retraining trials. Data points show 10 different networks before inactivation (gray) and after inactivation of 30% of P2 neurons (black). d, Activity projected onto the first principal component for the network prior to inactivation. e, Activity projected onto the first principal component for the network after inactivation of 30% of P2 neurons. f, Activity projected onto the first principal component for the network after post-inactivation retraining. g, Fraction of explained variance as a function of the number of latent dimensions of the network activity after initial learning (black), after inactivation (purple), and after retraining (blue dashes). h, Mean retraining time to reach the pre-inactivation accuracy (80±0.5%) as a function of the percentage of unperturbed neurons in the network. i, Retraining slows down considerably if inactivation of neurons occurs on a fast timescale that allows mixing perturbed and unperturbed trials during retraining. In this simulation, inactivation was limited to a random half of trials. Integration error decreases slowly with the number of perturbed trials (orange), taking more trials than required for the initial training (gray). For the unperturbed trials (black), integration error remains close to the level reached after the initial training. Error bars indicate ±1 s.e.m. over 10 trained networks.

To directly visualize the impact of inactivation and re-learning, we compared the projection of the network activity onto its first principal component at three periods: the end of training (Fig. 8d), just after inactivation (Fig. 8e), and after re-learning (Fig. 8f)—PCA performed separately for each period. The results show that the integration properties of the network largely collapse (although not completely) immediately after inactivation, but are quickly and fully restored by re-learning. The re-learning speed—time to reach virtually the same accuracy as the pre-inactivation network (80 ± 0.5% across the stimulus set)—is strongly correlated with the extent of inactivation: the larger the inactivated population, the longer it takes for function to be recovered by retraining (Fig. 8h).
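The projections in Figure 8d–f can be reproduced generically by fitting PCA separately to the population activity collected in each period and projecting onto the first component. A sketch using scikit-learn; the exact preprocessing used for the figure is described in the Methods.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_onto_pc1(activity):
    """activity: array of shape (n_samples, n_units), e.g., all time points of all trials
    from one period. Returns the projection of each sample onto the first principal
    component, fit only to the activity from that period."""
    pca = PCA(n_components=1)
    return pca.fit_transform(activity)[:, 0]

# Example usage for the three periods (pre-inactivation, post-inactivation, post-retraining):
# pc1_pre, pc1_post, pc1_relearned = map(project_onto_pc1, (act_pre, act_post, act_relearned))
```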

Can re-learning happen when inactivation is not continuous throughout the session? The advent of optogenetics allows control of neural activity with millisecond resolution, leading to new experiments that interleave perturbed trials with unperturbed ones. This improved precision is beneficial but does not remove the re-learning challenge mentioned above. The intact and the inactivated network can be thought of as two distinct dynamical states of the circuit. Repeated inactivation of largely the same group of neurons, as in most optogenetic experiments, can provide opportunities for compensation even when inactivation is infrequent. Biological circuits could learn to use the silence of the inactivated neurons as a contextual cue to switch behavior, or could redirect the computation in both states to the neurons that are not being directly manipulated.

To model an intermittent inactivation scenario similar to optogenetic manipulation experiments, we inactivated the network on a random subset (50%) of trials instead of tonically inactivating the neurons. In line with the general intuition that adaptation is less likely during intermittent inactivation, we found that the network needs more trials to re-learn than in the continuous inactivation design (Fig. 8i). When neurons are inactivated on only 50% of trials, our networks take longer than their initial training time to compensate. This implies that intermittent inactivation techniques are likely more resistant to inactivation-induced adaptation in biological networks, although compensation is still possible.
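Modeling intermittent inactivation only changes whether the mask is applied on a given trial. A sketch of the interleaved trial loop, under the same placeholder interface as the continuous retraining sketch above; tracking errors separately for perturbed and unperturbed trials mirrors the comparison in Figure 8i. Here `silence` is assumed to be a tensor mask over hidden units.

```python
import torch

def intermittent_training(model, silence, make_trial, optimizer,
                          n_trials=2000, p_perturbed=0.5):
    """Interleave perturbed and unperturbed trials; errors are tracked separately
    so that compensation can be compared between the two trial types."""
    errors = {"perturbed": [], "unperturbed": []}
    for t in range(n_trials):
        perturbed = torch.rand(1).item() < p_perturbed
        mask = silence if perturbed else torch.ones_like(silence)
        inputs, target = make_trial()
        output = model(inputs, unit_mask=mask)
        loss = torch.mean((output - target) ** 2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        errors["perturbed" if perturbed else "unperturbed"].append(loss.item())
    return errors
```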

A possible criticism when interpreting these re-learning results is that the optimization procedure used for learning is not biologically realistic, and that the dynamics of re-learning might look very different when network connections adapt via local synaptic plasticity rules. To assess the generality of our results, we trained the network using RFLO learning, a biologically plausible alternative to BPTT (Murray, 2019). We also replaced the mean-squared error with a loss based on binary decision outcomes, a more realistic feedback signal for the network (see Methods). In Figure 9, we repeated the training and inactivation experiments for different inactivation sizes in this new model. We found that the qualitative features of the network solution and the post-inactivation loss of function match those shown in Figure 2. In particular, the network learns bounded integration (Fig. 9a), with a moderate loss-of-function in the psychometric and chronometric functions for a 40% inactivation (Fig. 9b), and near-total loss-of-function for a 75% inactivation (Fig. 9c). As the network continues to learn after inactivation, it restores its mean decision time and performance much more rapidly than the original training time for both the 40% (Fig. 9d,e, top) and 75% (Fig. 9d,e, bottom) inactivations, suggesting that local synaptic plasticity can also support fast recovery of function after partial circuit inactivation.
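For orientation, the sketch below illustrates the flavor of an RFLO-style update (after Murray, 2019): each recurrent weight keeps a local eligibility trace, and the output error reaches the weights through a fixed random feedback matrix instead of backpropagated gradients. This is our simplified reading of the rule, using a mean-squared output error rather than the decision-outcome loss used in the paper, and it omits training of the input and readout weights.

```python
import numpy as np

def rflo_trial(x, y_target, W, W_in, W_out, B, tau=10.0, lr=1e-3):
    """One trial of an RFLO-style update. x: (T, n_in) inputs, y_target: (T,) or (T, n_out)
    targets; W, W_in, W_out: recurrent, input, readout weights; B: fixed random feedback
    matrix of shape (n_units, n_out). Returns the updated recurrent weights."""
    n = W.shape[0]
    h = np.zeros(n)
    P = np.zeros_like(W)            # one eligibility trace per recurrent weight
    dW = np.zeros_like(W)
    for t in range(len(x)):
        u = W @ h + W_in @ x[t]
        # eligibility: low-pass filtered product of postsynaptic gain and presynaptic rate
        P = (1 - 1 / tau) * P + (1 / tau) * np.outer(1 - np.tanh(u) ** 2, h)
        h = h + (-h + np.tanh(u)) / tau
        err = y_target[t] - W_out @ h              # output error at this time step
        dW += lr * (B @ err)[:, None] * P          # random feedback replaces backpropagation
    return W + dW
```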

Figure 9.

Inactivation and re-learning analysis for a network trained with biologically plausible learning. a, Mean output over time, stratified by motion coherence, indicates that the network successfully implements bounded integration of its inputs. b, Psychometric function after training (gray), after a 40% inactivation (left, yellow), and after a 75% inactivation (right, purple) without re-learning. c, Chronometric function after training (gray), after a 40% inactivation (top, yellow), and after a 75% inactivation (bottom, purple) without re-learning. d, Mean decision time for a 40% (yellow) and a 75% (purple) inactivation throughout retraining, compared to the asymptotic mean decision time prior to inactivation (gray). e, Percent correct for a 40% (yellow) and a 75% (purple) inactivation throughout retraining, compared to the percent correct through the original training. Error bars indicate ±1 s.e.m. across 1,000 test trials for (b) and (c), and across 10 simulated networks for (d) and (e).

Overall, the particular learning algorithm that drives the organization of the circuit does not appear crucial for the observed effects: a biologically plausible learning algorithm (Murray, 2019) with more realistic feedback learned a similar computational structure and showed inactivation and re-learning effects similar to those of brute-force optimization via BPTT. It is likely that any algorithm that closely aligns with gradient descent on a task-specific loss will produce similar effects.

Discussion

A main quest of neuroscience is to identify the neural circuits and computations that underlie cognition and behavior. A common approach to this goal is to first use correlational studies (e.g., electrophysiological recordings) to identify circuits whose activity co-fluctuates with task variables (e.g., stimulus or choice), and then perturb those circuits one by one as subjects perform the task. Loss of task accuracy following lesions or transient inactivation of a circuit is commonly interpreted as evidence that the circuit is "necessary" for the underlying computations. The converse, however, need not be true. Of course, if the inactivated circuit is not involved, the behavior remains unaffected. But negative results can also arise for other reasons, which challenges the impulse to embrace the null hypothesis.

The conclusion that negative results in perturbation experiments are not readily interpretable is not new. In fact, it is common knowledge in statistics that inability to reject a null hypothesis (e.g., circuit X is not involved in function Y) is not evidence that the null hypothesis is correct, especially if the true causes of the results remain unexplored and not included in hypothesis testing. In practice, however, there is a growing abundance of publications that interpret negative results of perturbation experiments as lack of involvement of a circuit in a mental function. In many cases, experimenters perceive their negative results as important because they seem to contradict existing theories (e.g., role of posterior parietal cortex in perceptual decisions; Katz et al., 2016). A popular approach is to make negative results palatable or interpretable through establishing a “dissociation” by showing that the same perturbation in a different task or a different circuit yields positive results. Although successful alteration of behavior in a new task is proof that the brain activity has been perturbed by the experimental manipulation, it does not provide additional evidence about the perturbed circuit’s involvement (or lack thereof) in the original task. Aside from statistical and logical concerns about the validity of this remedy (Dunn, 2003; Yoshihara and Yoshihara, 2018), our results reveal key challenges often ignored in the interpretation of negative results: they can emerge from robustness to perturbation due to the architecture of the affected circuit (e.g., sparse connections; Fig. 5) or the bigger network that the circuit belongs to (e.g., redundant circuits; Fig. 4); circuits may continue to learn, and do so within a significantly accelerated timeframe compared to our typical expectations (Fig. 8); and circuits that perform multiple tasks could have developed solutions that are robust to perturbation effects in one task and sensitive in another task, even when the circuit is involved in both (Figs. 6 and 7).

Our results underscore the importance of further investigation following negative results and point to promising avenues for subsequent experiments. We highlight four improvements in experimental approaches that would enhance our understanding of future circuit perturbations. First, we recommend documenting behavior from the initial application of the perturbation protocols, and developing methods to quantify learning opportunities both within and outside the experimental context. For instance, in a decision-making task, we should track changes of choice behavior at single trial resolution from the very first trial after the perturbation. Furthermore, in case of long-lasting perturbations (e.g., muscimol injection), we should continue to track behavior in between experimental sessions. Plastic brain circuits can learn from experience to compensate for perturbations (Fig. 8). There is already experimental evidence that circuits adapt to both transient inactivations (Fetsch et al., 2018; Jeurissen et al., 2022) and permanent lesions (Newsome and Pare, 1988). Also, the brain’s impressive robustness to extensive and gradual dopaminergic neuron loss in Parkinson’s disease is well documented (Zigmond et al., 1990). Careful tracking of behavior in a variety of experiments that target different brain regions and tasks will address which brain circuits adapt to perturbations and what experimental conditions facilitate or thwart adaptation. Second, we recommend quantification of behavior not be limited to choice (a discrete measure) and include more sensitive, analog measures, such as reaction time (Results, In distributed architectures, inactivation effects can be variable). This improved design provides a more accurate understanding, especially in distributed networks, where choice accuracy could be seemingly unaffected by circuit perturbations but reaction times or other more sensitive measures could register changes of behavior. Third, it is crucial to understand the broader network engaged in a task and the parallel pathways that can mediate behavior. Association cortex, for example, includes recurrent and diverse connections, which give rise to many possible parallel pathways. Such complex networks prevent straightforward conclusions from a simple experimental approach that perturbs a single region in the network (Fig. 4). Our results strengthen the case for a comprehensive mapping of anatomical and functional connectivity, which can be incorporated into computational models for predicting perturbation outcomes (Yang et al., 2021; Nejatbakhsh et al., 2023). Without such models, deciphering perturbation results in complex brain networks becomes exceedingly difficult. Fourth, we propose augmenting single region perturbations with simultaneous perturbation of a collection of network nodes, chosen based on network models. Another valuable approach is to simultaneously record unperturbed network nodes to quantify the effects of perturbation on brain-wide response dynamics and identify adaptive mechanisms that could rescue behavior (Li et al., 2016). Overall, we view negative behavioral results in perturbation experiments as a starting point, not an endpoint. With improved experimental designs and computational models, we can accelerate and enhance our comprehension of the neural mechanisms that shape behavior.

Our models in this paper focus on a well-studied perceptual decision-making task: direction discrimination with random dots. Understanding the dynamical mechanisms of computation in our circuits proved necessary for understanding their response to inactivation. Our trained networks approximated sensory integration using a shallow bistable attractor (Wong and Wang, 2006; Strogatz, 2018), whose disruption was closely correlated with loss-of-function in inactivated networks. Furthermore, re-learning reconstructed a bistable attractor within the network. Our analysis shows that what matters in a circuit, irrespective of the status of individual neurons or the number of affected neurons, is the integrity of its computational structure. For perceptual decision-making tasks that require integration of sensory evidence, the bifurcation criterion α makes a good functional integrity index. For tasks that depend on other types of computations, analogous indices could be developed to characterize the underlying computational structure. The need for such task-specific indices makes statistical approaches that extract computational structures directly from measured population activity particularly valuable (Zhao and Park, 2016; Nassar et al., 2018; Duncker et al., 2019).
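For readers who want to probe the computational structure of their own trained networks, a generic starting point is to search numerically for fixed points of the autonomous dynamics and check their local stability via the Jacobian. The sketch below is this generic recipe only; the paper's bifurcation criterion α is defined in its Methods and may differ in detail.

```python
import numpy as np
from scipy.optimize import minimize

def find_fixed_point(W, b, h0):
    """Minimize the speed ||F(h)||^2 of the autonomous dynamics dh/dt = -h + tanh(W h + b),
    starting from h0; a (near-)zero minimum marks a fixed (or slow) point."""
    F = lambda h: -h + np.tanh(W @ h + b)
    res = minimize(lambda h: np.sum(F(h) ** 2), x0=h0, method="L-BFGS-B")
    return res.x, np.sum(F(res.x) ** 2)

def is_stable(W, b, h_star):
    """Local stability from the Jacobian of the dynamics at the candidate fixed point."""
    D = np.diag(1 - np.tanh(W @ h_star + b) ** 2)
    J = -np.eye(len(h_star)) + D @ W
    return np.max(np.linalg.eigvals(J).real) < 0
```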

When multiple tasks are used to assess the effects of inactivation, interpreting cases in which inactivation affects one task but not another can be difficult, especially if the two tasks involve different computations. We explored such a situation, where a modeled neural circuit was known to contribute to both tasks, but perturbations had a large effect on one task and a small effect on the other. Recapitulating previous empirical results (Yang et al., 2019; Driscoll et al., 2022), we found that a single recurrent network could form multiple functional units for solving separate tasks, depending on a contextual signal. Whether lesions of a random subset of neurons affected performance depended strongly on which task was being performed. In particular, we found that a network performing integration on two separate sensory streams suffered similar performance impairment across both tasks, whereas a network performing two different computations (integration or replication) on a single sensory stream showed unequal impairment across tasks. If the different computations implemented by a circuit do not share computational structure (e.g., one task requires a bistable attractor and one does not), inactivation differentially affects the integrity of the separate computational structures. Consequently, contrasting inactivation effects across tasks may simply reflect dissimilar computational structures rather than differential involvement of the circuit.

Furthermore, we found that even when a neural circuit plays a direct causal role in a computation, loss-of-function in response to inactivation of a subset of neurons can be transient. In the presence of active learning, networks compensated for the inactivation of their neurons. When inactivation was continuous in time, the compensation occurred rapidly, on a timescale much faster than the original training time, likely because inactivation did not completely destroy the network's previously learned computations. This re-learning is reminiscent of the recovery of behavior following lesions or transient inactivation (Newsome and Pare, 1988; Rudolph and Pasternak, 1999; Murray and Baxter, 2006; Fetsch et al., 2018; Jeurissen et al., 2022). Overall, drawing conclusions about the causal role a circuit plays in a given computation can be difficult without first analyzing the transient changes of behavior immediately after inactivation, a commonly omitted or poorly documented aspect of many studies.

Fast timescale inactivation techniques (e.g., optogenetics) have greatly increased in popularity as they allow precise control of the affected neurons with sub-second resolution. As we show here, brief inactivation periods interspersed with normal activity make it harder for a learning system to identify and adapt to the perturbation. However, compensation can occur even for fast optogenetic perturbations (Fig. 8), as has been observed experimentally (Fetsch et al., 2018). But such compensations require more training than techniques in which the inactivation is more sustained (Fig. 8). This longer adaptation may be a result of destructive interference during re-learning, where the synaptic changes needed to improve performance during perturbation are canceled by those in the absence of perturbation, thus slowing down learning overall. Compensating for intermittent inactivation could take two forms. The first one involves developing two computational structures (e.g., two separate attractors), one used in the presence of the perturbation and the other in the unperturbed condition. Alternatively, the network may converge to a single attractor, modified from its original solution to exclude the inactivated subset of neurons. The empirical prevalence of these compensatory mechanisms is unknown and a prime target for future experiments.

The mechanisms that created resilience to inactivation in this study, namely network architecture, rapid re-learning, and differential sensitivity to inactivation across tasks, generalize beyond evidence integration and perceptual decision-making tasks (Wolff and Ölveczky, 2018; Vaidya et al., 2019). Here we have offered several suggestions for identifying the effects of artificial manipulations in neural circuits, along with cautionary tales tied to the choice of a particular perturbation or task. Circuits before and after manipulation are only tenuously related, and drawing conclusions about the function of intact circuits from the effects of inactivation, whether positive or negative, is quite difficult. Implementing proper controls for these effects and interpreting experimental results carefully in terms of the system's computational structure will benefit inactivation studies across a breadth of systems and neuroscience subfields.

Footnotes

  • We thank Michael Shadlen, Jean-Paul Noel, Saleh Esteki, Gouki Okazawa, Michael Waskom, John Sakon, Danique Jeurissen, and S. Shushruth for inspiring discussions and feedback on earlier versions of the manuscript. We thank Owen Marschall for sharing code to implement the Random Feedback Local Online algorithm. This work was supported by the Simons Collaboration on the Global Brain (542997), National Institute of Mental Health (R01MH109180 and R01MH127375), and the Alfred P. Sloan Foundation. Additionally, R.K. was supported by a Pew Scholarship in the Biomedical Sciences and a McKnight Scholar Award. C.S. was supported by National Institute of Mental Health (1R01MH125571-01), the National Science Foundation (Award No. 1922658), and a Google Faculty Research Award.

  • *C.S. and R.K. are co-senior authors of this paper.

  • The authors declare no competing financial interests.

  • Correspondence should be addressed to Cristina Savin at csavin@nyu.edu or Roozbeh Kiani at roozbeh@nyu.edu.

SfN exclusive license.

References

  1. Akrout M, Wilson C, Humphreys PC, Lillicrap T, Tweed D (2019) Using weight mirrors to improve feedback alignment. arXiv preprint, http://arxiv.org/abs/1904.05391.
  2. Basso MA, Bickford ME, Cang J (2021) Unraveling circuits of visual perception and cognition through the superior colliculus. Neuron 109:918–937.
  3. Britten KH, Shadlen MN, Newsome WT, Movshon JA (1992) The analysis of visual motion: a comparison of neuronal and psychophysical performance. J Neurosci 12:4745–4765.
  4. Cain N, Barreiro AK, Shadlen M, Shea-Brown E (2013) Neural integrators for decision making: a favorable tradeoff between robustness and sensitivity. J Neurophysiol 109:2542–2559.
  5. Chandrasekaran C, Peixoto D, Newsome WT, Shenoy KV (2017) Laminar differences in decision-related neural activity in dorsal premotor cortex. Nat Commun 8:1–16.
  6. Churchland AK, Kiani R, Chaudhuri R, Wang X-J, Pouget A, Shadlen MN (2011) Variance as a signature of neural computations during decision making. Neuron 69:818–831.
  7. Churchland AK, Kiani R, Shadlen MN (2008) Decision-making with multiple alternatives. Nat Neurosci 11:693–702.
  8. Deverett B, Koay SA, Oostland M, Wang SS (2018) Cerebellar involvement in an evidence-accumulation decision-making task. Elife 7:e36781.
  9. Ding L, Gold JI (2013) The basal ganglia's contributions to perceptual decision making. Neuron 79:640–649.
  10. Driscoll L, Shenoy K, Sussillo D (2022) Flexible multitask computation in recurrent networks utilizes shared dynamical motifs. bioRxiv 2022-08.
  11. Drugowitsch J, Moreno-Bote R, Churchland AK, Shadlen MN, Pouget A (2012) The cost of accumulating evidence in perceptual decision making. J Neurosci 32:3612–3628.
  12. Duncker L, Bohner G, Boussard J, Sahani M (2019) Learning interpretable continuous-time models of latent stochastic dynamical systems. In: International conference on machine learning, pp 1726–1734. Long Beach, CA: PMLR.
  13. Dunn JC (2003) The elusive dissociation. Cortex 39:177–179.
  14. Erlich JC, Brunton BW, Duan CA, Hanks TD, Brody CD (2015) Distinct effects of prefrontal and parietal cortex inactivations on an accumulation of evidence task in the rat. Elife 4:e05457.
  15. Fetsch CR, Odean NN, Jeurissen D, El-Shamayleh Y, Horwitz GD, Shadlen MN (2018) Focal optogenetic suppression in macaque area MT biases direction discrimination and decision confidence, but only transiently. Elife 7:e36523.
  16. Frémaux N, Gerstner W (2016) Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules. Front Neural Circ 9:85.
  17. Ganguli S, Sompolinsky H (2012) Compressed sensing, sparsity, and dimensionality in neuronal information processing and data analysis. Annu Rev Neurosci 35:485–508.
  18. Gold JI, Law C-T, Connolly P, Bennur S (2010) Relationships between the threshold and slope of psychometric and neurometric functions during perceptual learning: implications for neuronal pooling. J Neurophysiol 103:140–154.
  19. Gold JI, Shadlen MN (2007) The neural basis of decision making. Annu Rev Neurosci 30:535–574.
  20. Goldman MS, Levine JH, Major G, Tank DW, Seung H (2003) Robust persistent neural activity in a model integrator with multiple hysteretic dendrites per neuron. Cereb Cortex 13:1185–1195.
  21. Hanks TD, Ditterich J, Shadlen MN (2006) Microstimulation of macaque area LIP affects decision-making in a motion discrimination task. Nat Neurosci 9:682–689.
  22. Hanks TD, Kopec CD, Brunton BW, Duan CA, Erlich JC, Brody CD (2015) Distinct relationships of parietal and prefrontal cortices to evidence accumulation. Nature 520:220–223.
  23. Horwitz GD, Newsome WT (1999) Separate signals for target selection and movement specification in the superior colliculus. Science 284:1158–1161.
  24. Huk AC, Shadlen MN (2005) Neural activity in macaque parietal cortex reflects temporal integration of visual motion signals during perceptual decision making. J Neurosci 25:10420–10436.
  25. Jeurissen D, Shushruth S, El-Shamayleh Y, Horwitz GD, Shadlen MN (2022) Deficits in decision-making induced by parietal cortex inactivation are compensated at two timescales. Neuron 110:1924–1931.
  26. Jonas E, Kording KP (2017) Could a neuroscientist understand a microprocessor? PLoS Comput Biol 13:e1005268.
  27. Katz LN, Yates JL, Pillow JW, Huk AC (2016) Dissociated functional significance of decision-related activity in the primate dorsal stream. Nature 535:285–288.
  28. Khalvati K, Kiani R, Rao RPN (2021) Bayesian inference with incomplete knowledge explains perceptual confidence and its deviations from accuracy. Nat Commun 12:5704.
  29. Kiani R, Corthell L, Shadlen MN (2014a) Choice certainty is informed by both evidence and decision time. Neuron 84:1329–1342.
  30. Kiani R, Cueva CJ, Reppas JB, Newsome WT (2014b) Dynamics of neural population responses in prefrontal cortex indicate changes of mind on single trials. Curr Biol 24:1542–1547.
  31. Kim J-N, Shadlen MN (1999) Neural correlates of a decision in the dorsolateral prefrontal cortex of the macaque. Nat Neurosci 2:176–185.
  32. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint, http://arxiv.org/abs/1412.6980.
  33. Li N, Daie K, Svoboda K, Druckmann S (2016) Robust neuronal dynamics in premotor cortex during motor planning. Nature 532:459–464.
  34. Licata AM, Kaufman MT, Raposo D, Ryan MB, Sheppard JP, Churchland AK (2017) Posterior parietal cortex guides visual decisions in rats. J Neurosci 37:4954–4966.
  35. Lillicrap TP, Cownden D, Tweed DB, Akerman CJ (2016) Random synaptic feedback weights support error backpropagation for deep learning. Nat Commun 7:1–10.
  36. Mante V, Sussillo D, Shenoy KV, Newsome WT (2013) Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature 503:78–84.
  37. Marschall O, Cho K, Savin C (2020) A unified framework of online learning algorithms for training recurrent neural networks. J Mach Learn Res 21:1–34.
  38. Mochol G, Kiani R, Moreno-Bote R (2021) Prefrontal cortex represents heuristics that shape choice bias and its integration into future behavior. Curr Biol 31:1234–1244.
  39. Murray JM (2019) Local online learning in recurrent networks with random feedback. eLife 8:e43299.
  40. Murray EA, Baxter MG (2006) Cognitive neuroscience and nonhuman primates: lesion studies. Methods Mind 43:69.
  41. Nassar J, Linderman SW, Bugallo M, Park IM (2018) Tree-structured recurrent switching linear dynamical systems for multi-scale modeling. arXiv preprint, http://arxiv.org/abs/1811.12386.
  42. Nejatbakhsh A, Fumarola F, Esteki S, Toyoizumi T, Kiani R, Mazzucato L (2023) Predicting the effect of micro-stimulation on macaque prefrontal activity based on spontaneous circuit dynamics. Phys Rev Res 5:043211. doi:10.1103/PhysRevResearch.5.043211.
  43. Newsome WT, Britten KH, Movshon JA (1989) Neuronal correlates of a perceptual decision. Nature 341:52–54.
  44. Newsome WT, Pare EB (1988) A selective impairment of motion perception following lesions of the middle temporal visual area (MT). J Neurosci 8:2201–2211.
  45. Okazawa G, Hatch CE, Mancoo A, Machens CK, Kiani R (2021) Representational geometry of perceptual decisions in the monkey parietal cortex. Cell 184:3748–3761.
  46. Pagan M, Tang VD, Aoi MC, Pillow JW, Mante V, Sussillo D, Brody CD (2022) A new theoretical framework jointly explains behavioral and neural variability across subjects performing flexible decision-making. bioRxiv 2022-11.
  47. Palmer J, Huk AC, Shadlen MN (2005) The effect of stimulus strength on the speed and accuracy of a perceptual decision. J Vis 5:1–1.
  48. Paszke A, et al. (2019) PyTorch: an imperative style, high-performance deep learning library. Adv Neural Inf Process Syst 32:8026–8037.
  49. Peixoto D, et al. (2021) Decoding and perturbing decision states in real time. Nature 591:604–609.
  50. Ratcliff R, Hasegawa YT, Hasegawa RP, Childers R, Smith PL, Segraves MA (2011) Inhibition in superior colliculus neurons in a brightness discrimination task? Neural Comput 23:1790–1820.
  51. Rigotti M, Ben Dayan Rubin DD, Wang X-J, Fusi S (2010) Internal representation of task rules by recurrent dynamics: the importance of the diversity of neural responses. Front Comput Neurosci 4:24.
  52. Roitman JD, Shadlen MN (2002) Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J Neurosci 22:9475–9489.
  53. Rudolph K, Pasternak T (1999) Transient and permanent deficits in motion perception after lesions of cortical areas MT and MST in the macaque monkey. Cereb Cortex 9:90–100.
  54. Salzman CD, Britten KH, Newsome WT (1990) Cortical microstimulation influences perceptual judgements of motion direction. Nature 346:174–177.
  55. Schiller PH, True SD, Conway JL (1979) Effects of frontal eye field and superior colliculus ablations on eye movements. Science 206:590–592.
  56. Shadlen MN, Hanks TD, Churchland AK, Kiani R, Yang T (2006) The speed and accuracy of a simple perceptual decision: a mathematical primer. In: Bayesian brain: probabilistic approaches to neural coding (Doya K, Ishii S, Pouget A, Rao RPN, eds), pp 209–237. Cambridge, MA: MIT Press.
  57. Shadlen MN, Kiani R (2013) Decision making as a window on cognition. Neuron 80:791–806.
  58. Shadlen MN, Newsome WT (2001) Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol 86:1916–1936.
  59. Strogatz SH (2018) Nonlinear dynamics and chaos with student solutions manual: with applications to physics, biology, chemistry, and engineering. Boca Raton, FL: CRC Press.
  60. Thura D, Cisek P (2014) Deliberation and commitment in the premotor and primary motor cortex during dynamic decision making. Neuron 81:1401–1416.
  61. Vaidya AR, Pujara MS, Petrides M, Murray EA, Fellows LK (2019) Lesion studies in contemporary neuroscience. Trends Cogn Sci 23:653–671.
  62. Wald A, Wolfowitz J (1950) Bayes solutions of sequential decision problems. Ann Math Stat 21:82–99.
  63. Waskom ML, Okazawa G, Kiani R (2019) Designing and interpreting psychophysical investigations of cognition. Neuron 104:100–112.
  64. Werbos PJ (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78:1550–1560.
  65. Wiegert JS, Mahn M, Prigge M, Printz Y, Yizhar O (2017) Silencing neurons: tools, applications, and experimental constraints. Neuron 95:504–529.
  66. Wildenberg GA, Rosen MR, Lundell J, Paukner D, Freedman DJ, Kasthuri N (2021) Primate neuronal connections are sparse in cortex as compared to mouse. Cell Rep 36:109709.
  67. Wolff SB, Ölveczky BP (2018) The promise and perils of causal circuit manipulations. Curr Opin Neurobiol 49:84–94.
  68. Wong K-F, Wang X-J (2006) A recurrent network mechanism of time integration in perceptual decisions. J Neurosci 26:1314–1328.
  69. Yang GR, Joglekar MR, Song HF, Newsome WT, Wang X-J (2019) Task representations in neural networks trained to perform many cognitive tasks. Nat Neurosci 22:297–306.
  70. Yang Y, Qiao S, Sani OG, Sedillo JI, Ferrentino B, Pesaran B, Shanechi MM (2021) Modelling and prediction of the dynamic responses of large-scale brain networks during direct electrical stimulation. Nat Biomed Eng 5:324–345.
  71. Yartsev MM, Hanks TD, Yoon AM, Brody CD (2018) Causal contribution and dynamical encoding in the striatum during evidence accumulation. Elife 7:e34929.
  72. Yoshihara M, Yoshihara M (2018) 'Necessary and sufficient' in biology is not necessarily necessary – confusions and erroneous conclusions resulting from misapplied logic in the field of biology, especially neuroscience. J Neurogenet 32:53–64.
  73. Zhao Y, Park IM (2016) Interpretable nonlinear dynamic modeling of neural trajectories. arXiv preprint, http://arxiv.org/abs/1608.06546.
  74. Zhou Y, Freedman DJ (2019a) Posterior parietal cortex plays a causal role in perceptual and categorical decisions. Science 365:180–185.
  75. Zhou Y, Freedman DJ (2019b) Posterior parietal cortex plays a causal role in perceptual and categorical decisions. Science 365:180–185.
  76. Zigmond MJ, Abercrombie ED, Berger TW, Grace AA, Stricker EM (1990) Compensations after lesions of central dopaminergic neurons: some clinical and basic implications. Trends Neurosci 13:290–296.