Abstract
Dopamine release in the nucleus accumbens core (NAcC) is generally considered to be a proxy for phasic firing of the ventral tegmental area dopamine (VTADA) neurons. Thus, dopamine release in NAcC is hypothesized to reflect a unitary role in reward prediction error signaling. However, recent studies reveal more diverse roles of dopamine neurons, which support an emerging idea that dopamine regulates learning differently in distinct circuits. To understand whether the NAcC might regulate a unique component of learning, we recorded dopamine release in NAcC while male rats performed a backward conditioning task where a reward is followed by a neutral cue. We used this task because it allows us to delineate different components of learning, which include sensory-specific inhibitory and general excitatory components. Furthermore, we have shown that VTADA neurons are necessary for both the specific and general components of backward associations. Here, we found that dopamine release in NAcC increased to the reward across learning while decreasing to the cue that followed as that cue became more expected. This mirrors the dopamine prediction error signal seen during forward conditioning and cannot be accounted for by temporal-difference reinforcement learning. Subsequent tests allowed us to dissociate these learning components and revealed that dopamine release in NAcC reflects the general excitatory component of backward associations, but not their sensory-specific component. These results emphasize the importance of examining distinct functions of different dopamine projections in reinforcement learning.
Significance Statement
Dopamine regulates reinforcement learning. While it was previously believed that this system contributed to simple value assignment to reward cues, we now know that dopamine plays diverse roles in reinforcement learning. How these diverse roles are achieved in distinct circuits is not fully understood. By using behavioral tasks that examine distinct components of learning separately, we reveal that nucleus accumbens core (NAcC) dopamine release reflects a unique component of learning. Thus, the present study supports a distinct role of NAcC in reinforcement learning, consistent with the idea that different dopamine systems serve different learning functions. Examining the roles of different dopamine projections is important to identify neuronal mechanisms underlying the reinforcement-learning deficits observed in schizophrenia and drug addiction.
Introduction
Dopamine neurons in the substantia nigra exhibit phasic activity to unexpected rewards, and across learning this activity backpropagates to the sensory cue that comes to predict these rewards (Mirenowicz and Schultz, 1994). This observation led to the proposal that dopamine encodes the difference between the scalar value of actual and expected rewards (i.e., a reward prediction error), acting to stamp in cached value to events preceding rewards (Montague et al., 1996; Schultz et al., 1997). The same pattern is also seen in the firing of ventral tegmental area dopamine (VTADA) neurons (Hollerman and Schultz, 1998; Pan et al., 2005; Cohen et al., 2012; Eshel et al., 2016) and in dopamine release in the main projection site of VTADA neurons, the nucleus accumbens core (NAcC; Day et al., 2007; Flagel et al., 2011; A. S. Hart et al., 2014; Menegas et al., 2017). From these studies, it was suggested that VTADA neurons and their projection targets serve the unitary role of computing reward prediction errors (Menegas et al., 2017; Watabe-Uchida et al., 2017).
However, it has recently been revealed that dopamine plays many diverse roles in reinforcement learning, findings that are no longer compatible with a unitary theory of dopamine function (Hamid et al., 2016; Parker et al., 2016; Sadacca et al., 2016; Syed et al., 2016; Sharpe et al., 2017a, 2020; Langdon et al., 2018; Lee et al., 2019; Hughes et al., 2020; Maes et al., 2020; Akam and Walton, 2021; Lerner et al., 2021; Krausz et al., 2023; Takahashi et al., 2023). Dopamine neurons help us to form cognitive models of our environment (Sharpe et al., 2017a, 2020; Maes et al., 2020), which allow us to make inferences about the sensory features of upcoming rewards (Takahashi et al., 2017; Keiflin et al., 2019) or mentally simulate possible actions to approach or obtain rewards (Daw et al., 2011; Groman et al., 2019). Indeed, dopamine neurons encode many different aspects of the environment that are value and reward independent (Engelhard et al., 2019). What's more, the prediction error itself contains information about the events that led to the error, demonstrating that this signal contains the “what” to learn as well as the “when” to learn it (Stalnaker et al., 2019). This work provides strong evidence that the dopamine prediction error acts as a universal teaching signal to facilitate many different forms of learning.
Now, the question becomes how dopamine instantiates the many different forms of learning that it is capable of supporting. To begin to answer this question, we utilized a backward conditioning task, where a reward reliably precedes presentation of a sensory cue. Backward conditioning inverts the traditional task used to measure dopamine release, which typically consists of forward conditioning where a sensory cue precedes reward. As such, our backward conditioning task allows us to dissociate reinforcement-learning signals from value backpropagation. An added benefit of this task is that it produces several different learned associations, including sensory-specific inhibitory and general excitatory associations (Delamater et al., 2003; Laurent and Balleine, 2015; Seitz et al., 2022). Thus, this task allows us to isolate distinct subcomponents of learning. Critically, we have shown that VTADA neurons are necessary for both of these forms of learning that develop during backward conditioning (Seitz et al., 2022). Here, as an entry point to investigating how different dopamine circuits contribute to these different forms of learning, we examined dopamine release in NAcC during the backward conditioning task. Although dopamine release in NAcC is traditionally viewed as a proxy for VTADA neuronal activity, recent data have revealed the heterogeneity of dopamine neurons and of dopamine release downstream (Lammel et al., 2008; Beier et al., 2015; Parker et al., 2016; Morales and Margolis, 2017; Saunders et al., 2018; Cox and Witten, 2019; Mohebi et al., 2019). Given the emerging view that distinct dopamine circuits may serve different functions, we reasoned that NAcC may encode a unique subcomponent of the reinforcement learning supported by VTADA neurons, which we could reveal by examining dopamine release in NAcC during backward conditioning.
Materials and Methods
Subjects
In this study, nine wild-type male Long–Evans rats were used. While all our prior studies investigating dopamine function have used both male and female rats (Sharpe et al., 2017a, 2020; Seitz et al., 2022), we have recently had issues with female rats sourced from Charles River being underweight in our colony. This makes it hard for female rats to carry the weight of the headcaps and cords required for chronic recording experiments. Thus, we opted to use only male rats for this particular study. Rats were housed on a 12 h light/dark cycle, and all behavioral training and experiments were performed during the light cycle. Rats were food restricted during the behavioral components of the experiment, with feeding controlled to maintain 85% of their prerestriction body weight. All experimental procedures were conducted in accordance with the University of California, Los Angeles (UCLA) Institutional Animal Care and Use Committee and the University of Sydney Animal Ethics Committee.
Surgeries
Surgical procedures have been described previously (Sharpe et al., 2017a). Rats were bilaterally infused with adeno-associated virus (AAV) carrying GRABDA (AAV9-hSyn-GRABDA2m; titer, 2.4 × 10¹³) into NAcC (AP, +1.3 mm; ML, ±1.3 mm; DV, −7.2 and −6.4 mm; 1.0 µl per hemisphere). Optic fibers were also bilaterally implanted to target NAcC (AP, +1.3 mm; ML, ±1.3 mm; DV, −6.8 mm). Rats were left for 4–6 weeks to recover from surgery and to allow sufficient time for the virus to transduce NAcC neurons.
Apparatus
Behavioral sessions were conducted in identical sound-attenuated behavioral chambers (Med Associates). Each chamber has a food delivery port equipped with a head entry detector. The port is connected to a food dispenser that delivers either sucrose or grain pellets, and it also has a stainless-steel tube connected to a syringe filled with 15% maltodextrin solution. A sound generator and clicker are positioned on the wall opposite the food port. A 3 W, 24 V house light was mounted at the top of the wall opposite the food port, and two white flashing lights were located diagonally above the food port.
Forward conditioning
The order of forward conditioning and backward conditioning was counterbalanced such that half the rats received forward conditioning first and the other half received backward conditioning first. Rats received eight sessions of cue–reward forward conditioning. Two 10 s auditory cues (click or white noise) were presented in a pseudorandom fashion with a variable intertrial interval (ITI) ranging from 150 to 230 s (190 s on average). One of the auditory cues (CS+) was followed by reward (two 40 mg sucrose pellets; TestDiet), and the other did not predict reward (CS−). Cue–reward associations were fully counterbalanced across rats. Each session comprised five presentations of each CS. For the behavioral analysis, we subtracted the number of port entries during the 10 s pre-CS baseline period from the number of port entries during the 10 s CS period. We recorded dopamine release using in vivo fiber photometry during Sessions 1, 4, and 8, which reduces problems with photobleaching (Sias et al., 2024). Although all nine rats received forward conditioning, data from two rats were excluded from the forward conditioning analysis because viable recordings could not be collected from all three sessions.
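As a minimal sketch of the CS − pre-CS difference score, the code below assumes that port entries are available as timestamps in seconds and counts them in the half-open windows [onset − 10 s, onset) and [onset, onset + 10 s). The function name and the half-open window convention are our illustrative assumptions, not details taken from the authors' MATLAB code.

```python
def difference_score(entry_times, cs_onset, window=10.0):
    """Port entries during the CS minus entries in the equal-length pre-CS baseline.

    entry_times: port-entry timestamps (s); cs_onset: CS onset time (s).
    Window boundaries are half-open, which is an assumption of this sketch.
    """
    during = sum(1 for t in entry_times if cs_onset <= t < cs_onset + window)
    before = sum(1 for t in entry_times if cs_onset - window <= t < cs_onset)
    return during - before


# One hypothetical trial: 1 entry in the baseline, 3 entries during the CS.
print(difference_score([3.0, 12.5, 21.0, 22.4, 25.9, 33.0], cs_onset=20.0))
```

Per-session values would then be the mean of these per-trial scores for each CS.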
Backward conditioning
Initial backward conditioning training
Rats received eight sessions of reward–cue backward conditioning, in line with previously published demonstrations (Laurent and Balleine, 2015; Seitz et al., 2022). Rats learned two backward associations, one where maltodextrin preceded an auditory cue (e.g., tone or siren) and another where a grain pellet preceded another cue (e.g., siren or tone). Reward–cue associations were fully counterbalanced across rats. Each backward association was presented in a pseudorandom fashion with a variable ITI ranging from 50 to 160 s (90 s on average). The auditory cues had a variable length ranging from 2 to 58 s (30 s on average) and were presented 10 s after the rats entered the food port to retrieve the rewards. Each session comprised 12 presentations of each backward association. For behavioral analysis, we counted the port entries made during each variable-length backward cue presentation and calculated the rate of port entry per minute for each trial. We recorded dopamine release using in vivo fiber photometry on Sessions 1, 4, and 8.
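Because backward cues vary from 2 to 58 s, raw counts are not comparable across trials, which is why responding is expressed as a per-minute rate. A minimal sketch, assuming timestamps in seconds and a half-open cue window; the function name is hypothetical.

```python
def entry_rate_per_min(entry_times, cue_onset, cue_duration):
    """Port entries per minute during one variable-length cue presentation."""
    if cue_duration <= 0:
        raise ValueError("cue_duration must be positive")
    n = sum(1 for t in entry_times if cue_onset <= t < cue_onset + cue_duration)
    return n / (cue_duration / 60.0)


# Four entries during a 30 s cue give 8 entries per minute.
print(entry_rate_per_min([101.0, 105.0, 110.0, 120.0, 140.0], 100.0, 30.0))
```

Note that a single entry during a 2 s cue already yields 30 entries per minute, so short trials can carry large per-trial rates; trial-level modeling (as in the LME analyses) keeps each trial as its own observation rather than pooling counts.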
Forward conditioning training with a visual cue
Following the eight sessions of backward conditioning, rats received five sessions of forward conditioning with a novel visual cue. A 30 s flashing light was presented and followed by a grain pellet, which was identical to that used for one of the backward associations (and distinct from the forward conditioning experiments). The ITI was variable ranging from 150 to 230 s (190 s on average). Each session comprised 15 presentations of the cue–reward contingency. The number of port entries during the 30 s flashing light was used for analyses.
Test of specificity
Following forward conditioning with the visual cue, rats received a single test of the specificity of the backward associations. We gave rats a test with three trial types: (1) visual cue alone, (2) a compound with the visual cue and the backward cue associated with the same reward (i.e., congruent compound), or (3) a compound with the visual cue and the backward cue associated with the different reward (i.e., incongruent compound). Each trial type comprised a 30 s cue or compound presentation, which was delivered in a pseudorandom fashion with a variable ITI ranging from 150 to 230 s (190 s on average). The session comprised six presentations of each trial type. We used the number of port entries during the first 10 s of each cue presentation for analysis. We recorded dopamine release using in vivo fiber photometry throughout the test of specificity. Data from two rats were excluded from the analysis because we were not able to collect behavioral measurements across all sessions.
Omission test
A subset of rats (n = 5 rats) received an omission test that included standard backward trials (i.e., normal trials) and trials where the backward cue was delivered unexpectedly without its preceding reward (i.e., omission trials). The session started with four normal trials. After that, normal or omission trials were presented in a pseudorandom fashion. The omission test comprised 16 normal trials and 12 omission trials. Each backward association was presented in a pseudorandom fashion with a variable ITI ranging from 50 to 160 s (90 s on average). The auditory cues had a variable length ranging from 2 to 58 s (30 s on average). We recorded dopamine release using in vivo fiber photometry throughout the omission test.
Reversal test
The same subset of rats (n = 5 rats) received a backward conditioning session that included standard backward conditioning trials (i.e., normal trials) and trials where the contingency between the reward and cue was reversed (i.e., reversal trials). The session started with four normal trials. After this, normal and reversal trials were presented in a pseudorandom fashion. The session comprised 16 normal trials and 12 reversal trials. Each backward association was presented in a pseudorandom fashion with a variable ITI ranging from 50 to 160 s (90 s on average). The auditory cues had a variable length ranging from 2 to 58 s (30 s on average). We recorded dopamine release using in vivo fiber photometry throughout the reversal test.
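The omission and reversal tests share the same trial structure: four normal trials, followed by the remaining 12 normal trials and 12 probe (omission or reversal) trials in pseudorandom order. The source does not state which constraint defined "pseudorandom", so this sketch assumes a hypothetical cap of three consecutive trials of the same type within the shuffled portion; the function names and the cap are illustrative only.

```python
import random


def longest_run(seq):
    """Length of the longest block of identical consecutive items."""
    best = run = 0
    prev = object()  # sentinel that never equals a real trial label
    for item in seq:
        run = run + 1 if item == prev else 1
        prev = item
        best = max(best, run)
    return best


def build_test_sequence(n_normal=16, n_probe=12, n_lead=4, max_run=3, seed=None):
    """Hypothetical schedule: n_lead normal trials, then the remaining normal
    trials and the probe trials shuffled so that at most max_run trials of one
    type occur in a row (an assumed constraint, applied to the shuffled part)."""
    rng = random.Random(seed)
    tail = ["normal"] * (n_normal - n_lead) + ["probe"] * n_probe
    while True:  # rejection sampling: reshuffle until the run constraint holds
        rng.shuffle(tail)
        if longest_run(tail) <= max_run:
            return ["normal"] * n_lead + tail


schedule = build_test_sequence(seed=1)
print(len(schedule), schedule.count("normal"), schedule.count("probe"))  # 28 16 12
```

Rejection sampling is simple and unbiased over the admissible orders, but it would loop indefinitely if the cap were infeasible for the trial counts (e.g., a cap of 1 with unequal counts).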
Recording dopamine release with in vivo fiber photometry
Fiber photometry recordings were used to monitor dopamine release in NAcC during the behavioral sessions. We collected the fluorescent signal from an isosbestic control channel (415 nm) and a GRABDA signal channel (470 nm) using a commercial fiber photometry system (Neurophotometrics). All recordings were conducted with ∼40–100 µW at the tip of the patch cord. An optical patch cord (fiber core diameter, 200 µm) was attached to the optic fibers implanted in the rats to excite GRABDA and collect its fluorescent signal. Signals were acquired at a 20 Hz sampling rate and interleaved between the control and GRABDA signal channels. A camera was positioned in front of our Med Associates interface to capture the timestamps of behavioral events, which were relayed through a custom Bonsai (Lopes et al., 2015) workflow to align behavioral events to the fluorescent signal. Recordings were collected unilaterally from the hemisphere with the most robust fluorescent signal, and this was kept consistent across recording sessions.
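Two bookkeeping steps follow from the interleaved acquisition described above: splitting frames into the 415 nm and 470 nm channels, and mapping each behavioral timestamp onto the nearest photometry sample. The sketch below assumes frames strictly alternate and that the first frame is the 415 nm control; both are hypothetical simplifications (exported files may instead label each frame with its LED state, which should be preferred when available). The source does not specify whether 20 Hz is the combined or per-channel rate; the code does not depend on it.

```python
from bisect import bisect_left


def deinterleave(frames, first="415"):
    """Split alternating (timestamp, value) frames into two channel lists.

    Assumes strict alternation and that frame 0 belongs to `first`.
    """
    second = "470" if first == "415" else "415"
    channels = {first: [], second: []}
    for i, frame in enumerate(frames):
        channels[first if i % 2 == 0 else second].append(frame)
    return channels


def nearest_index(sample_times, t):
    """Index of the sample closest in time to event t (sample_times sorted)."""
    i = bisect_left(sample_times, t)
    if i == 0:
        return 0
    if i == len(sample_times):
        return len(sample_times) - 1
    # pick whichever neighboring sample is closer to the event
    return i if sample_times[i] - t < t - sample_times[i - 1] else i - 1


channels = deinterleave([(0.00, 1.0), (0.05, 5.0), (0.10, 1.1), (0.15, 5.2)])
print(channels["470"])                               # [(0.05, 5.0), (0.15, 5.2)]
print(nearest_index([0.0, 0.1, 0.2, 0.3], 0.14))     # 1
```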
Histology
Rats were anesthetized with carbon dioxide and transcardially perfused with 1× phosphate-buffered saline (PBS) followed by 4% paraformaldehyde in PBS. Brains were left in the fixative overnight at 4 °C and then transferred into 30% sucrose in PBS. The samples were sliced at 20 µm thickness using a cryostat (Leica Biosystems). The brain slices were mounted onto glass slides and coverslipped with ProLong Gold mounting medium with DAPI (Thermo Fisher Scientific). Images were acquired using a confocal microscope (Carl Zeiss).
Fiber photometry data analysis
Fiber photometry data were analyzed using custom-written MATLAB code. We first removed the first 60 s of the data (prior to the onset of the first event), as this period consistently showed high signal variability that would disrupt baseline correction. Sudden drops in signal that were likely due to a loosened connection between the patch cord and optic fiber were also removed. The control signal was fit with linear regression and scaled to the GRABDA signal. We then calculated dF/F as follows:
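The dF/F equation itself is not reproduced in this text. A widely used convention, assumed here rather than taken from the authors, is dF/F = (F470 − F415,fit) / F415,fit, where F415,fit is the control signal after least-squares scaling to the GRABDA signal. The sketch below implements that convention and a z-score relative to a chosen baseline segment; the choice of baseline for z-scoring is likewise an assumption, and all function names are hypothetical.

```python
import statistics


def fit_control(control, signal):
    """Ordinary least-squares fit of signal ~ slope * control + intercept;
    returns the scaled control trace (F415,fit)."""
    n = len(control)
    mx = sum(control) / n
    my = sum(signal) / n
    sxx = sum((x - mx) ** 2 for x in control)
    if sxx == 0:
        raise ValueError("control channel is constant; cannot fit")
    sxy = sum((x - mx) * (y - my) for x, y in zip(control, signal))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [slope * x + intercept for x in control]


def delta_f_over_f(control, signal):
    """Assumed convention: (F470 - F415,fit) / F415,fit at each sample."""
    fitted = fit_control(control, signal)
    return [(s - f) / f for s, f in zip(signal, fitted)]


def zscore(trace, baseline):
    """Z-score a trace using the mean and sample SD of a baseline segment."""
    mu = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return [(x - mu) / sd for x in trace]


print(zscore([1.0, 2.0, 3.0], baseline=[1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

When the signal is an exact linear function of the control, every dF/F value is zero, which is a quick sanity check that motion and bleaching shared by both channels are removed.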
LME model analysis
To examine the effect of learning or different trial types on behavioral responding and dopamine signals, we fit the data with linear mixed-effects (LME) models (Yu et al., 2022). These models include fixed effects for the independent variables of interest (e.g., training session or trial type) and random effects that capture variability across subjects. This allows us to examine whether each independent variable significantly modulates the response variable (i.e., behavioral or dopamine response) while accounting for individual variability. Food-port entries (or entry rates) and AUCs for each trial were used as response variables representing behavioral and dopamine responses, respectively. The LME model used for each dataset is listed in Table 1. Within the LME models of behavioral responding, Session was a continuous variable indicating the session corresponding to each data point. In contrast, in the LME models for the AUCs from the forward and backward conditioning sessions, RecSession was a three-level categorical variable (Sessions 1, 4, and 8) with Session 1 coded as the reference. The recorded hemisphere was added as the variable Hemisphere in the models for the AUCs during the CS+ or CS− in forward conditioning. Hemisphere was a two-level categorical variable (left or right) with the left hemisphere coded as the reference. The latency to retrieve reward from the food port after food delivery was added as Latency in the model for AUCs in backward conditioning. The variable CuePeriod was added in the LME models to compare the dopamine signal observed before and after the onset of the backward cues. CuePeriod was a two-level categorical variable (before or after cue onset), with the period before cue onset coded as the reference. The variable CueType in the test of specificity was a three-level categorical variable (visual cue, congruent, and incongruent) with the congruent cue coded as the reference.
In the models for forward conditioning, backward conditioning, and the test of specificity, we added the explanatory variable Fibre indicating the optic fiber placement of each rat. Fibre was a two-level categorical variable (ventral or dorsal) with the ventrally placed fiber coded as the reference. The variable TrialType in the reversal and omission tests was a two-level categorical variable (normal and reversal/omission) with the normal trial coded as the reference. To compare the AUCs during the 2 s before and after port entry in the omission test, we added the variable EntryPeriod to the LME models. EntryPeriod was a two-level categorical variable (before or after port entry), with the period before port entry coded as the reference. We implemented the LME analyses and statistical tests using the fitlme function in MATLAB.
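Coding a categorical variable "with Session 1 as the reference" means that each non-reference level gets its own 0/1 indicator column, so each fixed-effect coefficient (and its t statistic) estimates the difference from the reference level; this is why results are reported as "Session 1 vs Session 4" and "Session 1 vs Session 8". The sketch below builds those indicator columns by hand; the function and column names are illustrative.

```python
def treatment_code(values, levels, reference, name):
    """Indicator (dummy) columns for every level except the reference level."""
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    return {
        f"{name}_{lvl}": [1 if v == lvl else 0 for v in values]
        for lvl in levels
        if lvl != reference
    }


# Four hypothetical trials recorded on Sessions 1, 4, 8 and 1.
print(treatment_code([1, 4, 8, 1], levels=[1, 4, 8], reference=1, name="RecSession"))
# {'RecSession_4': [0, 1, 0, 0], 'RecSession_8': [0, 0, 1, 0]}
```

The same scheme applies to the two-level variables (Fibre, Hemisphere, TrialType, CuePeriod, EntryPeriod), which each reduce to a single indicator column. Outside MATLAB, a roughly analogous random-intercept model could be specified with statsmodels, e.g. `smf.mixedlm("AUC ~ C(RecSession, Treatment(reference=1)) + Fibre", df, groups=df["RatID"])`; this is offered only as an illustration, and estimation details may differ from fitlme.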
Summary of LME models
Waveform analysis
To examine whether presentation of the backward cues induced a dopamine response above baseline, we conducted a waveform analysis on the z-scored dopamine signals around the onset of reward delivery and of the backward cues (Fig. 3B,C). Details of this waveform analysis have been described previously (Jean-Richard-Dit-Bressel et al., 2020). This analysis allows us to identify the time points at which the dopamine signal is significantly greater than zero (i.e., the baseline). Bootstrapped 95% confidence intervals are calculated for each time point, and dopamine transients are considered significantly different from the baseline when the confidence intervals do not include zero for >0.2 s.
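A simplified sketch of this kind of waveform analysis, under stated assumptions: trials are resampled with replacement, the across-trial mean is recomputed at every time point, a percentile 95% interval is taken, and only runs of consecutive excluded-zero points lasting longer than 0.2 s are kept. The sampling rate `fs`, the percentile rule, and all names are assumptions of this sketch; the published method (Jean-Richard-Dit-Bressel et al., 2020) should be consulted for its exact procedure.

```python
import random


def bootstrap_ci(trials, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the across-trial mean at each time point.

    trials: list of equal-length z-scored traces (one per trial).
    """
    rng = random.Random(seed)
    n_trials, n_time = len(trials), len(trials[0])
    boot_means = [[0.0] * n_boot for _ in range(n_time)]
    for b in range(n_boot):
        sample = [trials[rng.randrange(n_trials)] for _ in range(n_trials)]
        for t in range(n_time):
            boot_means[t][b] = sum(tr[t] for tr in sample) / n_trials
    lo_idx = max(0, int(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    lower, upper = [], []
    for t in range(n_time):
        ordered = sorted(boot_means[t])
        lower.append(ordered[lo_idx])
        upper.append(ordered[hi_idx])
    return lower, upper


def significant_mask(lower, upper, fs, min_duration=0.2):
    """Mark points whose CI excludes zero for a run lasting > min_duration s.

    A run of n samples is treated as lasting n / fs seconds.
    """
    outside = [lo > 0 or hi < 0 for lo, hi in zip(lower, upper)]
    mask = [False] * len(outside)
    start = None
    for i, flag in enumerate(outside + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if (i - start) / fs > min_duration:
                for j in range(start, i):
                    mask[j] = True
            start = None
    return mask
```

Requiring a minimum run length guards against isolated time points crossing the threshold by chance, which matters because hundreds of time points are tested per trace.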
Results
Dopamine release in NAcC during forward conditioning is consistent with prediction error coding
To demonstrate effective recording of dopamine release in NAcC, we first recorded dopamine release during forward conditioning. This allowed us to replicate existing studies that have demonstrated the presence of a prediction error in NAcC (Day et al., 2007; Flagel et al., 2011; A. S. Hart et al., 2014; Menegas et al., 2017). To do this, we bilaterally infused wild-type rats with AAV carrying GRABDA2m (AAV9-hSyn-GRABDA2m; n = 7 rats; 1 µl per hemisphere) into the NAcC (AP, +1.3 mm; ML, ±1.3 mm; DV, −6.4 and −7.2 mm) and implanted optic fibers targeting the injection site (Fig. 1; AP, +1.3 mm; ML, ±1.3 mm; DV, −6.8 mm). In two rats, the fiber placement was slightly dorsal to NAcC (Fig. 1C). However, we confirmed that GRABDA expression was restricted to NAcC in all rats, which means that the detected fluorescence would be the result of dopamine release in NAcC. Furthermore, we included fiber placement as a variable in our analyses (Table 1), and it did not influence the results of our LME models (Extended Data Tables 2-1, 3-1, 4-1, 4-2). Therefore, the data from the rats whose fibers were dorsally implanted are also included in all analyses. Four to six weeks after surgery, rats began training on our behavioral tasks. Although we bilaterally implanted the optic fibers, recording was performed unilaterally from the hemisphere that showed the most robust changes in fluorescence during signal check sessions prior to behavioral training. The side from which we recorded was kept constant across recording sessions. ∼15% of rats were recorded from the right hemisphere. During forward conditioning, rats received eight behavioral sessions in which two 10 s auditory cues were each presented five times with a variable ITI ranging from 150 to 230 s (190 s average). One of the auditory cues (CS+) was followed by reward (two 40 mg sucrose pellets; TestDiet), and the other did not predict reward (CS−).
To minimize any photobleaching that may occur during the experiment, we recorded dopamine release on Sessions 1, 4, and 8. The number of food-port entries during the CS+ increased across conditioning. This change in behavioral responding was confirmed statistically using LME analyses. There was a significant increase in the number of food-port entries during the CS+ across sessions [Fig. 2A; # Port entry (CS+, CS − preCS) ∼ Session + Fibre + 1 + (1|RatID); session, t = 5.99; p < 0.001], whereas responding to the CS− did not change [Fig. 2A; # Port entry (CS−, CS − preCS) ∼ Session + Fibre + 1 + (1|RatID); session, t = −0.64; p = 0.52]. This learning was also reflected in the dopamine response to the cues that emerged across training. Early in training, a phasic dopamine response was seen at the onset of both the CS+ and the CS−. This phasic response increased across sessions in response to CS+ presentation, but not CS− presentation. This was confirmed by LME analysis of the AUCs, which quantifies the changes in dopamine release from the baseline and examines whether these changes occur systematically across sessions. These analyses showed that dopamine release to the CS+ was significantly greater on Sessions 4 and 8 compared with that in Session 1 [Fig. 2B; AUCs (10 s CS+) ∼ RecSession + Fibre + 1 + (1|RatID); Session 1 vs Session 4, t = 5.29; p < 0.001; Session 1 vs Session 8, t = 2.96; p = 0.0039]. In contrast, the AUCs for the CS− did not change across conditioning [Fig. 2C; AUCs (10 s CS−) ∼ RecSession + Fibre + 1 + (1|RatID); Session 1 vs Session 4, t = −1.99; p = 0.05003; Session 1 vs Session 8, t = −0.80; p = 0.42]. The increase in dopamine release to the CS+ across training contrasted with the decrease in release seen to the delivery of the reward. Specifically, early in training, there was a strong dopamine response after the delivery of reward. This dopamine response to reward decreased across conditioning [Fig. 2D; AUCs (2 s reward) ∼ RecSession + Fibre + 1 + (1|RatID); Session 1 vs Session 4, t = −1.34; p = 0.18; Session 1 vs Session 8, t = −2.52; p = 0.014]. Since we recorded dopamine release from different hemispheres in different rats, we examined whether the hemisphere could explain some of the variability in dopamine release. To do this, we examined dopamine release to the CS+ and CS− in Session 8, when rats should have learned the contingencies well. Here, we found that the hemisphere did not predict dopamine release (Extended Data Table 2-2). Overall, these data demonstrate that we were able to capture the dynamic shift in the dopamine signal from reward delivery to a reward-predictive cue, which is consistent with the prediction error encoding seen across many studies during forward conditioning (Day et al., 2007; Flagel et al., 2011; A. S. Hart et al., 2014; Menegas et al., 2017).
Histological representation of virus expression and fiber placement. A, Top, We recorded dopamine release in NAcC using fiber photometry of a genetically encoded dopamine sensor GRABDA (Sun et al., 2018). An AAV carrying GRABDA was infused into NAcC, and optic probes were implanted at the injection site. Bottom, Representative image of GRABDA expression in NAcC. B, Representation of AAV expression in NAcC; lighter shades of green indicate the maximal extent of AAV expression seen in the cohort, and darker shades indicate minimal expression. C, Approximate location of individual fiber tips for each animal.
Dopamine release in NAcC during forward conditioning reflects prediction error coding. A, Task design and behavioral responding. Left, Experimental design. Rats received two 10 s auditory cues, where one predicted reward (CS+) and the other did not (CS−). Right, The number of food-port entries during the CS presentations relative to the baseline. Responding to the CS+, but not the CS−, increased across conditioning sessions. B, Dopamine release to the CS+. Left, Z-scores of dopamine release across conditioning. Right, AUCs quantifying the z-score across the 10 s cue represented to the left. Dopamine release increased to the CS+ across conditioning. C, Dopamine release to the CS−. Left, Z-scores of dopamine release across conditioning. Right, AUCs quantifying the z-scored traces across the 10 s cue represented to the left. The dopamine response to the CS− did not change across conditioning. D, Dopamine release to the rewards. Left, Z-scores of dopamine release across conditioning. Right, AUCs quantifying the z-scored traces across the 10 s represented to the left. The dopamine response to the reward delivery decreased across conditioning. *p < 0.05; **p < 0.01; ***p < 0.001. Mean ± SEM. See Extended Data Tables 2-1 and 2-2 for more details.
Table 2-1
Linear mixed-effects model for forward conditioning. Note: SE indicates standard error of estimated fixed-term effects. 95% CIs indicate 95% confidence intervals of the estimates.
Table 2-2
Linear mixed-effects models examining lateralized dopamine activity. Note: SE indicates standard error of estimated fixed-term effects. 95% CIs indicate 95% confidence intervals of the estimates.
Dopamine release in NAcC increases to rewards and decreases to backward cues across backward conditioning
Our prior research has demonstrated that VTADA neurons are necessary for both the general excitatory and specific inhibitory components of learning that develop through backward conditioning (Seitz et al., 2022). Here, we examined the learning content of dopamine release in the NAcC during backward conditioning. Rats received two backward associations, one where maltodextrin preceded an auditory cue (e.g., tone or siren) and another where a grain pellet preceded another auditory cue (e.g., siren or tone). The auditory cues had a variable length ranging from 2 to 58 s (30 s average) and were presented 10 s after the rats entered the food port to retrieve the rewards, mirroring our previous study (Seitz et al., 2022). We recorded dopamine release using fiber photometry on Sessions 1, 4, and 8. The rate of food-port entries during the backward cues decreased across conditioning, as we and others have reported (Delamater et al., 2003; Laurent and Balleine, 2015; Seitz et al., 2022). We applied an LME model to test how the rate of port entries changed across sessions. Behavioral responding to the backward cues significantly decreased across sessions [Fig. 3A; Port entry rate (backward cue) ∼ Session + Fibre + 1 + (1|RatID); Session, t = −3.97; p < 0.001]. We then examined dopamine release to reward delivery prior to cue presentation. To examine the temporal profiles of dopamine release induced by reward delivery, we conducted waveform analyses on the z-scored dopamine signals in each session. This analysis allows us to identify the time points at which the dopamine signal is significantly greater than zero (i.e., the baseline). These analyses showed that there was a significant phasic dopamine response at reward delivery in all three sessions (Fig. 3B, left). Interestingly, we found that the dopamine release generated by reward deliveries increased across sessions.
To quantify the dopamine signal generated by the reward deliveries, we calculated the difference between the AUCs 1 s before and 1 s after the onset of the reward. Dopamine release to the reward delivery was significantly greater on Sessions 4 and 8 compared with that on Session 1 [Fig. 3B; AUCs (1 s reward − baseline) ∼ RecSession + Latency + Fibre + 1 + (1|RatID); Session 1 vs Session 4, t = 2.21; p = 0.028; Session 1 vs Session 8, t = 4.08; p < 0.001], with no effect of latency on the dopamine AUCs seen to reward (latency, t = −0.91; p = 0.37). This demonstrates that the dopamine response to reward delivery increased across learning and that this increase was independent of any changes in the latency to retrieve the reward. We next examined dopamine release to the backward cue across training. The waveform analyses showed a significant phasic response after presentation of the backward cues in Sessions 1 and 4 (Fig. 3C, left). This response was reduced across learning such that there was no significant increase above the baseline in the dopamine signal seen to the backward cues in Session 8 (Fig. 3C, left). We further analyzed the difference between the AUCs 2 s before and 2 s after the onset of the backward cues across sessions. While dopamine release to the backward cue on Session 4 was not significantly different from that on Session 1, dopamine release on Session 8 was significantly smaller compared with Session 1 [Fig. 3C; AUCs (2 s backward cue − baseline) ∼ RecSession + Fibre + 1 + (1|RatID); Session 1 vs Session 4, t = −1.22; p = 0.22; Session 1 vs Session 8, t = −4.04; p < 0.001]. Together, these data show that the dopamine response to the reward increases and the dopamine response to the backward cue decreases across learning. This mirrors the shift in the dopamine signal from the expected outcome (here, the backward cue) to the predictive event (here, the reward) seen during forward conditioning.
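The post-minus-pre AUC measure described above (1 s windows for reward, 2 s windows for backward cues) can be sketched as follows. This assumes a z-scored trace sampled at a known rate `fs` and an event already mapped to a sample index; it uses a simple rectangle rule (sum of samples × 1/fs) rather than any particular integration scheme from the authors' code, and the names are hypothetical. For the forward-conditioning CS analyses, the reported measure is a single 10 s window AUC rather than a difference.

```python
def window_auc(trace, start, n, fs):
    """Area under the trace over n samples starting at `start` (rectangle rule)."""
    segment = trace[start:start + n]
    if start < 0 or len(segment) != n:
        raise ValueError("window falls outside the recorded trace")
    return sum(segment) / fs


def auc_post_minus_pre(trace, event_index, fs, window_s):
    """AUC in the window after the event minus AUC in the window before it."""
    n = round(window_s * fs)
    post = window_auc(trace, event_index, n, fs)
    pre = window_auc(trace, event_index - n, n, fs)
    return post - pre


# Hypothetical 10 Hz trace: flat baseline, then a sustained rise of 1 z-unit.
trace = [0.0] * 20 + [1.0] * 20
print(auc_post_minus_pre(trace, event_index=20, fs=10.0, window_s=1.0))  # 1.0
```

Subtracting the pre-event AUC removes slow drifts in the baseline that would otherwise inflate or deflate a raw post-event AUC on individual trials.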
Dopamine release increases to the reward and decreases to the backward cue across backward conditioning. A, Left, Experimental design. Rats received two backward associations where maltodextrin was followed by an auditory cue (e.g., tone or siren), and a grain pellet was followed by another auditory cue (e.g., siren or tone). Right, Behavioral responding during backward cues represented as food-port entry rates per minute. Food-port entries to the backward cues decreased across conditioning. B, Dopamine release to reward delivery. Left, Z-scores of dopamine release across sessions. Waveform analyses revealed significant deviations from the baseline (i.e., z-score of zero), as indicated by the colored bars below the traces. Periods of significance from the baseline were defined by the bootstrapped 95% confidence intervals. There was a significant phasic response following the reward deliveries in all sessions. Right, Differences of AUCs of the z-scored traces 1 s before and after reward deliveries, quantifying the dopamine response generated by the reward deliveries. The AUCs significantly increased across learning. C, Dopamine release to the backward cues. Left, Z-scores of dopamine release across sessions. Waveform analyses revealed significant deviations from the baseline, as indicated by the colored bars below the traces. Periods of significance from the baseline were defined by the bootstrapped 95% confidence intervals. There was a significant phasic response above the baseline following presentation of the backward cues in Sessions 1 and 4. In contrast, there was no significant increase above the baseline in the dopamine signal seen at the backward cues in Session 8. Right, Differences of AUCs in z-scored traces 2 s before and after the backward cue. AUCs in Session 8 were significantly smaller than in Session 1, indicating that the dopamine response to the backward cues decreased across conditioning. *p < 0.05; ***p < 0.001. Mean ± SEM.
See Extended Data Table 3-1 for more details.
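The waveform analyses in the Fig. 3 legend mark periods where bootstrapped 95% confidence intervals of the trace exclude baseline. The sketch below shows one way such periods could be flagged. It assumes a percentile bootstrap that resamples whole trials with replacement at each time point. Some implementations also require a minimum run of consecutive significant samples; that threshold is not stated here and is omitted. The function names, `n_boot`, `seed` and the data are hypothetical.

```python
import random


def bootstrap_ci_waveform(trials, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI of the across-trial mean at each time point.

    trials -- list of equal-length z-scored traces, one per trial
    Returns (lower, upper): lists with one bound per time point.
    """
    rng = random.Random(seed)
    n_trials = len(trials)
    n_samples = len(trials[0])
    boot_means = []
    for _ in range(n_boot):
        # Resample whole trials with replacement, then average them point by point.
        picks = [trials[rng.randrange(n_trials)] for _ in range(n_trials)]
        boot_means.append([sum(tr[k] for tr in picks) / n_trials for k in range(n_samples)])
    lo_idx = int(round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    lower, upper = [], []
    for k in range(n_samples):
        column = sorted(m[k] for m in boot_means)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper


def significant_points(lower, upper):
    """True where the CI excludes 0, i.e. the baseline z-score."""
    return [lo > 0 or hi < 0 for lo, hi in zip(lower, upper)]


trials = [  # illustrative data: 5 trials x 3 time points
    [0.90, -1.0, -0.60],
    [1.00, 1.0, -0.40],
    [1.10, -1.0, -0.50],
    [1.00, 1.0, -0.50],
    [0.95, 0.0, -0.55],
]
print(significant_points(*bootstrap_ci_waveform(trials)))  # [True, False, True]
```

Here the first and last time points are consistently above or below zero across trials and are flagged. The middle point averages around zero and is not flagged.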
Table 3-1
Linear mixed-effects models for backward conditioning experiments. Note: SE indicates the standard error of the estimated fixed effects. 95% CIs indicate the 95% confidence intervals of the estimates.
The backward cues exert control over behavior in a manner that reflects specific reward–cue relationships, but dopamine release in the NAcC does not
To examine the content of learning reflected in the dopamine release seen during backward conditioning, we monitored dopamine release in the NAcC during a test designed to reveal specific components of learning. After backward conditioning, we trained the same rats that a visual cue precedes delivery of grain pellets. The number of food-port entries during the visual cue increased significantly across conditioning [Fig. 4A; # Port entry (30 s visual cue) ∼ Session + Fibre + 1 + (1|RatID); session, t = 10.77; p < 0.001], suggesting the rats learned to use the visual cue to predict the arrival of reward. Pairing the visual cue with one of the rewards allowed us to examine how presentations of the backward cues affect responding to the visual cue. Specifically, we gave rats a test with three trial types: (1) visual cue alone, (2) a compound of the visual cue and the backward cue associated with the same reward (i.e., congruent compound), or (3) a compound of the visual cue and the backward cue associated with a different reward (i.e., incongruent compound). If the rats are aware of the specific reward–cue relationships, responding to the congruent compound should be lower than to the incongruent compound. This is because each backward cue has an inhibitory relationship with the specific reward with which it was paired (Delamater et al., 2003; Laurent and Balleine, 2015; Seitz et al., 2022). Thus, the backward cue in the congruent compound should specifically inhibit responding to the visual cue, which has an excitatory relationship with the same reward. This should not occur with the incongruent compound, because its backward cue has an inhibitory association with a different reward. Indeed, the behavioral response to these cues reflected the specific relationships: the number of food-port entries to the congruent cue was significantly lower than to the incongruent cue [Fig. 4B; # Port entry (10 s cue) ∼ CueType + Fibre + 1 + (1|RatID); congruent vs incongruent, t = 2.10; p = 0.038], revealing the specific component of the backward association. However, in contrast to the specificity seen in behavior, dopamine release increased comparably to the congruent and incongruent cues [Fig. 4C; AUCs (10 s cue) ∼ CueType + Fibre + 1 + (1|RatID); congruent vs incongruent, t = −0.70; p = 0.49]. Together, these results indicate that rats are aware of the specific relationships with the backward cues but that dopamine release in the NAcC does not reflect this specific component of learning.
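The model formulas above use the Wilkinson notation of R's lme4. In that notation, "+ 1" denotes the fixed intercept and "(1|RatID)" a random intercept for each rat. The specificity-test model for AUCs can be written out as sketched below. This assumes CueType (visual, congruent, incongruent) is dummy-coded against one reference level, whose coefficient is taken as zero, and that Fibre enters as a fixed covariate. These coding details are assumptions rather than statements about the analysis. Here i indexes trials and j rats. Each reported t is an estimated contrast (e.g., congruent vs incongruent) divided by its standard error.

```latex
\mathrm{AUC}_{ij} = \beta_0
  + \sum_{k \neq \mathrm{ref}} \beta_k \, \mathbb{1}\!\left[\mathrm{CueType}_{ij} = k\right]
  + \beta_F \, \mathrm{Fibre}_{ij}
  + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

\hat{c} = \hat{\beta}_{\mathrm{congruent}} - \hat{\beta}_{\mathrm{incongruent}},
\qquad t = \frac{\hat{c}}{\widehat{\mathrm{SE}}(\hat{c})}
```

The random intercept u_j absorbs stable differences between rats, so repeated trials from the same animal are not treated as independent observations.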
Rats' behavioral responding reflects the specific component of backward association, but dopamine release in the NAcC does not. A, Forward conditioning. Left, Experimental design. Rats received presentations of a visual cue followed by a grain pellet. Right, The number of food-port entries during the visual cue increased across conditioning. B, Test of specificity. Left, Experimental design. After the forward conditioning sessions, rats received a separate test session in which they were presented with three cue types: the visual cue only, a compound of the visual cue and the backward cue associated with the same reward (i.e., congruent cue), or a compound of the visual cue and the backward cue associated with a different reward (i.e., incongruent cue). Right, Behavioral responding. The number of food-port entries was lower to the congruent cue than to the incongruent cue, which reflects rats' knowledge of the specific reward–cue relationships. C, Dopamine release during the test of specificity. Left, Dopamine release to the visual cue (black), congruent cue (blue), and incongruent cue (red). Right, Analysis of the AUCs of the z-scored traces revealed no difference in dopamine release to the congruent and incongruent compounds. Thus, unlike behavior, dopamine release did not reflect the specific reward–cue relationships. D, Reversal test. Left, Experimental design. In a backward conditioning session, we gave rats trials with the reward–cue associations reversed (i.e., reversal trials). Right, The behavioral response to backward cues on normal and reversal trials, represented as the rate of food-port entries per minute. Responding decreased significantly on reversal trials compared with normal trials. E, The dopamine response at the onset of backward cues. Left, Temporal profile of the dopamine response to backward cues on normal (black) and reversal (red) trials. Right, AUCs of the z-scored traces shown on the left. There was no difference in the dopamine response between the two trial types, suggesting that the dopamine response reflects the general rather than the specific component of backward conditioning. *p < 0.05; **p < 0.01; ***p < 0.001. Mean ± SEM. See Extended Data Tables 4-1–4-3 for more details.
Table 4-1
Linear mixed-effects model for forward conditioning with a visual cue. Note: SE indicates the standard error of the estimated fixed effects. 95% CIs indicate the 95% confidence intervals of the estimates.
Table 4-2
Linear mixed-effects models for the test of specificity. Note: SE indicates the standard error of the estimated fixed effects. 95% CIs indicate the 95% confidence intervals of the estimates.
Table 4-3
Linear mixed-effects models for the reversal test. Note: SE indicates the standard error of the estimated fixed effects. 95% CIs indicate the 95% confidence intervals of the estimates.
Another way to examine whether the dopamine response in NAcC encodes the specific components of learning is to conduct a session in which we unexpectedly shift the reward–cue associations (i.e., a reversal test). In this test, rats received standard backward trials as before (i.e., normal trials; Fig. 4D) and trials in which the reward–cue contingency was reversed (i.e., reversal trials; Fig. 4D). We used LME analyses to compare the rate of food-port entries during the backward cues on reversal versus normal trials. Behavioral responding decreased significantly on reversal trials [Fig. 4D; Port entry rate (backward cue) ∼ TrialType + 1 + (1|RatID); normal vs reversal, t = −2.2; p = 0.030]. This suggests that rats detected a change in the reward–cue relationship, and this disrupted their behavior. In contrast, dopamine release in NAcC did not differ between normal and reversal trials [Fig. 4E; AUCs (2 s cue) ∼ TrialType + 1 + (1|RatID); normal vs reversal, t = 0.33; p = 0.74]. This result again supports the notion that dopamine release in the NAcC does not reflect the specific reward–cue relationships that develop during backward conditioning.
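Port-entry rates like those compared above could be derived from session event timestamps. The sketch below is one way to do this and is not the authors' pipeline. The timestamp format, the function name and the 10 s cue length in the example are hypothetical. The real backward-cue duration should be substituted.

```python
def port_entry_rates_per_min(entry_times, cue_onsets, cue_duration_s):
    """Food-port entry rate (entries/min) during each cue presentation.

    entry_times    -- timestamps (s) of food-port entries in the session
    cue_onsets     -- timestamps (s) of cue onsets for one trial type
    cue_duration_s -- cue length in seconds (hypothetical; use the actual cue length)
    Returns one rate per cue presentation, i.e. trial-level values for an LME.
    """
    if cue_duration_s <= 0:
        raise ValueError("cue duration must be positive")
    rates = []
    for onset in cue_onsets:
        # Count entries in the half-open window [onset, onset + duration).
        count = sum(1 for t in entry_times if onset <= t < onset + cue_duration_s)
        rates.append(count * 60.0 / cue_duration_s)
    return rates


# Illustrative session: two 10 s cues starting at t = 0 s and t = 30 s
rates = port_entry_rates_per_min([1.0, 2.0, 12.0, 31.0, 32.0, 33.0], [0.0, 30.0], 10.0)
print(rates)  # [12.0, 18.0]
```

Returning one rate per presentation, rather than a session mean, keeps the data at the trial level. The random-intercept models above are fitted at that level.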
The omission test reveals the excitatory component of the dopamine response to the backward cues
Interestingly, during our test of specificity, we saw evidence that the backward cues exerted an excitatory influence over dopamine release. Specifically, the audio-visual compounds elicited greater dopamine release than the visual cue presented alone [Fig. 4C; AUCs (10 s cue) ∼ CueType + Fibre + 1 + (1|RatID); congruent vs visual, t = −2.59; p = 0.011]. During the test of specificity, the backward cues in the compounds were unexpected because they were not preceded by their associated reward. Thus, the greater dopamine release to the compounds than to the visual cue likely reflects the excitatory component of learning. Indeed, we have explicitly revealed both inhibitory and excitatory components of backward conditioning using this procedure in our prior work (Seitz et al., 2022). Accordingly, unexpected presentations of the auditory cues may increase dopamine release in the NAcC because dopamine release in this region reflects the general excitatory component of this learning.
However, in the test of specificity, we cannot rule out the possibility that the excitatory response to congruent or incongruent cues is due to the unexpected nature of the compound cue. That is, as the rats have never experienced the visual and backward cues together before, presentation of this novel compound is unexpected and salient, which may produce an increase in dopamine that is not related to the significance of the cues per se. To avoid this confound, we conducted an additional test to examine dopamine release to the unexpected presentation of the backward cues alone. This would allow us to confirm whether dopamine release reflects the excitatory component of learning. We recorded dopamine release while rats received a backward conditioning session that included trials where the backward cue was delivered unexpectedly without its preceding reward (i.e., omission trials; Fig. 5A). The LME analysis showed that the rate of food-port entries during backward cues increased significantly in omission trials compared with that in normal trials, revealing an excitatory component of learning in behavior [Fig. 5A; Port entry rate (backward cue) ∼ TrialType + 1 + (1|RatID); normal vs omission, t = 2.61; p = 0.0099]. Here, we also saw that the dopamine release in NAcC to the backward cues was increased in omission trials compared with that in normal trials [Fig. 5B; AUCs (2 s cue − baseline) ∼ TrialType + 1 + (1|RatID); normal vs omission, t = 2.98; p = 0.0035]. We also analyzed the dopamine signal to the first port entry after the backward cue, at which the rats would have confirmed that no reward had been delivered following an omission trial. Here, we found that the dopamine response to the port entry in the omission trials was not significantly different from that in the normal trials [Fig. 5C; AUCs (2 s port entry − baseline) ∼ TrialType + 1 + (1|RatID); normal vs omission, t = −1.46; p = 0.15]. 
It might seem surprising that we did not see remnants of the dopamine response to the backward cue at port entry. If rats entered the food port shortly after backward-cue onset, the larger cue-evoked dopamine increase on omission trials could carry over into the port-entry window. That carry-over would confound any change in dopamine release generated by the port entry itself. However, the mean latency to port entry on omission trials was much longer than the phasic dopamine response to the backward cue (mean ± SEM, 7.68 ± 1.35 s). Together, this indicates that the change in dopamine release during the omission test occurred at cue onset and not at port entry. These results demonstrate that presenting the backward cue without its preceding reward increases the dopamine response in NAcC, revealing the excitatory nature of the backward association encoded in NAcC.
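The latency argument above rests on two quantities: the time from cue onset to the first port entry, and its mean ± SEM across trials. The sketch below computes both from timestamps. The handling of trials with no later entry (skipped) and the data are hypothetical. An analysis could instead cap latencies at the end of the trial.

```python
import bisect
import math
import statistics


def first_entry_latencies(entry_times, cue_onsets):
    """Latency (s) from each cue onset to the first port entry at or after it.

    Trials with no later port entry are skipped (an assumption).
    """
    entries = sorted(entry_times)
    latencies = []
    for onset in cue_onsets:
        i = bisect.bisect_left(entries, onset)  # first entry with time >= onset
        if i < len(entries):
            latencies.append(entries[i] - onset)
    return latencies


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values for an SEM")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Illustrative data: three cue onsets; the last has no later entry and is skipped
lat = first_entry_latencies([3.0, 5.0, 25.0, 90.0], [0.0, 20.0, 100.0])
print(lat)            # [3.0, 5.0]
print(mean_sem(lat))  # (mean, SEM), here about (4.0, 1.0)
```

Comparing the resulting mean latency with the duration of the cue-evoked phasic response shows whether the port-entry analysis window overlaps the cue response.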
Dopamine release increases to unexpected presentations of the backward cues, revealing the excitatory component of learning. A, Omission test. Left, Experimental design. We gave rats a session with normal backward conditioning trials as well as presentations of the backward cues without their preceding rewards (i.e., omission trials). Right, The behavioral response to the backward cues was significantly larger on omission trials than on normal trials. B, The dopamine response at the onset of backward cues. Left, Temporal profile of the dopamine response to backward cues on normal (black) and omission (red) trials. Right, Differences between the AUCs of the z-scored traces in the 2 s after and the 2 s before backward-cue onset. Unexpected presentations of the backward cues on omission trials induced greater dopamine release than on normal trials. C, Dopamine release to the first port entry after the backward cue on normal and omission trials. Left, Temporal profile of the dopamine response to port entries on normal (black) and omission (red) trials. Right, Differences between the AUCs of the z-scored traces in the 2 s after and the 2 s before port entry. The dopamine response to port entry did not differ between omission and normal trials. **p < 0.01; ***p < 0.001. Mean ± SEM. See Extended Data Table 5-1 for more details.
Table 5-1
Linear mixed-effects models for the omission test. Note: SE indicates the standard error of the estimated fixed effects. 95% CIs indicate the 95% confidence intervals of the estimates.
Discussion
In the present study, we recorded dopamine release in NAcC during both forward and backward conditioning tasks. We recapitulated the classical prediction error signal in NAcC seen in prior studies during forward conditioning (Fig. 2; Day et al., 2007; Flagel et al., 2011; A. S. Hart et al., 2014; Menegas et al., 2017). Specifically, across cue–reward learning, the dopamine response backpropagated from the unexpected reward to the reward-predictive cue. This validated our methods for recording dopamine release in the NAcC. With this validation in hand, we recorded dopamine release during backward conditioning. Interestingly, dopamine release shifted across backward conditioning in a manner very similar to that seen during forward conditioning (Fig. 3). We initially saw a phasic dopamine response to both the reward delivery and the backward cue. Across learning, the phasic response to the reward increased, while the phasic response to the backward cue decreased. The increase in the phasic response to the reward cannot be explained by reward prediction error theory (Schultz et al., 1997), which interprets the dopamine response as an error in cached value predictions. The value of the reward remained constant across our backward conditioning task, so it cannot explain why the phasic dopamine response to this reward grew across sessions. Instead, we interpret this increase as reflecting the growing predictive power of the reward as rats learn that it predicts the backward cue. Together, these results again support the idea that dopamine acts as a general teaching signal to guide learning.
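The cached-value argument above can be illustrated with a toy tabular TD(0) model of one backward-conditioning trial. This is a hypothetical sketch, not a model fitted in this study. The state names, the learning rate `alpha`, the discount `gamma` and the starting value `v_init` are assumptions. So is holding the pre-reward (ITI) state value at zero as a stand-in for unpredictable reward timing. Nothing rewarding follows the reward in this design. In this toy setup, the TD error at reward onset therefore stays at the reward's own magnitude, or declines from any positive starting value, rather than growing.

```python
def simulate_backward_td(n_trials=50, alpha=0.1, gamma=0.95, reward=1.0, v_init=0.0):
    """Toy tabular TD(0) model of backward conditioning (reward followed by cue).

    Hypothetical states per trial: 'iti' -> 'post_reward' -> 'cue' -> end.
    The reward arrives on the iti -> post_reward transition, the cue onset is
    the post_reward -> cue transition, and nothing follows the cue.
    V['iti'] is fixed at 0 as a stand-in for unpredictable reward timing.
    Returns the TD errors at reward onset and at cue onset on every trial.
    """
    V = {"iti": 0.0, "post_reward": v_init, "cue": v_init}
    delta_reward, delta_cue = [], []
    for _ in range(n_trials):
        # Reward onset: delta = r + gamma * V(next) - V(previous); V['iti'] is not learned.
        d_r = reward + gamma * V["post_reward"] - V["iti"]
        # Cue onset: no reward on this transition.
        d_c = gamma * V["cue"] - V["post_reward"]
        V["post_reward"] += alpha * d_c
        # End of trial: terminal value is 0 and no reward follows the cue.
        V["cue"] += alpha * (0.0 - V["cue"])
        delta_reward.append(d_r)
        delta_cue.append(d_c)
    return delta_reward, delta_cue


d_r, d_c = simulate_backward_td()
print(d_r[0], d_r[-1])  # 1.0 1.0: the reward-onset error does not grow
```

With `v_init=0.5`, the reward-onset error starts above 1.0 and shrinks toward it across trials. In neither case does it increase, unlike the NAcC dopamine response to reward reported here.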
Critically, our backward conditioning task produces different types of learned associations, including both specific and general components of learning. Using this feature of the task, we next examined which component of learning is uniquely reflected in NAcC dopamine release. This is an important question because recent studies reveal the heterogeneity of the dopamine system (Lammel et al., 2008; Beier et al., 2015; Parker et al., 2016; Morales and Margolis, 2017; Cox and Witten, 2019; Mohebi et al., 2019). Indeed, we have previously shown that VTADA neurons are necessary for both the specific and general components of learning (Seitz et al., 2022). Here, however, our data revealed that dopamine release downstream in the NAcC reflected only the general excitatory component of learning. Specifically, dopamine release did not distinguish between the congruent and incongruent cues during the test of specificity (Fig. 4C). Nor did it distinguish between normal and reversal trials during the reversal test (Fig. 4E). This was despite the rats' behavioral responses reflecting knowledge of the sensory-specific backward associations (Fig. 4B,D). In contrast to this lack of differentiation during the specificity and reversal tests, dopamine release to the backward cues during the omission test was modulated by expectancy. Unexpectedly delivering the backward cue without its preceding reward evoked a strong increase in dopamine release in NAcC (Fig. 5B). This increase was greater in magnitude than that seen when the backward cue followed its associated reward (i.e., when it was expected). This excitatory nature was also reflected in increased behavioral responding on omission trials compared with normal trials (Fig. 5A). Together, these results demonstrate that dopamine release in NAcC reflects the general excitatory component, and not the specific component, of the learning that develops during backward conditioning.
In prior work, we used optogenetic inhibition during backward conditioning to show that VTADA neurons are necessary for both the general and specific components of learning that evolve with this procedure (Seitz et al., 2022). Here, we recorded dopamine release in the NAcC, a major downstream target of VTADA neurons, to examine whether it reflects the general and/or specific components of this learning. We found that dopamine release in the NAcC reflected only the general components of backward associations and not their sensory-specific components, even though the rats' behavior showed that they had learned both. Thus, while VTADA neurons are necessary for both general and specific components of learning, NAcC dopamine release reflects only the general components. This finding reinforces the recent notion that distinct dopamine projections contribute in unique ways to learning. However, testing whether dopamine release in NAcC is necessary for the general component of learning would require optogenetically silencing VTADA projections to the NAcC during a backward conditioning task (as we did with VTADA neurons in our prior study; Seitz et al., 2022). Future studies combining causal (optogenetic) and correlative (recording) approaches will broaden our understanding of how these circuits contribute to different components of learning to create a cohesive understanding of the environment.
The present study raises the question of how other dopamine projections might support the specific components of learning, which we found were not reflected in dopamine activity in the NAcC. Recent studies have shown that dopamine projections to the lateral hypothalamus (Hoang et al., 2023) and the basolateral amygdala (Sias et al., 2024) stamp in associations between cues and specific rewards (i.e., the specific component of learning). Furthermore, there is evidence that the nucleus accumbens shell (Corbit and Balleine, 2011) and orbitofrontal cortex (Howard and Kahnt, 2018; E. E. Hart et al., 2020) encode these specific components of learning. Orbitofrontal encoding has also been correlated with dopaminergic activity (Howard and Kahnt, 2018). Critically, there are subtle but important differences in the way these circuits contribute to the specific components of learning. These differences likely have wider consequences for how the brain creates a unified learning experience. For example, we have shown that the lateral hypothalamus is important for biasing learning toward information proximal to rewards (Sharpe et al., 2017b, 2021; Hoang and Sharpe, 2021; Sharpe, 2024). In contrast, the orbitofrontal cortex is necessary for learning sensory-specific associations between events (e.g., cue–cue associations; E. E. Hart et al., 2020), regardless of whether those associations involve rewards. In other words, the dopamine circuits that do encode the specific components of learning (as distinct from the general component reflected in NAcC) likely contribute to specific forms of learning in critically different ways.
Understanding how distinct dopamine circuits contribute to very specific aspects of reinforcement learning is important. For example, we have recently developed a model that accounts for the positive and negative symptoms of schizophrenia by simulating asymmetric changes in distinct dopamine circuits as a critical feature of this disorder (Millard et al., 2022). This approach is novel because the field has traditionally attributed the positive symptoms of schizophrenia to a global increase in subcortical dopamine (Fusar-Poli and Meyer-Lindenberg, 2013; Sarpal et al., 2016; McCutcheon et al., 2019). It has attributed the negative symptoms to hypofrontality (Barch and Dowd, 2010; Waltz et al., 2013; Strauss et al., 2014; Slifstein et al., 2015). However, our new understanding of how dopamine contributes to learning in different ways via distinct circuits allows us to refine this hypothesis. In this model (Millard et al., 2022), we argue that the nucleus accumbens is a battleground for competing influences over reinforcement learning that come from distinct dopamine circuits. Specifically, we hypothesize that orbitofrontal inputs to the nucleus accumbens are strengthened in schizophrenia, while inputs from the medial prefrontal cortex and lateral hypothalamus are weakened. Strengthened orbitofrontal inputs could account for the positive symptoms, which are associated with overlearning about neutral or irrelevant information. Weakened lateral hypothalamic and medial prefrontal inputs could account for the reduced learning about reward-predictive information that characterizes the negative symptoms. Further research to distinguish how these distinct dopamine circuits learn and interact with one another is critical to our understanding of psychopathology.
Footnotes
This work was funded by National Science Foundation CAREER2143910, R01DA054967, R01DA057084, BBRF 30637, and R21MH126278 awarded to M.J.S.
The authors declare no competing financial interests.
- Correspondence should be addressed to Masakazu Taira at masakazu.taira{at}sydney.edu.au or Melissa J. Sharpe at melissa.sharpe{at}sydney.edu.au.