Abstract
Adaptive decision-making relies on flexible updating of learned associations where environmental cues come to predict valenced stimuli, such as food or threat. Cue-guided behavior depends on a network of brain systems, including dopaminergic projections to the striatum. Critically, it remains unclear how dopamine signaling across the striatum encodes multivalent, dynamic learning contexts, where positive and negative associations must be rapidly disambiguated. To understand this, we used a pavlovian discrimination paradigm, where cues predicting food or threat were intermingled during conditioning sessions and their meaning was serially reversed across training. We found that male and female rats readily distinguished these cues and updated their behavior rapidly upon valence reversal. Using fiber photometry, we recorded dopamine signaling in three major striatal subregions—the dorsolateral striatum (DLS), the nucleus accumbens (NAc) core, and the NAc medial shell—finding that valence was represented uniquely across all three regions, indicative of local signals biased for value and salience. Furthermore, ambiguity introduced by cue reversals reshaped striatal dopamine on different timelines: NAc signals updated more readily than those in the DLS. Together, these results indicate that striatal dopamine flexibly encodes stimulus valence according to region-specific rules, and these signals are dynamically modulated by changing contingencies in the resolution of ambiguity about the meaning of environmental cues.
Significance Statement
Adaptive decision-making relies on updating learned associations to disambiguate predictions of reward or threat. This cue-guided behavior depends on striatal dopamine, but it remains unclear how dopamine signaling encodes multivalent, dynamic learning contexts. Here, we employed a paradigm where cues predicting positive and negative outcomes were intermingled, and their meaning was serially reversed across time. We recorded dopamine signaling, finding heterogeneous patterns of valence encoding across striatal subregions, and cue reversal reshaped subregional signals on different timelines. Our results suggest that dopamine flexibly encodes dynamic learning contexts to resolve ambiguity about the meaning of environmental cues.
Introduction
Adaptive decision-making requires rapidly adjusting learned behaviors in response to changing contingencies. Environmental cues guide this process via pavlovian learning, signaling the identity and valence of future rewards and threats. Understanding how the brain disambiguates complex and dynamic sensory elements to promote flexible behavior is important for insight into a number of psychiatric disorders, including addiction, obsessive-compulsive disorder, PTSD, and schizophrenia (Swainson et al., 2000; Remijnse et al., 2006; Leeson et al., 2009; Izquierdo and Jentsch, 2012).
The striatum is a key center for cue-based learning, decision-making, and action selection, and disruptions in striatal signaling contribute to behavioral inflexibility (Ragozzino, 2007; Haluk and Floresco, 2009; Castañé et al., 2010; Brown et al., 2011; Klanker et al., 2013; Floresco, 2015; Cox and Witten, 2019). Critically, the striatum is anatomically and functionally heterogeneous (Swainson et al., 2000; Di Ciano et al., 2001; Cardinal et al., 2002; Barnes et al., 2005; Everitt and Robbins, 2005; Yin et al., 2009; Saunders and Robinson, 2012; Bakhurin et al., 2016; Nestler and Lüscher, 2019), and a central component of this heterogeneity is dopamine signaling, where different striatal niches subserve specialized learning-related functions (Brown et al., 2011; Howe and Dombeck, 2016; Parker et al., 2016; Klanker et al., 2017; Saunders et al., 2018; Cox and Witten, 2019; Radke et al., 2019; Collins and Saunders, 2020; Salinas et al., 2023; Mohebi et al., 2024). While striatal dopamine is classically associated with reward learning and reinforcement (Schultz et al., 1997; Wise, 2004), there is now evidence for a broader role, including encoding negative valence and threat/avoidance conditioning (Pezze and Feldon, 2004; Roitman et al., 2008; Fadok et al., 2009; Badrinarayan et al., 2012; McCutcheon et al., 2012; Oleson et al., 2012; Lammel et al., 2014; Wendler et al., 2014; Park and Moghaddam, 2017; De Jong et al., 2019; van Elzelingen et al., 2022a; Lopez and Lerner, 2025).
Despite broad interest in identifying heterogeneous dopamine signaling patterns across striatal regions, it remains unclear how these signals encode conditioned valence and adapt to situations where cue identity is ambiguous. Closing this gap is important for a robust understanding of dopaminergic modulation of decision-making. To explore this question, we employed a pavlovian valence discrimination paradigm modified from previous studies (Kim et al., 2010; Burgos-Robles et al., 2017), where cues predicting positive and negative outcomes were intermingled during conditioning sessions, and their meaning was serially reversed across training sessions. Rats came to distinguish these cues and rapidly updated their behavior upon valence reversal. We used fiber photometry to measure dopamine signaling in three major striatal subregions that are uniquely interconnected with the larger thalamo-cortico-striatal network and that show established functional differences in pavlovian learning, value signaling, and behavioral flexibility: the dorsolateral striatum (DLS), the nucleus accumbens (NAc) core, and the NAc medial shell (Swainson et al., 2000; Di Ciano et al., 2001; Cardinal et al., 2002; Barnes et al., 2005; Everitt and Robbins, 2005; Yin et al., 2009; Nestler and Lüscher, 2019).
We found a triple dissociation in the qualitative pattern of responses to reward and threat conditioned cues and their predicted outcomes, reflecting a mixture of local scaling of value, salience, and cue identity properties. Furthermore, valence ambiguity introduced by cue reversal reshaped striatal dopamine signals on different timelines, with NAc core and shell dopamine signals updating faster than those in the DLS. Together, our results indicate that striatal dopamine niches uniquely encode different features of positive and negative pavlovian associations, and these signals are dynamically modulated by changing contingencies to resolve uncertainty about the meaning of environmental cues.
Materials and Methods
Subjects
Long–Evans rats (N = 23; 11 females, 12 males, Envigo) were single-housed in a humidity- and temperature-controlled vivarium, with ad libitum access to water. Rats were acclimated to the vivarium for at least 5 d and handled for at least 3 d prior to any procedures. Rats were food-restricted to ∼90% of free-feeding weight 1 week prior to starting any behavioral experiments. All procedures were done in accordance with the National Institutes of Health's Office of Laboratory Animal Welfare and the University of Minnesota's Institutional Animal Care and Use Committee.
Stereotaxic surgery
Rats were anesthetized with isoflurane (5% induction, 1–2% maintenance; Patterson Veterinary Supply) and injected with carprofen (5 mg/kg, 50 mg/ml, s.c.), cefazolin (70 mg/kg, 0.2 g/ml, s.c.), and saline (0.5 ml, s.c.). Following head fixation in a digital stereotax (David Kopf Instruments), the skull was exposed, leveled, and scored, and a craniotomy was drilled over the target region. Plasmid containing dLight viral DNA (pAAV-CAG-dLight1.3b) was purchased from Addgene (#125560; Patriarchi et al., 2018) and packaged by the University of Minnesota's Viral Vector and Cloning Core with a titer of ∼1 × 10^13 GC/ml (AAV5-CAG-dLight1.3b). Viral vectors were loaded into 10 μl syringes (Hamilton Company) and infused (700 nl, 140 nl/min) unilaterally into the NAc medial shell (coordinates in mm, relative to the bregma, AP +1.5, ML −0.8, DV −7.5), NAc core (AP +1.3, ML −1.3, DV −7.0), or DLS (AP +1.0/1.2, ML −3.3/3.5, DV −5.0). Mono fiber-optic cannulae (silica/polymer, 400 μm, NA 0.48, 9.0 mm fiber, 2.5 mm metal ferrule, flat tip; Doric Lenses) were implanted unilaterally into the NAc medial shell (AP +1.5, ML −0.8, DV −7.3), NAc core (AP +1.3, ML −1.3, DV −6.8), or DLS (AP +1.0/1.2, ML −3.3/3.5, DV −4.8) above injection sites. Fiber-optic cannulae were secured with dental cement (Lang Dental) and further secured to the skull with self-tapping bone screws (Fine Science Tools). Following surgery, rats were given daily injections of carprofen (5 mg/kg, 50 mg/ml, s.c.) and cefazolin (70 mg/kg, 0.2 g/ml, s.c.) for 3 d.
Habituation and pavlovian conditioning
Behavior training began ∼3–4 weeks after surgery. Med Associates chambers were outfitted with speakers for the delivery of tone (high, 4,500 Hz; low, 2,900 Hz) or white noise cues, a syringe pump to deliver liquid reward, a magazine port equipped with an infrared beam to sense port entries (PEs), and floor bars for the delivery of electrical shocks. Chambers were cleaned with 70% ethanol, and bedding was replaced after each session.
Rats were first acclimated to the behavioral chambers, conditioning cues, and optic cable tethering in a ∼35 min habituation session. During this session, rats were tethered to a cable and given 15 noncontingent 5 s cue presentations (∼60 dB; five high tone, five low tone, five white noise) on a 120 s average variable time (VT) schedule (range, 90–150 s). These cue presentations were not associated with any outcome.
Next followed a pavlovian reversal task adapted from previous studies (Kim et al., 2010; Burgos-Robles et al., 2017). During the initial learning phase (five training sessions), rats were presented with 60 cues on a 60 s average VT schedule (range, 45–75 s), with each cue presentation lasting 20 s. Twenty-five cues predicted liquid reward (CS + R, 0.1 ml Ensure chocolate shake) delivered from 10 to 15 s after cue onset, 25 cues predicted footshock (CS + S, 0.4 mA) delivered from 19.5 to 20 s, and 10 cues had no outcome (CS−). Cues were presented in a pseudorandom order. The high and low tones were counterbalanced across rats for the CS + R and CS + S, and white noise was always used for the CS−. Each session lasted ∼80 min, and the time and duration of each PE were recorded by the Med Associates software. Cameras positioned above each chamber recorded movement throughout each session for behavioral analysis.
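A session schedule with these parameters can be sketched in Python. This is a minimal illustration, not the Med-PC program used in the study: the function name `make_session_schedule` and the `OUTCOMES` table are hypothetical, the cue order is assumed to be a plain shuffle (the text specifies only a pseudorandom order), and intervals are assumed to be drawn uniformly from 45–75 s and to run from one cue's offset to the next cue's onset. Under those assumptions, 60 trials × (60 s + 20 s) averages 80 min, consistent with the stated session length.

```python
import random

# Outcome timing relative to cue onset (s), as described in the text.
OUTCOMES = {
    "CS+R": ("reward", 10.0, 15.0),
    "CS+S": ("shock", 19.5, 20.0),
    "CS-": (None, None, None),
}


def make_session_schedule(n_reward=25, n_shock=25, n_neutral=10,
                          iti_range=(45.0, 75.0), cue_dur=20.0, seed=None):
    """Return a list of (cue_onset_s, cue_type) tuples for one session.

    Assumptions: cue order is a simple shuffle, and each interval is drawn
    uniformly from iti_range and runs from the previous cue's offset (or
    session start) to the next cue's onset.
    """
    rng = random.Random(seed)
    cues = ["CS+R"] * n_reward + ["CS+S"] * n_shock + ["CS-"] * n_neutral
    rng.shuffle(cues)
    lo, hi = iti_range
    schedule, t = [], 0.0
    for cue in cues:
        t += rng.uniform(lo, hi)  # variable-time interval before the cue
        schedule.append((t, cue))
        t += cue_dur  # the cue itself lasts 20 s
    return schedule


sched = make_session_schedule(seed=1)
print(len(sched), sched[-1][0] + 20.0)  # trial count and approximate session end (s)
```

The same function can describe the habituation session by setting `n_reward=n_shock=n_neutral=5`, `iti_range=(90.0, 150.0)`, and `cue_dur=5.0`, with no outcomes delivered.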
Cue–outcome reversals
After five sessions of initial learning, the reward and shock cue contingencies were reversed (Reversal 1): the CS + R became the CS + S and vice versa (e.g., a tone that had predicted reward now predicted shock). All other aspects of the task were identical to the initial learning phase. After five sessions of Reversal 1, the contingencies were reversed again, restoring the original associations from the initial learning phase (Reversal 2), for another five sessions.
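The serial reversal design amounts to a session-dependent mapping from cue to outcome. The hypothetical helper below encodes it using the session numbering implied by the recording schedule (Sessions 1–5, initial learning; 6–10, Reversal 1; 11–15, Reversal 2; 16, extinction). The `reward_tone` argument is an illustrative stand-in for the across-rat counterbalancing of the high and low tones.

```python
def cue_contingencies(session, reward_tone="high"):
    """Map each auditory cue to its outcome for a given session number.

    Hypothetical helper. reward_tone is the tone paired with reward in the
    initial learning phase (counterbalanced across rats).
    """
    if reward_tone not in ("high", "low"):
        raise ValueError("reward_tone must be 'high' or 'low'")
    shock_tone = "low" if reward_tone == "high" else "high"
    if 1 <= session <= 5 or 11 <= session <= 15:
        # Initial learning and Reversal 2 share the original mapping.
        mapping = {reward_tone: "reward", shock_tone: "shock"}
    elif 6 <= session <= 10:
        # Reversal 1: the two tones swap outcomes.
        mapping = {reward_tone: "shock", shock_tone: "reward"}
    elif session == 16:
        # Extinction: cues are presented but no outcomes are delivered.
        mapping = {reward_tone: None, shock_tone: None}
    else:
        raise ValueError(f"no pavlovian session numbered {session}")
    mapping["white noise"] = None  # the CS- never has an outcome
    return mapping


print(cue_contingencies(6, reward_tone="low"))
```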
Extinction
Behavioral testing concluded with one extinction session. This session was identical to the pavlovian reversal task, except with the omission of the reward and shock deliveries.
Fiber photometry recordings
To measure dopamine signals in the core, shell, and DLS, we performed fiber photometry using a system with optical components from Doric Lenses controlled by a real-time processor (RZ5P; Tucker-Davis Technologies, TDT) running Synapse software for session control and data acquisition, similar to previous studies (Engel et al., 2024). A fluorescence minicube transmitted light from 465 and 415 nm LEDs, sinusoidally modulated at 211 and 330 Hz, respectively. LED currents were adjusted to yield ∼50 μW of light power for each channel. Fluorescence from neurons at the fiber tip was transmitted back to the minicube, where it was passed through a GFP emission filter, amplified, and focused onto a high-sensitivity photoreceiver (Newport, Model 2151). The RZ5P processor, running Synapse software, modulated the output of each LED and recorded photometry signals, which were sampled from the photodetector at 6.1 kHz. Demodulating the fluorescence produced by 465 nm excitation, which reflects dopamine-dependent dLight fluorescence, separately from that produced by isosbestic 415 nm excitation, which reflects dopamine-independent dLight fluorescence, allowed for correction for bleaching and movement artifacts.
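Separating the two excitation channels from a single photoreceiver relies on lock-in style demodulation. The sketch below is an assumed, simplified version of what the RZ5P performs internally (the actual Synapse filter settings are not given here): each channel's amplitude is recovered by multiplying the raw trace by sine and cosine references at its carrier frequency and averaging. The function name `demodulate` is hypothetical, and in practice the operation would be applied over successive short windows to produce a time series.

```python
import numpy as np

FS = 6100.0                   # photoreceiver sampling rate (Hz), from the text
F_465, F_415 = 211.0, 330.0   # LED modulation frequencies (Hz), from the text


def demodulate(raw, carrier_hz, fs=FS):
    """Amplitude of the component of `raw` modulated at `carrier_hz`.

    Multiplies by sine and cosine references and averages; the mean acts as
    a crude low-pass filter. Over a whole number of cycles of both carriers,
    the other channel and any DC offset average out.
    """
    t = np.arange(raw.size) / fs
    in_phase = np.mean(raw * np.sin(2 * np.pi * carrier_hz * t))
    quadrature = np.mean(raw * np.cos(2 * np.pi * carrier_hz * t))
    return 2.0 * np.hypot(in_phase, quadrature)


# Synthetic 1 s block: 465 nm channel amplitude 3.0, 415 nm amplitude 1.5.
t = np.arange(int(FS)) / FS
raw = (3.0 * np.sin(2 * np.pi * F_465 * t + 0.4)
       + 1.5 * np.sin(2 * np.pi * F_415 * t + 1.1) + 0.2)
print(demodulate(raw, F_465), demodulate(raw, F_415))
```

Using both in-phase and quadrature references makes the recovered amplitude independent of the unknown phase of each carrier.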
Behavioral timestamps (e.g., PEs) were fed into the RZ5P processor as TTL signals from the operant chambers (Med Associates) for alignment with neural data. Video recording feeds were similarly integrated with the photometry data via the Synapse software. Photometry recordings occurred on the following sessions: habituation, 1, 3, 5, 6 (Reversal 1), 8, 10, 11 (Reversal 2), 13, 15, and 16 (extinction).
DeepLabCut-based pose tracking
Markerless tracking of animal body parts was conducted using the DeepLabCut Toolbox (DLC version 2.2.3; Mathis et al., 2018), and movement features derived from these tracked coordinates were analyzed in MATLAB R2022b (MathWorks). All DLC analysis was conducted on either a Dell G7-7590 laptop running Windows 10 with an Intel Core i7-9750H CPU, 2.60 GHz, 16 GB RAM, and an NVIDIA GeForce RTX 2080 Max-Q 8 GB GPU or an Alienware Aurora R13 gaming desktop running Windows 11 with an Intel Core i9-12900F CPU, 2,400 MHz, 128 GB RAM, and an NVIDIA GeForce RTX 3090 32 GB GPU. DeepLabCut was installed in an Anaconda environment with Python 3.8.15, CUDA 12.0, and TensorFlow 2.10.0. Videos (944 × 480 resolution) were recorded at 10–20 fps using the TDT Synapse software with overhead cameras (Vanxse CCTV 960H 1000TVL HD Mini Spy Security Camera 2.8–12 mm Varifocal Lens Indoor Surveillance Camera).
DeepLabCut model
Two hundred ninety frames from 35 videos (32 different animals, three experiments) were labeled, and 807 outlier frames were relabeled to refine a network described previously (Collins et al., 2023) for the current study. Labeled frames were split into a training set (95% of frames) and a test set (5% of frames). A ResNet-50–based neural network (Insafutdinov et al., 2016) was used for 1,030,000 training iterations. After final refinement, we used a p cutoff of 0.85 resulting in a training set error of 2.99 pixels and test set error of 3.68 pixels. The body parts labeled included the nose, eyes, ears, fiber-optic implant, shoulders, tail base, and an additional three points along the spine. Features of the environment were also labeled, including the four corners of the apparatus floor and the two magazine ports. This model was then used to analyze videos from 23 rats (10 DLS, 8 NAc core, and 5 NAc shell) for all training sessions.
Histology
Rats received intraperitoneal injections of Fatal-Plus (2 ml/kg; Patterson Veterinary) to induce deep anesthesia and were transcardially perfused with cold phosphate-buffered saline (PBS) followed by 4% paraformaldehyde (PFA). Brains were removed and postfixed in 4% PFA for ∼24 h and then cryoprotected in a 30% sucrose in PBS solution for at least 48 h. Sections were cut at 50 μm on a cryostat (Leica Biosystems). To confirm viral expression and fiber-optic placements, we mounted the brain sections containing the striatum on microscope slides and coverslipped them with VECTASHIELD containing DAPI counterstain. Slides were imaged using a fluorescent microscope (Keyence BZ-X710) with a 4× air immersion objective. Fiber placements and virus spread were assessed using the Rat Brain Atlas (Paxinos and Watson, 2007).
Behavioral analysis
PEs and freezing bouts were the primary behavioral output measures. PE frequency and duration were analyzed from Med-PC software outputs and the generated TTL signals, while freezing bouts were calculated from positional information tracked using DLC. DLC coordinates and confidence values for each body part and frame were imported to MATLAB and filtered to exclude body parts/features from any frame where the confidence was <0.7. For labeled features of the environment (those with a fixed location), the average coordinates for that recording were used for analysis. For each video, a pixel-to-centimeter conversion factor was used to convert pixel distances to real chamber dimensions: the pixel distance along each edge of the chamber floor and along both corner-to-corner diagonals was measured and divided by the corresponding actual distance in centimeters, and the mean of these ratios was used as the conversion factor. The movement threshold for detecting freezing was calibrated to each animal's size using a scale factor determined from the relationship between body size (distance between the shoulder and lower back points) and the optimal threshold for detecting movement in a subset of animals used in this study (n = 4). This threshold was applied to the face and head; for the remaining body parts, it was multiplied by 2, so that finer movements were detected for the face than for the body. Freezing bouts were detected when all visible body parts were below their respective movement thresholds for a single frame, and a sliding window was used to determine when the speed of two or more body parts exceeded the detection threshold for 0.2 s, indicating the beginning and end of a movement. Frames in which <2 body parts were visible were ignored, and bouts with durations <1 s were excluded.
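The calibration and detection steps above can be sketched in Python with NumPy. Both functions are hypothetical illustrations of the logic as described, not the authors' MATLAB code: `px_per_cm` assumes the four floor corners are supplied in a fixed clockwise order, and `freezing_bouts` takes precomputed per-frame speeds (NaN for body parts below the 0.7 confidence cutoff) and simplifies the sliding-window rule to "two or more parts above threshold on every frame of a run lasting at least 0.2 s".

```python
import numpy as np


def px_per_cm(corners_px, floor_w_cm, floor_h_cm):
    """Mean pixels-per-centimeter factor from the four floor corners.

    corners_px: four (x, y) points ordered top-left, top-right,
    bottom-right, bottom-left (an assumed convention). Each of the four
    edges and two diagonals is divided by its true length in centimeters.
    """
    c = np.asarray(corners_px, dtype=float)
    diag_cm = float(np.hypot(floor_w_cm, floor_h_cm))
    segments = [(0, 1, floor_w_cm), (3, 2, floor_w_cm),   # top, bottom edges
                (0, 3, floor_h_cm), (1, 2, floor_h_cm),   # left, right edges
                (0, 2, diag_cm), (1, 3, diag_cm)]         # diagonals
    ratios = [np.linalg.norm(c[a] - c[b]) / cm for a, b, cm in segments]
    return float(np.mean(ratios))


def freezing_bouts(speed, thresholds, fps, move_win_s=0.2, min_bout_s=1.0):
    """Return freezing bouts as (start_s, end_s) tuples.

    speed: (n_frames, n_parts) array of per-frame speeds, NaN where a part
    was not tracked confidently. thresholds: per-part speed limits, e.g.,
    the face/head limit for head parts and twice that for body parts.
    A bout starts on a frame where every visible part is below threshold
    and ends when two or more parts exceed threshold on every frame of a
    run lasting at least move_win_s. Frames with <2 visible parts are
    skipped, and bouts shorter than min_bout_s are discarded.
    """
    speed = np.asarray(speed, dtype=float)
    visible = ~np.isnan(speed)
    n_vis = visible.sum(axis=1)
    over = np.where(visible, speed, 0.0) > np.asarray(thresholds, dtype=float)
    moving = over.sum(axis=1) >= 2
    win = max(1, int(round(move_win_s * fps)))

    # Flag frames that belong to a movement run of at least `win` frames.
    sustained = np.zeros(len(speed), dtype=bool)
    run = 0
    for k, m in enumerate(moving):
        run = run + 1 if m else 0
        if run >= win:
            sustained[k - win + 1:k + 1] = True

    still = ~over.any(axis=1)
    bouts, start = [], None
    for k in range(len(speed)):
        if n_vis[k] < 2:
            continue  # too few tracked parts: ignore this frame
        if start is None and still[k]:
            start = k
        elif start is not None and sustained[k]:
            bouts.append((start, k))
            start = None
    if start is not None:
        bouts.append((start, len(speed)))
    min_frames = min_bout_s * fps
    return [(a / fps, b / fps) for a, b in bouts if b - a >= min_frames]
```

Because a bout ends only on sustained multi-part movement, a brief twitch of a single body part does not split an ongoing freezing bout in this sketch.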
Fiber photometry analysis
Recorded fluorescence signals produced by 465 and 415 nm LEDs were downsampled to 40 Hz, and a least-squares linear fit was applied to the 415 nm signal to align it to the 465 nm signal. The fitted 415 nm signal was used to normalize the 465 nm signal, where ΔF/F = (465 nm signal − fitted 415 nm signal)/(fitted 415 nm signal). Task events (e.g., cue presentations) were timestamped in the photometry data file via a TTL signal from Med-PC, and behavior was video recorded as described above at 10–20 fps, with timestamps for each frame captured in the photometry file. Normalized signals for each trial were extracted in the window from 5 s preceding to 25 s following each cue presentation and Z-scored to the 5 s precue period for each trial to minimize the effects of drift in the signal across the experiment. Given that rats were not head-fixed and made PEs with somewhat variable latencies trial by trial, we isolated reward consumption periods by analyzing photometry signals in relation to when PEs occurred immediately following reward pump onset. Normalized signals for reward consumption were extracted in the window from 3 s preceding to 10 s following each PE and Z-scored to the 3 s pre-PE period. A PE was categorized as rewarded if it was the first PE after reward delivery with a duration >0.5 s. Normalized signals for shock responses were extracted in the window from 3 s preceding to 5 s following each shock delivery and Z-scored to the 3 s preshock period. Cue, reward, and shock responses were detected in the average waveform for each animal in the 2 s window beginning at event onset. The maximum and minimum responses in each window were detected, and latency was calculated relative to the start of the detection window. The area under the curve (AUC) values were calculated from the Z-scored traces by numerical integration via the trapezoidal method using the trapz function (MATLAB).
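The normalization and quantification steps above can be sketched with NumPy. This is an assumed translation of the described MATLAB workflow, not the authors' code: all function names are hypothetical, the baseline standard deviation uses NumPy's default population formula (the text does not specify), the trapezoidal rule is written out explicitly rather than calling a library `trapz`, and `first_rewarded_pe` reflects one reading of the rewarded-PE criterion (the first PE starting at or after pump onset whose duration exceeds 0.5 s).

```python
import numpy as np

FS = 40.0  # sampling rate after downsampling (Hz)


def delta_f_over_f(sig465, sig415):
    """ΔF/F after a least-squares linear fit of the 415 nm to the 465 nm trace."""
    slope, intercept = np.polyfit(sig415, sig465, 1)
    fitted = slope * np.asarray(sig415) + intercept
    return (np.asarray(sig465) - fitted) / fitted


def zscored_epoch(dff, event_idx, pre_s=5.0, post_s=25.0, fs=FS):
    """Peri-event window, Z-scored to its own pre-event baseline."""
    pre, post = int(round(pre_s * fs)), int(round(post_s * fs))
    seg = np.asarray(dff[event_idx - pre:event_idx + post], dtype=float)
    base = seg[:pre]
    return (seg - base.mean()) / base.std()


def auc_trapezoid(y, fs=FS):
    """Area under the curve by the trapezoidal rule, sample spacing 1/fs."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0) / fs)


def peak_and_trough(avg_trace, onset_idx, win_s=2.0, fs=FS):
    """Max/min of the averaged trace in the window starting at event onset,
    with latencies (s) measured from the start of that window."""
    w = np.asarray(avg_trace[onset_idx:onset_idx + int(round(win_s * fs))])
    i_max, i_min = int(np.argmax(w)), int(np.argmin(w))
    return float(w[i_max]), i_max / fs, float(w[i_min]), i_min / fs


def first_rewarded_pe(pe_onsets, pe_durations, pump_onset, min_dur=0.5):
    """Onset of the first PE at or after pump onset lasting > min_dur s."""
    for onset, dur in zip(pe_onsets, pe_durations):
        if onset >= pump_onset and dur > min_dur:
            return onset
    return None
```

For reward-consumption epochs, the same `zscored_epoch` helper would be called with `pre_s=3.0, post_s=10.0` at the index of the rewarded PE; for shock epochs, with `pre_s=3.0, post_s=5.0` at shock onset.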
Statistical analyses
Behavior and photometry signal data were analyzed with a combination of ANOVA models (one-way, two-way, and mixed effects). Post hoc comparisons and planned t tests were used to clarify main effects and interactions. Graphs represent average values of sites/subjects, not individual trials. Data are expressed as mean ± SEM. For all tests, significance was set at α = 0.05.
Results
Rats discriminate between intermingled positively and negatively valenced pavlovian cues
To investigate the effect of intermingled positively and negatively valenced stimuli on conditioned behavior and dopamine signaling in striatal subregions, we used a pavlovian discrimination task adapted from previous studies (Kim et al., 2010; Burgos-Robles et al., 2017). Rats first underwent habituation to three novel auditory cues (high tone, low tone, white noise), followed by an initial learning phase where one conditioned stimulus (CS) was paired with a liquid reward (CS + R), one was paired with a footshock (CS + S), and a third was paired with no outcome (CS−; Fig. 1A). To explore the relationship between dopamine signals and dynamic cue valence, the initial learning phase was followed by a reversal phase, where the tones predicting reward and shock delivery were swapped. After further conditioning under the reversed cue conditions, we employed a second reversal phase, where the CS and US pairings were returned to their original contingencies for several sessions (Fig. 1B).
Multivalent pavlovian cue discrimination task. A, Schematic showing a representative session of the task. Shock (CS + S), reward (CS + R), and neutral cue (CS−) trials were intermingled during pavlovian training. Reward PEs and freezing bouts were the primary measures. Rats (N = 23; 11 females, 12 males) learned to discriminate the cues with appropriate behavioral responses. B, Experimental timeline. Primary training data from Sessions 1, 3, 5, 10, and 15 are shown in this figure. C, Percentage time spent in the port during the cues diverged with training, with the most time spent in the port during the CS + R. D, Percentage time in the port during the CS + R increased during initial training, and by Session 5 rats discriminated the CS + R from the other cues. E, Percentage time spent in the port during the cues continued to diverge after initial training. F, Percentage time freezing during the cues diverged across training, with the most time spent freezing during the CS + S. G, Percentage time freezing during initial training. H, Percentage time freezing remained divergent after initial training. Data represent mean ± SEM. ****p < 0.0001; ***p < 0.001; **p < 0.01; *p < 0.05.
We measured two primary conditioned responses (CRs): reward PEs and freezing. To track cue discrimination learning generally, we initially focused analysis on Sessions 1, 3, 5, 10, and 15 (Fig. 1). We examined the percentage of time spent in the port for the entire 20 s cue period. Rats quickly learned to discriminate between CS types, and this discrimination grew across sessions (Fig. 1C; session × CS interaction F(8,176) = 22.23; p < 0.0001). Time in the port during the cue increased from Day 1 to Day 5 only for the CS + R (Fig. 1D; multiple-comparison test p < 0.0001), and by Day 5, time in the port was higher for the CS + R than for the CS + S (p < 0.0001) and CS− (p = 0.0240). After the initial training phase (Days 1–5), cue discrimination continued to improve (Fig. 1E), as shown by increased time in port from Day 5 to Day 15 for the CS + R (p < 0.0001). Time in port for the CS + R on Day 15 was higher than for either the CS− or CS + S (both p < 0.0001). Notably, time in port also increased from Day 5 to Day 15 for the CS− (p = 0.0100) and was higher for the CS− than CS + S on Day 15 (p < 0.0001), potentially reflecting the learned safety of the CS− period.
Next, we examined the percentage of time rats spent freezing during each cue. Rats quickly learned to discriminate between CS types across sessions (Fig. 1F; session × CS interaction, F(8,186) = 7.071; p < 0.0001). By Day 5, freezing was highest for the CS + S (Fig. 1G), relative to the CS + R and CS− (multiple-comparison test, p = 0.0109 and p < 0.0001, respectively). Across the remaining training phases, cue discrimination improved (Fig. 1H). Freezing during the CS + S was greater than during the other cues (both p < 0.0001), and the time spent freezing during the CS + R decreased from Day 5 to Day 15 (p < 0.0001). We saw some evidence of CR generalization (Fig. 1F–H), as freezing during the CS + R remained higher than during the CS− (p < 0.0001) across training.
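Both percentage measures reduce to the fraction of the 20 s cue window covered by a set of time intervals (port entry to port exit, or freezing-bout onset to offset). A minimal sketch, assuming non-overlapping intervals expressed in seconds; `percent_time_in_window` is a hypothetical name, not part of the Med-PC or DLC outputs.

```python
def percent_time_in_window(intervals, win_start, win_end):
    """Percentage of [win_start, win_end) covered by (start, end) intervals.

    Works for port-entry intervals or freezing bouts; intervals are assumed
    not to overlap one another. Parts of an interval falling outside the
    window are clipped away.
    """
    covered = 0.0
    for start, end in intervals:
        covered += max(0.0, min(end, win_end) - max(start, win_start))
    return 100.0 * covered / (win_end - win_start)


# Cue from 100 to 120 s; one PE straddles cue onset, one lies after cue offset.
print(percent_time_in_window([(95.0, 102.0), (110.0, 115.0), (130.0, 140.0)],
                             100.0, 120.0))
```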
We examined pavlovian-conditioned behaviors with data split by sex. Overall, we found no sex differences in pavlovian cue discrimination (two-way mixed-model ANOVAs; no significant interactions involving sex, and all main effects of sex p > 0.1), with females (n = 11) and males (n = 12) exhibiting comparable levels of conditioned behavior across training. As such, data were collapsed across sex for all analyses.
Subregion-specific striatal dopamine responses to positively and negatively valenced outcomes
We recorded dopamine activity in the DLS, NAc core, and NAc medial shell using fiber photometry with the dopamine sensor dLight 1.3b (Fig. 2A–E). As in previous studies (Engel et al., 2024), robust transient dopamine-related fluorescence signals were measured in each region, with minimal signal dynamics observed in the control channel (Fig. 2F–H).
Heterogeneous responses to positive and negative stimuli among striatal dopamine signals. A, The dopamine sensor dLight 1.3b was expressed in the striatum of wild-type Long–Evans rats. Schematic of the photometry system; dLight signals were fitted against a control signal to account for photobleaching and movement artifacts. B, Photometry measurements were made from three striatal subregions: the DLS (green, n = 10), NAc core (blue, n = 8), and NAc shell (purple, n = 5). Representative virus and fiber placement for the (C) DLS, (D) core, and (E) medial shell subregions. F–H, Example long (500 s) recording traces of raw 415 and 465 signals show stable signal and consistent bleaching patterns across channel type and region. I, In the DLS, dopamine increased mildly during reward consumption. J, In the core, there was a large positive dopamine response during reward. K, In the shell, there was a slight peak in dopamine followed by a dip below baseline during reward. L, Quantification of responses plotted in I–K. M, In the DLS, dopamine dipped sharply in response to the shock, then rebounded above baseline before returning to baseline. N, In the core, there was a dopamine peak in response to the shock. O, In the shell, there was not a clear response to the shock. P, Quantification of responses plotted in M–O. Data represent mean ± SEM. ****p < 0.0001; **p < 0.01.
We first examined dopamine signals in response to each unconditioned stimulus (footshock, reward). The dopamine signal in response to reward consumption varied qualitatively across regions: there was a small, slow positive response in the DLS (Fig. 2I); a larger, phasic positive response in the core (Fig. 2J); and a small, positive response followed by a dip below the baseline in the shell (Fig. 2K). Given this variability, we measured the peak or trough of the reward response, which differed between regions (Fig. 2L; one-way ANOVA F(2,18) = 9.941; p = 0.0012). Next, we looked at dopamine responses to shock delivery, which also varied across regions: the DLS had a biphasic response, with a trough during shock followed by a positive peak at shock offset (Fig. 2M); the core had a positive response during shock (Fig. 2N); and the shell responded more slowly, with little change during the shock itself but a slow steady increase in dopamine that emerged after shock offset with a mean time to peak of 1.51 s (Fig. 2O). Correspondingly, the magnitude of the shock response differed across regions (Fig. 2P; F(2,20) = 24.54; p < 0.0001).
Pavlovian valence ambiguity shapes dynamic heterogeneity in striatal dopamine signals
Dopamine signals in the striatum track elements of appetitive and aversive pavlovian learning, but most studies examine conditioning in the context of univalent learning conditions. Here, we assessed striatal dopamine under conditions of valence ambiguity, whereby cues that predict positive and negative outcomes were intermingled, and rapid discrimination was required for appropriate behavior. We recorded dopamine signals in the DLS, NAc core, and NAc medial shell during the pavlovian discrimination reversal task described in Figure 1.
Focusing first on the CS + R trials (Fig. 3), heterogeneous dopamine signals emerged across training (Fig. 3B–G). In the DLS, the initial CS + R response was negative (Fig. 3B); however, as training progressed, the negative signal diminished (AUC, F(3,27) = 8.646; p = 0.0003) and became biphasic, with a peak emerging after the initial negative response. In contrast, the DLS response to reward consumption (Fig. 3C) did not change throughout training (AUC, F(3,25) = 0.9476; p = 0.4326). In the core, the early response to the CS + R was biphasic, with an initial peak followed by a trough (Fig. 3D). Across training, the trough disappeared, and the response became larger (AUC, F(3,21) = 12.29; p < 0.0001). The core response to reward (Fig. 3E) did not change throughout training (AUC, F(3,20) = 1.307; p = 0.2998). The shell response to the CS + R was initially negative, with a gradual progression across training to a large peak (Fig. 3F; AUC, F(3,12) = 15.00; p = 0.0002). The shell response to reward (Fig. 3G) also did not significantly change over the course of training (AUC, F(3,11) = 0.5755; p = 0.6429). Across all recording sites, the shapes of the cue signals evolved between Session 5 and Sessions 10/15 (i.e., pre- vs postreversal). The relationship between reward cue-evoked dopamine signals and behavior also varied by region. By the end of training, the CS + R-evoked AUC in the core, but not the shell or DLS, became positively correlated with the time spent in the reward port (R2 = 0.77; p = 0.004). Together, these results highlight subregion-specific dopamine responses to reward-predictive cues, which change with the progression of learning in valence discrimination.
Regionally heterogeneous and dynamic dopamine signals in response to appetitive stimuli. A, Experimental timeline. Photometry data from the highlighted sessions (1, 5, 10, 15) are shown in this figure. Schematic of the CS + R tone showing reward delivery from 10 to 15 s. In the (B) DLS, (D) core, and (F) shell, dLight responses to the CS + R increased throughout training with different patterns in each striatal subregion. C, E, G, Reward-associated dLight signals were different in magnitude across subregions but remained stable within each subregion across training. Data represent mean ± SEM. ****p < 0.0001; ***p < 0.001.
Beyond the classic connection with reward-related learning and reinforcement, at least some striatal dopamine signals also track aversive learning (Matsumoto and Hikosaka, 2009; Badrinarayan et al., 2012; Oleson et al., 2012; Lerner et al., 2015). To explore this in the context of our valence discrimination task, we next focused on CS + S trials and found heterogeneous dopamine signals to the CS + S and shock across training (Fig. 4B–G). In the DLS, early in training, the dopamine response to the CS + S was negative (Fig. 4B). As training progressed, there was no consistent change in this signal (AUC, F(3,27) = 1.653; p = 0.2005). The DLS response to the shock itself had a distinct biphasic shape, with a trough followed by a similar-magnitude peak at shock offset that did not change across training (Fig. 4C; AUC, F(3,27) = 1.953; p = 0.1448). In the core, the dopamine response to the CS + S showed an initial peak followed by a trough (Fig. 4D), and this signal also did not significantly change over time (AUC, F(3,21) = 2.144; p = 0.1251). The core response to the shock itself, which comprised a sharp peak (Fig. 4E), also did not change (AUC, F(3,21) = 0.8604; p = 0.4770). The dopamine response to the CS + S in the shell included a rapid decrease with slow recovery to the baseline (Fig. 4F), which trended smaller across training (AUC, F(3,12) = 2.730; p = 0.0903). While the dopamine response to the shock itself in the shell was relatively small, there was a slow dopamine increase at shock offset (Fig. 4G) that did not change with training (AUC, F(3,12) = 0.2348). Collectively, these results show subregion-specific dopamine responses to shock-predictive cues that are relatively invariant across valence discrimination learning.
Regionally heterogeneous but stable dopamine signals in response to threat stimuli. A, Experimental timeline. Photometry data from the highlighted sessions (1, 5, 10, 15) are shown in this figure. Schematic of the CS + S tone showing shock delivery from 19.5 to 20 s. B, In the DLS, the dLight signal in response to the CS + S was negative. C, The DLS dLight signal was negative during shock and rebounded to a peak after shock offset. D, In the core, the dLight signal in response to the CS + S included a peak followed by a trough. E, In the core, the dLight response to the shock was positive. F, In the shell, the dLight response to the CS + S was negative. G, There was no dopamine response in the shell during shock but a mild increase at shock offset. Data represent mean ± SEM.
Region-consistent dopamine responses to the CS− in the multivalent cue discrimination task
Recent evidence suggests that NAc dopamine signals track conditioned safety cues, whose presence signals relief from, or avoidance of, an expected aversive outcome (Luo et al., 2018; Stelly et al., 2019). In our task, the CS− predicted no outcome, but because it was intermingled with the CS + cues and was the only cue never paired with shock, it likely functioned akin to a safety cue within this learning context (Day et al., 2016; Ng et al., 2018; Ng and Sangha, 2023). This was behaviorally evident: time in port during the CS− was higher than during the CS + S, and freezing during the CS− was lower than during both the CS + S and CS + R (Fig. 1). In contrast to the variable signals seen for each CS + cue, we found qualitatively similar dopamine responses to the CS− across the striatum (Fig. 5). Early in training, responses to CS− onset were small and slightly negative. As training progressed, strong positive signals to the CS− emerged in all three regions, although the change across sessions was significant only in the DLS and core (DLS AUC, F(3,27) = 7.012; p = 0.0012; core AUC, F(3,21) = 7.312; p = 0.0015; shell AUC, F(3,12) = 2.580; p = 0.1021). We also saw strong positive dopamine responses in all three regions at the offset of the CS−, which changed in shape but not magnitude across training (DLS AUC, F(3,27) = 1.067; p = 0.3795; core AUC, F(3,21) = 1.796; p = 0.1787; shell AUC, F(3,12) = 1.854; p = 0.1913).
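The session effects in this section are reported as F(3,27), F(3,21), and F(3,12) for the DLS, core, and shell. Those degrees of freedom are what a one-way repeated-measures ANOVA over four sessions yields with 10, 8, and 5 rats per region (the group sizes implied by the t(9), t(7), and t(4) tests elsewhere), since df = k − 1 and (k − 1)(n − 1). A minimal sketch of that computation, assuming complete data (no missing sessions) and no sphericity correction, neither of which is stated in this section:

```python
from typing import List, Tuple


def rm_anova_oneway(data: List[List[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    data[i][j] is the value (e.g., a cue AUC) for subject i in session j.
    Returns (F, df_session, df_error). Between-subject variance is removed
    from the error term, unlike a between-groups ANOVA.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    sess_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_session = n * sum((m - grand) ** 2 for m in sess_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_session - ss_subject  # subject x session residual
    df_session, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_session / df_session) / (ss_error / df_error)
    return f, df_session, df_error


# Toy data: 3 subjects x 3 sessions.
f, df1, df2 = rm_anova_oneway([[1, 2, 3], [2, 3, 5], [3, 4, 4]])
print(round(f, 6), df1, df2)  # 9.0 2 4
```

With 10 rats and 4 sessions the same function returns df = (3, 27), matching the DLS statistics above.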
Regionally similar dopamine signals emerge to the CS− during multivalent cue discrimination task. A, Experimental timeline. Photometry data from the highlighted sessions (1, 5, 10, 15) are shown in this figure for the CS−, presentations of which were intermingled with reward and shock-predictive cues. The CS− was never associated with either unconditioned stimulus. In the (B) DLS, (D) core, and (F) shell, dLight responses to the CS− increased across training. dLight responses were also seen following CS− offset in the (C) DLS, (E) core, and (G) shell, which generally grew in magnitude before stabilizing in the middle of training. Data represent mean ± SEM. **p < 0.01.
Valence reversal prompts rapid updating of pavlovian-conditioned behavior
In decision-making contexts, the meaning of environmental stimuli is often dynamic, and previously learned associations must be updated based on changing contingencies. We probed this in our valence discrimination task by reversing the predictive meaning of the reward and shock cues at two points in training (Fig. 6A). On the first day of Reversal 1 (Day 6), the tones for the CS + R and CS + S were swapped. On the first day of Reversal 2 (Day 11), the tones returned to the original contingencies of the initial learning phase. As in the initial learning phase, there were no sex differences in behavior during reversals.
Pavlovian-conditioned behavior rapidly updates upon cue reversal. A, Experimental timeline. Training began with habituation to the three auditory cues used in the task. Reward (CS + R), shock (CS + S), and neutral cue (CS−) trials were then intermingled, with the CS + R and CS + S switching identity during Reversal 1 (Days 6–10) before switching back to the original contingency during Reversal 2 (Days 11–15). The final session (Day 16) was conducted under extinction conditions. B, Percentage time in the port during cues diverged throughout training, with behavior updating when the tones predicting reward and shock were reversed (Sessions 6 and 11) and during extinction. Rats increased time in the port when the tone predicted reward, decreased time in the port when the tone predicted shock, and showed intermediate levels when the white noise predicted nothing. C, Within the first session of Reversal 1, rats’ behavior changed due to cue reversal, but the update was biased: rats decreased time in the port to Tone A (reward→shock) more than they increased time in the port to Tone B (shock→reward). D, Within the first session of Reversal 2, rats’ behavior rapidly changed back, with percentage time in the port increasing for Tone A (shock→reward) and decreasing for Tone B (reward→shock). The reward→shock bias seen in Reversal 1 disappeared. E, Percentage time in the port strongly decreased during extinction. F, Percentage time freezing during cues diverged throughout training, with behavior appropriately updating when the tones predicting reward and shock were reversed (Sessions 6 and 11) and during extinction. Rats increased time freezing when the tone predicted shock, decreased time freezing when the tone predicted reward, and froze least when the white noise predicted nothing. Some generalization between the two tones occurred, as freezing remained higher for reward-predictive cues than for the CS−. G, Within the first session of Reversal 1, freezing behavior did not reverse to reflect the new contingencies. H, Within the first session of Reversal 2, freezing behavior fully reversed: percentage time freezing decreased for Tone A (shock→reward) more than it increased for Tone B (reward→shock). I, In extinction, freezing behavior decreased modestly for both cues. Data represent mean ± SEM. ****p < 0.0001; ***p < 0.001.
Across training, rats spent more time in the reward port on CS + R trials (Figs. 1, 6B). In the first session of Reversal 1 (Day 6), percentage time in the port updated to reflect the new contingencies (Fig. 6C; session × tone interaction, F(1,22) = 18.80; p = 0.0003). The reward→shock transition decreased percentage time in the port (multiple-comparison test, Tone A Session 5 vs 6, p = 0.0001), whereas the shock→reward transition did not significantly change it (Tone B Session 5 vs 6, p = 0.1530), indicating that rats were faster at updating PE behavior when cue valence shifted from positive to negative. By the end of Reversal 1, time spent in the port had increased significantly for Tone B, reflecting the change in cue valence (multiple-comparison test, Tone B Session 6 vs 10, p < 0.0001), while time in the port to Tone A remained low throughout Reversal 1 (multiple-comparison test, Tone A Session 6 vs 10, p > 0.9999).
On the first day of Reversal 2, appetitive behavior rapidly shifted, and rats quickly returned to the pattern of PE behavior seen in the initial training phase with the original cue contingencies (Fig. 6D; Day 10 vs Day 11, session × tone interaction, F(1,22) = 33.37; p < 0.0001). The asymmetry in updating speed between the reward→shock and shock→reward transitions disappeared, as behavior updated to both Tone A (multiple-comparison test, p = 0.0019) and Tone B (p = 0.0001) within this first session. Following retraining after Reversal 2, we conducted a final session under extinction conditions (Day 16), in which the CS + R and CS + S were presented without outcomes. During extinction, time spent in the port dropped dramatically (Fig. 6E; session × tone interaction, F(1,22) = 51.44; p < 0.0001). Overall, time spent in the port was tightly coupled to the CS + cue contingencies across sessions, selectively increasing in response to the reward-paired cue and rapidly decreasing to the same cue once it was paired with shock.
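For a 2 × 2 design in which every rat contributes a value for each session and tone, the repeated-measures session × tone interaction F(1, n − 1) equals the square of a one-sample t statistic computed on each rat's difference of differences, (Tone A change) − (Tone B change). With 23 rats this gives the F(1,22) degrees of freedom reported above. A minimal sketch, assuming complete within-subject data; the helper names are hypothetical:

```python
import math
from typing import Sequence


def one_sample_t(values: Sequence[float]) -> float:
    """t statistic for H0: mean(values) == 0 (df = n - 1)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)


def interaction_f(a_pre, a_post, b_pre, b_post) -> float:
    """Session x tone interaction F(1, n-1) for a fully within-subject 2 x 2 design.

    Each argument holds one value per subject (e.g., % time in port per rat).
    """
    contrast = [(a1 - a0) - (b1 - b0)
                for a0, a1, b0, b1 in zip(a_pre, a_post, b_pre, b_post)]
    return one_sample_t(contrast) ** 2


# Toy example: three subjects whose Tone A change exceeds their Tone B change by 1, 2, 3.
print(round(interaction_f([0, 0, 0], [1, 2, 3], [0, 0, 0], [0, 0, 0]), 6))  # 12.0
```

Because squaring discards the sign, the direction of an interaction still has to be read from the mean of the contrast or from the follow-up multiple comparisons.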
Across training, conditioned freezing was generally more prevalent on CS + S trials (Figs. 1, 6F). However, freezing behavior was slower than PE behavior to update upon cue reversal. In the first session of Reversal 1, freezing to either CS + did not change, despite the shift in contingencies. Freezing discrimination of the reversed cues did eventually emerge by the end of Reversal 1 (Fig. 6F,G; Day 10), and freezing subsequently adapted quickly in the first session of Reversal 2, when cue contingencies returned to their original status (Fig. 6H; Day 10 vs Day 11, session × tone interaction, F(1,22) = 23.21; p < 0.0001). During extinction, freezing decreased to both cues (Fig. 6I; two-way ANOVA, session main effect, F(1,22) = 84.23; p < 0.0001) but remained higher for the CS + S than the CS + R (cue-type main effect, F(1,22) = 79.15; p < 0.0001). Together, our results indicate that pavlovian conditioned behavior is generally flexible when valence predictions change, but threat-related responding is slower to update than reward-related responding.
Subregion-specific updating of striatal dopamine signals after valence reversal
We assessed how striatal dopamine responses to the CS + R and CS + S changed when the meaning of the cues was reversed. On the last day of initial training (Session 5), the DLS dopamine responses to both the CS + R and CS + S were characterized by sharp troughs. The trough was followed by a small peak for the CS + R, whereas the CS + S response was followed by a slow return to the baseline (Fig. 7B), resulting in an overall larger decrease in dopamine in response to the CS + S than the CS + R in the DLS (AUC, paired t test, t(9) = 7.196; p < 0.0001). On the first day of Reversal 1 (Session 6; data shown reflect trial averages from the whole session), the DLS response profiles were similar to those on Day 5: the tone originally predicting shock continued to evoke a stronger decrease in dopamine, despite now predicting reward (Fig. 7C; AUC paired t test, t(9) = 2.971; p = 0.0157; Fig. 7D; two-way ANOVA, tone main effect, F(1,9) = 30.24; p = 0.0004). Thus, upon reversal, DLS responses to each tone remained essentially unchanged despite the change in their predictive meaning.
Subregional striatal dopamine signals update to valence reversal at different rates. A, Experimental timeline with the highlighted sessions outlined in this figure. B, The DLS dopamine signals in response to Tone A (CS + R) and Tone B (CS + S) differed by Session 5. The response to the CS + R included a trough followed by a shallow peak, while the response to the CS + S included a trough with a slow return to the baseline; the difference is quantified as AUC. C, Upon Reversal 1, the dopamine signal in response to Tone A (now CS + S) still included a trough followed by a shallow peak, while the response to Tone B (now CS + R) still included a trough with a slow return to the baseline. The AUC for Tone A remained higher than that for Tone B, even though the tones had reversed in meaning. D, The DLS response had not updated within the first session after reversal: the AUC in response to Tone A (CS + R→CS + S) remained higher than that to Tone B (CS + S→CS + R). E, During Session 5, the core dopamine signals in response to both Tones A and B included a peak followed by a trough, but the response to Tone A returned to the baseline faster. F, Upon Reversal 1, the core dopamine signals in response to Tones A and B again included a peak followed by a trough, but now the response to Tone B included a shallow trough with a faster return to the baseline. G, The core showed updated dopamine responses to Tones A and B within the first session of reversal, with the AUC increasing for Tone B (CS + S→CS + R) and decreasing for Tone A (CS + R→CS + S). H, During Session 5, the shell dopamine response to Tone A (CS + R) included a soft trough followed by a soft peak, while the response to Tone B (CS + S) included a soft trough with a slow return to the baseline. I, Upon reversal, the shell dopamine responses to Tones A and B both included a soft trough with a return to the baseline. J, The shell did not display a fully updated response to the cue reversal, but it began to change: the response to Tone A (CS + R→CS + S) decreased while the response to Tone B (CS + S→CS + R) increased. Data represent mean ± SEM. ****p < 0.0001; ***p < 0.001; **p < 0.01; *p < 0.05.
During Session 5, the core response to both the CS + R and CS + S included a sharp peak followed by a small trough and a return to the baseline, but the return to the baseline was slower for the CS + S (Fig. 7E). As in the DLS, the AUC of the core response was lower for the CS + S than for the CS + R (AUC paired t test, t(7) = 2.414; p = 0.0465). However, unlike in the DLS, after reversal the core AUC was significantly higher for the new CS + R than for the new CS + S (Fig. 7F; AUC paired t test, t(7) = 3.302; p = 0.0131). Comparing Sessions 5 and 6 for the core, cue-evoked signals updated during the first day of Reversal 1 (Fig. 7G; session × tone interaction, F(1,7) = 11.97; p = 0.0106), an effect driven by an increased AUC for Tone B (p = 0.0090) but no significant change for Tone A (p = 0.2292).
In the shell on Session 5, the response to the CS + R included a delayed trough followed by a slow positive response, while the CS + S induced a larger and more delayed trough (trough, paired t test, t(4) = 5.280; p = 0.0062; trough latency, t(4) = 3.405; p = 0.0272) followed by a slow return to the baseline (Fig. 7H). As in the DLS and core, the shell AUC was lower for the CS + S than for the CS + R (AUC, paired t test, t(4) = 7.096; p = 0.0021). Upon reversal, shell responses to the new CS + R and CS + S were similar (Fig. 7I; AUC paired t test, t(4) = 0.4480; p = 0.6774). However, comparing signals across Sessions 5 and 6 revealed that both signals shifted, becoming more similar (Fig. 7J; session × tone interaction, F(1,4) = 20.04; p = 0.0110). Multiple comparisons showed that the AUC for Tone A did not significantly change between Sessions 5 and 6 (p = 0.0752), whereas the AUC for Tone B increased (p = 0.0169). Thus, although shell dopamine had not fully reversed by Session 6, it was, unlike the DLS, in the process of updating.
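The trough and trough-latency comparisons above reduce each waveform to the value and timing of its minimum. A minimal sketch, assuming a baseline-corrected trace with time zero at cue onset; the 0 to 2 s search window is an illustrative assumption, since the window used for these measures is not given in this section:

```python
from typing import Sequence, Tuple


def trough_and_latency(times: Sequence[float], signal: Sequence[float],
                       start: float = 0.0, end: float = 2.0) -> Tuple[float, float]:
    """Return (trough value, trough time) within start <= t <= end.

    Ties are broken in favor of the earliest sample because tuples
    compare by value first, then by time.
    """
    window = [(s, t) for t, s in zip(times, signal) if start <= t <= end]
    if not window:
        raise ValueError("no samples in the search window")
    value, latency = min(window)
    return value, latency


print(trough_and_latency([0.0, 0.5, 1.0, 1.5], [0.0, -0.5, -2.0, -1.0]))  # (-2.0, 1.0)
```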
Finally, for all regions, we compared cue-evoked dopamine signals early (first five trials) versus late (last five trials) within the Reversal 1 session (Day 6). Waveforms for these trial epochs were strikingly similar, indicating that where signals did update, they did so rapidly, within the first few trials of the session.
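The early-versus-late comparison amounts to averaging trial waveforms over the first and last five trials of a session and comparing the two averages. A minimal sketch, assuming trials are stored in presentation order as equal-length, time-aligned sample lists; summarizing the comparison by the maximum absolute difference is a hypothetical choice for illustration, since the comparison reported here was qualitative:

```python
from typing import List, Sequence


def epoch_average(trials: List[Sequence[float]], n_trials: int = 5,
                  late: bool = False) -> List[float]:
    """Sample-by-sample mean over the first (or, if late=True, last) n_trials trials."""
    chosen = trials[-n_trials:] if late else trials[:n_trials]
    n_samples = len(chosen[0])
    return [sum(tr[i] for tr in chosen) / len(chosen) for i in range(n_samples)]


def max_abs_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest pointwise gap between two averaged waveforms."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy session: 10 trials of 2 samples each; the first sample drifts across trials.
session = [[float(k), 0.0] for k in range(10)]
early = epoch_average(session)             # [2.0, 0.0]
late = epoch_average(session, late=True)   # [7.0, 0.0]
print(max_abs_difference(early, late))     # 5.0
```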
Serial cue reversal effects on valence encoding across dopamine subregions
Rats were trained on the new cue contingencies for four more sessions. In the session before Reversal 2 (Session 10), DLS responses to both the CS + R and CS + S included a rapid decrease in dopamine with a quick return to the baseline (Fig. 8B). Despite this additional training, DLS cue responses never fully updated to the Reversal 1 contingencies (AUC, paired t test, t(9) = 1.945; p = 0.0836), although a trend toward cue discrimination had emerged by Session 10 (Fig. 8B). On Reversal 2 (Session 11, when cue contingencies returned to their original identity; data shown reflect trial averages from the whole session), the DLS response to the CS + R included a sharp trough followed by a sharp positive peak, while the response to the CS + S included only a sharp trough (Fig. 8C). Although there was only a trend toward cue discrimination in Session 11 (Fig. 8C; AUC, paired t test, t(9) = 1.907; p = 0.0889), comparing Sessions 10 and 11 revealed a change in the dopamine signal as a function of session and tone upon Reversal 2 (Fig. 8D; session × tone interaction, F(1,9) = 8.363; p = 0.0178), indicating that the DLS response was in the process of updating. By Session 10, the core dopamine signal to the CS + R was larger than that to the CS + S (Fig. 8E; AUC, paired t test, t(7) = 3.079; p = 0.0178). Upon Reversal 2 (Session 11), these cue signals became more similar (Fig. 8F; AUC, paired t test, t(7) = 1.056; p = 0.3259), but comparison of Sessions 10 and 11 suggested that the core signal was in the process of updating (Fig. 8G; session × tone interaction, F(1,7) = 8.860; p = 0.0206). During Session 10, the shell dopamine response to the CS + R was larger than that to the CS + S (Fig. 8H; AUC, paired t test, t(4) = 3.841; p = 0.0184). Upon Reversal 2 (Session 11), these signals qualitatively updated to reflect the restored contingencies. As with the DLS and core, the shell signal did not significantly discriminate cue type in Session 11 (Fig. 8I; AUC, paired t test, t(4) = 2.047; p = 0.1101), but comparison of Sessions 10 and 11 again suggested that it was in the process of updating (Fig. 8J; session × tone interaction, F(1,4) = 9.581; p = 0.0364).
Serial cue reversal effects on dopamine signal updating across striatal subregions. A, Experimental timeline highlighting the sessions outlined in this figure. B, In the DLS on Session 10, the dopamine response to Tone A (CS + S) included a trough with a return to the baseline, while the response to Tone B (CS + R) included a trough followed by a peak. C, After the second reversal, the cues returned to their original meanings, and the DLS dopamine response to Tone A (CS + R) included a trough followed by a peak, while the response to Tone B (CS + S) included only a trough. D, A comparison of responses to each cue on Days 10 and 11 shows that the DLS signals updated to reflect the new contingencies. E, The core dopamine response on Session 10 to Tone A (CS + S) included a peak with a dip below the baseline, while the response to Tone B (CS + R) included a peak with a return to the baseline. F, Upon Reversal 2 in the core, the dopamine signal in response to Tone A (CS + R) included a peak with a return to the baseline, while the response to Tone B (CS + S) included a peak with a dip below the baseline. G, A comparison of responses to each cue on Days 10 and 11 shows that the core signals were in the process of updating to the new contingencies. H, In the shell on Session 10, the dopamine signal in response to Tone A (CS + S) included a trough, while the response to Tone B (CS + R) included a large peak. I, Upon Reversal 2 in the shell, Tone A (CS + R) developed a large peak, while Tone B (CS + S) developed a negative response. J, A comparison of responses to each cue on Days 10 and 11 shows that the shell signals updated to reflect the new contingencies. Data represent mean ± SEM. *p < 0.05.
As with Reversal 1, we examined the cue-evoked dopamine signals early (first five trials) versus late (last five trials) within the Reversal 2 session (Day 11). Waveforms for these trial epochs were again similar, indicating that when signal reversal occurred, it was rapid.
Dopamine signals extinguish at different rates for reward versus threat cues across striatal subregions
Following the second reversal, rats were trained with the original cue associations for an additional four sessions. In general, this led to stronger cue discrimination in dopamine signals across all regions in the final session before extinction (Session 15). Dopamine signals in response to the CS + R and CS + S were clearly distinct in all regions (Fig. 9B,E,H; DLS AUC paired t test, t(9) = 3.429; p = 0.0075; core AUC paired t test, t(7) = 3.865; p = 0.0062; shell AUC paired t test, t(4) = 6.751; p = 0.0025).
Variable extinction of striatal dopamine responses to reward versus threat cues across striatal subregions. A, Experimental timeline showing the sessions highlighted in this figure. B, By the final day of training in the DLS, the dopamine signal in response to the CS + R included a trough followed by a peak, while the response to the CS + S included a trough with a slow return to the baseline. C, After extinction in the DLS, the dopamine signal in response to Tone A (previously CS + R) only included a peak. The response to Tone B (previously CS + S) became less distinct, with the trough disappearing. D, Comparing the last day of conditioning (15) with extinction (16), the DLS signals to either cue did not significantly change. E, In the core during the last training session, the dopamine signal in response to Tone A (CS + R) included a large peak while the response to Tone B (CS + S) included a small peak. F, After extinction in the core, the dopamine peak to the previously rewarded cue was much smaller. G, Comparing Days 15 and 16, only the reward cue response diminished with extinction. H, On the final session of training in the shell, the dopamine signal in response to Tone A (CS + R) included a large peak, while the response to Tone B (CS + S) included a small dip below the baseline. I, After extinction in the shell, the dopamine response to the previously rewarded cue was smaller. J, Comparing Days 15 and 16, shell dopamine signals only modestly extinguished. Data represent mean ± SEM. **p < 0.01; *p < 0.05.
During extinction (Session 16), DLS dopamine signals in response to the CS + R remained stable (Fig. 9D; AUC, Session 15 vs 16, multiple-comparison test, p = 0.7553), while the trough of the response to the CS + S became smaller (Fig. 9D; trough, paired t test, Session 15 vs 16, t(9) = 2.351; p = 0.0432). DLS dopamine continued to discriminate cue types during extinction (Fig. 9C; paired t test, t(9) = 2.565; p = 0.0304) and when comparing across Sessions 15 and 16 (Fig. 9D; no session × tone interaction, significant main effect of tone, F(1,9) = 10.28; p = 0.0107).
In the core during extinction, the CS + R signal shrank, but the CS + S signal remained stable (Fig. 9F). Core responses to the CS + R changed the most in extinction, and a comparison of Sessions 15 and 16 revealed that responses varied as a function of both session and tone (Fig. 9F,G; session × tone interaction, F(1,7) = 9.991; p = 0.0056). Despite these changes, the cue signals remained distinct during extinction (AUC, paired t test, t(7) = 2.706; p = 0.0304).
Finally, during extinction, the shell response to the CS + R decreased, although only at trend level (AUC, multiple-comparison test, p = 0.0615), while the CS + S signal remained unchanged (AUC, multiple-comparison test, p = 0.4966). This resulted in only marginal discrimination between cue types in the dopamine signal (paired t test, t(4) = 2.452; p = 0.0703). Comparing Sessions 15 and 16 in the shell, the results resembled those in the DLS: the dopamine signal in response to the CS + R remained higher than that to the CS + S (Fig. 9J; tone main effect, F(1,4) = 32.35; p = 0.0047).
Region-unique patterns of valence encoding among striatal dopamine signals
As a final analysis, we summarized our regional comparisons of dopamine signals and examined encoding of reward and shock cues more directly across striatal sites, to better understand how each region represents cue valence. We calculated a signal bias score based on the ratio of the peak (most positive) and trough (most negative) signals measured in the 2 s after onset of either cue type. A positive score thus reflects a dopamine response in which the absolute value of the peak was larger than that of the trough, and a negative score reflects a larger trough than peak. Using this approach, we identified distinct patterns of bias (Fig. 10A–C). Initial responses to both cues were generally biased negative, particularly in the DLS and shell. In the DLS, with training, the reward cue response increased to a neutral position, while the shock cue response remained negative (Fig. 10A; session × cue-type interaction, F(3,27) = 5.323; p = 0.0052; effect of cue type, F(1,9) = 8.216; p = 0.019). In the core, the reward cue bias became strongly positive, and the shock cue bias reached a neutral position (Fig. 10B; session × cue-type interaction, F(3,21) = 2.288; p = 0.108; effect of cue type, F(1,7) = 17.27; p = 0.0043). In the shell, the reward cue response became strongly positive, while the shock cue response remained negative across training (Fig. 10C; session × cue-type interaction, F(3,12) = 5.125; p = 0.0164; effect of cue type, F(1,4) = 89.03; p = 0.0007).
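The exact bias formula is not given in this section. One way to turn the peak/trough ratio into a signed score with the stated properties (positive when |peak| exceeds |trough|, negative otherwise) is the normalized difference (|peak| − |trough|)/(|peak| + |trough|). It equals (r − 1)/(r + 1) for r = |peak|/|trough|, so it is a monotonic function of that ratio, bounded in [−1, 1]. A minimal sketch under that assumption; clamping the peak and trough at zero, for traces that never cross the baseline, is also an assumption:

```python
from typing import Sequence


def bias_score(signal: Sequence[float]) -> float:
    """Hypothetical signed peak/trough bias for a baseline-corrected cue response.

    signal: samples from the 2 s after cue onset.
    Returns (|peak| - |trough|) / (|peak| + |trough|), in [-1, 1].
    """
    peak = max(max(signal), 0.0)     # most positive value, floored at baseline
    trough = min(min(signal), 0.0)   # most negative value, capped at baseline
    p, t = abs(peak), abs(trough)
    if p + t == 0.0:
        return 0.0                   # flat trace: treat as neutral
    return (p - t) / (p + t)


print(bias_score([0.0, 2.0, -1.0]))   # peak-dominated, about 0.333
print(bias_score([-3.0, 1.0]))        # trough-dominated: -0.5
print(bias_score([1.0, -1.0]))        # balanced peak and trough: 0.0
```

A bounded score like this keeps a response with a tiny trough from producing an extreme value, which an unbounded ratio would do.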
Dopamine signals represent valence according to unique striatal subregional scales. Dopamine response bias was computed by comparing the magnitude of the peak (most positive) and trough (most negative) parts of the dLight signal following cue onset. A–C, Signal bias for the reward and shock-predictive CSs changed with different patterns across striatal subregions as training progressed. Dopamine in all regions encoded cue type, with different relative signals. D, At the beginning of training, the DLS and shell signals had a negative bias to the reward cue, while the core signal was neutral. E, In response to the shock cue, all three regions had an initial negative bias. F, At the end of training, across regions, core and shell dopamine signals had developed a positive bias for reward cues, while the DLS reward cue response became neutral. G, In contrast, after learning, the DLS and shell dopamine responses to threat/shock cues were biased negative, while the core signal was neutral. H, I, Early in training, there was no relationship between the position of the recording site in the striatum and the reward cue or threat cue dopamine signal bias. J, K, By the end of training, a significant correlation between placement and reward cue bias emerged, which was not seen for the shock cue. Data in A–G represent mean ± SEM. ***p < 0.001; **p < 0.01; *p < 0.05.
Across training, all three regions came to encode the reward cue with a dopamine signal that was relatively more positive than that evoked by the threat cue. However, comparing regions directly within cue types revealed clear relative differences in the representation of reward and threat cues. Initially, for the reward cue (Fig. 10D; one-way ANOVA region comparison, F(2,20) = 3.468; p = 0.0509), the DLS and shell tended toward negatively biased signals (greater trough than peak), while the core response was relatively neutral (roughly equal peak and trough), although this region difference was only at trend level. Initially, for the threat cue (Fig. 10E; one-way ANOVA region comparison, F(2,20) = 1.88; p = 0.179), all three regions showed a negative bias. By the end of training, however, these relative patterns had changed. For the reward cue (Fig. 10F; one-way ANOVA region comparison, F(2,20) = 9.88; p = 0.001), the DLS had developed a neutral dopamine signal (balanced peak and trough components), while the core and shell had become positively biased (greater peak than trough). In contrast, for the threat cue (Fig. 10G; one-way ANOVA region comparison, F(2,20) = 3.92; p = 0.037), the DLS and shell maintained a negative bias (greater trough than peak), while the core signal became neutral.
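The region comparisons above are reported as F(2,20), consistent with a one-way between-groups ANOVA over three regions with 10, 8, and 5 rats (23 in total). A minimal sketch of that F statistic, assuming one bias score per rat and no correction for unequal variances:

```python
from typing import List, Tuple


def oneway_anova(groups: List[List[float]]) -> Tuple[float, int, int]:
    """Between-groups one-way ANOVA. Returns (F, df_between, df_within)."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(oneway_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

With group sizes of 10, 8, and 5, the returned degrees of freedom are (2, 20).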
Finally, we assessed the relationship between recording site location in the striatum and the degree of reward and threat cue bias in the dopamine signals on an individual-subject basis. We plotted cue signal bias as a function of the medial–lateral position of each rat's striatal recording site. Consistent with the subgroup-averaged data, initially there was no correlation between recording position and bias in the dopamine response to either cue type (Fig. 10H,I; Pearson's correlation, reward cue p = 0.423; R² = 0.0305; shock cue p = 0.868; R² = 0.0283). By the end of conditioning, bias scores across subjects had shifted systematically and a strong correlation emerged, in which reward cue bias in the dopamine signal moved from positive to negative along the medial-to-lateral axis (Fig. 10J; Pearson's correlation, reward cue p = 0.0006; R² = 0.44). A similar relationship did not emerge for dopamine response bias to the shock cue (Fig. 10K; Pearson's correlation, shock cue p = 0.20; R² = 0.077), possibly reflecting the complexity of shock responses within the NAc subregions. Collectively, these data indicate that while striatal dopamine broadly signals a positive-leaning bias for appetitive stimuli, valence is represented on subregion-specific scales.
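The placement analysis is a Pearson correlation between each rat's medial–lateral coordinate and its bias score. R² is simply r squared, so the direction of the relationship (reward bias falling from medial to lateral, i.e., a negative r if the coordinate increases laterally, which is an assumption here) must be read from r itself. The test of r against zero uses t = r√(n − 2)/√(1 − r²) with n − 2 df. A minimal sketch; taking n = 23 (all rats in the region comparison) is an assumption:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Reported end-of-training reward-cue fit: R^2 = 0.44; n = 23 is assumed.
r = -math.sqrt(0.44)              # negative sign assumes lateral = larger coordinate
print(round(t_from_r(r, 23), 2))  # -4.06
```

With 21 df, |t| ≈ 4.06 corresponds to a two-tailed p below 0.001, in line with the reported p = 0.0006.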
Discussion
We investigated behavioral flexibility and dopaminergic encoding in a valence discrimination task. From the current data, we highlight five primary conclusions. First, rats readily learned to distinguish between cue types, but reversal sessions suggested a bias in the speed of updating for reward versus threat associations. Second, by recording dopamine signals across three major striatal subregions known for their distinct behavioral specializations (DLS, NAc core, and NAc medial shell), we show substantial heterogeneity in responses to reward- and threat-associated cues, bolstering the notion that striatal niches signal different aspects of learning and stimulus meaning. While dopamine in all regions discriminated cue types, the sign, timing, and qualitative pattern of that signal differed in each area. Third, our pavlovian learning context, in which cues predicting positive and negative outcomes are intermingled, promoted unique, multiphasic dopamine signaling patterns compared with previous studies involving single-valence learning contexts. Signatures of ambiguity in perceived cue identity were apparent across all striatal regions, especially early in training, when most dopamine responses following cue onset included both a negative and a positive deflection. Fourth, we show that some conditioned dopamine signals are highly sensitive to changes in expected valence: most responses began to update within a single session of cue reversal, an adaptation that was even more robust upon a second reversal. Fifth, we show that biases in the representation of positive and negative conditioned cues in dopaminergic signaling reflect unique subregional scales across the striatum, with local mixtures of value, salience, and cue identity properties. Together, our results offer insight into a number of ongoing questions regarding the role of dopamine in signaling different facets of learning, as well as the broad heterogeneity and dynamic nature of the striatal dopamine system.
Isolating valence encoding in the striatal dopamine system
Notably, our data show that dopamine in each region encoded valence, as the signal to the reward cue was uniformly more positive than the signal to the shock cue. This is broadly consistent with classic notions of the role of dopamine in reward and reinforcement (Schultz et al., 1997; Wise, 2004), as well as more recent investigations into dopamine subregions (van Elzelingen et al., 2022a). Other work has broadened that framework, however, suggesting that within the NAc core, dopamine signals salience rather than valence (Ventura et al., 2007; Bromberg-Martin et al., 2010), serving to integrate strength, stimulus intensity, and novelty (Kutlu et al., 2021). Our results support this framework: we found strong positive dopamine responses in the core within the first ∼1 s after cue and stimulus onset in both reward and shock conditions, throughout training. This suggests that increased core dopamine denotes the presence of an important stimulus (Redgrave et al., 1999; Horvitz, 2000; Ungless, 2004) rather than only providing information about its valence. Signals in the NAc core nonetheless readily discriminated between reward and shock cues, as was evident in the later part of the response (1–2 s after cue onset). For shock cues, this second phase of the core signal reflected a decrease in dopamine that was maintained across training. In contrast, decreases in dopamine in response to the reward cue disappeared as training progressed. Thus, the core dopamine signal likely reflects some integration of salience and valence, potentially transmitted on different timescales early in the cue period, as ambiguity over stimulus identity is resolved.
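The salience-then-valence reading above depends on splitting each core cue response into an early (~0 to 1 s) and a later (1 to 2 s) component. A minimal sketch of that split, assuming a baseline-corrected trace with time zero at cue onset; summarizing each phase by its mean is an illustrative choice, not necessarily the measure used in the study:

```python
from typing import Sequence, Tuple


def phase_means(times: Sequence[float], signal: Sequence[float],
                split: float = 1.0, end: float = 2.0) -> Tuple[float, float]:
    """Mean signal in the early [0, split) and late [split, end) windows after cue onset."""
    early = [s for t, s in zip(times, signal) if 0.0 <= t < split]
    late = [s for t, s in zip(times, signal) if split <= t < end]
    if not early or not late:
        raise ValueError("each phase needs at least one sample")
    return sum(early) / len(early), sum(late) / len(late)


# Toy shock-cue-like trace: early increase, later decrease below baseline.
print(phase_means([0.0, 0.5, 1.0, 1.5], [2.0, 2.0, -1.0, -1.0]))  # (2.0, -1.0)
```

Under this split, a response that is positive early for both cues but diverges in sign late would be read as salience followed by valence.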
We saw evidence of distinct biphasic dopamine signals in the shell and DLS as well, although the direction of dopamine changes in those regions in response to the CS + R and CS + S more closely resembled the cues' valences by the end of training: a net increase in dopamine in response to the CS + R and a net decrease in dopamine in response to the CS + S. This pattern suggests that the shell and DLS prioritize cue value over salience. Collectively, our data indicate that functional heterogeneity in dopamine signaling exists not just across different anatomical regions but also dynamically within a given region across brief temporal windows.
The notion, supported by our data, that striatal dopamine broadly represents stimulus valence is generally consistent with previous studies across a variety of tasks and recording methods, but we find some differences. For example, previous studies have shown that dopamine neurons projecting to the DLS are excited by aversive stimuli, while dopamine neurons projecting to the ventral striatum are inhibited by aversive stimuli (Matsumoto and Hikosaka, 2009; Lerner et al., 2015). Other studies have found that dopamine terminal activity and dopamine release in the medial shell increase in response to aversive stimuli while decreasing in other NAc subregions (Badrinarayan et al., 2012; Oleson et al., 2012; De Jong et al., 2019; Goedhoop et al., 2022). Still others have reported uniformly negative dopamine signals across the striatum in response to aversive stimuli (van Elzelingen et al., 2022a). We found a different set of response patterns here: shock-evoked dopamine in the DLS showed a biphasic, negative-then-positive response, the core showed a positive response, and the shell showed no clear response. We also found, in contrast to some previous studies (Brown et al., 2011), that the DLS, core, and shell all developed positive responses to a reward-predictive stimulus, although this signal took longer to emerge in the DLS. A number of important methodological differences exist across these studies, and more work is required to understand how different striatal regions signal valence, but our results suggest that multivalent, conflict-based, or valence-ambiguous learning contexts (Kim et al., 2010; Burgos-Robles et al., 2017; Kutlu et al., 2020) likely produce dopamine response patterns that differ from those seen in other learning environments.
Dynamic dopamine signals track evidence accumulation
Our studies underscore the importance of considering complex decision-making contexts involving multiple learned associations, in this case where motivation must be derived from stimulus ambiguity that varies from trial to trial (Bromberg-Martin et al., 2010; Kim et al., 2010, 2012; Burgos-Robles et al., 2017). A multivalent, dynamic learning landscape presents more opportunities for error signals, as more sensory elements in the environment carry new information, and dopamine under these conditions may represent an integration of multiple factors, including errors in predictions of stimulus identity (Sharpe et al., 2017; Takahashi et al., 2017; Keiflin et al., 2019). A relatively unique feature of our data is that across recording sites, cue onset usually resulted in a multiphasic response, in which the dopamine signal included both an increase above and a dip below baseline within the first few seconds. This pattern suggests that stimulus identity was ambiguous, so that each CS cue simultaneously evoked elements of reward- and threat-related representations. As conditioning progressed, cue signals generally became less multiphasic, suggesting that their predictive identity was discerned more rapidly with learning. This interpretation is also supported by dopamine responses to the less ambiguous CS−, which were never multiphasic. Notably, while all recording sites showed some degree of biphasic dopamine responses to each CS+, the order of negative versus positive deflections differed systematically, suggesting an inherent bias in the representation of valence across striatal dopamine niches.
The mechanisms underlying these region-specific encoding rules likely encompass differences in the local striatal machinery as well as distinct regulation within the thalamo-cortico-striatal network, in which the DLS is more interconnected with sensorimotor regions and the ventromedial striatum with frontolimbic regions (Haber and Knutson, 2010; Frank and Badre, 2012; Ito and Doya, 2015; Lee et al., 2024; Mohebi et al., 2024).
Notably, the resolution of stimulus uncertainty is itself rewarding, and the basal ganglia are centrally implicated in this process (Gottlieb et al., 2014; Bromberg-Martin and Monosov, 2020; Vellani et al., 2020). Cue offset was especially meaningful in our task, as it resolved the remaining ambiguity regarding cue identity. We found cue offset signals in a number of conditions, including in response to the termination of the CS−. Cue offset dopamine signals have been reported in other contexts, such as at the end of a rewarding cue (Kalmbach et al., 2022) and at shock omission (Kutlu et al., 2023). Cue offset dopamine may also reflect a "relief" or safety signal in our task (Oleson et al., 2012; Luo et al., 2018; Stelly et al., 2019), confirming either that shock will not occur or that the shock has ended. In task structures involving uncertainty, the behavioral impact of dopamine may reflect the accumulation of evidence that helps produce an appropriate behavioral response (Lak et al., 2017, 2020; Beste et al., 2018; Mikhael et al., 2022; Fraser et al., 2025). Overall, this complexity underscores the difficulty of assigning specific meaning to any given dopamine signal, because it likely represents a multifaceted state simultaneously incorporating value, salience, novelty, information, and other factors that vary on very brief timescales. An interesting prediction from our results is that rapid stimulus-evoked dopamine fluctuations contribute to different computations, and thus to different features of behavioral output, as evidence accumulates within a trial (Lak et al., 2017, 2020; Bromberg-Martin and Monosov, 2020; Coddington et al., 2023; Blanco-Pozo et al., 2024; Mohebi et al., 2024; Fraser et al., 2025).
Recent work suggests that NAc core dopamine plays a key role in resolving cue-based uncertainty regarding reward prediction (Mikhael et al., 2022; Fraser et al., 2025). Our results further suggest that rapid fluctuations in dopamine across striatal regions track stimulus identity and valence on a trial-by-trial basis, which could aid in the real-time disambiguation of the appropriate motivational state. While dopamine in all regions discriminated reward and shock cues, these signals updated faster upon cue reversal at the NAc recording sites than in the DLS, suggesting that DLS dopamine may track a longer-running average of conditioned cue value that is less tied to moment-to-moment stimulus changes (Fiorillo et al., 2003; Hart et al., 2015; Kim et al., 2015). Consistent with this, DLS dopamine signals changed the least during extinction in our study. The relative slowness of DLS updating is perhaps consistent with classic notions of the role of this region in stimulus–response, habit-like, or inflexible behavior (Everitt and Robbins, 2005; Yin and Knowlton, 2006; Yin et al., 2009; Hart et al., 2015; Klanker et al., 2017; Malvaez and Wassum, 2018; Nestler and Lüscher, 2019; Lerner, 2020). Interestingly, however, behavioral responses extinguished robustly despite persistent cue-evoked DLS dopamine signals, suggesting that these signals may partly reflect local constraints on signaling dynamics rather than a neural readout of habit.
It will be critical to follow our studies with direct functional manipulations of dopamine signaling to assess their impact on flexible valence discrimination. Based on previous work, inhibiting the NAc or its dopamine signals is thought to generally disrupt reward-related reversal learning and behavioral flexibility, while inhibiting the DLS is thought to facilitate flexible responding (Faure et al., 2005; Floresco et al., 2006; Yin and Knowlton, 2006; Adamantidis et al., 2011; Bergstrom et al., 2018; Malvaez et al., 2018; Radke et al., 2019; van der Merwe et al., 2023). Notably, other recent studies suggest that the link between DLS activity and behavioral inflexibility is unclear (Lerner et al., 2015; Vandaele et al., 2019; Seiler et al., 2022; van Elzelingen et al., 2022b). Differences in the learning contexts of past studies are likely critical in shaping different functional patterns of dopamine signals. We note that most reversal-learning studies have focused on instrumental behavior in monovalent, reward-based contexts, and there may be important differences in striatal engagement during pavlovian valence discrimination, such as in the task used here. Our results suggest that ambiguity introduced by multivalent learning contexts produces complex, multiphasic dopamine responses across regions, and the direct functional role of these more transient components of cue-evoked dopamine signals remains unclear. Given the complex nature of the dopamine transients we observed following cue presentations, future studies will need highly precise interventions to functionally isolate the behavioral contributions of phasic peaks from those of phasic dips. Furthermore, while our results add new insight into dopamine heterogeneity in a major set of striatal regions, it will also be important to investigate other dopaminergic niches. Dopamine signaling in the dorsomedial striatum, the lateral shell, and the posterior "tail" of the striatum, for example, likely contributes unique functions in this context (Cox and Witten, 2019; Lerner, 2020; van Elzelingen et al., 2022a; Green et al., 2024).
Here, by measuring striatal dopamine signals and behavior during a multivalent pavlovian discrimination task, we uncover region-specific patterns of dopamine signaling. We show that the DLS, NAc core, and NAc shell encode cues of positive and negative valence with differing response profiles, and that valence ambiguity dynamically reshapes these dopamine signals as contingencies change. This work provides new insight into functional heterogeneity in the dopamine system and highlights unique, subregion-specific rules for the encoding of value-based decision-making.
Footnotes
This work was supported by NIH Grants T32 MH115886 (T.J.O.) and R00 DA042895, R01 MH129370, R01 MH129320, and R01 DA057292 (B.T.S.). We thank all members of the Saunders Lab for their support and helpful discussion surrounding this project.
The authors declare no competing financial interests.
K.N.B.’s current address: Stanford University School of Medicine, Department of Anesthesiology, Perioperative and Pain Medicine.
Correspondence should be addressed to Benjamin T. Saunders at bts{at}umn.edu.