Abstract
Action control is hypothesized to be mediated by corticothalamo-basal ganglia loops subserving the acquisition and updating of action contingencies. Within this, the mediodorsal thalamus (MD) is thought to contribute to volitional control over behavior largely through its interactions with prefrontal cortex. However, MD also projects into striatum, the main input nucleus of the basal ganglia, and the contribution of such projections to behavioral control is not known. Using a mouse model of volitional action control in either sex, here we find that MD terminal calcium activity in dorsal medial striatum (MD→DMS) represents action information during initial acquisition of a novel action contingency. This representation of action information decreases with continued experience. Data demonstrate MD→DMS activity is necessary to learn and employ a contingency control over actions. Functional attenuation of MD→DMS activity negated normal exploration, instead biasing repetitive action control, and resulted in mice unable to adapt their initial action strategy upon changes in action contingency. This suggests MD supports plasticity underlying initial action strategy learning used to adjust control given changing contingencies. Overall, these data show that MD projections into striatum contribute to volitional action control that supports acquisition of adaptive behavior.
Significance Statement
Mediodorsal (MD) thalamus is hypothesized to support volitional action control. However, focus has largely been on MD input into prefrontal cortical regions and the contribution of MD input to striatum has not been explored. Here we show that MD input into dorsal medial striatum supports acquisition of goal-directed strategies and their control over actions.
Introduction
Volitional control over our actions is critical for adaptive behavior, and such control is mediated by corticothalamo-basal ganglia (CTBG) circuitry. Within this circuitry, the mediodorsal thalamus (MD) is thought to play a central role as a higher order thalamic nucleus that is reciprocally innervated by prefrontal cortical regions and innervated by basal ganglia, midbrain, and brainstem (Groenewegen, 1988). Indeed, structural and functional differences in MD have been observed in disease states where volitional control is disrupted, including schizophrenia, drug use disorders, and major depressive disorder (Pakkenberg, 1990; Zipursky et al., 1992; Young et al., 2004; Segobin et al., 2019; Perez-Rando et al., 2022). Research into mechanisms underlying deficits in behavioral control has largely focused on MD’s reciprocal connectivity with prefrontal cortex (PFC). However, MD also sends nonreciprocal projections directly into basal ganglia, and the contribution of such projections to behavioral control is not known.
MD is hypothesized to be important for acquiring and updating volitional control over actions, as well as more broadly in maintaining and updating mental representations (Wolff and Vann, 2019). Inhibiting or lesioning mediodorsal thalamus disrupts behavioral sensitivity to common tests of volitional control including outcome devaluation and contingency degradation (Corbit et al., 2003; Mitchell et al., 2007; Pickens, 2008; Parnaudeau et al., 2015; Wicker et al., 2018; Fisher et al., 2020). MD contribution to behavioral control has been suggested to depend upon the downstream cortical target as MD projects to PFC in a topographic manner (Krettek and Price, 1977; Alcaraz et al., 2016; Wolff and Halassa, 2024). While investigations of MD projections into different cortical regions have shown functional differences in their contributions to behavioral control (Schmitt et al., 2017; Rikhye et al., 2018; Wolff and Halassa, 2024), single cell reconstruction of individual MD neurons has found extensive axon collateralization across cortex including the following: cingulate, agranular insular, frontal association area, infralimbic area, lateral orbital area, secondary motor area, medial orbital area, piriform cortex, prelimbic area, retrosplenial dysgranular area, secondary visual area, ventrolateral orbital area, and ventral orbital area (Kuramoto et al., 2017). Intriguingly, MD neurons projecting to PFC also collateralize with terminals in striatal regions (Royce, 1983; Joffe and Grueter, 2016; Birdsong et al., 2019; Winnubst et al., 2019). Thus MD→PFC projections likely also influence basal ganglia function. Indeed, the thalamus is a main driver of striatum, which is also heavily implicated in behavioral control. MD sends glutamatergic projections into the dorsal medial striatum (DMS; Berendse and Groenewegen, 1990; Hunnicutt et al., 2016; Birdsong et al., 2019), which has a critical role in volitional control reminiscent of MD. 
For example, DMS is also necessary for learning new action–outcome contingencies as well as executing behavior while using outcome and contingency information (Yin et al., 2005). While MD and its cortical connections have recently been shown to contribute to cue-guided control over actions (Leung et al., 2024), the contribution of MD input into DMS to volitional control is unknown.
Here we examine the contribution of MD's projections into DMS during a volitional action control task in which we can examine the use of action information to adjust ongoing behavior (Cazares et al., 2022; Schreiner et al., 2022, 2023). We find that MD→DMS projections carry information about recent and current actions and disruption of this circuit impairs the use of recent experience to guide ongoing behavior. Our findings support a hypothesis that MD input to striatum is important for volitional action control.
Materials and Methods
Animals
Male and female C57BL/6J, Slc17a6tm2(cre)Lowl/J (VGlut2-Cre) (Vong et al., 2011), and B6.Cg-Tg(Drd1a-tdTomato)6Calak/J (D1-tdTomato) (Ade et al., 2011; Jax Stock #000664, #016963 and #016204) mice were housed 1–5 per cage under a 14/10 h light/dark cycle with access to food (LabDiet 5015) and water ad libitum unless stated otherwise. Mice were at least 6 weeks of age prior to intracranial injections and implants, and at least 7 weeks of age prior to behavioral training. All surgeries and behavioral experiments took place during the light portion of the cycle. Exploratory analyses for sex and genotype differences in the behavioral cohort revealed similar levels of behavioral performance, and thus data was collapsed across sex and across genotypes. The Animal Care and Use Committee of the University of California San Diego approved all experiments and experiments were conducted according to NIH guidelines “Principles of Laboratory Care.” Investigators were not blind to experimental groups.
Surgical procedures
Mice underwent isoflurane induction of anesthesia (4−5%) and then were placed onto a stereotaxic apparatus (Kopf) where anesthesia was maintained at 1–2% throughout the surgery. Virus was injected at a rate of 100 nl/min using a 0.5 µl Hamilton syringe. The syringe was left unperturbed for 3 min to allow for diffusion. Coordinates for MD thalamus were as follows from bregma in mm: AP −1.35, ML ±1, and DV −3.65, and the syringe was set to a 12° angle to avoid hitting ventricles. Coordinates for DMS from bregma in mm: AP +0.5, ML ±1.5, DV −3.25.
Tracing
To localize collateral synaptic terminals from neurons projecting from MD→DMS, C57BL/6J mice were injected with 150 nl of AAV1 hSyn FLEx mGFP-2A-Synaptophysin-mRuby in MD (Table 1; Addgene viral prep # 71760-AAV1) and 200 nl of AAV5-hSyn-GFP-Cre in DMS (UNC Vector Core). To ensure adequate viral expression, we waited at least 4 weeks before brain extractions. Mice were killed with CO2 and brains were extracted and fixed in 4% paraformaldehyde. Brains were sliced 50 µm thick using a vibratome (Pelco easiSlicer) and mounted on slides with Vectashield mounting medium. Slides were imaged for mRuby and GFP with a fluorescence microscope (Olympus BX43).
Table 1. Reagents and resources used in the study
Lesions
For general disruption of DMS, we used 200 nl 0.12 M NMDA in sterile saline with a coinjection of 100 nl hSyn-eYFP. The presence or absence of lesions was determined by cell counts at AP +0.5 mm from bregma. Brains were sliced 50 µm thick using a vibratome (Pelco easiSlicer), and slices were incubated in NeuN antibody (Alexa Fluor 488 conjugate, ABN78A4) at a concentration of 1:500 to localize neurons. Slices were then mounted onto slides using Vectashield and imaged at 2× and 10× on a fluorescence microscope (Olympus BX43). Neuronal cell counts were made over entire 10× magnification images centered over DMS using ImageJ.
Slice electrophysiology
Surgical procedure
To activate thalamic glutamatergic afferents, D1-tdTomato mice were injected with 150 nl AAV-hSyn-hChR2(H134R)-EYFP (Addgene viral prep # 26973-AAV5) in MD. For adequate viral expression, slice electrophysiology experiments occurred at least 3 weeks after intracranial injection. Live 250 µm tissue slices containing MD were taken on the day of each experiment to confirm viral targeting using a macro fluorescence microscope (Olympus MVX10).
Brain slice preparation
Mice were at least 8 weeks of age at the time of slice preparation. Mice were rapidly anesthetized with isoflurane inhalation, and brains were extracted and placed into 4°C oxygenated ACSF containing the following (in mM): 210 sucrose, 26.2 NaHCO3, 1 NaH2PO4, 2.5 KCl, 11 dextrose, bubbled with 95% O2/5% CO2. Coronal slices (250 µm thick) containing the DS were taken using a Pelco easiSlicer (Ted Pella). Slices were then transferred to an ACSF with the following concentrations for at least 1 h prior to recording (in mM): 120 NaCl, 25 NaHCO3, 1.23 NaH2PO4, 3.3 KCl, 2.4 MgCl2, 1.8 CaCl2, 10 dextrose.
Patch-clamp electrophysiology
Whole-cell patch-clamp recordings were made in D1-tdTomato mice, D1-Cre mice expressing a Cre-dependent tdTomato, and C57BL/6J mice, providing a recording population of unidentified putative SPNs, SPNs expressing tdTomato under the dopamine D1 receptor promoter, and putative dopamine D2 receptor-expressing neurons (nonlabeled neurons in D1-tdTomato mice). Visualization of fluorophore expression was done using an Olympus BX51WI microscope mounted on a vibration isolation table and a high-power LED (Thorlabs, LED4D067). Recordings were made in ACSF containing (in mM): 120 NaCl, 25 NaHCO3, 1.23 NaH2PO4, 3.3 KCl, 0.9 MgCl2, 2.0 CaCl2, and 10 dextrose, bubbled with 95% O2/5% CO2. ACSF was continuously perfused at a rate of 2.0 ml/min and maintained at a temperature of 32°C. Picrotoxin (50 µM) was included in the recording ACSF to block GABA receptor-mediated synaptic currents. Recording electrodes were made with borosilicate glass capillaries (Sutter Instrument) using a PC-10 puller (Narishige International) to yield resistances between 3 and 8 MΩ. Pipettes were filled with ice-cold internal solution with the following concentrations (in mM): 135 KMeSO4, 12 NaCl, 10 HEPES, 0.5 EGTA, 2 Mg-ATP, 0.3 Tris-GTP. Access resistance was monitored throughout the experiments, and cells in which access resistance varied >20% were not included in the analysis.
Electrophysiology data acquisition
Recordings were made using a MultiClamp 700B amplifier (Molecular Devices), filtered at 2 kHz, digitized at 10 kHz with an Instrutech ITC-18 (HEKA Instruments). Acquisition and stimulation were performed using WinWCP (University of Strathclyde). Glutamatergic afferents from MD thalamus were stimulated optically using 470 nm blue light (4 ms) delivered via field illumination using a high-power LED (LED4D067, Thorlabs). Light intensity was adjusted to produce optically evoked excitatory postsynaptic currents (oEPSCs) with a magnitude of 50–400 pA. For paired pulse ratio, two EPSCs were separated by an ISI of 50, 100, 150, and 200 ms for three trials, collected at 0.1 Hz. Data from each neuron within a group were combined and presented as mean and SEM.
Electrophysiology data analysis
Data was analyzed with WinWCP (Strathclyde Electrophysiology Software) finding the peak at the first and second oEPSC, and Microsoft Excel was used to calculate the paired pulse ratio. For patch-clamp experiments, PPR data were analyzed using a two-way ANOVA with Bonferroni-corrected post hoc analyses. Data was analyzed using GraphPad Prism 10.2.3 (GraphPad Software). Statistical significance was defined as an alpha of 0.05.
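As a minimal sketch of the paired pulse ratio calculation described above (oEPSC2/oEPSC1, summarized as mean and SEM), the following Python example may be useful. It is not the authors' Excel workflow: the function names, the use of absolute peak amplitudes, the per-cell averaging across trials, and the example values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_pulse_ratio(peak1_pa, peak2_pa):
    """Return oEPSC2 / oEPSC1 from peak amplitudes in pA (sign is ignored)."""
    if peak1_pa == 0:
        raise ValueError("first oEPSC peak must be nonzero")
    return abs(peak2_pa) / abs(peak1_pa)


def mean_and_sem(values):
    """Mean and standard error of the mean for a list of values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to compute SEM")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical peaks (pA) for one cell: three trials at a single ISI.
trials = [(-200.0, -150.0), (-180.0, -144.0), (-220.0, -154.0)]

# Assumption: the per-cell PPR is the average of the per-trial ratios.
cell_ppr = mean(paired_pulse_ratio(p1, p2) for p1, p2 in trials)
```

A ratio below 1 indicates paired pulse depression and a ratio above 1 indicates paired pulse facilitation.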
Behavior
Animals were trained on a lever pressing task where they learned to hold down and release a lever for a given minimum duration to receive a food reward (Cazares et al., 2022; Schreiner et al., 2022, 2023). Two days prior to operant training, mice were food restricted to 85–90% of their baseline weight. Behavioral training and testing were conducted in sound-attenuating operant chambers (Med Associates) under different schedules of reinforcement using MEDPC IV. Each day mice completed one session that stopped after 60 reinforcers or 90 min, whichever came first. Each session started with the house light turning on. The first day, animals were introduced to the operant reinforcer, grain pellets (dustless precision pellets, Bio-Serv, Formula F0071), via a random time delivery where on average a pellet was delivered every minute (RT60 for 60 min). The following 3–4 d the lever was extended at the onset of the session and stayed extended for the duration of the training session. Every lever press produced a reward with an increasing number of potential earned rewards under a continuous reinforcement schedule (CRF15, CRF30, CRF60). In fiber photometry experiments, we attached the optical fiber for one more day of CRF60 to habituate them to the fiber prior to beginning duration criteria training. We set arbitrary criteria for lever press duration on subsequent training days, with reward only delivered following release of a lever press that was equal to or surpassed criteria. During the first 6 d of criteria training, the minimum duration criteria was 800 ms. The criteria then increased to a minimum duration of 1,600 ms for 6 more days. A separate subset underwent sensory-specific outcome devaluation testing (Adams and Dickinson, 1981), whereby across 2 d mice were allowed to prefeed on an outcome for 1 h and then given 5 min to lever press under extinction (no reward delivered). 
On the Valued day, mice were given a control 20% sucrose solution they had prior exposure to, while on the Devalued day mice were given the food pellet normally earned by lever pressing.
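The reward rule during duration criteria training (reward only when a released lever press meets or exceeds the minimum duration, with the session ending at 60 reinforcers) can be sketched as follows. This is an illustrative Python sketch, not the MEDPC program used in the study; the function name, the returned fields, and the example durations are hypothetical, and the 90 min session limit is not modeled.

```python
def score_session(durations_ms, criterion_ms, max_reinforcers=60):
    """Score lever presses against a minimum hold-duration criterion.

    A press is rewarded if its duration is equal to or greater than the
    criterion. Presses after the reinforcer cap are not counted, because
    the session would already have ended.
    """
    rewarded = []
    earned = 0
    for duration in durations_ms:
        if earned >= max_reinforcers:
            break  # session ends once the reinforcer cap is reached
        hit = duration >= criterion_ms
        rewarded.append(hit)
        earned += hit
    presses = len(rewarded)
    percent_met = 100.0 * earned / presses if presses else 0.0
    return {"presses": presses, "reinforcers": earned, "percent_met": percent_met}


# Hypothetical press durations (ms) scored under both criteria.
presses = [300, 950, 800, 1200, 1700, 640]
early = score_session(presses, criterion_ms=800)
late = score_session(presses, criterion_ms=1600)
```

The same set of presses earns fewer rewards under the 1,600 ms criterion, which is why efficiency is expected to drop when the criterion shifts.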
Fiber photometry
Fiber photometry surgical procedures
To target thalamic glutamatergic neurons, VGlut2-Cre mice were injected with 150 nl of hSyn-FLEX-axon-GCaMP6s (Addgene viral prep # 112010-AAV5) or hSyn-DIO-GFP in MD. A stainless steel ferrule with a 400 µm optical fiber was placed unilaterally in DMS targeted to MD projection fields. Fibers were anchored to the skull using mini screws [Amazon, Amazon Mini Screws: PN: AMS90/1P-25 (00–90×) 1/16 SL PAN SST] and fast curing orthodontic acrylic resin (Lang Dental). Once the resin was cured, a small dab of Krazy Glue all-purpose super glue (Newell Office Brands) was placed to further anchor the ferrule to the dental cement. To ensure adequate viral expression, we waited at least 3 weeks prior to recording photometry data. After the experiment concluded, we perfused the brains with 10 mM CaCl2 to activate GCaMP and visualize viral spread, since there was no additional fluorophore. Mice were then perfused with 4% paraformaldehyde, and brains were extracted and placed into 4% paraformaldehyde for an additional 24–48 h. We took 100 µm sections of the brain of each animal, and viral expression was confirmed using a macro fluorescence microscope (Olympus MVX10).
Data collection
Ferrule-implanted animals were unilaterally attached to bifurcated 400 µm optical fibers (Thorlabs) using connectors (ADAL3, Thorlabs) after initial training and throughout the contingency training procedures. Each day that recording took place, mice were lightly anesthetized to connect them to the fiber, and the recording session began ∼15 min after anesthesia exposure. A 470 nm LED (Thorlabs) excited virally expressed GCaMP6s (<70 µW/mm2) while a 405 nm LED (Thorlabs) served as an isosbestic control. Fluorescence was monitored via a 4× objective (Olympus) focused on a CMOS camera (FLIR systems). Using Bonsai software (Lopes et al., 2015), regions of interest for each bifurcation of the fiber were selected. The fluorescence signal was collected at 20 Hz to produce two digitized signals: one for each animal connected to the bifurcated fiber. Behavioral timestamps for beginning and end of lever press, head entry, and reinforcer delivery were simultaneously sent to the Bonsai software from TTL MEDPC pulses through microcontrollers (Arduino). After each session, a csv file with timestamps, behavioral events, and raw fluorescence was saved.
Data analysis
Data was imported into Matlab 2022a (MathWorks) for analysis using custom scripts. Raw fluorescence signals underwent a median (fifth order) filter and low-pass filter (cutoff frequency 1 Hz) to reduce noise and electrical artifacts (Cazares et al., 2022). To correct for the slow decay seen from photobleaching of fluorophores during the session, we used high-pass filtering (cutoff frequency 0.001 Hz). Of note, when we used a double exponential fit, the results did not differ. The isosbestic signal was fitted to and subtracted from the signal. Filtered signals were inspected for any fiber decoupling. To ensure the presence of a calcium signal, we only included sessions where the 97.5th percentile of ΔF/F0 exceeded 1% (Markowitz et al., 2018). Session data were excluded if an animal did not make at least three lever presses that met duration criteria. Peri-event changes in fluorescence were calculated, and the signal was z-score normalized to the pre-event onset period. We conducted analyses in two different ways. First, to preserve within-subject variation, z-scored fluorescence traces were combined across all mice. Second, to examine between-mouse variability, we also averaged these z-scored ΔF/F0 traces for a given animal session and then averaged these traces across mice within a group. To examine calcium signals during the duration of lever presses, activity was interpolated using Akima interpolation (Matlab interp1), excluding any lever press that was shorter than 2 samples (100 ms) in duration. To determine whether there were significant differences in photometry signal between groups or between trial types, we used permutation testing in which the data was shuffled 1,000 times and at least five consecutive samples (or three samples for interpolated activity) were required to differ from one another (Jean-Richard-dit-Bressel et al., 2020). 
For visual display purposes only, calcium activity traces were smoothed with Matlab’s Savitzky–Golay smoothdata method using a 20-sample (or 1-sample for interpolated activity) sliding window.
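Two steps of the photometry analysis described above, z-scoring a peri-event trace against its pre-event period and requiring a minimum number of consecutive significant samples, can be illustrated with a short Python sketch. This is not the authors' Matlab code: the function names, the use of the population standard deviation for the baseline, and the example values are assumptions, and the per-sample significance flags are taken as given rather than computed by permutation.

```python
from statistics import mean, pstdev


def zscore_to_baseline(trace, n_baseline):
    """Z-score a peri-event trace using its first n_baseline samples as baseline."""
    baseline = trace[:n_baseline]
    mu, sd = mean(baseline), pstdev(baseline)
    if sd == 0:
        raise ValueError("baseline has zero variance")
    return [(x - mu) / sd for x in trace]


def consecutive_runs(flags, min_run):
    """Return (start, stop) index pairs for runs of True of length >= min_run.

    stop is exclusive, so a run covers samples start .. stop - 1.
    """
    runs, start = [], None
    # A trailing False closes any run that reaches the end of the trace.
    for i, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical trace: four baseline samples followed by two event samples.
z = zscore_to_baseline([1.0, 3.0, 1.0, 3.0, 2.0, 6.0], n_baseline=4)

# Hypothetical per-sample significance flags; only the five-sample run survives.
flags = [False, True, True, False, True, True, True, True, True, False]
significant = consecutive_runs(flags, min_run=5)
```

The consecutive-sample requirement discards isolated significant samples, so only sustained differences between traces are reported.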
Chemogenetic inhibition
A dual viral strategy, different from that used for fiber photometry experiments, was taken to bilaterally target and inhibit MD→DMS projection neurons. For this experiment, C57BL/6J mice were injected with 100 nl AAV-hSyn-DIO-hM4D(Gi)-mCherry (Addgene viral prep # 44362-AAV5) or a control virus AAV-Flex-tdtomato (Addgene viral prep # 28306-AAV5) in MD and 300 nl AAV-hSyn-GFP-Cre (UNC Vector Core) in DMS (coordinates as above). We waited 4 weeks for adequate viral expression before beginning training. Animals went through 800 and 1,600 ms duration criteria training as described above. A subset of animals went through outcome devaluation as described above. Clozapine-N-oxide (CNO, 1.0 mg/kg, 10 ml/kg, Sigma) was administered to both experimental and control subjects intraperitoneally 30 min prior to the start of each 800 and 1,600 ms criteria behavioral session and prior to the outcome devaluation procedure.
Quantification and statistical analyses
Statistical significance was defined as an α of 0.05. Statistical analysis was performed using GraphPad Prism 10.2.3 (GraphPad Software) and custom Matlab R2022a (MathWorks) scripts. Acquisition data including lever presses, response rates, head entries, percent of lever presses that met criteria, and reinforcers earned were analyzed using one-way ANOVA (for the photometry experiment) or two-way ANOVA (for the DREADD experiment). Multiple comparisons were corrected for the false discovery rate.
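For readers unfamiliar with false discovery rate correction, a minimal Python sketch of one common procedure is given below. The text above does not state which FDR method was applied, so the choice of the Benjamini-Hochberg step-up procedure is an assumption, as are the function name and the example p values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (assumed method, for illustration).

    Returns a list of booleans, in the input order, that is True where the
    corresponding hypothesis is rejected at false discovery rate q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k such that p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            reject[idx] = True
    return reject


# Hypothetical p values from four post hoc comparisons.
decisions = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], q=0.05)
```

With these example values only the smallest p value passes its rank-scaled threshold, even though three of the four are below 0.05 uncorrected.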
Linear mixed-effect models
Linear mixed-effect (LME) models were built to investigate the relationship between individual lever presses (n) and immediate prior lever press (n − 1) as well as other experiential information such as time in session, head entry, whether a reward was received, interpress interval, and others. Random intercept terms for mouse and training day were included in order to account for the nonindependent, repeated structure of the session data. Fixed terms included the overall percentage of rewarded lever presses as well as the timestamps of each lever press to account for variance explained by overall performance in a session. To test how predictive relationships were contingent upon their sequential order, beta coefficient outputs pertaining to each behavioral measurement of interest were compared with a 1,000 order-shuffled distribution of beta coefficients using permutation testing. Importantly, shuffling occurred within individual sessions/mice to preserve overall performance statistics (e.g., total lever presses made). Order shuffling was done independently for each behavioral variable. The LME model for the DREADD cohort consisted of the following formula:
To determine how far back the predictive relationship existed between press n and any particular n-back press, we built a similar LME model, tested against 100 order-shuffled datasets, that included additional variables accounting for the duration of lever press n and n-back (n − 1 through n − 10) lever press durations as follows:
To investigate the main effect of DREADD or DMS lesion treatment and interactions between treatment and n − 1 duration, treatment and n − 1 outcome, and treatment and IPI, we built an additional LME model, tested against 1,000 order-shuffled datasets. Regression coefficient terms βx in these models were as previously described, with the added covariates for main effects of treatment (Tx, binary 1 for experimental and 0 for control groups) and its interactions with prior lever press duration (Dn−1∗Tx), prior outcome (On−1∗Tx), prior presence of a head entry (HEn−1∗Tx), and interpress interval (IPIn−1∗Tx; Supplementary Table 3):
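As an illustration of the within-session order-shuffling logic used to test these models, the following Python sketch may help. It is deliberately simplified and is not the authors' model: it replaces the LME with an ordinary least-squares slope of press n duration on press n − 1 duration, omits random intercepts and all other covariates, and uses hypothetical function names and example data.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def lag1_pairs(sessions):
    """Pair each press duration (n) with the preceding one (n - 1), per session."""
    prev, curr = [], []
    for session in sessions:
        prev.extend(session[:-1])
        curr.extend(session[1:])
    return prev, curr


def order_shuffle_test(sessions, n_shuffles=1000, seed=0):
    """Compare the observed n-1 -> n slope with slopes from order-shuffled data.

    Shuffling is done within each session, so per-session totals and the
    distribution of durations are preserved and only press order is broken.
    """
    rng = random.Random(seed)
    observed = slope(*lag1_pairs(sessions))
    null = []
    for _ in range(n_shuffles):
        shuffled = [rng.sample(s, len(s)) for s in sessions]
        null.append(slope(*lag1_pairs(shuffled)))
    # Two-sided permutation p value with the usual +1 correction.
    extreme = sum(abs(b) >= abs(observed) for b in null)
    return observed, (extreme + 1) / (n_shuffles + 1)


# Hypothetical sessions in which each press is 100 ms longer than the last.
sessions = [list(range(100, 2100, 100)), list(range(300, 2300, 100))]
beta, p_value = order_shuffle_test(sessions)
```

Because shuffling preserves each session's set of durations, a small p value here reflects sequential structure in the presses rather than overall performance.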
LME models of Ca2+ activity
For MD→DMS fluorescence activity monitoring experiments, LME models were built to predict Ca2+ activity given current and prior lever press durations, whether or not a reward was received, whether or not a head entry was made, the interpress interval, the timestamps of each lever press, and the area under the curve of n − 1 (Supplemental Table 2). The mean area under the curve of each behavioral epoch (−1 s to onset of lever press, interpolated lever press duration, and 0 to 0.5 s after the offset of the lever press) was calculated to predict activity at each time point. The following formula was used:
Results
MD projects to striatum
Mediodorsal thalamus sends axons into cortex and striatum (Hunnicutt et al., 2016), and mouse MD→DMS projections have also been shown to collateralize in cortex (Birdsong et al., 2019; Winnubst et al., 2019). We first confirmed that MD sends projections into DMS. We employed a dual-virus tracing strategy in C57BL/6J mice by injecting AAV1 hSyn FLEx mGFP-2A-Synaptophysin-mRuby in MD and 200 nl of AAV5-hSyn-GFP-Cre in DMS (Fig. 1A), such that targeted MD neurons would express mRuby in their terminals. We found mRuby expression in DMS (Fig. 1A); we also found evidence that MD→DMS projection neurons collateralize in orbitofrontal cortex (OFC) and medial prefrontal cortex (mPFC), two regions whose MD inputs have been hypothesized to subserve different functions in relation to behavioral control (Cross et al., 2013; Mitchell and Chakraborty, 2013; Bolkan et al., 2017; Schmitt et al., 2017; Rikhye et al., 2018; Balleine, 2019; DeNicola et al., 2020; Fig. S1).
Mediodorsal thalamus projects into DMS. A, Schematic and photomicrographs of injection of AAV5-FLEX-mGFP-2a-synaptophysin-mRuby in mediodorsal thalamus and AAV-GFP-Cre in dorsomedial striatum, with photomicrograph mRuby expression in striatum. B, Schematic of optogenetic strategy targeting AAV-ChR2-eYFP to MD and whole-cell patch-clamp recording in DMS. C, Example optically evoked EPSCs (oEPSCs) and paired pulse ratio (oEPSC 2/oEPSC1). Data are shown as mean ± SEM. See also Supplementary Figure 1.
To confirm that MD sends monosynaptic glutamatergic input to dorsal striatum spiny projection neurons (SPNs), we injected AAV-hSyn-hChR2(H134R)-EYFP unilaterally into mediodorsal thalamus and recorded ex vivo from slices containing DMS (Fig. 1B). Optically evoked EPSCs were recorded with picrotoxin in the bath, allowing us to look exclusively at excitatory projections. We examined the probability of neurotransmitter release at these synapses using a paired pulse ratio protocol (Fig. 1C). We recorded from genetically targeted SPNs of the direct and indirect pathway, as well as unidentified SPNs; however, as there were no differences, we collapsed data across all SPNs. Paired pulse ratios varied across cells regardless of label for direct or putative indirect SPNs (animal n = 6, cell n = 10; RM one-way ANOVA, F < 0.25, p > 0.75), with some individual cells showing paired pulse depression while others showed paired pulse facilitation. This suggests that there is variability in the likelihood of neurotransmitter release from MD terminals onto SPNs in DMS.
Volitional action control and DMS
To examine volitional action control and confirm its dependency on DMS, mice underwent bilateral Sham or 0.12 M NMDA infusions into DMS (see Materials and Methods; Fig. 2A,B). A two-way ANOVA (Treatment × Hemisphere) on cell counts showed NMDA infusions resulted in significantly fewer neurons than controls (F = 13.10, p = 0.007), with no effect of hemisphere and no interaction (ps > 0.23; Fig. 2C, Fig. S2). Sham and DMS lesioned mice were trained in a self-paced instrumental task where reward was contingent on mice holding down the lever for a minimum arbitrary duration (Fig. 2D; Cazares et al., 2022; Schreiner et al., 2022, 2023). That is, reward would only occur at the offset of a lever press that met or exceeded the duration criteria. The presence of an analog variable (lever press duration) allowed us to examine how a mouse modifies its actions from one to the next, in contrast to traditional action–outcome contingency tests that involve degrading the relationship between lever press and reward (Adams and Dickinson, 1981). After initial lever press training at a fixed ratio 1 (FR1), the minimum arbitrary duration was set at 800 ms for 6 d. We then shifted the action contingency to 1,600 ms for an additional 6 d. Behavior was self-initiated, self-paced, and self-terminated, with no predictive cues signaling required duration or reward delivery.
DMS lesions disrupt use of action information. A, Schematic of strategy to lesion DMS with a bilateral 200 nl injection of 0.12 M NMDA into DMS. B, Representative photomicrograph (10× magnification) showing NeuN-related fluorescence in DMS for a Sham mouse (top) and DMS lesion mouse (bottom). C, Quantification of cells in Sham and DMS lesion mice. D, Description of task and timeline of behavioral training. Acquisition data showing (E) total lever presses over days of training, (F) percent of lever presses that met duration criteria over days of training, and (G) reinforcers earned over days of training. H, Histogram of durations (200 ms bins) of lever presses made under 800 and 1,600 ms lever press duration criteria across training days. I, β coefficients showing relationship between lever press duration n and lever press duration n − 1 for Sham and DMS lesion groups as well as shuffled data. Data are shown as mean ± SEM. *p < 0.05. See also Supplementary Figure 2.
Sham and DMS lesion mice similarly decreased the number of lever presses across days under each duration criteria (two-way RM ANOVA main effect of Day F(5,60) > 3.63, ps < 0.006, no effect of Lesion or Day × Lesion interaction; Fig. 2E), and the shift from 800 to 1,600 ms duration criteria led to an increase in lever presses made (planned post hoc comparison last day 800 to first day 1,600 comparison: ps < 0.01). The efficiency (lever presses that met or surpassed the duration criteria) of their performance also increased across days under each duration criteria (mixed-effects model Fs(5,56) > 3.25, ps < 0.01), once again with a reduction in efficiency upon the increase in lever press duration (planned comparison last day 800 to first day 1,600 comparison: ps < 0.01; Fig. 2F). Sham and DMS lesion mice also made similar numbers of lever presses that met criteria (i.e., rewards earned) under 800 and 1,600 ms duration criteria (no effects; Fig. 2G). The distribution of lever press durations made shifted similarly from under 800 to 1,600 ms duration criteria for both Sham and DMS lesion mice (Fig. 2H; two-way RM ANOVA Duration × Frequency interaction F(1,17) = 35.00, p < 0.001).
However, actions aimed at the same goal can be mediated through different behavioral controllers (Adams and Dickinson, 1981; Hikosaka et al., 1999; Yin and Knowlton, 2006; Gremel and Costa, 2013). We then examined the contribution of prior lever press durations as well as other experiential variables such as reward presence and checking behavior to ongoing lever press durations. We built LME models (see Materials and Methods) that measured the predictive relationship of these behavioral events on the subsequent lever press duration (n) and compared LME regression coefficients (β) against lever press order-shuffled data via permutation testing. We added Lesion group as a term in our model and found a significant interaction between Lesion group and prior press duration (n − 1) on predicting subsequent lever press duration (Fig. 2I, Table S1). A representation of β coefficients segregated by Lesion group showed Sham mice with a significant predictive relationship between the most recent lever press duration (n − 1) and the subsequent lever press duration (n); that is, they used prior action information to guide ongoing lever press durations. This use of prior lever press duration (n − 1) to inform current press duration (n) was reduced in the DMS lesion mice. In addition, Lesion group also showed significant interactions with additional experiential variables including prior checking behavior, prior outcome, and prior interpress interval (Table S1). This suggests DMS lesions leave mice less reliant on information gained from their recent experience to guide their subsequent lever press durations, including the use of the immediate prior lever press duration. Thus, DMS recruitment normally supports action control in this task.
MD→DMS terminal activity modulated during volitional actions
We next investigated whether this DMS-dependent volitional action learning and performance modulates MD→DMS projection neuron activity. We used fiber photometry and monitored bulk calcium activity from MD→DMS terminals as a proxy of recruitment (Broussard et al., 2018). We injected a cre-dependent axon-targeted GCaMP (hSyn-FLEX-axon-GCaMP6s) into the mediodorsal thalamus of VGlut2-Cre mice (n = 12, 5 M, 7 F; Fig. 3A), thereby limiting viral expression to thalamic glutamatergic neurons (Fremeau et al., 2001). We placed our optical fiber over DMS to target MD terminals in DMS (Fig. 3A) as mice learned to lever press for a food reward under 800 and 1,600 ms duration criteria (Fig. S3).
Mediodorsal projections into DMS modulate activity during actions. A, Schematic of viral strategy to examine calcium activity of MD terminals in DMS and representative photomicrograph of GCaMP6s expression in MD (top) and DMS (bottom). B, Z-scored change in fluorescence over baseline fluorescence prior to the onset of the lever press, (C) during an interpolated duration of lever press (center panel), and (D) after the offset of the lever press. Data are shown as the mean ± SEM (shaded area) for all traces, with black dashed lines indicating significant differences between traces based on permutation testing. E, Schematic example demonstrating use of LME model to predict activity. F, β coefficients for the relationship between MD→DMS terminal activity and n and n − 1 lever presses (n = 8,114 trials/12 mice) prior to onset, (G) during an interpolated duration of lever press, and (H) at offset of the lever press as well as order-shuffled data (white inset bars). Data are shown as mean ± SEM. *p < 0.01. See also Supplementary Figure 3.
We initially segmented fluorescence activity traces based on whether or not a lever press would go on to meet or surpass the assigned criteria and be rewarded. We examined neural activity 2 s prior to the onset, during the duration of the lever press, and 0.5 s following offset of lever press, the latter of which constitutes a period of time prior to when the mouse makes a head entry for food reward retrieval (Fig. S3). We found significant differential modulation of MD terminal calcium activity between the to-be rewarded and unrewarded lever presses across all lever press epochs. MD→DMS terminal calcium modulation increased prior to lever press onset for all lever presses, and to a greater degree for to-be rewarded than unrewarded lever presses (permutation testing ps < 0.05; Fig. 3B). For to-be rewarded lever presses, MD→DMS terminal calcium activity was dynamically modulated during the duration of the lever press, where it was gradually suppressed compared with the unrewarded lever presses until just before offset, when such calcium modulation began to increase (ps < 0.05; Fig. 3C). This was not seen in unrewarded lever presses. At the offset of the lever press, a significant increase in modulation was observed only in rewarded lever presses (Fig. 3D). The above patterns of modulation were not driven by an individual animal or day (Fig. S3). Thus overall, there was greater modulation of MD→DMS terminal calcium activity for lever presses that would produce reward, compared with those that did not.
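A time-bin-by-time-bin permutation comparison of rewarded and unrewarded traces can be sketched as follows. This is an illustrative assumption about the general form of such a test; the exact procedure (for example, any requirement for consecutive significant bins) is described in Materials and Methods and is not reproduced here. The traces are simulated and the function names are hypothetical.

```python
import random
from statistics import mean

def perm_test_timepoint(a, b, n_perm=1000, rng=None):
    """Label-permutation test on the difference in means at one time bin."""
    rng = rng or random.Random(0)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign trial labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def significant_bins(rewarded, unrewarded, alpha=0.05, n_perm=1000, seed=0):
    """Flag each time bin where the two sets of traces differ."""
    rng = random.Random(seed)
    flags = []
    # zip(*traces) turns a list of trials into a sequence of time bins.
    for a, b in zip(zip(*rewarded), zip(*unrewarded)):
        flags.append(perm_test_timepoint(a, b, n_perm, rng) < alpha)
    return flags

# Simulated z-scored traces (assumption): 40 trials per group, 2 time bins.
# Bin 0 is drawn from the same distribution; bin 1 is elevated when rewarded.
sim = random.Random(42)
rewarded = [[sim.gauss(0, 0.5), sim.gauss(2.0, 0.5)] for _ in range(40)]
unrewarded = [[sim.gauss(0, 0.5), sim.gauss(0, 0.5)] for _ in range(40)]

flags = significant_bins(rewarded, unrewarded)
print(flags)
```

Because the null distribution is built from the data themselves, this kind of test makes no assumption about the shape of the fluorescence distribution at each time bin.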
We next examined whether eventual performance outcomes reflected in activity modulation were driven by use of action-related information and/or other experiential information. We built an LME model to predict lever press-related modulation of calcium activity given current and prior lever press durations and compared against order-shuffled data (Fig. 3E). We also included other sources of information, including whether or not a reward was received, whether or not a head entry was made, and the interpress interval (see Materials and Methods; Table S1). Across lever press-related behavioral epochs, there was a significant relationship between MD terminal calcium activity modulation and lever press duration. MD terminal modulation reflected the most recent lever press duration (n − 1) across onset, while holding down the lever, and immediately after lever press release (Fig. 3F–H). That is, greater increases in MD→DMS terminal activity across lever press epochs were associated with longer prior (n − 1) lever press durations. In contrast, lower levels of activity during lever pressing and after corresponded with longer current (n) lever press durations (Fig. 3G,H). MD terminal activity was also modulated by other experiential information used to guide volitional actions, including making a head entry, whether an outcome was delivered or not, and the passage of time (Table S1). Thus, bulk modulation of MD→DMS terminal activity reflects lever press duration information related to volitional action control as well as other sources of accrued experience.
Attenuation of MD-DMS neurons alters action strategy learning and performance
As MD→DMS projections can collateralize with cortex, it could be that MD→DMS activity reflects a corollary discharge of ongoing behavior. It could also be that MD→DMS activity is important for early learning processes supporting acquisition of such action control in DMS. To test the functional contribution of MD→DMS activity to the acquisition and performance of volitional actions, we took a chemogenetic approach to attenuate MD→DMS projection neuron activity via expression of an inhibitory Designer Receptor Exclusively Activated by a Designer Drug (DREADD; Armbruster et al., 2007). To limit expression of the DREADD to MD neurons that send axons to DMS, we injected an AAV-hSyn-GFP-Cre into DMS and a Cre-dependent inhibitory DREADD [H4, AAV5-hSyn-DIO-hM4D(Gi)-mCherry, n = 10, 6F, 4M] or control virus (AAV5-FLEX-tdTomato, n = 13, 8F, 5M) into the MD of C57BL/6J mice (Fig. 4A,B; Fig. S4). All mice (control and H4-expressing) were given 30 min pretreatment with the DREADD actuator CNO prior to each 800 and 1,600 ms training day. DREADD attenuation of MD→DMS activity did not alter acquisition performance of lever press behaviors (Fig. 4C–E; Fig. S4). Control and H4 groups made similar numbers of lever presses across both 800 and 1,600 ms criteria sessions (Fig. 4C; two-way RM ANOVA, 800 d: main effect of session F(1.64,34.53) = 7.04, p < 0.005, no effect of treatment and no interaction, p’s > 0.8; 1,600 d: main effect of session F(2.08,43.71) = 24.48, p < 0.0001, no effect of treatment and no interaction, p’s > 0.19). Control and H4 animals also had similar efficiency of lever presses under both duration criteria (Fig. 4D; two-way RM ANOVA, 800 d: main effect of session F(2.41,50.57) = 6.76, p = 0.0014, no effect of treatment and no interaction, p’s > 0.42; 1,600 d: main effect of session F(3.43,72.50) = 16.52, p < 0.0001, no effect of treatment and no interaction, p’s > 0.53).
Further, MD→DMS DREADD attenuation did not alter the number of reinforcers earned across 800 and 1,600 ms criteria training days (Fig. 4E; two-way RM ANOVA, 800 d: main effect of session F(2.07,43.44) = 6.35, p = 0.0035, no effect of treatment or interaction, p’s > 0.34, 1,600 d: no effect of session, treatment and no interaction, p’s > 0.10). Thus overall acquisition performance of lever press behavior was largely intact following attenuation of MD→DMS activity.
MD→DMS attenuation disrupts use of action information. A, Schematic of dual viral strategy to inhibit MD→DMS neurons. AAV-hSyn-DIO-hm4D(Gi)-mCherry in MD, AAV-hSyn-GFP-Cre in DMS. B, Representative photomicrograph of mCherry expression in MD thalamus. Acquisition of lever press behavior with (C) total lever presses across days of training, (D) percent of lever presses that met duration criteria over days of training, and (E) reinforcers earned over days of training. F, β coefficients for n − 1 duration using an LME segmented by criteria duration and treatment (800 ms criteria n = 30,640 trials/23 mice; 1,600 ms criteria n = 45,915 trials/23 mice). Significance indicators above bars show comparison to 1,000 shuffled duration. Shuffled data removed for visual representation. Data are shown as mean ± SEM. *p < 0.05. See also Supplementary Figure 4.
We next examined whether MD→DMS attenuation would alter the underlying strategy used to acquire and perform lever pressing (Adams and Dickinson, 1981; Hikosaka et al., 1999; Yin and Knowlton, 2006; Gremel and Costa, 2013). We constructed a behavioral LME model that accounted for MD→DMS DREADD treatment (see Materials and Methods; Table S1) for both 800 and 1,600 ms training days. We compared our model with 1,000-shuffled data where shuffling was done independently for each behavioral variable, and shuffling occurred within each individual mouse and session.
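The shuffling scheme described above, in which each behavioral variable is permuted independently within each mouse and session, can be sketched as below. This is a minimal illustration under assumed data structures: the row layout and the field names `mouse`, `session`, `duration`, and `ipi` are hypothetical and are not taken from the analysis code.

```python
import random
from collections import defaultdict

def shuffle_within_groups(rows, variables, seed=0):
    """Return a copy of rows with each variable shuffled independently
    within every (mouse, session) group; group labels are left untouched."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[(row["mouse"], row["session"])].append(idx)

    shuffled = [dict(row) for row in rows]  # do not mutate the input
    for indices in groups.values():
        for var in variables:
            values = [rows[i][var] for i in indices]
            rng.shuffle(values)  # a separate permutation for each variable
            for i, value in zip(indices, values):
                shuffled[i][var] = value
    return shuffled

# Toy data (assumption): 2 mice x 2 sessions x 6 presses.
rows = [
    {"mouse": mouse, "session": session,
     "duration": 0.1 * k + session, "ipi": 1.0 + k}
    for mouse in ("m1", "m2")
    for session in (1, 2)
    for k in range(6)
]

shuffled = shuffle_within_groups(rows, ["duration", "ipi"])
```

Repeating this 1,000 times with different seeds and refitting the model on each shuffled copy would yield a null distribution of coefficients. Shuffling within mouse and session keeps between-animal and between-day differences intact, so only the trial-by-trial ordering is disrupted.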
There was a significant Duration n − 1 × Treatment interaction for 800 ms (F(1,30640) = 66.36, p < 0.001) as well as for 1,600 ms duration criteria (F(1,45915) = 7.99, p < 0.0001; Table S1). However, the direction of effect of MD→DMS attenuation was opposite for 800 and 1,600 ms duration criteria. Treatment-segmented n − 1 β coefficients of 800 ms duration criteria training days showed H4 mice with a more positive relationship between prior and current lever press duration compared with control mice (Fig. 4F, Table S1; significant compared with 1,000 group-shuffled data). This suggests that during initial action contingency learning, MD→DMS attenuation resulted in mice making more consecutive lever presses of similar durations. In contrast, treatment-segmented n − 1 β coefficients of 1,600 ms duration criteria training days showed that MD→DMS attenuation reduced use of recent action information in H4 mice (Fig. 4F, Table S1; significant compared with 1,000 group-shuffled data). This suggests that upon a shift in action contingency, MD→DMS attenuation prevented mice from updating their prior repetitive action strategy and instead left MD→DMS attenuated mice unable to use any prior lever press duration information to guide current pressing.
To directly investigate whether this effect of MD→DMS attenuation emerged across acquisition and was apparent upon and following experience with the shift in action contingency, we segregated training days under each duration criteria into early and late training and constructed LMEs for each of these epochs (Fig. 5A). We found significant use of prior action information (n − 1 β coefficients) for both control and H4 mice during early 800 ms training (ps < 0.0001; Fig. 5B, Table S1). Late in 800 ms duration criteria training, treatment-segmented n − 1 β coefficients were still significant for both groups (Ctl mice: F(1,4122) = 51.26, p < 0.0001; H4 mice: F(1,3310) = 141.16, p < 0.0001). A two-way ANOVA (Treatment × Day) performed on n − 1 β coefficients showed the use of prior action information to guide current lever press duration strengthened across 800 duration criteria training for both control and H4 mice (main effect of Day F(1,26751) = 4.70, p = 0.03) and that H4 mice showed an overall greater reliance on the duration of the most recent press than control mice (main effect of Treatment F(1,26751) = 39.65, p < 0.0001; no significant interaction).
MD→DMS attenuation disrupts the updating of action contingencies. A, Schematic of training days, with early and late groups of days for analyses highlighted. B, β coefficients for n − 1 duration using an LME segmented by treatment and by early (control n = 11,201 trials, H4 n = 8,122 trials) and late (control n = 4,122 trials, H4 n = 3,310 trials) days of 800 ms criteria and for (C) early (control n = 10,931 trials, H4 n = 11,176 trials) and late (control n = 5,864 trials, H4 n = 4,030 trials) days of 1,600 ms criteria (23 mice). Significance indicators above bars show comparison to 1,000 shuffled duration. Shuffled data removed for visual representation. D, β coefficients for n − 1 duration treatment-segmented between late 800 ms and early 1,600 ms showed a significant interaction via two-way ANOVA. Data are shown as mean ± SEM. *p < 0.001.
In contrast, when the duration criteria shifted to 1,600 ms, MD→DMS attenuation disrupted such use of recent action information. Early in 1,600 ms duration criteria training following the shift from 800 ms, the model showed significant n − 1 β coefficients for control mice, indicating they still relied on prior action information (F(1,10931) = 23.957, p < 0.0001); however, H4 mice did not (F(1,11176) = 2.54, p = 0.11; Fig. 5C, Table S1). Late in 1,600 ms training, significant model n − 1 β coefficients showed that control mice (F(1,5864) = 17.63, p < 0.0001) still relied on prior lever press duration to inform current press behavior and that H4 mice now showed use of lever press duration information (F(1,4030) = 5.302, p = 0.021). A two-way ANOVA (Treatment × Day) performed to directly compare across stages of 1,600 ms training showed a significant effect of Treatment only (F(1,31977) = 4.76, p = 0.03), with H4 mice overall relying less on prior duration information than control mice.
To investigate the epochs encompassing the shift in action contingency, we performed a direct comparison of treatment-segmented n − 1 β coefficients on the late 800 ms and early 1,600 ms epochs and found a significant interaction (Fig. 5D; two-way ANOVA Day × Treatment: F(1,52375) = 20.31, p < 0.0001; main effect of Day: F(1,52375) = 74.58, p < 0.001; marginally significant effect of Treatment: F(1,52375) = 4.358, p = 0.04). Multiple comparisons correcting for the false discovery rate found that both groups reduced their use of recent duration information upon the shift to the 1,600 ms duration criteria (qs ≤ 0.006). However, during late 800 ms duration criteria training days, H4 mice showed a greater use of recent duration information than control mice (q = 0.008). This reversed following the shift to early 1,600 ms, in which control mice showed a greater, albeit reduced from 800 ms, use of recent duration information compared with H4 mice (q = 0.013). This suggests that following the shift to a longer duration, control mice still relied on an action strategy of using recent lever press duration information to guide current press duration, while H4 mice did not. That is, MD→DMS attenuation resulted in initially learning and performing a repetitive strategy for the 800 ms duration rule that, when shifted to 1,600 ms, necessitated the recruitment of a different action control strategy instead of updating a previous strategy.
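For readers unfamiliar with q values, false discovery rate correction can be sketched with the Benjamini-Hochberg step-up procedure. This is an assumption made for illustration: the specific FDR procedure used for the comparisons above is not stated in this section, and the p values in the example are hypothetical, not those underlying the reported q values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values (q values),
    in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonic q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Hypothetical p values for four pairwise comparisons.
example_p = [0.041, 0.001, 0.039, 0.008]
example_q = benjamini_hochberg(example_p)
print([round(q, 3) for q in example_q])
```

A comparison is then declared significant when its q value falls below the chosen false discovery rate, which controls the expected proportion of false positives among the comparisons called significant.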
Discussion
Overall, we found evidence that MD→DMS projections contribute to self-initiated, self-paced, volitional action control. By employing a DMS-dependent instrumental task that allowed us to examine how an animal uses prior lever press information to guide current lever pressing, we found MD terminals in DMS reflect current as well as prior action information. This MD→DMS activity modulation contributed to the acquisition and updating of action contingency information used for ongoing behavior. These findings raise the hypothesis that MD input into DMS contributes to early plasticity supporting acquisition of action strategies.
While the DMS is critical for acquisition of action-related behaviors, the sources of input supporting such control have not been fully investigated. Focus has largely been on cortical inputs into DMS, with prelimbic cortex afferents to DMS shown to support initial plasticity underlying acquisition of goal-directed behavior (Balleine and Dickinson, 1998; Ostlund and Balleine, 2005; Hart and Balleine, 2016; Hart et al., 2018). Prelimbic input converges with additional cortical afferents, as well as thalamic input (Hunnicutt et al., 2016). Prior work has found that input from the parafascicular nucleus of the thalamus is necessary for the updating of existing behavioral contingencies but does not support initial learning (Bradfield et al., 2013a,b; Bradfield and Balleine, 2017). Here, our data suggest that input from another region of thalamus, specifically MD input into DMS, may contribute to initial acquisition of goal-directed strategies that include use of action contingency information.
Indeed, MD has been implicated in aspects of initial goal-directed learning. In rats, MD lesions made prior to training specifically disrupt the ability to use changes in action–outcome contingency to guide behavior (Corbit et al., 2003), while MD lesions made posttraining but prior to testing did not (Ostlund and Balleine, 2008). In another study, MD lesions were performed in monkeys after training, following which lesioned and control monkeys then underwent additional shifts in task contingencies. There, MD lesioned monkeys showed reduced use of past choices and their associated consequences to guide adaptive decision-making (Chakraborty et al., 2016). Interestingly, after lesioned monkeys accrued an extended choice history, they were just as capable as controls of using their choices to guide their future behavior (Chakraborty et al., 2016). This suggested that while MD may contribute to setting up plasticity during initial learning, experience can recruit additional mechanisms to support eventual learning and performance. However, when contingencies change and need to be updated, MD activity may be once again recruited.
In the present data, we attenuated MD→DMS activity prior to the onset of daily action training. MD→DMS attenuated mice did not show deficits in their overall acquisition performance of the initial 800 ms duration criteria; however, they did employ a different control strategy. Instead of the normal balance in exploratory behavior that control mice exhibited in the use of prior lever press durations to inform behavior, MD→DMS attenuated mice showed increased reliance on lever press durations just made, reminiscent of exploiting a known strategy. That is, they relied to a greater extent on a strategy of repeating a lever press duration that was more similar to the one just made (n − 1), with this similarity maintained further back in lever press history as well (n − 2, n − 3, n − 4; data not shown). However, when the lever press duration contingency increased, MD→DMS attenuated mice showed a complete loss of reliance on prior lever press durations, a strategy which control mice were still able to maintain. This suggests MD→DMS attenuated mice were unable to update their prior repetitive action strategy employed and instead had to learn a new strategy. Although there could be species differences in strength and density of MD→DMS projections, we found that similar to findings in monkeys (Chakraborty et al., 2016), with continued experience attenuated mice did eventually show limited use of prior lever press durations to support ongoing decision-making, suggesting they could eventually come to rely on prior action information.
Another way to examine MD contributions to initial goal-directed learning is through outcome devaluation studies, where MD lesions made prior to training, but not post training, disrupt sensitivity to outcome value changes (Corbit et al., 2003; Mitchell et al., 2007; Wicker et al., 2018). Here, a small subset of the MD→DMS attenuated mice went through an outcome devaluation procedure and also showed reduced sensitivity to outcome devaluation (Fig. 4-1). Interestingly, MD projections into prelimbic cortex have also been implicated in supporting goal-directed control assessed via outcome devaluation (Bradfield et al., 2013a,b). Thus, overall, our data suggest that MD supports initial acquisition as well as updating of goal-directed related information, in part through its projections into DMS.
The convergent evidence from MD→DMS terminal recordings as well as observed changes in behavioral strategy employed after attenuation of MD→DMS provides evidence for the hypothesis that MD’s input into dorsal striatum contributes to early plasticity processes supporting volitional action control. Here we show that lesions to the DMS prior to training disrupt the use of immediate prior action information to inform current behavior. Prior findings have shown that initial acquisition of varying DMS-dependent behaviors is accompanied by plasticity changes in DMS (Costa et al., 2004; Yin and Knowlton, 2006; Yin et al., 2009; Perrin and Venance, 2019). For example, initial training under reinforcement schedules biasing goal-directed control produced increased AMPA/NMDA ratios in direct pathway dopamine-type 1 receptor expressing spiny neurons (D1-SPNs), but decreased AMPA/NMDA ratios in indirect pathway dopamine-type 2 receptor expressing spiny neurons (D2-SPNs; Shan et al., 2014). Our data show MD synapses onto both D1 and D2 SPNs in DMS, and rabies tracing studies have found that MD synapses on local parvalbumin positive and cholinergic interneuron populations as well (Klug et al., 2018). The dynamic modulation of MD→DMS calcium activity during the lever press itself does suggest that ongoing changes in transmission could underlie potential contributions to plasticity. Other thalamic input into striatum is thought to support the updating of behavioral contingencies through recruitment of cholinergic interneurons (CIN; Bradfield et al., 2013a), and the role of MD functional CIN input should be investigated. In general, thalamic input converges with cortical and monoaminergic input onto striatal circuits to shape plasticity. This striatal plasticity results in activity changes that support downstream control over motor output as well as thalamo-cortical feedback.
Perhaps early in acquisition and upon changes in action contingencies, such plasticity is important for shaping which cortico-thalamo-basal ganglia circuit gains control over behavioral strategy, with MD playing a key role in helping to shape DMS plasticity supporting goal-directed processes.
There are several limitations to our study. While we took a dual-viral approach to target MD neurons that project to striatum, some MD→DMS neurons may also project to other regions. Our analysis approach using LME modeling of lever pressing behavior required many days of training to be appropriately powered, thus limiting our ability to do repeated targeted injections of CNO via cannula (which can lead to infection) into striatum. We did not use an optogenetic strategy given concerns that light stimulation would not cover the entirety of the diffuse projections from MD to striatum (Hunnicutt et al., 2016) and, more importantly, limitations of using current readily available opsins for site-specific targeted terminal inhibition (Mahn et al., 2016). We did not track mouse movement throughout sessions and are unable to examine the contribution of this projection to the movement of action itself or its vigor. Another caveat is that we monitored bulk calcium activity of MD neurons in striatum, thus not allowing us to examine the full dynamics of MD input onto SPN subtypes and interneurons. However, we did see predictive information at the level of terminal bulk activity, and this was present in all epochs of lever pressing. Together, the convergent evidence from DREADD attenuation and MD terminal activity experiments strongly suggests MD afferents into DMS contribute to action control.
Potential sources of action-related information the MD is receiving are unknown. Prior work has shown that orbitofrontal cortex, a major source of input to MD, contributes to the encoding of lever press durations for future use (Cazares et al., 2022). However, mPFC has been implicated in learning new rules (Balleine, 2019) and like OFC, has strong reciprocal connectivity with MD (Alcaraz et al., 2018; Collins et al., 2018). MD also receives subcortical information from ventral pallidum (Groenewegen, 1988; Leung et al., 2024), serving as a direct connection from the basal ganglia for potential rapid updating of ongoing behavior. Future investigations on whether MD→DMS neurons are sending similar information to their cortical partners, and if so, what function this direct pathway to basal ganglia might be serving, are also warranted. One hypothesis is that MD is providing both feedforward information to cortical regions and feedback to the striatum. This could allow for efficiency in the learning and adaptation to new action contingencies.
These results have important implications for understanding neuropsychiatric disorders where volitional action control is disrupted. Whether MD-striatum projections are involved in exploratory behavior in humans is unknown. MD-caudate connectivity has been demonstrated in both 3T and 7T MRI (Metzger et al., 2010; Eckert et al., 2012), but little is known about this pathway in disease states.
Data Availability
The data reported in this paper will be shared by the lead contact upon request. Data analysis code is freely available online at https://github.com/gremellab. All scripts/functions were executed using Matlab 2022a.
Footnotes
This work was funded by NSF-GRFP DGE-2038238 (E.T.B.) and R01AA026077 (C.M.G.).
The authors declare no competing financial interests.
This paper contains supplemental material available at: https://doi.org/10.1523/JNEUROSCI.0835-25.2025
Correspondence should be addressed to Christina M. Gremel at cgremel{at}ucsd.edu.











