Abstract
Pavlovian conditioning is thought to involve the formation of learned associations between stimuli and values, and between stimuli and specific features of outcomes. Here, we leveraged human single neuron recordings in ventromedial prefrontal, dorsomedial frontal, hippocampus, and amygdala while patients of both sexes performed an appetitive Pavlovian conditioning task probing both stimulus–value and stimulus–stimulus associations. Ventromedial prefrontal cortex encoded predictive value along with the amygdala, and also encoded predictions about the identity of stimuli that would subsequently be presented, suggesting a role for neurons in this region in encoding predictive information beyond value. Unsigned error signals were found in dorsomedial frontal areas and hippocampus, potentially supporting learning of non-value related outcome features. Our findings implicate distinct human prefrontal and medial temporal neuronal populations in mediating predictive associations which could partially support model-based mechanisms during Pavlovian conditioning.
- amygdala
- human single neurons
- intracranial recordings
- Pavlovian conditioning
- Reinforcement learning
- vmPFC
Significance Statement
In this study, epilepsy patients implanted intracranially with depth microelectrodes performed a Pavlovian conditioning task. We measured single neuron activity in ventromedial prefrontal cortex, amygdala, and hippocampus, and found a representation of predictive identity coding and Pavlovian expected reward values. Additionally, unsigned error signals were found in dorsomedial frontal areas as well as hippocampus. This study thus provides a rare glimpse into the role of frontal human neurons in predictive associative learning of values and stimulus identities.
Introduction
In Pavlovian conditioning, animals learn to predict outcomes and adapt their behavior in the form of conditioned responses (Pavlov, 1927; Rescorla, 1988). Understanding the associations that underpin the acquisition of Pavlovian conditioned responses has been an active area of research, neurally and behaviorally (Rescorla, 1988; Schultz, 2006; Dayan and Berridge, 2014; Fanselow and Wassum, 2016; O’Doherty et al., 2017).
An influential theory of Pavlovian conditioning is the Rescorla–Wagner (RW) rule and variants thereof. These models assume that conditioning occurs via a prediction error (PE) signal, indicating discrepancies between expectations associated with the conditioned stimulus (CS) and the unconditioned stimulus (Rescorla and Wagner, 1972). These models can account for a variety of phenomena observed in conditioning experiments, such as blocking, over-expectation, and conditioned inhibition (Kamin, 1967; Rescorla, 1969; Lattal and Nakajima, 1998). Compelling empirical support for the role of RW-type PEs in appetitive Pavlovian learning has emerged from the finding that the phasic activity of dopamine neurons in the midbrain resembles a reward-specific PE, manifesting many of the specific predictions of the RW model (Schultz et al., 1997; Tobler et al., 2003, 2006).
However, the RW model has long been known to fail to account for numerous phenomena observed in Pavlovian conditioning, such as sensory preconditioning (Rescorla, 1980; Jones et al., 2012) and revaluation sensitivity (Robinson and Berridge, 2013). To explain such behavioral phenomena, other forms of associations have been suggested, including stimulus-stimulus associations (Pavlov, 1927; Rescorla, 1982). These associations act as “model-based (MB)” Pavlovian learning, in which features about an outcome, such as predictions about its identity, are used to update values, analogous to the MB algorithms utilized in instrumental conditioning (Daw et al., 2005; Gläscher et al., 2010; Howard et al., 2015; Pauli et al., 2019). Broadly speaking, MB learning is a class of reinforcement learning algorithms in which a probabilistic transition structure between environmental states underpins decisions (Sutton and Barto, 2018), which is hypothesized to support Pavlovian conditioning (Dayan and Berridge, 2014).
Learning about stimulus features other than value is theoretically proposed to be mediated by a kind of PE signal known as the state-PE or identity PE (Gläscher et al., 2010; Howard et al., 2015; Boorman et al., 2016; Takahashi et al., 2017; Howard and Kahnt, 2018; Boorman et al., 2021). This type of error encodes discrepancies between an agent’s expectation about which stimulus will occur and which stimulus actually occurs, irrespective of the amount of reward involved. Such error signals have been found to be present in many brain areas in human fMRI studies, including a frontoparietal network encompassing dorsomedial prefrontal cortex, as well as in dopaminergic regions of the midbrain, the orbitofrontal cortex (OFC), and striatum (Gläscher et al., 2010; Rushworth et al., 2011; Lee et al., 2014; O’Doherty et al., 2015; Howard and Kahnt, 2018; Witkowski et al., 2022).
The goal of the present study was to investigate neural representations of Pavlovian associative learning at the cellular level in humans, leveraging high-resolution single neuron recordings. A fundamental gap in previous studies is understanding how MB learning utilizes stimulus–stimulus associations in the context of Pavlovian learning, beyond instrumental conditioning. To accomplish this goal, we asked subjects to perform a sequential appetitive Pavlovian learning task in which visual stimuli and outcomes were paired probabilistically. The sequential task design enabled the identification of MB processes such as the encoding of stimulus–stimulus associations.
We hypothesized that neurons in ventromedial prefrontal cortex (vmPFC) and amygdala might play a role in encoding predictive information about stimulus identity, consistent with a role for these structures in stimulus–stimulus learning, and in MB inference more generally (Jones et al., 2012; Howard et al., 2015; Boorman et al., 2016; Pauli et al., 2019). Furthermore, we tested whether neural correlates of PEs that could underpin learning of stimulus–stimulus associations could be found at the single neuron level.
Materials and Methods
Electrophysiology and recording
We used Behnke–Fried hybrid depth electrodes (AdTech Medical), positioned exclusively according to clinical criteria. Broadband extracellular bipolar recordings were performed with a sampling rate of 32 kHz and a standard bandpass of 0.1–9,000 Hz (ATLAS System, Neuralynx Inc.).
We performed electrode localization with pre- and post-operative MRI scans. We utilized the same pipeline detailed elsewhere (Kyzar et al., 2024), briefly described here. Pre- and post-operative scans were aligned with Freesurfer (Reuter et al., 2010), then a forward mapping from scans to the CIT168 template brain map (Tyszka and Pauli, 2016) was computed with the Advanced Normalization Tools suite (Avants et al., 2008), resulting in a projection of post-operative scans onto standardized MNI152-registered space. Electrode locations were then visualized and mapped in this space with Freesurfer. The data set reported here was obtained bilaterally from hippocampus (HIP), amygdala (AMY), vmPFC, dorsal anterior cingulate cortex (dACC), and pre-supplementary motor area (preSMA) with one macroelectrode on each side. Each macroelectrode contained eight 40 μm diameter microwires. To correct for noise and motion artifacts, recordings were locally referenced, with one of the eight microwires serving as a reference in every bundle, by subtracting its activity from all the other wires in the same bundle. No units were recorded on the reference wire.
vmPFC (including neurons on the medial orbital surface) as well as the amygdala are two brain areas that have long been suggested to play a role in value-based learning and Pavlovian learning in particular (Schoenbaum et al., 1998; Gottfried et al., 2003; Holland and Gallagher, 2004; Hampton et al., 2006; Paton et al., 2006; Ostlund and Balleine, 2007; Kennerley and Wallis, 2009; Salzman and Fusi, 2010; Rudebeck et al., 2017; Wang et al., 2017; Pauli et al., 2019). On the other hand, the hippocampus is a brain region that has been previously implicated in MB inference (Wimmer and Shohamy, 2012; Boorman et al., 2016; Miller et al., 2017). Additionally, the anterior cingulate cortex and preSMA in dorsomedial frontal cortex are regions that have been previously found to prominently encode error signals (Bonini et al., 2014; Fu et al., 2019; Sajad et al., 2019; Fu et al., 2022; Aquino et al., 2023).
Patients
Twelve patients (eight females) were implanted with depth electrodes for seizure monitoring prior to potential surgery for treatment of drug resistant epilepsy. One of the patients performed the task twice, totaling 13 recorded sessions. Human research experimental protocols were approved by the Institutional Review Boards of the California Institute of Technology and the Cedars-Sinai Medical Center. Patients provided informed consent and were reimbursed with a 50 dollar gift card at the end of their participation period. Electrode location was determined based on preoperative structural MRI and post-operative MRI and/or CT scans obtained for each patient as described previously (Minxha et al., 2020). Patient information (sex and age) is displayed in Table 1.
Patient labels, sex, and age for all subjects
Experimental design
Patients performed a sequential Pavlovian conditioning task (Pauli et al., 2015) in which two fractals were presented in sequence, acting as distal and proximal conditioned stimuli (CSd and CSp, respectively). Following the pair of CS (CSd and CSp), a video outcome was presented, either in the form of a hand depositing the patient’s candy of choice into a paper bag, for a positive outcome, or in the form of an empty hand approaching a paper bag, for a neutral outcome. Patients were informed that each time they received a positive outcome contributed to a grand total of actual candy pieces they would receive at the end of the experiment. Every patient experienced exactly 48 win trials and 48 no-win trials. Prior to the task, patients visually inspected and chose one among five possible candy options (Reese’s Pieces, Hershey’s Kisses, York Peppermint, Werther’s Caramel, or Hi-Chew) to be delivered, according to their preference. After the end of the session, they received the closest possible equivalent to 200 calories in their candy of choice to reward them for the 48 win trials they experienced.
Trial structure is detailed in Figure 1a. After a 1 s fixation period, a distal CS was presented for 3 s, followed by a 1 s fixation period. Then, a proximal CS was displayed for 3 s, followed by an outcome presentation video edited to a length of 3 s. Intertrial intervals were jittered between 0.5 and 1 s. The distal CS was always presented along the vertical axis, either above or below the center of the screen, at a random position. Similarly, the proximal CS was always presented in a random position along the horizontal axis, inside one of two possible squares positioned to the left or the right of the screen. There was no relationship between CSd identity and display location, and display positioning was randomly chosen for each trial. As an attention check, patients were asked to perform a button press whenever they saw a CSp inside one of the squares, reporting which of the two squares it was. To ensure this remained a purely Pavlovian paradigm, patients were instructed that these button presses did not affect the outcome of the trial in any way. Button presses were collected with a CEDRUS button box.
Pavlovian conditioning task and behavior. a, Trial structure. After a fixation period, patients saw a sequence of two CS, distal and proximal, with a 1 s fixation period in between them. Then, outcomes were presented: for positive outcomes, a video of a hand depositing a piece of candy in a bag; for neutral outcomes, an empty hand approaching a bag. b, Rating screen. Before the task and between blocks, patients used a sliding bar to rate fractal stimuli according to their subjective preference, from extremely unpleasant to extremely pleasant. c, Electrode positioning. Each dot indicates the location of a microwire bundle in amygdala (magenta), hippocampus (yellow), preSMA (red), dACC (blue), or vmPFC (green). d, Trial types. Stimuli transitioned from distal to proximal according to a common/rare probabilistic structure. The same two fractals were used as CSp throughout the entire task, while new fractals were picked as CSd in every block. CSd+/CSdn denote the distal stimuli associated with rewards or neutral outcomes in common transition trials, respectively. Proximal stimuli were fully predictive of outcomes. e, Pupil diameter change at CSp. We compared pupil diameter during CSp presentation with a baseline period, for each trial type, testing whether expectations and PEs interacted in determining conditioned pupil responses. Error bars represent SEM. The largest pupil dilation was elicited in rare transition trials from neutral CSd to positive CSp. f, Changes in stimulus ratings, grouping distal and proximal. After each block, patients rated stimuli for their subjective preference. We compared how ratings changed for each fractal compared to its previous value, depending on whether they were a positive or neutral CS in that block.
Each session contained a total of 96 trials, split into four blocks of 24 trials, with a 2 min break between blocks. Within each block, there were four possible CS, two distal (A, B) and two proximal (X, Y). The identity of the distal stimuli was re-selected in every block, but the two proximal stimuli were kept the same throughout the entire session, though their valences were reversed between blocks. There were four trial types in total (Fig. 1c): two common transitions (A-X-Reward and B-Y-No reward) and two rare transitions (A-Y-No reward and B-X-Reward), meaning that the CSp was always fully predictive of the outcome, while the CSd was only partially predictive. Trial frequencies were
The 10 fractals used in each session (two CSd fractals for each block and two CSp fractals for the whole session) were chosen specifically for each patient out of a set of 24 possible fractals using the following procedure: before the beginning of the task, patients rated all 24 fractals for their subjective preference using a sliding bar from extremely unpleasant to extremely pleasant. The 10 stimuli which elicited the most neutral responses were selected for the experiment and assigned randomly to their CSd or CSp roles. All the 24 fractals had the same mean luminance. We enforced this by enforcing equal average red-green-blue (RGB) values across pixels for all fractals, along the (0,0,0) to (255,255,255) spectrum. At the end of every block, patients rated the four fractals they experienced for how pleasant they were. We z-scored ratings for each patient and used aggregate stimulus rating changes across patients as a metric of Pavlovian conditioning.
Eye tracking
We tracked patients’ pupil diameter during the task as a candidate conditioning metric (Pauli et al., 2015, 2019; Pool et al., 2019). We used an EyeLink 1000 system (SR Research Ltd.) attached to the bottom of the task screen as described in Minxha et al. (2017) and Wang et al. (2019). We calibrated the system using EyeLink’s five point calibration before the session and between blocks (average calibration error = 0.40° ± 0.17°). Patients were instructed to fixate during fixation periods but were free to gaze anywhere in the screen otherwise. Pupil data were preprocessed to remove blinks and outlier points further than 5 s.d. from the mean diameter. We utilized the proprietary automated blink detection method provided by EyeLink 1000 (SR Research Ltd.), which finds time points with missing pupil data and tags the beginning and end of blink events. We interpolated missing values removed in this way with the closest previous value and then filtered data with a 50 ms moving average window. Pupil diameters were normalized in every trial relative to the initial 1 s fixation period, using the average diameter in that period as a baseline. Statistical analyses were then performed using average diameter changes in the 0.5–3 s time windows after CSd and CSp presentations.
Computational model of learning
We used a normative MB model to obtain estimates of how patients encoded transition probabilities between task states, stimulus expected values (EVs), and unsigned prediction errors (uPEs). We adapted a model used for a sequential instrumental task (Gläscher et al., 2010) to estimate a matrix T(s, s′) for the transition probabilities from start states s to end states s′. Given distal states (A, B), proximal states (X, Y), and binary outcome states (reward, value = 1, R; no reward, value = 0, N), we defined the transition matrix T with transition probabilities tss′ as
Finally, the EV of each state could be computed assuming the values of arriving into the reward and no reward states were 1 and 0, respectively:
MB regressors in an example session. a, Time course for proximal (blue) and distal (orange) uPE. b, Same for EV. c, Same for EVs tied to each of the two specific CSp identity.
We chose a normative MB learning mechanism for analyzing the data because this model allowed us to estimate the presumed identity of subsequent stimuli in the associative chain, thereby allowing us to probe stimulus–stimulus mappings in the task. It is important to note that a model-free temporal difference (TD) model (Sutton, 1988) could be used instead to model these data, and that such a model generates highly similar time-series for value signals as we used in the behavioral and neural analyses (though it would not allow us to make predictions about stimulus identities). Using such a TD-like model, a signed reward prediction error (RPE) can be estimated, and the absolute value (unsigned) of this error signal also strongly correlates with uPE signals used in the analysis. While we implemented such a TD model alongside the MB model and used it to analyze the data, we do not report on the results of this analysis because it produces highly similar results about value encoding as reported using the existing MB model, and because this TD model does not allow us to test for stimulus identity representations.
Neural data pre-processing
We performed spike detection and sorting with the semiautomatic template-matching algorithm OSort, utilizing the energy detection method, with energy signal thresholds from 4.5 to 5 standard deviation units. This process is described elsewhere (Rutishauser et al., 2006), and we reproduce some key components here for brevity. After OSort provided automated spike detection and putative cluster formation, we manually determined which clusters were valid putative neural units, by considering the following criteria: absence of contamination by 60 Hz power noise in waveform power spectra, waveform stability in width and amplitude, and absence of refractory period violation (less than
Across all 13 sessions, we obtained 86 vmPFC, 165 amygdala, 119 hippocampus, 137 preSMA, and 103 dACC putative single units (610 total). We refer to these isolated putative single units as “neuron” and “cell” interchangeably.
Single neuron encoding analysis
To quantify how single neuron activity correlated with several variables of interest, we performed a negative binomial generalized linear model (GLM) analysis. As a dependent variable, we measured spike counts in three time windows: CSd window, from 0.25 to 3 s after CSd presentation; CSp window, from 0.25 to 3 s after CSp presentation; outcome window, from 0.25 to 2.25 s after outcome presentation. Table 2 includes all dependent variables and the time windows in which they were tested. We defined RPEs as the difference between proximal EV at the time of outcome and the outcome valence itself.
Dependent variables and respective time windows for negative binomial GLM and population decoding analyses
Population decoding analysis
We performed population decoding analyses by training a linear support vector machine (SVM) with MATLAB’s function fitcsvm for binary classes, or fitecoc for more classes. In each session, we defined a population activity matrix X of dimensions (nTrials, nNeurons) by counting spikes in each trial within the same time windows from the encoding analysis. We then defined the decoded variable y as the same dependent variables from Table 2. To reduce the decoding problem to a classification task, we binned the continuous regressors (EV and uPE) into three tertiles before training the SVM with MATLAB’s multiclass function fitcecoc. Cross-validation was performed by training on two trial blocks and testing on the two remaining trial blocks of each session. Since there were four blocks in total, we repeated the procedure for every possible combination of train/test blocks, resulting in six cross-validation folds. Test accuracies were averaged across folds and reported separately for each session. To obtain test accuracy significance levels, we repeated this entire procedure 500 times while shuffling the decoded variable y and compare the null mean test accuracy obtained in this manner with the true test accuracy, for each brain region. Finally, we corrected significance thresholds for the number of tested brain areas. For this analysis, we excluded one session, which contained only one block of trials. For comparisons between GLM encoding analyses and SVM decoding analyses, we utilized the SVM beta values in binary classes or the average absolute beta values across multiple binary learners for multiple class decoding.
Cross-correlation analysis
To measure how neural activity across brain areas of interest was correlated, we performed a spike–spike cross-correlation analysis, with shuffle correction (Brody, 1999). First, we divided all trials in EV or uPE tertiles, excluding the middle tertile to obtain low EV or uPE and high EV or uPE trial groups. Then, for each trial group, we computed cross-correlations for each neuron pair containing neurons A and B recorded from the same session, but different brain areas (e.g., each neuron pair contained one vmPFC neuron vs one amygdala neuron).
For two spike trains
Finally, the reported shuffle-corrected correlation results were obtained by averaging the V vectors across all neuron pairs. To summarize the correlation results, we obtained the integrals from the positive and negative time lag regions and performed an ANOVA with EV or uPE level (low vs high) and time lag sign (which region leads) as factors.
Statistical analysis
Binomial test
After labeling each neuron as coding for a variable of interest, a way to determine whether neuron counts in each brain area were larger than expected by chance were binomial tests on the number of significant neurons, relative to the size of the tested population of each brain area. Concretely, assuming a false positive rate of
Jaccard index test
After performing negative binomial GLM encoding analyses, we tested whether the subpopulations of neurons which were sensitive to two variables of interest had significant overlap. For this, we computed the Jaccard index (Jaccard, 1912) of overlap between neurons sensitive to each of the variables X and Y, where NX and NY indicate the number of neurons sensitive to the variables X and Y, respectively, and NX,Y indicates the number of neurons concurrently sensitive to both variables:
Results
Behavioral evidence of Pavlovian conditioning
We recorded 87 vmPFC, 166 AMY, 127 HIP, 112 dACC, and 138 preSMA single neurons (630 total) in 13 sessions from 12 patients implanted with hybrid macro/micro electrodes for epilepsy monitoring (Fig. 1c). Neuron counts per brain area for each patient are displayed in Table 3. Patients performed a sequential Pavlovian conditioning task (Fig. 1a) with two CS in the form of fractal images: distal (CSd), followed by proximal (CSp). Here, proximal and distal denote the temporal sequence of stimuli, without any spatial connotation. CS were then followed by an outcome, which could be rewarding or neutral (Pauli et al., 2015, 2019). Outcomes were delivered in the form of videos, either of a hand depositing a piece of candy in a bag or of an empty hand approaching a bag. Patients were told that every display of the rewarding video contributed partially to the amount of real candy they would be given after the end of the session. Patients were asked to pay attention to CS identities as they would be predictive of rewards. Subjects were asked to indicate by button which side of the screen the CSp stimulus was shown to verify attention, but were informed that their accuracy or speed of doing so did not influence the outcome of the trial in any way. In each of the four blocks, a CSd/CSp pair would be more likely associated with the reward, and were therefore defined as CSd+/CSp+ (see Materials and Methods for task details), according to a common/rare transition structure (Fig. 1d). Conversely, the other CSd/CSp pair was more likely to precede the neutral outcome, and are referred to as CSdn/CSpn.
Neuron count per brain area for each patient
To characterize value learning and stimulus predictions in this task, we fit a normative MB transition matrix model (see Materials and Methods for model details) to infer EV, uPEs, and transition probabilities on a trial by trial basis, for each session (Fig. 1e). With these transition probabilities, we could also estimate which CSp was most likely to follow a CSd in each trial, which we refer to as CSp presumed identity. To do so, we utilized the inferred transition probability matrix to determine which state was most likely to come next in the trial, given the current state. In each block, six states were possible: two distal stimuli, two proximal stimuli, and two possible outcomes (reward or neutral). Transitions from distal to proximal stimuli followed a common–rare transition probability structure of
Subjective preference ratings for all fractal images were obtained in the beginning of the task. Additionally, between blocks, we asked patients to re-rate the fractals that were included in the previous block to obtain measures of changes in subjective preference as a function of patients’ experience in the task (Fig. 1b). When grouping CSd and CSp together, we observed that stimuli used as CS+ were rated significantly higher than stimuli used as neutral conditioned stimuli (CSn) (p = 0.02, one sided t-test; Fig. 1f). We also tested whether distal and proximal stimuli had their ratings change by a different amount by contrasting absolute rating changes for (CSd+,CSdn) versus (CSp+,CSpn), and found no significant difference for distal versus proximal stimuli (p = 0.16, two-tailed t-test). Finally, we found that initial ratings for stimuli which would be used as CSd+ and CSdn did not significantly differ (p = 0.91, t-test), further indicating stimulus rating changes were a consequence of experiencing the task. We did not find an effect of conditioning over response times (RTs) for the button press during proximal stimulus presentation, after testing for a correlation between RTs and proximal EVs (p = 0.96, linear regression), and between RTs and uPEs at proximal stimulus presentation (p = 0.36, linear regression).
We performed eye tracking simultaneously with single neuron recordings to further assess the success of conditioning (see Methods). Pupil diameter was analyzed in two distinct time windows: during CSd presentation and CSp presentation (see Materials and Methods for details on pupil analysis). We obtained the average pupil diameter change within these periods, relative to a baseline, and tested whether they correlated with model covariates with a linear mixed effects model, with session number as a random effect. Specifically, the model for pupil diameter during CSd presentation included the EV of the CSd, while the model for pupil diameter during CSp presentation included CSp EV, uPE, and an interaction term between CSp EV and uPE. We found no effect for CSd EV in the first model (p = 0.30). In the second model (Fig. 1e), there was no significant effect of CSp EV (p = 0.07) nor uPE (p = 0.06), but there was a significant interaction between CSp EV and uPE (p = 0.02). This result indicates that pupil diameter correlated with a combination of computational factors inferred from the MB framework. A similar interaction was previously observed in a Pavlovian conditioning paradigm performed in a neurotypical population (Pauli et al., 2015).
Overall, the aggregate behavioral evidence from changes in subjective stimulus ratings and pupil sensitivity to an interaction of EV and uPE indicates collective evidence of Pavlovian conditioning across our subject sample.
Predictive identity coding in vmPFC
We next investigated whether firing rates in individual neurons correlated with task variables and the estimated computational model-derived covariates using a negative binomial GLM analysis (see Materials and Methods for details). We obtained spike counts for each neuron in the time windows that were relevant for each regressed variable (e.g., counting spikes during outcome presentation for regressing outcomes). After obtaining the number of neurons significant for a given criterion, we tested whether the number of significant neurons in each brain region was more than expected by chance with a binomial test (Bonferroni corrected for the number of tested brain areas).
We first regressed the CSp presumed identity (i.e., the most likely proximal stimulus identity) with the firing rate of neurons at the time of CSd presentation (Fig. 3a) to test whether the most likely identity of the next presented stimulus was already encoded by neurons at distal time. A total of
Predictive identity coding in vmPFC. a, T-scores for every neuron in each brain area for a GLM predicting spike counts during CSd presentation with the presumed identity of the CSp as a regressor. Red dots indicate significant neurons and stars indicate significance across the entire region, corrected across areas. b, Time course of normalized firing rates in vmPFC neurons which coded stimulus identities predictively. Trials are separated between preferred (black) versus nonpreferred (blue) stimulus identities, and red dots indicate time points with a significant difference between trajectories. c, vmPFC neuron whose activity during distal presentation correlates with the presumed identity of the proximal stimulus (blue: first CSp; black: second CSp). Top: raster plot; bottom: peristimulus time histogram (PSTH). d, Decoding accuracy for presumed proximal identity during distal presentation. Each dot indicates accuracy in one session, stars indicate significance across sessions with a bootstrapped null distribution, corrected across areas. Bars and dashed lines indicate standard error and chance level, respectively. e, SVM linear versus negative GLM weights for predictive identity coding in each vmPFC neuron.
To determine whether the activity of CSp presumed identity neurons in vmPFC also correlated with the identity of the distal stimulus, we pre-selected neurons that were classified as sensitive in the previous analysis and tested whether activity in this neuron cohort correlated with distal stimulus identity by performing an ANOVA with eight categories (for the two possible CSd identities in each of the four blocks), utilizing z-scored spike counts within the CSd presentation period as the analyzed variable. In vmPFC, we found that only 1/10 CSp predictive neurons also correlated with CSd identity, indicating a predominantly distinct code between these two variables.
For vmPFC neurons whose response signaled predictive identity coding, we summarized their normalized firing rates over time, separated by trials containing their preferred versus nonpreferred identity (Fig. 3b). For each time point, we tested whether preferred versus nonpreferred firing rates were different, and found that they first differed 0.73 s after CSd onset (p < 0.05, t-test, uncorrected). An example neuron performing this type of encoding is shown in Figure 3c. We also regressed the EV of the presumed CSp at distal time and the actual CSp identity at proximal time with firing rate but did not find a significant neuron count in any region (p > 0.05 for all regions, binomial test).
We next turned to population-level analysis to investigate if the joint activity patterns from all neurons recorded simultaneously in a given brain area were predictive of the variables of interest. All population-level decoding analysis throughout this paper is performed session-by-session, only utilizing neurons that were recorded simultaneously. We performed cross-validated population decoding analysis with a linear SVM, obtaining significance levels with a bootstrapped null distribution (see Materials and Methods for details). We found that CSp presumed identity could be significantly decoded at distal time in vmPFC (p < 0.01, permutation test), in consonance with the single neuron selection result discussed above (Fig. 3d). Decoding results per stimulus for each patient are displayed in Table 4. We also found that the linear SVM weights for vmPFC neurons correlate (Pearson correlation test, ρ = 0.79, p < 0.001) with the negative binomial GLM weights from the encoding analysis, indicating a correspondence between these two modalities of analysis (Fig. 3e). These results further establish vmPFC neural activity as a substrate for predictive coding during learning of stimulus–stimulus associations in Pavlovian conditioning.
Decoding accuracy in vmPFC neurons for presumed stimulus identity, for each individual stimulus
Medial frontal cortex and hippocampal neurons encode Pavlovian unsigned error signals
At the population level, uPEs were decodable at the time of outcome in both dACC and preSMA (both p < 0.01, permutation test; Fig. 4a). Additionally, in the encoding analysis of individual neurons, a significant number of neurons coding for uPEs at outcome were present in amygdala (
uPE encoding. a, Decoding accuracy for uPE during outcome presentation. Each dot indicates accuracy in one session, stars indicate significance across sessions with a bootstrapped null distribution, corrected across areas. Bars and dashed lines indicate standard error and chance level, respectively. b, T-scores for every neuron in each brain area, for a GLM predicting spike counts during outcome presentation with uPE as a regressor. Red dots indicate significant neurons and stars indicate significance across the entire region, corrected across areas. c, Same for signed PE. d, Time course of normalized firing rates in amygdala neurons which coded outcomes. Trials are separated between preferred (black) versus nonpreferred (blue) outcomes, and red dots indicate time points with a significant difference between trajectories. e–g, Same for hippocampus, dACC, and preSMA, respectively. h, Spike–spike cross-correlation between dACC and preSMA neurons recorded in the same sessions. Correlograms were computed separately by the level of uPE (red: low; black: high). i, Correlogram integrals by the level of uPE (low or high). Integrals were computed as a summary metric, separately in the positive (preSMA leads, yellow) and negative (dACC leads, blue) time lag regions. j–k, Correlation between absolute SVM decoding weights and GLM T-scores for uPE in dACC and preSMA, respectively.
We found that absolute linear SVM weights significantly correlate with the negative binomial GLM weights from the encoding analysis which included uPE signals in dACC (Pearson correlation test, ρ = 0.28, p < 0.01) but not in preSMA (Pearson correlation test, ρ = 0.07, p = 0.37), indicating a slight discrepancy in the weight of each individual neuron during encoding and decoding in one of the tested brain areas.
In order to determine whether neurons responding to these uPE signals might be better explained by a signed PE signal, we also included a regressor corresponding to the value of the outcome minus the EV at outcome time, corresponding directly to a signed PE signal. When including a regressor carrying this signal in the same neural analysis from which the uPEs were found, we found no significant evidence for neural encoding of signed PEs in either ACC or preSMA (or elsewhere; Fig. 4c). Qualitatively similar results were obtained even if generating the signed PEs via a standard TD model (not shown). We summarized the normalized firing rates for uPE coding neurons in amygdala, dACC, preSMA and hippocampus (Fig. 4d–g) and determined that the first times in which preferred versus nonpreferred uPE activity differed in each region were 0.98 s, 0.81 s, 1.08 s, and 0.47 s, respectively (p < 0.05, t-test).
Unsigned error signals do not correspond to general state-PEs
SPE signals are hypothesized to play a key role in learning of stimulus–stimulus associations (so as to facilitate learning a state-space model necessary for MB inference). We therefore next set out to test whether the uPE signals described above correspond to state-PEs as defined by the MB agent. A SPE signal should not only signal unexpected states occurring at the time of outcome receipt (where encountering an unexpected outcome can be considered to be an unexpected state) but also during presentations of unexpected proximal stimuli. Because associations between distal and proximal stimuli were varied across blocks, our experimental design elicited variance in state-PE signals at the time of proximal stimulus delivery. We could therefore test whether the same neurons responding to the uPE signal at the time of outcome were also responding to unexpected stimuli at the time of the proximal stimulus. We tested for neurons correlating with unsigned error signals generated by the model at the time of proximal stimulus delivery, utilizing a GLM which included CSp identity, EV, uPE, and button press as covariates. We found no significant neuron count in any brain area (p > 0.05 for all brain areas, binomial test), and we only found decoding evidence for a population of neurons coding for SPE-like signals at the time of proximal stimulus in the hippocampus (p < 0.05 for all brain areas, permutation test; Fig. 5). We also ruled out the possibility that uPE signals correlated with trial number within the task. Due to the reversals designed into CSp identity across blocks, these two variables did not correlate with Pearson’s ρ = −0.02 (p = 0.46, Pearson correlation test). Additionally, we found that no brain area had a significant number of neurons which correlated with the identity of the button press (p > 0.05 for all brain areas, binomial test).
uPE encoding at proximal time. a, Decoding accuracy for uPE during proximal stimulus presentation. Each dot indicates accuracy in one session, stars indicate significance across sessions with a bootstrapped null distribution, corrected across areas. Bars and dashed lines indicate standard error and chance level, respectively. b, T-scores for every neuron in each brain area, for a GLM predicting spike counts during proximal stimulus presentation with uPE as a regressor. Red dots indicate significant neurons and stars indicate significance across the entire region, corrected across areas.
To check for possible changes in activity in response to button presses, we subdivided the CSp periods into 1 s before and 1s after button presses, aligned to the button press. Utilizing these two periods, we tested whether the spike counts of each neuron within the CSp period significantly differed before and after button presses (two-sample Wilcoxon rank-sum test, two-sided). We found that
Thus, the uPE signals we observed in the dACC and preSMA appear to be unique to the time of outcome delivery, suggesting that their response profiles are more specific than would be expected for a general state-PE signal. Instead, these signals appear to be specific to salient biologically relevant outcomes (such as visual cues signaling food rewards or their absence) that are unexpected, and respond little to stimuli that have no established value such as the fractal stimuli presented at the proximal stimulus time. In the hippocampus, results were less conclusive, given the positive results in uPE encoding at outcome and uPE decoding at the time of proximal stimulus presentation, followed by null results in uPE decoding at outcome and uPE encoding at proximal time. Such a discrepancy between single neuron encoding results and population decoding results can be the result of a distributed code or the presence of several weakly coding neurons (Aquino et al., 2020; Stefanini et al., 2020).
Cross-correlations in medial frontal cortex neuronal pairs
To further investigate the dynamics of the two regions (dACC and preSMA) that simultaneously displayed a PE signal during the outcome phase, we examined temporal cross-correlations between neural activity in each region. While these two areas had a significant number of neurons correlating with PE signals, it was possible in theory that the dynamics of PE coding across regions was uncorrelated, or that one region consistently displayed spikes which led the other region temporally. Specifically, we tested the hypothesis that cross-correlations for dACC–preSMA neuron pairs were modulated by uPEs in the outcome time period. For this, we split all trials into three uPE tertiles and defined the first and third tertiles as low and high unsigned error trials, respectively, discarding the middle tertile. Then, for the low and high unsigned error trials, we separately computed spike–spike cross-correlations for all simultaneously recorded dACC–preSMA neuron pairs (see Materials and Methods for details; Fig. 4f). Finally, to summarize our results, we computed cross-correlogram integrals, separately for the positive and negative time lag periods. A peak in the positive time lag region indicates that preSMA spikes preceded dACC spikes more often, whereas a peak in the negative time lag region indicates that dACC spikes preceded preSMA spikes more often.
We performed an ANOVA on the correlogram integrals, with leading region and uPE level as factors, and obtained a significant effect of uPE level (p = 0.005; Fig. 4g), but no significant effect for leading region nor an interaction. Taken together, these results shed light on the relationship between preSMA and dACC in PE coding during Pavlovian conditioning. In summary, both regions contained predictive information for uPE, and the degree to which dACC and preSMA spikes were aligned in time correlated with the degree of uPE, with no directional preference between the two regions. One possible interpretation of this result is that uPE signals are computed elsewhere in the brain and influence activity in these two areas within a similar time scale.
EV coding in vmPFC and amygdala
Next we performed a population decoding analysis for EVs and found significant decoding of proximal EVs in vmPFC and amygdala neural populations (vmPFC: p < 0.05; amygdala: p < 0.01, permutation test; Fig. 6a). Unlike in population decoding, we did not find a significant neural count for EV coding in any of the brain areas. Nevertheless, given that EV information was present in a distributed code through both neural populations, we tested whether correlations across vmPFC-amygdala neuron pairs were modulated by the level of EV in each trial. Excluding the middle tertile of EV trials, we computed shuffle-corrected correlograms for low and high EV trials (Fig. 6b) and obtained summary correlogram statistics by obtaining their integrals (Fig. 6c).
EV coding in vmPFC and amygdala. a, Decoding accuracy for EV proximal stimulus. Each dot indicates accuracy in one session, stars indicate significance across sessions with a bootstrapped null distribution, corrected across areas. Bars and dashed lines indicate standard error and chance level, respectively. b, Spike–spike cross-correlation between vmPFC and amygdala neurons recorded in the same sessions. Correlograms were computed separately by the level of EV (red: low; black: high). c, Correlogram integrals by the level of EV (low or high). Integrals were computed as a summary metric, separately in the positive (amygdala leads, yellow) and negative (vmPFC leads, blue) time lag regions.
Given that vmPFC neurons predictively encoded both stimulus identities and EVs, we next tested whether there was a significant overlap between these two vmPFC neuron subgroups. We found no significant overlap between EV coding neurons and presumed identity coding neurons in vmPFC (p = 0.171, Jaccard overlap test), indicating that predictive identity decoding and outcome decoding in vmPFC are independent subpopulations in this dataset.
We found that the average cross-correlogram peaked at +60 ms (amygdala leading) for low EV trials and peaked at −0.127 ms (vmPFC leading) for high EV trials, indicating an inversion in the most likely leading region in spike correlations as a function of EV level (Fig. 6b). Additionally, an ANOVA on correlogram integrals, with leading region and EV level as factors, revealed a significant interaction between these two factors (p = 0.01; Fig. 6c). Taken together, these results shed light on the relationship between amygdala and vmPFC in predictive value coding during Pavlovian conditioning. While both regions contained predictive information for value, which region’s spikes tended to precede the other’s was determined by the expected valence of the outcome. This finding suggests the possibility, in principle, that values learned through Pavlovian conditioning have the ability to dynamically alter connectivity patterns between amygdala and vmPFC, regions which have been shown to selectively code value-based cues (Schoenbaum et al., 1998; Kennerley and Wallis, 2009).
Pavlovian outcome coding
At the time of the outcome, we did not find a significant proportion of single neurons correlated with outcome (reward vs neutral) in any region (p > 0.05, binomial test; Fig. 7a). An example dACC neuron which coded outcomes is shown in Figure 7b. At the population level, we found significant decoding of outcomes in dACC (p < 0.001, permutation test; Fig. 7c). Furthermore, outcome was also decodable from the hippocampus (p < 0.05, permutation test) and preSMA (p < 0.05, permutation test), highlighting a widespread population-level representation of outcome valence.
Outcome encoding. a, T-scores for every neuron in each brain area, for a GLM predicting spike counts during outcome presentation with outcome as a regressor. Red dots indicate significant neurons and stars indicate significance across the entire region, corrected across areas. b, dACC neuron whose activity correlates with outcome (blue: no reward; black: reward). Top: raster plot; bottom: PSTH. c, Decoding accuracy for outcome during outcome presentation. Each dot indicates accuracy in one session, stars indicate significance across sessions with a bootstrapped null distribution, corrected across areas. Bars and dashed lines indicate standard error and chance level, respectively.
Discussion
In this study, we probed for neural representations of associations and learning signals in the human brain during Pavlovian conditioning at the level of single neurons. We hypothesized that neurons in the vmPFC would be involved in prospective identity coding, on account of prior evidence about a role for rodent OFC neurons in identity coding, as well as fMRI results from human OFC (Schoenbaum et al., 1998; Gottfried et al., 2003; Ostlund and Balleine, 2007; Kennerley and Wallis, 2009; Pauli et al., 2019). We found that neurons in the human vmPFC did indeed encode information about subsequent stimuli that are expected to occur, suggesting that neurons in this area are involved in encoding stimulus–stimulus relationships. The finding that vmPFC neurons (including a subset of neurons in the medial OFC) are involved in encoding stimulus–stimulus associations is compatible with a theoretical proposal implicating the OFC in facilitating a cognitive map of task contingencies (Wilson et al., 2014). In the present study, we tested for neurons encoding representations of the most likely stimulus to be subsequently presented given the current associative structure, which could be considered a relatively simple form of inference over states. This type of stimulus–stimulus identity coding during Pavlovian conditioning could form part of the neural substrates for MB Pavlovian conditioning (Prévost et al., 2013; Dayan and Berridge, 2014; Pauli et al., 2019).
We also aimed to test for neurons coding for state-PEs and found individual neurons and population-level encoding of an unsigned error signal in the amygdala, hippocampus, preSMA, and dACC at the time of outcome receipt. Note that the present study does not focus on direct functional comparisons between brain areas or make any claims interpreting the absence of signals, given our limited sampling within different brain areas. Importantly, a state PE signal should also respond to unexpected stimuli even if they do not have strong affective significance, such as the fractal stimuli presented in our sequential Pavlovian paradigm. An unexpected presentation or omission of a particular stimulus in the sequential chain should elicit a state-PE response. While we found robust state-PE responses at the time of outcome delivery, these same neurons did not respond to unexpected stimulus presentations or omissions, nor did we find evidence for a separate population of neurons responding to unexpected non-outcome stimuli. Thus, the unsigned error signal we found appears not to meet the criteria necessary to be deemed a state-PE as defined by the theoretical model. However, we found a separate population of neurons in dACC responding instead to outcome valence. These outcome-specific unsigned prediction neurons appear to exhibit similar properties to a subset of dopaminergic neurons found to respond to unexpected outcomes irrespective of their valence (Matsumoto and Hikosaka, 2009). One possible explanation for these error signals is that they play a specific role in mediating learning of stimulus–outcome associations where outcome properties are features such as outcome identity, but they are not involved in more general processes pertaining to learning of a cognitive map.
Intriguingly, we found these unsigned outcome-specific error signals in the medial temporal lobe as well as in dorsomedial prefrontal cortex. It is also notable that we could also decode error signals at the time of proximal stimulus presentation relating to how unexpected that particular stimulus was, albeit this result should be viewed as tentative given it was only weakly significant. The finding of a role for hippocampal neurons in encoding errors associated with learning of stimulus–outcome relationships and potentially even in stimulus–stimulus associations adds further insight into the putative role of the hippocampus in Pavlovian conditioning, as well as in MB inference more generally (Wimmer and Shohamy, 2012; Boorman et al., 2016; Miller et al., 2017). One hypothesis is that these hippocampal error neurons play a general role in facilitating predictive learning about stimulus–stimulus associations as part of a cognitive map, even as they are encountered passively through Pavlovian learning.
Our findings also shed unique light on the functions of dorsomedial prefrontal cortex in error monitoring. There is considerable evidence to suggest that neurons in the dorsomedial prefrontal cortex are involved in error signaling during or shortly after the active performance of motor responses and involved in particular in signaling when an incorrect response has occurred (Bonini et al., 2014; Fu et al., 2019; Sajad et al., 2019; Fu et al., 2022). Crucially, the present study reveals that neurons in dmPFC also track error signals even when no motor responses are being actively performed by the participant to obtain the outcome. A number of theories of dmPFC function in error monitoring have instead emphasized a role for the RPE signal carried by the dopaminergic system in driving error-related activity changes in these areas, especially the error-related negativity detected in scalp EEG (Falkenstein et al., 1991; Gehring et al., 1993; Holroyd and Coles, 2002; Godlove et al., 2011). Such a reward error signal would signal a deviation in expected reward obtained, as opposed to depicting errors in responses performed per se.
In addition to finding neuronal signals related to identity coding and unsigned error coding that are orthogonal to value, we also found evidence of a role for neurons in the vmPFC and amygdala in encoding stimulus–value at the population level. Importantly, neurons in OFC and amygdala have been shown to respond selectively in anticipation of rewarding or aversive outcomes in rodents and monkeys (Schoenbaum et al., 1998; Kennerley and Wallis, 2009), with a role for human OFC and medial prefrontal cortex encoding reward-related computations, as measured by invasive recordings (Saez et al., 2018; Domenech et al., 2020; Marciano et al., 2023), and a role for human amygdala in coding salient and unexpected events (Zheng et al., 2017; Tzovara et al., 2024). Recent invasive work also pointed toward RPE coding in human dorsomedial prefrontal cortex (Hoy et al., 2023). Additionally, BOLD responses in humans have been found to correlate with EV across an array of paradigms including Pavlovian appetitive conditioning (Hampton et al., 2006; Tobler et al., 2006; Rushworth and Behrens, 2008; Kahnt et al., 2010; O’Doherty et al., 2017; Pauli et al., 2019). Above and beyond its aforementioned role in predictive value coding in tandem with OFC, the amygdala has been shown to track EVs during decision making (Gottfried et al., 2003; Holland and Gallagher, 2004; Hampton et al., 2006; Salzman and Fusi, 2010; Rudebeck et al., 2017; Wang et al., 2017; Aquino et al., 2020).
Importantly, recent opto- and chemogenetic circuit dissection studies in rodents have demonstrated that bidirectional amygdala-OFC interactions are critical for both encoding and retrieval of reward value, and for the construction of cue-triggered reward expectations (Lichtenberg et al., 2017; Malvaez et al., 2019). Our findings provide preliminary evidence in support of a similar network-level substrate in humans. We investigated whether spiking activity in simultaneously recorded vmPFC and amygdala neuron pairs correlated in time, and found that cross-correlation patterns were modulated by the EV of the CS. Specifically, amygdala neuron spikes tended to precede vmPFC spikes when patients were exposed to a cue which predicted a low value outcome, while the opposite occurred for high value cues. These findings provide further insight into the nature of the interactions between these regions and are congruent with prior literature suggesting a critical role for interactions between the amygdala and OFC in value computations. For instance, a lack of healthy amygdala input induces significant changes to vmPFC activity during reward learning and decision making (Hampton et al., 2007) and impairs the formation of neural ensembles in OFC to represent new contingencies during reversal learning (Schoenbaum et al., 2003). When taken together with this prior literature, our results are compatible with converging evidence positioning amygdala as a center for predictive value coding acting in consonance with OFC, which creates associations between stimulus identities and outcomes. Still, these results should be interpreted with caution given the relatively low count of significant EV neurons found in the amygdala.
Broadly, our findings provide a glimpse into how Pavlovian conditioning takes place in human prefrontal cortex neurons as well as in the amygdala and hippocampus, adding to an electrophysiology literature in other species as well as human fMRI studies. We provide unique evidence that human vmPFC neurons encode representations of predictive identity, consistent with previous findings from BOLD fMRI and electrophysiological results in rodents. Moreover, we found evidence that neurons in dorsomedial prefrontal cortex such as the preSMA and dACC and the hippocampus encode error signals related to unexpected outcomes that could be used to facilitate learning of stimulus–outcome associations involving features other than value.
Data Availability
Behavioral and neural data are available at https://osf.io/5rv8f/.
Code Availability
Analysis scripts are available at https://github.com/43technetium/pavlovianTaskAnalysis.
Footnotes
We would like to thank the members of the O’Doherty and Rutishauser labs for discussions and feedback. We thank all subjects and their families for their participation, as well as nurses and medical staff for their work. This work was supported by National Institutes of Health Grants R01DA040011 and R01MH111425 (J.P.O.), R01MH110831 and U01NS117839 (U.R.), and P50MH094258 (J.P.O. and U.R.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
↵*Joint senior authors.
The authors declare no competing financial interests.
T. G. A.'s present address: Biomedical Engineering, Columbia University, New York, New York 10027.
- Correspondence should be addressed to Tomas G. Aquino at tg2863{at}columbia.edu.