Abstract
The anterior cingulate cortex (ACC) is believed to be involved in many cognitive processes, including linking goals to actions and tracking decision-relevant contextual information. ACC neurons robustly encode expected outcomes, but how this relates to putative functions of ACC remains unknown. Here, we approach this question from the perspective of population codes by analyzing neural spiking data in the ventral and dorsal banks of the ACC in two male monkeys trained to perform a stimulus-motor mapping task to earn rewards or avoid losses. We found that neural populations favor a low-dimensional representational geometry that emphasizes the valence of potential outcomes while also facilitating the independent, abstract representation of multiple task-relevant variables. Valence encoding persisted throughout the trial, and realized outcomes were primarily encoded in a relative sense, such that cue valence acted as a context for outcome encoding. This suggests that the population coding we observe could be a mechanism that allows feedback to be interpreted in a context-dependent manner. Together, our results point to a prominent role for ACC in context setting and relative interpretation of outcomes, facilitated by abstract, or untangled, representations of task variables.
SIGNIFICANCE STATEMENT The ability to interpret events in light of the current context is a critical facet of higher-order cognition. The ACC is suggested to be important for tracking contextual information, whereas alternate views hold that its function is more related to the motor system and linking goals to appropriate actions. We evaluated these possibilities by analyzing geometric properties of neural population activity in monkey ACC when contexts were determined by the valence of potential outcomes and found that this information was represented as a dominant, abstract concept. Ensuing outcomes were then coded relative to these contexts, suggesting an important role for these representations in context-dependent evaluation. Such mechanisms may be critical for the abstract reasoning and generalization characteristic of biological intelligence.
Introduction
Expected outcomes are encoded in multiple brain areas, including orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC; Matsumoto et al., 2003; Padoa-Schioppa and Assad, 2006; Burke et al., 2008; Kennerley et al., 2009; Kahnt et al., 2010; Cai and Padoa-Schioppa, 2012; Howard et al., 2015; Del Arco et al., 2017). In OFC, outcome expectations play a role in choice behavior (Padoa-Schioppa and Assad, 2006; Ballesta et al., 2020), but in ACC the importance of these signals is less clear. Beyond decision-making, expecting a particular outcome motivates motor responses and provides context to determine when results are better or worse than expected. Different lines of evidence support a role for ACC in each process. ACC is implicated in goal-based motor selection (Shima and Tanji, 1998; Matsumoto et al., 2003) and computing action costs (Rudebeck et al., 2006; Kennerley et al., 2009; Cai and Padoa-Schioppa, 2021) but also tracks information important for predicting and interpreting outcomes, such as richness, temporal proximity, or volatility of rewards (Shidara and Richmond, 2002; Behrens et al., 2007; Sallet et al., 2007; Kolling et al., 2012).
An important challenge in interpreting expectation signals in ACC is the heterogeneity of single neuron responses (Matsumoto et al., 2003; Hayden and Platt, 2010; Kolling et al., 2016). Increasing evidence suggests that computations arising from such heterogeneity may be understood by examining population codes (Saxena and Cunningham, 2019; Barack and Krakauer, 2021; Ebitz and Hayden, 2021). Heterogeneous or nonlinear mixing among single neurons produces higher-dimensional population activity, which is advantageous for flexibly encoding combinations of variables (Rigotti et al., 2013; Fusi et al., 2016). However, when neural encoding is more linear or less heterogeneous, lower-dimensional structure can be resolved, revealing how a population organizes information (Gao and Ganguli, 2015; Bernardi et al., 2020). This structure typically represents latent factors that are relevant to the function of the brain region in a manner that is invariant to the other features of incoming stimuli. For instance, representations in inferotemporal cortex distinguish objects regardless of rotation or size, consistent with its role in object recognition (DiCarlo and Cox, 2007; DiCarlo et al., 2012; Pagan et al., 2013). Such representations are said to be untangled (or disentangled), and similar invariances related to unique functions of the region have also been reported in areas of prefrontal cortex (Yoo and Hayden, 2018; Xie et al., 2022).
From this perspective, we assessed how latent factors related to expected and received outcomes and motor responses structure population representations in ACC. We analyzed neurons recorded from monkeys performing a stimulus-motor mapping task to earn rewards (positive valence condition) or avoid losses (negative valence condition). Compound cues indicated the valence condition and instructed the motor response, allowing us to discern how neural populations represent these variables—jointly or as untangled concepts. In addition, trial outcomes could be interpreted in either an absolute (amount of reward) or relative (better or worse than expected) sense, allowing us to assess how expectations influence outcome coding. We separately analyzed data from dorsal and ventral anterior cingulate sulcus (dACC and vACC respectively) because previous studies found differential encoding of valence and motor information between these two subregions. Specifically, motor correlates were more common in dACC, and vACC preferentially responded to negative information, whereas dACC included a mix of responses to positive and negative information (Amemori and Graybiel, 2012; Cai and Padoa-Schioppa, 2012).
Our results show that both regions separated expected outcome valence from other task information, resulting in nonrandom structure in correlation statistics of population activity. Valence was represented persistently throughout the trial in an abstract format, meaning that it was independent of the other information carried by each cue. When an outcome was received, ACC primarily encoded reward or loss relative to this abstract valence representation, such that the valence of the cue acted as a context for interpreting outcomes. We suggest that this role in setting a context for subsequent outcomes is a key factor encouraging the abstraction of cue valence and that untangled, generalizable representations might facilitate context-dependent behavior.
Materials and Methods
Subjects and behavioral task
Behavioral data have been previously described in Rich and Wallis (2014). Subjects were two male rhesus monkeys (Macaca mulatta), M and C, ages 6 and 10 years and weighing ∼11.0 and 14.5 kg, respectively, at the time of recording. Subjects sat in a primate chair and viewed a computer screen. Affixed to the front of the chair was a joystick that could be displaced to the left or to the right with minimal force. Cues were presented on the computer screen, behavioral contingencies were controlled using Monkey Logic software (Asaad and Eskandar, 2008), and eye movements were tracked with an infrared camera system (iSCAN). Each subject was chronically implanted with a titanium head positioner and two cylindrical titanium chambers, positioned over target areas in the frontal lobe in each hemisphere. All procedures were performed in accord with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and recommendations of the University of California at Berkeley Animal Care and Use Committee.
Each recording session consisted of several hundred consecutive trials. At the start of each trial, subjects maintained their gaze within a 1.3° radius of a central fixation spot for 650 ms. Immediately following this fixation period, one of four cues appeared on the screen. Cues consisted of images of natural scenes, ∼2° × 3° in size. Before recording, subjects had been trained to move the joystick in a particular direction (left or right) to either earn reward or avoid a loss. Subjects were discouraged from arbitrary responding by penalizing responses within 150 ms of cue presentation with a 5 s time-out but were otherwise free to respond as quickly as they would like. Following a joystick response, subjects received feedback through changes in the length of an onscreen reward bar. The reward bar functioned as a secondary reinforcer, which allowed us to equate the magnitude of gains and losses and present valence information in a single sensory modality (as opposed to, e.g., juice reward vs air puff). The bar was visible at the bottom of the task screen throughout the session and indicated the amount of reward the subject had cached at each point in time. At the end of every block of six completed trials, the subject earned a juice reward proportional in amount to the current length of the bar. This schedule was selected to balance the requirement of keeping the monkey motivated throughout a session with relatively little interruption in consecutive trials for bar cash-ins. The bar was then reset to length = 2 to start the next trial block.
On trials featuring one of the two positively valenced cues (referred to as “positive cues”), a correct response resulted in an increase to the length of the reward bar by 1 increment (+1 bar), and an incorrect response resulted in no change to the bar (0 bar). On trials with negative cues, a correct response resulted in no bar change (0 bar), whereas an incorrect response resulted in decreased bar length by one increment (−1 bar; Fig. 1A). The subjects were well trained on the task and performed with high accuracies (Rich and Wallis, 2014). To ensure sufficient numbers of each of the four possible trial outcomes [+cue/+1 bar], [+cue/+0 bar], [−cue/+0 bar], and [−cue/−1 bar], the 0 bar outcome was delivered on 15% of all positive cue trials, and the −1 bar outcome was delivered on 15% of negative cue trials, regardless of the response of the subject.
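The feedback contingencies above can be summarized in a short sketch. This is an illustrative reading of the task rules, not the task-control code used in the experiment: the function names are hypothetical, the 15% manipulation is modeled as an independent draw on every trial (one possible interpretation of the text), and no floor on bar length is modeled because none is specified.

```python
import random


def bar_change(cue_valence, correct):
    """Return the reward-bar change (+1, 0, or -1) for one trial.

    cue_valence: +1 for a positive cue, -1 for a negative cue.
    correct: whether the joystick response matched the cue instruction.
    """
    if cue_valence > 0:
        return 1 if correct else 0   # positive cue: gain or no change
    return 0 if correct else -1      # negative cue: no change or loss


def deliver_outcome(cue_valence, correct, rng, p_forced=0.15):
    """Deliver the worse outcome with probability p_forced, whatever the response.

    Assumption: the forced outcome is drawn independently on each trial.
    """
    if rng.random() < p_forced:
        return bar_change(cue_valence, correct=False)
    return bar_change(cue_valence, correct)


def run_block(trials, start_length=2):
    """Accumulate bar changes over one block and return the length at cash-in.

    trials: iterable of (cue_valence, correct) pairs, six per block in the task.
    """
    length = start_length
    for cue_valence, correct in trials:
        length += bar_change(cue_valence, correct)
    return length
```

For example, `run_block([(1, True)] * 6)` corresponds to six correct positive-cue trials without any forced outcomes, so the bar would grow by six increments from its starting length of 2 before the juice cash-in.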
Neural recording and general preprocessing
Neural recordings were obtained on a Plexon MAP system. In each session, 4–20 tungsten microelectrodes (FHC) were acutely placed in target regions (Fig. 1G). Electrode paths were calculated based on 1.5T MRI scans of the brain of each subject obtained before recording. Recorded neurons were not screened for response properties and therefore represented a random sample in each area. Waveforms were digitized and well-isolated units identified with Offline Sorter (Plexon) and saved for further analysis. Following spike sorting and before subsequent analyses, we removed any neurons whose mean firing rate across an entire session was <1 Hz as such neurons would not provide sufficient activity to analyze statistically. All analyses were implemented with custom MATLAB code (MathWorks).
Single-unit regressions
The selectivity of single neurons was assessed with multiple linear regression models. For each neuron on each trial, we averaged spike counts in 150 ms time bins stepped at 25 ms, with bin centers ranging from 200 ms preceding cue presentation to 650 ms following presentation. All time windows were chosen a priori. The overall analysis window was selected to approximately capture the time preceding median responses of both monkeys. Sliding windows and step sizes were selected based on ranges that are common in the literature. For the ith time bin, we fit an ordinary least-squares regression model as:
To assess selectivity of neurons for task variables, we examined the p values for β coefficients of each predictor. To control Type I errors in the precue epoch (false hits < 0.05 of the population in all bins preceding cue), we imposed a significance level of 0.005 for p values in each bin and deemed significant only those bins that were part of a consecutive series of three bins below the 0.005 threshold. We defined trial epochs a priori as windows of 500 ms duration, offset from an event by 100 ms to account for processing delays. Any neuron that passed these criteria during the analysis window (101–600 ms following cue and feedback onset) was counted as significant. An overall direction of activation was assigned to significant neurons based on the sign of the mean beta coefficient during this time window. Additionally, we considered a unit as displaying linear mixed selectivity (LMS) if both the valence and response direction predictors had significant p values. We considered units as displaying nonlinear mixed selectivity (NLMS) if the interaction term between valence and response direction was significant.
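The sliding-window layout and the consecutive-bin criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original MATLAB code: the function names are hypothetical, bins are treated as half-open intervals centered on each bin center, and a run of three or more consecutive sub-threshold bins marks every bin in that run as significant.

```python
def bin_centers(start_ms=-200, stop_ms=650, step_ms=25):
    """Bin centers relative to cue onset, stepped at 25 ms."""
    return list(range(start_ms, stop_ms + 1, step_ms))


def spike_count_in_bin(spike_times_ms, center_ms, width_ms=150):
    """Count spikes in the half-open window [center - w/2, center + w/2)."""
    half = width_ms / 2
    return sum(center_ms - half <= t < center_ms + half for t in spike_times_ms)


def significant_bins(p_values, alpha=0.005, run_length=3):
    """Flag bins that belong to a run of >= run_length consecutive p < alpha."""
    flags = [False] * len(p_values)
    run_start = None
    # The trailing infinity is a sentinel that closes a run ending on the last bin.
    for i, p in enumerate(list(p_values) + [float("inf")]):
        if p < alpha:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= run_length:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags
```

With the defaults, `bin_centers()` yields 35 centers from −200 to 650 ms. A neuron would then count as selective for a predictor if `significant_bins` flags any bin whose center falls inside the 101–600 ms analysis window.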
The coefficient of partial determination (CPD) was computed for the kth predictor in the ith time bin as follows:
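The displayed equation is not reproduced in this text. A commonly used definition of the CPD, given here as an assumption rather than as the authors' exact expression, compares the residual sum of squares of the full model with that of a reduced model lacking the kth predictor:

```latex
\mathrm{CPD}_{k,i} \;=\; \frac{\mathrm{SSE}_{i}\!\left(X_{-k}\right) - \mathrm{SSE}_{i}\!\left(X\right)}{\mathrm{SSE}_{i}\!\left(X_{-k}\right)}
```

Here $\mathrm{SSE}_{i}(X)$ denotes the sum of squared residuals of the full regression in the ith time bin, and $\mathrm{SSE}_{i}(X_{-k})$ the same quantity after removing the kth predictor (notation introduced for illustration). Under this definition the CPD lies between 0 and 1 and gives the fraction of otherwise unexplained variance that the kth predictor accounts for.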
During the feedback epoch, we aligned firing rates to the times the reward bar did or would have changed. Similar models assessed coding of relative feedback by replacing cue valence in the predictor matrix. To assess coding of absolute feedback magnitude, we fit an additional model where the predictor matrix included absolute feedback as a signed magnitude rather than cue valence, with all other predictors unchanged. As the two models had the same number of free parameters, we compared the coefficient of determination (
Preprocessing for population analyses
Single-unit data were aggregated to form pseudopopulations for each subject and each area, and all descriptions of populations below refer to pseudopopulations. We excluded any neuron without at least 10 trials in each condition, as smaller trial counts might not support reliable resampling as described below. (Neuron counts remained the same for both regions in both subjects, except for subject C vACC, where we excluded 6 of 70 neurons; this exclusion did not noticeably affect the results.) We combined neurons from the two subjects because we observed good overall agreement between separate population analyses by subject. Spike counts sampled at 1 ms were aligned to the onset of the cue or feedback. Trials from each condition were randomly subsampled to select
Neural decoding
Linear support vector machines (SVMs) were used to classify trials of pseudopopulation responses according to task variables. All task variables were binary, except for absolute feedback, where there were three classes. This multiclass classification was implemented using an ensemble method (fitcecoc.m in MATLAB) that combines the performance of three binary one-versus-all linear SVMs. SVMs were trained and tested with a resampling strategy as follows. For each time bin, we first randomly split all n trials into 10 cross-validation (CV) folds. We then resampled with replacement 100 trials from the training set for each of eight conditions (4 cues × 2 possible outcomes) for each neuron separately (any non-task-related covariance was already destroyed in the initial construction of the pseudopopulation array), yielding an 800 × p matrix, whose rows are p-dimensional single-trial population response vectors. Each of the single-trial vectors was assigned a class label that depended on the variable being decoded (see Table 2). For decoding of absolute feedback magnitudes, this resulted in unbalanced class sizes (i.e., there were twice as many 0 bar trials as +1 or −1 because the 0 bar result occurs in four of the eight conditions), so we randomly subsampled the resampled 0 bar trials to match the size of the +1 bar and −1 bar classes. Not performing this subsampling, so that class sizes for the zero-versus-all binary classifier were balanced, yielded qualitatively similar results.
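The per-neuron resampling step that produces the 800 × p matrix can be sketched as follows. This is a minimal illustration, not the original MATLAB implementation: the function name and the input layout (one dictionary of single-trial spike counts per neuron, keyed by condition) are assumptions made for the example.

```python
import random


def resample_pseudopopulation(trials_by_neuron, n_per_condition=100, seed=0):
    """Build a pseudopopulation matrix by resampling each neuron independently.

    trials_by_neuron: list of length p; each entry maps a condition label to
        that neuron's list of single-trial spike counts (hypothetical layout).
    Returns (rows, labels): rows holds n_per_condition p-dimensional response
        vectors per condition, and labels gives the condition of each row.
    """
    rng = random.Random(seed)
    conditions = sorted(trials_by_neuron[0])
    rows, labels = [], []
    for cond in conditions:
        # Draw with replacement separately for every neuron, so that trial
        # identities are not aligned across neurons.
        columns = [rng.choices(neuron[cond], k=n_per_condition)
                   for neuron in trials_by_neuron]
        rows.extend(list(row) for row in zip(*columns))
        labels.extend([cond] * n_per_condition)
    return rows, labels
```

With eight conditions and the default of 100 draws per condition, the function returns 800 rows of length p. In a cross-validated analysis it would be called once on the training trials and once on the held-out trials of each fold, so that no single trial contributes to both matrices.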
The same resampling strategy was conducted for the held-out data fold, generating another 800 × p matrix as a test set. Crucially, single trials were first separated into training and test sets, and resampling was performed separately for these groups so that the same data were never used for both training and testing. Final performance was based on the concatenation of predicted labels from each of the test sets across folds. The entire CV process (over 10 folds) was repeated 100 times for each bin, randomizing the partition of the n single trials into folds with each repetition. The final decoding performance for the ith bin was derived by averaging across the 100 repetitions. Decoding performance time series were obtained by performing this CV process in each of the b
Decoding performance measures included accuracy, balanced accuracy, area under the receiver operating characteristic (ROC) curve, and area under the precision-recall curve. Additionally, we assessed decoding performance on each of the classes individually using precision, recall, and the F-measure. We observed good agreement among these calculated metrics. For binary task variables, we report accuracy; for multiclass decoding, we report accuracy as well as the F-measure of the binary one-versus-all decoders for each of the classes; p values were computed by repeating the entire process above, including resampling of training and test sets within each CV iteration, on 1000 datasets in which class labels were randomly permuted. (For each permutation, however, we performed only one repetition of the entire CV process rather than 100.) The p values are derived as
Representational geometry
We assessed measures of representational geometry by generally following previously described methods (Bernardi et al., 2020). We first preprocessed the data as above, then resampled trials from each task condition as described (see above, Neural decoding), except drawing from all available trials within each condition without first subsampling. For the main results (see Fig. 3), we averaged spike counts in a single 500 ms time window from 101 to 600 ms following cue or feedback onset. Time series results were derived from mean spike counts taken in consecutive, nonoverlapping 100 ms bins. We shifted from sliding windows used in our previous analyses to nonoverlapping bins to minimize run time as these analyses were computationally intensive. Firing rates of each neuron were independently z-scored across trials for single-window analyses or across trials and time bins for time series analyses.
Cross-condition generalization performance
We first identified sets of conditions that define dichotomies as described in Bernardi et al. (2020). For a set Z of m integers
More specifically, for a given dichotomy, let
Across all values of k, the total number q of classifier models trained and tested is given by the following:
Parallelism score
The parallelism score (PS) measures abstraction using a different approach from CCGP that is based on similar geometric ideas (Bernardi et al., 2020). Namely, if the optimal separating hyperplane
Next, for each permutation, we considered all
Statistical significance
We used random permutations to generate null distributions for the CCGP and PS tests by independently permuting each column of the predictor matrix. For each permutation, we computed the CCGP and PS for each dichotomy separately. This resulted in a discrete approximation of one null distribution for each dichotomy, although we observed that these null distributions were in practice often very similar. Two-tailed p values were derived as
Shattering dimensionality
First, we computed the shattering dimensionality for the recorded neural data. The procedure was the same as that described (see above, Neural decoding), with separate resampling within the training and test sets during CV, except that we averaged population activity within a window of 101–600 ms following cue or feedback onset. We then varied the class labels assigned to trials to decode all possible dichotomies. For example, to decode the cue valence dichotomy, we assigned conditions {1,2,3,4} (see Table 2) a class label of 0, and conditions {5,6,7,8} a class label of 1. To decode the relative outcome dichotomy, we assigned conditions {1,3,5,7} a class label of 0 and conditions {2,4,6,8} a class label of 1. The mean decoding accuracy across all dichotomies is the shattering dimensionality (Bernardi et al., 2020).
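The enumeration of dichotomies and the averaging step can be sketched as follows. This is an illustrative sketch with hypothetical function names: it assumes that a dichotomy is a split of the conditions into two equal halves and that mirror-image splits count once, which reproduces the 3 dichotomies over 4 cue-epoch conditions and the 35 dichotomies over 8 feedback-epoch conditions. Decoding accuracies are taken as given inputs rather than computed.

```python
from itertools import combinations


def balanced_dichotomies(conditions):
    """List all splits of the conditions into two equal-sized groups.

    The first condition is pinned to side A so that each split and its
    mirror image are counted only once.
    """
    conditions = sorted(conditions)
    half = len(conditions) // 2
    first = conditions[0]
    splits = []
    for rest in combinations(conditions[1:], half - 1):
        side_a = (first,) + rest
        side_b = tuple(c for c in conditions if c not in side_a)
        splits.append((side_a, side_b))
    return splits


def shattering_dimensionality(accuracy_by_dichotomy):
    """Mean decoding accuracy across all dichotomies."""
    return sum(accuracy_by_dichotomy.values()) / len(accuracy_by_dichotomy)
```

For the eight feedback-epoch conditions, the returned list includes the cue valence split `((1, 2, 3, 4), (5, 6, 7, 8))` and the relative outcome split `((1, 3, 5, 7), (2, 4, 6, 8))` described above.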
Next, we adopted a similar approach to Bernardi et al. (2020) to compare the geometric structure of our data to a linear (factorized) model. This model, in which neural responses are linear combinations of the abstracted task variables, was generated as follows. For m task conditions arising as the cartesian product of k binary task variables, we modeled condition centroids as vertices of a randomly rotated k-dimensional hypercuboid (see Fig. 4) embedded in a p-dimensional vector space, where p is the number of neurons. During the cue epoch, the relevant task variables are cue valence and response direction (k = 2), with relative feedback added during the feedback epoch (k = 3). For the ith condition, we simulated a single-trial point cloud by sampling 100 points from the multivariate Gaussian distribution
Representational similarity analysis
Preprocessing was the same as that described (see above, Preprocessing for population analyses) except that no subsampling by condition was performed here. Briefly, we created a
Template matrices serve as predictors in an ordinary least-squares regression that modeled the empirical correlation matrix in each bin as a weighted sum of each template. We assessed the contributions of each template in each bin by computing CPDs as described (see above, Single-unit regressions). We generated null distributions for CPDs of each template in each bin by randomly permuting the elements of the empirical condition correlation matrix 10,000 times and for each permutation fitting an ordinary least-squares model and computing CPDs for each predictor; p values were computed as
Latent task factors via principal components analysis
We performed a principal components analysis (PCA) to better understand population structure and task selectivity driven by variance across conditions. First, we formed the
ePAIRS
To assess functional clustering among ACC neurons, we applied the ePAIRS (elliptical Projection Angle Index of Response Similarity) algorithm, originally proposed by Raposo et al. (2014) and extended by Hirokawa et al. (2019) and Dubreuil et al. (2022). For each loading vector from the PCA above, the mean angular distance to its k nearest neighbors was computed, resulting in a set of p mean distances, where p is the number of neurons. We used k = 3 for our analyses, but note that values of 1, 2, and 9 gave qualitatively similar results. In addition, we used angular distance to emphasize the alignment rather than magnitude in the loading space of the preferred directions of neurons. A null model of nonclustered data was constructed following Dubreuil et al. (2022), by drawing 1000 random sets of p vectors from a multivariate Gaussian distribution with mean and covariance matching the empirical loadings. For each of these 1000 sets of vectors, we then computed mean angular distances for each of the p null vectors as for the empirical loadings and used a rank sum test to compare the pooled 1000p null distances with the distances from the empirical loadings, yielding a two-sided p value. The effect size ψ is computed as follows:
Dimension removal
To assess the contribution of a given dimension to clustering, we iteratively removed the variance in the original dataset along different principal axes identified with PCA. To do this, we took the SVD of the
Geometry simulation
We simulated neural data in four conditions (see Fig. 7) that corresponded to the following possible valence-outcome combinations: (1) +cue/+bar, (2) +cue/0 bar, (3) −cue/0 bar, and (4) −cue/-bar. An encoding of each condition was represented by a binary sequence over the four conditions, such that the vector [1 1 −1 −1] represented cue valence encoding, and the vector [1 −1 1 −1] represented relative feedback encoding. These two vectors were arranged as the columns of a 4 × 2 matrix,
Low-dimensional neural dynamics
To aid in the visualization of neural population dynamics as low-dimensional trajectories, we performed dimensionality reduction with PCA. We began as described (see above, Representational similarity analysis) with a
Results
Two monkeys were trained to perform an instructed response task by moving a bidirectional joystick to the right or left, depending on an instructional cue (Fig. 1A). Cues were displayed centrally on a computer screen, with two cues instructing rightward responses and two instructing leftward responses. Two cues (one right and one left) also indicated that the monkey would earn a reward for responding correctly and nothing for responding incorrectly, whereas the other two cues (also one right and one left) indicated that the monkey would receive a punishment (suffer a loss) for incorrect responses and nothing for correct responses (Table 1). Therefore, each cue carried a unique combination of valence (potential reward/loss) and response (left/right) information. Reward and loss were delivered by increasing or decreasing the size of a reward bar visible at the bottom of the task screen. The reward bar was cashed in for a proportional amount of fruit juice after every six completed trials, so that monkeys were motivated to earn increases and avoid decreases in bar size.
Task and behavioral performance. A, Schematic of the instructed response task. To start a trial, monkeys fixated a central point. Then one of four familiar pictures appeared in the screen center (Cue presentation). The monkey responded by moving a joystick to the right (R) or left (L; Motor response). Following a 300 ms delay, the reward bar either did or did not change size, providing feedback for their response. The reward bar was visible throughout the session, and its size carried over to the next trial until six trials were completed. At that point, the bar was cashed in for a juice reward proportional to the bar length, and the bar was reset to an initial starting size (data not shown). B, Percentage of trials of each type completed correctly by each subject. Error bars indicate ± SE of the median across all trials of the same type. C, Median reaction times on trials of each type for each subject. Error bars indicate ± SE of the median across all trials of the same type. SEs were computed via bootstrapping. Solid line indicates correct responses, dashed line indicates incorrect responses. D, Trial-averaged peristimulus time histogram by cue type with corresponding raster plot for a neuron encoding cue valence. Rasters binned spikes by 10 ms for increased readability. E, Same as D, showing a neuron that encodes a linear combination of cue valence and response direction. F, Same as D, showing a neuron that encodes a nonlinear combination of cue valence and response direction. G, Target regions for dACC and vACC. The schematic shows a coronal section of one hemisphere, approximately at the middle of the recording chambers; dACC is shaded in teal, vACC in orange. Note that vACC targets included the ventral bank of the cingulate sulcus as well as the most dorsal portion of the cingulate gyrus. AP positions of recorded neurons in both subjects were +1 to +7 mm relative to the anterior extent of the corpus callosum (median AP subject M, 31 mm; subject C, 32 mm). Subject M was recorded in the left hemisphere, subject C in the right.
Feedback types
Behavior in this task was described previously (Rich and Wallis, 2014). Briefly, both subjects performed well, with slightly higher response accuracies and faster response times for positive versus negative cues (Fig. 1B,C). Reaction times varied between monkeys, with median values (across all trials from all sessions) of 490 and 678 ms for monkeys M and C, respectively. Our previous study focused on single-neuron responses in subregions of the OFC, and here we assessed similar measures in the dorsal and ventral ACC (Fig. 1G). We analyzed data from 35 sessions (21 in subject M, 14 in subject C; n = 81 and 64 dACC neurons, and 92 and 70 vACC neurons, in subjects M and C, respectively).
Valence encoding among single units
To understand how cue information is represented by single neurons, we used multiple linear regression (ordinary least squares) to model trial-by-trial firing rates (see above, Materials and Methods). During a 500 ms cue epoch (101–600 ms postcue presentation), many neurons encoded the valence of the cue (+1/−1; Fig. 1D). In some neurons, this encoding was mixed with the response direction it indicated (L/R). We quantified this as either LMS (significant main effects of cue valence and response direction on multiple regression; Fig. 1E), NLMS (significant interaction; Fig. 1F), or both. In subject M, proportions of neurons displaying pure selectivity for valence and response direction were 22 and 26%, respectively, in dACC, and 26 and 12% in vACC. In subject C, the same respective proportions were 27 and 11% in dACC and 26 and 16% in vACC. However, in subjects M and C, 48 and 23% of all dACC units displayed LMS, 31 and 6% displayed NLMS, and 26 and 5% displayed both. In vACC, the proportions were 15 and 3% LMS, 9 and 1% NLMS, and 7 and 0% both, for subjects M and C, respectively.
Overall, there were more valence selective neurons in dACC than vACC (subject M, χ2 = 5.20, p = 0.023; subject C, χ2 = 3.20, p = 0.074). Among these, there was a tendency toward different patterns of selectivity in each region in subject M, where more dACC neurons were selective for positive than for negative cues, and more vACC neurons were selective for negative than for positive cues, consistent with previous reports (Amemori and Graybiel, 2012; area × selectivity, χ2 = 3.66, p = 0.056; Fig. 2A). In subject C, however, this trend was not apparent (χ2 = 0.36, p = 0.55). To compare encoding strength within a region rather than counts of selective neurons, we averaged CPDs within the analysis window for each neuron and compared mean CPDs for a given population using a Wilcoxon rank sum test. For cue valence, positive- compared with negative-responsive neurons had slightly higher average CPDs in dACC in subject M (Wilcoxon rank sum, p = 0.039, W = 1859, effect size = 0.014), but there were no differences in subject C or in vACC in either subject (Wilcoxon rank sum tests, all p values > 0.3). We also assessed single-unit encoding of the instructed response direction and found similar encoding prevalence and strength, compared with valence encoding (Fig. 2B). Overall, cue valence and response direction were both robustly represented in ACC, with a tendency toward stronger encoding in dACC. There was also some evidence supporting valence selectivity in vACC, although these results were inconsistent across subjects.
ACC predominantly encodes cue valence and relative feedback. A, Top, Neurons selective for cue valence during the cue epoch were assigned an encoding preference based on the mean sign of the regression coefficient during the analysis window. Bars show proportions of neurons preferring positive (+) and negative (−) cues. Bottom, Mean (±SEM) CPD for cue valence across all single units. B, Top, Same as A but showing encoding preferences of neurons selective for response direction during the cue epoch. vACC had more neurons selective for left (L) than right (R) responses in subject M (χ2 = 4.33, p = 0.037; post hoc binomial test on proportion of left-preferring response selective neurons, p = 0.13 and p = 5.34 × 10–4 for dACC and vACC, respectively) but not subject C (χ2 = 1.01, p = 0.31). Bottom row, Average CPDs for response direction among all neurons. C, Same as A but showing encoding preferences of neurons selective for relative feedback value and CPDs for all neurons. D, Each neuron was marked as better explained by cue valence or absolute feedback during the feedback epoch by averaging the difference in
Abstract geometry in ACC. A–D, CCGP (A, C) and PS (B, D) for the cue epoch (A, B; 3 dichotomies defined over the 4 task conditions) and feedback epoch (C, D; 35 dichotomies defined over the 8 task conditions; Table 2). Each marker in A–D represents a dichotomy. Dots denote CCGP, and triangles PS. Black indicates significance at α = 0.05, gray indicates nonsignificance. Error bars indicate 95% CIs for the null distributions of the dichotomy with the matching color. E–H, CCGP (E, G) and PS (F, H) for dACC (top) and vACC (bottom) during the cue epoch (E, F; 3 dichotomies) and feedback epoch (G, H; 35 dichotomies). Thin lines indicate nonsignificance, medium indicates significance at α = 0.05, thick lines indicate significance at α = 0.01. Gray lines indicate all remaining dichotomies. I, Decoding performance for each of the three cue epoch dichotomies. Red dotted lines show mean performance across all dichotomies (shattering dimensionality). For each brain region, data on the left are decoding results for empirical data, whereas data on the right are decoding results for the tuned linear (factorized) model. Inset, Match between empirical and tuned CCGP. J, Same as I but for the feedback epoch.
Potential geometries for expected and actual outcome coding. A, Schematic showing neural responses generated as linear combinations of cue valence and relative feedback value. Representations lie on a 2D linear manifold and are captured by a 2D vector subspace of the full 3D neural activity space. B, Similar to A, except neural responses incorporate a nonlinear combination of cue valence and relative feedback, so representations lie on a 2D nonlinear manifold embedded in a 3D vector subspace (here, the full neural activity space). C, The geometry in B allows linear separation (blue hyperplane) of the zero bar versus bar change conditions necessary for absolute feedback encoding. Note that the weak nonlinear interaction of cue valence and relative feedback results in low variance along the bar change dichotomy dimension. Bottom right inset, The positions of the zero bar (purple) centroids and bar change (blue/red) centroids relative to the same separating hyperplane (green) learned during training are reversed between the blue circled training set and orange circled test set. This occurs for all four ways of choosing training and testing sets. D, Abstract encoding of cue valence in which instances of each valence condition overlap. Note that encoding cue valence alone, without tracking relative feedback value, could be sufficient to interpret the 0 bar conditions (purple centroids are separated). E, Strong abstraction of cue valence relative to response direction leads to below-chance cross-condition decoding for response direction. A decoding hyperplane distinguishing left (L) from right (R) learned from the blue circled training set would actively misclassify direction on the orange circled test set. This occurs for two of four ways of choosing training and testing sets. F, Schematic of the representational geometry in which cue valence, relative feedback, and response direction are abstracted to different degrees. 
Vector differences (gray) linking opposite sides of the cue valence dichotomy (bold black squares) are close to parallel, resulting in a higher PS. Vector differences for the relative feedback and response dichotomies would be less and least parallel, respectively, consistent with the level of abstraction observed for each. Note that absolute feedback is not decodable here, as bar change versus no change conditions are not linearly separable. To achieve this separation, the geometry would have to incorporate a nonlinearity and bend into a fourth embedding dimension, analogously to B. Simply moving centroids within the depicted 3D vector space only moves points around on a 3D linear manifold, analogous to that in A, and would result in loss of linear separability of another dichotomy. In general, sufficiently weak and specific nonlinearities largely preserve abstract (linear) structure while allowing additional linear separation of select dichotomies.
RSA. A, B, CPDs were computed for each template correlation pattern for dACC (A) and vACC (B). Trials were aligned to the onset of feedback (solid vertical line at time, 0 ms), and black and gray dashed lines indicate median times of cue onset and motor response, respectively. Line thickness denotes level of significance as in the legend based on random permutations of the empirical correlations. C, Templates used in the RSA.
Task variables are nonrandomly and approximately linearly mixed in dACC and vACC. A, Schematic coordinate plots showing canonical response patterns encoding task variables. B, Top four standardized PCs from dACC population activity;
Condition-wise population responses for real and simulated data. A–C, Simulated neural data consisting of 145 neurons in 4 conditions corresponding to possible valence–outcome combinations. B, Fully factorized population structure, where neural responses are random linear combinations of equally weighted cue valence and relative feedback, recoverable by PCA. C, This results in randomly distributed PC loadings. A neuron x has a two-dimensional loading vector h, whose elements are respective correlations of x with the two PCs (see above, Materials and Methods). In general, n latent variables result in loadings on a unit (n −1)-sphere. D–F, Same as A, but if cue valence (PC 1) variance ≫ relative feedback (PC 2), which also has nonzero variance. Loadings are then clustered on the 0-sphere, indicating that all neurons are strongly anti/correlated with a canonical cue valence response and mostly invariant to relative feedback. G–I, Same as A–C, D–F but for empirical spiking data recorded in dACC, averaged across trials and time from 101 to 600 ms after feedback. Note that these data emphasize the separation of cue valence but still contain enough nonlinearities (a canonical response encoding bar change vs no-change as a nonlinear combination of cue valence and relative feedback, captured by a third PC; Figs. 4B, 6A–C) to permit absolute feedback decoding. Cluster structure near the first loading axis (I) indicates that more neurons are correlated with PC1 (cue valence) than expected if cue valence and relative feedback are abstracted with equal strength, consistent with the dominant abstraction of cue valence in this population.
Next, we compared different ways the valence of trial outcomes (feedback) could be coded in this task, as an absolute signed magnitude (i.e., +1, 0, −1 for gain, nothing, loss) or as a relative value denoting the receipt of the better or worse possible outcome given the preceding cue (Table 1). We first assessed relative feedback encoding, which is uncorrelated with the preceding cue, using linear regression. In subject M, there were more vACC neurons selective for worse (negative), compared with better (positive) outcomes (area × valence selectivity, χ2 = 5.72, p = 0.017, post hoc binomial test, p = 6.62 × 10−5 in vACC), and no differences in dACC (binomial test, p = 0.28; Fig. 2C). Although the trend was similar in subject C, there were no statistical differences in valence selectivity by area (χ2 = 0.0045, p = 0.95). The strength of relative feedback encoding, as measured by CPDs, was also higher for worse outcomes in vACC in subject M (Wilcoxon rank sum, p = 3.23 × 10−4, W = 1313, effect size = −0.0033), but not C (Wilcoxon rank sum, p = 0.79, W = 1159, effect size = −7.58 × 10−5), and there were no differences in either subject in dACC (Wilcoxon rank sum, both p values > 0.3).
We also considered whether neurons encoded the absolute outcome of the trial (gain, nothing, loss). As these outcomes were correlated with the valence of the preceding cue (Table 1), we first fit each neuron with a multiple regression model that included an absolute feedback predictor and then fit the same responses with the same model but replaced absolute feedback with cue valence. We considered a neuron to be better modeled as encoding absolute feedback value if it had a significant regression coefficient for the feedback regressor and a higher R2 for the feedback model compared with the cue valence model. Conversely, we considered a neuron to be better described as encoding cue valence at the time of feedback if it had a significant regression coefficient for cue valence, and the cue valence model had a higher R2 than the feedback model. We found that absolute feedback encoding was very low before feedback onset, as expected, whereas cue encoding was high (Fig. 2D). After the onset of feedback, cue valence encoding decreased somewhat but continued to be encoded by ∼20–30% of neurons in each area throughout the remainder of the trial. Overall, in a 500 ms epoch following feedback onset, cue valence encoding dominated in both subjects and both areas (binomial test on model preference, subject M dACC, p = 0.014; vACC, p = 0.12; subject C dACC, p = 7.73 × 10−5; vACC, p = 0.0011; area × model preference for subject M,
Population decoding
Next, we assessed how cue information is coded at the level of neuron populations. To do this, we first decoded the valence and instructed response direction from pseudopopulation activity in each area. Population activity was aligned to feedback presentation and spike counts averaged within 150 ms bins, stepped forward by 25 ms, including 1200 ms preceding feedback to capture presentation of the instructive cue, and 600 ms after feedback. For each time bin, we trained and tested linear SVMs using 10-fold cross-validation to assess classification of cue valence or response direction (see above, Materials and Methods). We report accuracies below, but evaluating model performance using the area under the ROC curve yielded qualitatively similar results.
Both cue valence and response direction could be decoded from activity in both regions, with higher accuracies in dACC (Fig. 2E,F). Interestingly, despite the fact that each cue conveyed both valence and response direction information, the time course of decodability differed for these two variables. For response direction, decoding peaked just before the median time of response and subsequently decayed, whereas valence decoding increased when the cue was presented and persisted past the motor response through the brief (300 ms) delay preceding feedback and well beyond the delivery of feedback itself. This effect was present in both regions but was especially evident in dACC. In vACC, decoding of cue valence was less accurate but increased as the time of feedback approached. In addition, the inclusion of error trials, defined as those in which the monkey moved the joystick in the wrong direction, decreased decoding accuracy for response direction but not cue valence (Fig. 2E,F). This further suggests that the valence and response conveyed by a compound cue are treated as separable types of information, which may have different salience to the animal or play different roles in future behavior.
Although cue valence appeared to be persistently encoded into the feedback epoch, another possibility is that the ostensible decodability of cue valence was driven by representations of absolute feedback, which correlates with the valence of the cue. To assess this, we attempted to decode the signed magnitude of feedback received (+1, 0, or −1) from population activity. We used an ensemble classifier that handles multiclass classification by combining the results of constituent binary one-versus-all linear SVMs, where each binary classifier decoded one of the feedback conditions from the others. (Using three one-versus-one constituent classifiers yielded qualitatively similar results.) Importantly, for this analysis we balanced the number of trials in which receipt of zero feedback was preceded by positive and negative cues (see above, Materials and Methods). Even when these trials are balanced, overall decoding of absolute feedback (combined across the three constituent classifiers) should be significantly above chance if neural populations encoded cue valence but not absolute feedback because +1 feedback and −1 feedback always follow positive and negative cues, respectively. However, the zero-feedback-versus-all binary classifier should perform poorly, as instances of zero feedback are always preceded by an equal mix of positive and negative cues. We found, as expected, that overall feedback decoding was above chance in both regions preceding feedback onset, consistent with a correlative contribution of the preceding cue valence to feedback decoding. Critically, the one-versus-all classifier for zero feedback exhibited above-chance performance in dACC but showed minimal significance in vACC (Fig. 2H,J), indicating that dACC does carry information about absolute feedback, whereas vACC does not. Rather than the signed magnitude of the feedback itself, vACC encodes the valence of a predictive cue at the time of feedback.
In dACC, this analysis, together with cue valence decoding, indicated that both cue valence and the signed magnitude of the outcome are encoded at feedback time. If dACC populations represented only the absolute feedback value, the accuracy of a classifier attempting to decode cue valence should be close to 0.75, because it would perform at chance on the zero-feedback trials (which constitute one-half of all trials; Table 2). However, we observed near-perfect decoding accuracy of cue valence throughout the entire trial, beginning shortly after cue onset and lasting well past feedback delivery (Fig. 2E).
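The 0.75 benchmark follows from a short expected-accuracy calculation. It is sketched here under the assumption that a feedback-only code would allow perfect cue valence decoding on bar-change trials (where +1 and −1 feedback uniquely identify the cue valence) and chance-level decoding (0.5) on zero-feedback trials, with the two trial types equally frequent:

```latex
\mathbb{E}[\text{accuracy}]
  = \underbrace{\tfrac{1}{2}\,(1.0)}_{\text{bar-change trials}}
  + \underbrace{\tfrac{1}{2}\,(0.5)}_{\text{zero-feedback trials}}
  = 0.75
```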
Task conditions
Together, both single-unit encoding and population decoding indicate that cue valence is a dominant and persistent signal in both ACC regions. In addition, only dACC reliably carried information about the absolute amount of feedback, whereas both regions processed feedback in a relative sense. One potential reason for cue valence to be held online into the feedback epoch is to serve as a context signal necessary for interpreting outcomes in a relative manner. That is, the +1 and −1 bar outcomes have a clear interpretation, but 0 bar is ambiguous, being the better possible outcome following a negative cue (avoidance of loss) or the worse possible outcome following a positive cue (no gain despite the opportunity). This view would also explain why cue valence decoding did not decrease compared with response direction decoding on trials in which the subject erred; zero feedback is interpretable if response direction is incorrectly encoded but not if the context signaled by the cue valence is confused (Fig. 2E,F). In the following analyses, we took more direct approaches to determine the degree to which cue valence signals in ACC may serve a context-relevant function.
ACC represents valence abstractly
If cue valence sets a context for the trial, it may be useful to represent valence in a consistent format across different cues (e.g., left and right instructing cues), such that positive and negative valences are always distinguished in the population code. In other words, valence information may be abstracted to achieve context-dependent outcome coding, such as relative feedback value. Based on this idea, we assessed the abstract nature of valence encoding by investigating the representational geometry of variables in this task (Bernardi et al., 2020). For example, consider the four cues (Table 2) represented as points in a vector space, where each dimension corresponds to the firing activity of one neuron in the population. If a hyperplane separating representations of [+/R] from [−/R] can also separate [+/L] from [−/L], this cross-condition decodability suggests that the concept of positive versus negative valence generalizes across response directions, providing evidence for an abstract representation.
Bernardi et al. (2020) assessed this operational definition of abstraction by investigating cross-condition decodability within different dichotomies, which refer to unique divisions of task conditions into two sets of equal cardinality. For four task conditions during the cue epoch, there are three such dichotomies, [+/R,+/L] vs [–/R,–/L]; [+/R,–/R] vs [+/L,–/L]; [+/R,–/L] vs [–/R,+/L], with the first two corresponding to identifiable task concepts (cue valence and response direction, respectively). For each of these dichotomies, there are four cross-condition training and testing possibilities. For example, for dichotomy 1 (cue valence), one can train on population responses for [+/R] versus [–/R] and test whether the solution generalizes to [+/L] versus [–/L], train on [+/L] versus [–/L] and test on [+/R] versus [–/R], train on [+/R] versus [–/L] and test on [+/L] versus [–/R], or train on [+/L] versus [–/R] and test on [+/R] versus [–/L]. During the feedback epoch, there are eight unique conditions, corresponding to four cues and two potential outcomes each, and thus 35 dichotomies (four of which are interpretable concepts of cue valence, response direction, relative feedback, and reward bar change vs no change), with 68 training and testing possibilities for each (see above, Materials and Methods).
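The dichotomy counts above can be checked by direct enumeration. The following is a minimal sketch, not the authors' code. The function names and condition labels are hypothetical, and the train/test counting rule (the same number k of conditions per side used for training, with 1 ≤ k < conditions per side, and the remainder used for testing) is an assumption chosen because it reproduces the 4 and 68 possibilities stated in the text.

```python
from itertools import combinations
from math import comb

def balanced_dichotomies(conditions):
    """Return every unique split of `conditions` into two equal-size sets."""
    conditions = list(conditions)
    half = len(conditions) // 2
    first = conditions[0]
    splits = []
    # Pin the first condition to side A so that each split is counted once.
    for rest in combinations(conditions[1:], half - 1):
        side_a = (first,) + rest
        side_b = tuple(c for c in conditions if c not in side_a)
        splits.append((side_a, side_b))
    return splits

def n_train_test_splits(per_side):
    """Count cross-condition train/test splits for one dichotomy.

    Assumed rule: train on k conditions from each side (same k on both
    sides, 1 <= k < per_side) and test on the held-out conditions.
    """
    return sum(comb(per_side, k) ** 2 for k in range(1, per_side))

cue_conditions = ["+/R", "+/L", "-/R", "-/L"]
feedback_conditions = [
    f"{cue}/{outcome}" for cue in cue_conditions for outcome in ("better", "worse")
]

cue_dichotomies = balanced_dichotomies(cue_conditions)
feedback_dichotomies = balanced_dichotomies(feedback_conditions)

print(len(cue_dichotomies), n_train_test_splits(2))       # cue epoch
print(len(feedback_dichotomies), n_train_test_splits(4))  # feedback epoch
```

Under this counting rule the cue epoch yields 3 dichotomies with 4 splits each, and the feedback epoch yields 35 dichotomies with 16 + 36 + 16 = 68 splits each.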
For each dichotomy, the average decoding performance over all training and testing possibilities is termed the CCGP (Bernardi et al., 2020). High CCGP indicates robust decodability of a concept regardless of the conditions in the training and testing sets. That is, the concept generalizes over different particular instances that vary along other irrelevant features. Using this approach, we found evidence for abstract representation of cue valence that persisted throughout the entire trial, consistent with our finding that cue valence is persistently represented in ACC. dACC exhibited stronger abstraction of valence information (higher CCGP) than vACC, although both were significant (Fig. 3A,E). In contrast, the concept of response direction had CCGP near chance levels, consistent with the interpretation that, unlike valence, response direction can but does not need to be abstracted across multiple conditions. Finally, the last dichotomy of cue conditions, which corresponded to the interaction of cue valence and response direction, exhibited below chance (nearly zero) CCGP. This suggests a weakly nonlinear geometry in which cue valence and response direction occupy nearly orthogonal dimensions, resulting in very little variance along their interaction dimension and thus active misclassification for all training and testing set combinations (Fig. 4C). The near-chance decoding of response direction further indicates that the variance along the response dimension is reduced to achieve abstraction in the other (i.e., valence) dimension (Fig. 4D,E).
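To make the CCGP logic concrete, the sketch below computes it for the three cue-epoch dichotomies on simulated data. It is an illustration under stated assumptions, not the analysis pipeline used here: a difference-of-means linear readout stands in for the linear SVM, and the three hypothetical "neurons", the condition means (strong valence axis, weak response axis), the noise level, and the trial count are all invented for the example.

```python
import random

def centroid(trials):
    """Mean activity vector across trials."""
    n = len(trials)
    return [sum(t[d] for t in trials) / n for d in range(len(trials[0]))]

def fit_linear(train_a, train_b):
    """Difference-of-means linear readout (assumed stand-in for a linear SVM)."""
    mu_a, mu_b = centroid(train_a), centroid(train_b)
    w = [a - b for a, b in zip(mu_a, mu_b)]
    midpoint = [(a + b) / 2 for a, b in zip(mu_a, mu_b)]
    bias = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, bias

def accuracy(w, bias, test_a, test_b):
    """Fraction of held-out trials falling on the correct side of the hyperplane."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + bias
    hits = sum(score(x) > 0 for x in test_a) + sum(score(x) <= 0 for x in test_b)
    return hits / (len(test_a) + len(test_b))

def ccgp(data, side_a, side_b):
    """Mean accuracy over the four train-on-one-pair, test-on-the-other splits."""
    accs = []
    for i in range(2):
        for j in range(2):
            w, bias = fit_linear(data[side_a[i]], data[side_b[j]])
            accs.append(accuracy(w, bias, data[side_a[1 - i]], data[side_b[1 - j]]))
    return sum(accs) / len(accs)

# Hypothetical condition means: valence on axis 0 (large), response on axis 1 (small).
means = {
    "+/R": (2.0, 0.5, 0.0), "+/L": (2.0, -0.5, 0.0),
    "-/R": (-2.0, 0.5, 0.0), "-/L": (-2.0, -0.5, 0.0),
}
rng = random.Random(0)
data = {
    cond: [[m + rng.gauss(0, 0.3) for m in mu] for _ in range(50)]
    for cond, mu in means.items()
}

ccgp_valence = ccgp(data, ("+/R", "+/L"), ("-/R", "-/L"))
ccgp_response = ccgp(data, ("+/R", "-/R"), ("+/L", "-/L"))
ccgp_interaction = ccgp(data, ("+/R", "-/L"), ("-/R", "+/L"))
print(ccgp_valence, ccgp_response, ccgp_interaction)
```

With these illustrative parameters, valence CCGP should be close to 1, response CCGP close to chance (two of its four splits generalize and two actively misclassify because the valence axis dominates the learned hyperplane), and the interaction dichotomy close to 0.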
We also computed the PS, an alternative method for assessing population geometry without committing to a particular choice of classifier (Bernardi et al., 2020). Briefly, stronger abstraction of a dichotomy should be characterized by a more substantial overlap of conditions on the same side of a dichotomy relative to variance across conditions on opposite sides. As such, vector differences linking centroids on opposite sides of the dichotomy should be closer to parallel (Fig. 4F). To assess this, we calculated cosine similarities of these vector differences in the neural activity space to define the PS, with higher scores indicating stronger abstraction (see above, Materials and Methods). The PS analysis largely confirmed our CCGP results. Cue valence had the highest scores in both areas, indicating the strongest abstraction (Fig. 3B,F). Compared with CCGP, evidence for abstraction of response direction was slightly stronger with the PS, but both measures found the remaining interaction dichotomy to be below chance levels, which is consistent with the abstraction of the other two variables. Overall, both salient task variables (cue valence and response direction) are abstracted in both regions during the cue epoch, although more strongly in dACC than vACC, and cue valence was represented more abstractly than response direction.
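The PS calculation can be sketched as follows. This is a minimal illustration under assumptions, not the exact implementation: the score is taken as the mean pairwise cosine similarity between coding vectors, maximized over the possible pairings of conditions across the two sides of the dichotomy, and the centroid coordinates (including a weak interaction term on the third axis) are hypothetical.

```python
from itertools import permutations
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def parallelism_score(centroids, side_a, side_b):
    """Maximum, over pairings of conditions across sides, of the mean
    pairwise cosine between the difference (coding) vectors."""
    best = -1.0
    for perm in permutations(side_b):
        vecs = [
            [a - b for a, b in zip(centroids[ca], centroids[cb])]
            for ca, cb in zip(side_a, perm)
        ]
        cosines = [
            cosine(vecs[i], vecs[j])
            for i in range(len(vecs))
            for j in range(i + 1, len(vecs))
        ]
        best = max(best, sum(cosines) / len(cosines))
    return best

# Hypothetical centroids: strong valence axis, weaker response axis,
# and a weak nonlinear (valence x response) term on the third axis.
centroids = {
    "+/R": (2.0, 0.5, 0.2), "+/L": (2.0, -0.5, -0.2),
    "-/R": (-2.0, 0.5, -0.2), "-/L": (-2.0, -0.5, 0.2),
}

ps_valence = parallelism_score(centroids, ("+/R", "+/L"), ("-/R", "-/L"))
ps_response = parallelism_score(centroids, ("+/R", "-/R"), ("+/L", "-/L"))
ps_interaction = parallelism_score(centroids, ("+/R", "-/L"), ("-/R", "+/L"))
print(ps_valence, ps_response, ps_interaction)
```

In this toy geometry the valence coding vectors are nearly parallel, the response coding vectors less so, and the interaction dichotomy yields a negative score, mirroring the ordering described in the text.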
Next, we used the same methods to determine whether task concepts are abstractly represented during the feedback epoch. Our decoding results showed that cue valence was strongly represented at this time, and here we found that this signal is also abstracted. The highest measures of both CCGP and PS in both regions were for the dichotomy corresponding to cue valence (Fig. 3C,D,G,H). This effect was particularly strong in dACC, where cross-condition decoding accuracy within the feedback epoch approached one and dominated all other dichotomies, suggesting that population codes in dACC (and to a lesser extent vACC) strongly prioritize cue valence abstraction. During this epoch, relative feedback was also represented in an abstract fashion in both areas. However, response direction behaved quite differently and even displayed below-chance CCGP. This indicates that generalizability of response information was de-emphasized by decreasing variance between conditions on opposing sides of a hyperplane separating left versus right (Fig. 4D). It is likely that this occurred to support the extreme abstraction of cue valence, resulting in classifiers trained under one subset of conditions that actively misclassified conditions from the held-out test set. For example, a decoder tasked with separating the [+/L] and [−/R] cues by response direction will learn a hyperplane more orthogonal to the valence dichotomy and will thus perform below chance on the [−/L] and [+/R] test set as response direction is preserved but valence is reversed (Fig. 4C,E).
Finally, following Bernardi et al. (2020), we computed shattering dimensionalities, defined as the mean cross-validated decoding accuracy across all possible dichotomies. The goal of this analysis is to better understand the geometries of task representations in high-dimensional neural activity space. For example, consider four conditions defined by all possible combinations of cue valence and relative outcome (Table 1). If neurons represent these four conditions as linear combinations of the two variables, then population representations of these conditions would, save for noise, lie on a linear 2D manifold, allowing linear separation of the two dichotomies corresponding to valence and relative outcome but not the last dichotomy corresponding to receipt of feedback (bar change vs no change; Fig. 4A). To linearly separate the final dichotomy (resulting in a higher shattering dimensionality), the neural representation would have to become nonlinear (lying near a nonlinear 2D manifold), allowing it to bend into a third embedding dimension (Fig. 4B,C,F). In other words, a higher shattering dimensionality arises from a more nonlinear population geometry with higher dimensionality as measured by linear methods such as PCA. Experimental and theoretical work has argued that prefrontal cortical neurons tend to have firing rates that encode nonlinear combinations of relevant task variables for this purpose (Rigotti et al., 2013; Fusi et al., 2016). However, abstract neural representations could sacrifice maximal dimensionality by restricting possible geometries to those supporting cross-condition generalization, especially once the task is well learned (Bernardi et al., 2020). We therefore tested the degree to which our recorded neural data exhibits nonlinearities compared with a purely linear model. 
The shattering dimensionality can be thought of as an overall statistic measuring such nonlinearity, and hence linear separability of conditions in the population geometry, without commenting on the interpretability of any of the dichotomies (see above, Materials and Methods).
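The link between nonlinearity and linear separability can be illustrated with a noise-free toy example. The sketch below is a simplification under stated assumptions: it uses a perceptron to test whether each dichotomy of the four valence-by-relative-outcome conditions is linearly separable, and reports the fraction of separable dichotomies as a crude proxy, whereas the shattering dimensionality in the text is a mean cross-validated decoding accuracy. The interaction strength `eps` and the function names are hypothetical.

```python
def perceptron_separable(points, labels, max_epochs=1000):
    """Return True if a perceptron finds a separating hyperplane.

    False means none was found within `max_epochs`, which for this tiny
    example corresponds to a dichotomy that is not linearly separable.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(points, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:
            return True
    return False

def geometry(eps):
    """Four conditions (valence v, relative outcome f); the third axis
    carries a nonlinear interaction term of strength eps."""
    conds = [(v, f) for v in (1, -1) for f in (1, -1)]
    points = [(float(v), float(f), eps * v * f) for v, f in conds]
    return conds, points

def separable_fraction(eps):
    conds, points = geometry(eps)
    dichotomies = {
        "cue valence": [v for v, f in conds],
        "relative outcome": [f for v, f in conds],
        "bar change vs no change": [v * f for v, f in conds],
    }
    results = {name: perceptron_separable(points, labels)
               for name, labels in dichotomies.items()}
    return results, sum(results.values()) / len(results)

linear_results, linear_fraction = separable_fraction(eps=0.0)
bent_results, bent_fraction = separable_fraction(eps=0.3)
print(linear_results, linear_fraction)
print(bent_results, bent_fraction)
```

With `eps = 0` the representations lie on a flat 2D manifold and only the valence and relative-outcome dichotomies are separable. A small nonzero `eps` bends the manifold into the third dimension and makes the bar change versus no change dichotomy separable as well, while leaving the other two intact.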
During both the cue and feedback epochs, we found that empirical shattering dimensionalities were slightly but not substantially higher than those of a model in which neural responses are linear combinations of key variables (cue valence and response direction during the cue epoch, plus relative outcome during the feedback epoch; see above, Materials and Methods; Fig. 3I,J) whose CCGPs match empirical values (cue epoch empirical vs model shattering dimensionality, 0.89 vs 0.81 for dACC and 0.65 vs 0.63 for vACC; feedback epoch empirical vs model shattering dimensionality, 0.71 vs 0.62 for dACC and 0.62 vs 0.59 for vACC; Fig. 3I,J, insets). Larger differences in dACC are consistent with its larger proportion of nonlinear mixed selective neurons compared with vACC and with the decodability of absolute feedback in dACC but not vACC (Fig. 2). However, compared with shattering dimensionalities in other tasks (Bernardi et al., 2020), our data suggest that neural responses in the present task are closer to linear combinations of task variables, with only enough nonlinearity to permit separation of a limited number of other dichotomies (such as bar change vs no change, which is necessary for absolute feedback encoding).
Representational similarity and clustering
As abstraction is driven by clustering of conditions in neural activity space, we reasoned that it should also be reflected in similarity measures, such as correlations and clustering, among conditions. For instance, if cue valence is abstractly represented, two conditions should be highly correlated if they share the same cue valence and anticorrelated if they do not. This can be captured by an ideal correlation pattern (which we term “a template”) in which all conditions featuring the same cue valence are perfectly correlated and those with opposite cue valences are perfectly anticorrelated. We performed RSA (Kriegeskorte et al., 2008) to assess the contributions of six such templates to condition-wise correlation patterns observed in recorded neural data, using linear regression (Fig. 5C). Empirical correlations were calculated from all pairings of eight p-dimensional vectors of trial-averaged firing rates, where each vector corresponds to mean activity of p neurons in one of eight different task conditions. This was done in sliding 150 ms time bins, stepped at 25 ms. We then calculated the CPD for each template regressor to determine its marginal contribution to the empirical condition correlations at each point in time throughout a trial; p values were computed by comparing results to a null distribution generated by shuffling the empirical correlations (see above, Materials and Methods).
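The template idea can be illustrated with a simplified sketch. Note the substitution: instead of the multiple regression and CPDs described above, this example scores each template with a plain Pearson correlation against the off-diagonal entries of a simulated condition-by-condition correlation matrix. The 100 simulated neurons, their tuning weights (strongest for cue valence), and all function names are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

# Eight conditions coded as (cue valence, response direction, relative feedback).
conditions = [(v, r, f) for v in (1, -1) for r in (1, -1) for f in (1, -1)]

def template(index):
    """Ideal pattern: +1 if two conditions share the variable at `index`, else -1."""
    return [[1.0 if a[index] == b[index] else -1.0 for b in conditions]
            for a in conditions]

# Simulated neurons as random linear mixtures of the three task variables,
# with valence weights drawn from the widest distribution (an assumption).
rng = random.Random(1)
weights = [(rng.gauss(0, 2.0), rng.gauss(0, 0.5), rng.gauss(0, 1.0))
           for _ in range(100)]
responses = [[sum(w * x for w, x in zip(neuron, cond)) for neuron in weights]
             for cond in conditions]

# Empirical correlations between condition-wise population vectors.
empirical = [[pearson(a, b) for b in responses] for a in responses]

match_valence = pearson(upper_triangle(empirical), upper_triangle(template(0)))
match_response = pearson(upper_triangle(empirical), upper_triangle(template(1)))
print(match_valence, match_response)
```

Because the templates are themselves weakly anticorrelated across condition pairs, a simple correlation can understate the contribution of weaker variables, which is one motivation for assessing marginal contributions with a regression-based measure such as the CPD.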
We found that the template matrix for cue valence made the greatest marginal contribution to the model in both regions, consistent with our preceding analyses (Fig. 5). Representational similarity for cue valence was also stronger in dACC compared with vACC and, in line with decoding results, increased in vACC later in the trial, approaching the time of feedback. After the occurrence of feedback, the most robust and consistent signals in both regions were cue valence and relative feedback, also consistent with our previous analyses, and latencies were again shorter in dACC. Representational similarity of absolute feedback was intermittently present in dACC, and less so in vACC. Together, these results suggest an important role for representations of cue valence that persist during feedback processing.
In addition to cue valence, there was significant representational similarity among conditions featuring the same response direction in dACC (Fig. 5A). This was predominantly around the time the response was made and decayed by the time of feedback onset. In contrast, vACC showed little representational similarity for response direction (Fig. 5B), consistent with a previous study that found more motor correlates in dACC (Cai and Padoa-Schioppa, 2012). This is also consistent with the relative lack of response abstraction in vACC, although this information was decodable above chance levels in the same population. This confirms that significant RSA results are more related to abstraction than to traditional decoding, in that both representational similarity and abstraction require relative overlap between neural responses to stimuli featuring the same value of a task variable (even as other task variables change). Decoding, however, requires no such overlap but only that population responses be linearly separable. Thus, representational similarity and abstraction imply decodability, but the converse does not hold (Diedrichsen and Kriegeskorte, 2017). Finally, the interaction between valence and response direction accounted for little of the observed correlation patterns throughout the trial in either region.
To more directly study neural population variance across conditions, we used PCA to identify latent factors underlying population responses. The activity of each of the p neurons was averaged both within a 500 ms analysis window from 101 to 600 ms following feedback and across all trials within each of eight conditions (4 cues × 2 possible outcomes each). The activity of each neuron was then z-scored across conditions. PCA of the resulting
PCA also yields loadings that represent the contribution of each standardized PC to the observed standardized activity of each neuron (within the fixed time window). As PCs can be interpreted as canonical response patterns across different task conditions, often mapping onto a single task variable, we can assess the loadings to understand whether single neurons mix these patterns randomly or in characteristic ways that result in clustering among the loading vectors (Raposo et al., 2014). Previous work in the OFC of macaques and rodents has suggested that prefrontal populations mix task variables nonrandomly, producing clustered responses (Hirokawa et al., 2019; Onken et al., 2019), and theoretical models have been proposed to explain these results (Dubreuil et al., 2022). To quantify this in ACC, we used the ePAIRS algorithm (Raposo et al., 2014), which assesses the cluster tendency of a set of vectors. The loading weights on the top PCs required to explain 90% of population variance across conditions demonstrated significant cluster tendency in both dACC (Fig. 6D) and vACC (Fig. 6E), similar to reports in OFC (Hirokawa et al., 2019; Onken et al., 2019). To determine whether loadings associated with particular PCs drove this clustering, we iteratively removed variance in the population captured by the eigenvector associated with each PC (see above, Materials and Methods). Removing the first dimension was sufficient to destroy cluster structure in vACC but not dACC (Fig. 6E), suggesting that the first PC, which primarily varied with cue valence, drives clustering in vACC. In dACC, removal of the two leading PCs was necessary (Fig. 6D). This suggests that clustering in dACC was also driven mainly by the tendency of neurons to represent cue valence, as well as relative feedback. The dominant abstraction of cue valence caused valence information to appear more strongly in both single units and populations, even as other task variables contributed lower variance (Fig. 7).
Cue valence drives across-condition variance and population dynamics
Finally, to summarize our findings and visualize the temporal evolution of task selective information in each region, we projected population activity onto 3D subspaces capturing maximal population variance across time and conditions, as identified by PCA. We used firing rate matrices of size
Separation of cue valences is a dominant and stable feature of neural dynamics in ACC. A, PCA trajectories show the first three canonical responses through time, across the eight conditions (1 trajectory for each condition), for dACC (left) and vACC (right). Colors correspond to signed magnitude of feedback, and points represent time bins. Darker and lighter points correspond to earlier and later time points in the trial, respectively. Feedback onset is marked with black circles. B, Same as A but with each PC plotted separately as a function of time. C, Matrices whose rows are the first three right eigenvectors of the neuron correlation matrix for dACC (top) and vACC (bottom). Eigenvector elements define an activation pattern over neurons (a neural mode). The associated canonical response is the activation of this mode at each point in time, across conditions.
Discussion
Here, we investigated how ACC represents expected outcomes in relation to actions and the outcomes received. Single-unit responses were heterogeneous, encoding a variety of task-relevant information and varying somewhat across individuals. There was also interindividual variability in behavioral performance, as subject C performed less accurately and had slower reaction times, and concomitantly had lower proportions of neurons encoding task-relevant information. Such differences are common and likely represent natural variability among individuals, potentially relating to attention, motivation, or task proficiency. In contrast, however, population representations were more consistent and demonstrated simpler structure, primarily organized around a persistent separation of conditions with the potential for gain or loss on each trial. This is similar to a study of rodent ACC that found that heterogeneous single-unit responses contributed to population activity that persistently signaled valence throughout a trial and the following intertrial interval (Caracheo et al., 2018). In addition, we found that trial outcomes were primarily coded relative to the valence of initial cues. Based on this, we suggest that potential outcomes encoded by ACC set an expectation, or context, for interpreting ensuing outcomes, and this role explains the strength and persistence of valence encoding in the present task.
Separation of cue valence and motor response representations
The ACC is strongly connected to the motor system (Bates and Goldman-Rakic, 1993; Calderazzo et al., 2021), and lesions encompassing dACC and vACC impair the use of reward contingencies to guide action, but not stimulus, selection in both monkeys and humans (Kennerley et al., 2006; Rudebeck et al., 2008; Camille et al., 2011). ACC neurons in primates, as well as neurons in medial frontal regions in rodents, encode actions in concert with diverse representations of rewards and other task-relevant information (Ito et al., 2003; Matsumoto et al., 2003; Mulder et al., 2003; Williams et al., 2004; Kennerley and Wallis, 2009; Rich and Shapiro, 2009; Hayden and Platt, 2010; Hayden et al., 2011; Kennerley et al., 2011; Hyman et al., 2013; Luk and Wallis, 2013; Blanchard and Hayden, 2014; Simon et al., 2015; Del Arco et al., 2017; Grunfeld and Likhtik, 2018; Hunt et al., 2018; Kaminska et al., 2021). In particular, some single neurons encode interactions between actions and outcomes or task contingencies, which has been interpreted as a mechanism linking these pieces of information (Matsumoto et al., 2003; Mulder et al., 2003; Luk and Wallis, 2013; Grunfeld and Likhtik, 2018). In contrast, we did not find interactions between actions and reward contingencies at the level of neuron populations. Instead, ACC appeared to treat this information separately. First, representations of cue valence and motor responses had different dynamics. Valence was strongly represented throughout the trial, whereas response direction was encoded primarily when responses were executed, appearing later and disappearing earlier. Second, error trials reduced the decodability of response directions while leaving cue valence decoding intact. Third, valence and response had independent influences on population geometries. These two variables were orthogonalized and abstracted in the neural activity space following cue presentation, with minimal nonlinear interaction. 
On feedback, cue valence continued to be strongly abstracted, whereas response direction was not. Cross-condition decoding of the latter even fell below chance, indicating that classifiers trained on one subset of conditions actively misclassified response directions from held-out conditions. This occurred because generalizability of response information was reduced to support the abstraction of cue valence, as extreme variance along the cue valence dimension caused some decoders tasked with separating response direction to learn cue valences instead. Together, these distinctions in population coding depend on untangling compound cue information into separate latent factors to capture the independence of valence and action information. Consistent with our results, other studies have emphasized that action-contingency interactions are relatively uncommon in ACC, and these factors tend to be represented independently (Hayden and Platt, 2010).
Valence contexts shape ACC representations
Although interactions between actions and reward contingencies were not encoded by ACC populations, we did find evidence that ACC is involved in context-dependent behavior. Probabilistic trial outcomes allowed us to dissociate neural responses reflecting the valence of predictive cues and the feedback itself. Across multiple analyses, cue valence was represented throughout the feedback epoch, even increasing in vACC, and we suggest that this is because cue valence was important not just for anticipating potential outcomes but for interpreting them appropriately. That is, no change to the reward bar could be considered either an optimal or suboptimal outcome, with correct interpretation contingent on the valence of the cue on that trial. From this view, the valence of the cue sets the context for interpreting the outcome.
In line with this, ACC predominantly coded feedback relative to the valence context on each trial, as either the better or worse possible outcome given the preceding cue, rather than as an absolute, signed magnitude. This is consistent with previous suggestions that ACC in rodents plays an important role in coding contextual information (Holroyd and Yeung, 2012; Hyman et al., 2012; Ma et al., 2016; Umemoto et al., 2017). Notably, this task could be learned from actual feedback alone (i.e., by seeking gains and avoiding losses), and the 0 bar condition could be interpreted by encoding both cue valence and feedback magnitude. Neither of these strategies requires tracking, let alone abstracting, relative feedback value, yet this was the dominant format for representing trial outcomes. Similarly, another study found that single units in ACC coded reward values relative to alternatives available in the trial block, rather than as absolute magnitudes (Sallet et al., 2007). In that case, trial blocks with different rewards provided contextual information, emphasizing that contexts need not be valenced to have an impact on ACC coding.
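The distinction between absolute and relative feedback coding can be made concrete with a small sketch. The outcome values of +1, 0, and -1 are arbitrary units assumed for illustration, and the function `relative_feedback` is hypothetical. The point is only that the same absolute outcome (no change to the reward bar) receives opposite relative labels depending on the cue context.

```python
# Assumed outcome structure: gain cues lead to +1 or 0, loss cues to 0 or -1
# (arbitrary units, for illustration only).
BEST_OUTCOME = {"gain": 1, "loss": 0}

def relative_feedback(cue_valence, outcome):
    """Label an outcome as the better or worse one possible under this cue."""
    return "better" if outcome == BEST_OUTCOME[cue_valence] else "worse"

trials = [("gain", 1), ("gain", 0), ("loss", 0), ("loss", -1)]
for cue, outcome in trials:
    # Absolute coding tracks `outcome`; relative coding tracks the label.
    print(cue, outcome, relative_feedback(cue, outcome))
```

Under this scheme an absolute code groups the two zero outcomes together, whereas a relative code separates them and instead groups a gain with an avoided loss.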
Abstract representations in ACC
We also found that the geometric structure of population responses primarily emphasized abstract representations of cue valence. Following Bernardi et al. (2020), we determined how well a decoder trained on one set of stimuli performs on a different stimulus set. If the concept is abstractly represented, the decoding rule should generalize. Such generalization occurs when conditions on different sides of one dichotomy (e.g., positive vs negative for cue valence) are widely separated in the neural activity space relative to other dichotomies. This results in a more reliable representation of the concept across conditions. Moreover, this separation was stable across relevant time scales and accommodated encoding of other task information.
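The logic of cross-condition generalization can be sketched as follows. This is an illustration on synthetic data, not the analysis pipeline used here: the coding axes, effect sizes, and trial counts are assumptions, and a simple mean-difference linear readout stands in for whatever classifier is used in practice. A valence decoder is fit on trials with one response direction and tested on trials with the other.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_trials = 40, 100

# Hypothetical coding axes; the response axis is made orthogonal to valence.
valence_axis = rng.normal(size=n_neurons)
response_axis = rng.normal(size=n_neurons)
response_axis -= (response_axis @ valence_axis) / (valence_axis @ valence_axis) * valence_axis

def simulate(valence, response):
    """Noisy single-trial activity for one condition (factors are +1 or -1)."""
    mean = 2.0 * valence * valence_axis + 1.0 * response * response_axis
    return mean + rng.normal(size=(n_trials, n_neurons))

def fit_mean_difference(X_pos, X_neg):
    """Linear readout along the difference of class means (assumed classifier)."""
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    w = mu_pos - mu_neg
    b = -0.5 * w @ (mu_pos + mu_neg)
    return w, b

def accuracy(w, b, X_pos, X_neg):
    correct = np.sum(X_pos @ w + b > 0) + np.sum(X_neg @ w + b <= 0)
    return correct / (len(X_pos) + len(X_neg))

# Train the valence decoder on left-response trials only...
w, b = fit_mean_difference(simulate(+1, -1), simulate(-1, -1))
# ...and test it on held-out right-response trials.
ccgp = accuracy(w, b, simulate(+1, +1), simulate(-1, +1))
print(f"cross-condition valence decoding accuracy: {ccgp:.2f}")
```

Because the simulated response axis is orthogonal to the valence axis, moving between response directions does not shift activity along the readout, and the decoder generalizes. If the two axes were strongly correlated, the held-out conditions would be displaced along the readout and generalization would degrade.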
This type of abstraction is consistent with the interpretation of cue valence as a context signal, insofar as context-dependent behavior entails applying a general principle or rule across various instances. Indeed, abstracted context information has been shown in ACC, as well as dorsolateral prefrontal cortex and hippocampus, in a contingency reversal task (Bernardi et al., 2020). In that study, two other task variables, value and action, were also represented abstractly. Similarly, we found evidence for abstraction of relative feedback in both dACC and vACC, as well as response direction around the time of motor execution. However, the abstraction of cue valence dominated here, producing functional clustering in both regions. Although abstraction can occur with random mixing of task variables, this functional clustering is consistent with representational geometries that strongly prioritize the separation of valence contexts, causing this variable to appear more prominently across the population (Fig. 7).
Abstraction in ACC suggests linear disentanglement
Here, we operationally defined abstraction as encoding of a concept not dependent on any particular instance thereof, as when different cues with the same valence are represented identically. This type of encoding implies a relative invariance to other sources of information. That is, the response directions instructed by two cues of the same valence can differ, but this does not have an impact on valence coding. Learning invariant representations can be accomplished by decomposing inputs into separate generative factors, such that the representation of each factor lies in its own subspace that is invariant to transformations defined by the others. This process is referred to as untangling or disentangling and has been identified as part of the solution to many complex tasks in neuroscience and machine learning (Bengio, 2009; Achille and Soatto, 2018a,b; Higgins et al., 2018; Ansari and Soh, 2019). Disentangled representations of cue valence and response direction in our task would mean that the transformation of moving in the neural activity space from left to right motor responses should not affect the representation of cue valence, which is consistent with our data. In addition, a simple PCA of averaged neural activity was remarkably effective for identifying distinct and orthogonal latent dimensions, along which components corresponded to single task factors. Crucially, linear combinations of these three leading components explained the large majority of population variance, and shattering dimensionalities closely matched models where responses are linear combinations of these key task variables. Together, these results suggest that condition centroids lie near the vertices of a three-dimensional hypercuboid embedded in neural activity space, such that transformations for each factor are roughly linear and act primarily on separate orthogonal vector subspaces that encode single latent factors, consistent with linearly disentangled representations (Fig. 4) that allow generalization of linear readouts across conditions.
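The hypercuboid geometry described above can be illustrated with an idealized, noise-free sketch. The number of neurons, the three orthonormal coding axes, and the edge lengths are assumptions chosen for illustration. Eight condition centroids are built as linear combinations of three binary factors, and PCA on the centroids then recovers exactly three dimensions, with the longest edge (standing in for cue valence) as the leading component.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_neurons = 50

# Three orthonormal coding axes in neural activity space (hypothetical).
Q, _ = np.linalg.qr(rng.normal(size=(n_neurons, 3)))
axes = Q.T                                   # (3, n_neurons)

# Assumed edge lengths: valence dominant, then relative feedback, then response.
scales = np.array([3.0, 1.5, 1.0])

# All 8 combinations of three binary factors, coded as -1 or +1.
factors = np.array(list(itertools.product([-1, 1], repeat=3)))   # (8, 3)
centroids = (factors * scales) @ axes                            # (8, n_neurons)

# PCA on the centered condition centroids.
centered = centroids - centroids.mean(axis=0)
_, sing, Vt = np.linalg.svd(centered, full_matrices=False)
explained = sing**2 / np.sum(sing**2)

print("variance explained by top 3 PCs:", explained[:3].sum())
print("alignment of PC1 with valence axis:", abs(Vt[0] @ axes[0]))
```

In this idealized case each factor's share of variance is proportional to its squared edge length, and a linear readout along any one axis is unaffected by the other two factors. Real population data would add noise and nonlinear mixing, which is what the shattering dimensionality comparison is meant to assess.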
Conclusions
Here, we found that population activity in dACC and vACC exhibits robust low-dimensional structure in a stimulus-motor mapping task. This complements a growing literature showing that cortical representations of well-learned tasks typically occupy less than the high-dimensional potential of the recorded neurons (Ganguli et al., 2008; Rigotti et al., 2013; Sadtler et al., 2014; Gao and Ganguli, 2015; Fusi et al., 2016; Chaudhuri et al., 2019; Bernardi et al., 2020; Cueva et al., 2020; Flesch et al., 2022). Lower-dimensional representations are less flexible, in that fewer combinations of variables can be read out linearly, which may slow new learning (Flesch et al., 2022) but may allow key task variables to be represented in a more robust and generalizable fashion. We found that low-dimensional representations in ACC are shaped by potential outcomes that provide context for interpreting ensuing events. Based on this, we suggest that contextual information is a primary driver of population responses in ACC, with additional information integrated into this geometry as needed. These representations likely support context-dependent operations, such as encoding outcomes relative to expectations.
Footnotes
This work was supported by National Institutes of Health–Office of Extramural Research Grants R01DA19028 and P01NS040813 to J.D.W. and R01MH121480 and a Hilda and Preston Davis Foundation grant to E.L.R. We thank Feng-Kuei Chiang, Pooja Viswanathan, Jessica Overbey, and Peter Rudebeck for discussion and comments on the manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Erin L. Rich at erin.rich{at}mssm.edu