Abstract
The ventral frontal cortex (VFC) in macaques is involved in many affective and cognitive processes and has a key role in flexibly guiding reward-based decision-making. VFC is composed of a set of anatomically distinct subdivisions that are within the orbitofrontal cortex, ventrolateral prefrontal cortex, and anterior insula. In part, because prior studies have lacked the resolution to test for differences, it is unclear if neural representations related to decision-making are dissociable across these subdivisions. Here we recorded the activity of thousands of neurons within eight anatomically defined subdivisions of VFC in male macaque monkeys performing a two-choice probabilistic task for different fruit juice outcomes. We found substantial variation in the encoding of decision variables across these eight subdivisions. Notably, ventrolateral Area 12l was unique relative to the other areas that we recorded from as the activity of single neurons integrated multiple attributes when monkeys evaluated the different choice options. Activity within Area 12o, in contrast, more closely represented reward probability and whether reward was received on a given trial. Orbitofrontal Area 11m/l contained more specific representations of the quality of the outcome that could be earned later on. We also found that reward delivery encoding was highly distributed across all VFC subdivisions, while the properties of the reward, such as its flavor, were more strongly represented in Areas 11m/l and 13m. Taken together, our work reveals the diversity of encoding within the various anatomically distinct subdivisions of VFC in primates.
Significance Statement
The ventral frontal cortex (VFC) is essential for flexible decision-making and is composed of many anatomically defined subdivisions. Whether neural representations related to decision-making differ between these subdivisions is unclear. Here we recorded single-neuron activity from eight anatomically distinct subdivisions of VFC, while macaques made choices between stimuli based on the probability of receiving different flavored fruit juices. We report that neural representations across these subdivisions were dissociable. Area 12l exhibited the most integrated representations of decision variables at the level of single neurons. In contrast, activity in Area 12o was closely related to reward probability, whereas activity in Areas 11m/l and 13m represented juice flavor. Thus, neural representations are distinct across anatomically separable parts of VFC.
Introduction
The ventral frontal cortex (VFC) in primates is involved in a multitude of cognitive and affective functions and plays a prominent role in reward-guided decision-making (Rushworth et al., 2011; Padoa-Schioppa and Conen, 2017; Murray and Rudebeck, 2018). Specifically, lesions or focal disruption of activity in either of the two parts of the VFC, orbitofrontal cortex (OFC) or ventrolateral prefrontal cortex (vlPFC), impact the ability of humans and nonhuman primates to choose adaptively based on different costs and benefits (Izquierdo and Murray, 2004; Rudebeck et al., 2013, 2017; Murray et al., 2015; Reber et al., 2017; Folloni et al., 2021). OFC and vlPFC are, however, not anatomically homogeneous structures. Within these parts of the frontal lobe, researchers have identified subdivisions that are distinct based on their cytoarchitecture and receptor densities (Walker, 1940; Morecraft et al., 1992; Carmichael and Price, 1994, 1995a,b, 1996; Ongur and Price, 2000). For instance, Area 12 is composed of four unique areas that are dissociable based on different anatomical features such as the presence of a granule cell layer, degree of laminar SMI-32 staining, or dopamine receptor densities, to name just a few (Carmichael and Price, 1994; Rapan et al., 2023). Subdivisions of OFC and vlPFC are also distinct based on the connections that they send to and receive from other parts of the prefrontal cortex as well as temporal, premotor, subcortical, and sensory areas (Carmichael and Price, 1995a,b, 1996). Such distinct connectivity between dissociable subdivisions of ventral frontal cortex is also observable at the level of fMRI functional connectivity in both humans and monkeys (Kahnt et al., 2012; Rapan et al., 2023). Despite these well-established differences, it remains unclear whether such anatomical distinctions are associated with variation in function across these subdivisions of OFC and vlPFC.
A number of studies have revealed both anteroposterior and mediolateral functional differences within VFC (Kobayashi et al., 2010; Klein-Flügge et al., 2013; Rich and Wallis, 2014; Murray et al., 2015). For instance, Murray and colleagues found that anterior OFC Area 11 and posterior OFC Area 13 make dissociable contributions to updating and using the current value of rewards to guide choices in a reinforcer devaluation task (Murray et al., 2015). In contrast, Rich and Wallis (2014) reported a gradient of encoding along the mediolateral extent of VFC, where a greater proportion of neurons within lateral regions represented the valence of the outcome that macaques would receive for choosing a course of action. Extending this work to directly link patterns of activity to anatomically distinct subdivisions of VFC would provide further insight into variation in function in VFC.
Here, we recorded the activity of thousands of single neurons from VFC of macaque monkeys while they performed a two-choice probabilistic task for different outcome flavors. During and following recordings, we confirmed the location of electrodes using CT scans and postmortem histological reconstruction, allowing us to identify eight cytoarchitectonic subdivisions within OFC, vlPFC, and agranular insula (AI). We compared representations across these subdivisions using complementary encoding/decoding approaches at the level of single neurons and populations of neurons. In particular, we assessed the prevalence, strength, and stability of the representation of multiple decision variables within the activity of single neurons in order to understand the diversity of neuronal encoding in VFC. We then turned our attention to the temporal evolution of the representational geometry in the activity of populations of neurons in the different subdivisions. Taking this approach allowed us to understand what and when information could be read out from each subdivision. These complementary analyses revealed marked differences between areas. Neurons in Area 12l exhibited the most integrated representations of decision variables, whereas neural activity within Area 12o was closely related to reward probability. OFC Areas 11m/l and 13m exhibited differential representations of outcome flavor over time.
Materials and Methods
Subjects
Subjects consisted of two adult male rhesus macaques (Macaca mulatta), Monkeys M and X. They were 8 and 5.5 years old and weighed 11.9 and 7.9 kg, respectively, at the start of the neurophysiological recordings. Animals were group-housed, kept on a 12 h light/dark cycle, and had access to food 24 h a day. Throughout training and testing, each monkey's access to water was controlled for 5 days per week. All procedures were reviewed and approved by the Icahn School of Medicine Animal Care and Use Committee.
Apparatus
Monkeys were trained to sit head restrained in a custom primate chair situated 56 cm from a 19 inch monitor screen. Choices were reported using gaze location, which was monitored and acquired at 90 fps using an infrared oculometer (PC-60, Arrington Research). Juice rewards were delivered to the monkey's mouth using custom-made air-pressured juice dispenser systems (Mitz, 2005). Trial events, reward delivery, and timings were controlled using the MonkeyLogic behavioral control system (https://monkeylogic.nimh.nih.gov), running in MATLAB (version 2014b, MathWorks). Raw electrophysiological activity was recorded using an Omniplex data acquisition system (Plexon) and sampled at 40 kHz. Spikes from putative single neurons were automatically clustered off-line using the MountainSort plugin of MountainLab (Chung et al., 2017) and later curated manually based on principal component analysis (PCA), interspike interval distributions, visually differentiated waveforms, and objective cluster measures (isolation, >0.75; noise overlap, <0.2; peak signal-to-noise ratio, >0.5; firing rate, >0.05 Hz; Stoll and Rudebeck, 2024).
Behavioral task
Monkeys were trained to perform three closely related tasks during each session: a single-option task, an instrumental task, and a dynamic probabilistic task (Stoll and Rudebeck, 2024). Our current analyses focused on instrumental trials only. In this task, monkeys could choose between two options presented simultaneously on the right and left side of the screen. Each option was composed of two features: an outer colored rectangle indicating which outcome flavor (out of two possible juices per session) monkeys could earn and a second central rectangle, more or less filled, indicating the probability with which this particular outcome flavor would be delivered at the end of the trial. During a given session, monkeys were faced with options containing one of the two possible colors (randomly selected from a set of nine colors) associated with two different juice flavors (randomly picked from a set of five, which included apple, cranberry, grape, pineapple, and orange juices, diluted in 50% water). Probabilities ranged from 10 to 90% (in steps of 20% for Monkey M and 10% for Monkey X).
Monkeys initiated a trial by fixating a central fixation cross for 0.7–1.3 s (steps of 0.3 s). Monkeys were then free to look at the two options, each composed of an outcome flavor and probability, which were displayed for 0.4–0.8 s (steps of 0.2 s, pseudorandomly). The stimulus was then turned off for 0.2 s, and two response boxes appeared on both sides of the previously shown options (three possible locations equidistant to the options’ locations; bottom left/right, center left/right, top left/right). Monkeys had to fixate the response box on the side of the desired option within 8 s to make their choice. To register their choice, subjects had to maintain fixation on the response box corresponding to that option for a minimum of 0.25 s, at which time the other response box associated with the unchosen option would disappear. The 0.25 s fixation time required to signal their choice meant that subjects could change their choice within a trial. Continued fixation was required for an additional 0.6–1.2 s (steps of 0.3 s). At that point, the selected response boxes would disappear for 0.3–0.7 s (steps of 0.2 s). Feedback was then delivered. Here, both options were presented again at the same locations, with the selected stimulus option initially flashing (five times 0.1 s ON followed by 0.1 s OFF, total time of 0.5 s, indicated by ×5 in Fig. 1A), before staying on the screen for the duration of the reward (if delivered) and an additional 0.5 s. In rewarded trials, monkeys received 2–3 pulses of 0.03–0.06 s of fluid (separated by 0.1 s each, 0.25–0.36 ml total reward per trial of the outcome flavor and at the probability indicated by the selected option). Nonrewarded trials were matched in time to trials that included reward delivery. Finally, rewarded trials were followed by a 2 s intertrial interval (ITI). Unrewarded trials were followed by a 3.5–4 s ITI. 
If monkeys failed to maintain fixation when required, a large red circle was presented at the center of the screen for 1 s, followed by a longer ITI (4–6 s for Monkey M, 3–4 s for Monkey X). Failure to initiate a trial by looking at the fixation cross within 6 s of its appearance resulted in the same red circle and associated ITI.
Figure 1. Task and behavioral performance. A, Schematic representation of the behavioral task. Following a central fixation period, monkeys were shown a set of two stimuli, each composed of a background color (indicating the outcome flavor that could be earned) and a central gauge (indicating the probability of receiving such outcome). Monkeys reported their choices by maintaining fixation on one of the two response boxes located on the side of the chosen stimulus. After a feedback period, a reward was delivered (or omitted) given the characteristics indicated by the chosen stimulus (flavor and probability). B, The left panel shows the adjusted coefficient of determination observed when monkeys’ choices were fitted using a logistic regression that included the log ratio of probabilities associated with each outcome flavor (ratioPB) and the flavor of the chosen outcome on the previous trial (Prev Flavor or PrevFL). The right panel shows the estimates associated with both regressors. Single dots represent individual sessions for Monkeys M (mk M, green) and X (violet), with box-and-whisker plots indicating the median (central line), interquartile range (box), and standard deviation. C, Predicted probability of choosing Flavor J1 against the log ratio of probabilities associated with the two outcome flavors for each session collected in Monkeys M (left) and X (right). Dashed lines represent the average choice probabilities across sessions. Monkeys’ choice depended on the offered probability (sigmoid function) and the outcome flavor (different bias from session to session).
The two options were associated with different outcome flavors (Juice 1 vs 2, referred to as different-flavor trials) in half of the trials for Monkey M and in three out of four trials for Monkey X. In these trials, the probability was always different for the two outcomes in Monkey M but could be either different or similar in Monkey X (e.g., Juice 1 at 70% vs Juice 2 at 70%). In the remaining trials (one-half for Monkey M and one out of four for Monkey X), the two options were associated with the same outcome flavor (e.g., Juice 1 vs Juice 1, referred to as same-flavor trials) but with different probabilities. Same-flavor trials were not considered in the current analyses.
Surgical procedures and neural recordings
Surgical procedures and details on neural recordings were previously described (Stoll and Rudebeck, 2024). In brief, monkeys were implanted with a titanium head restraint device and a form-fitted plastic recording chamber that contained a 157-channel semichronic microdrive system (Gray Matter Research; Fig. 2A) housing glass-coated electrodes (1–2 MΩ at 1 kHz; Alpha Omega Engineering). The current dataset contained neurons recorded across a total of 72 and 86 independently moveable electrodes that targeted VFC (Monkey M and X, respectively).
Figure 2. Recording locations and anatomical confirmation. A, A model of the recording electrodes (gray) and extent of the targeted areas (yellow) aligned to the MRI of Monkey X. B, A list of areas considered, represented on a ventral view of the macaque frontal cortex and colored as in the following figures. C, Coronal section with Nissl staining at the +36 interaural level in Monkey M, highlighting four electrode tracks (small arrows). D–F, SMI-32 stain at three anteroposterior levels (panel D, +42 mm; E, +33 mm; and F, +29 mm interaural) in Monkey M. Large arrows represent transitions between subdivisions. Colored rectangles represent zoomed in portions displayed in the following panels. G–K, A detailed view of transitions between distinct subdivisions. A, anterior; P, posterior; M/m, medial; L/l, lateral; D, dorsal; V, ventral; o, orbital; r, rostral; PrCo, precentral opercular cortex.
Recording locations were confirmed using several approaches. First, we recorded the cumulative depth of each electrode as it was being lowered and tracked the changes in background noise and electrophysiological activity suggestive of white/gray matter transitions. We also acquired CT images at different time points during the period in which we were recording from VFC, which were then coregistered to postoperative MRIs to estimate the location of each electrode within the brain. Finally, after sacrifice and brain extraction, we captured and digitized block-face images of every brain section before staining, as these showed clear marks of the electrode tracks. Combined with microscope observations of histologically stained sections (see below for more details) and using the Free-D software (Andrey and Maurin, 2005), this allowed us to reconstruct the precise anatomical location of every electrode within the brain.
Tissue preparation and immunohistochemistry
Following recordings, monkeys were deeply anesthetized and transcardially perfused with 4% formaldehyde in phosphate-buffered saline. The brain was then extracted, postfixed, and cryopreserved before being shipped to FD NeuroTechnologies for tissue preparation and staining. Detailed methods were previously reported in Stoll and Rudebeck (2024). Briefly, the recorded brain hemispheres were further cryoprotected in solution before fast freezing in isopentane. Serial sections of 50 μm were then cut coronally. A series of four consecutive sections were collected and stained separately (resolution of 200 μm). Specifically, we stained every first section of each series using cresyl violet solution (Nissl staining), while the second and third series were processed for calbindin (using mouse monoclonal anti-calbindin-D-28K antibodies, 1:1,000 dilution; Millipore Sigma) and SMI-32 (using mouse purified anti-neurofilament H nonphosphorylated antibodies, 1:12,000 dilution; BioLegend) immunohistochemistry.
Defining neuroanatomical boundaries
Following the approach taken by Carmichael and Price (1994), we looked for variation in the composition of the cortex across the VFC of each of our subjects on the Nissl, SMI-32, and calbindin-stained sections. We specifically chose these three stains as they qualitatively provided the greatest resolution to discern the areas of interest within VFC, namely, parts of Areas 11, 12, 13, and AI. Furthermore, recent anatomical analyses of the macaque frontal cortex by Rapan and colleagues have largely supported Carmichael and Price's parcellation of VFC subdivisions (Rapan et al., 2023).
All sections were inspected at magnification between 2× and 20×, as required, using either Nikon or Zeiss light microscopes. The precise definitions of each area are provided in Table 1, and examples of areal boundaries on stained sections are shown in Figure 2. In each monkey, we were reliably able to discern Areas 13l, 13m, 12m, 12r, 12l, and 12o. While Areas 11m and 11l could be reliably differentiated in one subject, they were harder to identify in the other. We therefore combined these two areas into one which we label 11m/l. We did not attempt to discern subdivisions of AI as the number of neurons in each would have been too few to meaningfully analyze.
Experimental design and statistical analyses
Data analyses were performed off-line using custom MATLAB scripts (MATLAB version R2022a, MathWorks) which are available at https://github.com/RudebeckLab/POTT-carto. Here, we focused our analyses on the instrumental task, specifically the response of neurons during trials where different outcome flavors were offered.
Behavior
Monkeys’ choices were analyzed using logistic regressions (function fitglm in MATLAB), by fitting the odds of choosing one of the two flavors in trials where the two options were associated with different flavors using the following model:
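As a rough illustration of this kind of choice model, the sketch below assumes a standard logit form with an intercept and the two regressors reported for Fig. 1B (the log ratio of probabilities, ratioPB, and the previously chosen flavor, PrevFL). The function name, the ±1 coding of the previous flavor, and the coefficient values are hypothetical; the actual fits were obtained with fitglm in MATLAB.

```python
import math

def p_choose_j1(prob_j1, prob_j2, prev_flavor, b0, b_ratio, b_prev):
    """Predicted probability of choosing Juice 1 under an assumed logit model.

    prob_j1, prob_j2: offered reward probabilities (between 0 and 1).
    prev_flavor: +1 if Juice 1 was chosen on the previous trial, -1 otherwise
                 (hypothetical coding, not taken from the source).
    b0, b_ratio, b_prev: regression coefficients (hypothetical values).
    """
    ratio_pb = math.log(prob_j1 / prob_j2)  # log ratio of offered probabilities
    logit = b0 + b_ratio * ratio_pb + b_prev * prev_flavor
    return 1.0 / (1.0 + math.exp(-logit))

# Equal probabilities and no flavor bias: choice sits at chance.
p_equal = p_choose_j1(0.5, 0.5, prev_flavor=1, b0=0.0, b_ratio=3.0, b_prev=0.0)
# A higher probability for Juice 1 shifts choices toward Juice 1.
p_better = p_choose_j1(0.9, 0.3, prev_flavor=1, b0=0.0, b_ratio=3.0, b_prev=0.0)
# A positive intercept mimics a session-specific preference for Juice 1.
p_biased = p_choose_j1(0.5, 0.5, prev_flavor=1, b0=1.0, b_ratio=3.0, b_prev=0.0)
```

In this form, the intercept shifts the sigmoid along the log-ratio axis, which is one way to picture a session-by-session flavor bias.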
Preprocessing of neurophysiological data
The tuning of neural activity relative to decision variables was assessed at the level of single neurons and neural populations. Across the analyses described in the following sections, we extracted the tuning likelihood (i.e., whether information could be encoded/decoded), the tuning strength (i.e., whether these representations were reliable), and how the tuning of neurons to different variables was related over time. Neurons were included based on the quality of isolation only, with no response pattern or firing rate restrictions. Spiking activity for each trial was first smoothed using a 50 ms Gaussian kernel before being averaged over 20 ms bins. Neurons’ firing rates were aligned to multiple events across trials (central fixation, stimulus onset, response fixation, feedback and reward onsets). Our analyses were either performed on these time bins (e.g., single-neuron ANOVAs) or applied on the average firing rate for each neuron around events of interest, notably the stimulus (100–700 ms following the stimulus onset) and reward (100–700 ms following reward delivery) periods. Note that the reward period was redefined as being from 100 to 600 ms when assessing whether information was maintained over time (see below and Fig. 9). This was done to keep time windows of maximal but similar length across events, with the feedback period being limited to 500 ms before the reward was delivered. Because both options were presented simultaneously, unambiguously separating pre-/post-decision aspects of the choice is fraught (as previously reported, Raghuraman and Padoa-Schioppa, 2014) due to the high correlation between the attributes of the offered and chosen options. Consequently, we focused on how neural activity across VFC signaled chosen attributes, which were more strongly and consistently represented in our data.
As a result, we did not use/compare the timing of representations of the different variables as a proxy for the unfolding of the decision processes (e.g., from valuation to choice). Given that monkeys rarely chose the lowest probability, and to ensure that our neural encoding and decoding analyses were well powered, we only considered four levels of chosen probabilities (30–50–70–90%). Results are provided for each monkey as well as combined (indicated by individual symbols in the plots). When performing cross-condition decoding and subspace analyses (Figs. 8, 10), we only considered neurons with an average firing rate >1 Hz (across the considered trials and time bins) and at least five trials for each considered condition.
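The smoothing and binning step can be sketched as follows. This is a minimal illustration and not the authors' MATLAB code: it assumes that the 50 ms value is the standard deviation of the Gaussian kernel, that the kernel is truncated at ±3 standard deviations, and that bins do not overlap. The function name and the 1 ms working resolution are hypothetical choices.

```python
import math

def smooth_and_bin(spike_times_ms, duration_ms, sigma_ms=50, bin_ms=20):
    """Return firing rates (Hz) in consecutive bins after Gaussian smoothing."""
    # Spike train at 1 ms resolution (spike count per millisecond).
    train = [0.0] * duration_ms
    for t in spike_times_ms:
        if 0 <= t < duration_ms:
            train[int(t)] += 1.0

    # Gaussian kernel truncated at +/- 3 sigma, normalized to unit area.
    half = 3 * sigma_ms
    kernel = [math.exp(-0.5 * (k / sigma_ms) ** 2) for k in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]

    # Convolve (zero-padded at the edges) and convert spikes/ms to spikes/s.
    smoothed = [0.0] * duration_ms
    for t, count in enumerate(train):
        if count == 0.0:
            continue
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < duration_ms:
                smoothed[idx] += count * w * 1000.0

    # Average the smoothed rate within non-overlapping bins.
    n_bins = duration_ms // bin_ms
    return [sum(smoothed[b * bin_ms:(b + 1) * bin_ms]) / bin_ms for b in range(n_bins)]

# One spike at 500 ms in a 1 s trial yields 50 bins of 20 ms each.
rates = smooth_and_bin([500], duration_ms=1000)
```

Because the kernel has unit area, the smoothed rate integrates back to the original spike count as long as the spike is far from the trial edges.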
ANOVA on single-neuron responses
We assessed the tuning of neurons’ binned firing rates using two different ANOVAs. For this analysis, the firing rate for each neuron was min–max normalized across trials and time bins. First, we explained the variance in the firing rate from each neuron and at every time bin using three-way ANOVAs which included the following factors: chosen flavor (two levels, categorical), chosen probability (four levels, 30–50–70–90%, linear), and chosen side (two levels, categorical). We also assessed the neurons’ tuning to reward variables using two-way nested ANOVAs which included reward delivery (two levels, categorical) and reward flavor when delivered (three levels, J1, J2, and none, categorical, nested under reward delivery). Neurons were considered as encoding a task factor significantly if they discriminated that factor for three consecutive bins (covering a time period of 100 ms) with a threshold of p < 0.01. This led to a chance level of ∼5% of significant neurons for any given factor when considering the period before the stimulus onset (100–700 ms following central fixation; Fig. 3A, black line). We also extracted the omega-squared, a measure of the effect size, for each factor included in the ANOVAs. To assess differences between areas across our measures (proportion of significant neurons and effect size), we fitted generalized logistic (or linear) mixed-effect models (fitglme function in MATLAB) with area (categorical) as a fixed factor as well as monkey (two levels, categorical) and session (categorical) as random factors (intercept only; Tables 3, 4). We then applied multiple-comparison correction using the false discovery rate (FDR) at p < 0.05.
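The consecutive-bin significance criterion described above can be sketched as follows. The function names are hypothetical, and the p values are assumed to come from a separate per-bin ANOVA that is not implemented here.

```python
def encodes_factor(p_values, alpha=0.01, n_consecutive=3):
    """True if p < alpha in at least n_consecutive successive time bins."""
    run = 0
    for p in p_values:
        # Extend the current run of significant bins, or reset it.
        run = run + 1 if p < alpha else 0
        if run >= n_consecutive:
            return True
    return False

def fraction_significant(p_values_by_neuron, alpha=0.01, n_consecutive=3):
    """Proportion of neurons that meet the consecutive-bin criterion."""
    flags = [encodes_factor(p, alpha, n_consecutive) for p in p_values_by_neuron]
    return sum(flags) / len(flags)

# Hypothetical per-bin p values for two neurons.
neuron_a = [0.5, 0.005, 0.001, 0.009, 0.2]   # three significant bins in a row
neuron_b = [0.005, 0.2, 0.005, 0.005, 0.5]   # never three in a row
proportion = fraction_significant([neuron_a, neuron_b])
```

Applying the same function to a baseline window, as done here with the prestimulus period, gives an empirical estimate of the false-positive rate of the criterion.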
Figure 3. Encoding/decoding of the chosen flavor, chosen probability, and chosen side during the stimulus period. A, The percentage of neurons across areas showing significant firing rate modulation with chosen outcome flavor (p < 0.01 for three consecutive time bins) during the stimulus period (0.1–0.7 s following the stimulus onset, event “Stim on”). The black line shows the proportion of significantly modulated neurons during the prestimulus period (0.1–0.7 s following central fixation) for each area. B, The time-resolved average effect size for neurons with significant encoding of chosen flavor during the stimulus period (black bar), aligned to four successive events (stimulus onset, response fixation, feedback onset, and reward onset). C, Decoding performance for chosen flavor using simultaneously recorded neurons in each area. Each point represents a single session. Statistical significance was assessed using generalized mixed-effect models, with the area as a factor and monkey/session as random intercepts. Star locations (p < 0.01 FDR corrected) indicate the direction of the effect (e.g., brown star, AI, under the yellow boxplot, 13m, means that AI decoding performance was significantly worse than 13m). The horizontal bar represents the theoretical chance level. D–F, As in A–C but for chosen probability. G–I, As in A–C but for the chosen side. Circles and triangles represent data for Monkeys M and X, respectively. See Table 3 for statistical comparisons.
Next, we assessed the extent to which neurons signaling decision variables during the presentation of the visual stimuli were reactivated in other periods of the task, including the feedback and reward periods (Fig. 1). To do this, we extracted the proportion of stimulus-related neurons that were also significantly tuned to the same variable across the three following 500 ms time periods during the trial (response, 100–600 ms following response box fixation; feedback, 0–500 ms following the feedback onset; reward, 100–600 ms following reward delivery). For each variable, the beta coefficients obtained for each neuron during the stimulus period were correlated with those obtained during each of the other time periods (FDR-corrected at p < 0.05).
Population decoding
We used population decoding methods to assess the strength of representations of the considered factors within both simultaneously recorded populations and pseudopopulations of neurons. Decoding on simultaneously recorded populations of a few neurons can uncover richer representations by preserving the temporal dynamics and interactions between neurons over time, but the limited number of neurons might lead to lower decoding performance. Decoding on pseudopopulations of hundreds of neurons, on the other hand, provides a more robust estimate of whether information is truly present but fails to capture the higher dimensionality associated with the presence of nonlinear mixed selectivity (Panzeri et al., 2022).
For decoding on simultaneously recorded neurons, we only analyzed sessions with at least four neurons simultaneously recorded in the considered area. We first averaged the activity of all neurons from a single session across either the stimulus or reward periods before extracting a subset with a similar number of trials for each category to avoid biasing classifiers (minimum of five trials for each category). This trial selection procedure and the following steps were performed 200 times. We then applied a PCA to extract the first three principal components. These were then used to decode the factor of interest by applying linear discriminant analysis (LDA) using 10-fold cross-validation. The decoding performances on the testing sets were then averaged across the 10 cross-validation folds, before being averaged across the 200 randomly sampled trial selections. Additionally, we assessed the significance of decoding performance by permuting the trial labels for each set of selected trials, resulting in 200 random permutations. Decoding was considered significant if fewer than 10 of the 200 permuted decoding performances exceeded the true average performance (one-tailed, p < 0.05). As described above, we assessed differences between areas by fitting a generalized linear mixed-effect model (fitglme function in MATLAB) with area as a fixed factor as well as monkey and session as random factors (intercept only).
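The permutation test for decoding significance can be sketched as follows. The function names and toy scores are hypothetical; the decoder itself (PCA followed by cross-validated LDA) is represented only by a user-supplied scoring function.

```python
import random

def null_distribution(score_fn, features, labels, n_perm=200, seed=0):
    """Decoding scores obtained after shuffling the trial labels n_perm times."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # break the link between activity and condition
        scores.append(score_fn(features, shuffled))
    return scores

def permutation_p_value(true_score, null_scores):
    """One-tailed p value: share of permuted scores exceeding the true score."""
    n_greater = sum(1 for s in null_scores if s > true_score)
    return n_greater / len(null_scores)

# With 200 permutations, 9 exceedances give p = 0.045 (significant at 0.05),
# whereas 10 exceedances give p = 0.05 (not below the threshold).
p_nine = permutation_p_value(0.8, [0.5] * 191 + [0.9] * 9)
p_ten = permutation_p_value(0.8, [0.5] * 190 + [0.9] * 10)
```

Here `score_fn` stands for whatever decoding pipeline is being evaluated; keeping it as an argument lets the same shuffling logic serve both the simultaneous and pseudopopulation analyses.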
For decoding on pseudopopulations, we concatenated the activity of neurons recorded across sessions. To account for the different numbers of recorded neurons across areas and allow a fair comparison of decoding performance, we ran the following analyses using multiple fixed numbers of neurons for each area (from 25 to 1,000; Figs. 6, 7). For each run, we first applied a PCA on the time-averaged activity of the considered neurons, randomly sampled 200 times from our recording set, across a random selection of trials (minimum of 10 trials for each category) and extracted the top 20 principal components, which were used to predict the variable of interest using 10-fold cross-validated LDA. The same variables of interest that we used above were tested here, with the addition of the chosen flavor during nonrewarded trials. This was done to assess whether flavor representations were modulated by the expected and/or received juice flavors. Decoding performances on the testing sets were then averaged across the cross-validations, before being averaged across the 200 trial-/neuron-selection permutations. As before, we also permuted the trial labels 200 times to obtain random decoding performance, which was then statistically compared with the true decoding performance (one-tailed, p < 0.05). We assessed differences between areas using a generalized linear mixed-effect model with area as a fixed factor as well as monkey and the number of included neurons as random factors. We also assessed the difference between chosen flavor decoding performances when rewards were delivered or not using trial type as a fixed factor instead of area and conducting this analysis for each area independently. Finally, we estimated the amount of information contained within neurons and populations by fitting a saturating function on the average decoding performance from the different ensemble sizes using the following:
Cross-condition pseudopopulation decoding
We assessed the overlap in neural activity related to chosen flavor and chosen probability using a cross-condition decoding approach (Wójcik et al., 2023, Fig. 8B,C). In contrast to standard pseudopopulation decoding, LDA classifiers were first trained to discriminate chosen probabilities on trials where a given flavor was chosen and then tested on trials where the other flavor was chosen (and vice versa). The same procedure was also used to train classifiers to decode the chosen outcome flavor on trials where a given probability was chosen and test them on trials with a different chosen probability. We also ran the 10-fold cross-validated LDA classifiers without reducing dimensionality using PCA. As before, we performed cross-condition decoding using multiple fixed numbers of randomly selected neurons for each area (from 25 to 500), which was repeated 200 times (random neuron and trial selection). Cross-condition decoding performance was first averaged across the two classifiers (for chosen probability readout) or across all possible pairs of chosen probabilities (for chosen flavor readout), before being averaged across the 200 trial-/neuron-selection permutations. We assessed the significance using permutation testing, as previously described. We also compared the cross-condition performance with standard decoding performance obtained on the activity of the same considered neurons and trials but after permuting the nondecoded secondary variable. As previously described, we also fitted saturating functions on the standard and cross-condition average decoding performance from the different ensemble sizes.
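The train-on-one-condition, test-on-the-other logic can be illustrated with a toy example. Note that the sketch below uses a nearest-centroid classifier as a simple stand-in for the LDA classifiers used in the analyses, and that the function names and the two toy population codes are hypothetical.

```python
def centroid(rows):
    """Mean activity vector across a set of trials."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroids(X, y):
    """One mean population vector per class label."""
    return {lab: centroid([x for x, l in zip(X, y) if l == lab]) for lab in set(y)}

def accuracy(centroids, X, y):
    hits = 0
    for x, lab in zip(X, y):
        pred = min(centroids, key=lambda c: sq_dist(x, centroids[c]))
        hits += pred == lab
    return hits / len(y)

def cross_condition_accuracy(X, target, context):
    """Train on one context, test on the other, in both directions; average."""
    ctx_a, ctx_b = sorted(set(context))
    scores = []
    for train_ctx, test_ctx in ((ctx_a, ctx_b), (ctx_b, ctx_a)):
        tr = [i for i, c in enumerate(context) if c == train_ctx]
        te = [i for i, c in enumerate(context) if c == test_ctx]
        model = fit_centroids([X[i] for i in tr], [target[i] for i in tr])
        scores.append(accuracy(model, [X[i] for i in te], [target[i] for i in te]))
    return sum(scores) / len(scores)

prob = ["low", "high", "low", "high"]    # decoded variable
flavor = ["J1", "J1", "J2", "J2"]        # condition held out at test time
# Factorized code: neuron 1 tracks probability, neuron 2 tracks flavor.
factorized = [[0, 0], [1, 0], [0, 1], [1, 1]]
# Entangled code: a single neuron whose probability tuning flips with flavor.
entangled = [[0], [1], [1], [0]]
acc_factorized = cross_condition_accuracy(factorized, prob, flavor)
acc_entangled = cross_condition_accuracy(entangled, prob, flavor)
```

In the factorized toy code, probability generalizes perfectly across flavors, whereas in the entangled code a classifier trained on one flavor is systematically wrong on the other. This contrast is what a comparison between cross-condition and standard decoding is sensitive to.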
PCA of neural reactivation
Finally, we used targeted dimensionality reduction analysis to highlight the dominant information within each neural population and investigate whether distinct information was represented within similar or distinct neural spaces. Here, the firing rate of each neuron was first z-scored and centered. For the qualitative comparisons shown in Figure 8A, we extracted a random pseudopopulation of 100 neurons for each area and applied a PCA on the average population activity across conditions (four levels of chosen probability by two chosen flavors, minimum of five trials per condition).
To assess whether chosen probability and reward representations shared neural spaces (Fig. 10), we implemented a PCA on pseudopopulation activity from each subdivision of VFC. Specifically, to isolate the chosen probability subspace, we averaged the activity of each neuron across trials during the stimulus period (100–700 ms following the stimulus onset) for the four chosen probabilities. For the reward subspace, we averaged the activity of each neuron during the reward period (100–700 ms following the reward onset) in rewarded and nonrewarded trials. We then performed a PCA on each matrix, extracting in each case the eigenvectors of the first principal component. These eigenvectors were then used to project the average firing rate across time bins and each of the eight conditions (four chosen probabilities by two reward conditions) onto the two identified neural subspaces. We repeated this procedure 100 times, each time randomly selecting a different pseudopopulation of 100 neurons. We also ran a permutation for each of the 100 neuron-selection procedures, in which we computed the average firing rate across the eight conditions after randomly permuting the trial labels and before projecting the activity onto the previously isolated subspaces. For each time bin, we then compared the average Euclidean distance computed across all pairs of conditions (28 total comparisons across the eight conditions) and for each of the 100 neuron-selection sets with the 100 permutations (one-tailed test). We considered differences to be significant when no permutations were greater than the true value (p < 0.01) for four consecutive time bins. We further highlight only the significant time bins where these conditions were fulfilled in >50% of the neuron-selection sets.
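The projection step can be illustrated with a minimal Python sketch on simulated data. It is not the analysis code used in this study: the tuning axes, noise level, and all variable names are hypothetical, a single time bin is shown, and the permutation test is omitted.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 100  # size of one simulated pseudopopulation

# Hypothetical tuning axes (assumptions made only for this illustration).
prob_axis = rng.normal(size=n_neurons)     # tuning to chosen probability
reward_axis = rng.normal(size=n_neurons)   # tuning to reward receipt
prob_levels = np.array([-1.5, -0.5, 0.5, 1.5])  # four chosen probabilities
reward_levels = np.array([-1.0, 1.0])           # nonrewarded / rewarded


def first_pc(condition_means):
    """Return the first principal axis (unit vector over neurons)."""
    centered = condition_means - condition_means.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


# Stimulus-period averages per chosen probability (4 x neurons).
stim_means = np.outer(prob_levels, prob_axis) + 0.1 * rng.normal(size=(4, n_neurons))
# Reward-period averages per reward outcome (2 x neurons).
rew_means = np.outer(reward_levels, reward_axis) + 0.1 * rng.normal(size=(2, n_neurons))

pc_prob = first_pc(stim_means)
pc_rew = first_pc(rew_means)

# Activity in one time bin for the eight conditions (4 probabilities x 2 outcomes).
conditions = list(itertools.product(range(4), range(2)))
activity = np.array([
    prob_levels[p] * prob_axis + reward_levels[r] * reward_axis
    + 0.1 * rng.normal(size=n_neurons)
    for p, r in conditions
])

# Project onto the two axes: one 2-D point per condition.
projected = np.column_stack([activity @ pc_prob, activity @ pc_rew])

# Average Euclidean distance over all pairs of conditions.
pairs = list(itertools.combinations(range(len(conditions)), 2))
mean_dist = np.mean([np.linalg.norm(projected[i] - projected[j]) for i, j in pairs])
```

With eight conditions, `pairs` contains the 28 comparisons mentioned above. In a full analysis, `mean_dist` would be recomputed after shuffling trial labels to build the null distribution.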
Results
Task and behavioral performance
Two monkeys were trained to perform an instrumental choice task in which two options were presented simultaneously (Fig. 1A). Each option was composed of a central gauge, the level of which (more or less filled) signaled the probability that juice would be delivered if selected, and a colored frame indicating the potential juice flavor that would be delivered. Monkeys were free to select either option by fixating a response box located on each side of the screen. Following a feedback period in which both options were displayed again, a reward was delivered (or not) according to the monkeys’ choice (which defined the probability and flavor). Here, we analyzed choices from 289 sessions (Monkeys M and X, 103 and 186 sessions, respectively) using a logistic regression model that included the log ratio of the probabilities associated with each outcome flavor and the flavor monkeys chose on the previous trial. This simple model explained a large proportion of monkeys’ choice variability (Fig. 1B). All sessions exhibited a significant influence of the log ratio of probabilities, suggesting that the probability of receiving a reward had a strong influence on behavior. The proportion of sessions showing a significant influence of the previous flavor varied between monkeys (n = 8/103 in Monkey M and n = 122/186 in Monkey X), although model estimates were overwhelmingly positive for this parameter, indicative of a greater likelihood of choosing the flavor that had been previously selected (Fig. 1B). This between-monkey variation might be due to the greater proportion of same-juice trials in Monkey M, which could have impacted this monkey's interest in choosing the previously selected flavor (see Materials and Methods). Our model further reveals the influence of outcome flavor on monkeys’ choices, as highlighted by the variable shift in the sigmoid functions from one session to another (Fig. 1C).
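To make the structure of such a choice model concrete, the following Python sketch evaluates a logistic function of the log probability ratio and the previous choice. The coefficient values, the ±1 coding of the previous choice, and the function name are assumptions chosen for illustration; they are not the fitted estimates or the exact parameterization used here.

```python
import math


def p_choose_flavor1(prob1, prob2, prev_chose_flavor1,
                     b0=0.0, b_ratio=3.0, b_prev=0.5):
    """Probability of choosing flavor 1 under a logistic choice model.

    b0, b_ratio, and b_prev are hypothetical coefficients, not fitted values.
    """
    log_ratio = math.log(prob1 / prob2)          # log ratio of offered probabilities
    prev = 1.0 if prev_chose_flavor1 else -1.0   # assumed +/-1 coding of previous choice
    z = b0 + b_ratio * log_ratio + b_prev * prev
    return 1.0 / (1.0 + math.exp(-z))


# Equal probabilities: only the previous choice biases the decision.
stay = p_choose_flavor1(0.5, 0.5, True)
switch = p_choose_flavor1(0.5, 0.5, False)

# A much better offer for flavor 1 dominates the previous-choice bias.
better = p_choose_flavor1(0.9, 0.3, False)
```

In this sketch, a positive `b_prev` captures the tendency to repeat the previously chosen flavor, and a nonzero intercept `b0` would shift the sigmoid sideways, one simple way to express a session-specific flavor preference.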
Previous behavioral analyses of this dataset revealed that monkeys’ preferences for specific juice flavors were transitive and that they often weighted the probability associated with each flavor differently (Stoll and Rudebeck, 2024). Flavor preferences also evolved over the course of the session, suggesting that reward value was updated throughout the session. We refer readers to our previous study for further discussion on the influence of flavor preference on behavior and neural activity within prefrontal and limbic neurons (Stoll and Rudebeck, 2024).
Recordings and anatomical confirmation
We recorded a total of 6,284 neurons across VFC. The trajectory of each electrode was determined based on postmortem reconstructions of Nissl-stained coronal brain sections. These trajectories were then mapped to immunohistology-stained sections, and neurons were assigned to one of the eight distinct VFC subdivisions based on the depth of each electrode at the time of recording (Tables 1, 2). These subdivisions followed the detailed parcellations of this part of the frontal lobe reported by Price, Palomero-Gallagher, and colleagues (Carmichael and Price, 1994; Rapan et al., 2023).
Structural characteristics used to define VFC subdivisions
Number of recorded and analyzed neurons for each monkey
Representation of the decision variables during the stimulus period across VFC
Prior work has revealed the existence of a dissociation within VFC regarding the representation of outcome probability and outcome identity; the former relies on vlPFC, whereas the latter relies on OFC (Rudebeck et al., 2017; Stoll and Rudebeck, 2024). Here we extend this dissociation by showing that neurons across cytoarchitectonic subdivisions within OFC and vlPFC do not represent information about outcome probability or identity uniformly (Fig. 3). This was evident across both encoding and decoding approaches at the level of single neurons (assessed using sliding-window ANOVAs) and populations (assessed using LDA classifiers).
During the stimulus period, 20–50% of neurons encoded the outcome flavor subjects would subsequently choose (Fig. 3A). Areas 12m and 12l showed the highest proportion of neurons, followed by 12r and all OFC subdivisions, while 12o and AI showed the lowest proportion (mixed-effect logistic regression with FDR correction for area comparisons in Table 3). The proportion of neurons correlating with a behavioral variable does not inform us about the reliability of the representation; the activity of many neurons in a population can be weakly tuned to a given variable, while a few can be very highly tuned to that variable. We therefore assessed the effect size, a measure of how well the variable of interest explained the variance in the firing rate (Fig. 3B). Interestingly, a different pattern emerged: the chosen flavor explained the activity of neurons in 11m/l, and to a lesser extent in 12l and 13m, better than in any other recorded area. This was also true when assessing the ability to decode chosen flavor using the activity of simultaneously recorded neurons from single sessions, a complementary measure of the tuning's reliability within neuronal populations. Here the highest decoding performance was observed within 11m/l compared with that within all other areas (Fig. 3C). Decoding performance was also higher in 12l compared with that in other parts of OFC and vlPFC, which exhibited relatively poor decoding performance.
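The distinction between being significantly tuned and being strongly tuned can be illustrated with an omega-squared effect size. The Python sketch below uses the standard one-way formula on made-up firing rates; the sliding-window ANOVAs used here may include additional factors, so this is only a simplified illustration.

```python
def omega_squared(groups):
    """Omega-squared effect size for a one-way design.

    groups: list of lists of firing rates, one list per level of the factor.
    """
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_within = ss_total - ss_between
    ms_within = ss_within / (n_total - k)
    return (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)


# Hypothetical firing rates (spikes/s) of two neurons for two chosen flavors.
strongly_tuned = [[10.0, 11.0, 9.0, 10.5], [20.0, 19.0, 21.0, 20.5]]
weakly_tuned = [[10.0, 14.0, 8.0, 12.0], [11.0, 15.0, 9.0, 13.0]]

strong_effect = omega_squared(strongly_tuned)
weak_effect = omega_squared(weakly_tuned)
```

The first neuron separates the two flavors almost perfectly and yields an effect size close to 1, whereas the second yields a value near zero (omega squared can be slightly negative when the effect is smaller than expected from noise).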
FDR-corrected multiple comparisons for area differences in percentage of significant neurons, effect size, and decoding performances for chosen flavor, probability, and side
A large proportion of neurons were also tuned to the chosen outcome probability following the stimulus onset, with specific vlPFC subdivisions showing the highest proportion of neurons classified as encoding this decision variable compared with other areas (Fig. 3D). Neurons in both 12l and 12o that were classified as encoding chosen outcome probability also showed larger effect sizes than other areas, although the time course differed between these areas (Fig. 3E). Encoding of chosen probability was more transient in 12o neurons, while 12l neurons maintained a strong representation until shortly before the decision was reported. The effect size in these neurons not only varied during the stimulus period but also peaked again at later time points during the trial, notably during the feedback and reward periods. We will explore this apparent “reactivation” of encoding in more detail in a later analysis. Finally, when we looked at how chosen probability could be decoded from simultaneously recorded neurons, we again found that the best decoding performance was found in 12l, followed by 12o (Fig. 3F). All other areas including 12r/m, all subdivisions within OFC, and AI showed relatively poor decoding performance.
The side of the chosen stimulus might also be represented in the activity of neurons within VFC and might be related to decision-related processes, from orienting attention to motor planning. Indeed, we previously reported strong representations of the chosen response side in inferior frontal gyrus and to a lesser extent in vlPFC (Stoll and Rudebeck, 2024). Here, we found that 12l showed the highest proportion of neurons (Fig. 3G), explained variance (Fig. 3H), and population decoding performance of the chosen side (Fig. 3I) compared with other areas. It is worth noting that the representation of the chosen side in 12l was more transient than the representation of other decision variables. Area 12m also exhibited moderate representations of the chosen side. As others have reported previously (Padoa-Schioppa and Assad, 2006; Kennerley and Wallis, 2009), the chosen side was weakly represented in OFC.
Altogether, we found that many neurons across VFC represented the chosen flavor and probability of the outcomes that would follow a specific stimulus, further supporting the involvement of this part of the frontal cortex in the valuation stage of decision-making (Noonan et al., 2017; Murray and Rudebeck, 2018; Stoll and Rudebeck, 2024). Representations were, however, not uniform across subdivisions and were dissociable based on encoding and decoding approaches. Of note, neurons in 12l exhibited the strongest and most diverse representation of the different variables critical for the valuation of the options presented on each trial. In contrast, neurons in 11m/l more selectively coded for the chosen flavor, whereas neurons in 12o represented the chosen probability.
Representation of the decision variables during the reward period across VFC
In addition to the valuation of different potential choice options, VFC is also involved in assigning received rewards to chosen stimuli, a process known as credit assignment that is central to contingent learning (Walton et al., 2010; Chau et al., 2015; Noonan et al., 2017; Behrens et al., 2018). Notably, changes in neural activity in response to reward delivery have been repeatedly found in large swaths of the brain, and the VFC is no exception (Thorpe et al., 1983; Tremblay and Schultz, 1999; Padoa-Schioppa and Assad, 2006; Kennerley and Wallis, 2009). Thus, we next assessed whether there were any differences in how reward delivery influenced the activity of neurons within the eight different subdivisions of VFC that we recorded from.
We found that the activity of a large proportion of neurons across VFC was modulated by reward delivery (Fig. 4A). All vlPFC subdivisions and AI showed a higher proportion of tuned neurons compared with OFC subdivisions (mixed-effect logistic regression with FDR correction for area comparisons in Table 4). However, no clear differences were found between vlPFC and OFC subdivisions regarding the effect size, with the exception of 13m which exhibited the lowest explained variance of all areas that we recorded from (Fig. 4B). Population decoding performance was also generally high across almost all subdivisions, with 12o and 12r showing the highest performance, while 13m showed the lowest (Fig. 4C). By comparison, the flavor of the reward that was received was represented in the activity of a smaller proportion of neurons (Fig. 4D,E), with only moderate differences in proportion of neurons or effect sizes between VFC subdivisions. The population representation of the chosen flavor following reward delivery was stronger in 11m/l compared with that in 12r, 12o, 13l, and AI and marginally different in 11m/l compared with that in 12l and 12m (Fig. 4F). Interestingly, no differences were observed between 11m/l and 13m. Thus, while reward delivery and the delivered reward flavor were represented by a large proportion of neurons in VFC, there were differences in the degree of encoding. Notably, vlPFC areas showed the strongest representations of reward delivery, while Areas 11m/l and 13m exhibited the strongest representation of juice flavor.
Encoding/decoding of reward delivery and chosen outcome flavor during the reward period. A, The percentage of neurons across areas whose firing rate was significantly modulated by whether reward was delivered during the reward period (0.1–0.7 s following the reward onset, event “Rew on”). Black line shows the proportion of significantly modulated neurons during the prestimulus period (0.1–0.7 s following central fixation) for each area. B, The time-resolved average effect size for neurons with significant encoding of reward delivery during the reward period (black bar), aligned to four successive events (stimulus onset, response fixation, feedback onset, and reward onset). C, Decoding performance for reward delivery using simultaneously recorded neurons in each area. Each point represents a single session. D–F, As in A–C but for the chosen flavor when rewarded. Conventions as in Figure 3. See Table 4 for statistical comparisons.
FDR-corrected multiple comparisons for area differences in the percentage of significant neurons, effect size, and decoding performances for the reward and chosen flavor when rewarded (related to Fig. 4)
Mixed selectivity within VFC
Neurons across the frontal cortex often code for multiple task variables, a feature which has been referred to as mixed selectivity (Rigotti et al., 2013; Fusi et al., 2016). Mixed selectivity enables the efficient representation of information which is thought to be critical for complex and flexible behavior (Tye et al., 2024). The degree to which the activity of single neurons represents various decision variables in a mixed manner can inform us of the mechanisms being engaged during decision-making, notably how information is integrated and used to guide decisions. We and others previously reported that vlPFC and OFC neurons are highly likely to exhibit mixed selectivity while maintaining orthogonal representations of the factors important for valuation (Wallis and Miller, 2003; Stoll and Rudebeck, 2024). To establish if there were any differences in the degree of mixed selectivity between subdivisions of VFC, we conducted a similar set of analyses here.
Although mixed selectivity was observed in neurons across all subdivisions, 12m and 12l were unique in that most stimulus-related neurons recorded exhibited a high degree of mixed selectivity (Fig. 5A). Specifically, we found 472/730 (64.6%) of 12l neurons and 202/368 (54.9%) of 12m neurons exhibiting modulation of their firing rate by two or more variables during the stimulus period (Fig. 5B). In fact, 12l showed the greatest proportion of neurons tuned to all three variables that we considered in our analysis of neural activity (chosen flavor, probability, and side) compared with all other areas (FDR-corrected χ2 tests, χ2 > 12.4; p < 9.1 × 10−4), with 244/730 (33.4%) neurons. By comparison, all OFC subdivisions were characterized by <10% of such multimodal neurons (from n = 108/1,046 in 11m/l to n = 41/685 in 13m). Thus, mixed selectivity was highest in parts of vlPFC, especially 12l.
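As an illustration of how two such proportions can be compared, the Python sketch below runs a Pearson χ2 test on a 2 × 2 table built from the counts quoted above for 12l and 11m/l, taken at face value. It is an uncorrected, single comparison without continuity correction; the FDR correction and the exact test variant used here are not reproduced, and the function name is hypothetical.

```python
import math


def chi2_2x2(a_yes, a_total, b_yes, b_total):
    """Pearson chi-squared test (no continuity correction) for two proportions."""
    a_no, b_no = a_total - a_yes, b_total - b_yes
    n = a_total + b_total
    col_yes, col_no = a_yes + b_yes, a_no + b_no
    stat = (n * (a_yes * b_no - a_no * b_yes) ** 2
            / (a_total * b_total * col_yes * col_no))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts quoted in the text: 244/730 neurons in 12l vs 108/1,046 in 11m/l.
stat, p_value = chi2_2x2(244, 730, 108, 1046)
```

For a 2 × 2 table, the statistic has one degree of freedom, which is why the p value can be obtained from the complementary error function without an external statistics library.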
Mixed selectivity across VFC neurons. A, Venn diagrams showing the numbers/proportions of neurons significantly representing the chosen flavor (orange), chosen probability (purple), and chosen side (gray) during the stimulus period. The number of nonselective neurons are displayed within the white ovals. B, Proportion of neurons showing selective representation of a single variable (light gray) compared with neurons showing multimodal integration (dark gray) across areas. Multimodal neurons are defined as representing two or three variables (left panel) or all three variables (right panel). Proportions were compared using χ2 tests. Statistics and FDR-corrected p values are displayed for each area (bold when significant). Circles and triangles represent the proportions for Monkeys M and X, respectively. C, The proportion of neurons representing the reward (green), the chosen flavor in rewarded trials (yellow), or both (gray) during the reward period. Circles and triangles represent the proportions for Monkeys M and X, respectively.
At the time of the reward onset, a non-negligible proportion of neurons across areas represented both the reward receipt and its flavor (Fig. 5C; mixed-effect logistic regression, factor area, F(7,4,050) = 4.85; p = 1.8 × 10−5). In all OFC subdivisions, 24–28% of neurons exhibited this type of mixed selectivity. A different pattern emerged in parts of vlPFC, where the proportion of neurons exhibiting mixed selectivity for reward and flavor was lowest in 12l (72/563, 12.8%) compared with most other subdivisions (FDR-corrected post hoc: 12l vs 12m/12r, W < 2.1; p > 0.11; 12l vs all other, W > 2.95; p < 0.013). It was also notable that Areas 12o and AI exhibited the highest proportion of neurons that selectively encoded reward compared with OFC subdivisions (Fig. 5C, FDR-corrected χ2 tests; AI vs OFC, χ2 > 10.2; p < 0.013; 12o vs OFC, χ2 > 5.1; p < 0.09). In summary, while 12l had the highest degree of mixed selectivity during the stimulus period, it showed the opposite pattern during the reward period.
Representation of decision variables within pseudopopulations of ventral frontal neurons
In our previous analyses, we were able to decode multiple task-relevant variables using simultaneously recorded populations of neurons across the different cortical subdivisions (Figs. 3, 4). Nevertheless, such an approach could be limited by the number of neurons we were able to simultaneously record from [median (min–max) population size = 5 (4–21) neurons] and might not reveal the full extent of VFC representations. We therefore assessed how well information could be decoded from the activity of neurons across VFC using pseudopopulations of different sizes, up to 1,000 neurons, testing the robustness of our previous observations. Similar to what we observed before, clear differences in decoding performance were apparent across subdivisions (Figs. 6, 7).
Pseudopopulation decoding of stimulus-related task variables across cytoarchitectonic areas. A, Average decoding performance of the chosen flavor, chosen probability, and chosen side (from left to right) during the stimulus period, using pseudopopulations of neurons from each isolated cytoarchitectonic area and against the size of the neuronal population considered. Dashed lines represent the fitted saturating functions on decoding performance, while thin lines represent the average decoding performance across permutations. Black and gray arrows represent the numbers used for panels B and C, respectively. B, Average decoding performance using 200 neurons (black arrows in panel A) for the three considered variables reported on a ventral view of the macaque frontal cortex. C, Average decoding performance using 100 neurons (gray arrows in panel A) for the three considered variables, for each individual monkey (Monkey M, circles; Monkey X, triangles) as well as both combined (bars). Note that bars’ height did not necessarily fall in between the individual monkey points as the analyses were independent from each other. See Table 5 for statistical comparisons.
Pseudopopulation decoding of reward-related task variables across cytoarchitectonic areas. A, Average decoding performance of reward receipt and chosen flavor when rewarded or not (left and right, respectively) during the reward period, using pseudopopulations of neurons from each isolated cytoarchitectonic area and against the size of the neuronal population considered. Dashed lines represent the fitted saturating functions on decoding performance, while thin lines represent the average decoding performance across permutations. Black and gray arrows represent the numbers used for panels B and C, respectively. B, Average decoding performance using 200 neurons for the three considered variables reported on a ventral view of the macaque frontal cortex. C, Average decoding performance using 100 neurons for the three considered variables, for each individual monkey (Monkey M, circles; Monkey X, triangles) as well as both combined (bars). Note that bars’ height did not necessarily fall in between the individual monkey points as the analyses were independent from each other. See Table 5 for statistical comparisons.
Notably, we found that the population of Area 12l neurons was unique compared with that of other areas in that it reached the highest level of decoding performance for all considered variables during the stimulus period (compare dark green lines to all others, Fig. 6). After 12l, decoding performance was higher than other areas in 11m/l and 13m for the chosen flavor, while pseudopopulations of 12m and AI neurons more robustly represented the chosen side. Neurons in 12o had decoding performance that was higher than other areas for chosen probability, although at a level far lower than the performance of the 12l population (compare dark and light green lines for chosen probability, Fig. 6A).
During the reward period, we found a strong representation of reward delivery in all areas, although the 13m population required more neurons to reach the highest level of performance compared with other areas (Fig. 7). In contrast, pseudopopulations of neurons in Area 13m as well as 11m/l had the highest decoding performance of the chosen flavor when the reward was delivered. Interestingly, among areas exhibiting strong chosen flavor decoding performance (Areas 11m/l, 12l, 12r, and 13m), all of them except for Area 13m represented the chosen flavor to the same extent whether or not monkeys experienced the outcome (mixed-effect linear regression, factor rewarded/not rewarded; Areas 11m/l, 12l, and 12r, F < 0.023; p > 0.8; Area 13m, F = 15.6; p = 3.7 × 10−4). Representation of chosen flavor in all other areas was, however, weaker in nonrewarded compared with that in rewarded trials (F > 8.1; p < 8.9 × 10−3). This suggests that Areas 11m/l, 12l, and 12r encoded the expected flavor irrespective of the outcome, while other areas, including 13m, were more strongly modulated by the flavor of experienced outcomes. Furthermore, 12l pseudopopulations had the lowest decoding performance for outcome flavor during the reward period compared with that during the stimulus period. Such a distinction between stimulus and reward periods in Area 12l potentially also supports a role of this area in the representation of value for choice, not the specific reward delivered, a point that we take up below.
Given the variation of representation across the different subdivisions of VFC, we next investigated how the variables that predominantly guided animals’ choices, chosen flavor and probability, were dynamically represented within the activity of pseudopopulations of neurons in each subdivision. Prior work has suggested that the diversity and dynamics of neural encoding are key components of the adaptive representation of task-relevant variables and that this is required for flexible decision-making (Fusi et al., 2016). For this analysis, we took the first three principal components from the neural activity of pseudopopulations of 100 randomly selected neurons from each subdivision and plotted the dynamic trajectories of activity associated with chosen probability and flavor 0–700 ms after the stimulus onset.
Supporting our previous observations, VFC subdivisions exhibited qualitatively different neural population trajectories after the presentation of the visual stimuli (Fig. 8A). For example, the trajectory of activity from the top three principal components in 12o was clearly separated across the different levels of chosen probability (light to dark colors, Fig. 8A, top left) from shortly after the stimulus onset. In contrast, there was very little separation between the trajectories for chosen flavors (pink vs green). This indicates that dynamic representations in Area 12o discriminated well between the different levels of chosen probability but did not reliably discriminate the chosen flavor, which is consistent with results highlighting the representation strength (effect sizes and decoding performance; see Fig. 6 for example). Area 11m/l and 13m trajectories, on the other hand, displayed strong separation of the chosen flavor with only a moderate separation of chosen probabilities (Fig. 8A, bottom row). Finally, 12l showed very distinct trajectories across both chosen flavor and probabilities such that all combinations of probability and flavor were clearly distinguishable [note the separation between different colors (flavor) and shading (probability); Fig. 8A, top right]. Taken together, these observations echoed the differences in decoding performance between subdivisions that we previously found (Figs. 3, 6, 7). These trajectories also indicate that chosen probability and flavor qualitatively span distinct neural spaces within each subdivision.
Orthogonal representations of the chosen flavor and probability at the level of pseudopopulations. A, Top three principal components for pseudopopulations of 100 neurons recorded in 12o, 12l, 11m/l, and 13m during the stimulus period. Green and pink curves represent the activity during chosen Flavors J1 and J2 (respectively), while light to dark colors represent the increasing chosen probabilities (from 30 to 90%). Dots indicate the time of the stimulus onset. B, Standard and cross-condition decoding performances of the chosen flavor (left) and probability (right) using pseudopopulations of 25–500 neurons across all subdivisions. Dots represent raw cross-condition decoding performances, while the lines represent the fitted saturating functions on either standard (dashed) or cross-condition (solid) performances. Gray arrows represent the number of included neurons used for panel C. C, Median (±25% CI) percentage change between standard and cross-condition decoding performance (standard minus cross-condition) when using 100 neurons across areas and measures. Positive values indicate higher levels of performance for standard compared with cross-condition decoding.
To quantify the degree to which chosen probability and flavor were represented within distinct activity subspaces, we performed cross-condition decoding on pseudopopulations of neurons from each subdivision (Fig. 8B,C). Here, we trained LDA classifiers to decode a given variable (e.g., chosen probability) only using trials of a given class for another variable (e.g., chosen Flavor J1). Classifiers were then tested on the trials of the alternative class (e.g., chosen Flavor J2), and we extracted the average decoding performance of our variable of interest. Considering chosen probability decoding, high cross-condition decoding performance will only be observed if the weights associated with chosen probability representation for a given chosen flavor generalize to the alternative flavor, meaning chosen probability decoding is independent of the chosen flavor representation. Thus, high cross-condition performance could only be achieved if chosen probability and chosen juice are represented within distinct neural subspaces (Bernardi et al., 2020; Wójcik et al., 2023).
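The logic of this test can be illustrated with a small simulation. The Python sketch below is a simplified stand-in rather than the pipeline used here: it uses a hand-written two-class LDA with a lightly regularized pooled covariance, binary (low/high) chosen probability instead of four levels, and hypothetical tuning axes and population sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_trials = 20, 400

# Hypothetical coding axes; the flavor axis is made orthogonal to the probability axis.
prob_axis = rng.normal(size=n_neurons)
flavor_axis = rng.normal(size=n_neurons)
flavor_axis -= (flavor_axis @ prob_axis) / (prob_axis @ prob_axis) * prob_axis


def fit_lda(X, y, shrink=1e-2):
    """Two-class LDA with pooled, lightly regularized covariance (equal priors)."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    centered = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
    cov = centered.T @ centered / (len(X) - 2) + shrink * np.eye(X.shape[1])
    w = np.linalg.solve(cov, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b


def accuracy(w, b, X, y):
    return np.mean((X @ w + b > 0).astype(int) == y)


def simulate(flip_with_flavor):
    """Simulate trials; optionally flip the probability code for flavor J2."""
    prob = rng.integers(0, 2, n_trials)     # 0 = low, 1 = high chosen probability
    flavor = rng.integers(0, 2, n_trials)   # 0 = J1, 1 = J2
    sign = np.where(flavor == 1, -1.0, 1.0) if flip_with_flavor else np.ones(n_trials)
    X = (np.outer(2 * sign * (prob - 0.5), prob_axis)
         + np.outer(flavor - 0.5, flavor_axis)
         + rng.normal(size=(n_trials, n_neurons)))
    return X, prob, flavor


def cross_condition_accuracy(X, prob, flavor):
    """Train on one flavor, test on the other, and average both directions."""
    scores = []
    for train_flavor in (0, 1):
        train, test = flavor == train_flavor, flavor != train_flavor
        w, b = fit_lda(X[train], prob[train])
        scores.append(accuracy(w, b, X[test], prob[test]))
    return float(np.mean(scores))


independent = cross_condition_accuracy(*simulate(flip_with_flavor=False))
dependent = cross_condition_accuracy(*simulate(flip_with_flavor=True))
```

When probability is coded along an axis that does not depend on flavor, the classifier generalizes across flavors (`independent` is high). When the probability code depends on flavor, generalization fails (`dependent` falls below chance), even though a classifier trained and tested within each flavor would perform well.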
We found that cross-condition decoding accuracies were often comparable with the performance of standard decoding across subdivisions for both primary task-relevant variables, chosen probability and flavor (Fig. 8B). We found a greater decrease in performance for decoding chosen flavor using 12l pseudopopulations compared with most other subdivisions. This difference between subdivisions is likely related to the fact that 12l is unique in that it exhibited the strongest representation of all three stimulus-related variables, chosen flavor, probability, and side (Fig. 6). Nevertheless, the average difference in performance between the two approaches was close to zero across all subdivisions (Fig. 8C, e.g., median change in the chosen flavor decoding ranged from a 4.5% increase in performance for 11m/l to a 5.1% decrease in performance for 12m). Such an observation suggests that the representations of flavor and probability were orthogonal to each other.
Dynamic encoding within VFC
Our last analysis reveals that relevant decision variables are represented through dynamic neural population trajectories (Fig. 8A). We also previously showed that neurons that were classified as encoding decision variables during the stimulus period often exhibited similar levels of encoding at multiple times during each trial (Fig. 3E). For instance, neurons in Areas 12o and 12l that encoded outcome probability during the stimulus period also had a peak of encoding during the reward period (note the double peaks in dark and light green lines in Fig. 3E). Such dynamic representations might relate to the role of VFC in contingent learning (Chau et al., 2015) as representations of stimulus–reward predictions are reactivated around reward delivery.
To explore this type of dynamic encoding, we first looked at whether single neurons in VFC encoded outcome flavor and/or probability at multiple points during each trial. The activity of the example neuron shown in Figure 9A illustrates this type of dynamic encoding. The firing rate of this neuron was related to the chosen probability not only shortly after the stimulus onset but also during the response, feedback, and reward parts of the trial. Mirroring the pattern of effects of this example neuron, we found that 20–70% of neurons encoding chosen probability or flavor during the stimulus period were “reactivated” during other parts of the trial, most notably during the feedback and reward periods (Fig. 9B).
Temporally similar representations of chosen flavor and probability over time. A, Raster plots and peristimulus time histograms of a 12l neuron aligned to multiple event onsets (stimulus, response, feedback, and reward) and for different chosen outcome probabilities (light to dark purple represent low to high probabilities, respectively). Green/gray bars represent the time windows considered for the different events in the following panels. B, The percentage of significantly tuned neurons during the stimulus period which were also encoding the considered variables (chosen flavor, probability, or side, from left to right) during the response fixation, feedback, and reward periods (light to dark gray, respectively). Colored stars represent a significant difference in proportion (at FDR-corrected p < 0.05) between two areas, the area where the star is located showing a greater proportion than the area indicated by the color of the star (e.g., the first brown star in the left panel on top of the proportion during response period for 12r means that this proportion is significantly greater than AI proportion). C, Example correlations of the average beta coefficients related to chosen probability and obtained at different time windows (left panel, Stim vs FB; right panel, Stim vs Rew) across 12l neurons. Gray arrows highlight the coefficients associated with the neuron illustrated in panel A. D, Correlation coefficients of beta coefficients for all areas and time window comparisons, as shown in panel B. Dashed square represents the correlations shown in panel C. Stars represent a significant correlation coefficient (at FDR-corrected p < 0.05).
Regarding representations of chosen flavor, we found that a third of neurons maintained their encoding properties after monkeys made a response to select one of the options, while this proportion increased to 40–50% during the feedback and reward periods (Fig. 9B). Not all subdivisions exhibited the same rate of maintained chosen flavor encoding (mixed-effect logistic regressions, factor area, F(7,2,615) > 5.9; p < 7.5 × 10−7 across all time comparisons; Fig. 9B). Notably, there was a high proportion of neurons that represented flavor in both stimulus and reward periods within 12l and 12m (FDR-corrected post hoc: 12l vs 13m/12o/AI, W > 2.7; p < 0.02; 12m vs 12o/AI, W > 3.1; p < 0.008). Within each area, there were also notable differences in when the highest proportion of neurons showed similar encoding. For instance, there was a marked increase in the proportion of neurons in Areas 13l and 13m that also encoded chosen juice in the reward period compared with the response or feedback periods. This indicates that there is a population of neurons in these areas which dynamically encode the chosen flavor over the course of the trial and that these neurons become active again around the time when the reward is likely to be delivered.
For chosen probability, a larger proportion of stimulus-related neurons continued to encode this decision variable later into the trial, reaching 50% across OFC and AI subdivisions and >70% within vlPFC subdivisions (mixed-effect logistic regressions, factor area, F(7,3,276) > 10.8; p < 1.6 × 10−13 across all time comparisons; Fig. 9B). Here we found that 12l showed the highest proportion of neurons encoding chosen probability at both the stimulus and feedback/reward periods compared with all other subdivisions (FDR-corrected post hoc: 12l vs all areas, W > 2.3, p < 0.03 for both periods), closely followed by 12m neurons (stimulus-feedback periods; 12m vs 12r, W = 1.9; p = 0.07; 12m vs others, W > 2.3; p < 0.03).
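The proportions reported above are conditional: among neurons significantly tuned during the stimulus period, what fraction is still tuned in a later period. A minimal sketch of that bookkeeping follows. The function name and the significance flags are hypothetical and made up for illustration; they are not data or code from the study, and the mixed-effect logistic regressions used to compare areas are not reproduced here.

```python
def maintained_fraction(sig_stim, sig_later):
    """Fraction of stimulus-tuned neurons that are also tuned in a later period."""
    stim_idx = [i for i, s in enumerate(sig_stim) if s]
    if not stim_idx:
        return float("nan")  # no stimulus-tuned neurons to condition on
    kept = sum(1 for i in stim_idx if sig_later[i])
    return kept / len(stim_idx)

# Hypothetical per-neuron significance flags (illustrative values only).
sig_stim = [True, True, False, True, True, False, True, False]
sig_reward = [True, False, False, True, True, True, False, False]

fraction = maintained_fraction(sig_stim, sig_reward)
print(f"{fraction:.0%} of stimulus-tuned neurons are also tuned at reward")
```

Note that neurons tuned only in the later period (the sixth neuron here) do not enter the ratio, because the denominator is restricted to stimulus-tuned neurons.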
The fact that the same neurons represent information across both stimulus and reward periods could be important for assessing whether the received reward meets expectations. It is nevertheless possible that neurons represent each task feature with either the same encoding scheme throughout these periods (as the example neuron in Fig. 9A) or using different encoding schemes. To assess this, we correlated the beta coefficients associated with a given task feature during the stimulus period and each of the other periods of the trial (examples for Area 12l are shown in Fig. 9C). If a population of neurons is using the same encoding scheme across different parts of the trial, then there should be a positive correlation between the beta coefficients. If, however, different encoding schemes are being used, then there should be a negative correlation or no correlation at all. Such negative correlation, often referred to as “sign flip”, is a classic marker of temporal difference learning (Kennerley et al., 2011; Muller et al., 2024).
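The logic of this test can be sketched with a plain Pearson correlation across neurons. The beta values below are invented for illustration, and the helper name is hypothetical; this is a sketch of the idea under those assumptions, not the authors' analysis code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# One beta coefficient per neuron for the same task feature (made-up values).
beta_stim = [0.8, -0.5, 0.3, -0.9, 0.6]   # stimulus period
beta_same = [0.7, -0.4, 0.2, -0.8, 0.5]   # later period, same encoding scheme
beta_flip = [-b for b in beta_same]       # later period, sign-flipped scheme

r_same = pearson_r(beta_stim, beta_same)  # strongly positive
r_flip = pearson_r(beta_stim, beta_flip)  # strongly negative
print(f"same scheme: r = {r_same:.2f}; sign flip: r = {r_flip:.2f}")
```

A population whose later-period coefficients are unrelated to its stimulus-period coefficients would instead give a correlation near zero, which is the third outcome considered in the text.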
Overall, we found that most of the neurons represented the chosen flavor in similar ways across periods, with significant positive correlations ranging from 0.4 to 0.7 (Fig. 9D, left panel). Of note, neurons in 11m/l and 12o, which represented the chosen flavor at stimulus and reward periods to a high degree (note the high omega square for 11m/l and 12o in Figs. 3, 4), showed high correlation between stimulus and response/feedback periods (R > 0.45; p < 3.9 × 10−5), but not between stimulus and reward periods (R < 0.1; p > 0.22). In contrast to chosen flavor, correlations between beta coefficients over time periods were noticeably lower for chosen probability. We did, however, observe the same temporal correlation pattern in the two areas showing the best representations of chosen probability, namely, 12l (Fig. 9C) and 12m (Fig. 9D, right panel; stimulus vs response/feedback, R > 0.23; p < 1.2 × 10−4; stimulus vs reward, R < 0.07; p > 0.15).
In summary, the differences in representations of task-related variables over time indicate that the encoding properties of neurons within VFC are dynamic. Indeed, we found that neurons maintained their tuning properties to different aspects of expected reward but did so with distinct activity patterns over time. The lack of negative correlation between the beta coefficients found during stimulus and reward periods across VFC extends what was found in OFC (Muller et al., 2024). Such a pattern of “reactivation” of encoding potentially indicates that a given neuron could be involved in the assessment and updating of the choice options.
Shared neural subspaces during stimulus and reward periods
At the level of single neurons, large proportions of vlPFC neurons represented chosen probability during both stimulus and reward periods, although with different encoding schemes over time. It remains an open question whether population activity subspaces identified during the stimulus period for chosen probability relate to the subspaces that were apparent during the reward period. One possibility is that representations of reward/no reward correspond to a binary transformation of the stimulus-related chosen probability representations, which would both exist within a similar neural space. Such a subspace might be important for updating specific reward probability representations. Alternatively, chosen probability and reward representations could coexist within distinct and separable neural activity subspaces, facilitating the readout of both types of information by other areas. To assess the temporal evolution of these representations and test our two hypotheses, we projected the pseudopopulation activity over time onto the first component of (1) a chosen probability subspace extracted during the stimulus period and (2) a reward subspace extracted during the reward period (Fig. 10A). We then computed the time-resolved average Euclidean distance between every pair of conditions (four chosen probabilities by two reward flavor conditions) depending on which subspace the activity was projected onto (Fig. 10B). Here, larger distances represent a greater ability to discriminate probabilities and/or reward flavors.
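The two hypotheses can be made concrete with a toy example. The sketch below assumes the subspace is the leading principal axis of condition-averaged activity (found here by power iteration); the text does not specify the decomposition at this point, so that choice, the function names, the two-neuron populations, and all activity values are assumptions for illustration only. Outcomes are coded as reward/no reward (1/0).

```python
import math
from itertools import combinations, product

def first_axis(cond_means, n_iter=100):
    """Leading principal axis of a conditions x neurons matrix (power iteration)."""
    n_cond, n_neur = len(cond_means), len(cond_means[0])
    mean = [sum(row[j] for row in cond_means) / n_cond for j in range(n_neur)]
    centered = [[row[j] - mean[j] for j in range(n_neur)] for row in cond_means]
    v = [float(j + 1) for j in range(n_neur)]  # arbitrary non-zero starting vector
    for _ in range(n_iter):
        # One power-iteration step: v <- X^T (X v), then renormalize.
        scores = [sum(r[j] * v[j] for j in range(n_neur)) for r in centered]
        w = [sum(centered[i][j] * scores[i] for i in range(n_cond))
             for j in range(n_neur)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no variance across conditions along this vector")
        v = [x / norm for x in w]
    return v

def mean_pairwise_distance(cond_activity, axis):
    """Average distance between every pair of conditions after 1-D projection."""
    proj = [sum(a * w for a, w in zip(row, axis)) for row in cond_activity]
    pairs = list(combinations(proj, 2))
    return sum(abs(p - q) for p, q in pairs) / len(pairs)

def cross_projection(stim_activity, reward_activity):
    """Distances for each period's activity projected onto each axis."""
    prob_axis = first_axis(stim_activity)      # derived in the stimulus window
    reward_axis = first_axis(reward_activity)  # derived in the reward window
    return {
        ("stim", "prob_axis"): mean_pairwise_distance(stim_activity, prob_axis),
        ("reward", "prob_axis"): mean_pairwise_distance(reward_activity, prob_axis),
        ("stim", "reward_axis"): mean_pairwise_distance(stim_activity, reward_axis),
        ("reward", "reward_axis"): mean_pairwise_distance(reward_activity, reward_axis),
    }

# Eight conditions: four chosen probabilities x two outcomes (made-up values).
conditions = list(product([0.1, 0.3, 0.7, 0.9], [0.0, 1.0]))

# Separable toy population: neuron 0 codes probability at stimulus,
# neuron 1 codes the outcome at reward; the other neuron sits at baseline.
separable = cross_projection([[p, 0.5] for p, r in conditions],
                             [[0.5, r] for p, r in conditions])

# Shared toy population: the same neuron codes probability, then outcome.
shared = cross_projection([[p, 0.5] for p, r in conditions],
                          [[r, 0.5] for p, r in conditions])

for name, result in (("separable", separable), ("shared", shared)):
    print(name, {k: round(d, 3) for k, d in result.items()})
```

In the separable population, each axis discriminates conditions only in the window it was derived from. In the shared population, both axes discriminate conditions in both windows, which corresponds to the two-peak pattern described for overlapping subspaces.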
Temporal evolution of chosen probability and reward information across pseudopopulations. A, Example of trial-averaged pseudopopulation activity in 12o (top) and 12l (bottom) over time when projected onto the stimulus-related probability subspace (green) or the reward subspace (gray). Chosen probability levels are coded using light to dark colors, while reward/no reward conditions are in blue/red, respectively. Each pseudopopulation was composed of a random selection of 100 neurons. Green and black bars on x-axis represent the time window used to derive the chosen probability and reward subspaces, respectively. B, Average Euclidean distance between all chosen probabilities and reward conditions across time and depending on whether the pseudopopulation activity was projected onto the chosen probability (green) or reward (gray) subspace. Thick lines represent the average of the 100 neuron-selection permutations, while shadings represent the standard deviation. Thin lines represent the average Euclidean distance when trial labels were permuted. Dots represent significant differences between the true distance and permutations, using a threshold at p < 0.01 in >50% of the random neuron-selection sets.
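The significance criterion in panel B (trial-label permutations, with p < 0.01 required in more than half of the neuron-selection sets) can be sketched as follows. This is a simplified illustration with two conditions and one projected value per trial; the function names, the number of permutations, and the data are hypothetical, and the statistic in the figure is the average distance across all condition pairs rather than a two-group gap.

```python
import random

def condition_gap(values, labels):
    """Absolute difference between the means of condition 0 and condition 1."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def permutation_p(values, labels, n_perm=999, seed=0):
    """P-value of the observed gap against a null built by shuffling trial labels."""
    rng = random.Random(seed)
    observed = condition_gap(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the link between trials and conditions
        if condition_gap(values, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # the observed labeling counts as one draw

def significant_in_majority(p_values, alpha=0.01):
    """True if more than half of the neuron-selection sets pass the threshold."""
    hits = sum(p < alpha for p in p_values)
    return hits > 0.5 * len(p_values)

# Made-up projected activity for 20 trials per condition, clearly separated.
values = [0.01 * i for i in range(20)] + [1.0 + 0.01 * i for i in range(20)]
labels = [0] * 20 + [1] * 20

p = permutation_p(values, labels)
print(f"permutation p = {p:.3f}")
```

In practice one p-value would be computed per time bin and per random selection of neurons, and `significant_in_majority` would then be applied across selections for each time bin.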
In the pseudopopulation of 12o neurons, probability and flavor information were contained within overlapping subspaces. This was evident by the presence of two peaks of discrimination at the time of the stimulus and reward across each subspace (Fig. 10A,B). While this pattern was also seen in 12r, 12m, and 13m, it was absent in 12l, 13l, and AI (note the lack of two peaks for stimulus and reward periods). In these latter subdivisions, the chosen probability subspace was prominent during the stimulus (green peak in stimulus period) while the information content shifted toward the reward subspace during the reward (gray peak in reward period). Such a pattern implies that the chosen probability and reward representations were largely separate or “disentangled” in 12l and AI. Overall, the population representations within VFC subdivisions evolved differently over time and appear to indicate that different subdivisions are supporting different computations.
Discussion
Combining high-density single-neuron recordings with precise anatomical reconstructions, we report significant variation of neural representations within VFC. Specifically, neurons in OFC Area 11m/l were highly tuned to the outcome flavor (Figs. 3, 6). Neurons in more posterior OFC subdivisions 13m or 13l were more weakly tuned to outcome flavor although Area 13m had the strongest representations of flavor during reward delivery (Figs. 4, 7). In contrast, neurons in vlPFC Area 12l not only encoded outcome flavor but also exhibited strong representations of outcome probability and choice direction (Figs. 3, 6). In addition, neurons in Area 12l also exhibited a high degree of mixed selectivity during stimulus presentation (Fig. 5), and this encoding was associated with distinct neural subspaces at the population level (Figs. 8, 10). This suggests a role of 12l in the integration of multiple decision variables during valuation to guide choice. In contrast, vlPFC Area 12o showed selective representations of outcome probability (Figs. 3, 5) as well as reward delivery (Figs. 4, 7). Notably, stimulus- and reward-based representations existed within a shared neural subspace in 12o (Fig. 10), indicating a putative neural mechanism for the role of this area in stimulus–reward association learning. Taken together, our findings provide evidence that representations within VFC are not spatially or functionally uniform and indicate that each subdivision might contribute to specific cognitive processes (Fig. 11).
Schematic representation of the specific neural representations found across VFC and the proposed processes involved.
OFC and the coding of the specific outcome qualities
We previously reported that OFC neurons exhibited stronger representations of outcome flavor compared with vlPFC neurons (Stoll and Rudebeck, 2024). However, neurons across OFC subdivisions did not represent information uniformly. We observed stronger chosen flavor representations in 11m/l neurons during both stimulus and reward periods, whether the reward was received or not, while 13m more strongly represented the chosen flavor when subjects received the reward. Comparatively, OFC Area 13l and AI exhibited low levels of tuning across both time periods.
Prior work has highlighted a role of OFC in valuation processes, especially when subjects are making value-based decisions (Kennerley et al., 2009). Regarding reward properties, neurons in both OFC Areas 11 and 13 have been reported to respond to reward-predicting stimuli, both during reward expectation and after receipt, often discriminating between different outcome flavors in relation with animals’ preferences (Thorpe et al., 1983; Tremblay and Schultz, 1999; Padoa-Schioppa and Assad, 2006). However, these seminal studies did not specifically look for whether distinct subdivisions of OFC represented value or flavor information differently. This was either because recordings focused on a specific subdivision or because the cytoarchitectonic area where recordings were made was not precisely characterized.
Although unrelated to reward flavor, a small body of work supports the existence of an anteroposterior gradient in OFC (Sescousse et al., 2010; Klein-Flügge et al., 2013; Rich and Wallis, 2017). For example, Rich and Wallis (2017) reported stronger encoding of the expected reward size in high-gamma activity in more anterior compared with posterior parts of OFC. This is somewhat consistent with our observation that OFC Area 11m/l better represented the reward flavor during the stimulus period compared with Areas 13m and 13l. The difference in encoding that we found also potentially concords with an inactivation study highlighting a functional dissociation between Areas 11 and 13 (Murray et al., 2015). Specifically, inactivation of Area 13, but not Area 11, impaired the updating of the sensory-specific values associated with a stimulus. In contrast, inactivation of Area 11, but not Area 13, impaired the use of the current value of a specific reward to appropriately guide goal selection. In the present task, subjects had previously learned the specific stimulus–flavor associations and used juice flavor to guide their choices (Fig. 1). Thus, stronger encoding of the chosen stimulus in Area 11m/l compared with that in Area 13 would fit with this role of Area 11 in goal selection (Fig. 11). This specific role in utilizing, as opposed to updating, reward flavor is further reinforced by the findings that representations of juice flavor in Area 11m/l during the stimulus and reward periods exhibited low correlations (Fig. 9), occupied largely separable neural subspaces (Fig. 10), and were observed irrespective of whether reward was experienced (Fig. 7).
In contrast to Area 11m/l, neurons in 13m were more specifically tuned to the flavor of the chosen outcome when the reward was delivered. Such a pattern of encoding in this area could represent a mechanism for the updating of specific stimulus–values based on the flavor of the received reward (Murray et al., 2015; Fig. 11). Furthermore, if Area 13m is involved in updating stimulus–reward associations, then it could be expected that neural activity in this area at the time of reward should be more related to representations at the time of stimulus presentation. This is exactly what we found. In Area 13m, but not 11m/l, the subspaces of activity activated at stimulus presentation overlapped with those during the reward period (Fig. 10).
Further determining how representations differ between parts of OFC and what encoding in Area 13l and AI is most related to are questions that we were not able to fully address here, potentially as our task design was not diverse enough. For example, few studies have characterized the tuning of AI neurons, but anatomical and functional work supports a role of AI (and more generally insula) in relaying bodily state information to higher-order areas for the regulation of cognitive and emotional processing (A. D. Craig, 2002; A. D. B. Craig, 2009). Previous analyses of this dataset indeed revealed that the activity of AI neurons was weakly tuned to the various task components but strongly modulated by local changes in preference (Stoll and Rudebeck, 2024). Regarding 13l, the activity of many neurons in this area was related to different aspects of the task but exhibited poor levels of encoding or decoding across variables. This part of OFC has been closely associated with face processing (Tsao et al., 2008), suggesting that 13l may integrate social information about faces with other decision-related variables (Elorette et al., 2021). As our task did not involve faces or other social stimuli, representations were not differentiable from other parts of VFC. Irrespective of this, our findings add to the growing body of work in humans and animals highlighting the diversity of representations in OFC related to guiding choices and updating representations (Howard and Kahnt, 2017).
Dissociable representations within vlPFC subdivisions
Few studies have highlighted differences in encoding within vlPFC. We found that neurons in 12l were highly likely to encode task-relevant variables during the stimulus period (Fig. 3), often representing more than a single variable (Fig. 5). Such encoding is perhaps not surprising given that Area 12 receives projections from visual areas in the inferior temporal cortex and a higher density of somatosensory inputs compared with other VFC subdivisions (Barbas, 1988). Despite this, we found that the representation of decision variables existed within distinct neural subspaces (Figs. 8, 10). Compared with other vlPFC subregions, 12l is more strongly connected, anatomically and functionally, to the lateral prefrontal cortex, notably Areas 45 and 46v (Saleem et al., 2014; Rapan et al., 2023). These more dorsal parts of the lateral frontal cortex are closely associated with guiding attention and actions toward the goal of a decision and are directly connected to premotor areas (Miller and Cohen, 2001; Kennerley et al., 2009; Cai and Padoa-Schioppa, 2014). Thus, such separate but high-dimensional representations in 12l could be used by downstream areas to bias motor behavior and promote the selection of specific action plans. Interestingly, many neurons in Area 12m were also tuned to stimulus and reward variables and exhibited a high degree of mixed selectivity (Figs. 3–5). However, effect sizes in 12m neurons (Figs. 3, 4) and population decoding metrics (Fig. 6) revealed poor discrimination levels within this subdivision, apart from representations of chosen side. This pattern could be indicative of a role, jointly with Area 12l, in biasing attention or motor behavior to a specific choice option.
Compared with neighboring vlPFC subdivisions, representations in 12o were characterized by highly selective stimulus-related probability representations (Figs. 3, 5, 6) and coding of reward delivery (Figs. 4, 7). Notably, stimulus probability and reward delivery subspaces overlapped in this area (Fig. 10). Such a pattern potentially indicates that 12o may be the most specialized for reward-based learning across ventral frontal areas. Our findings thus provide single-neuron and mechanistic insight into prior neuroimaging (Zald et al., 2005; Chau et al., 2015; Jocham et al., 2016) and interference approaches (Noonan et al., 2017; Rudebeck et al., 2017; Folloni et al., 2021) that have emphasized a role of Area 12o in contingent learning. While the present task does not specifically require stimulus–outcome associations to be updated, local fluctuations in reward history as well as changes in preference for the juice flavors (Stoll and Rudebeck, 2024) mean that subjects were actively tracking the probability of reward to make their choices. It is this active tracking and updating of reward probability that is likely reflected in the dynamics of 12o neural population activity.
Anatomically, 12o is not only part of Carmichael and Price's orbital network, as it has connections to other VFC subdivisions, but is also interconnected with areas within the medial frontal cortex (Carmichael and Price, 1996; Price, 2007). In particular, 12o has monosynaptic connections with the anterior cingulate cortex (ACC; Jezzini et al., 2021; Trambaiolli et al., 2022), an area closely associated with guiding information seeking and exploratory behavior (Kolling et al., 2012, 2016; Stoll et al., 2016; Jezzini et al., 2021), and vlPFC, ACC, and amygdala are part of a network of strongly connected areas (Ghashghaei et al., 2007; Zeisler et al., 2023). If reward probability representations are prominent in Area 12o, it is possible that these regulate exploratory behavior by signaling the likelihood of receiving a reward when pursuing a specific stimulus in the current local environment or whether a new reward context should be sought out (Monosov and Rushworth, 2022). Area 12o also directly projects to parts of ventromedial PFC (Carmichael and Price, 1996), a region associated with attribute integration and option comparison (Boorman et al., 2013; Strait et al., 2014; Noonan et al., 2017) and internally driven motivational processes (Bouret and Richmond, 2010; San-Galli et al., 2018). Thus, understanding the dynamic interaction between Area 12o and ventromedial PFC as well as other subdivisions of VFC during decision-making would shed light on how options are compared before a choice.
Finally, neurons within the most rostral part of vlPFC Area 12r were relatively unmodulated by the various decision variables in our task, except for when it came to representing reward receipt. Although little is known regarding the activity of neurons within this subdivision, such observations are consistent with previous work that proposed that Area 12r is important for controlling goal-directed hand and mouth actions (Borra et al., 2011). This is based on the unique anatomical connectivity of Area 12r with ventral premotor areas (Borra et al., 2011) and the fact that neurons within and around Area 12r showed selectivity to object features and faces (Wilson et al., 1993; Ó Scalaidhe et al., 1997) and that lesions encompassing this area impacted object-based decision-making (Passingham, 1975; Mishkin and Manning, 1978).
The influence of cytoarchitecture and connectivity on ventral frontal representations
Even though our results identified clear distinctions in how cytoarchitecturally defined subdivisions of VFC represent information, it is unlikely that a given representation, or its associated function, entirely stops at anatomical boundaries. It is more likely the case that the precise patterns of inputs and outputs to the constituent neurons shape an area's function, whereas the cytoarchitecture of an area constrains how that information is processed. For example, our study revealed that the two neighboring vlPFC Areas 12o and 12l exhibited very different representations and dynamics. Despite this, connections between vlPFC and ACC span both 12o and 12l (Trambaiolli et al., 2022), and prior neuroimaging work reported representations of probability in both subdivisions (Kaskan et al., 2017). Even though it is not possible to extract the full anatomical connectivity maps from animals undergoing neurophysiological recording, considering the anatomical connectivity of where neurons were recorded from is therefore critical when trying to understand the precise role of a given area. Only with this information in hand will it be possible to fully characterize how VFC contributes to reward-guided behaviors.
Footnotes
This work was supported by a National Institute of Mental Health BRAINS Award to P.H.R. (R01s MH110822; MH132064), a young investigator grant from the Brain and Behavior Foundation (National Alliance for Research on Schizophrenia and Depression) to P.H.R., a Philippe Foundation 100014269 Award to F.M.S., and seed funds from the Icahn School of Medicine at Mount Sinai to P.H.R. We thank Marques Love and Dr. Patrick Hof for their help in defining neuroanatomical boundaries.
The authors declare no competing financial interests.
Correspondence should be addressed to Frederic M. Stoll at frederic.stoll@mssm.edu.