Abstract
Performance monitoring that supports ongoing behavioral adjustments is often examined in the context of either choice confidence for perceptual decisions (i.e., “did I get it right?”) or reward expectation for reward-based decisions (i.e., “what reward will I receive?”). However, our understanding of how the brain encodes these distinct evaluative signals remains limited because they are easily conflated, particularly in commonly used two-alternative tasks with symmetric rewards for correct choices. Previously we used a motion-discrimination task with asymmetric rewards to identify neural substrates of forming reward-biased perceptual decisions in the caudate nucleus (part of the striatum in the basal ganglia) and the frontal eye field (FEF, in prefrontal cortex). Here we leveraged this task design to partially decouple estimates of accuracy and reward expectation and examine their impacts on subsequent decisions and their representations in those two brain areas. We identified distinguishable representations of these two evaluative signals in individual caudate and FEF neurons, with regional differences in their distribution patterns and time courses. We observed that well-trained monkeys (both sexes) used both evaluative signals, infrequently but consistently, to adjust their subsequent decisions. We found further that these behavioral adjustments had reliable relationships with the neural representations of both evaluative signals in caudate, but not FEF. These results suggest that the cortico-striatal decision network may use diverse evaluative signals to monitor and adjust decision-making behaviors, adding to our understanding of the different roles that the FEF and caudate nucleus play in a diversity of decision-related computations.
Significance Statement
Effective decision-making often requires the evaluation of current decisions to guide adjustment of future decisions. We used a behavioral task with separate manipulations of visual evidence uncertainty and reward size to disentangle two types of evaluative signals with theoretical importance: accuracy and reward expectation. We found that well-trained monkeys used these signals infrequently but consistently to adjust subsequent decisions. Neurons in the caudate nucleus in the basal ganglia and frontal eye field (FEF) in the prefrontal cortex encoded both types of evaluative signals, with substantial regional differences. Caudate activity, but not FEF activity, was linked to the monkeys’ decision adjustments. These results suggest different involvements of these two regions in decision evaluation and adjustment.
Introduction
Effective learning can depend on comparisons between expected and experienced outcomes (Sutton and Barto, 1998). These expectations have been studied under terms such as confidence, choice uncertainty, choice accuracy, and reward expectation. For perceptual decisions based on unreliable or noisy sensory evidence, these expectations typically involve the assessment that a choice is correct given the evidence (Kiani et al., 2014). This assessment can support adaptive strategies in changing environments and account for other forms of sequential behavioral adjustments including post-error slowing (Yu and Dayan, 2005; Nassar et al., 2012; Purcell and Kiani, 2016). For reward- or value-based decisions, reward expectation is the expected benefit (and/or cost) given a choice. This expectation is a critical component of reinforcement learning and is commonly used to evaluate value-based decisions (Sutton and Barto, 1998; Samejima et al., 2005; Daw and Doya, 2006; Rangel et al., 2008; Schultz, 2015). In more complex behavioral contexts, confidence, accuracy expectation, and reward expectation may become intertwined (Locke et al., 2020; Caziot and Mamassian, 2021).
Neural signals consistent with either of these forms of expectation have been reported in many brain regions, including the caudate nucleus of the basal ganglia and the frontal cortex (Kawagoe et al., 1998; Schultz, 1998; Roesch and Olson, 2003; Padoa-Schioppa and Assad, 2006; Kepecs et al., 2008; Lau and Glimcher, 2008; Kiani and Shadlen, 2009; Basten et al., 2010; Ding and Gold, 2010; Nomoto et al., 2010; Kennerley et al., 2011; Middlebrooks and Sommer, 2012; Teichert et al., 2014; Yanike and Ferrera, 2014a; Hebart et al., 2016; So and Stuphorn, 2016; Lak et al., 2017, 2020b). However, our understanding of the neural representations of these evaluative signals has been limited by the fact that these quantities are easily conflated under conditions in which they are typically examined. For example, for value-based decision tasks, choice confidence can be based on a comparison of reward expectations for the chosen versus the unchosen options. Likewise, for many perceptual decision tasks, the reward expectation for the chosen option is the product of accuracy and the magnitude of reward associated with a correct choice. When the reward magnitude is fixed, choice confidence, accuracy expectation, and reward expectation are all perfectly correlated.
Given these confounds, only a few studies have used task manipulations that were effective at identifying distinguishable neural representations of these quantities. For example, one study identified distinct neural representations of choice confidence and reward expectation in the rat orbitofrontal cortex (OFC), along with reward expectation-modulated activity in striatum-projecting OFC neurons (Hirokawa et al., 2019). Another study identified representations of choice confidence but not reward expectation in the supplementary eye field (So and Stuphorn, 2016). To advance our understanding of how the brain implements decision evaluation, we focused here on two quantities: (1) accuracy expectation, which estimates the probability of a choice being correct; and (2) reward expectation, which estimates the expected value of a choice (i.e., the product of accuracy expectation and expected reward size). We examined whether and how accuracy expectation and reward expectation have distinguishable representations in two brain areas that play key roles in both value-based and perceptual decision-making, the caudate and frontal eye field (FEF).
We leveraged a behavioral task with separate manipulations of evidence strength and reward-choice associations (Fig. 1A) to uncouple the estimated accuracy expectation and reward expectation, thus allowing us to differentiate neural representations of the two quantities at the single-neuron level in the caudate and FEF. We previously showed that neurons in these two areas play similar, but distinguishable, computational roles in forming these decisions that require balancing uncertain sensory evidence with asymmetric-reward expectations (Fan et al., 2020). Here we show that these regions may also play similar, but distinguishable, roles in monitoring current decisions and adjusting future decisions, by keeping track of both accuracy expectation and reward expectation and using those signals to guide subsequent behavior.
Materials and Methods
Experimental design and statistical analyses
The data sets for the present study were obtained from three monkeys (two males and one female) and are identical to those reported previously (Fan et al., 2020). The original report focused on neural activity during decision formation (i.e., after motion onset and before the saccadic response). The present study focused on neural activity around saccade onset that can encode evaluation of the decision. Details of subjects, the behavioral task, data acquisition, and fitting with a drift-diffusion model (DDM) with collapsing bounds can be found in three previous reports (Fan et al., 2018, 2020; Doi et al., 2020) and are summarized here. All training, surgery, and experimental procedures were performed in accordance with the NIH's Guide for the Care and Use of Laboratory Animals and were approved by the University of Pennsylvania Institutional Animal Care and Use Committee (protocol #804726).
The numbers of neurons for each animal are reported in Results. Statistical tests related to neural and behavioral analyses are detailed in “Neural data analysis” and “Measurement of sequential effects” subsections, respectively, with controls for multiple comparisons when applicable.
Behavioral task, data acquisition, and model fitting
Briefly, a trial began with presentation of a central fixation point (Fig. 1A). Once the monkey acquired and maintained fixation on this point, two choice targets were presented to indicate the two motion directions to be discriminated. After a random delay, the fixation point was dimmed, and a random-dot kinematogram was shown (“motion onset”) with randomly interleaved motion direction and motion strength (coherence). The monkey reported the perceived motion direction by making a self-timed saccade to the corresponding choice target. Two asymmetric-reward contexts were alternated in a block design. In the Right-LR blocks, the rightward choice was paired with a large juice reward (LR). In the Left-LR blocks, the leftward choice was paired with the large reward. The other choice was paired with a small juice reward. The reward context for the current block was signaled to the monkey at the first trial. Three monkeys were extensively trained on this task. Single-unit recordings were obtained in the FEF and caudate nucleus (in separate sessions) while monkeys performed the task. DDM model fitting was performed, separately for each session, using the maximum a posteriori estimate method and prior distributions suitable for human and monkey subjects (Wiecki et al., 2013). The same fitting results were reported previously (Fan et al., 2020).
Computation of accuracy expectation and reward expectation
Following previous literature (Kiani and Shadlen, 2009), we defined accuracy expectation as the estimation of accuracy on average given the current choice and decision time (DT), as follows:
In our task design, each coherence had an equal chance of appearance, except that Coh = 0 happened twice as often as the other coherences:
To standardize across sessions with different juice volumes, we normalized reward size by the volume of the smaller reward for each session. That is, for each session the small reward was assigned a reward size of 1, and the large reward was assigned a value equal to the large–small reward ratio.
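As an illustration of the computation described in this subsection, the following Python sketch applies Bayes' rule to a set of likelihood values and then scales the result by the normalized reward size. It is a minimal sketch, not the analysis code: the function names, the example likelihood values, the treatment of the prior as a distribution over signed coherence, and the convention that a zero-coherence choice is correct with probability 0.5 are all assumptions made for this example.

```python
def coherence_prior(signed_coherences):
    """Prior over stimulus states: equal weights, with Coh = 0 weighted twice.

    Assumption: the prior is defined over signed coherence values.
    """
    weights = [2.0 if c == 0 else 1.0 for c in signed_coherences]
    total = sum(weights)
    return [w / total for w in weights]


def accuracy_expectation(likelihoods, signed_coherences, choice_sign):
    """Estimate P(correct | choice, DT) by Bayes' rule and marginalization.

    likelihoods[i] is P(choice, DT | signed_coherences[i]), e.g., obtained
    from a fitted DDM (hypothetical input for this sketch).
    choice_sign is +1 or -1 and follows the sign convention of coherence.
    """
    prior = coherence_prior(signed_coherences)
    joint = [lik * p for lik, p in zip(likelihoods, prior)]
    evidence = sum(joint)
    posterior = [j / evidence for j in joint]

    p_correct = 0.0
    for coh, post in zip(signed_coherences, posterior):
        if coh == 0:
            # Assumption: with no net motion, either choice is correct half the time.
            p_correct += 0.5 * post
        elif (coh > 0) == (choice_sign > 0):
            p_correct += post
    return p_correct


def reward_expectation(acc_expectation, normalized_reward_size):
    """Reward expectation = accuracy expectation x reward size for the choice.

    The small reward has size 1; the large reward has the large-small ratio.
    """
    return acc_expectation * normalized_reward_size


# Hypothetical example: three stimulus states and a positive-direction choice.
cohs = [-0.5, 0.0, 0.5]
liks = [0.1, 0.3, 0.6]
acc = accuracy_expectation(liks, cohs, choice_sign=+1)
rew = reward_expectation(acc, normalized_reward_size=2.0)
```

Because the likelihoods are fitted separately for each reward context, the same choice and DT can yield different accuracy expectations in the two contexts, which is what allows the two quantities to be partially decoupled.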
Neural data analysis
We focused on neural activity between 200 ms before saccade onset (i.e., near decision commitment) and 400 ms after saccade onset (i.e., before feedback delivery).
Joint modulations by reward size, DT, and coherence
For each single unit, we computed the average firing rates in three task epochs: (1) a pre-saccade 100 ms window beginning at 100 ms before saccade onset, (2) a peri-saccade 300 ms window beginning at 100 ms before saccade onset, and (3) a post-saccade 400 ms window beginning at saccade onset (all epochs end before reward delivery). For each unit and epoch, we performed two multiple linear regressions (Eqs. 9, 10), focusing on coherence and RT dependencies, respectively, and including only correct trials.
The signs of
Correlation between neural activity and evaluative signals
For each neuron, we measured the average firing rates in 300 ms time windows with 10 ms steps. For each time window, we performed two partial (Spearman) correlations: (1) between firing rates and accuracy expectation while removing the effect of reward expectation, and (2) between firing rates and reward expectation while removing the effect of accuracy expectation. Significance was assessed at p = 0.05. Chi-square tests were performed to compare fractions of significant modulation at each time window between conditions, with corrections for multiple comparisons. We report here the results based on data from correct trials only. Similar results were obtained including all trials (not shown).
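The partial Spearman correlation used here can be sketched as follows, assuming the standard first-order partial correlation formula applied to rank-transformed data. The function names are hypothetical, and the sketch returns only the coefficient; the significance test would be obtained from a statistics package and is not shown.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_spearman(x, y, z):
    """Spearman correlation between x and y after removing the effect of z."""
    rx, ry, rz = average_ranks(x), average_ranks(y), average_ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical usage: firing rate vs. accuracy expectation, controlling for
# reward expectation (all values are made up for illustration).
firing_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
acc_exp = [2.0, 4.0, 6.0, 8.0, 10.0]
rew_exp = [1.0, 3.0, 2.0, 5.0, 4.0]
rho = partial_spearman(firing_rate, acc_exp, rew_exp)
```

Swapping the roles of the second and third arguments gives the complementary analysis, that is, the correlation with reward expectation while removing the effect of accuracy expectation.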
We tested the effects of two potential confounds. First, because accuracy expectation and reward expectation are both affected by reward biases, it is possible that reward context modulation alone may cause measurable correlations between firing rate and accuracy expectation or reward expectation. To minimize such a possibility, we imposed an additional criterion that modulation by accuracy expectation or reward expectation must be accompanied by modulation by DT. For each time window and choice, we computed the correlation between firing rates and DT for the two reward contexts separately and jointly. We considered a significant modulation by DT to be present if any of the three correlation coefficients was significantly different from zero (p < 0.05).
Second, we assessed whether a subjective reward ratio, different from the actual ratio of juice volume, may provide a more accurate measurement of reward expectation and significantly affect the prevalence of reward expectation modulation of neural activity. We computed new reward expectation with reward ratio ranging from 1 to 2.5 and operationally defined the “best” reward ratio as the value associated with the largest correlation between firing rate and reward expectation (Fig. 8).
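The search for the "best" reward ratio can be illustrated with a simple grid search. This is a sketch under stated assumptions: the grid step of 0.1, the use of a Pearson correlation, the use of the absolute value of the correlation (so that negatively modulated neurons are treated the same way as positively modulated ones), and all names and data are hypothetical choices made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_reward_ratio(firing_rates, acc_expectation, chose_large, ratios=None):
    """Return the candidate ratio that maximizes |corr(firing rate, reward expectation)|.

    For each candidate ratio, reward expectation is recomputed as accuracy
    expectation times the reward size (ratio for large-reward choices, 1 otherwise).
    """
    if ratios is None:
        # Assumed grid: 1.0 to 2.5 in steps of 0.1.
        ratios = [1.0 + 0.1 * k for k in range(16)]
    best_ratio, best_r = None, -math.inf
    for ratio in ratios:
        rew_exp = [a * (ratio if large else 1.0)
                   for a, large in zip(acc_expectation, chose_large)]
        r = abs(pearson(firing_rates, rew_exp))
        if r > best_r:
            best_ratio, best_r = ratio, r
    return best_ratio, best_r
```

With a ratio of 1, reward expectation reduces to accuracy expectation, so a best ratio near 1 indicates that the neuron's activity is better described by accuracy expectation.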
Measurement of sequential effects
We measured how monkeys’ choice and RT may be influenced by evaluative signals from the previous trial. To measure sequential effects on choice, we performed logistic regressions using the following function:
To measure sequential effects on RT, we performed multiple linear regressions using the following function:
For the choice data, the logistic regression was fitted via a generalized linear model assuming a binomial distribution for the response variable. Data from each session were fitted separately. To reduce the possibility of over-fitting, we used two methods of regularization: Elastic Net and LASSO regressions. Operationally, the fits were obtained using the lassoglm function in MATLAB, setting the alpha parameter to 1 and 0.5 for LASSO and Elastic Net regressions, respectively. For each fit, a fivefold cross-validation was performed, and the coefficients were chosen as those corresponding to the minimum cross-validation error plus one standard error (i.e., the one-standard-error rule).
We assessed whether it was more likely to encounter evaluative signal-related modulation in neurons recorded in sessions with sequential effects, using a Chi-square test with a criterion of p = 0.05 (Fig. 11B). To assess the relationship between neural modulation by accuracy expectation and sequential effects related to accuracy expectation, we performed a linear regression for all neurons:
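The two regularization settings and the selection of coefficients from the cross-validation curve can be sketched as follows. The penalty function uses the usual Elastic Net form in which alpha = 1 gives LASSO, and the selection function implements the common one-standard-error rule: among all regularization strengths whose cross-validation error is within one standard error of the minimum, take the largest. The function names and example numbers are hypothetical; this is not the fitting code itself.

```python
def elastic_net_penalty(beta, alpha):
    """Elastic Net penalty: (1 - alpha)/2 * sum(b^2) + alpha * sum(|b|).

    alpha = 1 corresponds to LASSO; alpha = 0.5 mixes L1 and L2 penalties.
    """
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return (1.0 - alpha) / 2.0 * l2 + alpha * l1


def select_index_one_se(lambdas, cv_errors, cv_std_errors):
    """Index of the largest lambda within one standard error of the minimum CV error."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_errors[i])
    threshold = cv_errors[i_min] + cv_std_errors[i_min]
    candidates = [i for i in range(len(lambdas)) if cv_errors[i] <= threshold]
    return max(candidates, key=lambda i: lambdas[i])


# Hypothetical cross-validation results for four regularization strengths.
lambdas = [0.001, 0.01, 0.1, 1.0]
cv_errors = [1.05, 1.00, 1.03, 1.30]
cv_std_errors = [0.04, 0.04, 0.04, 0.05]
chosen = select_index_one_se(lambdas, cv_errors, cv_std_errors)
```

Choosing the stronger regularization in this way favors sparser coefficient sets, so a non-zero coefficient for a given regressor is a conservative indication of a sequential effect.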
Results
We trained three monkeys to perform a response-time (RT), asymmetric-reward, random-dot visual motion direction-discrimination saccade task (Fig. 1A; Fan et al., 2018). The monkeys made saccades to indicate their judgments about the global motion direction of a motion stimulus. Motion direction and strength were varied across trials, and reward context (Fig. 1, table below the timeline) was varied in blocks of trials. As we documented previously, the three monkeys showed consistent behavioral strategies such that their choice and RT depended on both the reward context and motion strength (Fig. 1B), and their reward-biased decision strategy can be captured with a combination of drift-rate and bound biases in a DDM framework (Fan et al., 2018; Doi et al., 2020). Here we re-analyzed behavioral and neural data from 132 sessions with caudate recordings (n = 17, 45, and 70 from monkeys A, C, and F, respectively) and, separately, 131 sessions with FEF recordings (n = 75, 23, and 33 from monkeys A, C, and F, respectively).
Post-decision accuracy expectation and reward expectation exhibit distinct relationships with reward size, DT, and coherence
We computed accuracy expectation and reward expectation (values for an example session are shown in Fig. 1C) by adapting methods used by others (see Fig. 2A and Materials and Methods for details; Kiani and Shadlen, 2009; Fetsch et al., 2014; Kiani et al., 2014). Briefly, we computed accuracy expectation as the estimated probability that the monkey made a correct choice, as follows. First, we estimated the monkey's decision process by fitting their choice and RT data to a DDM and used these fits to obtain the likelihood of each stimulus state (i.e., signed coherence) given a choice and the RT associated with that choice. We then computed the (posterior) belief of a stimulus state from the likelihood values and priors, using Bayes’ rule. Finally, we converted the belief into the probability of a correct choice and marginalized this probability over states (signed coherence) to obtain the subjective assessment of the probability that the current choice is correct. We then computed reward expectation as the product of accuracy expectation and the reward size associated with the choice.
As shown previously, accuracy expectation and reward expectation for this kind of task both depend on stimulus strength (motion coherence) and DT (Fig. 2C; Kiani and Shadlen, 2009; Fetsch et al., 2014). Moreover, because the monkeys in our study showed different choice and RT behaviors for the two reward contexts, the fitted DDM parameters differed between reward contexts, giving rise to additional dependencies on the interactions among reward size, DT, and coherence. That is, because the likelihoods of stimulus states for the same DT and choice differed between when a large and a small reward was expected, the resulting belief of stimulus state and accuracy expectation also depended on reward context in non-linear, DT- and coherence-dependent manners (Fig. 2B,C). For these reasons, we computed both quantities separately for each reward context in each session.
The similarities and differences between accuracy expectation and reward expectation are best illustrated by considering their relationships with reward size, DT, and coherence. For the example session in Figure 1C, accuracy expectation tended to be higher for smaller reward (purple relative to orange in both panels), shorter DT (left panel), and higher coherence (right panel). In contrast, reward expectation tended to be higher for larger reward, shorter DT, and higher coherence (Fig. 1D). Consistent with these illustrations, these measures of accuracy expectation and reward expectation were no longer perfectly correlated (e.g., because they were affected differently by reward magnitude), but could still be partially correlated (e.g., because both tended to decrease with increasing DT and increase with coherence) across all sessions (Fig. 3A). The exact correlation coefficient depended on experimental parameters, such as the ratio between large and small rewards, and the monkey’s performance (Fig. 3B,D). For example, the correlation coefficient tended to decrease, sometimes reaching negative values, with increasing reward ratios (Fig. 3B). The correlation also tended to decrease when the monkey was more biased by reward contexts (Fig. 3C). The dependency patterns were more complex for DDM parameters (Fig. 3D) because multiple parameters can interact to alter likelihood estimation. Most critically, their correlation was significantly <1 (Wilcoxon signed-rank test, p < 0.05/6 for all the monkeys and brain areas), which allowed us to probe their potentially different relationships to neural activity and behavior, as detailed below.
Accuracy expectation and reward expectation are reflected in post-decision activity of FEF and caudate neurons
Previously, we reported in passing that a substantial proportion of neurons in both caudate and FEF exhibited post-decision activity patterns that were modulated by a combination of reward, DT, and coherence (Doi et al., 2020; Fan et al., 2020). Above we showed that these three factors also jointly modulate accuracy expectation and reward expectation. Therefore, we examined whether and how post-decision activity in the caudate and FEF represent accuracy expectation, reward expectation, or both.
The example caudate neuron depicted in Figure 4A–C exhibited modulation patterns that resembled accuracy expectation. Specifically, the neuron was more active when decisions were to the small-reward option, decision times were short, and coherence was high (Fig. 4B,C), similar to accuracy expectation estimated from the monkey's behavior in this session (Fig. 4B,C bottom panels). The neuron depicted in Figure 4D–F exhibited modulation patterns that resembled the negative of accuracy expectation: the neuron was more active when the decisions were to the large-reward option, decision times were long, and coherence was low. Accuracy expectation for this session followed the opposite patterns. In contrast, the example caudate neuron depicted in Figure 4G–I exhibited modulation patterns that resembled reward expectation. Specifically, the neuron was more active when reward size and coherence were high and less active with increasing decision times. The neuron depicted in Figure 4J–L showed the opposite activity pattern, resembling the negative of reward expectation. Similar examples and subpopulations were found in FEF (Fig. 5).
These neural modulation patterns did not emerge from a random mix of reward, DT, and coherence sensitivity but instead reflected a robust representation of evaluative signals. We examined neural activity in three peri-decision epochs: pre-, peri-, and post-saccade (−100 to 0 ms, −100 to 200 ms, and 0 to 400 ms from saccade onset, respectively). For each epoch, we counted the number of neurons showing one of eight possible combinations of modulation by the three factors (positive or negative coefficients in multiple linear regressions defined by Eqs. 9, 10). Figure 6 documents the distributions of neurons in these eight categories, with red and blue fractions representing neurons with modulation patterns consistent with accuracy expectation and reward expectation, respectively. For almost all combinations of brain region, epoch, and choice identity, the distributions were not uniform across the eight categories (blue asterisks: Chi-square test p < 0.05/12), arguing against a random mixture of sensitivity in the population. Rather, the majority of neurons showed modulation patterns consistent with evaluative signals (red/blue vs. gray). These results suggest that substantial portions of FEF and caudate neurons encode either accuracy expectation or reward expectation.
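The eight-category analysis can be illustrated with a short sketch that assigns each neuron a sign pattern across the three factors and computes a Chi-square statistic against a uniform distribution. The mapping from sign patterns to evaluative signals follows the dependencies described above (accuracy expectation: smaller reward, shorter DT, higher coherence; reward expectation: larger reward, shorter DT, higher coherence; plus their negatives). The names are hypothetical, coefficients of exactly zero are arbitrarily treated as negative, and the p-value (7 degrees of freedom) would be obtained from a statistics package.

```python
from collections import Counter
from itertools import product

# Sign patterns are ordered as (reward size, DT, coherence).
ACCURACY_LIKE = {(-1, -1, +1), (+1, +1, -1)}
REWARD_LIKE = {(+1, -1, +1), (-1, +1, -1)}


def sign_pattern(beta_reward, beta_dt, beta_coh):
    """Convert three regression coefficients into a tuple of +1/-1 signs."""
    return tuple(1 if b > 0 else -1 for b in (beta_reward, beta_dt, beta_coh))


def classify(beta_reward, beta_dt, beta_coh):
    """Label a neuron by which evaluative signal its sign pattern matches."""
    pattern = sign_pattern(beta_reward, beta_dt, beta_coh)
    if pattern in ACCURACY_LIKE:
        return "accuracy"
    if pattern in REWARD_LIKE:
        return "reward"
    return "other"


def chi_square_uniform(patterns):
    """Chi-square statistic testing uniformity across the eight sign patterns."""
    counts = Counter(patterns)
    expected = len(patterns) / 8.0
    return sum((counts.get(p, 0) - expected) ** 2 / expected
               for p in product((-1, 1), repeat=3))
```

Under a random mixture of sensitivities, each of the eight patterns would be expected in about one eighth of neurons, and only half of the patterns would match either evaluative signal.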
To assess more directly the relationship between neural activity and these evaluative signals, we computed two partial correlations between firing rate and each quantity, while accounting for the other. We chose the Spearman correlation to capture any non-linear, but monotonic, relationship. We used partial correlations to account for the potential confound of non-zero correlations between the model-derived measures of accuracy expectation and reward expectation that we found for many sessions (Fig. 3A). We observed significant non-zero partial correlation coefficients between accuracy expectation or reward expectation and the activity of many caudate and FEF neurons (p < 0.05). Some of these neurons showed reliable choice selectivity in their activity around saccade onset, as tested previously using multiple linear regression (100 ms before saccade onset to 200 ms after) (Fan et al., 2020), whereas others did not. The within-trial time courses of these correlation coefficients for neurons in each brain area separated by their choice selectivity are shown in Figure 7.
Accuracy expectation and reward expectation are represented differently in caudate and FEF populations
Previously, we reported differences between caudate and FEF populations in their involvement related to decision formation (Ding and Gold, 2010, 2012a; Fan et al., 2020). Here we assessed whether and how these regions also differ in their involvement related to decision evaluation. We observed several regional differences in the distributions of partial correlation coefficients. First, modulation by evaluative signals showed different choice dependencies for the two regions. In the choice-selective caudate subpopulation, modulation by reward expectation appeared more often in trials ending with the neurons’ preferred choices (Fig. 8A, second panel). In the other caudate subpopulation, modulation by reward expectation appeared more often in trials ending with the ipsilateral choice (Fig. 8E, second panel). In both FEF subpopulations, the prevalence of accuracy expectation or reward expectation modulation did not depend on choice (Fig. 8B,F, first two columns).
Second, the relative prevalence of modulation by the two evaluative signals differed for the two regions. In the caudate, the fraction appeared higher for accuracy expectation throughout the peri-saccade period, although this difference reached significance only in a short time window for the subpopulation without choice selectivity (Fig. 8A,E, third column). In the FEF, the fractions of neurons showing either accuracy expectation or reward expectation modulation were similar (Fig. 8B,F, third column).
Third, modulation by evaluative signals was generally more common for caudate neurons (Fig. 8C,G). Modulation by accuracy expectation was more prevalent in caudate than FEF, for the preferred choice in choice-selective neurons and contralateral choice in other neurons. Modulation by reward expectation was also more prevalent in caudate for the preferred choice in choice-selective neurons.
Fourth, the dominant signs of the partial correlation coefficients (positive/negative values imply that neural activity increased/decreased with increasing accuracy expectation or reward expectation) differed between the two regions. For neurons with choice-selective activity, the coefficients for accuracy expectation were primarily negative before saccade onset and positive afterward for FEF (Fig. 8D, top row). The opposite time course was observed for caudate neurons. The time course of the sign for reward expectation modulation was similar for the two regions for the preferred choice, with quantitative differences in the actual fractions (Fig. 8D, bottom row). For the null choice, both regions showed roughly equal distribution of positive and negative modulation before diverging around saccade onset. Because only a small number of FEF neurons showed no choice selectivity and evaluative signal modulation, we could not reliably compare their sign distributions with those of caudate neurons.
Note that for these comparisons, we imposed an additional criterion that neurons encoding evaluative signals must be also sensitive to DT. We used this criterion to filter out neurons that simply encoded reward context or reward size alone in a way that might appear to be modulated by either evaluative signal. Removing this filter did not qualitatively change the patterns described above. For example, caudate representations of accuracy expectation and reward expectation remained more prevalent than FEF representations (compare Figs. 8C, 9C).
Our finding of a relatively high prevalence of signals encoding accuracy expectation versus reward expectation comes with a potential caveat: the above analyses assumed that reward expectation was based on the objective reward asymmetry, but the monkeys might have had different subjective preferences (e.g., when we doubled the juice reward, a given monkey in a given session might have preferred it less or more than twice as much). We conducted additional analyses to show that our results were robust to any (unknown) variability in their subjective reward ratios. Specifically, for each monkey and session, we identified the subjective reward ratio that would maximize the correlation between neural activity and reward expectation (examples are shown in Fig. 9A). This procedure thus provides an upper bound on our estimate of the number of neurons that encode reward expectation. Across neurons and three task epochs (pre-saccade, peri-saccade, and post-saccade), the estimated best reward ratio was often close to 1 (Fig. 9B), which is consistent with our finding that many neurons were sensitive to accuracy expectation (which is equivalent to a reward ratio of 1). More generally, this new analysis did not change the greater prevalence of neurons encoding accuracy expectation versus reward expectation in the caudate population, nor the greater prevalence of neurons encoding accuracy expectation in caudate versus FEF populations (Fig. 9C). Together, these results suggest that the two regions encode evaluative signals differently.
Accuracy expectation and reward expectation differently influence subsequent decisions
To assess the behavioral relevance of these neural representations of evaluative signals in caudate and FEF, we next characterized how these signals related to the trial-to-trial adjustments the monkeys made in their choice and RT behavior. All three monkeys were well trained on the task and therefore made choices whose accuracy and speed could be well accounted-for via the DDM; that is, they were based primarily on a decision process that combined the accumulated sensory evidence on the current trial with certain reward context-dependent biases (Fan et al., 2018; Doi et al., 2020). Nevertheless, the monkeys occasionally adjusted their behavior from trial to trial based on evaluations of the previous choice. We assessed these potential sequential effects using (1) logistic regression testing for effects on staying or switching on the subsequent choice (Eq. 16) and (2) linear regression testing for effects on speeding up or slowing down the subsequent decision (Eq. 20). To account for the possibility that the monkeys’ sequential adjustments were a result of simpler outcome-driven (i.e., reinforcement learning-like) effects than the complex accuracy expectation- or reward expectation-driven effects, we also included regressors for whether the previous trial was correct and whether the monkey received a large reward. We used Elastic Net regularization to reduce overparameterization.
Even though the monkeys were well trained, we still observed sequential effects driven by accuracy expectation, reward expectation, or both in many sessions. As shown in Figure 10A, all three monkeys showed sequential effects on choice in above-chance fractions of sessions. Sequential effects on RT were less frequent and more variable across monkeys and for caudate and FEF recording sessions. Specifically, the monkeys showed consistent tendencies to repeat the same choice after receiving a large reward or after a high-reward expectation trial (especially if the high-reward expectation was followed by an error outcome) (Fig. 10B, second, fourth, and sixth columns, respectively). In contrast, they tended to switch to the other choice after a high-accuracy expectation trial (third column). Their responses to an error outcome alone or with the accuracy expectation interaction varied across monkeys and sessions and may also depend on their overall experience on the task (first and fifth columns, respectively). The sequential effects based on previous large reward, accuracy expectation, and reward expectation were especially robust when we used LASSO regression as an alternative regularization method (Table 1).
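To make the structure of the stay/switch analysis concrete, the following sketch builds previous-trial regressors and a stay/switch response from a sequence of trials. The six regressors mirror the six effects described for Figure 10B (error outcome, large reward, accuracy expectation, reward expectation, and the interactions of each expectation with an error outcome); the exact form of the regression equations is not reproduced here, so the regressor coding, the omission of any current-trial terms, and all names are assumptions made for illustration.

```python
def previous_trial_regressors(prev_correct, prev_large_reward,
                              prev_acc_exp, prev_rew_exp):
    """Regressors describing the previous trial (assumed coding, 0/1 indicators)."""
    error = 0.0 if prev_correct else 1.0
    large = 1.0 if prev_large_reward else 0.0
    return [
        error,                 # previous trial was an error
        large,                 # previous trial yielded a large reward
        prev_acc_exp,          # previous accuracy expectation
        prev_rew_exp,          # previous reward expectation
        prev_acc_exp * error,  # accuracy expectation x error interaction
        prev_rew_exp * error,  # reward expectation x error interaction
    ]


def build_stay_switch_design(trials):
    """Return (X, y): previous-trial regressors and stay (1) or switch (0).

    trials is a list of dicts with hypothetical keys:
    'choice', 'correct', 'large_reward', 'acc_exp', 'rew_exp'.
    """
    X, y = [], []
    for prev, cur in zip(trials, trials[1:]):
        X.append(previous_trial_regressors(
            prev["correct"], prev["large_reward"],
            prev["acc_exp"], prev["rew_exp"]))
        y.append(1 if cur["choice"] == prev["choice"] else 0)
    return X, y
```

In this coding, a positive coefficient means a greater tendency to repeat the previous choice, and a negative coefficient means a greater tendency to switch.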
These behavioral results suggested that the monkeys made online adjustments to their decision behavior based on accuracy expectation and/or reward expectation on the previous trial. The adjustments were in opposite directions after high-accuracy expectation and high-reward expectation trials.
Neural representations of evaluative signals were related differently to the monkeys’ sequential behavioral effects for caudate and FEF neurons
To test whether and how the neural representations of evaluative signals were related to the monkeys’ sequential behavioral adjustments, we performed two tests. First, we reasoned that such a relationship would predict that neural representations of an evaluative signal would be more likely to occur in sessions in which the monkeys showed evaluative signal-dependent sequential effects. We defined such sessions by the presence of non-zero beta coefficients in Elastic Net regressions for sequential effects on either choice or RT. We measured the prevalence of neural representation of evaluative signals by counting, for each time bin, the number of neurons showing significant non-zero partial correlation coefficients (Fig. 11A,B). During caudate recording sessions, neural modulation by accuracy expectation was more likely when the monkeys used accuracy expectation to guide sequential behavioral adjustments (Fig. 11A,B, first column). A qualitatively similar, but quantitatively much weaker, effect was observed for reward expectation (second column). During FEF recording sessions, the probability of encountering modulations by either accuracy expectation or reward expectation was similar regardless of whether monkeys made accuracy expectation or reward expectation-dependent sequential adjustments (third and fourth columns).
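The first test amounts to comparing two proportions: the fraction of neurons with significant modulation in sessions with sequential effects versus the fraction in sessions without. A minimal sketch of a Chi-square test on the resulting 2 x 2 table is given below. It assumes no continuity correction, uses made-up counts, and computes the p-value from the exact relation between the Chi-square distribution with one degree of freedom and the complementary error function.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for a 2 x 2 table (no continuity correction).

    a, b: modulated / non-modulated neurons in sessions with sequential effects
    c, d: modulated / non-modulated neurons in sessions without sequential effects
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for one time bin.
chi2, p = chi_square_2x2(20, 10, 10, 20)
```

Repeating this test for each time bin yields a time course of significance, which is why a correction for multiple comparisons is needed.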
Second, we tested whether the coefficient of neural modulation was correlated with the coefficients of sequential effects across sessions. We used a linear regression, with the neural correlation coefficient (as in Fig. 10A) as the dependent variable and the corresponding sequential effect coefficients (as in Fig. 9) as the regressors. We found that, in the caudate population, neural modulation by accuracy expectation before saccade onset was related positively to whether the monkeys tended to repeat the same choice with a high-accuracy expectation on the previous trial (Fig. 11C, first column). Neural modulation by accuracy expectation after saccade onset was related negatively to whether the monkeys tended to repeat the high-accuracy expectation, but wrong, choice on the previous trial (Fig. 11D, first column). The post-saccade modulation by reward expectation was related positively to the monkeys’ tendency to repeat a choice with a high-reward expectation on the previous trial (Fig. 11C, second column). The same relations were observed in an alternative linear regression analysis that included all coefficients for sequential effects (i.e., both choice and RT). These results suggest that the contributions of post-decision, pre-feedback caudate representation of accuracy expectation to future decision adjustments depended on the correct/error feedback. The different time courses of the regression coefficients for accuracy expectation and reward expectation (compare Fig. 11C first and second columns) also implied that the neural representations of these two evaluative signals might be involved in different computations for future decision adjustments. We did not observe any significant relationship for the FEF population (Fig. 11C,D, third and fourth columns).
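The second test can be illustrated with a simplified, single-regressor version of an across-session regression. The sketch below is an assumption-laden outline rather than the analysis used here: it treats one neural coefficient per neuron (in one time bin) as the dependent variable and one sequential-effect coefficient from the same session as the only regressor, and all numbers are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values: one entry per neuron/session pair.
sequential_beta = [-0.20, 0.00, 0.10, 0.30, 0.45]  # behavioral coefficient
neural_coef = [-0.05, 0.02, 0.04, 0.11, 0.18]      # neural modulation coefficient
slope, intercept = ols_slope_intercept(sequential_beta, neural_coef)
print(slope, intercept)
```

A positive slope in a given time bin would correspond to a positive relation between neural modulation and the behavioral tendency; repeating the fit for each time bin yields a time course of regression coefficients.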
Discussion
Accuracy expectation and reward expectation are both important quantities for evaluating a decision after it has occurred, but their distinct roles are not well understood because they are perfectly correlated in many commonly used decision tasks. We addressed this challenge by manipulating sensory uncertainty and reward sizes to partially decorrelate and therefore identify distinguishable representations of these two conceptually distinct quantities. We focused on post-decision activity in previously recorded FEF and caudate neurons (Doi et al., 2020; Fan et al., 2020) and observed that: (1) accuracy expectation and reward expectation were represented in both brain regions; (2) these representations were more prevalent in caudate than FEF neurons, especially for accuracy expectation; (3) the monkeys used accuracy expectation and reward expectation from the previous trial to adjust their decision on the current trial; and (4) these behavioral adjustments were more closely linked to evaluative signals represented in caudate than in FEF. These results provide new perspectives on previously reported cognitive signals in post-decision FEF and caudate activity and further demonstrate functional differences between these two regions in decision evaluation and adjustment.
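The decorrelation logic can be made concrete with a toy calculation. It rests on assumptions that are not taken from the study: reward expectation is approximated as the probability of being correct multiplied by the reward size for the chosen option, and the accuracy values and the 2:1 reward ratio are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical probability correct at four motion strengths.
accuracy_by_coherence = [0.55, 0.65, 0.80, 0.95]


def expectations(reward_choice_1, reward_choice_2):
    """Accuracy and reward expectation for every choice-by-coherence condition."""
    accuracy, reward = [], []
    for reward_size in (reward_choice_1, reward_choice_2):
        for p_correct in accuracy_by_coherence:
            accuracy.append(p_correct)
            reward.append(p_correct * reward_size)  # assumed definition
    return accuracy, reward


r_symmetric = pearson_r(*expectations(1.0, 1.0))   # equal rewards
r_asymmetric = pearson_r(*expectations(2.0, 1.0))  # large vs. small reward
print(r_symmetric, r_asymmetric)
```

With equal rewards the two quantities are identical up to scale and cannot be separated; with unequal rewards the same accuracy expectation maps onto different reward expectations depending on the choice, so the correlation drops well below 1 while remaining positive.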
Previous studies have shown that post-decision FEF and caudate neural activity are sensitive to various cognitive signals, including choice value (Kawagoe et al., 1998; Lau and Glimcher, 2008; Seo et al., 2012), task difficulty (Ding and Gold, 2010, 2012a; Teichert et al., 2014), confidence (Middlebrooks and Sommer, 2012; Yanike and Ferrera, 2014a), and accuracy-related risk (Yanike and Ferrera, 2014b). There are two common hypotheses regarding the diverse modulation patterns. One hypothesis is that these different signals reflect the same underlying computations but are expressed differently under different task contexts. Our results, using a single task design, argue against this simple hypothesis by demonstrating that neural representations of at least two conceptually distinct signals co-exist in two brain regions that are well known to be involved in decision making. Extrapolating from these results, it seems likely that even more diverse types of evaluative signals are present in the decision network, which includes other cortical areas, midbrain dopamine neurons, and superior colliculus (Kepecs et al., 2008; Kiani and Shadlen, 2009; Zariwala et al., 2013; So and Stuphorn, 2016; Lak et al., 2017, 2020a,b; Odegaard et al., 2018; Hirokawa et al., 2019). In principle, these signals can be flexibly employed to adapt a decision-maker's strategy to diverse decision goals. For example, the accuracy-related signals can be more readily used to maximize accuracy, detect a change in environments (Yu and Dayan, 2005; Nassar et al., 2012), implement multi-stage decisions (van den Berg et al., 2016; Desender et al., 2019a), or seek more information (Desender et al., 2019b). In contrast, reward expectation/risk-related signals can be more readily used to maximize reward rate (Bogacz, 2007; Feng et al., 2009; Simen et al., 2009; Fan et al., 2018) and to implement reinforcement learning algorithms (Sutton and Barto, 1998).
The other hypothesis is that some patterns reflect precursor quantities that are not directly relevant to behavior. For example, the accuracy signal in caudate neurons may be used to compute reward expectation locally but does not directly affect the monkeys’ behaviors. Arguing against this hypothesis, the monkeys’ sequential adjustments were linked to both accuracy and reward expectation signals in caudate. In addition, generalizing from a rodent study of orbitofrontal cortex (OFC) neurons (Hirokawa et al., 2019), the caudate may receive already-computed reward expectation signals from the cortex and thus does not need to encode accuracy expectation unless it is functionally relevant.
Given the extensive projection from the FEF to the caudate, it is not surprising that the two regions share many functional similarities, particularly for decision-making. For example, we and others have shown previously that both the FEF and caudate carry information related to decision formation, such as uncertain sensory evidence (Kim and Shadlen, 1999; Ding and Gold, 2010, 2012a; Ding, 2015), values for potential outcomes (Kawagoe et al., 1998; Lauwereyns et al., 2002a,b; Roesch and Olson, 2003; Samejima et al., 2005; Ding and Hikosaka, 2006; Lau and Glimcher, 2008), and their combination in complex decisions (Fan et al., 2020). The pre-decision activity in both regions is linked causally to decision behavior (Moore and Fallah, 2001; Ding and Gold, 2012b; Santacruz et al., 2017; Bollimunta et al., 2018; Doi et al., 2020). The similarity also extends to decision evaluation, as we show here that both regions carry information about accuracy expectation and reward expectation.
Despite these similarities, it is also clear that the caudate is not simply a relay station for FEF output. There are many notable regional differences even when the two regions are compared on the same task and in the same animals. For example, for a simple saccade task with reward manipulations, reward expectation-related information tends to be multiplexed with choice-selective activity in FEF, whereas it is encoded directly by a subset of caudate neurons (Ding and Hikosaka, 2006). FEF and caudate activity encoding reward context information also shows different temporal dynamics (Ding, 2015). For a visual motion-discrimination task, pre-decision FEF activity reflects motion evidence accumulation until a threshold level that is related to decision commitment, whereas caudate activity follows evidence accumulation only in the earlier phase of the decision process (Ding and Gold, 2010, 2012a; Ding, 2015). For the asymmetric-reward motion-discrimination task used here, FEF activity is more directly linked to monkeys’ reward biases in evidence accumulation (Fan et al., 2020). Our new results document additional regional differences in decision evaluation and adjustment. Specifically, the greater prevalence of accuracy expectation signals in caudate activity and the closer link between caudate activity and the monkeys’ sequential behavioral adjustments support the idea that the caudate is more directly involved in tuning the decision process. This idea is further supported by previous observations that post-action caudate microstimulation can gradually bias RTs of a specific saccade (Nakamura and Hikosaka, 2006; Williams and Eskandar, 2006) and that caudate microstimulation during decision formation induces behavioral effects that mimic the monkeys’ voluntary reward bias strategies (Doi et al., 2020).
Further arguing against a direct relay scheme, the direct excitatory FEF→caudate projection is difficult to reconcile with the opposite directions in which accuracy expectation-related encoding in FEF and caudate neurons evolves over the course of a trial (Fig. 8D). The “sign flip” may be mediated by striatal inhibitory interneurons. Because these neurons are sparse relative to the striatal projection neurons that we recorded, future recordings using cell-type-specific sampling techniques are needed to determine the involvement of striatal interneurons in decision-related computations. The “sign flip” may also reflect additional sources of evaluative signals to the caudate. For example, the supplementary eye field has projection fields in the caudate that overlap with those of FEF, and its neural activity is mostly negatively correlated with confidence on a value-based decision task (Parthasarathy et al., 1992; So and Stuphorn, 2016). Striatum-projecting OFC neurons may provide a negative reward expectation signal to caudate (Hirokawa et al., 2019).
The present results, together with our previous documentation of pre-decision activity in FEF and caudate neurons, indicate that both regions are involved in both the formation and evaluation of decisions. We did not observe any relationship between activity related to decision formation and evaluation at the single-neuron level. For example, neurons with and without modulation in their pre-decision activity (during motion viewing) were similarly likely to show modulation by evaluative signals in their post-decision activity. The sign of a neuron's post-decision modulation by accuracy expectation or reward expectation also appeared unrelated to its pre-decision (during motion viewing) modulation by choice, reward context, or motion coherence. These results suggest that overlapping neural substrates may mediate decision formation and evaluation.
For our study, we used mathematically derived estimates of accuracy expectation and reward expectation. Our results show that these quantities relate to both behavior and neural activity, lending credence to our premise that these quantities are a useful starting point for understanding how the brain uses expectations to evaluate and adjust behavior. Nevertheless, how the quantities we computed relate to the actual quantities used in the brain remains a challenging question. A major hurdle is the lack of paradigms that can distinguish different forms of evaluative signals and are amenable to neurophysiological studies. For example, monkeys can be trained on post-decision wager tasks, but it is difficult to ensure that the wagers are based strictly on accuracy or reward expectation. Human subjects may be instructed carefully to report accuracy expectation, reward expectation, or choice confidence, but invasive neural recordings in normal subjects are unethical. The advancement of intracranial recordings in certain patient populations may offer unprecedented opportunities to understand how decision evaluation is implemented in the human brain.
In summary, we used a task design with independent manipulations of sensory evidence and reward associations to partially decouple accuracy and reward expectations. We found that a substantial fraction of caudate and FEF neurons encode these two different evaluative signals in their post-decision activity, but with regional differences in their prevalence, time course, and associations with behavior. These results highlight the diversity of signals and brain regions that contribute to how decisions are formed, evaluated, and adjusted to achieve particular goals.
Footnotes
We thank Jean Zweigle for animal care. This work was supported by NIH National Eye Institute (R01-EY022411; L.D. and J.I.G.), University of Pennsylvania (University Research Foundation Pilot Award; L.D.), and Hearst Foundations Graduate student fellowship (Y.F.).
The authors declare no competing financial interests.
Correspondence should be addressed to Long Ding at lding@pennmedicine.upenn.edu.