Perceptual decision making is a complex process that requires multiple computations, including the accumulation of sensory evidence and an ongoing evaluation of the accumulation process to use for prediction and adjustment. Implementing these computations likely involves interactions among many brain regions. For perceptual decisions linked to oculomotor actions, neural correlates of sensory evidence accumulation have been identified in several cortical areas, including the frontal eye field and lateral intraparietal area, and one of their direct, subcortical targets, the superior colliculus. These structures are also connected indirectly, via the basal ganglia. The basal ganglia pathway has been theorized to contribute to perceptual decision making, but the nature of this contribution has yet to be examined directly. Here we show that in monkeys performing a reaction-time visual motion direction-discrimination task, neurons in a primary input structure of the basal ganglia, the caudate nucleus, encode three aspects of decision making: evidence accumulation, evaluation, and choice biases. These results indicate that the basal ganglia pathway can provide important signals to influence and assess perceptual decisions that guide oculomotor behavior.
To form perceptual decisions, the brain must evaluate how incoming sensory information supports or opposes the alternative hypotheses under consideration (Gold and Shadlen, 2007). Under certain conditions, this process involves the temporal accumulation of sensory evidence, correlates of which have been identified in several brain regions, including the lateral intraparietal area (LIP) and frontal eye field (FEF) of cortex and downstream superior colliculus (SC) in the brainstem (Shadlen et al., 1996; Horwitz and Newsome, 1999; Kim and Shadlen, 1999; Shadlen and Newsome, 2001; Huk and Shadlen, 2005). However, evidence accumulation alone is insufficient to account for effective and flexible decision making, which also depends on other factors like preferences and expectations. Little is known about how and where in the brain these multiple factors are integrated to shape decision making.
In this study we examined the role of the basal ganglia pathway in this integrative process. The basal ganglia provide an indirect link between cortical (LIP, FEF) and brainstem (SC) structures that encode evidence accumulation for saccadic decisions and thus are well positioned to contribute to this visuomotor decision process. The specific computational roles of this pathway in the decision process have been hypothesized but not yet verified experimentally (Lo and Wang, 2006; Bogacz and Gurney, 2007). Within this pathway, we targeted the caudate nucleus, which provides evaluative signals and reward-driven biases to influence associative learning and reward-modulated behaviors in monkeys performing saccade tasks with simple, suprathreshold stimuli (for review, see Hikosaka et al., 2006; Nakamura and Hikosaka, 2006a,b; Williams and Eskandar, 2006; Lau and Glimcher, 2007, 2008).
We focused on three distinct computational elements that contribute to flexible decision making (Gold and Shadlen, 2007). The first is evidence accumulation, which reflects the strength of the sensory evidence, the time elapsed during the decision process, and the final choice (Roitman and Shadlen, 2002; Smith and Ratcliff, 2004). The second is an ongoing estimate of confidence in or expectations about impending reward delivery or other outcomes (Sutton and Barto, 1998; Kepecs et al., 2008; Kiani and Shadlen, 2009). This kind of evaluative quantity can be used to terminate or adjust the decision process and should reflect the quality of the evidence but not necessarily the particular choice. The third element uses the ongoing evaluation of the decision process and other task-related factors to influence choice behavior, for example by introducing a bias into the initial or final value of the accumulation process (Carpenter and Williams, 1995; Voss et al., 2004; Bogacz et al., 2006; Diederich and Busemeyer, 2006).
We recorded from individual caudate neurons in monkeys performing a reaction-time (RT) version of a random-dot motion direction-discrimination task (Fig. 1A). We found that caudate encodes all three computational elements described above. These novel results suggest that the basal ganglia play a variety of roles in promoting effective and flexible visual-oculomotor decisions, by providing necessary signals to evaluate and influence the decision process.
Materials and Methods
Two adult male rhesus monkeys (Macaca mulatta) were used. All training, surgery, and experimental procedures were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and were approved by the University of Pennsylvania Institutional Animal Care and Use Committee.
The motion discrimination (dots) task (Fig. 1A) has been described in detail previously (Roitman and Shadlen, 2002). Briefly, after the monkey fixated for a random time (between 0.3 and 3.0 s and between 0.4 and 2.5 s for monkeys C and F, respectively, picked from a truncated exponential distribution to minimize anticipation of the end of this period), the fixation point dimmed and the motion stimulus, a random-dot kinematogram, was shown in a 5° aperture centered on the fixation point, with a fixed velocity of 6°/s. Motion strength of the stimulus was specified as the percentage of dots moving coherently in one direction (coherence). Six levels of coherence (0, 3.2, 6.4, 12.8, 25.6, or 51.2%) and two motion directions were randomly interleaved. The monkey was rewarded with a drop of juice for making a saccadic eye movement and then maintaining fixation for >400 ms on the target in the direction of coherent motion (or chosen at random on 0% coherence trials) at any time following the stimulus onset. The stimulus was immediately turned off when the monkey's gaze left the fixation window (4° and 3.5° diameter for monkey C and F, respectively). Error feedback was given by extinguishing the error target while simultaneously turning on the correct target, followed by no reward and a time-out of 4–8 s. For monkey C, a minimum of 1.5 s from stimulus onset to reward delivery was imposed to discourage attempts to get rewarded quickly by simply guessing. For monkey F, no minimal stimulus–reward delay was needed. Eye position was monitored using a video-based system (ASL) sampled at 240 Hz.
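The truncated exponential fixation period can be illustrated with a short sketch. An exponential distribution has a constant hazard rate, which is why it is commonly used to discourage anticipation. The mean of the underlying exponential is not reported here, so `mean_s` is a hypothetical value, and rejection sampling is only one of several ways the truncation could have been implemented.

```python
import random

def fixation_duration(t_min, t_max, mean_s, rng):
    """Draw t_min + Exp(mean_s), redrawing until the result is <= t_max."""
    while True:
        t = t_min + rng.expovariate(1.0 / mean_s)
        if t <= t_max:
            return t

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Ranges are taken from the text; mean_s = 1.0 s is an assumed value.
monkey_c = [fixation_duration(0.3, 3.0, 1.0, rng) for _ in range(1000)]
monkey_f = [fixation_duration(0.4, 2.5, 1.0, rng) for _ in range(1000)]
```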
We used a standard memory-guided saccade (MGS) task to search for neurons and test for any relationship in caudate activity between the MGS and dots tasks. For the MGS task, the monkey fixated for 500–1000 ms, after which a peripheral cue was flashed for 100 ms at one of eight possible locations. The monkey maintained fixation until the fixation point was turned off (GO) ∼600–1000 ms after cue onset and was rewarded with a drop of juice 400 ms after a correct saccade to the memorized cue location.
Each monkey was implanted with a head holder and recording cylinder that provided access to the right caudate. Before recordings began, magnetic resonance images (MRIs) were taken of each monkey's head, using an enclosed tube that was filled with a copper-sulfate solution and placed inside the recording cylinder to enhance brightness. Based on the location and orientation of the recording cylinder relative to the brain, the depth and grid coordinates of each recording site were converted to coordinates on the MRI (Fig. 2). For better visualization of the lateral boundary of caudate, MR images were thresholded at mean intensity + 0.4 SD and edges were identified in the binary images using the MATLAB image processing toolbox.
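The thresholding step can be sketched as follows. The analysis itself used the MATLAB image processing toolbox; the pure-Python version below is a stand-in for illustration only. The use of the population SD over all pixels and the 4-neighbor edge rule are assumptions, since the exact edge-detection method is not specified.

```python
import statistics

def threshold_image(img, k=0.4):
    """Binarize: 1 where intensity > mean + k * SD (computed over all pixels)."""
    pixels = [v for row in img for v in row]
    cutoff = statistics.mean(pixels) + k * statistics.pstdev(pixels)
    return [[1 if v > cutoff else 0 for v in row] for row in img]

def edge_pixels(binary):
    """Mark foreground pixels that touch background (assumed 4-neighbor rule)."""
    h, w = len(binary), len(binary[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if binary[r][c] != 1:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and binary[rr][cc] == 0:
                    edges[r][c] = 1
                    break
    return edges

# Toy 5 x 5 "image": a bright 3 x 3 block on a dark background.
toy = [
    [10, 10, 10, 10, 10],
    [10, 90, 90, 90, 10],
    [10, 90, 90, 90, 10],
    [10, 90, 90, 90, 10],
    [10, 10, 10, 10, 10],
]
binary = threshold_image(toy)
edges = edge_pixels(binary)
```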
Neural activity was recorded using glass-coated tungsten electrodes (Alpha-Omega) and sorted off-line (Plexon). The caudate was identified from locations estimated on the MRIs (Fig. 2) and from its characteristic low background activity at >10 mm below the dural surface. We recorded only from putative projection neurons (also called phasically active neurons, or PANs), which we identified by their low spontaneous activity and activation in bursts of spikes (Hikosaka et al., 1989). We searched for caudate neurons while the monkeys performed the dots task (monkey F) and/or MGS task (monkey C). We attempted to record from all cells that showed modulated activity on the dots task, identified by visual inspection of ∼10–20 trials. For monkey C, the direction axis for the dots task was determined by finding the target location that elicited the largest responses, assessed qualitatively, on both the MGS task and different configurations of the dots task. For monkey F, the direction axis was always directly horizontal. Only cells with stable recordings of more than five correct trials per condition were included in the analyses below. In a subset of sessions, we also recorded the same neurons on the MGS task, with either the eight possible cue locations or only two cue locations identical to the choice target locations in the dots task.
Behavioral data analysis.
We measured RT as the time from stimulus onset to saccade onset. Saccade onset was identified offline with respect to velocity (>40°/s) and acceleration (>8000°/s²). We fit choice (T1 or T2) and RT data as functions of signed motion coherence (Coh, positive for toward T1, negative for toward T2) simultaneously to a drift-diffusion model (Palmer et al., 2005; Hanks et al., 2006). According to this model, momentary motion evidence is assumed to follow a Gaussian distribution N(μ, 1), the mean of which, μ, scales with coherence: μ = k × Coh, where k is a fit parameter that governs the coherence-dependent drift. A decision variable is computed as the temporal accumulation of this momentary motion evidence. A decision (T1 or T2) is reached when the value of the decision variable reaches a decision bound (+A or −B, respectively). Decision time is defined as the interval between stimulus onset and crossing of either decision bound. RT is the sum of decision time and nondecision time (T01 for a T1 choice and T02 for a T2 choice, each of which is a free parameter in the model). Within this framework, the probability of choosing T1 (i.e., the probability that the decision variable reaches bound +A first) is $(e^{2\mu B} - 1)/(e^{2\mu B} - e^{-2\mu A})$. The average decision time is $[(A + B)/\mu]\coth(\mu(A + B)) - (B/\mu)\coth(\mu B)$ for T1 decisions and $[(A + B)/\mu]\coth(\mu(A + B)) - (A/\mu)\coth(\mu A)$ for T2 decisions. Best-fit parameters (A, B, k, T01, and T02) were obtained using maximum-likelihood methods. Threshold was estimated from the choice function as one-half the difference in coherence corresponding to 25 and 75% T1 choices (Klein, 2001). Bias was defined as the percentage coherence corresponding to 50% T1 choices.
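The choice and decision-time expressions of the drift-diffusion model can be evaluated directly, as in the following sketch. The parameter values used in the example call are hypothetical, the zero-drift branches are the analytic limits of the expressions as μ approaches 0, and the maximum-likelihood fitting step is not shown.

```python
import math

def coth(x):
    return math.cosh(x) / math.sinh(x)

def p_choose_t1(mu, A, B):
    """Probability that the decision variable reaches +A before -B."""
    if mu == 0:
        return B / (A + B)  # zero-drift limit
    e_b = math.exp(2 * mu * B)
    return (e_b - 1) / (e_b - math.exp(-2 * mu * A))

def mean_decision_time(mu, A, B, choice):
    """Mean decision time for a 'T1' (bound +A) or 'T2' (bound -B) choice."""
    other = B if choice == "T1" else A  # height of the bound that was not reached
    total = A + B
    if mu == 0:
        return (total ** 2 - other ** 2) / 3  # zero-drift limit
    return (total / mu) * coth(mu * total) - (other / mu) * coth(mu * other)

def predict(coh, k, A, B, t01, t02):
    """Predicted P(T1) and mean RTs for one signed coherence value."""
    mu = k * coh
    return (
        p_choose_t1(mu, A, B),
        mean_decision_time(mu, A, B, "T1") + t01,
        mean_decision_time(mu, A, B, "T2") + t02,
    )

# Hypothetical parameters, for illustration only (not fitted values).
p_t1, rt_t1, rt_t2 = predict(coh=0.128, k=10.0, A=1.0, B=1.0, t01=0.3, t02=0.3)
```

With symmetric bounds (A = B), these expressions reduce to the familiar logistic choice function 1/(1 + e^(−2μA)) and to a mean decision time of (A/μ)tanh(μA), which is a convenient check on any implementation.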
Neural data analysis.
For each neuron, we computed the average firing rates in five task epochs: from 300 ms before stimulus onset to 150 ms after stimulus onset (“Pre”); from 200 ms after stimulus onset to 100 ms before the median RT at 51.2% coherence for a given session (“Stim”) (note that this epoch was fixed across trials for a given neuron, but could vary across neurons); from 200 ms before to 100 ms after saccade onset (“Sac”); from 100 to 500 ms after saccade onset (“Post”); and from reward onset to 500 ms afterward (“Rew”). For activity in each epoch except Pre, we examined the choice and coherence dependence of spike rate from correct trials with nonzero coherence. If there was significant choice dependence (Wilcoxon rank-sum test for H0: equal median spike rates in the given epoch from correct trials corresponding to the two choices, p < 0.05), the choice associated with larger median activity was designated IN and the other OUT. For choice-independent activity, we arbitrarily assigned the choice associated with contralateral or up target as the IN choice and the other as the OUT choice, for display purposes only. An additional analysis tested for coherence dependence by computing a linear regression, with the coherence level and a constant term as the regressors, separately for IN and OUT choices. The significance of the coherence dependence was assessed using an F test (H0: slope = 0, p < 0.05). The slope values from the linear regression and their significance were used to categorize response types (Fig. 3B).
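The coherence regression for one choice and one epoch can be sketched as follows. The function returns the slope, the intercept, and the F statistic (1 and n − 2 degrees of freedom) for H0: slope = 0. The firing rates in the example are hypothetical, and converting F to a p value requires the F distribution from a statistics library, which is omitted here.

```python
def regress_on_coherence(coh, rate):
    """Least-squares fit of rate = intercept + slope * coh.

    Returns (slope, intercept, F), where F tests H0: slope = 0.
    Assumes the fit is not perfect (nonzero residual sum of squares).
    """
    n = len(coh)
    mean_x = sum(coh) / n
    mean_y = sum(rate) / n
    sxx = sum((x - mean_x) ** 2 for x in coh)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(coh, rate))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(coh, rate))
    ssr = slope ** 2 * sxx  # sum of squares explained by the regression
    f_stat = ssr / (sse / (n - 2))
    return slope, intercept, f_stat

# Nonzero coherence levels from the task; rates (spikes/s) are made up.
coh_levels = [3.2, 6.4, 12.8, 25.6, 51.2]
in_rates = [11.0, 12.5, 13.0, 18.0, 26.0]
slope_in, intercept_in, f_in = regress_on_coherence(coh_levels, in_rates)
```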
Based on these results, we categorized the activity after stimulus onset into two types. Evidence accumulation activity depended on choice and was modulated by coherence with different signs for the two choices (Fig. 3B, blue circles). Evaluation activity was modulated by coherence with the same sign for the two choices (Fig. 3B, red circles). Evaluation activity may show additional choice modulation such that the average activity across coherence levels and consequently the strength of coherence modulation may still depend on choice. Note that these definitions were used for convenience in referring to the different response patterns but are not meant to imply exclusive functional associations. We used the sign, instead of the strength, of coherence modulation in these definitions because the majority of caudate activity depended on choice in a manner that affected the coherence modulation (choice-sensitive coherence modulation was observed in 150/152 epochs with evidence accumulation activity and 54/70 epochs with evaluation activity; F test comparing nested linear models with choice-dependent and choice-independent coherence modulation, p < 0.05).
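The two definitions can be written as a small decision rule, sketched below. How epochs with significant coherence modulation for only one choice were handled is not stated here, so requiring both slopes to be significant is an assumption, and the "unclassified" label is hypothetical.

```python
def classify_epoch(slope_in, slope_out, p_in, p_out, p_choice, alpha=0.05):
    """Label one epoch from its IN/OUT coherence slopes and choice dependence."""
    if not (p_in < alpha and p_out < alpha):
        return "unclassified"  # assumed handling; not specified in the text
    same_sign = (slope_in > 0) == (slope_out > 0)
    if same_sign:
        return "evaluation"
    if p_choice < alpha:
        return "evidence accumulation"
    return "unclassified"

# Opposite-sign slopes with choice dependence (p values are placeholders).
label_a = classify_epoch(0.31, -0.14, 1e-5, 0.005, 1e-5)
# Same-sign slopes for both choices.
label_b = classify_epoch(0.59, 0.43, 1e-5, 1e-5, 1e-5)
```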
For each neuron with evidence accumulation activity in the Stim epoch, we tested for choice dependence (significant difference between IN and OUT trials) in 100 ms sliding windows at 10 ms steps starting at stimulus onset. We estimated the onset of choice-dependent activity, tchoice, as the beginning of the first 100 ms window showing significant choice dependence (Wilcoxon rank-sum test, p < 0.05). We computed the unfiltered spike-density function between 200 ms before tchoice and median saccade reaction time, excluding activity 100 ms before saccade onset and onward in each trial. We identified the time of highest/lowest activity, tpeak, for IN/OUT choice trials using a running average of the spike density function (10 ms running window). With the same running average, we measured the rate of change during the Stim epoch by estimating the slope between tchoice and tpeak. The slope of rate of change was computed using a linear regression with coherence level and a constant term as the regressors.
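The sliding-window estimate of tchoice can be sketched as follows. The rank-sum p value below uses a normal approximation with average ranks for ties but no tie correction of the variance, which may differ from the implementation actually used. The per-trial spike-time lists (in ms from stimulus onset) and the `t_max` argument are hypothetical inputs.

```python
import math

def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p value (normal approximation)."""
    pooled = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    n = len(pooled)
    rank_sum_a = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_a - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

def spike_count(spike_times, start, stop):
    return sum(1 for t in spike_times if start <= t < stop)

def choice_onset(in_trials, out_trials, t_max, width=100, step=10, alpha=0.05):
    """Start (ms) of the first window with significant IN vs OUT difference."""
    start = 0
    while start + width <= t_max:
        a = [spike_count(tr, start, start + width) for tr in in_trials]
        b = [spike_count(tr, start, start + width) for tr in out_trials]
        if rank_sum_p(a, b) < alpha:
            return start
        start += step
    return None  # no window reached significance

# Toy data: IN trials start firing at 205 ms, OUT trials stay silent.
in_trials = [[205, 230, 260, 290] for _ in range(10)]
out_trials = [[] for _ in range(10)]
t_choice = choice_onset(in_trials, out_trials, t_max=400)
```

Because tchoice is defined as the beginning of the first significant window, the estimate can precede the first spike that separates the two conditions by up to the window width.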
We defined bias activity as activity in the Pre epoch that was predictive of the monkey's choice, in a way consistent with the choice preference of that neuron's activity during the Stim epoch (Fig. 3A,B, black circles). To identify bias activity, we focused on cells with choice-dependent activity in the Stim epoch. For each cell, we combined 3.2% and 6.4% coherence trials and grouped them by choice. We estimated how well the Pre epoch firing rate in single low-coherence trials could predict the monkey's subsequent choice, by computing a receiver operating characteristic (ROC)-based predictive index (Kim and Shadlen, 1999). We tested whether this predictive index was significantly greater than 0.5, based on the 95% confidence interval estimated using a bootstrap method with 2000 iterations. Bias activity was defined as Pre epoch activity with a predictive index significantly greater than 0.5.
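A minimal sketch of the predictive index and its bootstrap confidence interval follows. The area under the ROC curve is computed as the probability that a randomly drawn IN-choice trial has a higher Pre epoch rate than a randomly drawn OUT-choice trial. The percentile form of the bootstrap and resampling within each choice group are assumptions, since those details are not given here.

```python
import random

def roc_area(in_rates, out_rates):
    """P(IN-trial rate > OUT-trial rate), with ties counted as 0.5."""
    wins = 0.0
    for a in in_rates:
        for b in out_rates:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(in_rates) * len(out_rates))

def bootstrap_ci(in_rates, out_rates, n_iter=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the ROC area (assumed method)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_iter):
        a = [rng.choice(in_rates) for _ in in_rates]    # resample IN trials
        b = [rng.choice(out_rates) for _ in out_rates]  # resample OUT trials
        stats.append(roc_area(a, b))
    stats.sort()
    lo = stats[round((1 - level) / 2 * n_iter)]
    hi = stats[round((1 + level) / 2 * n_iter) - 1]
    return lo, hi

def is_bias_activity(in_rates, out_rates):
    """True when the lower confidence limit of the index exceeds 0.5."""
    lo, _ = bootstrap_ci(in_rates, out_rates)
    return lo > 0.5
```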
For the MGS task, we computed the average firing rates in four task epochs (see Fig. 12A, below): visual response (V, 0–300 ms after cue onset), memory period (M, 300 to 0 ms before GO signal), saccade response (S, −100–400 ms around saccade onset), and reward response (R, 500 ms after reward onset). Task-modulated activity was defined as activity different from baseline (Base, 300 ms before cue onset; Wilcoxon rank-sum test, p < 0.05). Directional selectivity was assessed using activity associated with the same two directions as in the dots task (Wilcoxon rank-sum test, H0: equal medians for the two directions, p < 0.05).
Results
We trained two monkeys on the dots task (Fig. 1A) (Roitman and Shadlen, 2002). For both monkeys, performance accuracy and RT changed systematically as a function of motion strength (coherence) (Fig. 1B,C). At 51.2% coherence, both monkeys nearly always made the correct choice with the shortest mean RT. At lower coherences, the probability of choosing the correct target decreased, while mean RT increased. Both the psychometric and chronometric functions were well described by a drift-diffusion model with asymmetric bounds (Fig. 1B,C) (Palmer et al., 2005; Hanks et al., 2006). In general, the fits indicated low thresholds for both monkeys and a slight bias toward T1 (rightward direction) for monkey F (Table 1).
To determine the caudate's role in perceptual decisions about motion direction, we examined 129 putative projection neurons whose activity was modulated during the dots task (n = 87 and 42 from monkeys C and F, respectively). These neurons showed a diversity of response properties with respect to motion strength, choice, and the timing of task-related activation. Most neurons were sensitive to choice in at least one task epoch (the presence of at least one nonwhite entry for almost all cells in Fig. 3A). Choice-dependent activity was more likely to be larger for choices associated with target locations contralateral to the recording sites (the dominance of warm- over cool-color entries in Fig. 3A). Most neurons were also sensitive to motion strength in at least one task epoch (the presence of at least one nonwhite entry for almost all cells in Fig. 3B). The sign and strength of coherence modulation, quantified as the sign of the slope value from a linear regression of average firing rate with coherence as the regressor and visually represented by different colors (Fig. 3B), varied from cell to cell (in rows), between trials with different choices for the same cell (first and second columns in the same panel), and across epochs within the same cell (across panels).
Of these diverse responses, we focused on three types of activity that relate to three computational elements necessary for effective perceptual decision making: evidence accumulation, evaluation, and choice bias. For all subsequent analyses, data are combined from the two monkeys, in which we found similar frequencies of occurrence of each response type: Fisher's exact test, p = 0.1116 and 0.1781 for the first two types of activity, respectively, and very low numbers of occurrence for bias activity in both monkeys.
Evidence accumulation: neural activity modulated by motion strength and choice
The first type of task-dependent caudate activity was modulated by both choice and motion strength, including choice-dependent modulation by motion strength (Fig. 3B, blue circles). This activity was evident in several task epochs, from after stimulus onset to after reward delivery, for different subsets of neurons. This activity likely contributes to generating the current decision and possibly choice-specific evaluation.
An example neuron with evidence accumulation activity in the Stim epoch is shown in Figure 4. When aligned on stimulus onset, the neuron's activity was larger when the visual motion was leftward than when the motion was rightward (Wilcoxon rank-sum test, H0: equal median responses for the two directions, p < 0.0001). Thus for this cell in the Stim epoch, trials in which the monkey correctly chose the left choice target were designated as IN trials, and trials in which the monkey correctly chose the right choice target were designated as OUT trials. When collapsed across coherence levels, the activity began to differentiate between IN and OUT trials soon after stimulus onset (tchoice = 210 ms). On IN trials, the activity tended to build up more rapidly for higher coherences (Fig. 4E, solid lines) (linear regression slope of the rate of rise in activity vs coherence = 3.24 spikes/s²/%coh, H0: slope = 0, F = 7.87, p = 0.0025; slope of a linear regression of average firing rate during the Stim epoch vs coherence = 0.31 spikes/s/%coh, H0: slope = 0, F = 19.09, p < 0.0001). In contrast, on OUT trials, activity was less extensively modulated but tended to be smaller for higher coherences (Fig. 4E, dashed lines) (slope of a linear regression of average firing rate vs coherence = −0.14 spikes/s/%coh, F = 8.00, p = 0.005). When aligned on saccade onset, the neuron's activity did not converge to any particular value for either type of trial (Fig. 4F).
Across the population, 47 neurons (n = 37 and 10 for monkeys C and F, respectively) showed evidence accumulation activity in the Stim epoch (Fig. 5). Their activity became selective for choice with a median latency (tchoice) of 170 ms (interquartile range, or IQR: 142.5–225 ms) after stimulus onset. This latency is slightly shorter than comparable data from area LIP, and, also unlike LIP neurons, these caudate neurons did not show a noticeable dip in activity after stimulus onset (Roitman and Shadlen, 2002). Consistent with how these neurons were selected, the average firing rate during a 100 ms window before the median RT for each coherence level was positively modulated by coherence on IN trials and negatively modulated on OUT trials (slopes = 0.042 and −0.05 spikes/s/%coh, F = 8.90 and 15.85, p = 0.0031 and 0.0001, respectively) (Fig. 5C). In addition, the time course of activity on IN trials showed gradual, coherence-dependent changes, consistent with temporal accumulation of evidence. Across neurons, the rate of change of activity during the Stim epoch on IN trials was positively modulated by coherence with a median slope of 0.79 spikes/s²/%coh (two-sided sign test, p < 0.0001) (Fig. 5E). When tested in individual neurons, this effect was also significant in 26 out of 47 neurons with evidence accumulation activity in the Stim epoch. For OUT trials, the rate of change tended to be negatively modulated by coherence (median slope = −0.09 spikes/s²/%coh, p = 0.018), but the effects were variable and reached significance for only four neurons (Fig. 5F).
According to the drift-diffusion model, incoming motion information and noise together govern the rate of rise of the decision variable, which, in turn, determines RT (Gold and Shadlen, 2007). Thus, the model predicts that for IN trials, the rate of rise should be inversely related to RT, regardless of coherence, and directly related to motion strength, but only insofar as coherence is related to RT. Evidence accumulation activity in caudate was roughly consistent with these predictions. Specifically, when tested separately for each coherence, evidence accumulation activity during the Stim epoch was negatively correlated with RT on IN trials and positively correlated with RT on OUT trials, for all but the highest coherence [which had a more restricted range of RTs (Table 2)]. Moreover, when tested separately for restricted ranges of RTs, the activity was not correlated with coherence, with a single exception for IN trials with long RTs (Table 3). Thus, the sensitivity of evidence accumulation activity to coherence and RT was consistent with a role in decision formation and not simply a reflection of motion stimulus itself.
Consistent with this idea, evidence accumulation activity tended to reflect the direction of the monkey's choice and not simply the direction of the motion stimulus, which differed on error trials (Fig. 6). When the monkey chose the IN target (Fig. 6A,C, solid lines), activity was high regardless of whether the stimulus was moving toward (black lines, corresponding to correct trials) or away from (gray lines, error trials) that target. In contrast, when the monkey chose the OUT target (dashed lines), activity was lower regardless of whether the actual stimulus direction was toward (black lines, correct trials) or away from (gray lines, error trials) that target. To quantify these observations, we computed an ROC index comparing activity for IN versus OUT trials in a 400 ms window, separately for individual cells and for correct and error trials (using only trials with 3.2% or 6.4% coherence, which provided sufficient numbers of error trials). The value of this index tended to be >0.5, implying that for both correct and error trials, responses were greater for IN than for OUT choices (Fig. 6B,D). Furthermore, the value of the ROC index tended to be smaller on error than on correct trials (Fig. 6B,D) (Wilcoxon paired signed rank test, H0: equal median, p = 0.0265 and 0.0002, for 3.2% and 6.4% coherence, respectively), which is consistent with the idea that the evidence that drives the decision variable in a drift-diffusion-type process tends to be noisier and therefore weaker on error trials.
Unlike the decision variable in drift-diffusion models or LIP activity, however, the average activity of these caudate neurons did not appear to rise to a common value just before saccade onset (Ratcliff and Rouder, 1998; Roitman and Shadlen, 2002). This observation was confirmed by several quantitative measures. First, population activity in the Sac epoch remained weakly modulated by coherence for IN trials (slope = 0.03 spikes/s/%coh, F = 5.04, p = 0.026; for OUT trials: slope = −0.0065 spikes/s/%coh, F = 0.28, p = 0.60) (Fig. 5B,D). Second, activity of both the population and individual neurons tended to be positively modulated by coherence throughout the Stim epoch, regardless of whether the data were aligned to stimulus or saccade onset (Fig. 7A,C). The drift-diffusion model, on the contrary, predicts that activity at higher coherences rises faster to the bound, resulting in a positive correlation between coherence and activity when aligned to stimulus onset but a negative correlation when aligned to response onset [for an example of neural activity reaching a bound, see Roitman and Shadlen (2002), their Fig. 8A]. Third, we conducted the same analyses but with data grouped by RT, not coherence. We found a primarily negative relationship between RT (which is inversely related to coherence) and Stim epoch activity aligned to response onset, contrary to the model prediction (Fig. 7B,D). Fourth, to detect the presence of a bound crossing in the activity of individual neurons, we used the following criteria: a negative slope in the relationship between coherence and neuronal activity at −300 to −200 ms relative to saccade onset and zero slope at −100 to 0 ms relative to saccade onset. Only 4 out of 47 neurons met these criteria, not significantly above a 5% chance level (binomial cumulative probability, p = 0.0845).
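The binomial comparison against a 5% chance level can be checked with a few lines of code. The sketch below computes two upper-tail probabilities for 47 neurons, because the text does not state whether the tail includes the observed count of 4.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_neurons, chance = 47, 0.05
p_at_least_4 = binom_upper_tail(4, n_neurons, chance)   # tail including 4
p_more_than_4 = binom_upper_tail(5, n_neurons, chance)  # tail excluding 4
```

The tail that excludes the observed count evaluates to approximately 0.0845, matching the reported value, whereas the tail that includes it is approximately 0.207. Under either convention the count is not significantly above chance at the 0.05 level.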
Thus, although a subset of caudate neurons reflected the process of evidence accumulation, they did not appear to reflect the commitment to a categorical decision (i.e., threshold crossing), as has been reported for other brain areas like LIP and FEF (Hanes and Schall, 1996; Roitman and Shadlen, 2002).
We also observed evidence accumulation activity (i.e., coherence and choice dependent) in the Sac, Post, and Rew epochs in 76 neurons (n = 54 and 22 for monkeys C and F, respectively) (Figs. 3B, 8). In our sample, across all epochs, positive modulation was more prevalent on IN trials and negative modulation was more prevalent on OUT trials (χ² test, p = 0.0001 and 0.0031 for IN and OUT trials, respectively). Within the Stim and Post epochs, positive modulation was more prevalent on IN trials than on OUT trials (χ² test, p < 0.0001 and p = 0.0076, respectively), whereas in the Sac and Rew epochs, the balance between positive and negative modulation was similar for both choices (p = 0.73 and p = 1.00, respectively). Thus, even after the decision was formed, the caudate continued to encode both the decision and the strength of the sensory evidence used to form the decision, until after the trial outcome was completed.
Evaluation: neural activity modulated by motion strength similarly for both choices
The second type of activity was also modulated by coherence, but unlike the first type, had similar coherence modulation for the two choices. This activity also appeared in several task epochs, from motion viewing to reward delivery. This activity likely contributes to predicting and evaluating outcomes, possibly independent of choice.
An example neuron is shown in Figure 9. This neuron was responsive from soon after stimulus onset until after reward delivery. On correct trials (Fig. 9A,B), activity was slightly larger for contralateral than for ipsilateral choices in the Stim, Sac, and Rew epochs (two-sided rank-sum test, H0: equal median responses for the two choices, p < 0.0001), but not in the Post epoch (p = 0.09). Stim epoch activity was modulated positively by coherence for both contralateral and ipsilateral choices (slope = 0.59 and 0.43 spikes/s/%coh, H0: slope = 0, F = 103.70 and 121.24, respectively, p < 0.0001). This choice-independent, positive coherence modulation continued through the time of the saccade (slope = 0.18 and 0.45 spikes/s/%coh, F = 7.65 and 65.70, p = 0.0061 and p < 0.0001 for the two choices, respectively) and into the postsaccade period (slope = 0.60 and 0.52 spikes/s/%coh, F = 154.24 and 121.78 for the two choices, respectively, p < 0.0001). After reward onset, the sign of coherence modulation reversed, with stronger responses on low-coherence trials (slope = −0.25 and −0.14 spikes/s/%coh, F = 120.85 and 69.38 for the two choices, respectively, p < 0.0001) (Fig. 9E).
This coherence-dependent activity at the end of the trial (in the Rew epoch) reflected both the predicted and actual outcome. For correct (rewarded) trials, this activity was inversely related to the probability of reward, computed from the coherence-dependent performance in the same recording session (Fig. 9F). The activity for error trials was comparable to the activity on correct trials in the Stim and Sac epochs (Fig. 9C,D). However, after visual error feedback indicating that no reward would be delivered, activity was briefly suppressed. The relationship between postfeedback activity and motion coherence/reward probability was not consistent (Fig. 9E,F, open symbols), possibly due to the smaller number of error trials and/or a floor effect. Nevertheless, these features are consistent with a representation of reward prediction error: reward expectation was computed during motion viewing based on the strength of the sensory evidence, maintained during the delay after a decision was made, and then compared to the actual outcome (Sutton and Barto, 1998).
Across the population, 48 neurons (n = 36 and 12 for monkeys C and F, respectively) showed evaluation activity, in a single or multiple task epochs, with the exact pattern varying considerably (Fig. 3B, red circles). Both positive and negative coherence modulations were observed with similar overall prevalence across task epochs (n = 30 and 40, respectively) (Fig. 10A). However, the distribution of each kind of modulation in particular task epochs differed (χ² test, p = 0.028): positive modulation was observed slightly more frequently in the Stim epoch, whereas negative modulation was observed more frequently in the Post and Rew epochs.
Consistent with a possible evaluative role, neurons with this kind of activity tended to respond differently to the actual feedback (correct or error) received at the end of the trial (Fig. 10B). To control for possible differential responses due to more trivial reasons, such as licking, auditory solenoid/tone onset, and the presence of return saccades, we compared the proportion of neurons with a significant difference between reward and feedback responses between two groups: those with evaluation activity (ALL) and those without evaluation activity (CTRL). This analysis revealed a significantly larger proportion of neurons with evaluation activity showing different reward and feedback responses (χ² test, p = 0.0005). The proportion was also significantly larger if we limited our analysis to neurons with evaluation activity only in the Sac, Post, or Rew epochs (p = 0.0049, 0.0006, and 0.0277, respectively).
Bias: neural activity before motion onset that is predictive of choice
The third type of activity emerged before motion stimulus onset and was predictive of the monkey's choice, especially when motion evidence was weak. This activity likely represents an initial choice bias that combines with incoming sensory evidence to form a decision.
An example neuron with bias activity is shown in Figure 11, A and B. This neuron was classified as exhibiting evidence accumulation activity in the Stim epoch. Here we show that its activity before motion onset (in the Pre epoch) was also choice dependent. On 3.2% coherence trials (Fig. 11A, blue), the neuron tended to be more active when the monkey ultimately chose the IN choice target than when he chose the OUT choice target (compare solid and dotted blue lines). In other words, on trials in which prestimulus activity of this neuron was high, the monkey was predisposed to make an IN choice. In contrast, prestimulus activity did not distinguish between choices on high-coherence trials (Fig. 11A, red). Thus, when sorted by choice, Pre epoch activity on trials in which weak sensory evidence was presented tended to be higher when the monkey ultimately made an IN choice versus an OUT choice.
To assess quantitatively the Pre epoch activity's choice selectivity, we computed an ROC-based predictive index (Fig. 11B). For this quantity, a value of 0.5 implies that the activity is not predictive of the monkey's choice (e.g., at stimulus onset on 51.2% coherence trials). A value of 1.0 implies that the activity fully predicts the monkey's choice (e.g., before saccade onset on 51.2% coherence trials). Consistent with a representation of choice bias, the Pre epoch activity of the example neuron had predictive indices larger than 0.5 for low-coherence trials (0–12.8% coherence, bootstrap method, p < 0.05). In contrast, predictive indices were ∼0.5 for high-coherence trials, in which the strong motion evidence dominates the decision process and thus overwhelms any relationship between prestimulus activity and the monkey's final choice. The average predictive index in the 400 ms window before stimulus onset was negatively related to coherence (slope from a linear regression with coherence as the regressor = −0.0021/%coh, H0: slope = 0, F = 10.84, p = 0.046).
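An ROC-based index of this kind can be sketched as the area under the ROC curve separating IN-choice from OUT-choice trials, computed here by direct pairwise comparison. The firing rates and the function name are hypothetical, and the bootstrap significance test mentioned above is not implemented in this sketch.

```python
def predictive_index(in_rates, out_rates):
    """Area under the ROC curve for separating IN-choice from OUT-choice trials.

    Equals the probability that a randomly drawn IN-choice trial has a higher
    firing rate than a randomly drawn OUT-choice trial, counting ties as 0.5.
    """
    if not in_rates or not out_rates:
        raise ValueError("both choice groups need at least one trial")
    wins = 0.0
    for r_in in in_rates:
        for r_out in out_rates:
            if r_in > r_out:
                wins += 1.0
            elif r_in == r_out:
                wins += 0.5
    return wins / (len(in_rates) * len(out_rates))


# Hypothetical Pre-epoch firing rates (spikes/s) on low-coherence trials.
in_choice = [12.0, 15.0, 9.0, 14.0]
out_choice = [8.0, 11.0, 9.0, 7.0]
index = predictive_index(in_choice, out_choice)
```

With identical rate distributions for the two choices the index is 0.5, and with completely separated distributions it is 1.0, matching the two reference values described in the text.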
In our sample, 79 neurons showed statistically significant choice-dependent activity in the Stim epoch, out of which 9 neurons had a predictive index in the Pre epoch that was significantly greater than 0.5 on trials with low, but not high, coherence (5 in monkey C and 4 in monkey F, bootstrap method, p < 0.05) (Figs. 3, 11C). Although these neurons were encountered infrequently (11%), their occurrences were significantly above a 5% chance level (binomial cumulative probability, p = 0.006). Furthermore, six of these neurons also showed evidence accumulation activity in the Stim epoch (Fig. 3B), supporting the idea that the bias activity contributes to actual decision formation. The remaining three neurons showed choice-dependent, but not coherence-modulated, activity in the Stim epoch. None showed evaluation activity in any epoch.
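A binomial cumulative probability against a 5% chance level can be sketched as follows. The counts (9 of 79) and the chance level come from the text; the function name is hypothetical, and because the text does not state whether the tail includes the observed count, both conventions are computed here as an assumption.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


n_choice_neurons = 79   # neurons with choice-dependent Stim activity (from the text)
n_bias_neurons = 9      # neurons meeting the bias criterion (from the text)
chance = 0.05           # false-positive rate assumed under the null hypothesis

# Probability of observing at least the reported count by chance.
p_inclusive = binomial_upper_tail(n_bias_neurons, n_choice_neurons, chance)
# Probability of observing strictly more than the reported count by chance.
p_strict = binomial_upper_tail(n_bias_neurons + 1, n_choice_neurons, chance)
```

Both values fall below 0.05 for these counts, so the conclusion that such neurons occurred more often than expected by chance does not depend on the tail convention.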
Caudate activity is task dependent
In LIP, the only other neural structure examined with the RT version of the dots task used here, neural activity reflecting evidence accumulation was observed exclusively in neurons with memory-period activity during a MGS task (Roitman and Shadlen, 2002). In other words, for LIP, neural responses on the MGS task are highly predictive of neural responses on the dots task. We did not observe such a tight link for caudate neurons.
On the contrary, we found that caudate activity can differ considerably between the two tasks. We encountered a substantial number of neurons that responded on only one task. In monkey C, for which the MGS task was also used to search for neurons, we encountered 11 neurons with activity that was modulated on the MGS, but not dots, task. Of the 74 neurons that were recorded on both tasks, 15 neurons showed modulated activity on the dots, but not the MGS, task. Of the 59 neurons modulated on both tasks, 41 had responses that were selective for the spatial location of the saccade target. In most of these cases, the spatial preferences tended to be congruent across tasks. For example, the choice target/motion direction associated with larger activity in the Stim epoch of the dots task also tended to elicit larger responses during the visual (“V”) epoch of the MGS task (Fig. 12B, similar colors in the two corresponding columns).
We found no clear relationship between particular patterns of activation on the MGS task and evidence-accumulation and evaluation activity on the dots task. Figure 12, C and D, shows response profiles from the MGS task plotted separately for neurons with the two types of activity in different dots task epochs. In this display, direction-modulated activity is shown in red and task-modulated but nondirectional activity is shown in black. Rows with only white entries across columns visually represent a subset of cells that were modulated on the dots, but not the MGS, task. The ratio of red-to-black entries visually represents the prevalence of spatial selectivity in task-modulated activity. This ratio tended to be higher for neurons with evidence accumulation activity (1.6, 1.3, 1.6, and 2.2 for epochs V, M, S, and R, respectively) (Fig. 12C) than for neurons with evaluation activity (0.7, 0.5, 1.0, and 0.4, respectively) (Fig. 12D), suggesting that the former neuron group was more likely to show spatially selective activity on the MGS task. Beyond this general trend, there was no other clear pattern to the relationship between MGS responses and dots responses, consistent with a previous report of the context dependence of caudate activity (Hikosaka et al., 1989).
Distribution of neurons in caudate
The caudate receives inputs from multiple cortical areas, including LIP, FEF, and SEF (Stanton et al., 1988; Saint-Cyr et al., 1990; Parthasarathy et al., 1992; Calzavara et al., 2007). The terminal fields of these inputs cover partially overlapping regions within the caudate, but with a coarse topographical pattern along the anterior–posterior (AP) axis. We examined whether our neuronal data were sensitive to this topography by comparing spatial distributions of recording sites corresponding to two groups of neurons: those with either evidence accumulation or evaluation activity, versus those without either category of activity (Fig. 13). We found that these distributions almost completely overlapped for the Stim, Sac, and Post epochs. In contrast, decision-related activity during the Rew epoch was observed in neurons located slightly more posterior than neurons without such activity, but the distributions still overlapped strongly (Fig. 13D) (median: AC+1 and AC+3, respectively; Wilcoxon rank sum test, p = 0.0004). The median AP locations associated with evidence accumulation and evaluation activity across epochs were not significantly different (data not shown, Wilcoxon rank sum test, p = 0.2146). Thus, evidence accumulation and evaluation activity was present in neurons distributed widely along the AP axis.
Discussion
We trained two monkeys to perform a direction-discrimination task that required them to decide the direction of random-dot motion and indicate their decision with a saccadic eye movement. As has been reported previously, their behavior was well described by a model of decision making that assumes sensory evidence is accumulated over time until reaching a fixed bound (Palmer et al., 2005; Hanks et al., 2006). Correlates of certain aspects of this decision process have been identified in single-neuron activity in several parts of the brain involved in preparing the saccadic response, including areas LIP and FEF in cortex and the SC in the brainstem. Here we show that the caudate nucleus, a primary input structure to an important basal ganglia pathway that connects these cortical and subcortical brain regions, encodes an even broader diversity of decision-related signals.
For convenience, we categorized these decision-related signals into three types. “Evidence accumulation” activity was modulated by both choice and motion strength and was, at least in part, consistent with a process of converting the incoming sensory evidence into a categorical choice. “Evaluation” activity was modulated by motion strength similarly for the two choices, consistent with an ongoing measure of choice confidence or reward prediction. “Bias” activity emerged before motion stimulus onset, predicted the monkey's final choice when motion evidence was weak, and likely reflected a choice bias. These diverse types of activity suggest that the basal ganglia pathway can provide necessary signals for integrating sensory and nonsensory factors to facilitate flexible decision making.
Evidence accumulation activity during the Stim epoch showed several similarities with previously demonstrated neural correlates of evidence accumulation in LIP, FEF, and SC (Shadlen et al., 1996; Horwitz and Newsome, 1999; Kim and Shadlen, 1999; Shadlen and Newsome, 2001). All showed spatial selectivity, positive modulation by motion strength on IN choice trials, and negative modulation on OUT choice trials. The relative roles of these different brain regions in this process are not yet understood, at least in part because they have not all been tested under identical conditions. Accordingly, in this study we used a reaction-time task to facilitate direct comparisons with data from LIP measured in monkeys performing the same task (Roitman and Shadlen, 2002). The most striking difference we found was the nature of evidence accumulation activity just before saccade onset. In LIP, decision-related activity tends to reach a common level at this time, independent of the strength of evidence or amount of time used to form the decision. In contrast, caudate activity did not converge to a particular activity level and instead tended to peak during the Stim epoch, well before saccade onset, then maintained coherence modulation until the saccade. In the accumulation-to-bound framework, the common, peak activity level in LIP before saccade onset was interpreted as a neural representation of a decision bound associated with the IN choice, which when reached represented a commitment to this choice. The lack of a similar bound suggests that caudate does not encode the commitment to a decision, despite possibly contributing to the evidence-accumulation process via the basal ganglia pathway to downstream SC. Activity in FEF and SC has not yet been tested under comparable conditions.
The coherence-dependent responses we found likely play evaluative roles in the goal-directed decision process. In a positive reinforcement-based paradigm like the one we used, evaluation likely encodes reward expectation or uncertainty (Sutton and Barto, 1998; Fiorillo et al., 2003; Kepecs et al., 2008; Kiani and Shadlen, 2009). Using experimental paradigms with explicit manipulations of the association of suprathreshold visual stimuli, actions, and reward outcomes, previous studies of oculomotor caudate have demonstrated two neural representations of reward expectation, one sensitive and the other insensitive to the spatial location of the visual and/or saccade target (Kawagoe et al., 1998, 2004; Ding and Hikosaka, 2006; Lau and Glimcher, 2008). Evidence accumulation and evaluation activity in our study are reminiscent of these two forms of reward expectation signals, respectively. Extending these previous results, we show for the first time that these evaluation signals can be derived from mappings between reward outcome and uncertain sensory inputs. The diverse timing of evidence accumulation and evaluation activity throughout a trial suggests that information about the accumulated sensory evidence is monitored and evaluated continuously. Such information could be provided by cortical regions such as LIP, FEF, and other parts of the prefrontal cortex, all of which project directly to caudate (Stanton et al., 1988; Saint-Cyr et al., 1990; Parthasarathy et al., 1992; Calzavara et al., 2007). It would be interesting to examine these structures at the same time to elucidate the temporal relationship between their task-driven responses.
The bias activity represents a correlate of perceptual choice bias at the single-neuron level. In a simple accumulation-to-bound model with two alternatives, choice behavior is typically governed by three parameters: the starting value at the onset of accumulation, the rate of accumulation, and the height of the decision bounds (adjustments of which can, under some conditions, be equivalent to adjustments of the starting value). Changes in these parameters have been used to explain speed–accuracy tradeoffs and some reward-biased decisions (Reddi and Carpenter, 2000; Voss et al., 2004; Palmer et al., 2005; Diederich and Busemeyer, 2006; Feng et al., 2009). Bias activity in caudate, emerging before motion stimulus onset, can be thought of as representing a nonzero starting value at the onset of accumulation. This mechanism effectively reduces the amount of evidence needed to reach one decision bound, while increasing the amount of evidence needed to reach the other bound, thereby creating a choice bias.
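The effect of a nonzero starting value can be illustrated with a minimal random-walk sketch of an accumulation-to-bound process. This is not the model fitted in the study: the discrete-time Gaussian steps, the symmetric bounds, all parameter values, and the function names are assumptions chosen only for illustration.

```python
import random


def simulate_choice(start, drift, bound, rng, noise_sd=1.0, max_steps=100_000):
    """Run one accumulation-to-bound trial; return True for an IN choice.

    The decision variable starts at `start`, gains `drift` plus Gaussian noise
    on every step, and stops at +bound (IN) or -bound (OUT).
    """
    x = start
    for _ in range(max_steps):
        if x >= bound:
            return True
        if x <= -bound:
            return False
        x += drift + rng.gauss(0.0, noise_sd)
    return x > 0.0  # fallback if neither bound was reached within max_steps


def fraction_in(start, drift, bound, n_trials, seed):
    """Fraction of simulated trials that end at the IN bound."""
    rng = random.Random(seed)
    hits = sum(simulate_choice(start, drift, bound, rng) for _ in range(n_trials))
    return hits / n_trials


# All parameter values are arbitrary illustrations, not fits to the data.
# Zero drift stands in for very weak motion evidence.
unbiased_weak = fraction_in(start=0.0, drift=0.0, bound=10.0, n_trials=1000, seed=1)
biased_weak = fraction_in(start=3.0, drift=0.0, bound=10.0, n_trials=1000, seed=1)
# Strong evidence toward OUT, despite a starting value that favors IN.
biased_strong = fraction_in(start=3.0, drift=-1.0, bound=10.0, n_trials=1000, seed=1)
```

In this sketch, shifting the starting value toward the IN bound raises the fraction of IN choices when the drift is zero, whereas a strong drift toward the OUT bound overrides the same starting offset, in line with the coherence dependence of the bias described above.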
This signal may be related to previous reports of bias-related activity in caudate and FEF that are present before an unequivocal reward-predicting cue (Coe et al., 2002; Lauwereyns et al., 2002a,b; Ding and Hikosaka, 2006), or to activity in SC and FEF that reflects preparation for a saccade that will occur with a certain probability (Basso and Wurtz, 1997; Dorris and Munoz, 1998). We extended these results by showing that such activity can be present when a noisy sensory cue must be used to predict reward, implying interactions between perceptual decision-making and reward-prediction processes. Because our task used equal reward for all correct decisions and balanced presentation of motion stimuli with different directions, maintaining a choice bias was not an optimal strategy, which likely contributed to the low occurrences of bias-encoding neurons in our sample. Examination of the bias-related signals under conditions that induce stronger behavioral biases, for example, with unequal rewards (Feng et al., 2009; Rorie et al., 2010) or prior probability, could help to elucidate how the brain optimizes the decision process and whether different factors bias decisions by common mechanisms.
In summary, we observed single-neuron activity in caudate that encodes multiple computations related to perceptual decision making: evidence accumulation for a choice, evaluation of the quality of evidence, and a bias in the initial state of the decision process. These results suggest that interactions between both sensory and nonsensory factors used to generate and evaluate perceptual decisions might involve both the corticofugal and corticobasal ganglia pathways. The two parallel pathways could serve synergistically to shape and maintain a decision process that is appropriate to help achieve particular behavioral goals.
This work was supported by the National Institutes of Health (K99EY018042 to L.D.) and the McKnight Endowment Fund for Neuroscience, Sloan Foundation, and Burroughs-Wellcome Fund (J.I.G.). We thank Jeff Law, Matt Nassar, Takahiro Doi, and Okihide Hikosaka for helpful advice and comments; Rishi Kalwani for assistance with MR images; Benjamin Heasly for hardware and software support; Timothy Hanks and Michael Shadlen for assistance with the drift-diffusion model; Jamie Roitman for advice on behavioral training; and Jean Zweigle for excellent animal care.
Correspondence should be addressed to Long Ding, Department of Neuroscience, 1167 Johnson Pavilion, 3610 Hamilton Walk, University of Pennsylvania, Philadelphia, PA 19104-6074.