Abstract
It has been proposed that whenever an animal faces several action choices, their neural representations are processed in parallel in frontoparietal cortex and compete in a manner biased by any factor relevant to the decision. We tested this hypothesis by recording single-unit activity in dorsal premotor cortex (PMd) while a monkey performed two delayed center-out reaching tasks. In the one-target task, a single target was presented and its border style indicated its reward value. The two-target task was the same except two targets were presented and the value of each was varied. During the delay period of the one-target task, directionally tuned PMd activity showed no modulation with value. In contrast, during the two-target task, the same neurons showed strong effects of the value associated with their preferred target, always in relation to the value of the other target. Furthermore, the competition between action choices was strongest when targets were furthest apart. This angular distance effect appeared in neural activity as soon as cells became tuned, while modulation by relative value appeared much later. All of these findings can be reproduced by a computational model which suggests that decisions between actions are made through a biased competition taking place within a sensorimotor map of potential actions.
Introduction
Classical theories (Tversky and Kahneman, 1981) consider decision-making to be separate from the sensorimotor processes that implement the chosen response (Fodor, 1983). However, recent neurophysiological studies have shown neural correlates of decision variables within brain regions implicated in sensorimotor control (for review, see Glimcher, 2003; Gold and Shadlen, 2007; Cisek and Kalaska, 2010). For example, neural correlates of decision variables have been found throughout the saccade system, including the lateral intraparietal area (Platt and Glimcher, 1999; Dorris and Glimcher, 2004; Sugrue et al., 2004; Yang and Shadlen, 2007), the frontal eye fields (Schall and Bichot, 1998; Coe et al., 2002), and the superior colliculus (Basso and Wurtz, 1998; Horwitz et al., 2004), raising the question of why a putatively cognitive process should involve the sensorimotor system.
Such results appear less surprising if we consider that many of our everyday decisions are decisions between actions, such as choosing a path through a crowd or the target for a reach. It has been proposed that in such situations, the brain specifies several potential actions in parallel, and selects between them through a process of biased competition within the sensorimotor system itself (Cisek, 2007; Cisek and Kalaska, 2010). Recent computational models have suggested how multiple potential movements can be simultaneously encoded in parietal and premotor cortex (Tipper et al., 2000; Erlhagen and Schöner, 2002; Cisek, 2006; Furman and Wang, 2008), and how a competition between them can be biased by decision variables (Cisek, 2006).
This hypothesis makes several predictions. First, it predicts that neural activity can simultaneously represent several potential actions, as shown in the reaching (Cisek and Kalaska, 2005; Scherberger and Andersen, 2007) and grasping systems (Baumann et al., 2009), as well as in the saccade system (McPeek and Keller, 2002; Glimcher, 2003), where the influence of decision variables is already well established. Second, neural activity in sensorimotor regions will not represent any single decision variable in isolation, but will integrate all factors that influence choices. This implies that the variables associated with a given action will always be expressed relative to those associated with alternative actions. Third, the strength of competition between potential actions will depend on the similarity between them. This is motivated by simple facts of geometry: when choosing between two nearby targets, the nervous system can mix their neural representations and start moving between the targets. However, choosing between two targets in opposite directions implies that the choice has to be all-or-none. Here, we test these predictions through neural recordings in the dorsal premotor cortex (PMd) of a monkey performing a reach decision task, and compare the results to simulations of a biased competition model (Cisek, 2006). Some of these results have been presented previously in abstract form (Pastor-Bernier and Cisek, 2010).
Materials and Methods
A male monkey (Macaca mulatta) performed a planar center-out reaching task illustrated in Figure 1A (see supplemental Methods, available at www.jneurosci.org as supplemental material). After a 350–650 ms center-hold-time (CHT), one or two cyan targets appeared, with border styles indicating their value in drops of juice (Fig. 1A, inset). The reward was determined probabilistically to encourage the monkey to explore available options (Herrnstein, 1961). A “low-value” target (L, thick border) had a 60% chance of yielding 1 drop, a 30% chance of yielding 2 drops, and a 10% chance of yielding 3 drops [expected value (EV) = 1.5]. A “medium-value” target (M, no border) was worth 2 (60%), 1 (20%), or 3 drops (20%) (EV = 2). A “high-value” target (H, thin border) was worth 3 (60%), 2 (30%), or 1 drop (10%) (EV = 2.5). The non-monotonic relationship between border thickness and value was used to dissociate motivational factors from physical properties of stimuli. The monkey held the cursor in the center for an instructed delay period (700–1300 ms) until a go signal was indicated by a change in target color and disappearance of the central circle. To receive the reward, the monkey had to move to a target within a maximum 550 ms movement time (MT) and hold the cursor there [target-hold-time (THT) 500 ms].
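The expected values quoted above follow directly from the stated reward probabilities. The short Python sketch below recomputes them and shows one way a trial's reward could be drawn; the names `REWARD_SCHEDULES`, `expected_value`, and `sample_reward` are illustrative assumptions and are not taken from the task-control software.

```python
import random

# Reward schedules as stated in the text: {drops of juice: probability}.
REWARD_SCHEDULES = {
    "low": {1: 0.6, 2: 0.3, 3: 0.1},
    "medium": {2: 0.6, 1: 0.2, 3: 0.2},
    "high": {3: 0.6, 2: 0.3, 1: 0.1},
}


def expected_value(schedule):
    """Return the probability-weighted mean number of drops."""
    if abs(sum(schedule.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(drops * p for drops, p in schedule.items())


def sample_reward(schedule, rng):
    """Draw one reward (in drops) according to the schedule (hypothetical helper)."""
    outcomes = list(schedule.keys())
    weights = list(schedule.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]


expected_values = {name: expected_value(s) for name, s in REWARD_SCHEDULES.items()}
for name, ev in expected_values.items():
    print(f"{name}: EV = {ev:.1f} drops")
```

Because the three schedules share the same set of outcomes (1, 2, or 3 drops) and differ only in their probabilities, a single trial's reward does not reveal the target's value category; only the long-run average does.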
When cells were isolated, we first ran a block of 90 trials in which only one target was presented (1T), to identify the delay-period preferred target (PT) of each cell. Next, we ran a block of 180 two-target trials (2T), including trials in which the PT was present and low-, medium-, or high-valued, while the other target (OT) appeared 60°, 120°, or 180° away and was low-, medium-, or high-valued. Each block also included 30 trials in which the targets were 120° apart, but neither was in the direction of the PT. These trials allowed us to analyze the activity of simultaneously recorded cells with different PTs. All analyses shown here use trials in which at least one of the targets presented was the cell's PT. In 33% of 2T trials (free), the monkey was free to move to either target after the go signal. In 67% of 2T trials (forced), one of the targets disappeared at go and the monkey had to move to the remaining target. Free and forced trials were randomly interleaved to encourage the animal to keep both options partially prepared.
To assess relative value effects, we compared delay-period activity during trials with targets 120° apart in which the OT was medium-valued while the PT value varied (n ≥ 60 trials), as well as those in which the PT was medium-valued while the OT value varied (n ≥ 60). To assess distance effects, we examined trials in which the PT was present and the OT was 60° (n ≥ 30), 120° (n ≥ 120) or 180° away (n ≥ 30). Significance (p < 0.05) was assessed using two-tailed t tests and ANOVA with post hoc Tukey–Kramer tests. Latency of effects was calculated as the time when the difference in activity between compared conditions exceeded 2 SDs in a sliding window (size, 10 ms; step, 2 ms) beginning at cue onset (Sato and Schall, 2003).
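The latency criterion can be sketched as follows. This is a minimal illustration rather than the analysis code used here: it assumes firing rates sampled at 1 ms resolution, assumes that the SD is estimated from the pre-cue difference between conditions, and assumes that latency is reported as the start of the first window exceeding threshold. The function name `effect_latency` and the synthetic data are hypothetical.

```python
from statistics import mean, pstdev


def effect_latency(rate_a, rate_b, cue_index, window=10, step=2, n_sd=2.0):
    """Return the time (ms after cue onset) at which two conditions diverge.

    rate_a, rate_b: per-millisecond mean firing rates for the two conditions.
    cue_index: sample index of cue onset; earlier samples form the baseline.
    Returns None if the difference never exceeds the threshold.
    """
    if len(rate_a) != len(rate_b):
        raise ValueError("both conditions must have the same number of samples")
    diff = [a - b for a, b in zip(rate_a, rate_b)]
    baseline = diff[:cue_index]
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    # ASSUMPTION: the threshold is n_sd standard deviations of the
    # baseline difference, measured around the baseline mean.
    base_mean = mean(baseline)
    threshold = n_sd * pstdev(baseline)
    # Slide a window of `window` ms in steps of `step` ms from cue onset.
    for start in range(cue_index, len(diff) - window + 1, step):
        window_mean = mean(diff[start:start + window])
        if abs(window_mean - base_mean) > threshold:
            return start - cue_index  # ASSUMPTION: report the window start
    return None


# Synthetic example: the conditions separate by 5 spikes/s from 100 ms after the cue.
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(500)]
rate_b = [10.0] * 500
rate_a = [10.0 + noise[i] + (5.0 if i >= 300 else 0.0) for i in range(500)]
latency_ms = effect_latency(rate_a, rate_b, cue_index=200)
print("estimated latency (ms after cue):", latency_ms)
```

Because the criterion is applied to a window average, the estimate can precede the true divergence by a few milliseconds once enough of the window overlaps the effect.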
To compare neural activity to model predictions (Cisek, 2006), we ran simulations of the same task and used similar analysis procedures. The model was identical to that previously described (Cisek, 2006), without any changes of parameters except that the model's “prefrontal” activity was scaled by a signal related to the absolute value of each target (low = 0.3, medium = 0.7, high = 1.0).
Results
Behavior
In 1T trials the monkey's success rate was 96%, in 2T free it was 96%, and in 2T forced it was 94% (in all cases n > 60,000). In 2T free trials the monkey selected the more valuable target 85% of the time, indicating that he understood the meaning of the stimulus cues.
Reaction times were similar across conditions because of the delay period. However, we observed a small but significant increase in movement speed to higher-valued targets: in the 1T task, mean MT was 400 ms to high-value and 416 ms to low-value targets [Kolmogorov–Smirnov (KS) test, p < 0.01].
Neural activity in PMd
Activity was recorded from 327 cells in the arm area of PMd (supplemental Fig. 1, available at www.jneurosci.org as supplemental material), of which 226 (69%) had significant directional tuning during at least one epoch (delay, MT, THT) and were considered task-related. Here, we focus on cells with delay-period tuning (112 of 226, 49%). Approximately half of these (50 of 112, 45%) were isolated long enough to collect data across all angular distances (“distance-complete” cells). Figure 1, B–D, shows the neural activity of three example cells, from trials in which each cell's PT was one of the targets presented. During the 1T task (first column), directionally tuned delay-period activity showed no effect of PT value. However, in the 2T task, when a second target was present and medium-valued (second column), the neural activity of all three cells showed strong modulation with the relative value of the PT: the cells fired more when their PT was more valuable than the OT. This effect was also observed when the PT was medium-valued and the OT value was varied (third column). In this case, the cell activity was lower when the OT was more valuable than the PT. This finding suggests that the value effect is always relative to the other option presented.
Importantly, delay-period activity was also modulated as a function of the angular distance between the targets (Fig. 1B–D, fourth column). In most cases, activity was weaker when the targets were further apart (180°) than when they were closer to each other (60° or 120°). Another interesting finding is the difference in latency between relative value and angular distance effects. For example, the cell shown in Figure 1B exhibited effects of angular distance 102 ms after target onset (fourth column), while the effects of expected value emerged significantly later, at 220 ms (third column).
Population analyses
The population of 112 delay-tuned cells was tested for relative value effects, and distance-complete cells were additionally tested for distance effects. From the entire tuned population of 112 cells, 49 (44%) showed significant effects of relative value in the 2T task (t test, p < 0.05), with activity increasing with PT value and decreasing with OT value. Importantly, no effects were ever observed in the 1T task (t test, p > 0.05 for all comparisons). Across the group of distance-complete cells, 38 of 49 (78%) showed some effect of relative value or distance. Thirty-five cells (71%) showed relative value effects and 22 (45%) showed angular distance effects (supplemental Table 1, available at www.jneurosci.org as supplemental material). Congruent results were obtained with t tests and ANOVA with post hoc Tukey–Kramer tests (p < 0.05, see supplemental materials, available at www.jneurosci.org).
Figure 2A compares the mean delay-period activity of individual cells (n = 112) during the 1T task when the PT was low-valued (x-axis) versus when it was high-valued (y-axis). The means were not statistically different (Wilcoxon signed-rank test, p = 1). In contrast, most cells had higher delay activity in the 2T task when the PT was more valuable than the OT (Fig. 2B, Wilcoxon signed-rank test, p < 10⁻⁶) and lower when the OT was worth more than the PT (Fig. 2C, p < 10⁻⁶). Approximately half (19 of 35, 54%) of the distance-complete cells with relative value effects also had stronger activity when the targets were 60° apart than when they were 180° apart (Fig. 2D, p < 10⁻³). Importantly, the same trends were observed across the entire population of cells with and without individually significant effects (p > 0.9 in 1T; and p < 10⁻⁵ in 2T for all comparisons). No significant effects of overall target value were found for cells that were not tuned during the delay (p = 1).
The latency of relative value and distance effects was calculated for all distance-complete cells with any effect (n = 38). Figure 3A shows a cumulative distribution of the time at which a cell becomes tuned in the 1T task, the time at which it exhibits a distance effect in the 2T task, and the time at which it exhibits a relative value effect in the 2T task. Across the population, effects of angular distance appeared at approximately the same time as cells became tuned, while the effect of relative value appeared 50–200 ms later. The relative-value and distance-effect distributions were statistically different (Kolmogorov–Smirnov test, p < 0.024), as were the relative-value and tuning-onset distributions (KS test, p < 0.024). The difference between tuning-onset and distance-effect distributions was not statistically significant (KS test, p > 0.98).
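The comparison of latency distributions rests on the two-sample KS statistic, the largest vertical gap between two empirical cumulative distributions. The sketch below computes only that statistic; it does not compute p values, and the latency lists are hypothetical numbers chosen for illustration, not recorded data. In practice a library routine such as `scipy.stats.ks_2samp` would supply both the statistic and the p value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Return the maximum absolute difference between two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # sufficient to compare them at every distinct observation.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# HYPOTHETICAL latencies (ms after cue onset) for a handful of cells.
tuning_onset = [80, 90, 100, 110, 120]
distance_effect = [85, 95, 100, 115, 125]
value_effect = [150, 180, 200, 240, 300]

d_tuning_vs_distance = ks_statistic(tuning_onset, distance_effect)
d_tuning_vs_value = ks_statistic(tuning_onset, value_effect)
print("tuning vs distance:", round(d_tuning_vs_distance, 3))
print("tuning vs value:", round(d_tuning_vs_value, 3))
```

In this made-up example the tuning-onset and distance-effect samples overlap almost completely and yield a small statistic, whereas the value-effect sample is shifted later and yields the maximum possible gap, which mirrors the qualitative pattern described above.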
Gain effect of distance over relative value
Figure 3B shows the mean delay-period activity of three example cells (Fig. 1B–D) as a function of OT value when the PT is medium-valued, separately for trials with targets 60°, 120°, or 180° apart. Note that all slopes are negative, and that they are steeper when targets are further apart. This suggests an interaction between angular separation and relative-value effects. Figure 3C compares the slopes of all distance-complete cells with any effect (n = 38) when the targets are 60° (x-axis) versus 180° (y-axis) apart. The further apart the targets are, the more negative the slope of activity versus relative value becomes (t test, p < 0.003).
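The slope comparison can be illustrated with an ordinary least-squares fit of delay-period activity against OT value, computed separately for each angular separation. The firing rates below are hypothetical values invented to show the calculation, the use of expected value (in drops) as the x-axis is an assumption, and `ls_slope` is an illustrative helper rather than the analysis code used here.

```python
def ls_slope(xs, ys):
    """Return the ordinary least-squares slope of ys regressed on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    return sxy / sxx


# ASSUMPTION: OT value is expressed as expected value in drops (L, M, H).
ot_values = [1.5, 2.0, 2.5]

# HYPOTHETICAL mean delay-period rates (spikes/s) of one cell, PT medium-valued.
rates_by_separation = {
    60: [22.0, 21.0, 20.0],
    120: [22.0, 19.0, 16.0],
    180: [20.0, 15.0, 10.0],
}

slopes = {sep: ls_slope(ot_values, rates) for sep, rates in rates_by_separation.items()}
for sep, slope in slopes.items():
    print(f"{sep} deg apart: slope = {slope:.1f} spikes/s per drop")
```

Plotting each cell's 60° slope against its 180° slope, as in Figure 3C, then amounts to repeating this fit per cell and per separation.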
A biased competition model reproduces the results
Cisek (2006) described a model of action selection in which populations of cells along the dorsal stream form a distributed representation of potential actions, which compete against each other through lateral inhibition (supplemental Fig. 2, available at www.jneurosci.org as supplemental material). The same model can simulate our neural recording results without any changes of parameters, except the addition of an absolute value signal into the prefrontal cortex (PFC) layer. As shown in Figure 4A, the model chooses the more valuable target when values are unequal and chooses randomly when they are equal. When targets are 60° apart, the model often chooses the direction in-between the targets (Ghez et al., 1997). Figure 4B shows an example of a simulated PMd neuron. Just as in real neurons, the simulated cell exhibits no sensitivity to value in the 1T task. This is because the model continuously renormalizes activity across the population, and with one target it always produces one hill of activity that is similar regardless of biasing. However, the cell shows strong sensitivity to relative value in the 2T task, in which the balance between two hills of activity can be influenced by biasing factors from PFC. The model also exhibits sensitivity to distance, with stronger activity when targets are 60° apart than 120° or 180° apart. Finally, as in the data, the effect of distance is evident in the model almost immediately, but the effect of relative value takes longer to influence PMd activity because of the slow dynamics of model PFC (Fig. 4; note arbitrary time units).
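The role of renormalization in this account can be illustrated with a deliberately simplified toy calculation. The sketch below is not the Cisek (2006) model: it has no recurrent dynamics, no lateral inhibition, and no separate PFC layer, so it says nothing about latencies. It assumes only a ring of direction-tuned cells, Gaussian input bumps scaled by the value signals given in Materials and Methods (low = 0.3, medium = 0.7, high = 1.0), and divisive normalization by total population input. The tuning width, the gain, and all function names are arbitrary assumptions.

```python
import math

N_CELLS = 360  # ASSUMPTION: one model cell per degree of reach direction
VALUE_BIAS = {"low": 0.3, "medium": 0.7, "high": 1.0}  # from Materials and Methods


def angular_diff(a, b):
    """Smallest absolute angle (deg) between two directions."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def bump(center, width=30.0):
    """Gaussian input profile centered on a target (width is an assumption)."""
    return [
        math.exp(-0.5 * (angular_diff(theta, center) / width) ** 2)
        for theta in range(N_CELLS)
    ]


def population_activity(targets, gain=100.0):
    """Divisively normalized activity for a list of (direction_deg, bias)."""
    drive = [0.0] * N_CELLS
    for direction, bias in targets:
        for i, b in enumerate(bump(direction)):
            drive[i] += bias * b
    total = sum(drive)
    return [gain * d / total for d in drive]


def pt_activity(targets, pt=0):
    """Activity of the cell whose preferred direction is `pt`."""
    return population_activity(targets)[pt]


low, med, high = VALUE_BIAS["low"], VALUE_BIAS["medium"], VALUE_BIAS["high"]
one_target = {name: pt_activity([(0, v)]) for name, v in VALUE_BIAS.items()}
pt_varied = {name: pt_activity([(0, v), (120, med)]) for name, v in VALUE_BIAS.items()}
ot_varied = {name: pt_activity([(0, med), (120, v)]) for name, v in VALUE_BIAS.items()}
by_distance = {sep: pt_activity([(0, med), (sep, med)]) for sep in (60, 120, 180)}
print(one_target, pt_varied, ot_varied, by_distance, sep="\n")
```

With a single target, the value scaling cancels in the normalization, so the toy cell's activity is identical across values. With two targets, activity rises with PT value and falls with OT value. Here the higher activity at 60° comes from overlap of the two input bumps, a simplification that stands in for the interactions between nearby representations in the full model.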
Discussion
Recently, many studies have shown that decision variables influence neural activity throughout the sensorimotor system. These findings have sometimes been interpreted as the neural encoding of formal quantities such as uncertainty (Basso and Wurtz, 1998), expected gain (Platt and Glimcher, 1999), local income (Sugrue et al., 2004), or accumulated sensory evidence (Yang and Shadlen, 2007). We suggest that such findings do not necessarily imply that decision variables are explicitly encoded in neural activity (in the sense that they can be decoded), but may instead reflect their influence on a competition between potential actions taking place within the sensorimotor system. This predicts that any factor relevant for the monkey's choice will influence activity, including reward value, which was explicitly manipulated here. Importantly, however, our data show that the effect of value was always relative, and therefore never appeared when there was no choice to make. Our PMd results are therefore more naturally interpreted as motor-related activities that specify potential reach directions, which are modulated by relative subjective desirability (Dorris and Glimcher, 2004), a general term that includes all factors relevant to the choice.
While we found PMd activity to always reflect the relative values of actions, activity related to absolute values has been reported in the striatum (Samejima et al., 2005; Lau and Glimcher, 2008). It is possible that the basal ganglia are a major source of the biasing signal which influences premotor activity (Redgrave et al., 1999; Leblois et al., 2006; Cisek, 2007). In saccade tasks, activity related to absolute value has been reported in the parietal cortex (Platt and Glimcher, 1999; Seo et al., 2009) and in the ventral premotor cortex (PMv) (Roesch and Olson, 2003). The fact that we did not find reward-related modulations in PMd during our 1T task may be attributable to differences between eye and arm control or to differences in recording locations. For example, since PMv has response properties different from those of PMd (Boussaoud and Wise, 1993; Hoshi and Tanji, 2007), as well as distinct anatomical connections (Rizzolatti and Luppino, 2001), it may be more involved in representing sensory and reward information than PMd, which is more concerned with motor information. An earlier study using a saccade task (Roesch and Olson, 2004) found that PMd activity increased when either the reward or the penalty for one of the targets was increased. Although it is difficult to directly compare our results with those of a saccade task, in which PMd cells were not strongly directionally tuned, it is plausible that this effect was also related to relative subjective desirability.
One could argue that our findings are related to selective attention, which has also been described as biased competition (Desimone and Duncan, 1995). From the traditional perspective of cognitive psychology, one may wish to dissociate processes related to selective attention from those related to action selection. However, in our view (Cisek, 2007; Cisek and Kalaska, 2010), these may not be functionally distinct. It has been suggested that selective attention serves as an early mechanism for action selection (Allport, 1987; Neumann, 1990; Tipper et al., 1998), and that both are facets of the same biased competition occurring throughout the dorsal visuomotor stream (Duncan, 2006; Cisek, 2007). Indeed, it has been shown that microstimulation in a putatively motor region of frontal cortex can influence processing in visual cortex (Armstrong et al., 2006), demonstrating a strong link between attention and action selection.
Another important implication of our findings concerns the site of the competition that determines choices. Decision-related modulations in the sensorimotor system do not themselves necessarily imply that decisions are made within sensorimotor circuits. They could instead be made “upstream” in regions such as PFC, which are clearly involved in decisions (Tanji and Hoshi, 2001; Wallis and Miller, 2003) and project into sensorimotor regions. However, our results argue against this traditional view. First, we found that the dynamics of the competition that determines decisions are dependent on spatial variables. These are irrelevant for the abstract economics of cognition, but are important for the motor system, which selects between physical actions where geometrical relationships matter. Second, these effects of distance appear in cell activity as soon as cells respond to the stimuli, implying that the competition between potential actions takes place throughout the fast sensorimotor “dorsal” visual stream (Cisek, 2007; Cisek and Kalaska, 2010). All of these results are remarkably well captured by a simple computational model (Cisek, 2006), which suggests that although decisions between actions are influenced by variables supplied by higher cognitive regions, they are determined by a competition which takes place within sensorimotor circuits.
Footnotes
This work was supported by research grants from the Canadian Institutes of Health Research and the EJLB Foundation, a Groupe de Recherche sur le Système Nerveux Central doctoral fellowship to A.P-B., and an infrastructure grant from the Fonds de la Recherche en Santé du Québec. We thank Marie-Claude Labonté for technical support, and Pascal Poisson-Fortier and Trevor Drew for valuable comments regarding the manuscript and analyses.
Correspondence should be addressed to Dr. Paul Cisek, Département de physiologie, Université de Montréal, C.P. 6218 Succursale centre-ville, Montréal, QC, H3C 3J7, Canada. paul.cisek@umontreal.ca