Abstract
Recordings in the lateral intraparietal area (LIP) reveal that parietal cortex encodes variables related to spatial decision-making, the selection of desirable targets in space. It has been unclear whether parietal cortex is involved in spatial decision-making in general, or whether specific parietal compartments subserve decisions made using specific actions. To test this, we engaged monkeys (Macaca mulatta) in a reward-based decision task in which they selected a target based on its desirability. The animals' choice behavior in this task followed the molar matching law, and in each trial was governed by the desirability of the choice targets. Critically, animals were instructed to make the choice using one of two actions: eye movements (saccades) and arm movements (reaches). We recorded the discharge activity of neurons in area LIP and the parietal reach region (PRR) of the parietal cortex. In line with previous studies, we found that both LIP and PRR encode a reward-based decision variable, the target desirability. Crucially, the target desirability was encoded in LIP at least twice as strongly when choices were made using saccades compared with reaches. In contrast, PRR encoded target desirability only for reaches and not for saccades. These data suggest that decisions can evolve in dedicated parietal circuits in the context of specific actions. This finding supports the hypothesis of an intentional representation of developing decisions in parietal cortex. Furthermore, the close link between the cognitive (decision-related) and bodily (action-related) processes presents a neural contribution to the theories of embodied cognition.
Introduction
Parietal cortex is crucially involved in the selection and planning of actions to objects in space. Recordings from neurons in parietal cortex, the lateral intraparietal area (LIP) in particular, suggest that parietal neurons are implicated in spatial decision-making (Platt and Glimcher, 1999; Shadlen and Newsome, 2001; Roitman and Shadlen, 2002; Dorris and Glimcher, 2004; Sugrue et al., 2004; Gold and Shadlen, 2007; Yang and Shadlen, 2007; Rorie et al., 2010; Leathers and Olson, 2012). LIP activity is modulated by variables that quantify the evidence for choosing a target in space or the desirability of a target in space (Platt and Glimcher, 1999; Shadlen and Newsome, 2001; Roitman and Shadlen, 2002; Dorris and Glimcher, 2004; Sugrue et al., 2004; Gold and Shadlen, 2007; Leathers and Olson, 2012).
In regard to spatial decision-making, there have been two main propositions of parietal function. First, parietal cortex may operate as a purely cognitive module that performs a general function of spatial selection. Examples of such a general spatial function are processes related to shifts in spatial attention (Goldberg and Bruce, 1985; Gottlieb et al., 1998; Colby and Goldberg, 1999; Bisley and Goldberg, 2003). Second, parietal cortex may subserve decision-making to select targets of particular actions (Gold and Shadlen, 2003; Marotta et al., 2003; Maunsell, 2004; Shadlen et al., 2008; Andersen and Cui, 2009; Kable and Glimcher, 2009). These possibilities are difficult to distinguish because recordings in LIP in decision tasks have been performed in saccadic contexts, i.e., animals at a certain moment during a trial make an eye movement to indicate their choice (Platt and Glimcher, 1999; Shadlen and Newsome, 2001; Roitman and Shadlen, 2002; Dorris and Glimcher, 2004; Sugrue et al., 2004; Maimon and Assad, 2006; Gold and Shadlen, 2007; Yang and Shadlen, 2007; Rorie et al., 2010; Bennur and Gold, 2011; Leathers and Olson, 2012). The neural code in LIP could therefore represent the cognitive process of target selection, but it could also represent a developing plan to make an eye movement and the variables associated with achieving that plan (Gold and Shadlen, 2003; Maunsell, 2004; Shadlen et al., 2008; Andersen and Cui, 2009; Kable and Glimcher, 2009).
Here we address this long-standing question by recording from two regions of parietal cortex, LIP and the parietal reach region (PRR), in tasks in which monkeys are instructed to make a spatial choice in one of two choice contexts: an eye movement (a saccade) or an arm movement (a reach). This design allows us to test whether parietal cortex has a general, action-independent function in regard to spatial decision-making, or whether individual parietal circuits subserve decisions implemented using particular actions.
Materials and Methods
Subjects.
Two adult male rhesus monkeys (Macaca mulatta: Monkey S, 7 kg; Monkey B, 8 kg) participated in this study. The animals sat head-fixed in a custom-designed monkey chair (Crist Instrument) in a completely dark room. Visual stimuli (squares of 2.3° by 2.3°) were back-projected by a CRT projector onto a custom touch panel positioned 25 cm in front of the animals' eyes. Eye position was monitored by a scleral search coil system (CNC Engineering). Monkey S was trained to reach with his right arm and Monkey B with his left arm. In each monkey, we recorded from the hemisphere that was contralateral to the reaching arm. All procedures conformed to the Guide for the Care and Use of Laboratory Animals and were approved by the Washington University Institutional Animal Care and Use Committee.
Task.
The monkeys first fixated on and put their hand on a central target. After 120 ms, two white targets simultaneously appeared in the periphery, one in the response field (RF) of the recorded cell and the other in the opposite location. At the same time, the central fixation point changed color to either red or blue, instructing the monkeys to select a target using a saccade or a reach, respectively. After a variable delay interval (0.8–1.6 s), the fixation point disappeared, cueing the monkey to execute a movement to one or the other target. If the monkeys failed to make the instructed movement to within 7° of one of the two targets within 1.5 s, then the animal received no reward and the start of the next trial was delayed by 2 s. Otherwise, a new trial started immediately after the reward delivery of the preceding trial was completed. When selecting a peripheral target using a saccade, the animal had to keep his hand on the central target; when selecting a target using a reach, the animal had to keep fixating the central target until the reach was completed.
Each target was associated with a reward on each trial. The reward consisted of a primary reinforcer—a drop of water, delivered by the opening of a valve for a particular length of time—combined with a secondary reinforcer—an auditory tone of the same duration. Reward durations for the two targets had a ratio of either 3:1 or 1.5:1. The ratio was held constant in blocks of 7–17 trials (exponentially distributed with a mean of 11 trials) and then changed to either 1:3 or 1:1.5. The time that the water valve was held open was drawn on each trial from a truncated exponential distribution that ranged from 20 to 400 ms. The mean of the exponential distribution differed for each target and depended on the reward ratio for that block. For a reward ratio of 1.5:1 (3:1), the mean valve open times for the richer and poorer target were 140 and 70 ms (250 and 35 ms), respectively. To help prevent animals from learning the absolute values of reward durations, we further randomized reward delivery by multiplying valve open times by a value between 80 and 120%. This value was changed on average every 70 trials (exponential distribution truncated to between 50 and 100 trials). The volume of fluid delivered was proportional to the valve opening times.
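The reward schedule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the truncation is implemented by rejection sampling, `mean_ms` is taken to be the mean of the untruncated exponential, and the 80–120% multiplier is passed in as a plain `gain` argument. The function name `draw_valve_time` is hypothetical and does not come from the task software.

```python
import random

def draw_valve_time(mean_ms, gain=1.0, lo=20.0, hi=400.0, rng=random):
    """Draw one valve-open time (ms) for a target.

    Assumptions (not specified in the text): the exponential is truncated to
    [lo, hi] by rejection sampling, mean_ms is the mean of the untruncated
    distribution, and the session gain (0.8 to 1.2) is applied after the draw.
    """
    while True:
        t = rng.expovariate(1.0 / mean_ms)  # rate = 1 / mean
        if lo <= t <= hi:
            return gain * t

# Example: richer and poorer target in a 1.5:1 block (means from the text).
rich_ms = draw_valve_time(140.0)
poor_ms = draw_valve_time(70.0)
```

Because of the truncation, the empirical mean of the drawn times generally differs from `mean_ms`.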
Electrophysiological recordings.
We lowered glass-coated tungsten electrodes (Alpha Omega, impedance 0.5–3 MΩ at 1 kHz) from 2.8 to 10.8 mm below the dura in LIP, and from 2.1 to 11.6 mm below the dura in PRR. We detected individual action potentials using a dual-window discriminator (BAK Electronics). A custom program ran the task and collected the neural and behavioral data. We characterized the response properties of each isolated single unit by running a standard memory task in which we randomly interleaved saccade and reach trials (Snyder et al., 1997). Areas LIP and PRR were first identified using anatomical MR scans, to ensure that we were in the lateral and medial banks of the intraparietal sulcus, respectively. Their localization was confirmed by finding regions containing a high proportion of neurons with transient responses to visual stimulation and saccadic or reaching movements, and showing sustained responses throughout a delayed saccade or a delayed reach trial (Kubanek et al., 2013). The decision task was performed only on cells with maintained activity during the delay period of memory saccade or memory reach trials (approximately one-half of all cells encountered in LIP, and approximately one-half of all cells encountered in PRR). These criteria were identical in the two areas. The LIP and PRR recordings were performed serially in Monkey S, and interleaved in Monkey B. In subsequent analyses, we found that the particular level of maintained activity during the delay period did not have a significant impact on the results (data not shown).
Target desirability.
In this reward-based task, we inferred the desirability of each target to describe the animals' behavior. To do so, we applied a reinforcement-learning model (Sutton and Barto, 1998; Seo and Lee, 2009). In the reinforcement-learning model, on every trial, t, the desirability of the RF target is defined as the difference between the value function assigned to the RF target, Vt(o), and the value function assigned to the opposite target, Vt(o′), as follows:
\[ \text{desirability}_t = V_t(o) - V_t(o') \]
The value function of a selected option o on trial t, Vt(o), is updated according to a learning rule, as follows:
\[ V_t(o) = V_{t-1}(o) + \alpha \left[ r_{t-1} - V_{t-1}(o) \right] \]
where Vt−1(o) is the value function of option o on previous trial, rt−1 is the reward received on the previous trial (solenoid open time), and α denotes the learning rate. The value function of the unchosen option, Vt(o′) is not updated.
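The probability of choosing the RF option o is then a logistic function of the desirability, as follows:
\[ P_t(o) = \frac{1}{1 + \exp\left\{ -\left( \beta \left[ V_t(o) - V_t(o') \right] + \varepsilon \right) \right\}} \]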
Here, β is the inverse temperature parameter and ε is an intercept to account for fixed biases for one target over the other. We used separate intercepts for each effector.
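As an illustration, the model can be sketched in Python as follows. The sketch assumes a standard delta-rule update of the chosen option and a logistic choice rule with the intercept added inside the exponent; these exact functional forms, and the function names, are assumptions made for illustration rather than the authors' code.

```python
import math

def desirability(v_rf, v_opp):
    """Desirability of the RF target: value of RF target minus value of the other."""
    return v_rf - v_opp

def update_value(v_prev, reward_prev, alpha):
    """Delta-rule update of the chosen option (assumed form).

    The value of the unchosen option is left unchanged by the caller.
    """
    return v_prev + alpha * (reward_prev - v_prev)

def p_choose_rf(v_rf, v_opp, beta, eps=0.0):
    """Logistic choice rule; beta is the inverse temperature, eps the intercept."""
    return 1.0 / (1.0 + math.exp(-(beta * desirability(v_rf, v_opp) + eps)))

# Example with values expressed in ms of valve-open time.
v_rf = update_value(v_prev=100.0, reward_prev=200.0, alpha=0.87)
p = p_choose_rf(v_rf, v_opp=70.0, beta=0.021)
```

With rewards expressed in milliseconds of valve-open time, a small β (on the order of 0.02) still produces a steep choice function, because value differences span tens to hundreds of milliseconds.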
The parameters α, β, and the two intercept terms were fitted to behavioral data obtained when recording from each cell using the maximum likelihood procedure, maximizing the log likelihood criterion (L):
\[ L = \sum_t \log P_t(o(t)) \]
where Pt(o(t)) is, as given above, the probability of choosing option o(t) on trial t [note that Pt(o(t)) = 1 − Pt(o′(t))].
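A minimal sketch of the log likelihood computation is given below. It assumes the delta-rule update and logistic choice rule described above, a single intercept `eps` (the fits reported here used one intercept per effector), and an initial value `v0` for both targets; `v0` and the function name are hypothetical. In practice, the returned quantity would be maximized over the parameters with a numerical optimizer.

```python
import math

def log_likelihood(choices, rewards, alpha, beta, eps=0.0, v0=0.0):
    """Sum over trials of log P_t(chosen option).

    choices[t] is 1 for the RF target and 0 for the opposite target;
    rewards[t] is the reward obtained on trial t (e.g., valve-open time in ms).
    Simplification: one intercept for all trials instead of one per effector.
    """
    v = [v0, v0]  # v[1]: RF target, v[0]: opposite target
    total = 0.0
    for c, r in zip(choices, rewards):
        p_rf = 1.0 / (1.0 + math.exp(-(beta * (v[1] - v[0]) + eps)))
        total += math.log(p_rf if c == 1 else 1.0 - p_rf)
        v[c] += alpha * (r - v[c])  # only the chosen option is updated
    return total
```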
We fitted separate reinforcement-learning model coefficients to account for the behavioral data obtained while recording from each of the parietal neurons. This gave α = 0.87 ± 0.086 (mean ± SD, across all neurons; Monkey S: α = 0.86 ± 0.080; Monkey B: α = 0.88 ± 0.095), and β = 0.021 ± 0.0063 (Monkey S: β = 0.024 ± 0.0056; Monkey B: β = 0.019 ± 0.0063).
The model faithfully accounted for both the macroscopic choice behavior (Fig. 1D) and the microscopic (trial-wise) choice behavior (Fig. 1E). For the example cell shown in Figure 1E, the Pearson's correlation (r) between the model's prediction and the actual behavior is 0.77. Across the data obtained for all individual cells (n = 125), the correlation was r = 0.65 ± 0.07 (mean ± SD).
We also tested a model based on effector values instead of on spatial values, that is, a model in which the target desirability was computed separately for each effector. This effector-values model performed substantially worse (r = 0.41 ± 0.09, mean ± SD, n = 125) than the spatial-values model (r = 0.65 ± 0.07), and the difference was highly significant (p < 0.0001, paired t test; n = 125). We therefore used the spatial-values model.
We also tested a model that features divisive normalization, i.e., in which the value of the RF target is normalized by the sum of the values of both targets: DV = Vt(o)/[Vt(o) + Vt(o′)].
We also tested how parietal neurons encode the value of the RF target alone, i.e., DV = Vt(o). Again, the effects of this DV were somewhat smaller than the effects of the original desirability. However, again, the same principal effects are observed and similar conclusions are supported.
Modulation of neural activity by RF target desirability.
We evaluated the effect of desirability in a multiple regression on neural activity. The regression featured the desirability along with two additional factors that were found to modulate parietal/frontal activity in choice tasks (Barraclough et al., 2004; Seo et al., 2009): the choice made on the previous trial and the amount of reward obtained for the choice on the previous trial:
\[ \text{RESPONSE} = a_0 + a_1\,\text{DESIRABILITY} + a_2\,\text{PREVIOUS CHOICE} + a_3\,\text{PREVIOUS REWARD} \]
The neural RESPONSE is the spike rate estimated in a 500 ms sliding window (computed every 100 ms). We fitted the above regression separately to RESPONSE in each of the windows, separately for each cell, separately for saccade and reach trials, and separately for choice of the RF and of the opposite target. Sorting the trials by the animals' choice (Platt and Glimcher, 1999; Shadlen and Newsome, 2001; Sugrue et al., 2004) was important to ensure that the neural effect of desirability was not confounded by the neural effect of choice. We reported the coefficients (weights) assigned to the individual factors in this regression. Before performing the regression, we normalized the values of desirability over all trials of all cells such that the central 95% of values lay between 0 (2.5 percentile) and 1 (97.5 percentile). This linear scaling allows us to report the effect of desirability in spikes per second over the dynamic range of the desirability but does not otherwise change the results.
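The two preprocessing steps described above (sliding-window spike rates and the linear scaling of desirability) can be sketched as follows. The percentile interpolation scheme and the function names are assumptions made for illustration; the text does not specify how the percentiles were computed.

```python
def sliding_rates(spike_times, t_start, t_end, width=0.5, step=0.1):
    """Spike rate (sp/s) in windows of `width` s, advanced every `step` s."""
    rates = []
    k = 0
    while t_start + k * step + width <= t_end + 1e-9:
        w0 = t_start + k * step
        n = sum(1 for t in spike_times if w0 <= t < w0 + width)
        rates.append(n / width)
        k += 1
    return rates

def scale_central_95(values):
    """Linearly rescale so the 2.5th percentile maps to 0 and the 97.5th to 1.

    Percentiles use linear interpolation between order statistics (assumed).
    """
    s = sorted(values)

    def pct(q):
        pos = q * (len(s) - 1)
        i = int(pos)
        j = min(i + 1, len(s) - 1)
        return s[i] + (pos - i) * (s[j] - s[i])

    lo, hi = pct(0.025), pct(0.975)
    return [(v - lo) / (hi - lo) for v in values]
```

After this scaling, a regression weight on desirability can be read directly as the change in firing rate (sp/s) across the central 95% of the desirability range.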
In Figures 3 and 5, we reported the weights averaged over the choices of the RF target and of the opposite target. In Table 1, we list the RF choice weights, the opposite choice weights, and the two averaged.
Action-based decision-related signals in parietal cortex
In LIP, the mean effect of previous choice (value 1 for RF target choice and value 0 for the opposite target choice), in the 0.5 s period used throughout the paper, was −1.8 sp/s for saccade choices and −1.4 sp/s for reach choices. The difference of −0.4 sp/s was not significant (p = 0.47). The mean effect of previous reward, when previous reward was normalized such that the central 95% of values lay between 0 and 1, was −0.4 sp/s for saccade choices and −1.6 sp/s for reach choices. The difference of 1.2 sp/s was not significant (p = 0.44, t test). In PRR, the mean effect of previous choice was −0.6 sp/s and −0.8 sp/s for saccade and reach choices, respectively, and the difference of 0.2 sp/s was not significant (p = 0.67). The mean effect of previous reward was −0.8 sp/s for saccade choices and −3.9 sp/s for reach choices. The difference of 3.2 sp/s was significant (p = 0.019).
The variable previous reward was a component of desirability, and so there may be an interaction between these variables. However, the exclusion of previous reward as a factor in the regression had only minimal impact on the results. Furthermore, our behavioral and neural results remained similar when we considered other forms of reward-based decision variables, such as the fractional income or its variants (Sugrue et al., 2004; Corrado et al., 2005).
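We further extended this linear model to include also the choice effector (binary variable indicating whether a trial is a saccade trial (value 1) or a reach trial (value 0)), and all effector interactions:
\[ \text{RESPONSE} = a_0 + a_1\,\text{DESIRABILITY} + a_2\,\text{PREVIOUS CHOICE} + a_3\,\text{PREVIOUS REWARD} + a_4\,\text{EFFECTOR} + a_5\,\text{DESIRABILITY} \times \text{EFFECTOR} + a_6\,\text{PREVIOUS CHOICE} \times \text{EFFECTOR} + a_7\,\text{PREVIOUS REWARD} \times \text{EFFECTOR} \]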
This linear model was again fitted separately for each neuron, and separately for choices of the RF target and choices of the anti-RF target.
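For concreteness, one row of the design matrix for this extended model could be assembled as in the sketch below. The ordering of the columns and the function name `design_row` are hypothetical; only the set of regressors (the three factors, the effector, and their interactions with the effector) follows from the description above.

```python
def design_row(desirability, prev_choice, prev_reward, effector):
    """Build one design-matrix row for a single trial.

    effector is 1 for a saccade trial and 0 for a reach trial.
    Columns: intercept, the three main factors, the effector, and the
    product of each main factor with the effector.
    """
    main = [desirability, prev_choice, prev_reward]
    interactions = [x * effector for x in main]
    return [1.0] + main + [float(effector)] + interactions
```

On reach trials (effector = 0) the interaction columns vanish, so the interaction weights capture how much each factor's effect differs on saccade trials relative to reach trials.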
Results
Task and choice behavior
Monkeys engaged in a task in which they selected one of two targets based on its desirability (Platt and Glimcher, 1999; Dorris and Glimcher, 2004; Sugrue et al., 2004). Specifically, one target was associated with a larger liquid reward than the other target, with mean payoff ratios of 1.5:1, 3:1, 1:1.5, or 1:3. The payoff ratio was held constant for 7–17 trials before changing to one of the opposite ratios (see Materials and Methods for details). To further challenge the animals, the volume of juice delivered on each trial was variable, drawn from a truncated exponential distribution (see Materials and Methods for details). Critically, the type of choice, a saccade or a reach, was instructed by a color cue (Fig. 1A).
Decision task and behavior. A, After acquiring a central spot, one white target appeared in the neuronal RF and one appeared outside. Following a delay, the animal selected a target and received a reward. The animal made the selection using a saccade or a reach if the fixation point was red or blue, respectively. On each trial, the reward was drawn from an exponential distribution with a particular mean payoff. The mean payoffs of the two targets were held constant at ratios of 3:1, 1.5:1, 1:1.5, or 1:3 for 7–17 trials before changing. B, Proportion of choices of an option as a function of each payoff ratio, aligned on a transition. C, Frequency histogram of successive choices of one option. Dashed line: exponential fit. D, Proportion of choices of the RF target (±SEM) of an option as a function of the desirability of the RF target, separately for saccade (red) and reach trials (blue). The lines are logistic fits. E, Example of a sequence of an animal's choices of the RF target (black) and the model's estimate of the desirability of the RF target (green). The figure shows data for all trials obtained while recording from an LIP cell. The choices (black trace) were smoothed with a zero-delay Gaussian filter with SD equal to three trials. The correlation between the two traces is 0.77. The gray lines on the top mark the trials in which the RF target was the richer target.
The monkeys chose the richer option more frequently, but not stereotypically (Fig. 1B). On average, after each change of payoff ratio, the monkeys' behavior converged in approximately three trials to a new steady-state choice ratio. Interestingly, the animals' choices in this task followed the strict matching law (Herrnstein, 1961, 1979). Specifically, for a ratio of 1.5:1, the strict matching law dictates choosing the richer option on 60% of trials. Our animals chose the richer option on 61.8 and 61.3% of trials, respectively. For a ratio of 3:1, the matching law dictates choosing the richer option on 75% of trials. Our two animals chose this option on 74.7 and 71.6% of trials, respectively. The finding that animals showed matching behavior in this task is notable given that we did not impose specific constraints typically used to elicit the matching behavior, such as reward baiting or a change-over delay punishment of frequent switching (Herrnstein, 1961; Sugrue et al., 2004).
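The predictions of the strict matching law quoted above follow from a one-line computation: the fraction of choices allocated to an option equals its share of the total payoff. A minimal sketch (the function name is hypothetical):

```python
def matching_fraction(rich, poor):
    """Fraction of choices of the richer option under strict matching."""
    return rich / (rich + poor)

# Payoff ratios used in the task.
frac_low = matching_fraction(1.5, 1.0)   # 1.5:1 ratio
frac_high = matching_fraction(3.0, 1.0)  # 3:1 ratio
```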
On average, the animals selected the same target in a row for approximately three consecutive trials (mean switch probability = 0.31). The distribution of stay durations was well fit by an exponential (Fig. 1C), consistent with an independent stay-switch decision being made on each individual trial.
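The link between the switch probability and the mean stay duration can be made explicit. Assuming that a switch occurs independently on each trial with a fixed probability p (an idealization consistent with the exponential fit), the number of consecutive choices of one target, N, follows a geometric distribution:

\[ P(N = n) = (1 - p)^{n-1}\, p, \qquad E[N] = \frac{1}{p} = \frac{1}{0.31} \approx 3.2 \]

which agrees with the observed runs of approximately three trials.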
We modeled each animal's choice behavior using a reinforcement-learning model (Sutton and Barto, 1998; Seo and Lee, 2009) (see Materials and Methods). The model computes a variable for each trial that quantifies the relative desirability of the target placed in the RF compared with the target outside the RF. We refer to this reward-based variable as “desirability of the RF target,” or “desirability” (Dorris and Glimcher, 2004). Figure 1D reveals that on average, the modeled desirability of the RF target was predictive of the monkeys' RF target choices. When the data are binned into 20 bins as shown in this figure, a logistic fit to the data accounts for 99.8% of the variance in mean choice probability on saccade trials and 99.6% on reach trials in Monkey S, and 99.7 and 99.6%, respectively, in Monkey B. In addition to accounting for the molar behavior (Fig. 1D), the model also faithfully captured the animals' choices in the individual trials (Fig. 1E). Thus, the inferred target desirability serves as a good descriptor of the animal's choice behavior in this task.
LIP recordings
While animals performed the task, we recorded the discharge activity of 60 LIP neurons. One target was placed inside the RF of the neuron being recorded, and the other target on the opposite side of the fovea. We first tested whether LIP activity in this task was modulated by the animals' choice, i.e., whether an animal was going to choose the RF target or the opposite target (Platt and Glimcher, 1999; Sugrue et al., 2004; Gold and Shadlen, 2007). The neural effects of the animals' choice are shown in Figure 2. To quantify the effect of choice, we measured firing rates in 500 ms windows sliding through the trial in 100 ms steps, and subtracted the discharge rate underlying a choice of the anti-RF target from that of the RF target, in each window. Indeed, this analysis confirms that LIP activity was modulated by the animals' choice in this task (Fig. 3A). The effect was strongest around the time of the go cue (Platt and Glimcher, 1999). Importantly, the effect of choice was sensitive to the instructed effector (Fig. 3A, left). Specifically, the modulation due to an animal's choice was substantially stronger for choices made using saccades (red) compared with choices made using reaches (blue), and the distinction (black) was strongest around the time of the go cue.
LIP firing rates as a function of the instructed effector and the animals' choice. Mean discharge rate (±SEM) over all LIP cells (n = 60), as a function of the instructed effector (red: saccades, blue: reaches), and as a function of the animals' choice (solid, RF target; dashed, anti-RF target). Activity is aligned (thick black vertical lines) to the effector cue and target onset (left), to the go cue (middle), and to movement onset (right). Because the targets remain at the same location while a given cell is being recorded from, and the reward contingencies remain constant for 7–17 trials, an animal can show a spatial bias that appears in the LIP firing rate even before the targets reappear (top left, time 0). In contrast, the effector to be used on each trial is not known before time 0, and therefore the divergence in activity on saccade compared with reach trials appears only after the effector cue is processed. To produce the plot, we discretized the time axis into 1 ms nonoverlapping bins and counted the number of spikes occurring in each bin. This signal was subsequently filtered using a 181 point low-pass digital filter with a transition band from 2 to 15 Hz and a −3 dB point at 9 Hz.
Decision-related neural signals in LIP are stronger when animals choose using a saccade. A, Left, The effect of an animal's choice as a function of time throughout the trial, separately for choices made using saccades (red) and reaches (blue). The figure gives the mean difference between the LIP discharge activity for choices of the RF target and choices of the anti-RF target. The activity was measured in 500 ms windows sliding through the trial in 100 ms steps. Activity is aligned (thick black vertical lines) to the effector cue and target onset (left), to the go cue (middle), and to movement onset (right). The black trace is the difference between the blue and red traces. The size of the dots (inset) indicates the significance of the effect in each window (paired t tests, n = 60). Right, The effects quantified in a 0.5 s period preceding the go cue (gray thick line overlying the abscissa of the middle part of the left plot) for each individual cell. The histograms summarize the population effects. The black triangle in each histogram marks the population mean; ****p < 0.0001, *p < 0.05; n.s. p > 0.05, t test, n = 60. B, Same format as in A, for the neural effects of desirability of the RF target. The figure shows the mean regression coefficient of desirability (see Materials and Methods) as a function of time throughout the trial, separately for saccade and reach choices. We made sure to account for the animals' choices by fixing the individual choice conditions before computing the effects of desirability (see text for details). This prevented an influence of the effect of choice (A) on the effect of desirability.
We quantified the neuronal effects throughout the paper in a 0.5 s period immediately preceding the go cue. We chose a period before the go cue to avoid movement-related effects. We chose the latest such period possible (a period ending at the go cue) to maximize the time for the targets and effector cue to be processed and to minimize effects from the initial sensory processing of these stimuli. Figure 3A, right, shows the effects of animals' choice separately for each neuron. The figure reveals that the neural effect of target choice was generally stronger during choices made with saccades compared with choices made with reaches [mean: 18.7 sp/s vs 12.0 sp/s (p < 0.0001, t test, n = 60); Monkey S: 20.2 sp/s vs 13.3 sp/s (p < 0.0001, n = 40); Monkey B: 15.6 sp/s vs 9.3 sp/s (p = 0.0071, n = 20)]. This effect replicates the previous finding that LIP is more active before a saccade than a reach into the RF (Snyder et al., 1997, 2000; Dickinson et al., 2003).
If LIP is involved not just in saccade preparation, but also in effector-nonspecific coding of target salience, then we would predict that LIP would code target desirability in both reach and saccade tasks. If, on the other hand, the neural mechanisms for computing target salience are different for saccades and reaches, then we might expect to find a much larger difference in the representation of target desirability in the two effector contexts. Previous studies have shown that LIP encodes RF target value or desirability (Platt and Glimcher, 1999; Dorris and Glimcher, 2004; Sugrue et al., 2004; Rorie et al., 2010), but because these studies all used only eye movements as the vehicle for the choice, it is unknown whether or not LIP codes desirability generically; that is, whether or not the encoding depends on the action to be performed.
To investigate the effects of the target desirability, we had to make sure that the effects of desirability were not influenced by the effects of the animal's choice (Fig. 3A). This is important because otherwise the effects could be relatively straightforwardly extrapolated from the previous findings that LIP neurons fire more strongly when animals are instructed to make a saccade into the RF compared with a reach into the RF (Snyder et al., 1997, 2000; Dickinson et al., 2003). To overcome this confound, we split trials into two groups according to the animal's choice (i.e., choice of the RF target or choice of the opposite target). We then, within each of these groups, investigated the modulation of the firing rates by target desirability (see Materials and Methods). We found that the effects of desirability were similar in the RF and anti-RF choice groups (Table 1), and therefore averaged the effects computed separately in each group over the two groups. Figure 3B shows the result, in the same format as in Figure 3A.
Figure 3B shows that for saccade choices (red), the effect of desirability on LIP firing is greater than zero and significant (p < 0.01) from the onset of the targets to the time of the movement. This replicates previous findings that neuronal activity is modulated by value or desirability of the RF target in saccadic choice tasks (Platt and Glimcher, 1999; Dorris and Glimcher, 2004; Sugrue et al., 2004; Rorie et al., 2010). Importantly, we found that the desirability effect during reach choices (blue) is substantially weaker than the effect during saccade choices. As in previous studies (Platt and Glimcher, 1999; Dorris and Glimcher, 2004), target desirability affects LIP firing rates even before the appearance of the targets, since the location at which the preferred target will appear is completely predictable for a given cell. The desirability effect is necessarily identical before and shortly after the effector cue, which tells the animal which effector to use to make a choice (Fig. 3A, left, time 0). The desirability effect then diverges, with the difference in effects on saccade versus reach trials reaching significance (p < 0.05) in the window centered at 450 ms following cue onset. At this time, the effect of desirability during reach choices is 52% as large as the effect during saccade choices. The desirability effect for reach choices (blue) continues to drop and loses significance entirely in the window centered at 350 ms preceding the go cue. At this time, the reach effect is 43% of the saccade effect. In contrast, the desirability effect during saccades retains significance (p < 0.001) until the time of the movement.
We quantified the effects of target desirability on a cell-by-cell basis in Figure 3B, right. The mean effect of desirability for saccades is 9.8 sp/s (p < 0.00001, t test). In comparison, the effect for reaches is only 3.6 sp/s (p = 0.11). The difference of 6.2 sp/s (Monkey S: 5.1 sp/s; Monkey B: 8.5 sp/s) is significant (p = 0.024, paired t test, n = 60). The desirability effect for saccades is positive in 49 of 60 cells (82%, p < 0.00001, one-proportion z test). In contrast, the desirability effect for reaches is positive in only 37 of the 60 cells (62%, p = 0.07); the difference is significant (p = 0.015, two-proportion z test). Even when the data are split into the RF and anti-RF choice trials (Table 1), the influence of the desirability on LIP firing rate is consistently greater on saccade compared with reach trials. The variance of the effect of the desirability on firing rate (Fig. 3B, marginal histograms) was somewhat greater for reaches (SD = 16.9 sp/s) than for saccades (SD = 14.0 sp/s), although the difference was not significant (p = 0.16, F test for equal variance). There is no evidence for separate populations either increasing or decreasing firing with desirability on reach trials, i.e., the distribution is strongly unimodal with a peak close to zero.
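The proportion tests reported above can be sketched with the Python standard library alone. The sketch assumes two-sided tests, a null proportion of 0.5 for the one-proportion test, and a pooled standard error for the two-proportion test; none of these details are stated in the text, and the function names are hypothetical. Under these assumptions, the counts 49/60 and 37/60 give p values that appear consistent with those reported.

```python
import math

def two_sided_p(z):
    """Two-sided p value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def one_proportion_z(k, n, p0=0.5):
    """z test of an observed proportion k/n against a null proportion p0."""
    p_hat = k / n
    z = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)
    return z, two_sided_p(z)

def two_proportion_z(k1, n1, k2, n2):
    """z test for the difference of two proportions, pooled standard error."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    return z, two_sided_p(z)

# Counts of cells with a positive desirability effect (saccades, reaches).
z_sac, p_sac = one_proportion_z(49, 60)
z_reach, p_reach = one_proportion_z(37, 60)
z_diff, p_diff = two_proportion_z(49, 60, 37, 60)
```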
Thus, the choice- and desirability-related neural effects in LIP are stronger during target selection in a saccade task compared with a reach task. A critical observation is that the effects of choice and the effects of desirability are independent of one another. It is conceivable that the stronger effect of desirability in LIP for saccades compared with reaches reflects no more than the well established fact that LIP is more active for planned saccades than for planned reaches. If this were the case, however, then we would expect that both the time courses and the magnitude ratio of the two effects would match one another. This was not the case. The difference in choice-related activity for saccades versus reaches grew slowly, starting only 450 ms after target onset and peaking at around the time of the go cue (Fig. 3A). In comparison, the difference in the effect of desirability appears within 150 ms of target onset, reaches a maximum value within 350 ms, and remains at approximately the same value up until the go cue (Fig. 3B). Thus, the two effects (choice and desirability) have very different time courses. The magnitude ratios of the two effects are also very different, with a substantially larger disparity between saccades and reaches for desirability than for choice. In the 0.5 s period starting 100 ms after target onset, reach trials are associated with nearly as much activity as saccade trials (magnitude ratio of 97%). In comparison, the magnitude ratio for desirability is 53% in this same period. Similarly, in the last 0.5 s before the go cue, reaches are associated with almost two-thirds (64%) as much activity as saccades, yet the effect of desirability on reach trials is only just over one-third of that on saccade trials (37%). Thus, the ratio of effect sizes for reach compared with saccade trials is approximately half as large for desirability as for choice.
Finally, we asked how correlated the choice and the desirability effects are on a cell-by-cell basis in the same interval, when the saccade and reach effects are averaged together. This analysis reveals a correlation coefficient between the desirability and the choice effects of only r = 0.19, which is not significantly different from zero (p = 0.13, n = 60 LIP cells). Thus, based on the differences in time courses, the effect magnitudes, and the lack of correlation between the desirability and the choice effects, we conclude that the greater modulation by desirability for saccades compared with reaches in LIP is independent of the previous finding that LIP is generally more responsive on saccade compared with reach trials.
PRR recordings
In this task, we also recorded the discharge activity of 65 PRR neurons. PRR neurons modulate their activity by an animal's choice more strongly in reach tasks compared with saccade tasks (Cui and Andersen, 2007; Scherberger and Andersen, 2007; Kubanek et al., 2013). Indeed, the modulation of PRR firing rates by the animals' choice is substantially higher during reach (blue) compared with saccade (red) choices (Figs. 4, 5A, left). The effect of choice in PRR (Fig. 5A, right) is 19.5 sp/s during reach choices (Monkey S: 20.8 sp/s; Monkey B: 18.1 sp/s) compared with 2.7 sp/s during saccade choices (Monkey S: 3.3 sp/s; Monkey B: 2.0 sp/s). The difference of 16.8 sp/s is significant (both monkeys, p < 0.0001).
PRR firing rates as a function of the instructed effector and the animals' choice. Same format as in Figure 2 for the population of n = 65 PRR cells.
PRR neurons show decision-related neural signals predominantly when animals choose using a reach. Same format as in Figure 3, for the population of 65 PRR neurons. A, Effect of animal's choice (RF vs anti-RF target) on neuronal firing rate. B, Effect of desirability of the RF target on neuronal firing rate.
The neural effects of the RF target desirability have not yet been investigated in PRR, although there is some evidence that PRR neurons show value-based modulations (Musallam et al., 2004). The mean time course of the neural effect of desirability in PRR in our task is shown in Figure 5B, left. The figure demonstrates that PRR neurons encode the RF target desirability only during reach choices (blue). The effect during reach choices becomes significant (p < 0.05) in the window centered at 350 ms following the cue onset. The effect peaks (p < 0.001) at 550 ms following the cue onset, and then gradually diminishes. In contrast, the effect of desirability during saccade choices (red) never reaches significance; if anything, the effect is opposite to that of reaches. The effects are quantified for the individual neurons in Figure 5B, right. The mean effect of desirability for reaches is 2.6 sp/s (p = 0.04), compared with a nonsignificant −1.7 sp/s for saccades (p = 0.14). The difference of 4.3 sp/s (Monkey S: 1.6 sp/s; Monkey B: 7.3 sp/s) is significant (p = 0.015, paired t test, n = 65).
Thus, PRR neurons show selection-related neural signals specifically in the context of a reach task. These findings held even when the data were split into the RF and anti-RF choice trials (Table 1). Moreover, all previous findings held when we further normalized the activity of each neuron (Fig. 6).
Although both animals exhibit similar saccade–reach effects (Table 1), the animals show certain differences. These differences must be interpreted with care because the data are already analyzed from four perspectives (saccades, reaches; choices into RF, choices out of RF); splitting the data further (by animals) may lead to spurious effects. Nonetheless, several points are worth noting. First, Monkey S shows a significant effect of desirability for reaches out of RF in LIP, and exhibits a trend (albeit not significant) for reaches into RF in LIP. Yet the LIP saccade effects in this animal are almost twice as large as the LIP reach effects. Thus, this case does not change the conclusion that decision signals are differentially modulated by particular response effectors. Second, both monkeys show a positive desirability effect for reaches in PRR, but in neither monkey does this effect reach significance. This likely reflects the fact that the desirability effects in PRR (Fig. 5B) are weaker compared with LIP (Fig. 3B), and splitting the dataset of n = 65 into two subsets drops the statistical power below threshold. It is interesting to note that Monkey B shows in PRR a negatively signed effect for the nonpreferred effector, i.e., saccades. In general, this animal exhibits greater contrasts between the two effectors in each area compared with Monkey S. It is possible that this animal planned a given movement more autonomously, i.e., without forming a provisional plan to engage the other effector as well, compared with Monkey S. It is also worth noting that the effects of desirability for choices of the anti-RF target are stronger than the effects for choices of the RF target, for both LIP and PRR. A similar phenomenon has been observed previously in LIP (Sugrue et al., 2004). The nature of this effect is not yet clear.
It must be noted, however, that unlike the case of LIP, we cannot distinguish between the effect of desirability being an independent property of PRR, versus being a consequence of a more general effector specificity in PRR. Very little modulation can be attributed to choice on saccade trials in PRR (Fig. 5A). The lack of modulation due to desirability on saccade trials in PRR could be secondary to this finding.
It is possible that the stronger modulations by the RF target desirability for the preferred effector (saccades in LIP, reaches in PRR) are due to higher firing rates for the preferred effector. To account for the effects of the effector, we extended Eq. 1 to include Effector (i.e., a binary variable indicating whether a trial is a saccade trial or a reach trial), as well as all Effector interactions (Eq. 2). This linear model was fitted separately to the data of each cell, and separately for choices of the RF target and choices of the anti-RF target; the results were averaged over the RF and anti-RF choices. To test how the RF target desirability is modulated by the response effector, we investigated the interaction of Desirability × Effector. If the effector-specific modulations of desirability were solely due to higher firing rates for a given effector, then the inclusion of Effector as a factor in the model should abolish any interaction of Desirability × Effector in a given area.
Yet, we found that even after accounting for Effector, there were significant interactions of Desirability × Effector in each area (Fig. 7). The interaction pointed in the expected direction given the previous effects shown in Figures 3B, 5B, 6, and Table 1. In particular, a positive interaction indicates a preference for encoding desirability in the saccade context, and this is observed in LIP (Fig. 7, magenta). A negative interaction indicates a preference for encoding desirability in the reach context, and this is seen in PRR (Fig. 7, cyan). These effects, quantified for each cell in the same pre-go-cue interval as elsewhere in this paper (Fig. 7, right, histograms) are significant (LIP: p = 0.024; PRR: p = 0.015; t statistic associated with the interaction term; n = 60 in LIP, n = 65 in PRR). Thus, the effector-specific decision-related effects reported in this study are not due to distinct levels of activity on saccade and reach trials.
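The logic of the interaction analysis can be illustrated with a minimal sketch. It assumes the model has the form rate = α0 + α1·Effector + α2·Desirability + α3·Desirability × Effector, with Effector coded as 1 for saccade trials and 0 for reach trials; the exact form and coding of Eq. 2 are assumptions here, chosen so that a positive α3 corresponds to stronger desirability coding for saccades. Because Effector is binary and the model includes its interaction, the least-squares fit is equivalent to fitting one line per effector: α2 is the desirability slope on reach trials and α3 is the difference between the saccade and reach slopes. The function names and the noise-free toy data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_interaction_model(rate, desirability, effector):
    """Fit rate = a0 + a1*E + a2*D + a3*D*E for one cell.

    Assumed coding: E = 1 on saccade trials, E = 0 on reach trials.
    With a binary E, this equals fitting a separate line per effector.
    """
    reach = [(d, r) for d, r, e in zip(desirability, rate, effector) if e == 0]
    sacc = [(d, r) for d, r, e in zip(desirability, rate, effector) if e == 1]
    c_reach, b_reach = fit_line(*zip(*reach))
    c_sacc, b_sacc = fit_line(*zip(*sacc))
    return {
        "a0": c_reach,           # baseline rate on reach trials
        "a1": c_sacc - c_reach,  # main effect of Effector
        "a2": b_reach,           # desirability slope on reach trials
        "a3": b_sacc - b_reach,  # Desirability x Effector interaction
    }


# Hypothetical, noise-free toy cell: higher baseline AND steeper
# desirability slope on saccade trials.
levels = [0.2, 0.4, 0.6, 0.8]
desirability = levels + levels
effector = [0] * 4 + [1] * 4
rate = [20 + 3 * d for d in levels] + [25 + 8 * d for d in levels]

coef = fit_interaction_model(rate, desirability, effector)
print({name: round(value, 3) for name, value in coef.items()})
```

In this toy cell the higher baseline on saccade trials is absorbed by the Effector term (a1), so a nonzero a3 reflects only the difference in desirability slopes. The sketch omits the separate fits for RF and anti-RF choices and the subsequent averaging described in the text.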
The modulation of the reward-based decision variable by the choice effector in each area. Time course of the mean value of the interaction between the RF target desirability and the choice effector (Eq. 2, α3), shown separately for LIP (magenta; n = 60 neurons) and PRR (cyan; n = 65 neurons). Same format and alignment as in Figures 3 and 5. The factor values (weights) were fitted separately for choices of the RF target and choices of the anti-RF target and subsequently averaged together. A positive (negative) value of the interaction indicates a preferential coding of desirability for saccades (reaches). The value of the interaction term measured in the same pre-go-cue interval as elsewhere in the paper is shown separately for each neuron in the histograms on the right. Filled bars indicate cells in which the interaction is significant (p < 0.05, t statistic). The triangles denote the population mean; *p < 0.05, two-sided t test.
Thus far, we have reported effects over the LIP and PRR neuronal populations. Additional insights might be gained if we focus only on neurons that encode a particular factor significantly. To this end, we specifically investigated the response properties of neurons that significantly encode the Desirability × Effector interaction (Eq. 2, α3), Effector as main factor (α1), and Desirability as main factor (α2). The linear model (Eq. 2) was fitted using firing rates measured in the same pre-go-cue interval as elsewhere in this paper. The results are shown in Figure 8. Each plot includes the neurons that significantly encode each factor of interest. The bars indicate the proportion of the significant neurons that show a positive value of the factor of interest. Data are shown separately for LIP (magenta) and PRR (cyan), and separately for RF choices, anti-RF choices, and effects averaged over the RF choices and anti-RF choices.
Specific analysis of the response properties of significantly coding neurons. This figure reports response properties of neurons that significantly (p < 0.05, t statistic) encode (A) the Desirability × Effector interaction (α3), (B) Effector as the main factor (α1), and (C) Desirability as the main factor (α2) in the linear model in Eq. 2. The model used firing rates measured in the same pre-go-cue interval as elsewhere in this paper. The number and percentage of the significant neurons is given below each bar. The bars indicate the proportion of the neurons that show a positive value of the factor of interest. Data are shown separately for LIP neurons (magenta) and PRR neurons (cyan).
Figure 8A, the Desirability × Effector interaction, recapitulates the histograms shown in Figure 7. In particular, for the effects averaged over the RF and anti-RF choices (left), in LIP (magenta), there were 12 neurons (20%) that encoded the interaction significantly (p < 0.05 for either RF or anti-RF choices). This proportion of significant neurons is modest but aligns with previous reports of the encoding of value-based decision-related variables in LIP (e.g., 18%; Seo et al., 2009). Of these, 75% showed a positive interaction, indicating a predominant saccadic desirability coding in LIP. In contrast, in PRR, only 25% of the significant neurons showed a positive interaction (i.e., 75% of the significant neurons showed a negative interaction), indicating a predominant reach desirability coding in PRR. The effects were similar when the data are shown separately for the RF and anti-RF choices (Fig. 8A, middle and right parts of the plot).
The effects of Effector as main factor (Fig. 8B) reproduce the findings of previous studies. These studies found that for RF target choices, LIP neurons fire more on saccade compared with reach trials (Snyder et al., 1997, 2000; Dickinson et al., 2003). In our data (Fig. 8B, middle part of the plot), this is demonstrated by 92% of the significant LIP neurons showing a positive effect (magenta bar), i.e., an increase of firing on saccade trials. PRR neurons discharge more vigorously on reach compared with saccade trials (Snyder et al., 1997; Calton et al., 2002; Cui and Andersen, 2007; Scherberger and Andersen, 2007). In our data (Fig. 8B, middle part of the plot, cyan), this is apparent from no neurons showing a positively signed effect, i.e., all neurons increase their activity on reach trials. The Effector effects for anti-RF choices (right part of the plot) when considered independently have not been investigated before. Previous studies only considered differences (RF minus anti-RF; Calton et al., 2002; Dickinson et al., 2003). The direction of the effect for anti-RF choices we report is consistent with increased modulation (RF minus anti-RF) for saccades in LIP and reaches in PRR (Calton et al., 2002; Dickinson et al., 2003). The effects averaged over the RF and anti-RF choices (left part of the plot) are in the same directions as the effects for the RF choices.
Finally, the effects of Desirability as main factor (Fig. 8C) recapitulate the previous findings that LIP neurons increase their activity with increasing desirability, or value, of the RF target (Platt and Glimcher, 1999; Dorris and Glimcher, 2004; Sugrue et al., 2004; Rorie et al., 2010). Interestingly, the same effect is obtained in PRR (cyan). This is a novel finding, although there was a previous suggestion (Musallam et al., 2004) that PRR neurons may encode reward-based decision-related variables in a way similar to LIP.
Discussion
In regard to spatial decision-making, it has been debated (Ro et al., 2001; Gold and Shadlen, 2003; Marotta et al., 2003; Maunsell, 2004; Shadlen et al., 2008; Andersen and Cui, 2009; Kable and Glimcher, 2009) whether parietal cortex helps to select a target in space regardless of how that target will be used, or whether target selection is implemented in specific parietal circuits, where the circuits that are used depend on the action that will be performed. We recorded neural responses from particular parietal circuits in a reward-based choice task to provide an answer. The results support the latter hypothesis.
Previous studies found that LIP neurons increase their firing rates with increasing value or desirability of the saccades directed into the neuronal response field (Platt and Glimcher, 1999; Dorris and Glimcher, 2004; Sugrue et al., 2004; Rorie et al., 2010). We reproduced this finding (Fig. 3B, red). Critically, it has been debated whether the value-based decision-related signals in parietal cortex reflect a generic cognitive decision process, or whether these signals are represented in an action-based, intentional framework, specifically in circuits subserving movements of particular effectors (Gold and Shadlen, 2003; Maunsell, 2004; Shadlen et al., 2008; Andersen and Cui, 2009; Kable and Glimcher, 2009; Cisek and Kalaska, 2010). Our paradigm allowed us to investigate how LIP neurons encode target desirability in a nonsaccadic context, specifically in the context of reaching arm movements. We found that LIP neurons encode target desirability much more strongly in the saccade context compared with the reach context (Fig. 3B, red). In contrast, PRR neurons encoded target desirability only in the reach context (Fig. 5B, blue). These findings support the hypothesis of an action-based, intentional representation of developing decisions in the parietal cortex (Gold and Shadlen, 2003; Shadlen et al., 2008). In this view, decisions evolve in circuits that are dedicated to the production of particular actions.
The decision-related signals in LIP are stronger in the saccadic context compared with the reach context, which suggests that LIP plays at least in part an oculomotor role in spatial choice. However, the data show two signs of LIP's general, effector-independent (possibly attentional) role in spatial selection. First, LIP neurons encode, albeit to a reduced degree, desirability-related and choice-related signals also during reach choices. This suggests a partially effector-independent function of LIP in spatial selection (Goldberg and Bruce, 1985; Gottlieb et al., 1998; Colby and Goldberg, 1999; Bisley and Goldberg, 2003; Liu et al., 2010; Bennur and Gold, 2011). However, an alternative is that LIP is saccade-specific, and that the reach effects may be due to the animals' natural tendency to look at a reach target during a reach, despite our efforts to train the animals to keep fixating a central target during reach trials. This may lead to spurious neural effects in LIP during reaches. Saccade effects in PRR are small, and so this confound would not affect PRR. Second, LIP encodes desirability to a certain extent even before the choice targets and the effector cue appear on the screen (Fig. 3B). This may also suggest a partially effector-independent role of LIP in spatial selection. However, an alternative is that this reflects a graded anticipation of a saccade trial (a saccade trial occurred with probability 50%). Such a process may not affect PRR, which shows desirability effects only late in the trial.
There are differences in the decision-related effects in LIP and PRR that suggest that these parietal areas may serve distinct functions, beyond their effector specificity. First, as mentioned above, LIP is only partially saccade-specific, whereas PRR is almost entirely reach-specific. Second, as also already mentioned, LIP, but not PRR, shows an effect of desirability even before the targets and the effector cue appear. Although the LIP effects could be interpreted in an oculomotor framework (see previous paragraph), these differences may perhaps most parsimoniously be described by PRR's selecting targets as a reach-specific parietal circuit, whereas LIP reflects a mixture of attentional and oculomotor decision-related processes. With respect to the ongoing debate of LIP's role in spatial decision-making, this mixed proposition may not be satisfactory. However, our data contribute to this debate by establishing that the decision-related signals in LIP have a strong oculomotor component.
Related to the previous point, a notable further distinction between the two areas is that the effects in PRR are much more sluggish than in LIP. This is evident in the absence of transient responses to target onset in PRR (Fig. 4) in contrast to the clear transients in LIP (Fig. 2), in the relatively slow buildup of the desirability effects in PRR (Fig. 5B) compared with LIP (Fig. 3B), and in the relatively sluggish buildup of the interaction Desirability × Effector in PRR (Fig. 7, cyan) compared with LIP (magenta). The finding that the responses to the onset of the choice targets in PRR are slow compared with LIP is so striking that we report it in a separate study (Kubanek et al., 2013). Importantly, the distinction in the response dynamics is specific to a choice task; in a simple visually guided movement task, PRR neurons exhibit relatively fast transients, comparable to LIP (Kubanek et al., 2013). In that study, we concluded that the relatively fast response dynamics in LIP compared with PRR in a choice task reflect the notion that the selection of a saccade target should be fast, whereas the selection of a reach target should be more careful, more deliberate, and so slower. The dynamics of the decision-related neuronal effects specifically investigated in the present study further support that notion.
It has recently been shown that LIP neurons encode decision-related variables even before an endpoint of a saccade is specified to an animal (Bennur and Gold, 2011). Interpreted within our findings, because the brain in that task transforms a decision into a saccade, some aspects of this transformation might be observed in LIP already before the saccade endpoint is specified (Bennur and Gold, 2011). More generally, LIP neurons may reflect a collection of computations and variables that are necessary to eventually perform a saccadic eye movement, even when such computations are abstract or involve sequential processes (Colby et al., 1996; Sereno and Maunsell, 1998; Freedman and Assad, 2006; Oristaglio et al., 2006; Ipata et al., 2009). Nonetheless, an alternative that cannot be ruled out based on our findings is that LIP neurons may show generic, action-independent decision signals (Bennur and Gold, 2011), and that our tasks, in which an animal is explicitly cued to make its choice in the context of a specific action, highlighted the action-specific role of parietal circuits in regard to spatial decision-making.
Our findings provide evidence for an action-based, embodied neural architecture of decision-making, an alternative to a classical cognitive, action-independent architecture (Anderson, 2003). In the classical architecture, useful for making generic decisions, an abstract decision variable is first computed in a central, generic decision circuit (Padoa-Schioppa and Assad, 2006), and subsequently routed to a specific motor circuit for execution. In comparison, in the action-based, embodied architecture (Gold and Shadlen, 2003; Shadlen et al., 2008; Andersen and Cui, 2009; Kable and Glimcher, 2009), the decision process runs on circuits devoted to executing a particular kind of movement. This architecture therefore does not require postdecision routing to trigger the desired movement. An intermediate possibility is that the signals we see in LIP and PRR may reflect the intermediate output of a decision circuit that “leaks out”, while the decision is still evolving, onto the movement planning circuit that will be used to implement that decision. Both the fully embodied architecture and the intermediate possibility have the advantage that they may allow animals to respond quickly and reduce errors associated with an abrupt choice of a movement effector. This can be evolutionarily advantageous when making urgent decisions such as where to turn, where to look, or which object to reach for. Embodied architecture of this sort may also be useful in complex motor tasks, such as when deciding to which side to move when a tennis opponent positions herself to play a volley. In other decision contexts, embodied architecture would not be applicable, for example, when choosing a meal from a menu (Padoa-Schioppa and Assad, 2006), or when choosing a spouse.
In summary, our data suggest that choices made using specific actions are reflected in specific parietal circuits. These findings support the view of embodied cognition in which decisions can be processed in the same circuits that are devoted to planning and execution of particular actions. This neural architecture can be advantageous in cases when humans or animals must make fast and specific decisions such as where to turn, where to look, or where to reach.
Footnotes
This work was supported by NIH Grants EY012135 and EY002687. We thank Jonathon Tucker for technical assistance, and Joshua Gold for helpful comments.
The authors declare no competing financial interests.
Correspondence should be addressed to Dr Jan Kubanek, Department of Anatomy and Neurobiology, Washington University School of Medicine, 660 South Euclid Avenue, St Louis, MO 63110. jan{at}eye-hand.wustl.edu