Abstract
Several widely accepted models of decision making suggest that, during simple decision tasks, neural activity builds up until a threshold is reached and a decision is made. These models explain error rates and reaction time distributions in a variety of tasks and are supported by neurophysiological studies showing that neural activity in several cortical and subcortical regions gradually builds up at a rate related to task difficulty and reaches a relatively constant level of discharge at a time that predicts movement initiation. The mechanism responsible for this buildup is believed to be related to the temporal integration of sequential samples of sensory information. However, an alternative mechanism that may explain the neural and behavioral data is one in which the buildup of activity is instead attributable to a growing signal related to the urgency to respond, which multiplicatively modulates updated estimates of sensory evidence. These models are difficult to distinguish when, as in previous studies, subjects are presented with constant sensory evidence throughout each trial. To distinguish the models, we presented human subjects with a task in which evidence changed over the course of each trial. Our results are more consistent with “urgency gating” than with temporal integration of sensory samples and suggest a simple mechanism for implementing tradeoffs between the speed and accuracy of decisions.
Introduction
Research into the temporal aspects of decision making has provided support for a class of theories that may be called “sequential sampling,” or “bounded integrator” models (Stone, 1960; Laming, 1968; Ratcliff, 1978; Carpenter and Williams, 1995; Usher and McClelland, 2001; Wang, 2002; Mazurek et al., 2003; Reddi et al., 2003; Smith and Ratcliff, 2004; Wong and Wang, 2006; Bogacz and Gurney, 2007; Grossberg and Pilly, 2008). According to these models, simple decisions involve the sequential sampling of the sensory stimulus and temporally integrating the information present in that stimulus until a threshold (or “bound”) is reached, at which point the decision is taken (see Fig. 1 A). The rate of this integration is related to the quality of information present in the stimulus in favor of a given choice, and the threshold is related to motivational factors such as payoff, risk, and urgency (Reddi and Carpenter, 2000). Several variations of bounded integrator models exist, and they offer a remarkably successful account of behavior in simple decision tasks. In particular, assuming that the rate of integration is subject to variability, these models can explain error rates and distributions of reaction times (RTs) in a wide variety of tasks (Ratcliff, 1978; Carpenter and Williams, 1995; Reddi and Carpenter, 2000; Reddi et al., 2003; Smith and Ratcliff, 2004). Furthermore, recent neurophysiological data on decision tasks have provided evidence for accumulation processes in the superior colliculus (Munoz and Wurtz, 1995; Munoz et al., 2000; Ratcliff et al., 2003, 2007; Shen and Paré, 2007), the lateral intraparietal area (LIP) (Roitman and Shadlen, 2002; Leon and Shadlen, 2003), the frontal eye fields (Gold and Shadlen, 2000, 2003), and the prefrontal cortex (Kim and Shadlen, 1999). 
Finally, integration of samples toward a bound is reminiscent of the “sequential probability ratio test” (SPRT) (Wald, 1945; Bogacz et al., 2006), an optimal procedure for making decisions on the basis of information that arrives over time (Wald and Wolfowitz, 1948). Consequently, integrator models are now widely accepted as an explanatory mechanism for decision making in many simple tasks.
However, in most of the studies supporting integrator models, the information presented to subjects varied across trials but was constant during the course of each trial. As described below, in such conditions one cannot distinguish whether the buildup observed in neural activity (and inferred from behavioral data) is caused by temporal integration of sensory information or by a growing signal related to elapsed time. It is therefore conceivable that the neural and behavioral data from constant-information tasks can be explained by an alternative model, in which neural activity is the product of current stimulus information and a growing signal related to the urgency for making a choice (see Fig. 1 B). Although this model, which we call “urgency gating,” behaves very similarly to integrator models in constant-information tasks, it proposes a very different mechanism to explain neural activity and behavior. To distinguish the models, we presented human subjects with a decision task in which the information favoring one choice over another changed during each trial. Some of these results have previously been presented in abstract form (Puskas et al., 2007; Cisek et al., 2008).
Materials and Methods
Modeling formalism
Many variations of integrator models exist, but all may be formalized as follows:

$$x_i(t) = g \int_0^t E_i(\tau)\, d\tau \qquad (1)$$

where x_{i} (t) is a putative neural variable corresponding to choice i, g is a scalar gain term, and E_{i} (τ) is some internal estimate of the sensory information, present at time τ, that constitutes evidence in favor of choice i. The variable x_{i} starts at some initial level x_{i} (0) related to the prior probability that choice i is correct and grows over time at a rate related to the sensory evidence until it hits a threshold T, at which time the subject commits to choice i. All integrator models share this basic set of assumptions. Where they differ is in how they define the evidence function E_{i} (t) and in how they determine the time at which the decision is taken (the “stopping rule”). For example, “independent race” models (Vickers, 1970) suggest that there exist separate variables x_{i} (t) for each option, each of which independently accumulates an estimate E_{i} (t) that is computed solely on the basis of sensory information in favor of that option. Whenever any of these independent processes reaches its threshold, the decision is made in favor of the corresponding option. In contrast, the “diffusion” model (Stone, 1960; Laming, 1968; Ratcliff, 1978; Ratcliff et al., 2003; Smith and Ratcliff, 2004) suggests that there is only a single variable x(t), which accumulates E(t), defined as the difference between the sensory evidence for option A and that for option B. In other words, E(t) = E _{A}(t) − E _{B}(t). The decision is taken when x(t) either grows above a positive threshold +T, selecting option A, or falls below a negative threshold −T, selecting option B. Another variant (Shadlen et al., 1996) defines separate variables x _{A}(t) and x _{B}(t), each of which integrates independent evidence, and uses a stopping rule based on the difference of accumulated totals.
Still another variant, called the “leaky competing accumulator” model (Usher and McClelland, 2001), proposes separate accumulators that mutually inhibit each other. Bogacz et al. (2006) showed that, under reasonable parameter choices, most of these models (with the exception of the independent race model) are functionally equivalent to the diffusion model. Furthermore, the diffusion model is formally equivalent to the SPRT and is very effective at reproducing human behavior. Below, we consider four variations of integrator models, all based on diffusion, using different definitions of what is meant by “sensory evidence.”
In addition to integrator models, we consider a different class of models, which are similar in that there is also a buildup of neural activity toward a threshold, but the buildup is caused by a time-varying gain (Ditterich, 2006), not by temporal integration. We call these urgency-gating models, which can be expressed as follows:

$$x_i(t) = g\, E_i(t)\, u(t) \qquad (2)$$

where u(t) is some function of time that is not related to evidence for any particular choice. This model (Fig. 1 B) suggests that neural activity x_{i} (t) is the product of the momentary evidence E_{i} (t) and a signal u(t), which reflects the growing urgency to make a response (Ditterich, 2006; Churchland et al., 2008). Although this model is very different from the integrator models (Eq. 1), the two can be shown to be mathematically equivalent in the case of constant-evidence tasks, as follows.
First, note that if E_{i} (t) is not a function of time, then it can be replaced by a constant E_{i} within each trial. Because a constant can be moved outside of an integral, Equation 1 can now be rewritten as follows:

$$x_i(t) = g E_i \int_0^t d\tau = g E_i t \qquad (3)$$

Similarly, for the urgency-gating model, we can set E_{i} (t) to a constant and simply set urgency proportional to elapsed time, u(t) = t, and so Equation 2 becomes the following:

$$x_i(t) = g E_i t \qquad (4)$$

To summarize, with the assumptions of constant evidence and linear urgency growth, the integrator and urgency-gating models are equivalent. Therefore, for any task that presents subjects with constant evidence during each trial, the models make very similar predictions about neural activity and behavior. In both, the neural variable grows at a rate proportional to the subject's estimate of the strength of evidence, and reaches some threshold level of activity at the time the decision is made. In Equation 1, the growth is attributable to integration of sensory evidence over time (Fig. 1 A). In Equation 2, it is attributable to multiplication of momentary evidence by a growing urgency to respond (Fig. 1 B). Because both models propose a growth of activity at a rate proportional to evidence, both make similar predictions about how sensory evidence influences neural activity, error rates, and reaction time distributions.
The above derivation of Equation 3 does not address the issue of noise. However, it is clear that noise exists in the sensory signal as well as in the internal processes of sensory transduction and computation of decision variables. The presence of noise is one reason why temporal integration is seen as essential for decision making. However, it is not the only option. A low-pass filter can also deal with noise, without necessarily retaining all properties of pure integration such as a long-lasting memory of past states. Therefore, we propose that the estimate of momentary evidence [E_{i} (t)] in the urgency-gating model is low-pass filtered before it is multiplied by the urgency signal.
In the context of the comparison between temporal integration and urgency gating, it is useful to distinguish two kinds of noise in the decision process: (1) intratrial variability in E_{i} (t), which is attributable to moment-to-moment neural activity fluctuations; and (2) intertrial variability in E_{i} , attributable to variations in levels of arousal, attention, etc., which are different from trial to trial but are relatively constant over the course of each trial. Analyses of reaction time distributions using “reciprobit plots” (Carpenter and Williams, 1995) suggest that the primary cause of variability in reaction times is intertrial variability (differences in arousal/attention) and that intratrial noise does not have a major impact at the behavioral level. This makes sense because during each trial, the brain can average across the uncorrelated activity fluctuations of many thousands of neurons (Shadlen et al., 1996), but it cannot average over changes in underlying baselines that vary between trials. Clearly, the presence of intertrial noise in E_{i} does not affect the derivation of Equation 3 and affects both kinds of models identically. That is not the case for intratrial noise, which results in a time-dependent noise distribution in Equation 3 but a time-independent distribution in Equation 4. Nevertheless, we conjecture that, if intratrial noise is relatively weak (as suggested by reciprobit plot analyses), and reaction times are relatively short, then this difference will be too subtle to distinguish in data from experiments that used constant-evidence tasks.
How then can we more effectively distinguish between the integrator and urgency-gating models? The key to doing so is to use experimental tasks in which the evidence for or against a given choice is changing over the course of an individual trial (Huk and Shadlen, 2005; Kiani et al., 2008). If evidence is changing, then E_{i} (t) is no longer a constant and cannot be taken outside of the integral; Equations 1 and 2 are therefore no longer equivalent and make distinct predictions about both behavioral and neural phenomena. The present experiment is aimed at testing some of these predictions using behavioral data from human subjects.
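The argument above can be illustrated numerically. The following is a minimal sketch, not the simulation code used in this study: it assumes a noise-free evidence signal sampled in 1 ms steps, rectangular integration, and linear urgency u(t) = t. The function names and the example evidence values (0.2, with a sign flip halfway through) are hypothetical and chosen only for illustration.

```python
def integrator(evidence, g=1.0, dt=1.0):
    """Equation 1: gain times the running integral of the evidence."""
    x, total = [], 0.0
    for e in evidence:
        total += e * dt          # rectangular integration step
        x.append(g * total)
    return x


def urgency_gating(evidence, g=1.0, dt=1.0):
    """Equation 2 with u(t) = t: momentary evidence times elapsed time."""
    return [g * e * (k + 1) * dt for k, e in enumerate(evidence)]


# Constant evidence: both models produce the same ramp g * E * t.
constant = [0.2] * 1000
x_int = integrator(constant)
x_urg = urgency_gating(constant)
max_gap = max(abs(a - b) for a, b in zip(x_int, x_urg))

# Changing evidence (hypothetical sign flip at 500 ms): the models diverge.
changing = [0.2] * 500 + [-0.2] * 500
x_int_c = integrator(changing)
x_urg_c = urgency_gating(changing)

print(max_gap)                    # numerically negligible
print(x_int_c[-1], x_urg_c[-1])   # integrator returns to ~0; gated signal is strongly negative
```

With constant evidence the two traces coincide up to rounding error, whereas after the sign flip the integrator still carries the early evidence while the gated signal reflects only the current evidence scaled by elapsed time.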
Experimental design
Twenty-two human subjects (ages, 18–60; 7 males, 15 females; 2 left-handed) performed a reach decision task shown in Figure 2 A. All gave informed consent before the experiment, and the procedure was approved by the University of Montréal ethics committee.
Each trial began with a central circle (2.5 cm diameter) and two target circles (2.5 cm diameter) placed 180° apart at a distance of 5 cm from the center. Within the central circle, 15 small circular tokens were randomly arranged. Subjects began each trial by moving the cursor into the central circle. At this point, the tokens began to jump, one by one every 200 ms (“predecision interval”), from the central circle to one or the other of the targets (50/50 chance). The subject's task was to move the cursor to the target that he/she believed would ultimately receive the majority of the tokens.
Importantly, the subject was allowed to make the decision as soon as he/she felt sufficiently confident. Once the choice was reported by moving the cursor into one of the targets, the remaining tokens jumped more quickly to their final targets. In separate blocks, this “postdecision interval” was set to either 20 ms (in “fast” blocks) or to 170 ms (in “slow” blocks). Subjects were asked to continue each block until they made a total of 70 correct choices, indirectly motivating them to optimize successes per unit time. Thus, the subjects were presented with a tradeoff: either be conservative and wait until all tokens have moved, when the decision can be made with confidence, or guess ahead of time, which is not as reliable but yields potential successes more quickly. For example, if the first five tokens all move into the same target, then the probability that that target is correct is 94.5%, so it is a good idea to take a guess at that point instead of waiting until the remaining tokens have moved. In fast blocks, such a guess would save the subject a total of 1800 ms over the strategy of waiting until the end, and would have only a 5% chance of failing.
Each subject completed four to six blocks, alternating between fast blocks, which encourage hasty behavior, and slow blocks, which encourage more conservative behavior. Subjects were told about the timing parameters for both blocks ahead of time and allowed to establish their own policies for trading off speed versus accuracy. On each trial, subjects were provided with feedback indicating correct or incorrect decisions, but there was no feedback or instruction regarding the timing of their choices.
The design of the task allowed us to calculate, at each moment in time, the “success probability” p_{i} (t) associated with choosing each target i (Fig. 2 B). If at a particular moment in time the right target contains N _{R} tokens, whereas the left contains N _{L} tokens, and there are N _{C} tokens remaining in the center, then the probability that the target on the right will ultimately be the correct one (i.e., the success probability of guessing right) is as follows:

$$p(R \mid N_R, N_L, N_C) = \frac{N_C!}{2^{N_C}} \sum_{k=0}^{\min(N_C,\; 7 - N_L)} \frac{1}{k!\,(N_C - k)!} \qquad (5)$$

As far as the subjects knew, the correct target and the individual token movements were completely random. However, to test specific hypotheses about the dynamics of decision making, we interspersed among the fully random trials four specific classes of trials characterized by particular temporal profiles of success probability. Subjects were not told about the existence of these trials. For example, 15% of trials were so-called “easy” trials, in which tokens tended to move consistently toward one of the targets, quickly driving the success probability p_{i} (t) for each toward either 0 or 1. There were several variations of easy trials, and an example is shown in Figure 2 C, black line. Another 15% of trials were “ambiguous” (Fig. 2 C, gray line), in which the initial token movements were balanced, making the p_{i} (t) function hover near 0.5 until late in the trial. Another 10% of trials were called “bias-for” trials (Fig. 2 D, black line) in which the first three tokens moved to the correct target, the next three toward the opposite target, and the remaining ones resembled an easy trial. Another 10% were called “bias-against” trials (Fig. 2 D, gray line), which were identical to bias-for trials except the first six token movements were reversed (i.e., during the early part of bias-against trials, the bias was toward the wrong target). The remaining 50% of trials were fully randomized.
Thus, the final distribution of trials was as follows: 50% random, 15% easy, 15% ambiguous, 10% bias-for, and 10% bias-against. In all cases, even when the temporal profile of success probability of a trial was predesigned, the actual correct target was randomly selected on each trial. Over the four to six blocks of trials, each subject performed an average of 554 trials. Unless stated otherwise, analyses included both correct and error trials, and success probability was computed with respect to the target chosen by the subject.
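The success probability defined above can be computed by direct counting. The sketch below is illustrative only: the function name and argument names are hypothetical, and it assumes, as in the task description, that each remaining token jumps to either target with probability 0.5 and that the total number of tokens is odd, so ties cannot occur. It counts the outcomes in which at most 7 − N_L of the remaining tokens go left.

```python
from math import comb


def success_probability(n_right, n_left, n_center, total=15):
    """Probability that the right target ends up with the majority of tokens.

    Each of the n_center remaining tokens is assumed to jump left or right
    with probability 0.5, independently of the others.
    """
    if n_right + n_left + n_center != total:
        raise ValueError("token counts must sum to the total number of tokens")
    # Right wins if the left target ends with at most total // 2 tokens,
    # i.e. at most (total // 2 - n_left) of the remaining tokens go left.
    max_left = min(n_center, total // 2 - n_left)
    favorable = sum(comb(n_center, k) for k in range(max_left + 1))
    return favorable / 2 ** n_center


# The example from the task description: first five tokens on one side.
print(success_probability(5, 0, 10))   # 0.9453125
```

The value for five tokens on one side reproduces the 94.5% figure quoted in the task description, and the function returns 0.5 at the start of a trial and whenever the two targets hold equal numbers of tokens.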
Before the 15-token task described above, each subject also performed 20–40 trials of a simple choice reaction time task. This task was identical except there was only one token that moved from the center to one of the targets and the subject was instructed to respond as quickly as possible. We detected the time of movement onset and used that to determine each subject's mean RT. This provided us with an estimate of the sum of the delays attributable to sensory processing of the stimulus display as well as to response initiation, muscle contraction, etc. Then, in the 15-token task, we detected the time of movement onset and subtracted each subject's mean RT (from the choice reaction time task) to estimate the time actually used to make the decision (the “decision time”), as shown in Figure 2 B. We then used Equation 5 to compute the success probability at the time of the decision.
Of course, critical to interpretation of the data from this task is a precise definition of what is meant by sensory evidence. One reasonable definition is the following: “the information currently present within the visual stimulus that indicates which choice is more likely to be correct.” In our task, this evidence is related to the distribution of the tokens at a given moment in time, which determines the probability that one or the other target will ultimately be the correct one. This probability can be explicitly computed for a given choice i using Equation 5 to get p_{i} (t). Although we do not expect that subjects were able to explicitly compute Equation 5, it is conceivable that they were able to construct an approximation (see Results). An alternative definition of sensory evidence in our task is the new information provided by each token movement, in other words, “the change in the stimulus that favors one choice over another.” This can be explicitly computed as dp_{i} (t)/dt, and again we expect that subjects can construct a reasonable approximation.
With a given definition of sensory evidence, we can discuss several variations of integrator and urgency-gating models that use sensory information to arrive at a decision.
Simulations
We compared human behavior to six putative models of decision making: four variations of integrator models and two variations of the urgency-gating model. All of these were presented with the same kinds of trials that were presented to the human subjects (focusing on slow blocks only), and the same analyses were applied to their performance.
Model 1: pure diffusion, integration of currently available sensory information.
In this model, sensory evidence for a choice i is related to the success probability of that choice given the current distribution of tokens, and this quantity is integrated over time without any additional leak. The gain g is set to 1.5 and the term N(t) represents Gaussian noise with mean zero and SD of 3. The threshold was set at ±500. This model is equivalent to several previously described diffusion models (Stone, 1960; Laming, 1968; Ratcliff, 1978; Mazurek et al., 2003), which integrate the information that is present in the stimulus at each moment in time. It is also equivalent to the leaky competing accumulator model of Usher and McClelland (2001), in which the net leakage parameter k is equal to the competition strength parameter β (Bogacz et al., 2006).
Model 2: diffusion with leak, integration of currently available sensory information.
In this model, sensory evidence is defined as above, but there is a leak term (with parameter L = 0.0005, producing a strong leak) in the dynamic equation for x_{i} (t). This model is equivalent to the leaky competing accumulator model in which the leak k is stronger than the competition strength β.
Model 3: diffusion without leak, integration of novel sensory information.
In this model, sensory evidence is defined as the change in the success probability for a given choice. In other words, E_{i} (t) is nonzero only at the moment when a token moves and is zero in between token movements regardless of the current distribution of tokens. The gain g is set to 1.
Model 4: diffusion with leak, integration of novel sensory information.
Like model 3, above, this model also defines evidence as the change in probability, but includes an additional leak term as in model 2. However, the value of L is reduced to 0.0003, since any larger values make it nearly impossible for neural activity to ever reach the threshold.
Model 5: urgency-gating model without filtering.
In this model, unlike models 1–4, the growth of activity is attributable entirely to the urgency term u(t), which for simplicity is here defined simply as elapsed time. The gain g is set to 0.4 and the noise SD is 0.2.
Model 6: urgency-gating model with a low-pass filter.
Clearly, model 5 is very susceptible to noise, especially late in each trial. Thus, we propose a final model, which is similar except that sensory evidence is low-pass filtered before gating by urgency, as follows: This model is similar to model 5, except that sensory evidence is low-pass filtered using a linear differential equation with a time constant of τ = 200 ms. The gain g is set to 3 and the noise SD is 0.7.
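To make the filtering step of model 6 concrete, the following sketch shows one way such a model could be simulated. It is an assumption-laden illustration rather than the authors' implementation: the filter is taken to be the standard first-order form τ dE/dt = −E + input, integrated with 1 ms Euler steps; noise is omitted; and the input signal (0.3, reversing sign at 1000 ms) is hypothetical. Only τ = 200 ms, g = 3, and u(t) = t are taken from the model description.

```python
def low_pass(signal, tau=200.0, dt=1.0):
    """Assumed first-order filter: tau * dE/dt = -E + signal (Euler steps)."""
    e, out = 0.0, []
    for s in signal:
        e += (dt / tau) * (s - e)
        out.append(e)
    return out


def urgency_gated(signal, g=3.0, tau=200.0, dt=1.0):
    """Filtered evidence multiplied by urgency u(t) = t (elapsed ms)."""
    filtered = low_pass(signal, tau, dt)
    return [g * e * (k + 1) * dt for k, e in enumerate(filtered)]


# Hypothetical noise-free input: evidence reverses sign after 1000 ms.
step = [0.3] * 1000 + [-0.3] * 1000
filtered = low_pass(step)
activity = urgency_gated(step)

print(filtered[199])    # roughly 63% of the way to 0.3 after one time constant
print(filtered[999])    # close to 0.3
print(filtered[1999])   # close to -0.3: the earlier evidence has been forgotten
```

Unlike a pure integrator, the filtered estimate tracks the current input within a few time constants, so the gated activity late in a trial carries almost no trace of evidence presented a second earlier.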
Results
Behavioral results
In the choice reaction time task, the mean reaction times of subjects ranged from 214 to 416 ms (mean, 279 ms; SD, 45 ms). The mean RT of each individual subject was used to calculate that subject's decision times in the 15-token task. No significant decision time differences were found for choices made toward the right versus left target.
One of our first questions was whether subjects modify their decision policy as the timing parameters of the task are varied. To address this, we compared each subject's behavior in two conditions, each performed in separate blocks of ∼100 trials. In fast blocks, the interval between token movements was 200 ms before any of the targets was reached, and accelerated to 20 ms afterward. This encouraged guessing because subjects saved a lot of time by taking a chance and could try again almost right away. In slow blocks, the token interval accelerated to 170 ms after a target was reached. This encouraged more conservative behavior because the benefit of choosing early was reduced. As expected, most subjects behaved more hastily in the fast blocks and more conservatively in the slow blocks. This is shown in Figure 3 A for one subject, whose decision times in fast blocks (mean, 1105 ms) were significantly shorter (Kolmogorov–Smirnov test, p < 10^{−25}) than in slow blocks (mean, 1664 ms). Furthermore, the success probability at decision time was lower in fast blocks than in slow blocks (Fig. 3 B) (KS test, p < 10^{−9}). Most subjects (20 of 22) showed significantly faster responses in fast versus slow blocks (Fig. 3 C) and nearly one-half (9 of 22) had lower success probability in fast than in slow blocks (Fig. 3 D). This suggests that, in general, subjects adjusted their guessing policy to trade off speed against accuracy.
Next, we asked whether the specific pattern of token movements observed during a particular trial has an effect on the subject's behavior. To do this, we first focused on comparing behavior in easy versus ambiguous trials. As expected, 19 of 22 subjects made decisions significantly later in ambiguous trials than in easy ones (Fig. 4 A,C) (KS test, p < 0.05). More interesting is that all of them also made decisions at a significantly lower level of success probability in ambiguous trials (Fig. 4 B,D) (KS test, p < 0.05). That is, subjects appeared more willing to guess in ambiguous trials than in easy trials.
Next, we compared behavior during bias-for and bias-against trials (Fig. 2 D), focusing on trials in which decisions were made after the first six token movements. This comparison is of interest because the two classes of models (integrator vs urgency gating) make distinct predictions about the timing of decisions in these trials. In particular, because integrator models retain a “memory,” they suggest that, after the first six token movements, neural activity related to the correct target will be higher in bias-for trials than in bias-against trials, and therefore closer to threshold. Therefore, these models predict faster decision times in bias-for trials than bias-against trials. In contrast, the urgency-gating models do not predict a significant difference.
Surprisingly, in agreement with urgency-gating models, there was no significant difference between decision times in the bias-for and bias-against trials (Fig. 5 A,C). This was found for 21 of 22 subjects (KS test, p > 0.05). The success probability at decision time also was similar in the two kinds of trials (Fig. 5 B,D) for all subjects (KS test, p > 0.05).
Figure 6 summarizes trends in comparing correct trials versus errors during the slow blocks. Across all trials, the mean decision time in error trials was longer than in correct trials for 11 of 22 subjects (KS test, p < 0.05), and the opposite was true for one subject (Fig. 6 A). Figure 6 B–E shows the same analysis restricted to the specific trial types. No consistent trends were found in these restricted analyses, except for the trivial observation that errors in bias-against trials tended to occur before the seventh token movement (Fig. 6 E). More interesting was an analysis of the success rate as a function of the number of tokens that moved before a subject made their decision. Across all trials, the success rate was low for very fast decisions, increased later in the trial, and then decreased again (Fig. 6 F). This was partially attributable to the fact that subjects generally waited for >10 tokens only in trials that were more difficult (for example, the ambiguous trials) and in which success was closer to chance levels. Figure 6 G–J shows the same analysis restricted to the specific trial types. As expected, in all cases the success rate is clearly dependent on the pattern of token movements. For example, in bias-for trials most errors occur between the third and seventh token, and in bias-against trials most errors occur before the seventh token.
Control analyses and simulations
Before interpreting our results with respect to particular decision models, it is important to consider whether subjects could have been using an explicit cognitive strategy to make their decisions. For example, could they have discovered the presence of the special trial types (easy, ambiguous, etc.) and then optimized their behavior to take advantage of this knowledge? We have several reasons to be confident that this was not the case. In particular, the subjects whose data are included here were never told about these trials, and during postexperiment interviews none of them reported detecting any special trials. Furthermore, even if subjects had been explicitly told to look for special trials, it would have been very hard to detect them: First, because these trials were interspersed among 50% of trials that were completely random, each special trial type was relatively rare. Second, there were several variations of each special trial type, making it difficult to memorize any specific pattern. Third, even among the random trials, there are many that may partially resemble segments of the special trials, making it extremely difficult to discover any category boundaries. Fourth, the actual correct target was always randomized, and this was a much more salient piece of information for subjects to think about. Fifth, the token movements were quite fast, making it very hard to keep track of the specific pattern. Again, none of the subjects were told about these trials and none reported finding them when interviewed at the end. Finally, if against these odds a subject had indeed discovered the categories, then that subject's behavior would clearly exhibit optimal strategies. For example, if a subject identified a given trial as bias-for or bias-against, then he/she should always make a decision after exactly seven tokens (i.e., deciding between 1400 and 1600 ms), but this was not observed for any subject (Fig. 5 C).
Assuming that subjects did not use any explicit strategy to make decisions, we examined how their behavior may agree or disagree with several varieties of decision models. To do so, we performed a series of simulations using the six models described in Materials and Methods. We presented each model with 100 repetitions of easy, ambiguous, bias-for, and bias-against trials, and compared their behavior (in terms of decision time and success probability at decision time) to human data.
Like all of the models tested, the diffusion model (model 1) (Fig. 7 A) correctly exhibited faster decisions in easy versus ambiguous trials (KS test, p < 0.01). It also correctly exhibited a lower average success probability in ambiguous versus easy trials (Fig. 7 A, left, green box) (KS test, p < 0.01). However, it incorrectly predicted faster decisions in bias-for versus bias-against trials (Fig. 7 A, right, red box) (KS test, p < 0.01). The reason for this is that, immediately after the first six token movements (time, 1200 ms), the neural activity was higher, and closer to threshold, in bias-for trials than in bias-against trials (Fig. 7 A, shaded region in the fourth panel).
This result may appear counterintuitive. It might appear that a diffusion model (Stone, 1960; Laming, 1968; Ratcliff, 1978; Smith and Ratcliff, 2004; Ratcliff et al., 2007) will predict the same timing in bias-for and bias-against trials because it accumulates the difference in sensory evidence for the two options. However, note that, in bias-for trials, there is no moment in time at which evidence favors the wrong target (i.e., the success probability function never falls below 0.5), and so the difference in evidence is never negative (it favors the correct choice or is neutral). If an integrated quantity is sometimes positive and never negative, then the final result of the integration will be greater than zero. In contrast, for the first six tokens of bias-against trials, the evidence never favors the correct target, so the difference in evidence is never positive, and therefore the result of integration after six tokens will be less than zero. Therefore, any model that integrates differences in currently available sensory information will predict earlier responses in bias-for than in bias-against trials. Note that all of our simulations include substantial intratrial noise (as opposed to intertrial noise) to demonstrate that our conclusions hold even if that is the only source of noise in the system. It is trivial to show that, if intertrial noise is the major source of variability, then our conclusions will only be stronger.
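The sign argument above can be checked with a short calculation. This is a noise-free sketch under stated assumptions, not the simulation reported in Figure 7: the evidence difference is taken to be p_correct − p_wrong = 2p − 1, evidence is held constant for 200 ms after each token jump, and the function names are hypothetical.

```python
from math import comb


def p_correct(n_correct, n_wrong, n_center, total=15):
    """Probability that the correct target ends with the majority of tokens."""
    max_wrong = min(n_center, total // 2 - n_wrong)
    return sum(comb(n_center, k) for k in range(max_wrong + 1)) / 2 ** n_center


def evidence_profile(jumps, total=15):
    """Assumed evidence difference 2p - 1 after each jump (+1 = toward correct)."""
    n_c = n_w = 0
    profile = []
    for jump in jumps:
        if jump > 0:
            n_c += 1
        else:
            n_w += 1
        p = p_correct(n_c, n_w, total - n_c - n_w, total)
        profile.append(2 * p - 1)
    return profile


bias_for = [+1, +1, +1, -1, -1, -1]       # first six jumps of a bias-for trial
bias_against = [-1, -1, -1, +1, +1, +1]   # first six jumps of a bias-against trial
step_ms = 200                             # predecision interval between jumps

ev_for = evidence_profile(bias_for)
ev_against = evidence_profile(bias_against)
integral_for = sum(ev_for) * step_ms
integral_against = sum(ev_against) * step_ms

print(integral_for, integral_against)   # positive vs negative accumulated evidence
print(ev_for[-1], ev_against[-1])       # momentary evidence is 0.0 in both after six tokens
```

After six tokens the accumulated totals have opposite signs, whereas the momentary evidence is identical (three tokens in each target), which is why only models with memory predict a timing difference between the two trial types.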
Next, we investigated whether the addition of a large leakage term to the diffusion model (i.e., yielding model 2) may reduce the difference in behavior between biasfor and biasagainst trials. In this view, by the time the decision is made, differences between the accumulated activities in the early part of biasfor and biasagainst trials would have decayed away, and behavior would be similar in both kinds of trials. However, even a very strong leak (so strong that the model had difficulty ever reaching the decision threshold) did not eliminate the difference between the trials (Fig. 7 B, right, red box). Nevertheless, we examined our human data to see whether a strong leak could explain it. In particular, we looked at decision times from a subset of biasfor and biasagainst trials in which a subject made the decision within 400 ms after the sixth token movement. These early decisions might still retain some bias that has not yet decayed away. However, we did not see any significant differences between decision times for biasfor versus biasagainst trials in any of the 16 subjects who made enough of these fast choices to make the comparison possible (mean difference, −9 ms; SD, 68 ms). When we limited the analysis to the first 200 ms after the sixth token movement (six subjects made at least a few of these very early choices), we again saw no significant differences (mean difference, −3 ms; SD, 23 ms). Thus, even a very strong leak term cannot explain our results.
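The effect of a leak can be sketched as follows. This is a simplified, noise-free discrete leaky integrator with no decision threshold, not the simulated model 2 itself. The evidence values are hypothetical, and only their signs and mirror symmetry matter.

```python
def leaky_integrate(evidence, leak):
    """Discrete leaky integrator: x <- (1 - leak) * x + e at each token step.
    leak = 0 is a perfect integrator; leak close to 1 forgets almost everything."""
    x = 0.0
    for e in evidence:
        x = (1.0 - leak) * x + e
    return x


# Illustrative signed-evidence profiles for the first six token movements
# (hypothetical values, chosen to be mirror images of each other).
EVIDENCE_FOR = [0.1, 0.2, 0.3, 0.2, 0.1, 0.0]
EVIDENCE_AGAINST = [-e for e in EVIDENCE_FOR]

for leak in (0.0, 0.5, 0.9):
    gap = leaky_integrate(EVIDENCE_FOR, leak) - leaky_integrate(EVIDENCE_AGAINST, leak)
    print(f"leak={leak:.1f}  activity gap after six tokens = {gap:.4f}")
```

In this sketch a stronger leak shrinks the gap between the two trial types but does not remove it for any leak below 1, so some bias would remain for decisions made shortly after the sixth token movement.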
Is it possible to explain our results if we postulate that the integrators can get “reset”? In other words, suppose that subjects can recognize the condition of complete ambiguity (when there are three tokens in each target) as a special case and reset their neural integrators back to baseline. Because the biasfor and biasagainst trials are identical beyond that point, behavior would be identical as well. To evaluate this possibility, we looked among the random trials for variations of biasfor and biasagainst trials in which the first few token movements were not three and three. Specifically, we identified “biasupdown” trials as ones in which the first three tokens moved in the correct direction and then one moved in the opposite direction, and “biasdownup” trials as those in which the first token moved in the wrong direction and the next three in the correct direction. We also constrained the trials such that the profile of success probability was approximately the same for the remainder of the trial. Critically, as shown in Figure 5 E, success probability in biasupdown trials never reached the critical value of 0.5 that could potentially trigger a reset of the integrators. Nevertheless, the reaction time distributions of biasupdown trials were not faster than those of biasdownup trials (Fig. 5 F).
In summary, no model that temporally integrates currently available sensory information can explain our results. Therefore, we next considered models that do not integrate the currently available evidence but instead integrate new information provided by novel sensory events (i.e., token movements, or the change in sensory information). Indeed, a diffusion model in which the change in sensory information was being integrated (model 3) did correctly produce similar decision times in biasfor versus biasagainst trials (Fig. 7 C, right, green box) (KS test, p > 0.1). However, it in turn failed to reproduce the result shown in Figure 4 B, because its success probability was always the same in ambiguous and easy trials (Fig. 7 C, left, red box) (KS test, p > 0.1). The reason for this is straightforward: if the neural activity is an integral of the derivative of some quantity, then the activity simply tracks that quantity, and a constant threshold on the activity is always crossed at the same value of the quantity.
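The relation between an integrated derivative and the underlying quantity can be written explicitly. In the lines below, x(t) denotes the noise-free neural activity, E(t) the evidence signal, and T a constant threshold. The symbol x and the zero starting time are notational assumptions made for this illustration.

```latex
x(t) = \int_{0}^{t} \frac{dE(\tau)}{d\tau}\, d\tau = E(t) - E(0),
\qquad
x(t) = T \;\Longleftrightarrow\; E(t) = T + E(0)
```

Under these assumptions the threshold is crossed at the same evidence level in every trial, regardless of how quickly that level is reached.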
In contrast, both versions of the urgency-gating model reproduced these critical results. The model without a low-pass filter (model 5) correctly reproduced the lower success probability in ambiguous versus easy trials (Fig. 7 D, left, green box) (KS test, p < 0.01) as well as similar decision times in biasfor and biasagainst trials (Fig. 7 D, right, green box) (KS test, p > 0.1). However, it is highly susceptible to noise, as is evident from plots of example neural activity patterns. As discussed above, the brain can overcome such noise on each trial by averaging over a large number of neurons with uncorrelated fluctuations of activity (Shadlen et al., 1996). Such a process may be approximated by the addition of a low-pass filter to the model, which effectively reduces the gain of intratrial noise. As shown in Figure 7 F, the addition of a low-pass filter does not appreciably change the behavioral results, and the model still correctly produces lower success probability in ambiguous versus easy trials as well as similar decision times in biasfor and biasagainst trials.
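The noise-reducing role of the low-pass filter can be illustrated with a minimal sketch. It assumes a first-order filter, Gaussian intratrial noise, a constant evidence level, and an urgency signal that grows linearly with time. All parameter values and function names are hypothetical and are not those of models 5 and 6.

```python
import random


def low_pass(signal, alpha):
    """First-order low-pass filter: y <- y + alpha * (s - y)."""
    y, out = 0.0, []
    for s in signal:
        y += alpha * (s - y)
        out.append(y)
    return out


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def urgency_gated_activity(evidence, gain=1.0, dt=0.001):
    """Activity = gain * evidence estimate * urgency, with urgency u(t) = t
    (a linearly growing urgency is an assumption of this sketch)."""
    return [gain * e * (i + 1) * dt for i, e in enumerate(evidence)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_evidence = 0.2     # hypothetical constant evidence level
noisy = [true_evidence + rng.gauss(0.0, 1.0) for _ in range(20000)]
filtered = low_pass(noisy, alpha=0.05)

raw_activity = urgency_gated_activity(noisy)
smooth_activity = urgency_gated_activity(filtered)

# Compare fluctuations of the evidence estimate once the filter has settled.
print(variance(noisy[1000:]), variance(filtered[1000:]))
```

Because the urgency term multiplies whatever evidence estimate it receives, any reduction in the fluctuations of that estimate carries over directly to the gated activity.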
Note that, if the model that integrates novel sensory information (model 3) is gated by a growing urgency function, then it will effectively become a version of the urgency-gating model (model 6). This is because integrating the derivative of some signal with an integrator that has a short time constant is similar to low-pass filtering that signal. Thus, we can conclude that our results support models in which an urgency signal multiplicatively gates a filtered estimate of current evidence, and that one way to compute that estimate is through relatively fast integration of novel sensory events.
Decreasing accuracy criterion
If we suppose that the urgency-gating model accurately describes the process underlying decision making in our task, then we can make an additional prediction about the level of confidence at which subjects will be making decisions across all trials, not just the special trials emphasized in Figures 4 and 5. In particular, if we make an educated guess about the E(t) function that is actually used by our subjects, then we can predict that its value at the time of the decision should decrease as a function of decision time. The reason can be seen by setting Equation 2 equal to a constant threshold T and solving for E_i(t) as follows:

$$E_i(t) = \frac{T}{g \cdot u(t)} \tag{21}$$

so that, for a growing urgency signal u(t), the value of E_i(t) required to reach the threshold decreases over time.

Of course, we cannot truly know the exact form of the E(t) function used by our subjects. It would be difficult to believe that subjects can precisely calculate Equation 5, but we can expect that they can make a reasonable estimate. For example, a simple “first-order” estimate of sensory evidence is the sum of log-likelihood ratios (SumLogLR) of individual token movements as follows:

$$E_S(t) = \sum_{j=1}^{n} \log \frac{p(e_j \mid S)}{p(e_j \mid U)} \tag{22}$$

where n is the number of token movements that have occurred by time t, p(e_j | S) is the likelihood of a token event e_j during trials in which the selected target is correct, and p(e_j | U) is its likelihood during trials in which the unselected target is correct. Although this may at first appear complex, it simply amounts to counting the number of tokens that move in each direction. This expression for E_S(t) can then be used to estimate the posterior probability of target S being correct using the following relationship:

$$p(t) = \frac{e^{E_S(t)}}{1 + e^{E_S(t)}} \tag{23}$$

Strictly speaking, this estimate of probability is wrong. It ignores the conditional probability between sequential token movements and the correct response: that is, the likelihood p(e_1, e_2 | S) is not simply the product of p(e_1 | S) and p(e_2 | S) because e_1 and e_2 are conditionally dependent. To compute an accurate estimate of p(t), Equation 23 would have to take that conditional probability into account.
Nevertheless, this first-order estimate actually does quite well for the first 10 token movements: for those first 10 tokens, the estimate provided by Equation 23 linearly correlates with the real success probability computed using Equation 5 with a slope of 0.82 and an R^{2} of 0.99 (p < 0.001).
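The token-counting character of the SumLogLR estimate can be made concrete with a short sketch. It assumes a single fixed per-token likelihood `Q` for a token moving toward the selected target, natural logarithms, and equal prior probability for the two targets. The value of `Q` and the function names are hypothetical and are not taken from the study.

```python
from math import exp, log

Q = 0.6  # hypothetical likelihood p(e_j | S) of a token moving toward target S


def sum_log_lr(n_selected, n_unselected, q=Q):
    """Sum of per-token log-likelihood ratios. With a fixed q it reduces to the
    token-count difference times a constant weight."""
    return (n_selected - n_unselected) * log(q / (1.0 - q))


def posterior_estimate(n_selected, n_unselected, q=Q):
    """First-order estimate of the probability that the selected target is correct,
    obtained by passing the summed log-likelihood ratio through a logistic function."""
    e = sum_log_lr(n_selected, n_unselected, q)
    return exp(e) / (1.0 + exp(e))


for n_s, n_u in [(3, 3), (3, 2), (4, 1), (1, 4)]:
    print(n_s, n_u, round(posterior_estimate(n_s, n_u), 4))
```

The estimate depends only on the difference in token counts, which is what makes it a first-order approximation: it ignores how many tokens remain and therefore the dependence between successive movements.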
With a reasonable estimate of how subjects may compute E(t), we set out to test whether the value reached by this quantity at decision time decreases as a function of time, as predicted by the urgency-gating model. To do so, we grouped trials according to the number of tokens that moved before the decision time and calculated the value of E(t) for the selected target at the time of the decision. The result for one subject is shown in Figure 8 A. A simple linear regression through the data shows a significant fit (p < 10^{−10}) with a negative slope. A significant regression was found for 16 of 22 subjects (Fig. 8 B), and 15 of these had a negative slope. In summary, there was a trend for later decisions to be made at a lower level of E(t) than decisions made early in the trial, consistent with the predictions of the urgency-gating model. Furthermore, the slope tended to be shallower (Fig. 8 C) (paired t test, p < 0.01) and the y-intercept lower (Fig. 8 D) (paired t test, p < 0.01) during the fast blocks than the slow blocks, suggesting that, on average, the urgency signal follows a different time course during the two blocks and thus controls the tradeoff between speed and accuracy of decisions. This finding further predicts that the level of confidence that subjects have in the decisions they make should decrease with longer decision times.
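The predicted pattern can be sketched under strong simplifying assumptions: activity equals gain times evidence times urgency, urgency grows linearly with the number of tokens moved, and fast blocks differ from slow blocks only by a steeper urgency slope. The slopes, the threshold, and the function names are hypothetical. The sketch ignores noise and is not a fit to the data.

```python
def evidence_at_threshold(n_tokens, urgency_slope, threshold=1.0, gain=1.0):
    """Evidence needed to reach threshold when activity = gain * E * u
    and u = urgency_slope * n_tokens (linear urgency is an assumption)."""
    return threshold / (gain * urgency_slope * n_tokens)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


tokens = list(range(1, 11))
slow = [evidence_at_threshold(n, urgency_slope=1.0) for n in tokens]  # hypothetical slow block
fast = [evidence_at_threshold(n, urgency_slope=2.0) for n in tokens]  # hypothetical fast block

slow_slope, slow_intercept = linear_fit(tokens, slow)
fast_slope, fast_intercept = linear_fit(tokens, fast)
print(slow_slope, slow_intercept)
print(fast_slope, fast_intercept)
```

In this toy setting both regressions have negative slopes, and the steeper urgency produces a shallower slope and a lower intercept, the same qualitative pattern reported for fast versus slow blocks.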
Discussion
Several models propose that simple decisions involve the temporal integration of sequential sensory samples until a threshold is reached (Stone, 1960; Laming, 1968; Ratcliff, 1978; Carpenter and Williams, 1995; Usher and McClelland, 2001; Wang, 2002; Mazurek et al., 2003; Grossberg and Pilly, 2008). These models explain error rates and reaction time distributions, and are supported by neurophysiological studies showing buildup activity in a number of brain structures during decision-making tasks (Munoz and Wurtz, 1995; Kim and Shadlen, 1999; Gold and Shadlen, 2000; Roitman and Shadlen, 2002; Ratcliff et al., 2003, 2007; Shen and Paré, 2007).
The present study, however, prompts us to reconsider two aspects of these models. First, we propose that evidence for a given choice should not be computed by temporally integrating the information currently present in the stimulus. Instead, it should involve either summation of only new information (provided by a change in the state of the stimulus) or be a low-pass-filtered signal related to the state of the sensory information. This is consistent with studies suggesting that the time window of integration for perceptual decision making is on the order of 100 ms (Ludwig et al., 2005; Ghose, 2006). Second, we suggest that the long buildup of neural activity in constant-evidence tasks is not caused by an integration process but is primarily attributable to a growing urgency signal that is unrelated to any particular choice.
An influential argument in favor of integrator models arises from their similarity to the SPRT (Wald, 1945), a statistical test for deciding whether current evidence for a given hypothesis is sufficient to ensure a criterion level of accuracy. Because the SPRT can be performed through summation of independent pieces of evidence and comparison to a threshold, it has been suggested that it is effectively implemented by integrator models (Bogacz et al., 2006; Bogacz and Gurney, 2007; Gold and Shadlen, 2007). However, there is a difference between how probability is calculated and how it is traded off against time. The SPRT is optimal in the sense of producing the best accuracy after a given time, or requiring the fewest number of samples to reach a given accuracy (Wald and Wolfowitz, 1948), but it does not implement a tradeoff between time and accuracy. Animals cannot afford to have a fixed threshold of accuracy but must be willing to tolerate lower success rates to reduce the time spent in making a decision (Chittka et al., 2009). An accuracy criterion that decreases over time accomplishes this and can be implemented through a multiplication of evidence by a growing urgency signal and comparison with a constant neural threshold.
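The contrast between a fixed accuracy criterion and one that relaxes with time can be sketched as follows. The first function is a textbook SPRT for binary samples using Wald's approximate thresholds. The second multiplies the same accumulated evidence by a linearly growing urgency before comparing it with a constant threshold. The hypotheses, error rates, threshold value, and the linear urgency are all illustrative assumptions.

```python
from math import log


def step_llr(sample, p1, p0):
    """Log-likelihood ratio contributed by one binary sample."""
    return log(p1 / p0) if sample else log((1 - p1) / (1 - p0))


def sprt(samples, p1=0.7, p0=0.3, alpha=0.05, beta=0.05):
    """Sequential probability ratio test with Wald's approximate thresholds.
    The decision criterion is fixed for the whole sequence."""
    upper = log((1 - beta) / alpha)
    lower = log(beta / (1 - alpha))
    llr = 0.0
    for n, sample in enumerate(samples, start=1):
        llr += step_llr(sample, p1, p0)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(samples)


def urgency_gated(samples, threshold, p1=0.7, p0=0.3):
    """Same evidence, multiplied by urgency u(n) = n and compared with a constant
    threshold, so the evidence required falls as threshold / n."""
    llr = 0.0
    for n, sample in enumerate(samples, start=1):
        llr += step_llr(sample, p1, p0)
        if abs(llr) * n >= threshold:
            return ("H1" if llr > 0 else "H0"), n
    return "undecided", len(samples)


sequence = [1, 0, 1, 1, 0, 1, 1]
print(sprt(sequence))                # fixed criterion
print(urgency_gated(sequence, 6.0))  # criterion relaxes with elapsed samples
```

On this weakly informative sequence the fixed-criterion test never commits, whereas the urgency-gated rule accepts a lower level of evidence as samples accumulate and commits earlier.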
Furthermore, the similarity between the diffusion model and the SPRT only holds if the sequential samples of information are statistically independent (Bogacz et al., 2006), which is not the case in most tasks that have been studied. In particular, if a sample is already predicted by previous samples, then it should be ignored. In other words, if integration takes place, then it should be integration of novel sensory information, and not of the information present in the stimulus at a given time. Because in constant-evidence tasks novel information only arrives at cue presentation, the integration should result in a step function of neural activity (or a fast saturation in the case of noisy information). It should not resemble the long growth of activity observed in neural studies and inferred from reaction time distributions. That growth, we propose, is primarily attributable to an urgency signal that implements a tradeoff between speed and accuracy.
There is already widespread evidence for buildup signals in many brain regions and many experimental paradigms. For example, neural activity related to elapsed time has been reported in prefrontal cortex during duration reproduction (Jech et al., 2005), and in LIP during time interval (Leon and Shadlen, 2003) and motion discrimination (Churchland et al., 2008). During instructed delays with different possible durations, as each of the likely GO signal times approaches there is a buildup of neural activity in LIP during saccade tasks (Janssen and Shadlen, 2005), and in motor cortex during reaching tasks (Renoult et al., 2006), with corresponding changes of corticospinal excitability (van Elswijk et al., 2007). More generally, buildup activity has been reported in a variety of brain regions even during motor tasks that do not involve any decisions (Hanes and Schall, 1996; Munoz et al., 2000; Ivry and Spencer, 2004; Roesch and Olson, 2005; Tanaka, 2007; Thomas and Paré, 2007; Lebedev et al., 2008). It is therefore reasonable to suggest that such buildup can influence decision-making processes, which appear to involve the same structures that are involved in sensorimotor control (Glimcher, 2003; Romo et al., 2004; Cisek and Kalaska, 2005; Gold and Shadlen, 2007).
We cannot know whether our conclusions can be applied to other studies. It is likely that decision-making mechanisms are at least partially task dependent (Ghose, 2006). Nevertheless, urgency gating provides a simple potential explanation for data from the well-known motion discrimination tasks. During these tasks, neurons in the middle temporal area (MT) reflect motion evidence very rapidly, equilibrating to a relatively steady coherence-related firing rate within 150–200 ms (Britten et al., 1992). This is compatible with a temporal filter model (Ludwig et al., 2005) in which MT activity is a filtered version of the noisy motion signals arriving from earlier visual areas. During a “fixed duration” (FD) version of the task (Fig. 9 A), in which the monkey must report its decision only after an external GO signal, LIP activity equilibrates at a coherence-dependent firing rate in ∼300 ms, suggesting that no additional integration of the MT signal takes place (Roitman and Shadlen, 2002) (see also Shadlen and Newsome, 2001). However, during an RT version (Fig. 9 B), in which the monkey can report its decision at any time, LIP activity continues to grow for as long as 800 ms (Roitman and Shadlen, 2002). Furthermore, the apparent threshold of LIP activity for saccade initiation in the RT task (60–70 Hz) is higher than the activity at which the same cells equilibrate in the FD task (<50 Hz), despite the fact that performance in both conditions is comparable. What can explain this task dependence of LIP activity?
Previous models have suggested that, during the FD task, the LIP signal saturates because there is a large leak term (Grossberg and Pilly, 2008) or because a threshold is crossed and activity stops at its current level after a delay (Mazurek et al., 2003). The urgency-gating model suggests a straightforward alternative: In the RT task, monkeys are allowed to control the tradeoff between speed and accuracy, and do so through a growing urgency signal. This, multiplied by the coherence-dependent MT input, produces LIP activity that exhibits a coherence-dependent growth rate. In the FD task, monkeys must ensure that LIP activity does not reach the saccade threshold prematurely, so the urgency signal is low and constant until the GO signal. This, multiplied by the MT input, would yield coherence-dependent but relatively constant LIP activity, as observed.
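This account can be sketched with a toy calculation. It treats MT input as a first-order filtered response to a coherence step and LIP activity as that input multiplied by an urgency term that either grows linearly (RT task) or stays constant (FD task). The time constant, urgency parameters, units, and function names are hypothetical and are chosen only to show the qualitative difference.

```python
from math import exp


def mt_response(coherence, t, tau=0.05):
    """Filtered MT-like input that settles at a coherence-dependent level
    (tau, in seconds, is an assumed time constant)."""
    return coherence * (1.0 - exp(-t / tau))


def rt_urgency(t):
    """Hypothetical urgency that grows linearly with time in the RT task."""
    return 1.0 + 4.0 * t


def fd_urgency(t):
    """Hypothetical low, constant urgency before the GO signal in the FD task."""
    return 1.0


def lip_activity(coherence, t, urgency):
    """LIP-like activity as MT input multiplied by the urgency signal."""
    return mt_response(coherence, t) * urgency(t)


for coherence in (0.1, 0.5):
    rt = [round(lip_activity(coherence, t, rt_urgency), 3) for t in (0.2, 0.4, 0.8)]
    fd = [round(lip_activity(coherence, t, fd_urgency), 3) for t in (0.2, 0.4, 0.8)]
    print(f"coherence={coherence}: RT {rt}  FD {fd}")
```

With constant urgency the activity settles at a coherence-dependent plateau, whereas with growing urgency the same input yields activity that keeps rising at a coherence-dependent rate.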
Recently, Ditterich (2006) showed that the behavioral and neural data from the RT task can only be explained by models that include a gain that grows linearly with time. Importantly, he showed that, as long as there is a growing gain, then it is not strictly necessary for the model to involve any type of temporal integration. His analysis is therefore compatible with the urgency-gating model. Although Ditterich favored the presence of integration on the grounds of signal-to-noise considerations, those considerations can also be met if MT signals are low-pass filtered before arriving in LIP, as in model 6.
Nevertheless, we cannot claim that urgency gating can explain the variety of results observed during the motion discrimination tasks. For example, Kiani et al. (2008) presented brief motion pulses to monkeys performing the FD task, finding a short window for motion integration (∼300–350 ms). However, Huk and Shadlen (2005) presented similar motion pulses during the RT task and obtained a much longer window. It remains unclear why such different results were obtained in these two studies.
These caveats aside, our data prompt a reconsideration of the central concept of many recent models of decision making: buildup of activity through temporal integration. Summation of evidence from sequential samples is a good way to estimate the posterior probability of success for a given choice, and the brain may indeed use something like that when it is appropriate [e.g., in the present task or the task of Yang and Shadlen (2007)]. However, such a process should only sum novel information and therefore may not be responsible for the long-duration growth of neural activity observed in many neurophysiological experiments and inferred from behavioral data in constant-evidence tasks. Instead, we propose that this growth of activity is primarily attributable to multiplication of sensory evidence (which may be computed through a low-pass filter or through integration of the change in the stimulus) with a motor initiation-related buildup signal. In this view, what determines the timing of actions is not the termination of pure decision-making processes followed by movement preparation and execution. Instead, the strength of evidence for a given choice is combined with a motor signal related to the urgency to make a choice, and it is the two together that turn decisions into action.
Footnotes

This study was supported by the Natural Sciences and Engineering Research Council of Canada, the EJLB Foundation, the Faculty of Medicine summer student program (Université de Montréal), and an infrastructure grant from Fonds de la Recherche en Santé du Québec. We thank Drs. Yoshua Bengio, Andrea Green, John Kalaska, Galen Pickard, and David Thura, and two anonymous reviewers for helpful comments on this work.
Correspondence should be addressed to Dr. Paul Cisek, Département de Physiologie, Université de Montréal, C.P. 6128 Succursale Centre-ville, Montréal, QC H3C 3J7, Canada. paul.cisek{at}umontreal.ca