Figure 2. The optimal behavior by dynamic programming, and diffusion model implementation. *A*, Finding the optimal behavior requires trading off the expected reward for immediate decisions against the cost of, and higher expected reward for, later decisions. For a fixed time *t* after stimulus onset, and assuming a reward of 1 for correct decisions and 0 for incorrect decisions, the figure shows the total expected future costs and rewards [that is, the expected return *V*(*g*,*t*)] from time *t* onward, for different beliefs *g* and actions of the decision maker. The green line represents the expected reward, max{*g*,1 − *g*}, for deciding immediately and corresponds to the belief that *H*_{1} (right half of graph) or *H*_{2} (left half of graph) is the correct choice. If the decision maker instead accumulates more evidence, her expected return (that is, confidence), 〈*V*(*g̃*, *t* + δ*t*) | *g*, *t*〉_{g̃}, which takes future rewards and costs into account, will increase (red line). However, accumulating more evidence also comes at an immediate cost of *c*(*t*)δ*t*, which reduces her expected return (orange line). The optimal strategy is to choose the action that maximizes the expected return, such that the decision maker ought to accumulate more evidence as long as the associated expected return (orange line) exceeds the expected return for deciding immediately (green line). This partitions the belief space into three parts: as long as the decision maker's belief is between 1 − *g*_{θ}(*t*) and *g*_{θ}(*t*) (orange line above green line), the decision maker accumulates more evidence. Otherwise (green line above orange line), *H*_{1} is chosen if *g* ≥ *g*_{θ}(*t*), and *H*_{2} if *g* ≤ 1 − *g*_{θ}(*t*). *g*_{θ}(*t*) changes over time because (1) the cost function might change over time, and (2) the expected return for accumulating more evidence depends on how the decision maker expects to trade off costs and rewards in the future, for times after *t*.
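The comparison described in *A* can be summarized in one recursion. The block below is a sketch read off from the caption's wording (reward 1 for correct and 0 for incorrect decisions, accumulation cost *c*(*t*)δ*t*), not a formula quoted from the text; it omits any terms the caption does not mention.

```latex
% Expected return: the better of deciding now (green line)
% and accumulating evidence for another \delta t (orange line).
V(g,t) = \max\Big\{
    \underbrace{\max\{g,\, 1-g\}}_{\text{decide immediately}},\;
    \underbrace{\big\langle V(\tilde{g},\, t+\delta t) \,\big|\, g, t \big\rangle_{\tilde{g}}
        - c(t)\,\delta t}_{\text{accumulate more evidence}}
\Big\}
```

Under this reading, the bound *g*_{θ}(*t*) is the belief at which the two arguments of the outer maximum are equal.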
*B*, The optimal behavior can be implemented by a diffusion model with time-varying boundaries {−θ(*t*),θ(*t*)} in particle space, corresponding to the bounds {1 − *g*_{θ}(*t*), *g*_{θ}(*t*)} in belief space. The particle location *x*(*t*) is determined by integration of momentary evidence δ*x*. As soon as the particle hits the upper bound θ(*t*) [lower bound, −θ(*t*)], *H*_{1} (*H*_{2}) is chosen. Five particle trajectories with fixed drift μ are shown, three of which (shown in yellow) lead to the choice of *H*_{1}, which is correct for this drift.
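The mechanism in *B* can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the procedure used to generate the figure: the bound shape `collapsing_bound`, the unit-variance noise, the step size `dt`, the time limit `t_max`, and the drift value are all hypothetical choices, since the caption specifies none of them.

```python
import numpy as np


def collapsing_bound(t, theta0=1.0, tau=1.0):
    """Hypothetical bound shape theta(t); the caption does not specify one."""
    return theta0 / (1.0 + t / tau)


def simulate_trial(mu, bound=collapsing_bound, dt=1e-3, t_max=5.0, rng=None):
    """Integrate momentary evidence until the particle hits +/- theta(t).

    Returns (choice, decision_time), where choice is 1 for H1 (upper bound),
    2 for H2 (lower bound), and 0 if no bound is reached before t_max.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = 0.0, 0.0
    while t < t_max:
        # Momentary evidence: drift plus unit-variance diffusion noise (assumed).
        x += mu * dt + np.sqrt(dt) * rng.standard_normal()
        t += dt
        theta = bound(t)
        if x >= theta:
            return 1, t
        if x <= -theta:
            return 2, t
    return 0, t_max


# Five trajectories with a fixed positive drift, for which H1 is the correct choice.
rng = np.random.default_rng(0)
trials = [simulate_trial(mu=1.0, rng=rng) for _ in range(5)]
for choice, decision_time in trials:
    print(choice, round(decision_time, 3))
```

Passing a different `bound` function changes θ(*t*) without touching the integration loop, which mirrors how the caption separates the bound from the evidence accumulation.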