Introduction

Choosing between two or more available options is an activity extended in time and during that time, decisions evolve. The empirical literature on decision making distinguishes between perceptual and value-based decisions. Perceptual decisions (e.g., “Did I just smell coffee?”) typically become more accurate the longer they take, up to some threshold (for reviews, see1,2,3) and approximate optimality4,5,6. Value-based decisions (e.g., “Would I like a cup of coffee?”), in contrast, exhibit a number of well-known idiosyncrasies7,8,9,10,11. To date, these effects have, in the main, been explained by theories concerned with the distribution of outcomes based on utility12,13. Recently, however, there has been a shift towards the development of models, some informed by perceptual choice models, that attempt to characterize processes occurring during decision making14,15,16,17,18,19,20,21,22,23,24. Evidence of these processes has been derived from patterns of accuracy and response time, but there has been relatively little research on the behavioral correlates of these processes. The current paper reports an investigation of sub-decision dynamics and a new visualisation method to facilitate theoretical development in this area.

Important features of decision dynamics can be unveiled by tracking the motor execution that accompanies it. Specifically, researchers have examined characteristics of response trajectories (the path from initiation of a response to its completion) for decisions under varying conditions. Studies have employed a variety of response equipment: the computer mouse25,26,27,28,29,30,31, the Nintendo Wii remote32,33 and a motion-tracking system fixed to digits34 (see35,36,37 for reviews). For example, Dale et al.26 found that participants showed slower and longer response trajectories when they had to choose between “MAMMAL” and “FISH” to categorize “sea lion” than to categorize “salmon” or “lion”. In social categorization, participants take longer and show more divergent trajectories when categorizing relatively androgynous faces (with ambiguous sex features) as “MALE” or “FEMALE,” than when categorizing more typical male or female faces29. In these cases, categorizing more typical exemplars was easier (faster and more direct) than categorizing exemplars that were atypical or more similar to the incorrect category, a characteristic clearly observable from the response trajectories. These findings are not limited to categorization. Spivey et al.25 demonstrated that phonological competition between available choices was observed in response trajectories. When asked to “Click the Candle”, participants' responses were slower and less direct when required to choose between a picture of a candle and a candy, than when asked to choose between a picture of a candle and a pickle.

In line with some models of perceptual decision making, the foregoing studies suggest that early in a decision, distributed neural representations may be partially consistent with multiple outcomes35,38,39. As time elapses, information accumulates until these unstable patterns of neural activation dynamically evolve into more stable patterns associated with one of the available outcomes. Competition between outcomes increases the duration of the early unstable phase during which multiple responses remain possible. This gives rise to deflection in response trajectories from the shortest direct path. For example, in a study of attitudes, Wojnowicz et al.39 found that participants responded with greater deflection when responding in opposition to social stereotypes. In addition, in these trials, velocity exhibited early instability followed by a compensatory increase in velocity, which was predicted by a computational model of decision making40,19. Indeed, the gradual accumulation of information (sequential sampling) that may underlie the action dynamics effect is a common feature of many current decision making models41.

The foregoing literature suggests that great insights may be gained by tracking decisions as they unfold. To date, researchers have investigated neural activity during decision making in an attempt to characterize the evolution of perceptual42,43 and value-based44,45,46 decisions, but few studies have investigated changes in behavior during value-based decisions47,48,12. Koop and Johnson examined response trajectories during a version of the Iowa Gambling Task and found that response trajectories towards Good decks (rewards greater than losses) gradually became more direct across blocks of decisions, whereas response trajectories towards Bad decks (losses greater than rewards) did not.

One way to conceptualize how we make decisions is to consider the available outcomes as attractors in a decision space49,35. We make a decision when our behavior reaches the vicinity of one of these attractors. Killeen49 provided an account of conditioning as basic physics within which he proposed that behavior could be understood as movement in a behavior space in which incentives or reinforcers lie at centers of basins of lowered potential. Responses that are reinforced become more probable and faster over time as the trajectories in behavior space are warped through contact with these basins. In a similar vein, Spivey and Dale35 provided a model of decision dynamics that proposed that response trajectories in a binary choice may be understood as a path through a two-well attractor landscape. This landscape, the decision space, is an expression of the cognitive evaluation of the available outcomes in a particular decision and the motor execution of the response influenced by this evaluation. More recently, dynamic models of perceptual decision making inspired by neural models also emphasise the role of attractor dynamics in decision making50,24,51. From this perspective, competing outcomes (i.e., valued choices) induce divergent, slow and longer action dynamics when those outcomes are near equal in their potentiality.

In the experiments described here, participants played a simple game to gain points, which constituted a forced-choice simple discrimination task. In each experiment, some choices earned more points than others and participants gradually learned which choices earned them the most points. Each decision provided two choices and, in the majority of decisions, one choice earned more points than the other (High/Low, e.g., 20 points/5 points). In some decisions, participants were required to decide between two choices that were both worth the same value, either a specific higher-point value (High/High, e.g., 20 points/20 points) or a specific lower-point value (Low/Low, e.g., 5 points/5 points). The relative reward in both of these decision situations is 1, because the available choices in both cases earn precisely the same number of points (20/20 = 5/5 = 1.0). In light of the research summarized thus far, one might expect that, with equipotential response options, (a) there ought to be competition in both situations and (b) the degree of competition should be similar. In actuality, the decision dynamics, both within and across decisions, were markedly different in these two conditions.

Results

Across three experiments, the ratio of the high-point reward to the low-point reward was increased: Experiment 1 (7/5; n = 34), Experiment 2 (10/5; n = 37) and Experiment 3 (20/5; n = 55). Learning was faster and more reliable as the high-point value increased. On average, participants learned more quickly to choose the higher of the two options in the 20/5 experiment than in the other two experiments; after just 12 decisions, participants chose the high-value symbol on over 80 percent of High/Low decisions. By halfway through the experimental session (18 decisions), the high-value symbol was chosen reliably in High/Low decisions in all three experiments (see Fig. 1, bottom-left panel). The high-point value also affected the proportion of participants who learned to consistently choose the High options. The bottom-right panel of Fig. 1 provides the distribution of high-point/low-point choice ratio across participants in each experiment. The log2 of the ratio of the probability of choosing the high choice to the probability of choosing the low choice on any High/Low decision (i.e., log2(pHigh/pLow)) is employed as a measure of the degree to which the participant reliably chose High options in High/Low decisions (in the literature on basic learning principles, the log2 ratio is employed as an index of relative allocation of behavior to two independent responses and it has been shown to be sensitive to relative magnitude of reinforcement52).
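As an illustration of the sensitivity measure just described, the following minimal Python sketch computes log2(pHigh/pLow) from one participant's High/Low decisions. The function name, the "H"/"L" coding and the additive `correction` term are assumptions introduced here for illustration (the source does not say how participants who never chose Low were handled); setting `correction=0` gives the uncorrected ratio.

```python
import math

def log2_choice_ratio(choices, correction=0.5):
    """Return log2(pHigh/pLow) for a sequence of 'H'/'L' choices.

    `correction` is a hypothetical additive constant (not from the source)
    that keeps the ratio finite when one option was never chosen.
    """
    n_high = sum(1 for c in choices if c == "H")
    n_low = sum(1 for c in choices if c == "L")
    # pHigh/pLow equals n_high/n_low because both share the same denominator
    return math.log2((n_high + correction) / (n_low + correction))

# Equal allocation gives 0; values above 0 indicate a preference for High
print(log2_choice_ratio(["H", "L"]))
```

A value of 0 corresponds to the 50% chance level, and larger positive values indicate more reliable choice of the High option.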

Figure 1

Experimental task and learning data.

The top panel provides a schematic of the experimental procedure. The lower left panel depicts mean probability of choosing a High option in a High/Low decision across participants on consecutive blocks of 6 decisions in each of the three experiments (error bars denote standard errors). The shaded area indicates probability below 80%, the threshold used to infer that participants learned to choose the High option and the dashed line indicates 50%, the choice probability expected by chance. The lower right panel is a bubble plot of sensitivity to relative reward measured by the log2 probability of choosing a High option. As relative reward increased (the horizontal axis), more participants reliably chose the High choice in High/Low decisions. The size of each point is determined by the number of participants that obtained that value controlled for the number of participants in that experiment (i.e., probability density of that value within each experiment). The shaded area and black dashed line correspond to the same values as in the left panel. The blue dashed line indicates the mean log2 probability at each level of relative reward.

Mean log2(pHigh/pLow) increased as relative reward increased across experiments. There was a weak but significant positive correlation (r(126) = 0.22, p = .012) between the log proportion of High to Low responses, log2(pHigh/pLow) and the log proportion of reward magnitudes, log2(MHigh/MLow). We employed a series of binomial linear mixed effects models to further analyse this effect (see Table 1). In line with the previous observation, the addition of Experiment as a predictor improved the intercept-only model. In this model, accuracy was significantly lower in the 7/5 experiment than the 20/5 experiment, b = −0.5800, z = −2.589, p = .0096 and it was marginally significantly lower in the 10/5 experiment than the 20/5 experiment, b = −0.4086, z = −1.842, p = .0655. The model was improved by adding the learning effect across decisions; log-transforming trial number better fit the ceiling effect on accuracy that can be seen in Fig. 1. In the final model, log(Trial) was a strong significant predictor, b = 1.1373, z = 11.9905, p < .0001, but the interaction effect of Experiment and log(Trial) was not significant. There was no significant difference in the change in accuracy across trials between the 20/5 experiment and the 10/5 experiment, b = −0.1454, z = −1.0232, p = .3062, or between 20/5 and 7/5, b = −0.0666, z = −0.4635, p = .6430.

Table 1 Comparison of fits of binomial linear mixed effects models to predict High choices in High/Low decisions

Data selection for trajectory analysis

During decisions, the positional (pixel) coordinates of the response trajectories were recorded. In line with previous work on action dynamics (e.g.25), we then assumed a coordinate system in which the initiation of each decision trajectory constituted the origin. The horizontal (x) and vertical (y) axes of the computer screen constituted the dimensions of this ‘decision space'.

In the analyses of the response characteristics, we filtered out participants who did not appear to learn. Only participants who chose the high-value symbol in High/Low decisions on at least 80% of the last 12 decisions (n7/5 = 26, n10/5 = 26, n20/5 = 48) in each experiment were included. These participants were deemed to have demonstrated that they had learned the crucial distinction in the experiment. The remaining participants were considered not to have learned the High/Low distinction well enough for their responses to be comparable to those who had. The greatest per-experiment proportion of participants satisfied this response requirement in the 20/5 experiment (88%), followed by the 7/5 experiment (77%) and then the 10/5 experiment (71%). The fact that a greater proportion of participants reliably learned the distinction in the 7/5 experiment than the 10/5 experiment was unexpected, as overall learning (mean pHigh) was better on average in the 10/5 experiment. In the following analyses, we also excluded trajectories in which participants chose a low-value symbol in a High/Low decision, as we considered such decisions to be errors and errors are known to exhibit different response characteristics (e.g., reaction times16) from expected or correct decisions.
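The inclusion rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the "H"/"L" coding are hypothetical, and the sketch assumes that the "last 12 decisions" are the last 12 High/Low decisions of a participant, which the text does not spell out.

```python
def met_learning_criterion(high_low_choices, window=12, threshold=0.8):
    """Return True if High was chosen on at least `threshold` of the
    last `window` High/Low decisions (coded 'H' or 'L')."""
    last = high_low_choices[-window:]
    if len(last) < window:
        # Too few decisions to evaluate the criterion (assumed handling)
        return False
    proportion_high = sum(1 for c in last if c == "H") / window
    return proportion_high >= threshold

# With a window of 12, the 80% threshold requires at least 10 High choices
print(met_learning_criterion(["L"] * 6 + ["H"] * 10 + ["L"] * 2))
```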

Reaction time

Reaction time patterns were not affected by the increase in relative reward across experiments. In the first 12 decisions, reaction times were relatively similar across decision types. In High/Low and High/High decisions, reaction times gradually decreased throughout the experiments, but, in Low/Low decisions, reaction times remained consistently high. To examine the effects of decision type on reaction times across trials, we fit a series of linear mixed effects models, using maximum likelihood estimation on log-transformed reaction times (see Table 2). There was no significant effect of relative reward (Experiment) on log reaction time. Reaction time decreased significantly across decisions, b = −0.0087, t = −14.98, p < .0001, but there was no main effect of decision type. The change in reaction times across Low/Low decisions was significantly different, b = 0.0081, t = 6.75, p < .0001, from the change across High/Low decisions, whereas the change across High/High decisions was not, b = −0.0004, t = −0.38, p = .7063. That is, as can be seen in Fig. 2, exposure to the learning context reduced the reaction times of High/Low and High/High decisions, but not Low/Low decisions.

Table 2 Comparison of fits of linear mixed effects models used to predict log transformed reaction time
Figure 2

Measures of choice conflict across blocks of 12 decisions.

L/L denotes Low/Low, H/L denotes High/Low and H/H denotes High/High decision types. (A) The top panel depicts the mean reaction time and bootstrapped confidence intervals (calculated using the R package ggplot2, ref. 59) in each block of 12 trials in each experiment. (B) The lower panel depicts the same values for maximum deviation, the furthest point in a trajectory from the straight line from the initiation of a response to its completion.

Maximum deviation

For each trajectory, we extracted its maximum deviation, that is, how far the trajectory departed from the straight path to the endpoint response option. This is an additional measure of cognitive conflict: a trajectory with high maximum deviation indicates that the participant moved the computer mouse closer to the alternative response, whereas a lower maximum deviation reflects a more direct movement towards the choice (exhibiting less conflict and more confidence). As with reaction times, maximum deviation was relatively similar across decision types in the first 12 decisions and then decreased in High/Low and High/High decisions in the remaining decisions. Maximum deviation during Low/Low decisions remained consistently high throughout the experiment with a slight increasing trend. To statistically investigate these patterns, we also used a series of linear mixed effects models (see Table 3). Findings were similar to those for reaction times. There was no significant effect of relative reward (Experiment) or main effect of Decision type, but maximum deviation decreased significantly across decisions, b = −0.7594, t = −3.26, p = .0011. The change in maximum deviation across Low/Low decisions was significantly different, b = 1.6263, t = 3.37, p = .0008, from the change across High/Low decisions, whereas the change across High/High decisions was not, b = 0.1921, t = 0.40, p = .6901. As participants progressed through the experiment, the deflection towards the unchosen symbol decreased in High/Low and High/High decisions, suggesting reduced conflict between choices. This did not occur in Low/Low decisions, suggesting persistent conflict in these decisions throughout the experiment. Results, by experiment, are displayed in Fig. 2.
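To make the measure concrete, the sketch below computes maximum deviation as the largest perpendicular distance between any sampled point and the straight line joining the first and last points of a trajectory, following the definition given in the Fig. 2 caption. The function name is hypothetical and the sketch returns an unsigned distance; whether the original analysis used signed deviations (towards versus away from the unchosen option) is not stated, so that choice is an assumption.

```python
import math

def max_deviation(xs, ys):
    """Largest perpendicular distance (in pixels) from the straight line
    joining the first and last points of a trajectory."""
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        # Start and end coincide, so no reference line is defined
        return 0.0
    # |cross product| / length gives the point-to-line distance
    distances = [
        abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) / length
        for x, y in zip(xs, ys)
    ]
    return max(distances)

# A trajectory that detours through (0, 10) on its way from (0, 0) to (10, 10)
print(max_deviation([0, 0, 10], [0, 10, 10]))
```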

Table 3 Comparison of fits of linear mixed effects models used to predict maximum deviation

Horizontal velocity

Within a decision, the available outcomes were presented at the extreme left and right of the experimental display. As a consequence, competition between the available outcomes may be expressed in the horizontal component of response velocity. To further analyze the competition between outcomes, point-to-point velocity in the x direction was calculated and then interpolated to 101 time steps to normalize response time and provide an average velocity profile in the last 18 decisions for each condition in all three experiments (see upper panel of Fig. 3). In all three experiments, horizontal velocity towards the eventual choice was slower to increase in the Low/Low condition than in the other two conditions, providing evidence of persistent early conflict during these decisions. It is worth remembering that the mean RT of Low/Low decisions was significantly greater than High/Low or High/High decisions, so the evolution of velocity seen in this figure underestimates the differences in real time (Low/Low decisions took approximately 300 ms longer in real time on average). In binned segments of trajectory (Fig. 3, lower panel), horizontal velocity during the third quintile (40–60%) was significantly lower (p < .05) in Low/Low than both High/Low and High/High in all three experiments. Otherwise, the evolution of x velocity within decisions was quite similar for equivalent conditions across the three experiments.
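The two processing steps described above (point-to-point horizontal velocity, then time normalization) might be sketched as follows. The function names, the use of linear interpolation and the sign convention (positive values mean movement towards the eventual choice) are assumptions made for illustration; the source does not specify the interpolation method.

```python
def horizontal_velocity(xs, ts, chose_right):
    """Point-to-point x velocity, signed so that positive values are
    movement towards the option that was eventually chosen."""
    sign = 1.0 if chose_right else -1.0
    return [
        sign * (xs[i + 1] - xs[i]) / (ts[i + 1] - ts[i])
        for i in range(len(xs) - 1)
    ]

def resample(values, n_steps=101):
    """Linearly interpolate a series onto n_steps equally spaced points,
    which removes differences in response duration between trials."""
    if len(values) < 2:
        raise ValueError("need at least two samples to interpolate")
    last = len(values) - 1
    out = []
    for k in range(n_steps):
        pos = k * last / (n_steps - 1)   # fractional index into `values`
        i = min(int(pos), last - 1)      # left neighbour
        frac = pos - i
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return out

velocity = horizontal_velocity([0, 2, 6, 12], [0, 1, 2, 3], chose_right=True)
profile = resample(velocity, n_steps=101)
print(len(profile))
```

Averaging such profiles across trials at each time step would give a time-normalized mean velocity curve of the kind shown in Fig. 3.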

Figure 3

Mean horizontal velocity across 100 time steps in the final 18 response trajectories.

(A) The top panel shows the evolution of the horizontal component of velocity towards the ultimate choice in the response trajectory. Velocities in the x direction were calculated between consecutive points in each trajectory. The velocity time series were interpolated into 100 time steps and then the mean velocity at each time step was calculated to depict the time-normalised evolution of velocity. (B) The lower panel shows mean horizontal velocity and bootstrapped confidence intervals during consecutive quintiles of 20 time steps.

Response trajectory shape

If it is harder to make a given response because participants are experiencing conflict, then the trajectories of those responses should be more divergent: They should spend more time between choices and in some cases may even move towards the alternative choice before making a response. We interpolated each trajectory into 101 time steps so they could be overlaid into an average plot (see25,26). Trajectories of decisions to the left-hand choice were reflected in the vertical axis at the origin so that all trajectories ended at the right-hand choice for ease of comparison. The final 18 decisions were used because, by this point in all three experiments, participants were choosing the high-point choice on 80% of all High/Low decisions. This is shown separately for each experiment in the left-hand column of Fig. 4. In all three experiments, trajectories during Low/Low decisions were very different to those observed in High/Low and High/High decisions. In decisions that included a high-point choice, trajectories were relatively direct, whereas trajectories in Low/Low decisions exhibited considerably larger deflections towards the unchosen symbol. These deflections suggest stronger and more persistent competition between the available response options during Low/Low decisions.
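The reflection and averaging steps can be illustrated with a short Python sketch that operates on x coordinates already resampled to a common number of time steps. The function names are hypothetical. The Fig. 4 caption mentions 20% trimmed means; the sketch assumes this means dropping the lowest and highest 20% of values at each time step (the convention of R's `mean(x, trim = 0.2)`), which is an assumption rather than something the text confirms.

```python
def mirror_to_right(xs, chose_left):
    """Reflect a left-going trajectory about the vertical line through its
    starting point so that every trajectory ends on the right."""
    x0 = xs[0]
    return [x0 - (x - x0) for x in xs] if chose_left else list(xs)

def trimmed_mean(values, proportion=0.2):
    """Mean after discarding `proportion` of values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # number dropped per tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

def mean_trajectory(trajectories, proportion=0.2):
    """Trimmed mean x coordinate at each normalized time step."""
    return [trimmed_mean(step, proportion) for step in zip(*trajectories)]

left_trial = mirror_to_right([0, -5, -10], chose_left=True)
print(left_trial)
```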

Figure 4

Interpolated mean trajectories and decision spaces for blocks of 12 trials within each experiment.

(A) The left column depicts the 20% trimmed mean trajectories for the final 18 trials in each experiment. (B) Surface plots depict inferred potential fields based on momentary velocities and accelerations derived from positional coordinates within response trajectories (see text for details). The positions of the circles depicted on the decision spaces indicate the approximate starting point of trajectories (i.e., 0,0) and the colors of the circles denote the decision type.

The most conspicuous difference was between High/High and Low/Low decisions, in which the relative magnitude of the points available for each choice was the same. In a statistical comparison of x coordinates (paired t tests) at each interpolated time step26, Low/Low trajectories travelled significantly closer to the unchosen option than High/Low trajectories for considerable portions of the trajectories (7/5: steps 52 to 101; 10/5: steps 55 to 82; 20/5: steps 57 to 93) and closer than High/High trajectories for similar portions (7/5: steps 41 to 90; 10/5: steps 55 to 75; 20/5: steps 50 to 94). In addition, in Experiments 1 (7/5) and 3 (20/5), small portions of the High/High decision trajectories were significantly closer to the eventual choice than High/Low trajectories (7/5: steps 29 to 53; 20/5: steps 44 to 52; departures of fewer than 8 consecutive points were not considered significant, to preserve the family-wise error rate), offering some weak but significant suggestion that High/High decisions may have been easier than High/Low decisions. It is also worth noting that this facilitation of High/High over High/Low was earlier in the trajectories than the conflict effects observed in the Low/Low trajectories.
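The run-length rule used in these comparisons (ignoring departures shorter than 8 consecutive time steps) can be sketched as below. The sketch takes a list of per-step p-values as given, so the paired t tests themselves are outside its scope; the function name and the alpha level of .05 are assumptions for illustration.

```python
def significant_runs(p_values, alpha=0.05, min_run=8):
    """Return (first_step, last_step) pairs, 1-based and inclusive, for
    every run of at least `min_run` consecutive p-values below `alpha`."""
    runs = []
    start = None  # 0-based index where the current run began
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start + 1, i))
            start = None
    # Close a run that extends to the final time step
    if start is not None and len(p_values) - start >= min_run:
        runs.append((start + 1, len(p_values)))
    return runs

# Eight significant steps are reported; a later run of three is discarded
p = [0.5] * 10 + [0.01] * 8 + [0.5] * 5 + [0.01] * 3
print(significant_runs(p))  # [(11, 18)]
```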

Modeling decision space

By connecting empirical action dynamics with systems of differential equations, insightful new visualizations of decision spaces are possible. We sought to characterize the cognitive landscape from which decisions emerge. Using momentary velocities and accelerations derived from the positional coordinates in the decision trajectories, we inferred a potential field that captured characteristics of this decision space. Low choices in High/Low decisions were included in these analyses.

Using the computer-mouse coordinates collected during choices, it was assumed that motion during a choice can be described by a potential field given by the function V(x, y) = Vx(x) + Vy(y), where x and y are the screen coordinates. As a first approximation, it was assumed that motion in the x-direction was independent of motion in the y-direction, hence V(x, y) could be separated into the two functions Vx(x) and Vy(y). Thus, the simplest system of second-order ordinary differential equations describing the motion could be written

The functions Vx(x) and Vy(y) were approximated using the experimental data in two steps. First, the screen was discretized into a mesh and, using the mouse-cursor position data, approximations of the velocities (ẋ and ẏ) and accelerations (ẍ and ÿ) were calculated at each mesh point by interpolation and averaging over a chosen set of the participants' motion data. Second, using eqs. (1) and (2), the averaged velocities and accelerations were fed back into the integrals

from which the overall potential function V(x, y) was calculated. Note that this was done independently of the time when the mouse cursor reached a specific position on the screen.
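A rough one-dimensional sketch of this kind of estimate is given below. It is not the authors' procedure: it assumes the simplest conservative form, ẍ = −dVx/dx, so that Vx(x) = −∫ ẍ dx, and it uses only accelerations, whereas the text states that averaged velocities were also used. The function names, the finite-difference scheme, the equal-width bins standing in for the mesh and the treatment of empty bins as zero acceleration are all assumptions made for illustration.

```python
import math

def second_differences(xs, dt):
    """Central-difference acceleration at each interior sample."""
    return [(xs[i + 1] - 2 * xs[i] + xs[i - 1]) / dt ** 2
            for i in range(1, len(xs) - 1)]

def potential_1d(trajectories, dt, x_min, x_max, n_bins):
    """Estimate Vx at the bin edges, assuming acceleration = -dVx/dx.

    Accelerations are averaged by position (not by time), then integrated
    from x_min with Vx(x_min) fixed at 0.
    """
    width = (x_max - x_min) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xs in trajectories:
        for x, a in zip(xs[1:-1], second_differences(xs, dt)):
            b = math.floor((x - x_min) / width)
            if 0 <= b < n_bins:
                sums[b] += a
                counts[b] += 1
    # Empty bins contribute zero acceleration (a simplification)
    mean_acc = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    potential = [0.0]
    for a in mean_acc:
        potential.append(potential[-1] - a * width)
    return potential

# A cursor under constant acceleration 2 towards larger x
trial = [t ** 2 for t in range(6)]
print(potential_1d([trial], dt=1.0, x_min=0, x_max=20, n_bins=4))
```

Under these assumptions, regions where the cursor accelerates towards an option appear as a downhill slope in the estimated potential. The same steps applied to the y coordinate would give Vy(y), and the sum of the two would give V(x, y).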

Across the three blocks of 12 choices in each experiment, the induced decision spaces evolved to match the relative values of the available choices. This is most clearly seen in the increase in potential at the location of the low value choice in the High/Low decision space (see Fig. 4, second column from left). For High/Low decisions, in early decisions, these spaces showed relatively equal potential at the location of low- and high-reward choices and across decisions, the location of the high-reward choice decreased in potential and the location of the low-reward choice increased in potential. Visually, the decision space tipped towards the high-reward choice demonstrating the facilitation of high-reward choices across decisions.

Further evidence of the change in dynamics across decisions is observed in the gradients of the potential functions V(x, y) corresponding to decision dynamics during the first and final block of 20/5 decisions (Fig. 5). On the high value side of the figure (right-hand side), both early (red arrows; first block) and late (green arrows; third block) vectors point towards the stimulus on that side of the figure (i.e., 20). However, there was a marked change on the low value side of the figure, as participants learned the values of the available choices. In the first block (red arrows), motion towards the low value stimulus was likely to persist; red vectors on the low value side of the figure point toward the low value stimulus (i.e., 5). In contrast, in the final block (green arrows), the vector of motion changed towards the middle of the screen, indicating a reduction in the x component of velocity in the direction of the low value stimulus; green vectors on the low value side of the figure no longer point toward the low value stimulus.

Figure 5

The gradients of the potential functions V(x, y) corresponding to decision dynamics during the first and third block of 20/5 decisions, with arrows pointing “down hill” and where the high value stimulus is located on the right-hand side of the figure.

The first block is highlighted with red arrows and the third block with green arrows. There is a marked change in the arrow directions on the left-hand side (low value side) between the first and third block, indicating a positive bias towards the high value in the third block, whereas no such bias can be seen in the first block. Shaded circles denote approximate locations of choice symbols.

The decision spaces also highlighted differences between High/High and Low/Low decision dynamics. During High/High decisions, stronger attraction (steeper slopes) indicated easier decisions (faster, more direct) and potential at the choice locations was stable or decreased across blocks indicating evolution of stronger attraction towards high-point choices across decisions. In contrast, during Low/Low decisions, weaker attraction (shallower slopes) towards the available choices indicated greater persistence of indecision (slower, less direct choices). The potential at 4 of the 6 low-point choice locations increased across Low/Low blocks indicating weakening of attraction to these choices. Finally, in the middle of the decision space, the slope towards choices plateaued, providing tentative evidence of a late ‘saddle point' prior to eventual movement towards the available choices. These decision space models provide support for Killeen's49 and Spivey and Dale's35 position that a choice between two alternatives can be understood as movement in a two-well attractor landscape.

Discussion

Across three experiments, participants were sensitive to the relative reward for outcomes in a binary decision task. As the ratio of the higher reward to the lower increased across experiments, participants were more likely to reliably choose the outcomes with the higher rewards. Within experiments, for those who learned to reliably choose the high reward outcomes, the evolution of preference within and across decisions was similar across experiments. When one outcome was more favorable than the other (High/Low), decisions were fast and direct. When both outcomes earned equal rewards, decision dynamics depended upon the value of the available rewards. If both choices earned high rewards (High/High), the decisions were fast and direct, like High/Low decisions. If both choices earned low rewards (Low/Low), decisions were slow and indirect, with persistent early instability. Potential surfaces derived from positional coordinates fit with theoretical characterizations of decision making (e.g.,49,35,24) in which available choices are attractors in a decision space.

To date, preferential decision making has, in the main, been investigated by analyzing patterns of choice allocation and response times. Our analyses of decision dynamics potentially provide much greater detail on the evolution of decisions. In the current experiments, reaction times were sensitive to the greater competition during Low/Low decisions. However, when we controlled for reaction time (Fig. 3), the evolution of horizontal velocity towards the available choices remained markedly different across conditions. Low/Low decisions were not simply slower versions of High/Low and High/High decisions; they exhibited a distinct temporal evolution characterized by persistent early competition that was not observed in the other decisions.

Low/Low decisions induced greater competition than High/High decisions. If one considers that the rewards available for choices determined the attraction towards the available choices, then one might expect persistent competition between two strong attractors in High/High decisions. Instead, participants chose one of the available high value options quickly and directly. Greater competition between similar low value choices than similar high value choices has been seen in some recent experimental data and in basal ganglia and diffusion models of decision conflict53. We can see two possible explanations for these differences in the current study. The first is that participants may have set a reference point based on some function of the mean points per decision10,19,13. Since low value symbols were below this reference point, they may have been experienced as a loss and aversive, even though they remained nominally rewarding. Research on framing effects17 provides convincing evidence that repulsion from perceived bad choices (loss aversion) is a powerful determinant of eventual choice. In this case, in addition to attraction towards the available choices, low value choices may have exerted a repulsion on response trajectories, in effect a dynamic expression of approach-avoidance conflict54. In High/High decisions, there would have been no repulsion to counter the attraction towards the eventual choice resulting in faster more direct trajectories. A weakness with this account is that repulsion from Low value options would encourage faster and more direct choices in High/Low decisions than in High/High decisions, which was not observed (in fact there was some evidence of the reverse effect). Within a binary choice framework, it is difficult to distinguish attraction towards the terminal choice from repulsion from alternative choices, but analyses of decision dynamics provide a fresh approach to distinguishing between these different possibilities.

An alternative explanation of greater competition during Low/Low decisions is that the slower uncommitted responding in Low/Low choices was due to a negative incentive contrast effect. A negative incentive contrast effect occurs when, following experience with a high value reward, the incentive effect of a low value reward is reduced. For instance, animals are known to run slower and less directly towards a less preferred reward following training with a preferred reward (see55 for a review). The fact that responding to High/High and High/Low decisions exhibited similar characteristics suggests that the highest available stimulus on a trial dominated decision dynamics. Further research under tighter laboratory controls will facilitate more thorough investigation of these findings, including testing the stability, in longer protocols, of the differences identified herein. In particular, the seeming absence of conflict during High/High decisions relative to High/Low decisions invites further empirical research. In an attractor model, Low/Low decisions might be understood to constitute a decision space with two weak attractors that fail to capture response trajectories until later in the decision. In High/High and High/Low decisions, stronger attractors captured response trajectories earlier and pulled trajectories faster towards completion.

As mentioned previously, some dynamical models of decision-making predict greater competition between similar low value choices than between similar high value choices. In particular, basal ganglia and diffusion models predict these effects, but so, arguably, do models that include lateral inhibition, when inhibition is high4,19,21. Indeed, the very low conflict observed during High/High decisions might constitute evidence of strong inhibition of the High alternative in these cases. The foregoing models constitute a subset of sequential-sampling models of decision making, in which information gradually accrues over time to bias the eventual choice. The current method of collecting detailed movement data on decisions seems particularly appropriate for the analysis of such models, since they propose processes that unfold over time and eventually arrive at one of the available options. Dynamic competition between graded continuous representations of choices may account for gradual shifts in trajectory and ‘changes of mind’ (e.g.,24). Finally, a number of these models demonstrate attractor dynamics, which may allow researchers to develop model decision spaces to compare with decision spaces generated from experimental data.
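To make the idea of sequential sampling with lateral inhibition concrete, the following is a minimal sketch of a two-option leaky competing accumulator. It is not one of the basal ganglia or diffusion models cited above, and it was not fitted to the present data. The function names, the linear mapping from reward to input strength and all parameter values (leak, inhibition, noise, threshold) are illustrative assumptions.

```python
import random

def simulate_choice(input_left, input_right, rng, leak=0.2, inhibition=1.0,
                    noise=0.3, threshold=1.0, dt=0.01, max_time=50.0):
    """Run one trial of a toy leaky competing accumulator.

    Returns (choice, decision_time); choice is 0 (left), 1 (right) or None
    if neither accumulator reached threshold before max_time.
    All parameter values are arbitrary assumptions for illustration.
    """
    x = [0.0, 0.0]                      # activation of the two options
    inputs = (input_left, input_right)  # assumed to scale with reward value
    t = 0.0
    while t < max_time:
        new_x = []
        for i in (0, 1):
            # Evidence input, minus passive leak, minus inhibition from the rival
            drift = inputs[i] - leak * x[i] - inhibition * x[1 - i]
            step = drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            new_x.append(max(0.0, x[i] + step))  # activations cannot go negative
        x = new_x
        t += dt
        if max(x) >= threshold:
            return (0 if x[0] >= x[1] else 1), t
    return None, max_time

def mean_decision_time(input_left, input_right, n_trials=200, seed=0):
    """Average decision time over n_trials simulated decisions."""
    rng = random.Random(seed)
    times = [simulate_choice(input_left, input_right, rng)[1]
             for _ in range(n_trials)]
    return sum(times) / n_trials
```

Under these assumed parameters, two equal weak inputs (e.g., 0.5 each) settle at a shared activation of input / (leak + inhibition), which lies below threshold, so noise must first break the tie before either option can win. Two equal strong inputs (e.g., 2.0 each) can reach threshold without the tie being broken. Comparing `mean_decision_time(0.5, 0.5)` with `mean_decision_time(2.0, 2.0)` is one way to explore how such a model might separate Low/Low from High/High decisions.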

Within a broader learning context, Schöner and Kelso56,57 proposed a dynamical account of learning as the integration of intrinsic dynamics and behavioural information. In our experiments, intrinsic dynamics refer to the attractor landscape in the space of possible responses (including physical, biological and learned constraints) prior to exposure to a new decision, and behavioural information both mitigates the expression of those dynamics in that decision and updates the intrinsic dynamics for future decisions. Average decision spaces, such as those generated in the current analyses, characterise the gradual evolution of preference in different subsets of environmental contingencies. In this way, we approximate the average change in the intrinsic dynamics of preference and the average impact of behavioural information. Metaphorically, the decision space is the ‘playing field’ on which decisions compete. However, Schöner and Kelso emphasise that the greatest contribution of their account is to enable modelling of intrinsic dynamics at the individual level. In contrast, our average decision spaces necessarily occlude inter-individual differences in learning and, thus, differences in the structure of decision spaces at the level of the individual participant (with his/her particular physical, biological and learning constraints). At present, many trajectories are required to establish stable decision spaces and, thus, it was not possible to compare decision spaces at the level of the individual participant or trajectory. As we learn more about the expression of decisions in two-dimensional space, it might become possible to generate decision spaces for individual participants or decisions. For instance, one might hypothesise a baseline space (e.g., for a condition or an individual), from which a specific trajectory might provide deviations that would allow us to generate a hypothetical decision space for that trajectory.
It is our hope that analyses of decision dynamics and the depiction of decision spaces will provide a flexible testbed for future theoretical development.

Methods

Participants were drawn from Amazon Mechanical Turk (mturk.com), restricted to the United States, with the following inclusion criteria. Participants were required to have completed a minimum of 100 Mechanical Turk tasks prior to participation and to have an approval rate of 95% or above (i.e., they had been paid for the work they had done in at least 95% of cases), a measure of how effectively they typically dealt with Mechanical Turk tasks. Participation in the experiments earned US$0.50. Each participant answered a series of demographic questions (which could be filled in before or after, but not during, their performance). These covered simple non-identifying information such as age and first language. From Amazon, participants were forwarded to an external link that presented a “game-like” interface using Adobe Flash®.

At the beginning of the experiment, the following instructions were presented:

Thank you for your work on this brief HIT.

In the next screen, you will see a small green button. Click it to begin the first round. There are 36 rounds and in each round you will choose between two shapes to earn points. Different shapes give different points. The more points you earn the better! Are you ready?

Click the bottom-right corner of this instruction box to continue…

On each decision, participants clicked on a button at the center of the bottom of the screen to begin their response, which was made to one of two shapes from the Bodoni font set presented on the top-left or top-right of the computer screen. This produced, on each decision, a computer mouse-cursor trajectory from the bottom center to the left or to the right (see Fig. 1).

Across three experiments, the ratio of the high-value choice to the low-value choice was systematically manipulated. Participants were assigned to one of three experiments: Experiment 1 (7/5; n = 34), in which the high-point reward was 7 and the low was 5; Experiment 2 (10/5; n = 37), in which high was 10 and low was 5; or Experiment 3 (20/5; n = 55), in which high was 20 and low was 5. To control for any potential effect of the individual Bodoni shapes, at the start of each experiment, two of the four shapes were randomly (across participants) assigned to the experiment-specific high value and the other two to the low value (these stimulus-reward pairings remained consistent within each participant's set of decisions, so that they could learn these values). In each decision, two of the four symbols (High 1, High 2, Low 1 and Low 2) were presented as choices, creating three possible choice situations: a high-value choice versus a high-value choice (High/High), a high-value choice versus a low-value choice (High/Low) and a low-value choice versus a low-value choice (Low/Low). The Flash game presented 36 of these decisions in a random sequence, 24 in the High/Low condition and 6 in each of the High/High and Low/Low conditions. On clicking a choice, the points for that choice were presented in green text (e.g., +20) in the middle of the screen and a point counter in black text at the top of the screen was incremented by that value (see Fig. 1).
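The design described above can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the original Flash code. The shape labels and the function name `build_trials` are hypothetical, and two details are assumptions not stated in the text: that the four possible High/Low pairings were presented equally often, and that the left/right position of each shape was randomised on every trial.

```python
import random

def build_trials(high_points, low_points=5, seed=None):
    """Build one participant's 36 decisions: 24 High/Low, 6 High/High, 6 Low/Low."""
    rng = random.Random(seed)

    # Randomly assign two of the four shapes to the high value, two to the low
    shapes = ["shape_a", "shape_b", "shape_c", "shape_d"]  # hypothetical labels
    rng.shuffle(shapes)
    highs, lows = shapes[:2], shapes[2:]
    points = {s: high_points for s in highs}
    points.update({s: low_points for s in lows})

    # Assumption: each of the 4 High/Low pairings appears 6 times (24 in total)
    pairs = []
    for h in highs:
        for l in lows:
            pairs += [("High/Low", h, l)] * 6
    pairs += [("High/High", highs[0], highs[1])] * 6
    pairs += [("Low/Low", lows[0], lows[1])] * 6

    trials = []
    for condition, first, second in pairs:
        # Assumption: screen side is randomised independently on each trial
        left, right = rng.sample([first, second], 2)
        trials.append({
            "condition": condition,
            "left": left,
            "right": right,
            "left_points": points[left],
            "right_points": points[right],
        })

    rng.shuffle(trials)  # random presentation order
    return trials
```

For example, `build_trials(20)` would correspond to the reward values of Experiment 3 (20/5), and `build_trials(7)` to those of Experiment 1 (7/5).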

Amazon Mechanical Turk provides access to a demographically diverse population58. However, though arguably providing increased ecological validity, the online environment is relatively uncontrolled compared to typical decision-making experimental contexts. We employed a number of exclusion criteria designed to exclude participants who did not pay due attention to the experimental task. From a total of 140 participants, those who met any of the following criteria were excluded: a mean reaction time below 500 ms (n = 9), a reaction time greater than 10 s on any trial (n = 6), fewer than 36 completed trials (n = 5) or a choice of the same side (left or right) on over 75% of trials (n = 2). Overall, 11 participants were excluded (8%); the per-criterion counts sum to more than 11, which implies that some participants met more than one criterion. Following these exclusions, trajectories of the remaining 129 participants were visually analysed and three further participants were removed because of unusual patterns (e.g., “swooping”, moving directly left or right before moving upwards towards the choices). The remaining 126 participants were distributed across the three experiments as described above.
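As a hedged illustration of how the automated exclusion criteria might be applied to one participant's data, consider the sketch below. The function name, the argument format (a list of per-trial reaction times in milliseconds and a list of booleans marking left-side choices) and the reason labels are assumptions introduced here. The visual inspection for unusual trajectories is not covered. Because a participant can meet several criteria at once, per-criterion counts need not sum to the number of participants excluded.

```python
def exclusion_reasons(reaction_times_ms, chose_left, expected_trials=36):
    """Return the list of exclusion criteria met by one participant.

    reaction_times_ms: per-trial reaction times in milliseconds
    chose_left: per-trial booleans, True if the left stimulus was chosen
    An empty list means the participant is retained.
    """
    reasons = []
    n = len(reaction_times_ms)

    # Fewer than the full set of 36 trials completed
    if n < expected_trials:
        reasons.append("incomplete")

    # Mean reaction time below 500 ms
    if n and sum(reaction_times_ms) / n < 500:
        reasons.append("mean_rt_below_500ms")

    # Any single trial slower than 10 s
    if any(rt > 10_000 for rt in reaction_times_ms):
        reasons.append("trial_rt_above_10s")

    # Same side (left or right) chosen on over 75% of trials
    if chose_left:
        left_share = sum(chose_left) / len(chose_left)
        if max(left_share, 1 - left_share) > 0.75:
            reasons.append("side_bias_above_75pct")

    return reasons
```

A participant who responds very quickly and always to the same side would be flagged under two criteria at once, whereas a participant with typical reaction times and balanced choices would return an empty list.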