## Abstract

Much of what we know about how the brain forms decisions comes from studies of saccadic eye movements. However, saccadic decisions are often studied in isolation, which limits the insights that they can provide about real-world decisions with complex interdependencies. Here, we used a serial reaction time (RT) task to show that prior expectations affect RTs via interdependent, normative decision processes that operate within and across saccades. We found that human subjects performing the task generated saccades that were governed by a rise-to-threshold decision process with a starting point that reflected expected state-dependent transition probabilities. These probabilities depended on decisions about the current state (the correct target) that, under some conditions, required the accumulation of information across saccades. Without additional feedback, this information was provided by each saccadic decision threshold, which represented the total evidence in favor of the chosen target. Therefore, the output of the within-saccade process was used, not only to generate the saccade, but also to provide input to the across-saccade process. This across-saccade process, in turn, helped to set the starting point of the next within-saccade process. These results imply a novel role for functional information-processing loops in optimizing saccade generation in dynamic environments.

**SIGNIFICANCE STATEMENT** Saccades are the rapid, ballistic eye movements that we make approximately three times every second to scan the visual scene for interesting things to look at. The apparent ease with which we produce saccades belies their computational sophistication, which can be studied quantitatively in the laboratory to provide insights into how our brain manages the interplay between sensory input and motor output. The present work is important because we show for the first time how this interplay operates both within and across saccades to ensure that these eye movements are guided effectively by learned expectations in dynamic environments. More generally, this study shows how sensory-motor decision processes, typically studied in isolation, interact via functional information-processing loops in the brain to produce complex, adaptive behaviors.

## Introduction

The generation of goal-directed saccadic eye movements, like other aspects of sensory, cognitive, and motor processing, can be sensitive to expectations (Friston, 2010; O'Reilly, 2013; Seriès and Seitz, 2013; Pezzulo et al., 2015). For example, saccadic reaction times (RTs) are affected when the expected location of a visual saccade target is cued in advance, such as in the Posner cueing task (Posner et al., 1980; Posner, 1980), or learned implicitly from the statistical structure of a recent sequence of events, such as in the serial RT task (Nissen and Bullemer, 1987; Reed and Johnson, 1994; Schwarb and Schumacher, 2012). These effects imply a dynamic process that can establish appropriate expectations from one saccade to the next. However, the computational principles that govern this sequential-updating process are not yet fully understood.

To identify these principles, we measured saccadic RT distributions of human subjects performing a dynamic auditory location-discrimination task that required them to look in the direction (left or right) of a virtual sound source that varied probabilistically for sequences of sounds (see Fig. 1*A*). Our analyses focused on two sets of complementary questions. First, what specific aspect of saccade processing was affected by trial-by-trial, state-dependent expectations? Consistent with previous reports, we found that saccadic RTs were sensitive to state-dependent transition probabilities (in this case, the conditional probabilities that a sound would come from the right or left on the current trial given its location on the previous trials; see inset in Fig. 1*A*) (Nissen and Bullemer, 1987; Reed and Johnson, 1994; Schwarb and Schumacher, 2012). We extended those results by analyzing the RT distributions using versions of the Linear Approach to Threshold with Ergodic Rate (LATER) model (Carpenter and Williams, 1995; Reddi et al., 2003; Nakahara et al., 2006; Oswal et al., 2007; Noorani and Carpenter, 2016). This kind of “rise-to-threshold” model has been used to analyze saccadic RT data under a broad range of conditions, including how expectations are learned gradually in response to changes in the relative frequencies of occurrence of particular saccade targets (Anderson and Carpenter, 2006; Noorani and Carpenter, 2016). We used it to show that expectations based on learned, state-dependent transition probabilities have consistent effects on the expected (prior) probability of making a particular saccade, but not on how the instructive sensory cue is processed to initiate the saccade (see Fig. 1*B*).

Second, how were these prior probabilities updated on a trial-by-trial basis? Here, we extended the LATER model to include a dynamic process for updating priors based on normative principles of sequential inference in unpredictable environments (see Fig. 1*C*) (Glaze et al., 2015). This kind of sophisticated inference process is consistent with how human subjects solve a range of cognitive, perceptual, and motor tasks, but has not yet been related directly to the saccadic system (Courville et al., 2006; Behrens et al., 2007; Nassar et al., 2010; Wilson et al., 2010; Haith and Krakauer, 2013; Payzan-LeNestour et al., 2013; Gallistel et al., 2014; Glaze et al., 2015; Otto et al., 2015). We tested key features of these models, including how estimates of state-dependent transition probabilities are updated according to the strength of evidence provided on each trial about the current state. These analyses show that saccadic priors are updated sequentially according to normative principles that can involve functional information-processing loops that couple within- and across-saccade decision processes (see Fig. 1*B*,*C*). These findings provide new insights into, not just how the brain promotes speeded saccadic responses to the most likely target locations in dynamic environments, but also more generally how decision processes, typically studied in isolation, interact to produce complex, adaptive behaviors.

## Materials and Methods

#### Behavioral task

Human subject protocols were approved by the University of Pennsylvania Institutional Review Board. Forty-one subjects (15 female and 26 male; age range = 18–42 years) participated in the study after providing informed consent. Data from one subject were excluded from the analyses because of the very high prevalence of short-latency saccades (64.25% of measured RTs were <100 ms). Data from two other subjects were excluded from the analyses because both chose one option almost exclusively (first: 916 leftward choices out of 996 trials; second: 1201 rightward choices out of 1205 trials). Each subject underwent one session of training before participating in one to three experimental sessions.

The task required subjects to indicate whether the virtual source of a sound presented through headphones (Sennheiser HD 598 over-ear headphones) was from the left or right side by making a saccadic eye movement to a visual target located on the appropriate side. Subjects were seated and placed their chin in a chin-rest that was ∼60 cm from a Tobii T60XL eye tracker and display that sampled the position of both eyes at 60 Hz (see Fig. 1*A*).

The sounds were obtained from the Listen HRTF database (Institute for Music/Acoustic Research and Coordination). The sound presented on each trial was a set of band-pass-filtered pulses of white noise with a 10 ms gap between the pulses. The duration of each pulse was 50 ms, with 5 ms cosine ramps at the beginning and the end of the pulse. The sounds were played via headphones. The sounds varied in terms of their simulated azimuthal (horizontal) source location, picked randomly on a trial-by-trial basis from 50 different versions of that location chosen from the HRTF database.

For each trial, a fixation dot appeared on the screen, which the subject had to look at for 600 ms without blinking before the sound stimulus was presented. The fixation dot then disappeared and the sound stimulus was played through the headphones. Simultaneously with sound onset, two targets appeared, one ∼14° directly to the left of fixation and the other ∼14° directly to the right of fixation. The subject was instructed to then make a saccadic eye movement to the target in the perceived direction of the virtual sound source. RT was defined as the time from sound onset (which had a reliable, ∼3 ms latency from the time of the software trigger to the time the sound was discernible from the headphones) to the time when the first sampled eye position was >9.3° horizontally from the fixation dot (velocity- and acceleration-based measures yielded nearly identical results). A counter indicating how many trials subjects had completed so far was always shown at the bottom of the screen between trials. Each correct choice was worth one point. Subjects were paid based on how many points they earned during the given session. Each session lasted ∼40–60 min.
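The position-based RT criterion described above can be illustrated with a minimal sketch. The function name `saccade_rt_ms`, the `(time_ms, horizontal_deg)` sample format, and the example trace are hypothetical and are not taken from the study's acquisition code; only the 9.3° threshold and the 60 Hz sampling rate come from the text.

```python
def saccade_rt_ms(samples, sound_onset_ms, threshold_deg=9.3):
    """Return RT (ms) as the time of the first sample beyond the threshold.

    samples: iterable of (time_ms, horizontal_deg) pairs, with horizontal
    position measured relative to the fixation dot (assumed format).
    """
    for time_ms, horizontal_deg in samples:
        if time_ms >= sound_onset_ms and abs(horizontal_deg) > threshold_deg:
            return time_ms - sound_onset_ms
    return None  # no saccade detected on this trial


# Hypothetical trace sampled at 60 Hz (one sample every ~16.7 ms).
positions = [0.1, 0.0, 0.2, 0.1, 0.3, 2.5, 8.0, 12.9, 14.1]
trace = [(i * 1000.0 / 60.0, pos) for i, pos in enumerate(positions)]
rt = saccade_rt_ms(trace, sound_onset_ms=0.0)
```

At 60 Hz, an RT measured this way is resolved only to the nearest sample, that is, in steps of about 16.7 ms.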

We tested how saccadic RTs and choice behavior on this task were affected by certain combinations of the following four factors: (1) left–right transition probability, (2) stimulus strength, (3) trial-by-trial feedback, and (4) speed–accuracy instructions. The particular combinations that we tested are described below.

Left–right transition probability governed the probability that the simulated sound source was on the left or right side on a given trial given its location on the previous trial. Specifically, sound source location was generated using a Markov chain with two states, *C*_{L} (left side) and *C*_{R} (right side), and four transition probabilities, *H*_{LR} (computed as the expected number of transitions per trial, or “generative hazard rate,” corresponding to a sound presented from the left side followed by a sound from the right side), *H*_{LL} = 1 − *H*_{LR}, *H*_{RL}, and *H*_{RR} = 1 − *H*_{RL}. To assess sensitivity to different transition probabilities, we used a range of values (0.15, 0.2, 0.25, 0.35, 0.65, and 0.85) in different combinations. Within a given session, *H*_{LR} and *H*_{RL} each took on one such value for the entire session, with *H*_{LR} ≠ *H*_{RL} in some sessions and *H*_{LR} = *H*_{RL} in others (see below). Each trial consisted of the presentation of a single sound, the location of which depended on the current state. State transitions occurred between trials.
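The generative process described above can be sketched as a two-state Markov chain. This is an illustrative simulation only: the function names, the uniform choice of the first state, and the seed are assumptions, not details reported for the experiment.

```python
import random


def generate_sources(n_trials, h_lr, h_rl, seed=0):
    """Simulate a two-state Markov chain of sound-source sides.

    h_lr: probability of a left-to-right transition between trials.
    h_rl: probability of a right-to-left transition between trials.
    """
    rng = random.Random(seed)
    state = rng.choice("LR")  # assumption: first state chosen uniformly
    states = [state]
    for _ in range(n_trials - 1):
        p_switch = h_lr if state == "L" else h_rl
        if rng.random() < p_switch:
            state = "R" if state == "L" else "L"
        states.append(state)
    return states


def empirical_switch_rate(states, from_state):
    """Fraction of trials in from_state that were followed by a switch."""
    pairs = [(a, b) for a, b in zip(states, states[1:]) if a == from_state]
    return sum(a != b for a, b in pairs) / len(pairs)


states = generate_sources(20000, h_lr=0.35, h_rl=0.15)
rate_from_left = empirical_switch_rate(states, "L")
rate_from_right = empirical_switch_rate(states, "R")
```

With enough trials, the two empirical switch rates should approach the generative values of *H*_{LR} and *H*_{RL} used to build the sequence.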

Stimulus strength was held constant within a given session at a level defined as either “strong” (S) or “weak” (W). These two levels differed in terms of their left–right discriminability based on their simulated locations determined by the HRTF database. The strong stimuli were always defined as simulated sound sources located directly to the left or right (i.e., ±90°) from center. The weak stimuli were determined separately for each subject, before performing the main task, using a modified version of the adaptive QUEST procedure (Watson and Pelli, 1983; King-Smith et al., 1994). Using this procedure, the simulated azimuth, in degrees, of the weak stimuli was set to the mean of the threshold probability distribution yielding 70% correct responses in an unbiased environment (i.e., each sound presentation was equally likely to come from the left or right side). Typically, this procedure yielded simulated azimuthal locations of ∼3–4° from center.

Trial-by-trial feedback was either given (F) or not (NF) for each full session. When feedback was given, after a correct choice, the correct target turned green and a circle appeared around the target. After an error, the incorrectly chosen target turned red and a circle appeared around the correct target. For these sessions, a second counter at the bottom of the screen showed the running tally of correct trials.

In regard to speed–accuracy instructions, for most sessions, the subjects were instructed to choose as accurately and quickly as possible. For these sessions, the subjects had to earn as many points as possible for a set number of trials. There was no time limit for these sessions. We typically gave 1200 trials for each subject. However, five subjects indicated during the training session that they felt fatigued before finishing, so they were given either 600 trials (a single session with weak stimuli and no feedback) or 1000 trials (one session with weak stimuli and no feedback and four sessions with weak stimuli and feedback). Subjects were paid based on how many points they earned during the given session. For the remaining sessions, the subjects were instructed to emphasize speed (“respond as quickly as possible”). We used this manipulation to test whether such an instruction, which is known to reduce the total amount of evidence accumulated to trigger the saccade (equivalent to lowering the decision threshold in Fig. 1*B*) (Reddi and Carpenter, 2000), also affects the evidence accumulated across trials to establish appropriate priors (see Fig. 1*C*). In these sessions, subjects had to earn as many points as possible in 40 min. There was no limit on how many trials the subject could complete during that time (minimum = 1092, median = 1452, maximum = 1738 total trials completed per session).

We use the following labels in the text to refer to the particular task conditions that we tested, corresponding to combinations of the four factors described above.

Strong stimuli (S) included sessions with: (1) equal, low generative transition probabilities (*H*_{LR} = *H*_{RL} = 0.2 for 5 sessions, = 0.25 for 3 sessions); (2) unequal, low generative transition probabilities (*H*_{LR} = 0.35, *H*_{RL} = 0.15 for 6 sessions); or (3) unequal, high generative transition probabilities (*H*_{LR} = 0.65, *H*_{RL} = 0.85 for 7 sessions). Feedback was provided in all of these sessions except two (*H*_{LR} = *H*_{RL} = 0.2). Those two sessions were included in this group because performance was high (96.9% and 99.0% correct responses) and comparable to when feedback was provided. The subjects were instructed to choose as accurately and quickly as possible. These 21 sessions were conducted using 21 different subjects.

Weak stimuli with feedback (WF) included instructions to choose as accurately and quickly as possible; 10 sessions with *H*_{LR} = *H*_{RL} = 0.2 and 2 sessions with *H*_{LR} = *H*_{RL} = 0.25. These 12 sessions were conducted using 12 different subjects. Of these, three also participated in the S condition, and all 12 also participated in the weak, no feedback (WNF) condition.

Weak stimuli with no feedback (WNF) included instructions to choose as accurately and quickly as possible; 10 sessions with *H*_{LR} = *H*_{RL} = 0.2 and 3 sessions with *H*_{LR} = *H*_{RL} = 0.25. These 13 sessions were conducted using 13 different subjects. Of these, six also participated in the S condition, and 12 also participated in the WF condition.

WNF stimuli with instructions to emphasize speed (WNF_{sp}) included 10 sessions with *H*_{LR} = *H*_{RL} = 0.2. These 10 sessions were conducted using 10 different subjects, none of whom participated in any of the other conditions.

#### LATER analyses

All model-fitting procedures described below were implemented by finding the minimum of the appropriate objective function using the Optimization Toolbox from MATLAB (RRID:SCR_001622).

#### Basic LATER model

The LATER model describes saccadic RT as the time it takes a linear variable (*r*) to progress from a starting value (*S*_{0}) to a threshold value (*S*_{T}) (Carpenter and Williams, 1995; Reddi et al., 2003; Noorani and Carpenter, 2016). Assuming that the rate-of-rise *r* varies from trial-to-trial with a Gaussian distribution, *r* ∼ *N*(μ_{r}, σ_{r}), RT is distributed as a reciprocal of Gaussian with a heavy tail toward longer values: (*S*_{T} − *S*_{0})/*r* = Δ*S*/*r*. We used a basic model that treats σ_{r} as a scale factor set to one and had two independent parameters: (1) the mean rate of rise (μ_{r}) and (2) the difference between the starting point and the threshold level (Δ*S*).
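To make the reciprocal-Gaussian form concrete, the following minimal sketch simulates RTs from the basic model with σ_{r} = 1. The function name, parameter values, and the choice to redraw non-positive rates (which would never reach threshold) are illustrative assumptions.

```python
import random
import statistics


def simulate_later_rts(n, mu_r, delta_s, seed=0):
    """Draw RTs as delta_s / r with r ~ Normal(mu_r, 1).

    Non-positive rates never reach threshold, so those draws are skipped
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    rts = []
    while len(rts) < n:
        r = rng.gauss(mu_r, 1.0)
        if r > 0:
            rts.append(delta_s / r)
    return rts


# Hypothetical parameter values giving RTs near 0.2 s.
rts = simulate_later_rts(5000, mu_r=5.0, delta_s=1.0)
mean_reciprocal = statistics.mean(1.0 / rt for rt in rts)
median_rt = statistics.median(rts)
mean_rt = statistics.mean(rts)
```

In this sketch the reciprocal RTs are approximately Gaussian with mean μ_{r}/Δ*S*, whereas the RTs themselves are skewed toward longer values, so the mean RT exceeds the median RT.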

Consistent with previous applications, we always fit the basic LATER model only to RT data from correct choices because the simple version of the model that we used does not account for errors (Carpenter and Williams, 1995; Reddi et al., 2003). For conditions with strong stimuli, this approach was justified because errors were rare. For conditions with weak stimuli, errors were more common, so we also fit the data to a modified model that accounted for errors using two competing processes (see “LATER model with dual accumulators” section below).

Best-fitting parameters of the LATER model were first found by minimizing an objective (likelihood) function *M* (Equation 1) that computed the negative mean of the logarithm of the conditional probabilities of obtaining the RT data given the LATER model, using three free parameters (μ_{r}, Δ*S*, and θ), for all RT′_{correct,i} > θ. In Equation 1, φ is the standard normal probability density function, μ_{r} and Δ*S* are parameters of the LATER model as described above, *N* is the number of trials used in the fit, and RT′_{correct,i} is the measured RT on trial *i* of *N* in the given session that was from a correct trial and had a value >θ. The parameter θ was used as a criterion for deciding whether a given RT was generated by a separate, fast (e.g., express) process (Fischer and Ramsperger, 1984; Carpenter and Williams, 1995; Reddi et al., 2003). The value of θ was constrained to be between 0 and 200 ms. However, if the best-fitting value under these constraints was >190 ms, then the model was refit with the upper bound of θ set to the fifth percentile of the RT data used for the given fit, which improved the estimates near 200 ms. The best-fitting value of θ from these fits of the basic LATER model was used to define the set of RTs from the given session that were used for all subsequent fits. Note that for Equation 1, we used the mean instead of the sum of the log-likelihood because the free parameter θ can affect the number of data points (*N* in Eq. 1) used in each iteration of the fitting procedure (for model comparisons, we multiplied this value by *N* to obtain the total log-likelihood).
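One objective function consistent with the verbal definition above can be sketched as follows. This is an assumption-laden illustration rather than the fitted equation itself: it evaluates the likelihood on reciprocal RT, for which the model with σ_{r} = 1 implies 1/RT ∼ *N*(μ_{r}/Δ*S*, 1/Δ*S*), and the function name `later_objective` and the example data are hypothetical.

```python
import math


def later_objective(rts, mu_r, delta_s, theta):
    """Negative mean log-likelihood of reciprocal RTs, for RT > theta.

    Assumes 1/RT ~ Normal(mu_r / delta_s, 1 / delta_s), so the density of
    1/RT is delta_s * phi(delta_s / RT - mu_r), with phi the standard
    normal density. Returns the objective and the number of RTs used.
    """
    kept = [rt for rt in rts if rt > theta]
    log_liks = []
    for rt in kept:
        z = delta_s / rt - mu_r
        # log(delta_s * phi(z)), written out to avoid underflow
        log_liks.append(math.log(delta_s) - 0.5 * z * z
                        - 0.5 * math.log(2.0 * math.pi))
    return -sum(log_liks) / len(kept), len(kept)


example_rts = [0.18, 0.20, 0.25, 0.05]  # seconds; 0.05 falls below theta
m_value, n_used = later_objective(example_rts, mu_r=5.0, delta_s=1.0, theta=0.1)
total_log_likelihood = -m_value * n_used  # rescaled for model comparison
```

Using the mean rather than the sum keeps the objective comparable across candidate values of θ, because changing θ changes how many RTs enter the calculation.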

We determined within-subject dependencies of the two basic LATER parameters (Δ*S* or μ_{r}) on two sets of task conditions, as follows. In each case, we fit nested models to the set of all RT′_{correct} data from the given session (corresponding to a single subject), as defined by the fits to Equation 1 (i.e., with all the fast saccades removed).

The first set of conditions tested whether or not best-fitting values of Δ*S* or μ_{r} depended linearly on the logarithm of the state-dependent, generative transition probability (*H*_{C-1,C0} = *H*_{LR}, *H*_{LL}, *H*_{RL}, and *H*_{RR}). We used the following two nested models: (1) a four-parameter model in which both Δ*S* and μ_{r} were expressed as linear functions of log(*H*_{C-1,C0}): Δ*S* = β_{ΔS,0} + β_{ΔS,1} × log(*H*_{C-1,C0}) and μ_{r} = β_{μr,0} + β_{μr,1} × log(*H*_{C-1,C0}), where β_{ΔS,0}, β_{ΔS,1}, β_{μr,0}, and β_{μr,1} were free parameters; and (2) a three-parameter model in which the parameter of interest (Δ*S* or μ_{r}) was a single free parameter across conditions and the other parameter was expressed as a linear function of log(*H*_{C-1,C0}), as above.

The second set of conditions tested whether best-fitting values of Δ*S* or μ_{r} depended on the number of trials after a change point. We used the following two nested models, separately for 0, 1, 2, and 3 trials after a change point (denoted *t*_{i}; in addition, *t*_{4+} denotes 4 or more trials after a change point, corresponding to the steady-state value of the parameter): (1) a four-parameter model, with Δ*S*_{ti}, μ_{r,ti}, Δ*S*_{t4+}, and μ_{r,t4+}; and (2) a three-parameter model in which the parameter of interest (Δ*S* or μ_{r}) was a single, free parameter for both *t*_{i} and *t*_{4+}.

For both sets of within-subject comparisons, significance was assessed via a likelihood-ratio test (*D* = twice the difference in the log-likelihoods of the two fits, with a difference in the degrees of freedom equal to one, and compared with a χ^{2} distribution). A significant effect implies that the data were better fit by the first model, in which the given parameter varied as a function of task condition. We also assessed across-subject effects using a Kruskal–Wallis test, with transition probability or trials after a change point as the grouping variable.
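The likelihood-ratio comparison for nested models that differ by one free parameter can be sketched as follows. The function name and the example log-likelihood values are hypothetical. The closed form used for the p-value holds specifically for a χ^{2} distribution with one degree of freedom.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_reduced):
    """Return (D, p) for nested models differing by one free parameter.

    For one degree of freedom, the chi-square survival function has the
    closed form P(X > D) = erfc(sqrt(D / 2)).
    """
    d = 2.0 * (loglik_full - loglik_reduced)
    d = max(d, 0.0)  # guard against tiny negative values from optimizer noise
    p = math.erfc(math.sqrt(d / 2.0))
    return d, p


# Hypothetical total log-likelihoods from the four- and three-parameter fits.
d_stat, p_value = likelihood_ratio_test(loglik_full=-1520.4,
                                        loglik_reduced=-1526.1)
```

A small p-value here would favor the model in which the parameter of interest varies with task condition.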

#### LATER model with dual accumulators

Because the basic LATER model does not account for errors, we also used a modified LATER model based on two competing accumulators, one for correct responses and the other for error responses (Reddi et al., 2003). This model assumes that the correct and error processes race toward their respective boundaries on each trial, with the winner determining the choice and RT on that trial. We used this model to fit data from the S, WF, and WNF conditions (see Fig. 3*B–E*).

This model had five parameters to replace μ_{r} and Δ*S* from the basic LATER model: (1) mean rate of rise for correct choices, μ_{r,correct} (constrained to be >0), (2) mean rate of rise for the incorrect choices, −μ_{r,error} (constrained to be <0), (3) starting point *S*_{0}, (4) threshold magnitude *S*_{T} (+*S*_{T} for correct choices, −*S*_{T} for incorrect choices), and (5) a parameter ε that can add a delay to correct or error RTs relative to the time of boundary crossings in the model. Specifically, we computed the probability of a correct response using Equation 2, in which φ is the cumulative distribution function of the standard normal distribution, Δ*S*_{correct} = *S*_{T} − *S*_{0}, Δ*S*_{error} = *S*_{T} + *S*_{0}, and RT′ represents data from non-short-latency saccades, as described above. The parameter ε is equivalent to a criterion bias in signal detection theory in evaluating the difference distribution, 1/RT′_{correct} − 1/RT′_{error} (Green and Swets, 1966). The value of ε was >0 in all of our fits, which implies that an accumulator with mean μ_{r,correct} and an accumulator with mean −[μ_{r,error} + εΔ*S*_{error}] race independently toward bounds *S*_{T} and −*S*_{T}, respectively. If *S*_{T} is reached first, then a correct choice is made with the RT determined by the time of bound crossing. However, if −*S*_{T} is reached first, then an error choice is made and the RT is delayed by an amount that depends on *r*_{error}, where *r*_{error} ∼ *N*(μ_{r,error}, σ_{r}) is the particular instantiation of the rate of rise on that trial. Therefore, the delay increases as the rate of rise decreases. We also tested a four-parameter model in which the bias parameter ε was set to zero. The four-parameter model captured the RT behavior well, but tended not to capture the choice behavior as well as the five-parameter model (likelihood-ratio test, *p* < 0.01 for all subjects who underwent weak-stimuli conditions).
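The race between the two accumulators can be illustrated with a simplified simulation. This sketch corresponds to the four-parameter case (ε = 0), works with rate magnitudes rather than signed rates, and redraws trials on which neither accumulator would reach its bound; the function name and parameter values are hypothetical.

```python
import math
import random


def simulate_race(n, mu_correct, mu_error, s0, s_t, seed=0):
    """Simulate choices and RTs from two independent LATER accumulators.

    The correct accumulator travels s_t - s0 and the error accumulator
    travels s_t + s0; the first to arrive determines choice and RT.
    """
    rng = random.Random(seed)
    d_correct = s_t - s0
    d_error = s_t + s0
    trials = []
    while len(trials) < n:
        r_c = rng.gauss(mu_correct, 1.0)
        r_e = rng.gauss(mu_error, 1.0)
        t_c = d_correct / r_c if r_c > 0 else math.inf
        t_e = d_error / r_e if r_e > 0 else math.inf
        if math.isinf(t_c) and math.isinf(t_e):
            continue  # neither accumulator reaches its bound; redraw
        trials.append(("correct", t_c) if t_c <= t_e else ("error", t_e))
    return trials


def proportion_correct(trials):
    return sum(choice == "correct" for choice, _ in trials) / len(trials)


unbiased = simulate_race(5000, mu_correct=4.0, mu_error=2.0, s0=0.0, s_t=1.0)
biased = simulate_race(5000, mu_correct=4.0, mu_error=2.0, s0=0.3, s_t=1.0)
p_unbiased = proportion_correct(unbiased)
p_biased = proportion_correct(biased)
```

Moving the starting point toward the correct bound shortens the distance that the correct accumulator must travel and lengthens the distance for the error accumulator, so the proportion of correct choices rises.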

We fit this model by minimizing an objective (likelihood) function *M* (Equation 3) that computed the negative sum of the logarithm of the conditional probabilities of obtaining the RT data given the model, using the five parameters describing the dual accumulators. In Equation 3, *p*(correct) is from Equation 2; *p*(error) = 1 − *p*(correct); and RT′_{correct,i} and RT′_{error,j} are RT data obtained from the *i*^{th} correct trial and *j*^{th} error trial, respectively, from the given session that had values >θ. The values of θ_{correct} and θ_{error} were obtained for each session by fits of the same RT data from correct and error trials, respectively, to Equation 1 to remove short-latency values. We assessed within- and across-subject effects as described above for the basic LATER model.

#### Adaptive LATER model

To better understand the principles that guided the accumulation process reflected in the saccadic priors across trials, we fit the data to a modified, adaptive LATER model. In this model, the starting value *S*_{0} represents the prior that evolves over time according to normative principles (Glaze et al., 2015). Specifically, the log-prior-odds of the sound being on the right versus left side on trial *n* (ψ_{n}) are a function of both the posterior probabilities from the previous trial (*q*_{R,n−1} and *q*_{L,n−1} = 1 − *q*_{R,n−1} for right and left, respectively) and the transition probabilities *H*_{RL} and *H*_{LR} (Bishop, 2006; Glaze et al., 2015), as given by Equation 4. The log-posterior-odds, *L*_{n} = log(*q*_{R,n}/*q*_{L,n}) = log[*q*_{R,n}/(1 − *q*_{R,n})], are updated on each trial as the sum of the log-prior-odds and the log-likelihood ratio provided by the sensory evidence on that trial (*LLR*_{n}, which is governed by α, a free parameter associated with the given stimulus strength), as given by Equation 5; that is, *L*_{n} = ψ_{n} + *LLR*_{n}, where *LLR*_{n} = −*logit*(α) for left sources and +*logit*(α) for right sources.
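The trial-by-trial update described above can be sketched in a few lines. The form of the prior used here is the standard forward step for a two-state Markov chain, which is assumed (not confirmed from the equations themselves) to be what the normative model computes; the function names and parameter values are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def log_prior_odds(q_right_prev, h_rl, h_lr):
    """Propagate the previous posterior through the two-state transitions."""
    q_left_prev = 1.0 - q_right_prev
    p_right = q_right_prev * (1.0 - h_rl) + q_left_prev * h_lr
    p_left = q_left_prev * (1.0 - h_lr) + q_right_prev * h_rl
    return math.log(p_right / p_left)


def update_posterior(q_right_prev, source, h_rl, h_lr, alpha):
    """Return (psi_n, L_n, q_right_n) after observing one sound source."""
    psi = log_prior_odds(q_right_prev, h_rl, h_lr)
    llr = logit(alpha) if source == "R" else -logit(alpha)
    log_post_odds = psi + llr  # posterior odds = prior odds + evidence
    q_right = 1.0 / (1.0 + math.exp(-log_post_odds))
    return psi, log_post_odds, q_right


# Hypothetical sequence with weak evidence (alpha = 0.7).
q = 0.5
history = []
for source in "RRRL":
    psi, log_odds, q = update_posterior(q, source, h_rl=0.2, h_lr=0.2, alpha=0.7)
    history.append((psi, log_odds, q))
```

When the previous state is known with certainty (*q*_{R,n−1} equal to 0 or 1), the prior in this sketch depends only on the relevant transition probability. With uncertain evidence, it builds up gradually over several consistent trials.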

We used this process to govern trial-by-trial dynamics of the saccadic prior in the LATER model and then fit this adaptive model to the RT data as described for Equation 1, above. This adaptive model had either six or eight total free parameters, depending on the number of unique, state-dependent, generative transition probabilities (*H*; by contrast, the basic LATER model fit to the same data had three free parameters per value of *H*): either two or four parameters (μ_{r,H}) representing the mean rate of rise per value of *H*; three parameters governing the starting value of the LATER process, *S*_{0} = β_{n} = *f*(*Ĥ*_{RL}, *Ĥ*_{LR}, α) from Eqs. 4 and 5, where *Ĥ*_{RL} and *Ĥ*_{LR} are the subjective estimates of the respective objective values for that session; and one parameter representing the ending value of the LATER process, *S*_{T}, such that Δ*S* from Equation 1 = *S*_{T} − *S*_{0}. Note that both μ_{r} and α reflect stimulus strength, but of the two, only α is used to inform the prior on the subsequent trial and thus is also potentially sensitive to post-saccade feedback. Moreover, by leaving *Ĥ*_{RL} and *Ĥ*_{LR} as free parameters, this model tests the extent to which subjects can learn and use subjective estimates of the objective, latent values to govern their saccadic behavior, as has been shown for certain choice tasks (Glaze et al., 2015). Finally, we emphasize that this parameterization assumes that the across-trial decision affects the starting point of the LATER process (*S*_{0}), but ultimately its effect on saccade generation is via Δ*S* and, therefore, it could in principle be equivalently applied to the ending value of the LATER process (*S*_{T}).

We fit the adaptive LATER model by minimizing the objective (likelihood) function *M* that computed the negative sum of the logarithm of the conditional probabilities of obtaining the RT data given the model while respecting the sequence of *N* trials, defined as follows:
Here, we used θ obtained per session by fits of the same RT data from correct trials to Equation 1 to remove short-latency values, as described above.

We fit both the basic and the adaptive LATER model to data from each individual session, including all tested conditions (S, WF, WNF, and WNF_{sp}). As noted above, some subjects participated in more than one session and were tested using different combinations of factors in each session. However, participation across sessions by individual subjects was not uniform across conditions (e.g., there was strong overlap in the subjects who participated in the WF and WNF conditions, but not otherwise). We therefore analyzed differences in model fits and parameter estimates across conditions on the group level, which included both within- and across-subject effects.

#### Integration time index

To quantify the gradual, across-trial updates of saccadic prior probabilities described by the adaptive model fits, we computed an integration time index for each session as the normalized area under the curve of Δ*S* from the adaptive model fits as a function of trials 1–3 after a change point (see Fig. 3*B*,*C*). Specifically, we computed *E*[Δ*S*_{c,n} − *E*[Δ*S*_{c,m}]], where *E*[ ] indicates the expected (mean) value across all values of *c* and *n*, *c* indicates a left or right choice, *n* is 1–3 trials after a change point, and *m* is 4 or more trials after a change point. The value of this index is zero if Δ*S* changes abruptly and larger if Δ*S* changes gradually after a change point.
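A literal reading of the expression *E*[Δ*S*_{c,n} − *E*[Δ*S*_{c,m}]] can be sketched as follows. The data layout, the function name, and the choice to compute the steady-state mean separately for each choice *c* are assumptions of this illustration, and any further normalization is not reproduced here.

```python
import statistics


def integration_time_index(delta_s_early, delta_s_steady):
    """Mean difference between early and steady-state Delta-S values.

    delta_s_early[c][n]: Delta-S for choice c, n = 1..3 trials after a
        change point (assumed layout).
    delta_s_steady[c]: Delta-S values for choice c, 4+ trials after a
        change point.
    """
    diffs = []
    for choice, by_trial in delta_s_early.items():
        baseline = statistics.mean(delta_s_steady[choice])
        diffs.extend(value - baseline for value in by_trial.values())
    return statistics.mean(diffs)


steady = {"L": [1.0, 1.0], "R": [1.2, 1.2]}  # hypothetical values
abrupt = integration_time_index(
    {"L": {1: 1.0, 2: 1.0, 3: 1.0}, "R": {1: 1.2, 2: 1.2, 3: 1.2}}, steady)
gradual = integration_time_index(
    {"L": {1: 1.6, 2: 1.3, 3: 1.1}, "R": {1: 1.8, 2: 1.5, 3: 1.3}}, steady)
```

In this toy example, the abrupt profile reaches its steady-state value on the first trial after the change point and yields an index of zero, whereas the gradual profile yields a positive index.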

## Results

We measured saccadic RTs of human subjects performing a dynamic auditory discrimination task. Subjects were required to make a saccadic eye movement to look at a visual target in the right or left visual field in response to a brief noise burst played through headphones from a virtual sound source located to the right or left of central gaze, respectively (Fig. 1*A*). Sound source location was generated using a Markov chain with two states, *C*_{L} (left) and *C*_{R} (right), and four generative transition probabilities, *H*_{LR} (the probability of a sound presented from the left side followed by a sound from the right side), *H*_{LL} = 1 − *H*_{LR}, *H*_{RL}, and *H*_{RR} = 1 – *H*_{RL}. Each trial consisted of the presentation of a single sound, the location of which depended on the current state. State transitions occurred between trials.

### Saccadic priors reflected state-dependent transition probabilities

Previous studies that analyzed saccadic RT data using the LATER model used a state- (history-) independent process that is equivalent to *H*_{LR} + *H*_{RL} = 1; that is, the probability that the location of the sound is to the left is *H*_{L} = *H*_{RL} = *H*_{LL} and to the right is *H*_{R} = *H*_{LR} = *H*_{RR} regardless of the outcome of the previous trial (Carpenter and Williams, 1995; Reddi et al., 2003). To study how expectations affect saccadic processing in more structured environments, we used state-dependent transition processes in which the probability that the sound is to the left or right depends on its prior location; i.e., *H*_{LR} + *H*_{RL} ≠ 1. For the first set of tests, we also used strong, easily discriminable stimuli [the median error rate per session was 2.56% (interquartile range, IQR = 2.11–3.91%)] and feedback on each trial to minimize uncertainty about the previous state.

Under these conditions, we found that saccadic priors were updated dynamically according to the appropriate, state-dependent transition probabilities (*H*; Fig. 2). Specifically, we tested three sets of conditions: (1) relatively low, equal values of *H*_{LR} and *H*_{RL} (0.2 or 0.25), implying a stable environment in which target locations tended to repeat (Fig. 2*B*,*E*); (2) relatively low, unequal values of *H*_{LR} (0.35) and *H*_{RL} (0.15), implying a more stable environment following right versus left locations (Fig. 2*C*,*F*); and (3) relatively high, unequal values of *H*_{LR} (0.65) and *H*_{RL} (0.85), implying an unstable environment in which target locations tended to switch slightly more after right versus left locations (Fig. 2*A*,*D*,*G*). Figure 2*A* shows example RT distributions for trials associated with the high, unequal values of *H*_{LR} (0.65) and *H*_{RL} (0.85) on reciprobit axes, with reciprocal RT on the *x*-axis and percentage cumulative frequency on a probit scale on the *y*-axis. Consistent with the assumptions of the LATER model (Fig. 1*B*), each distribution plotted in this way followed an approximately straight line except for the small percentage of very short-latency saccades that likely reflect an express saccade or other fast process (Fischer and Ramsperger, 1984; Carpenter, 2012; Noorani and Carpenter, 2016). Critically, the reciprobit lines tended to show a qualitative “swivel” as a function of the different transition probabilities, consistent with changes in priors (Fig. 1*B*) (Carpenter and Williams, 1995).
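The reciprobit transformation used for these plots can be sketched as follows. The function name, the plotting position (rank − 0.5)/*n*, and the use of −1/RT on the x-axis (so that longer RTs fall to the right) are assumptions of this illustration.

```python
from statistics import NormalDist


def reciprobit_points(rts):
    """Return (-1/RT, probit of cumulative frequency) pairs, sorted by RT."""
    ordered = sorted(rts)
    n = len(ordered)
    standard_normal = NormalDist()
    points = []
    for rank, rt in enumerate(ordered, start=1):
        cumulative = (rank - 0.5) / n  # assumed plotting position, avoids 0 and 1
        points.append((-1.0 / rt, standard_normal.inv_cdf(cumulative)))
    return points


# Hypothetical RTs in seconds.
points = reciprobit_points([0.2, 0.1, 0.4, 0.25])
```

Under the basic LATER model with σ_{r} = 1, the cumulative probability of an RT no longer than *t* is the standard normal cumulative distribution evaluated at μ_{r} − Δ*S*/*t*. On these axes, the data therefore fall on a line with slope Δ*S* and intercept μ_{r}, so a change in Δ*S* alone rotates the line about a fixed intercept, which is the "swivel" pattern.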

To quantify these effects, we fit the basic LATER model to the interleaved conditions associated with each state transition. These fits yielded changes in the height of the rise-to-threshold process (Δ*S* in Fig. 1*B* and Eq. 1) that depended appropriately on the state-dependent transition probability when measured both for individual subjects (filled circles in Fig. 2*B–D*; likelihood-ratio test, *p* < 0.01 in all cases except one subject in Fig. 2*B*) and median values across subjects (Kruskal–Wallis test, *p* < 0.01 in all three cases). In contrast, best-fitting values of the mean rate of rise of the rise-to-threshold process (μ_{r} in Fig. 1*B* and Eq. 1), which is typically interpreted in terms of the strength of the stimulus instructing the saccadic eye movement, were not affected as consistently by these manipulations: data from individual subjects showed some dependencies on the generative transition probability (possibly reflecting a priming effect thought to modulate μ_{r} under some conditions; Cho et al., 2002; Goldfarb et al., 2012), but median values across subjects did not (Fig. 2*E–G*; *p* > 0.01 in all three cases). Therefore, saccadic RTs reflected learned expectations about state-dependent transition probabilities via changes in the height of a rise-to-threshold saccade generation process, representing a form of prior probability of making a particular saccade.

### Saccadic priors reflected the accumulation of uncertain evidence over trials

The data shown in Figure 2 correspond to conditions in which the auditory stimuli were easily identifiable as coming from the left or right side and were followed by explicit visual feedback about the correct location. Under those conditions, the expected probabilities of a left or right choice on the current trial depended on the previous state (i.e., whether the sound on the previous trial came from the left or the right), but not the states before that. Here, we used less-discriminable stimuli and no feedback (WNF; see Fig. 5*A*). Under these conditions, there was uncertainty about the previous state. According to normative theory, this uncertainty should lead to more gradual updating across trials, representing a form of evidence accumulation (Fig. 1*C* and Eqs. 4 and 5) (Glaze et al., 2015).
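The contrast between abrupt and gradual updating can be sketched with a two-state Bayesian update. This is a minimal illustration under stated assumptions, not a transcription of Eqs. 4 and 5: the belief is held as the probability that the current state is right, each trial's evidence is assumed to favor the observed side with reliability `alpha` (contributing ±logit α in log-odds), and the prior for the next trial is obtained by passing the posterior through the transition probabilities *H*_{LR} and *H*_{RL}. All function and variable names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def next_prior_right(prior_right, observed_right, alpha, h_lr, h_rl):
    """One across-trial update of the prior probability that the sound is right.

    alpha: assumed reliability of the within-trial evidence (0.5 < alpha < 1).
    h_lr:  P(right on next trial | left on this trial).
    h_rl:  P(left on next trial | right on this trial).
    """
    # Posterior for the current state: prior log-odds plus evidence log-odds.
    evidence = logit(alpha) if observed_right else -logit(alpha)
    post_log_odds = logit(prior_right) + evidence
    post_right = 1.0 / (1.0 + math.exp(-post_log_odds))
    # Prediction for the next state under the state-dependent transitions.
    return post_right * (1.0 - h_rl) + (1.0 - post_right) * h_lr


def priors_after_change_point(alpha, h=0.2, n_before=20, n_after=6):
    """Track the prior for 'right' when the state switches from left to right."""
    prior = 0.5
    for _ in range(n_before):  # long run of left trials
        prior = next_prior_right(prior, False, alpha, h, h)
    trace = []
    for _ in range(n_after):  # the state has switched to right
        prior = next_prior_right(prior, True, alpha, h, h)
        trace.append(prior)
    return trace


if __name__ == "__main__":
    # Hypothetical reliabilities standing in for strong and weak evidence.
    print("strong:", [round(p, 3) for p in priors_after_change_point(0.999)])
    print("weak:  ", [round(p, 3) for p in priors_after_change_point(0.7)])
```

In this sketch, highly reliable evidence moves the prior almost completely after a single post-change trial, whereas less reliable evidence moves it over several trials. If the prior sets the starting point of the rise-to-threshold process, a prior that grows gradually toward the new state would correspond to a Δ*S* for the correct target that shrinks gradually after a change point.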

We found that, consistent with normative theory, saccadic priors changed gradually over multiple trials in the presence of state uncertainty (Fig. 3). Specifically, when weak stimuli were used and no feedback was given, the best-fitting value of Δ*S* from the LATER model was largest just after a change point, then decreased gradually over several trials (Fig. 3*C*, cf. dark gray curves in Fig. 1*C*). These effects were not evident when using strong stimuli, which reliably indicated the current state and therefore corresponded to abrupt and complete adjustments to Δ*S* after a change point (Fig. 3*B*). The gradual updates of Δ*S* in the presence of state uncertainty were not evident for μ_{r}, which, on average, stayed relatively constant across trials for the three task conditions (Fig. 3*D*,*E*).

One potential concern about these fits is that the basic LATER model does not account for the errors that occurred in the weak stimulus condition, which were therefore not included in the fits (Carpenter and Williams, 1995; Reddi et al., 2003). To address this concern, we fit the full datasets from the WNF condition to an extended model that used competing accumulators to account for both correct and error choices (Reddi et al., 2003). This model captured empirical error rates (Pearson's *r* comparing empirical and simulated error rates per session was 0.99, *p* < 0.0001) via a competing process that tended to have equal or smaller best-fitting values of Δ*S* and μ_{r} than those obtained from correct trials for both repeated (Δ*S*: median value = 0.97 for error trials vs 0.88 for correct trials, Wilcoxon signed-rank test, *p* = 0.48; μ_{r}: 2.47 vs 2.85, respectively, *p* < 0.001) and switched choices (Δ*S*: 0.77 vs 1.35, respectively, *p* < 0.00001; μ_{r}: 2.54 vs 3.01, respectively, *p* = 0.0034). These fits imply that the subjects were using consistent strategies throughout these conditions, and errors tended to occur when the noisy, trial-by-trial evidence was particularly weak. Critically, the best-fitting values of Δ*S* and μ_{r} for correct trials from this model were nearly identical to those obtained from the basic LATER fits to correct trials only (Pearson's *r* comparing best-fitting values of both μ_{r} and Δ*S* from the two models per session was 0.99, *p* < 0.0001). Accordingly, the effects of trial history using these fits were also nearly identical to those reported above: Δ*S*, but not μ_{r}, for correct choices was adjusted gradually after a change point on trials with weak stimuli and no feedback (Fig. 3*B–E*, magenta symbols; note that there were too few error trials to conduct a similar analysis to identify their dependence on trial history).

### Saccadic priors were updated according to predictions of a normative model

The preceding analyses demonstrated that changes in saccadic prior probabilities affected the total amount of evidence accumulated per saccade (Δ*S* in the basic LATER model) and not the rate of rise of the decision variable (μ_{r} in the basic LATER model), which is consistent with previous findings (Carpenter and Williams, 1995). To better understand the principles that guided how the saccadic priors were updated across trials, we fit the data to a modified, adaptive LATER model that describes such updates according to normative principles (Glaze et al., 2015). According to this model, the starting point *S*_{0} represents the prior that is updated on each trial based on three parameters: the subject's estimates of *H*_{LR} and *H*_{RL} and a parameter α that represents the weight of evidence provided by the stimulus about the current state and thus strongly governs the time course of prior updates, particularly after change points (Fig. 1*C* and Eqs. 4–5). The model included several other parameters as in the basic LATER model: one decision threshold, *S*_{T}, plus separate values of μ_{r} for each generative transition probability (models using a single value of μ_{r} across conditions yielded similar results). Unlike the basic model, which accounted for differences in Δ*S* as a function of generative transition probability (Fig. 2) or trials after change point for weak stimuli (Fig. 3) only by explicitly using separate parameters for each such condition, a single version of the adaptive model was sufficiently flexible to account for all of these effects (Fig. 4).

We compared fits to the adaptive and basic models for data collected under several different conditions that were likely to affect how priors were updated across trials. These conditions, which led to different patterns of accuracy (Fig. 5*A*) and RTs (Fig. 5*B*) for each saccadic decision, were as follows: (1) S stimuli, corresponding to the data presented in Fig. 2, which tended to have the highest accuracy and short RTs; (2) WF stimuli, which tended to have slightly lower accuracy and longer RTs; (3) WNF stimuli, corresponding to the data presented in Fig. 3*C*,*E*, which tended to have slightly lower accuracy and shorter RTs than in the WF condition; and (4) WNF_{sp} stimuli, which tended to have the lowest accuracy and short RTs.

The adaptive model tended to provide better fits to the data than the basic models, particularly when weak stimuli were used and the subjects were not emphasizing speed (the model comparisons were quantified using the Bayesian information criterion, which accounted for differences in the numbers of parameters between the two sets of non-nested models; Fig. 5*C*). Under these conditions, priors tended to change gradually after change points (e.g., Fig. 3*C*), as predicted by the adaptive model (Fig. 1*C*) but not the basic model. In contrast, the two sets of models tended to provide similar fits for conditions in which priors were predicted by the adaptive model to either change abruptly (S) or not change (WNF_{sp}) after change points (Fig. 1*C*), both of which could be accounted for using the basic model applied separately to conditions with different generative transition probabilities. However, there was also considerable individual variability across conditions in the relative quality of the fits of the two models. This variability was strongly dependent on variability in the time course of prior updates. In particular, subjects better fit by the adaptive model tended to update their priors more gradually across trials (Fig. 5*D*). This result emphasizes the flexibility of the adaptive model and its general applicability to all of the conditions we tested.
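For reference, this kind of comparison can be sketched with the standard definition of the Bayesian information criterion, BIC = *k* ln *n* − 2 ln *L*, where *k* is the number of free parameters, *n* is the number of observations, and *L* is the maximized likelihood; lower values indicate a better balance of fit and complexity. The function names are hypothetical, and the parameter counts and log-likelihoods in the usage example are made-up placeholders, not values from these fits.

```python
import math


def bic(max_log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * max_log_likelihood


def delta_bic(ll_basic, k_basic, ll_adaptive, k_adaptive, n_obs):
    """BIC(basic) - BIC(adaptive); positive values favor the adaptive model."""
    return bic(ll_basic, k_basic, n_obs) - bic(ll_adaptive, k_adaptive, n_obs)


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the call pattern.
    print(delta_bic(ll_basic=-520.0, k_basic=8,
                    ll_adaptive=-505.0, k_adaptive=6, n_obs=1000))
```

Because the penalty term depends only on the parameter count and the number of observations, the criterion can be applied to non-nested models such as the two sets compared here.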

This flexibility reflected task-dependent differences in best-fitting parameter values from the adaptive model (Fig. 6). The parameter μ_{r}, representing the mean slope of the rising process that is sensitive to the strength of the instructive sensory signal (Reddi et al., 2003), was stronger in the strong versus the weak stimulus conditions (Fig. 6*A*; Wilcoxon rank-sum tests comparing S separately with WF, WNF, and WNF_{sp}, *p* < 0.001 in each case, and comparing WF separately with WNF and WNF_{sp} and WNF with WNF_{sp}, *p* > 0.24 in each case) and in all cases nearly identical to best-fitting values from the basic models (Pearson's *r* comparing best-fitting μ_{r} values from the two models, across all conditions, was 0.96, *p* < 0.00001), as expected. The parameter α, which governs the weight of evidence used across trials to infer the current state, was generally larger for strong versus weak stimuli unless feedback about the current state was given after each saccade (Wilcoxon rank-sum tests comparing S and WF, *p* = 0.38; S and WNF, *p* < 0.01; S and WNF_{sp}, *p* < 0.001). The parameter *S*_{T}, representing the threshold of the LATER process, was lowest in the speeded condition (Fig. 6*C*; Wilcoxon rank-sum tests comparing WNF_{sp} separately with S, WF, and WNF, *p* < 0.01 in each case). This result is consistent with the idea that this parameter governs the speed–accuracy trade-off (Gold and Shadlen, 2007; Bogacz et al., 2010), which for our subjects tended to emphasize accuracy (particularly in the WF condition) unless otherwise instructed (i.e., WNF_{sp}).

The final two parameters governed subjective estimates of the two generative transition probabilities, *H*_{LR} and *H*_{RL}. Best-fitting values of these parameters were correlated with their corresponding objective values across the relatively broad range that we tested in the strong stimulus condition (Fig. 6*D*). This result indicates that the subjects learned and maintained approximately appropriate estimates of the generative transition probabilities based on experience. However, subjective estimates of either parameter alone tended to be biased toward values of 0.5 and, accordingly, their sum tended to be biased toward values of 1.0 (Fig. 6*E*). These biases were in the direction of history-independent values, as has been reported previously (Glaze et al., 2015). This bias was not simply driven by uncertainty about change points, because it was most reliably present when strong stimuli or weak stimuli with feedback were used (Fig. 6*D*). Therefore, in general, subjects tended to implicitly assume that the probability of the sound coming from a particular direction was more independent of the previous trials than it actually was.

### Saccadic priors reflected interactions between within- and across-saccade processes

These model fits also yielded insights into further interactions between the within-trial rise-to-threshold process that leads to a saccade and the across-trial inference process that establishes appropriate expectations about the state-dependent transition probabilities. When feedback was given (including all subjects in the WF condition and all but two subjects in the S condition), it provided unambiguous evidence for the current state. Accordingly, under these conditions, the within- and across-saccade decision processes were not strongly coupled. Specifically, the best-fitting values of α from the adaptive model, representing the weight of evidence used across trials to infer the current state, and *S*_{T}, representing the total prior and sensory evidence in the LATER process used to infer the current state before feedback, were unrelated to each other across subjects (Pearson's *r* = −0.21, *p* = 0.40; Fig. 7*A*). In contrast, when feedback was not given (the WNF and WNF_{sp} conditions), the total accumulated evidence for the within-saccade decision appeared to drive the across-saccade process. Specifically, there was an approximately linear relationship between best-fitting values of logit α and *S*_{T} across subjects (*r* = 0.76, *p* < 0.0001; Fig. 7*B*). This relationship reflected effects that occurred both across task conditions that explicitly manipulated *S*_{T} (median values of both α and *S*_{T} were smaller in the WNF_{sp} versus WNF conditions; Wilcoxon rank-sum tests, *p* < 0.01 in both cases) and within task conditions (WNF: *r* = 0.78, *p* = 0.0019, WNF_{sp}: *r* = 0.80, *p* = 0.0063; although only the former, which included strong integration across trials, held for a rank-order test, WNF: Spearman's ρ = 0.65, *p* = 0.0196, WNF_{sp}: ρ = 0.40, *p* = 0.26). We did not find any relationship between α and μ_{r}, which governed the within-trial evidence, for any task condition (*p* > 0.05 for ρ in each case). This result implies that, in the absence of other feedback, the total, predefined level of evidence accumulated in each trial can be used by the across-trial inference process to update expectations about state-dependent transition probabilities.

## Discussion

We used an auditory discrimination task to determine how sequential, saccadic RTs reflect learned expectations in dynamic environments. We presented three main, novel findings. First, learned state-dependent transition probabilities affect the prior probability of making a particular saccade. In the context of saccadic RTs, these effects can be modeled effectively as changes in the starting point of a rise-to-threshold saccade-generation process. This finding is consistent with both experimental and theoretical work implicating a similar computational basis for certain effects of expectations on RT and choice behavior using other tasks (Carpenter and Williams, 1995; Bogacz et al., 2006; Gold and Shadlen, 2007; Mulder et al., 2012; Glaze et al., 2015). Second, these adaptive adjustments follow principles of normative evidence accumulation, including gradual updates across trials (saccades) when there is uncertainty about the current state (Wilson et al., 2010; Glaze et al., 2015). Third, this across-saccade evidence-accumulation process depends on a confidence-like signal (Kiani and Shadlen, 2009; Purcell and Kiani, 2016), which in our task was linked to the within-saccade decision threshold that was also sensitive to feedback. Together, these results imply that saccade generation is governed by sophisticated and interacting forms of inference that operate on multiple timescales. Below we discuss in more detail how these findings relate to previous studies and how they can help guide our understanding of the underlying neural mechanisms.

### Relationship to other work

Our findings provide new insights into the computational basis for sequential structure in saccadic RTs, as has been studied extensively using versions of the serial RT task (Nissen and Bullemer, 1987; Reed and Johnson, 1994; Schwarb and Schumacher, 2012). In particular, we showed that this sequential structure can reflect specific, saccade-by-saccade adjustments to normative, rise-to-threshold decision processes. These findings put RT measurements, long recognized as a window into higher brain function (Luce, 1986; Noorani and Carpenter, 2016), on more even footing with other saccade metrics that have been linked to principles of normative decision making in uncertain environments. For example, in visual search tasks, visual saccadic inhibition of return reflects trial-by-trial updating of subjective prior beliefs based on recent experience and contextual cues in the environment (Ludwig et al., 2012). Similar to our findings, these effects can be accounted for via changes in the baseline value of a LATER-like model of saccadic decision making (Farrell et al., 2010). Saccadic accuracy in visual search, along with saccadic choices on a cued discrimination task, is similarly modulated by expectations in a manner that is consistent with optimal, probabilistic inference (Shimozaki et al., 2003; Eckstein et al., 2006). Together, these results imply that the timing, selection, and accuracy of saccadic eye movements can yield important insights into the complex computations that the brain employs to use expectations to optimize behavior.

These RT-sensitive computations include certain principles of optimal inference that are needed to make effective predictions in environments that are both uncertain and can undergo fundamental, unexpected changes (Behrens et al., 2007; Nassar et al., 2010; Payzan-LeNestour et al., 2013; Gallistel et al., 2014; Glaze et al., 2015). Somewhat similar principles appear to govern sequential RT effects measured in key-press tasks, although previous studies using those tasks focused only on conditions with little or no state uncertainty. Specifically, under those conditions, sequential key-press RTs appear to involve trial-by-trial updates of the starting or ending point of a rise-to-threshold decision process, following both normative and non-normative principles (Goldfarb et al., 2012; Zhang et al., 2014). We showed that similar updates to a normative decision process that governs saccadic RTs are sensitive to state-dependent transition probabilities, albeit with an apparent predisposition to assume history-independent values of ∼0.5 that may minimize computational costs (Drugowitsch et al., 2012; Shenhav et al., 2013). Moreover, in the presence of uncertainty, these processes can flexibly accumulate evidence to identify the current state of the environment. Both of these features of saccadic processing are hallmarks of high-level inference processes and thus support the notion that the saccadic system is a useful substrate for understanding, not just decisions that lead to immediate actions, but also how those decisions depend on learned behavioral contexts or models of the world (Gold and Shadlen, 2007; Otto et al., 2015; Purcell and Kiani, 2016).

### Possible neural mechanisms

A primary advantage of studying the saccadic system is our extensive knowledge of the underlying neural mechanisms. Our findings suggest several novel extensions to existing ideas about these mechanisms that can be used to guide future experiments. For example, the within-saccade decisions that in our task drive the selection and timing of the impending saccade relate directly to numerous lines of research involving measurements of neural activity in nonhuman primates performing relatively simple sensory-driven saccade tasks. These studies have identified correlates of the kind of rise-to-threshold processes described by the LATER model in a network of interconnected brain regions including the lateral intraparietal area (LIP), frontal eye field (FEF), superior colliculus (SC), and basal ganglia (Hanes and Schall, 1996; Horwitz and Newsome, 1999; Kim and Shadlen, 1999; Roitman and Shadlen, 2002; Ding and Gold, 2010; Ding and Gold, 2012). Consistent with the prior-driven changes in starting value of the LATER process that we found, baseline activity in each of these areas that occurs before the onset of visual cues instructing particular saccades can under certain conditions reflect the probabilities of making those saccades (Basso and Wurtz, 1997; Dorris and Munoz, 1998; Coe et al., 2002; Lauwereyns et al., 2002; Ding and Hikosaka, 2006; Rao et al., 2012). Our results suggest that these baseline modulations may be driven by across-saccade decision processes that are sensitive to stimulus discriminability, feedback, and change point dynamics.

There is less known about how and where in the brain these kinds of across-saccade processes are implemented. Nevertheless, neural correlates of some of the key computational components have been identified. In the context of saccadic eye movements, signals related to feedback monitoring have been identified in several brain regions, including parts of the medial frontal and lateral prefrontal cortex (Stuphorn et al., 2000; Scangos et al., 2013; Teichert et al., 2014). Signals related to the detection and use of environmental change point dynamics for inference problems have been found in the cingulate cortex and may also involve more general arousal-related modulations that include a primary recipient of cingulate output, the locus ceruleus–norepinephrine system (Behrens et al., 2007; Nassar et al., 2012; Tervo et al., 2014; Joshi et al., 2016). Integration of these signals across saccades may involve brain regions that encode history-dependent information related to saccade choices, which includes parts of parietal, cingulate, and prefrontal cortex (Sugrue et al., 2004; Bernacchia et al., 2011). This integration process also likely depends on working memory signals that involve interactions between the hippocampus and cortex (Shadlen and Shohamy, 2016), which would be intriguing targets of future studies using this kind of task.

We further propose that the interactions that we found between these within- and across-saccade decision processes may also have a compelling mapping onto a known neural circuit: the information-processing loops involving the cortex and basal ganglia. In the oculomotor version of this loop, cortical areas FEF and LIP both project to the caudate in the basal ganglia. Caudate output is divided between direct and indirect pathways through the basal ganglia that ultimately converge in the substantia nigra, pars reticulata (SNr). SNr output, in turn, goes both to the SC and subsequently to brainstem circuits that control movement and, via the thalamus, back up to cortex (Hikosaka et al., 2000). Recent work has emphasized this circuit's role in integrating sensory and other information to form decisions that guide saccadic behavior (Bogacz and Gurney, 2007; Ding and Gold, 2013; Wiecki and Frank, 2013). In part, these functions may involve adjusting the decision threshold, which is equivalent to *S*_{T} in our LATER model, but in the brain, may involve multiple changes in the dynamics of the within-saccade rise-to-threshold process (Lo and Wang, 2006; Forstmann et al., 2008; Heitz and Schall, 2012).

Together, these findings suggest that our task may provide a useful model system for studying functional information-processing loops in this cortical-basal ganglia system. In principle, these loops may involve the following flow of information. First, biases determined across trials are fed from cortex into the basal ganglia. Second, these biases are then integrated there with incoming sensory information to form a saccadic decision variable. Third, this decision variable terminates when reaching a predetermined bound that mediates the speed-accuracy trade-off. Fourth, this termination causes a release of inhibition that both facilitates the saccade plan encoded in the superior colliculus and causes information related to the decision bound to be sent back up to cortex, via the thalamus, to be used to determine subsequent biases. Among the novel hypotheses suggested by this scheme is that disruption of the basal ganglia–thalamus–cortex feedback pathway may hinder the ability of the brain to make decisions across saccades that depend on the amount of sensory evidence accumulated for each saccade. Our findings may provide a strong behavioral and theoretical basis for pursuing this kind of study in the future.

## Footnotes

This work was supported by the National Institutes of Health (Grants R01 EY015260 and NSF-NCS 1533623 to J.I.G.) and the University of Pennsylvania (Vagelos Undergraduate Research Grant and University Scholars Grant to T.D.K.). We thank Kenan Saleh for help with data collection and Roger Carpenter, Takahiro Doi, Long Ding, and Kamesh Krishnamurthy for comments.

The authors declare no competing financial interests.

- Correspondence should be addressed to Joshua I. Gold, Department of Neuroscience, University of Pennsylvania, D407 Richards Laboratories, Philadelphia, PA 19104-6072. jigold{at}mail.med.upenn.edu.