## Abstract

The brain makes use of noisy sensory inputs to produce eye, head, or arm motion. In most instances, the brain combines this sensory information with predictions about future events. Here, we propose that Kalman filtering can account for the dynamics of both visually guided and predictive motor behaviors within one simple unifying mechanism. Our model relies on two Kalman filters: (1) one processing visual information about retinal input; and (2) one maintaining a dynamic internal memory of target motion. The outputs of both Kalman filters are then combined in a statistically optimal manner, i.e., weighted with respect to their reliability. The model was tested on data from several smooth pursuit experiments and reproduced all major characteristics of visually guided and predictive smooth pursuit. This contrasts with the common belief that anticipatory pursuit, pursuit maintenance during target blanking, and zero-lag pursuit of sinusoidally moving targets all result from different control systems. This is the first instance of a model integrating all aspects of pursuit dynamics within one coherent and simple model and without switching between different parallel mechanisms. Our model suggests that the brain circuitry generating a pursuit command might be simpler than previously believed and only implement the functional equivalents of two Kalman filters whose outputs are optimally combined. It provides a general framework of how the brain can combine continuous sensory information with a dynamic internal memory and transform it into motor commands.

## Introduction

In less than a quarter of a second, the motion of a tennis ball hit by an opponent elicits smooth pursuit eye movements with appropriate speed and direction. This is despite very noisy motion information provided to the pursuit system by motion-sensitive areas (Osborne et al., 2005, 2007). To overcome this noise, the brain integrates stimulus information over time across large populations of neurons (Treue et al., 2000), which improves the reliability of sensory information (Snowden and Braddick, 1991; Perrett et al., 1998). How exactly this integration of information occurs and how it contributes to the movement dynamics remain unknown. To fill this gap, we propose a new model for the sensory estimation of visual motion signals through Kalman filtering (Kalman, 1960) and show as a proof-of-concept that such a simple mechanism can reproduce many seemingly unrelated findings regarding the control of smooth pursuit eye movements.

The smooth pursuit system also relies on prediction of target motion to overcome sensory processing delays and to accurately follow the target. Predictive smooth pursuit can be observed during sinusoidal pursuit, in which the phase lag between eye and target velocity is almost zero (Dallos and Jones, 1963). It is present before target motion onset, when the eyes start to move smoothly before the start of expected target motion (Barnes and Asselman, 1991). It is also found during the transient disappearance of a moving target, when eye velocity is maintained at non-zero values and is predictively increased before target reappearance (Bennett and Barnes, 2003; Orban de Xivry et al., 2006; Coppe et al., 2012). These different aspects of predictive smooth pursuit have been viewed as different predictive systems (i.e., short-term vs long-term prediction; Barnes, 2008) that are believed to be mechanistically very different from visually guided smooth pursuit. How the visual and predictive signals are combined in the brain remains essentially unknown. Previous models combine the visually guided and predictive smooth pursuit responses by means of an artificial switch mechanism (Barnes, 2008). In addition, those models were purely deterministic, whereas the brain has to deal with noisy sensory and internal signals.

Here, we specifically designed a model describing sensory and memorized motion processing of noisy sensory inputs and tested its principles for smooth pursuit eye movements. Our model is based on two Kalman filters: (1) one estimating retinal motion for visually guided movements; and (2) one computing a dynamic internal representation of target motion (Orban de Xivry et al., 2008) for predictive movements. The measure of uncertainty associated with sensory and predictive signals is used to combine those signals in a statistically optimal way (Ernst and Banks, 2002; Vaziri et al., 2006; Ronsse et al., 2009). The emerging properties of our stochastic model account for several aspects of pursuit dynamics, but similar mechanisms could be at play in controlling predictive and sensory-driven movements of other effectors, such as the arm or head.

## Materials and Methods

#### Global structure

The model aims at estimating the current retinal slip (RS) through two different mechanisms: (1) Kalman filtering of noisy (delayed) sensory inputs; and (2) prediction of future RS. In Kalman filtering, noisy sensory inputs are combined with the previous estimate of RS to refine this estimation. Noisy sensory inputs are delayed by 80 ms (Krauzlis and Lisberger, 1994). The predicted RS is obtained from an efference copy of eye velocity and an internal representation of target motion (obtained from memory if available). Estimated and predicted values of the RS are then combined in a statistically optimal manner (Bayesian integration) and processed as in previous models. The details of the model are developed below.

More specifically, the general structure of the model is shown in Figure 1. The model takes target velocity as its input and produces eye velocity as its output. The model consists of three main parts. (1) Visual processing computes an estimation of the sensory RS (\widehat{RS}^{Sens}) and its associated uncertainty (Σ^{Sens}) from target and eye motion signals. (2) The internal representation block computes a prediction of the estimated RS (\widehat{RS}^{Mem}) and an estimate of its uncertainty (Σ^{Pred}) based on past information about target motion (\widehat{TV}_{k}^{Mem}) and the current estimation of the eye velocity (coming from an efference copy). In some cases, there is nothing stored in the memory and a default internal representation is used instead (see below). (3) The motion pathway produces the motor command, using both sensory and predicted RSs, weighted by their uncertainty (Bayesian integration).

The processing of the motor command is a simplified version of classical pursuit models (Krauzlis and Lisberger, 1994). It is transmitted to the premotor system, which includes a pathway with a gain element *T*_{1} and a second parallel pathway with a neural integrator. Its output is sent to both the eye plant producing the eye velocity signal and an internal model that predicts the eye velocity that will be produced by these motor commands. The internal estimation of future eye velocity is combined with the estimation of the sensory RS and stored in memory to compute the internal representation of target motion for the future trials. The eye plant approximates the dynamics of the eyeball, the extraocular muscles, and the tissues that surround them. Its transfer function is composed of a second-order function, with time constants *T*_{1} and *T*_{2} fixed to 170 and 13 ms (Robinson, 1976; Zee and Robinson, 1979; Robinson et al., 1986). There is no noise added to the eye plant (no motor noise simulated).

#### Sensory pathway

##### The noisy RS as input.

The RS is estimated from the slip of the image of the target on the retina, i.e., by subtracting eye velocity from target velocity (Fig. 2). Information about RS appears to be noisy (Osborne et al., 2005, 2007; Stein et al., 2005; Osborne and Lisberger, 2009; Gold and Watanabe, 2010). In the middle temporal area (MT), some neurons are tuned for speed (Maunsell and Van Essen, 1983; Nover et al., 2005). The tuning curves are broader for neurons coding higher speed (Maunsell and Van Essen, 1983; Nover et al., 2005). Therefore, representation of higher speed might be more variable than lower speed.

To obtain a noisy sensory input, we computed the actual RS (RS^{det}) and added both additive and multiplicative noise to the signal. With this simplified computation, we make the assumption that the problem of motion integration is solved. A Bayesian account of the motion integration stage has been proposed in other studies (Dimova and Denham, 2009; Bogadhi et al., 2011). Additive noise was used to specify a baseline sensory noise level, and multiplicative noise was used to simulate signal-dependent noise (Harris and Wolpert, 1998). This resulted in the following expression:
with RS^{det} the deterministic value of RS (current target velocity minus current eye velocity), δ_{add} ∼ *N*(0, σ_{sens,add}^{2}), and δ_{mult} ∼ *N*(0, σ_{sens,mult}^{2}). This noisy signal was then processed (visual processing box in Fig. 1) to obtain an estimate of the RS (\widehat{RS}^{Sens}).
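As a concrete illustration of this noise model, the sketch below draws one noisy RS sample. It assumes that the multiplicative term scales the deterministic signal, i.e., RS^{Noisy} = RS^{det}(1 + δ_{mult}) + δ_{add}. This form is inferred from the definitions above and is not copied from Equation 1; the function name is hypothetical and the default values are those listed under Simulation parameters.

```python
import random


def noisy_retinal_slip(rs_det, sigma_add=10.0, sigma_mult=1.5, rng=random):
    """Return one noisy observation of the retinal slip rs_det (deg/s)."""
    delta_add = rng.gauss(0.0, sigma_add)    # baseline (additive) sensory noise
    delta_mult = rng.gauss(0.0, sigma_mult)  # signal-dependent (multiplicative) noise
    # Assumed form: the multiplicative noise scales with the signal itself,
    # so faster retinal slip yields a more variable observation.
    return rs_det * (1.0 + delta_mult) + delta_add
```

With both SDs set to zero the function returns `rs_det` unchanged, and with `rs_det = 0` only the additive baseline noise remains.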

##### Generative model of the sensory world.

To compute an optimal estimate of the sensory RS through Kalman filtering, a generative model about the evolution and observation of this variable is needed. In this framework, the evolution of the RS signal is described as a random walk by the following equation:
where RS^{Sens} represents the internal estimate of the RS, and θ_{k} is an additional process noise with θ_{k} ∼ *N*(0, *Q*^{2}) representing the state variability.

In addition, the brain gets some observation of RS^{Obs} that represents a noisy version of RS^{Sens}:
where γ_{k} and *v*_{k} are the multiplicative and additive measurement noises with γ_{k} ∼ *N*(0, *D*^{2}) and *v*_{k} ∼ *N*(0, *R*^{2}). Equations 2 and 3 represent the generative model. The noise characteristics (*Q*, *D*, and *R*) of the generative model are critical for the implementation of the Kalman filter (see below, Eqs. 5, 6).

##### Optimal estimation of the sensory RS.

This section details the actual implementation of the Kalman filter that is used to estimate the sensory RS. The Kalman filter aims at estimating a hidden variable from a noisy input. In our simulations, the observed input is RS^{Noisy} computed following Equation 1. We hypothesize that σ_{sens,mult} and σ_{sens,add} (Eq. 1) are equal to *D* and *R* (Eq. 3). That is, the brain knows the statistics of the noise that perturbs the retinal signal, which is required to tune the Kalman filter. However, an exact knowledge of those noise parameters is not crucial, as we will discuss at the end of Results (Beck et al., 2012).

Given the observed value of the RS (RS_{k}^{Noisy}) and the previous estimate of the RS (\widehat{RS}_{k}^{Sens}) and given the knowledge of the dynamics of the system (Eq. 2) and the statistics of the noise in the sensory input (Eq. 3), an optimal estimate of the RS on the next time step can be obtained through Kalman filtering:
where RS_{k}^{Noisy} is the noisy signal of RS observed by the brain at time *k* (with a sensory delay of 80 ms), and η_{t} is the internal noise of the estimation with η_{t} ∼ *N*(0, Ω_{n}^{2}).

Following Kalman theory and because the different noises are Gaussian and uncorrelated, the optimal value for *K*_{k} can be computed as follows (Todorov, 2005; Izawa et al., 2008):
with
where Σ_{k}^{Sens} is the estimated error variance. This estimation will be used to assess the uncertainty associated with the estimation of \widehat{RS}_{k}^{Sens}.
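To make the estimation loop concrete, the following sketch implements one step of a scalar Kalman filter for a random-walk state observed with additive and multiplicative noise. It is a simplified stand-in under stated assumptions and not the paper's Equations 4 to 6: the multiplicative noise is folded into an effective observation variance evaluated at the current estimate, the internal noise η is omitted so that the step is deterministic, and the function and argument names are hypothetical.

```python
def kalman_step(rs_hat, sigma, rs_noisy, Q=1.0, R=10.0, D=1.5):
    """One scalar Kalman update.

    rs_hat   : previous estimate of the retinal slip (deg/s)
    sigma    : previous error variance of that estimate
    rs_noisy : current noisy observation of the retinal slip
    Returns (new estimate, new error variance, Kalman gain).
    """
    # Random-walk dynamics: the prediction keeps the mean, inflates the variance.
    sigma_prior = sigma + Q ** 2
    # Assumption: signal-dependent noise approximated by evaluating the
    # multiplicative term at the current estimate.
    obs_var = R ** 2 + (D * rs_hat) ** 2
    gain = sigma_prior / (sigma_prior + obs_var)
    rs_new = rs_hat + gain * (rs_noisy - rs_hat)   # correct with the innovation
    sigma_new = (1.0 - gain) * sigma_prior          # posterior error variance
    return rs_new, sigma_new, gain
```

Calling `kalman_step` repeatedly with a constant observation drives the estimate toward that value, while the returned variance plays the role of Σ^{Sens} in the later reliability weighting. Because the effective observation variance grows with the estimated speed, the gain is lower for fast retinal slip than for slow retinal slip.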

#### Predictive pathway

##### One hundred fifty milliseconds into the future.

The output of the predictive pathway corresponds to the measured RS that is expected 150 ms later if eye velocity did not change. This 150 ms period gives the predictive pathway an advance of 70 ms with respect to the actual RS. The remaining 80 ms compensate for the visual feedback delays. Target velocity is computed through Kalman filtering in a way similar to the estimation of RS. To do so, we also need current eye velocity (*ė*_{k}^{eff}) from an efference copy signal that can be estimated through an internal model of the eye plant (Skavenski and Robinson, 1973; Robinson, 1981). Therefore, the output of the predictive pathway at time *k* is:
where \widehat{TV}_{k}^{Mem} represents the predicted target velocity that is stored in memory with an advance of 150 ms (to see how it is computed, see Eq. 14).
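Since the predicted RS is described as being obtained from the internal representation of target motion and the efference copy of eye velocity, the output of the predictive pathway would plausibly take the following form. This is a sketch consistent with the surrounding definitions, not a verbatim reproduction of Equation 7; here \widehat{RS}^{Mem}_{k} denotes the predicted RS, \widehat{TV}^{Mem}_{k} the memorized target velocity, and \dot{e}^{eff}_{k} the efference copy of eye velocity. The second line restates the timing budget given in the text.

```latex
\widehat{RS}^{Mem}_{k} = \widehat{TV}^{Mem}_{k} - \dot{e}^{\,eff}_{k}

\underbrace{150~\text{ms}}_{\text{memory advance}}
  = \underbrace{80~\text{ms}}_{\text{visual feedback delay}}
  + \underbrace{70~\text{ms}}_{\text{net lead over the actual RS}}
```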

##### Noisy target velocity as input.

In the absence of visual information (e.g., blanking of the target), there is good evidence suggesting that a prediction of future target motion can be used to drive the smooth pursuit system (for review, see Barnes 2008). In our model, the predictive pathway receives noisy target velocity as an input:
where TV is the actual value of target velocity, ε_{add} ∼ *N*(0, σ_{pred,add}^{2}), and ε_{mult} ∼ *N*(0, σ_{pred,mult}^{2}). The actual value of TV_{k} is not available to the system and is approximated by summing \widehat{RS}_{k}^{Sens} and *ė*_{k}^{eff}.

##### Generative model of target motion.

To compute an optimal estimate of target velocity through Kalman filtering, a generative model about the evolution and observation of this variable is needed. This model represents the current knowledge of the statistics of the target velocity, i.e., how target velocity will evolve over time:
where TV_{k}^{Pred} is target velocity, and *u*_{k} represents the change in velocity from time *k* to *k* + 1 that is computed from the best prediction of target motion currently available (see Eq. 14), with α_{k} ∼ *N*(0, *Q*_{pred}^{2}) being the process noise and *B*_{int} = 1.

The predicted dynamics of target velocity (Eq. 9) have an additional term (*u*_{k}) compared with the dynamics for RS (Eq. 2). This term represents the knowledge that future target velocity will vary with time. The control signal *u*_{k} represents the expected change (i.e., computed from memory) in target velocity from time *k* to time *k* + 1.
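The role of the control signal can be illustrated with a short sketch. It assumes that the dynamics take the form TV_{k+1} = TV_{k} + *B*_{int} *u*_{k} + α_{k}, which is inferred from the definitions above, and that *u*_{k} is simply the sample-to-sample difference of the memorized velocity profile. Both function names are hypothetical.

```python
import random


def control_from_memory(tv_mem):
    """u_k: expected change in target velocity between samples k and k + 1."""
    return [tv_mem[k + 1] - tv_mem[k] for k in range(len(tv_mem) - 1)]


def rollout(tv0, u, B_int=1.0, Q_pred=0.0, rng=random):
    """Propagate TV_{k+1} = TV_k + B_int * u_k + alpha_k, alpha_k ~ N(0, Q_pred^2)."""
    tv = [tv0]
    for u_k in u:
        tv.append(tv[-1] + B_int * u_k + rng.gauss(0.0, Q_pred))
    return tv
```

With `Q_pred = 0`, rolling out from the first memorized sample reproduces the memorized profile exactly. A larger `Q_pred` makes the predicted trajectory less certain, which is how the model favors sensory information when no reliable prediction is available.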

As in Equation 3, the internal prediction of target motion needs to be compared with the actual value of target velocity. We hypothesize that the brain knows this observation is corrupted by noise, thus
with β_{k} ∼ *N*(0, *R*_{pred}^{2}) and φ_{k} ∼ *N*(0, *D*_{pred}^{2}), the additive and multiplicative noises.

##### Optimal estimation of target motion.

This section details the actual implementation of the Kalman filter that is used to estimate target motion. The Kalman filter aims at estimating target motion from a noisy input. This estimate of target motion is not used during the current trial but is stored in memory (Fig. 1) and will be used for the next trial (see below, Eq. 14).

In our model, the observed input is TV^{Noisy} that is computed following Equation 8. It is again hypothesized that σ_{pred,add} = *R*_{pred} and σ_{pred,mult} = *D*_{pred}.

Given the observed target motion (TV_{k}^{Noisy}) and its estimate (\widehat{TV}_{k}^{Pred}) and given the knowledge of the different noise statistics (*R*_{pred}, *D*_{pred}, and *Q*_{pred}), target motion can be estimated in a statistically optimal manner. The internal representation \widehat{TV}_{k}^{Pred} is updated thanks to the observation TV_{k}^{Noisy} through Kalman filtering:
where \widehat{TV}_{k}^{Pred} is the predicted target velocity at time *k* computed by the Kalman filter, and ε^{k} ∼ *N*(0, Ω_{e}^{2}) is the internal noise. The uncertainty Σ^{Pred} is given by the following:
The variance of the prediction Σ_{k}^{Pred} will be used to combine predicted and observed RSs in the motion pathway (see below).

##### Using optimal estimate in the next trial.

Because \widehat{TV}_{k}^{Mem} tries to anticipate the target motion 150 ms into the future, it has to rely on past experience, e.g., from previous trials. Thus, current estimates of \widehat{TV}_{k}^{Pred} (Eq. 11) are only used to improve the internal representation of target motion for the next trials but not the current trial. As a result, the model learns and improves its internal estimation of target trajectory over consecutive trials. That is, the estimated target motion at trial *j* is used as the internal prediction of target motion at trial *j* + 1, with some noise:
where δ_{mult}^{Pred} ∼*N*(0, Int_{mult}^{2}) and δ_{add}^{Pred} ∼*N*(0, Int_{add}^{2}). The content of the memory is used to drive smooth pursuit after combination with the output of the sensory pathway (see below). It is also used to build the internal representation of target motion that will be used on the next trial (Eq. 11).

When simulating the first appearance of a moving target, a default internal representation will be used (i.e., with value zero for the first 150 ms and directly based on _{Sens} after that period). That is, _{k}^{Mem} = _{k + 80}^{Sens}, which represents a short-term extrapolation of future target motion to compensate the sensory delay.

##### Motion pathway.

The inputs of the motion pathway come from two different sources of information (Fig. 3): sensory (i.e., retinal) and internal (i.e., extraretinal) signals. These two sources of information are weighted according to their estimated variability (Σ^{Sens} and Σ^{Mem}) in a Bayesian way. We approximated Σ^{Mem} by Σ^{Pred} because Σ^{Pred} varied little from trial to trial. The optimal weighting of such stochastic variables (Ernst and Banks, 2002; Vaziri et al., 2006) is given by the following:
The model needs an initial condition for the uncertainty of the sensory and internal signals. This initial condition is based on previous experience. At movement onset, the visual feedback will be favored if the internal representation is absent or inaccurate (Σ^{Pred} is large). Practically, this is implemented by increasing the value of *Q*^{Pred} (the process noise), which will increase the noise of the internal representation and then favor sensory information. Changing the initial conditions given to the sensory variance (e.g., after target disappearance Σ^{Sens} = ∞ as a result of the absence of retinal feedback) can thus lead to an abrupt change of relative weighting between sensory and internal signals.
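The reliability weighting described here is the standard inverse-variance rule for combining two independent Gaussian estimates (as in Ernst and Banks, 2002). The sketch below is a minimal version of that rule under the assumption that both variances are strictly positive; the function name is hypothetical, and the infinite-variance branches mirror the case mentioned above in which Σ^{Sens} = ∞ removes the sensory contribution.

```python
import math


def combine(rs_sens, var_sens, rs_mem, var_mem):
    """Reliability-weighted average of the sensory and memory-based RS estimates.

    Returns (combined estimate, combined variance).
    """
    if math.isinf(var_sens):       # no retinal feedback, e.g., target blanked
        return rs_mem, var_mem
    if math.isinf(var_mem):        # no usable internal representation
        return rs_sens, var_sens
    # Each cue is weighted by the other cue's variance, so the more
    # reliable signal dominates the combined estimate.
    w_sens = var_mem / (var_sens + var_mem)
    rs = w_sens * rs_sens + (1.0 - w_sens) * rs_mem
    var = var_sens * var_mem / (var_sens + var_mem)
    return rs, var
```

For example, `combine(10.0, 1.0, 20.0, 3.0)` weights the sensory estimate three times more heavily than the memory estimate, whereas setting `var_sens` to `float("inf")` hands control entirely to the internal signal.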

In the final steps of our model, the weighted RS is processed by an image velocity motion pathway comprising a first element (*G*_{v}), a second-order filter (*H*_{v}), and a gain element (*A*_{v}). The parameters we used were similar to those used by Krauzlis and Lisberger (1994) and were equal to the following:
The output of this image velocity motion pathway was then sent to a (leaky) integrator (called “positive efferent copy feedback loop” by Krauzlis and Miles, 1996) that is characterized by two variables: *G*_{int} and τ. *G*_{int} determines the gain of the pursuit system (gain ∼ *G*_{int}^{2}). It is set to

In the present model, the memory used for prediction stores a representation of target motion, which is dynamically updated by the Kalman filter based on past information. This contrasts with models in which short-term memories store efferent copies of eye motor commands that are replayed to replace the retinal information for predictive pursuit (Krauzlis and Miles, 1996; Bennett and Barnes, 2003; Madelain and Krauzlis, 2003). Therefore, the motion pathway is active during both visually guided pursuit and in the absence of retinal signals.

##### Simulation parameters.

Our model was implemented in MATLAB/Simulink (Mathworks). We ran all simulations with a fixed step size of 1 ms using the fourth-order Runge–Kutta method from Simulink (MATLAB) for numerical integration.
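For readers without access to Simulink, the fixed-step scheme can be sketched in a few lines. The example below applies a classical fourth-order Runge–Kutta step of 1 ms to a second-order eye plant with the time constants given earlier (170 and 13 ms). It assumes that the plant is a cascade of two first-order lags with unity gain, 1/((*T*_{1}*s* + 1)(*T*_{2}*s* + 1)), and that the input is held constant within each step; these choices and all function names are illustrative, not taken from the authors' implementation.

```python
import math

T1, T2 = 0.170, 0.013   # plant time constants (s)


def plant_derivative(state, u):
    """State derivative of two cascaded first-order lags driven by u."""
    x1, x2 = state
    return ((u - x1) / T1, (x1 - x2) / T2)


def rk4_step(state, u, dt=0.001):
    """Advance the plant state by one fixed step with classical RK4."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = plant_derivative(state, u)
    k2 = plant_derivative(shifted(state, k1, dt / 2), u)
    k3 = plant_derivative(shifted(state, k2, dt / 2), u)
    k4 = plant_derivative(shifted(state, k3, dt), u)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def analytic_step_response(t):
    """Closed-form unit step response of the assumed plant, for checking."""
    return 1.0 - (T1 * math.exp(-t / T1) - T2 * math.exp(-t / T2)) / (T1 - T2)
```

Iterating `rk4_step` from a zero state with `u = 1.0` can be compared against `analytic_step_response` to confirm that a 1 ms step is adequate for the fastest time constant in the plant.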

The SD of the different stochastic variables (Eq. 1) was tuned to produce a qualitatively correct pursuit response in a wide variety of contexts. Note that the set of used parameters represents only one possible choice inside a large manifold. Parameters were kept constant for all stimulations. The SDs of the additive and multiplicative noise of the sensory signals were set to σ_{sens,add} = 10 and σ_{sens,mult} = 1.5 (Eq. 1). In most instances (except for the sensitivity analysis; see Fig. 14), the noise SDs fed to the Kalman filter matched the actual ones: *R* = σ_{sens,add} and *D* = σ_{sens,mult}. For the predictive pathway, these values were decreased by a factor of 2: σ_{pred,add} = 5 and σ_{pred,mult} = 0.75 (Eq. 8), with *R*^{Pred} = σ_{pred,add} and *D*^{Pred} = σ_{pred,mult} (Eq. 10). The noise of the Kalman estimation was the same for the sensory and predictive Kalman filters: Ω_{n} = Ω_{e} = 0.3 (Eqs. 4, 11). From trial to trial, the internal representation of target motion was slightly corrupted with additive (Int_{add} = 1) and multiplicative Gaussian noises (Int_{mult} = 0.1; Eq. 14) to simulate memory decay. Initial conditions of noise variances were set to Σ^{Sens} = 1 and Σ^{Pred} = 1. None of these parameters changed during simulations; *B*_{int} = 1 (Eq. 9).

For all trials in which no predictions were available (Figs. 4, 5, first half-cycle in Fig. 6, first pursuit trial in Fig. 8), the noises for the system dynamics (*Q* and *Q*^{Pred}) of RS and TV were set to 1 (Eq. 9). For other trials when target motion prediction was available (Fig. 6 for all half-cycles except the first one, Figs. 7, 8 for all pursuit trials except the first one, Figs. 9, 10), the noise added to the dynamics of the system for TV (Eq. 9) was reduced (*Q*^{Pred} = 0.3) because we assumed that the knowledge about target velocity decreased the associated uncertainty.

In simulations or human data, pursuit onset was detected by fitting (mean least squares regression) a piece-wise linear function on the eye velocity trace measured during an interval of 300 ms starting at stimulus onset, as follows:
where *t* is the time (seconds), *T* is the time of pursuit onset (seconds), *A* is the level of eye velocity before pursuit onset (degrees per second), and *B* is the mean acceleration during pursuit initiation (degrees per second squared). The constants *A*, *B*, and *T* were considered as the free parameters of the function.
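The onset-detection procedure can be sketched as follows. The code assumes that the piece-wise linear function is a "hinge": velocity equals *A* before *T* and *A* + *B*(*t* − *T*) afterwards, which matches the parameter definitions above. Restricting candidate onset times to the sample grid and solving for *A* and *B* in closed form at each candidate is one simple fitting strategy, not necessarily the one used by the authors; the function name is hypothetical.

```python
def fit_pursuit_onset(t, v):
    """Least-squares fit of v(t) = A for t < T and A + B * (t - T) for t >= T.

    t : sample times (s), v : eye velocity (deg/s). Returns (A, B, T).
    """
    n = len(t)
    best = None
    for T in t[1:-1]:                        # candidate onset times on the sample grid
        x = [max(0.0, ti - T) for ti in t]   # hinge regressor: zero before onset
        mx, mv = sum(x) / n, sum(v) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        # Ordinary linear regression of v on the hinge regressor.
        B = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
        A = mv - B * mx
        sse = sum((A + B * xi - vi) ** 2 for xi, vi in zip(x, v))
        if best is None or sse < best[0]:
            best = (sse, A, B, T)
    _, A, B, T = best
    return A, B, T
```

Applied to a 300 ms window sampled at 1 kHz, the function returns the pre-onset velocity, the mean initial acceleration, and the onset time whose hinge gives the smallest squared error.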

For sinusoidal pursuit (Fig. 7), the pursuit gain for a half-cycle was computed as the peak eye velocity divided by the peak target velocity. The phase was obtained as the difference in time (expressed in angular degrees) between those two peaks.
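The gain and phase measures can be written compactly as below. The sketch assumes that peaks are taken as the largest absolute velocity within the half-cycle and that a positive phase means the eye peak occurs after the target peak (a lag); the sign convention and the function name are assumptions made for illustration.

```python
import math


def half_cycle_gain_phase(t, target_vel, eye_vel, freq_hz):
    """Pursuit gain and phase (deg) for one half-cycle of sinusoidal motion."""
    i_target = max(range(len(t)), key=lambda i: abs(target_vel[i]))
    i_eye = max(range(len(t)), key=lambda i: abs(eye_vel[i]))
    gain = abs(eye_vel[i_eye]) / abs(target_vel[i_target])
    # Convert the time difference between peaks into degrees of one cycle.
    phase_deg = (t[i_eye] - t[i_target]) * freq_hz * 360.0
    return gain, phase_deg


# Usage: a 0.5 Hz target and an eye trace that lags it by 20 ms.
f = 0.5
t = [i / 1000 for i in range(1001)]
target = [20.0 * math.sin(2 * math.pi * f * ti) for ti in t]
eye = [18.0 * math.sin(2 * math.pi * f * (ti - 0.02)) for ti in t]
gain, phase = half_cycle_gain_phase(t, target, eye, f)
```

In this synthetic example a 20 ms delay at 0.5 Hz corresponds to a phase lag of 0.02 × 0.5 × 360 = 3.6°.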

When the target was blanked (Figs. 10–12), we set Σ^{Sens} = ∞, because there was no more visual input to the sensory system. This choice cancels any influence of the visual pathway on the smooth pursuit response during target blanking, so that pursuit is driven by the predictive pathway alone during this period. During blanking, the noise values for \widehat{TV}_{k}^{Mem} (Eq. 14) increased linearly, as the estimation of the target became more uncertain over time. At time *X* (in seconds after target disappearance), Int_{mult} and Int_{add} were both multiplied by (1 + *X*) because we hypothesized that uncertainty should increase when there is no sensory feedback (Wei and Körding, 2010). To simulate the predictive recovery of eye velocity, *G*_{int} was increased linearly from a time *T* that was randomly chosen between 50 and 250 ms before target reappearance. The slope of this increase was equal to

## Results

The Kalman filtering approach allows us to simulate smooth pursuit behavior in a wide variety of contexts that range from purely visually guided pursuit to purely predictive pursuit in the absence of any visual input. In the first part of Results, we will describe how the sensory Kalman filter can account for visually guided smooth pursuit dynamics. Then, we will show how our approach of predictive pursuit allows us to unify the different aspects of predictive smooth pursuit within one simple mechanism, i.e., pursuit during sinusoidal motion, anticipatory pursuit before target motion onset, and pursuit during transient target disappearance. Finally, we will demonstrate that our simulations are very robust to changes in the parameters of the Kalman filter.

### Simulations of visually guided pursuit: comparison with human data

The model performance in response to the sudden motion of a visible target is illustrated in Figure 4*A*. After a fixation period (∼500 ms), the target underwent a “step ramp” motion (20°/s). Different simulations are shown in gray, showing the variability of the model output attributable to the simulated sensory noise.

One of the hallmarks of pursuit initiation is a very stereotypical initial acceleration component (Lisberger et al., 1981; Carl and Gellman, 1987; de Brouwer et al., 2002). Therefore, we analyzed pursuit acceleration as predicted by the model by simulating different target velocity steps. We compared the simulated results with experimental results from human subjects (data were reanalyzed from de Brouwer et al., 2002). This comparison is shown in Figure 4*B*. We measured the acceleration as the mean acceleration in the interval between 80 and 180 ms after pursuit onset (see Materials and Methods). Note that this model did not contain a specific saturation function for pursuit acceleration, as was the case in previous models (Bennett and Barnes, 2003, 2006). Despite the absence of an explicit saturation function, the model simulations closely matched smooth pursuit acceleration in humans for values of RS up to 50°/s and also exhibited a saturation effect of acceleration at ∼150°/s^{2}. This saturation arises from the signal-dependent noise that perturbs the sensory input.

The latency of the pursuit response was 120 ± 14 ms (mean ± SD). These values were similar to those measured by Tarnutzer et al. (2007) and Rasche and Gegenfurtner (2009). The SD observed when initiating pursuit at 20°/s (500 ms after target onset) was ∼2°/s (10%). This value was also similar to those observed in previous studies (Osborne et al., 2005; Rasche and Gegenfurtner, 2009). Thus, our new model reproduced human visually guided pursuit dynamics with high fidelity only using a simple Kalman filter mechanism.

### Simulations of visually guided pursuit: comparison with previous models

To directly compare our model predictions to previously published models, we simulated two classical models in Figure 5 (Robinson et al., 1986; Krauzlis and Lisberger, 1994). Simulations of pursuit initiation to a target moving at a constant velocity generated with the pursuit model of Robinson et al. (1986) were very similar to those produced with the pursuit model of Krauzlis and Lisberger (1994); pursuit acceleration and the overshoot at the end of the initiation phase were comparable. Our model did not reproduce the ringing behavior present in these previous models. However, our model appears consistent with many recent studies that did not exhibit a clear ringing behavior (Orban de Xivry et al., 2006; Spering and Gegenfurtner, 2007; Medina and Lisberger, 2009). It is very likely that the context of the experiments plays a major role in this ringing behavior (e.g., repetition of similar target trajectories).

It is worth noting that the current pursuit model does not contain an “image acceleration motion pathway,” as was the case in previous models (Krauzlis and Lisberger, 1994; Krauzlis and Miles, 1996). This image acceleration motion pathway was necessary in these previous models to generate persistent oscillations of eye velocity during sustained pursuit and to avoid large overshoots at the end of the pursuit initiation phase. In our model, it is not necessary because the weight of the signals from the predictive pathway increases after pursuit initiation. The predictive pathway limits the overshoot at the end of pursuit initiation (because the predictive system uses an estimation of the RS with an advance of 150 ms, predictive pursuit velocity will not overshoot target velocity).

### Simulations of smooth pursuit in response to a sinusoidally moving target

Tracking a sinusoidally moving target results in a smaller phase lag than when tracking unpredictable target motion (Dallos and Jones, 1963; Yasui and Young, 1984; Barnes et al., 1987, 2000; Barnes and Ruddock, 1989) because the smooth pursuit system can exploit the predictable nature of target motion. Our model can achieve such small phase lags by using the memory acquired during half a cycle of target motion as an internal representation of future target motion for the next half-cycle. As can be observed in Figure 6, initially, there is no internal representation of future target motion (first half-cycle). During this period, the model relies solely on sensory signals (\widehat{RS}^{Sens}), with a sensorimotor delay of 80 ms, and lags behind the target (Fig. 6). Then, target motion estimated during this period is used as an internal representation of target motion for the second half-cycle. This internal representation will be gradually refined over time with each cycle of target motion. For all ensuing half-cycles, the estimated target motion obtained during one half-cycle is retrieved during the next one. The use of this internal representation yielded a dramatic decrease in the phase lag to levels comparable with experimental observations [phase lag <10° for low frequencies (<0.4 Hz); for review, see Barnes, 2008], and the pursuit gain was close to 1.

After several such cycles, the phase lag and pursuit gain reached similar levels to those reported in human subjects. We quantified this behavior in Figure 7, comparing the pursuit gain and phase lag from our model to the ones measured experimentally in Barnes et al. (1987). Our model closely matched the experimental findings without any change in parameters.

### Simulations of anticipatory pursuit before target motion onset

Building an internal representation of target motion from past experience is advantageous for tracking sinusoidally moving targets but also for anticipating future target motion. Indeed, it is well established that, when the direction of target motion and the timing of its onset are known, subjects are able to move their eyes before target onset (Kowler and Steinman, 1979a,b). Similarly, in our model, after a few trials of repeated identical target motion, the memory contains enough information to construct a reliable internal representation of the future target velocity. Therefore, the model exhibits anticipatory pursuit responses to the expected future target motion, i.e., the eyes start moving before target motion onset. Figure 8*A* illustrates a reactive pursuit response at first target presentation with visually guided eye movements, and Figure 8*B–D* shows anticipatory eye movements in consecutive trials using the prediction of the future target velocity (attributable to the internal memory Kalman filter estimating target motion 150 ms into the future). As can be observed in Figure 8, there was a marked increase of anticipatory pursuit over the course of trials. In our model, anticipation arises from the use of the internal representation of target motion stored in memory that is replayed with an advance of 150 ms.

The update of the internal representation of target motion after each trial will also naturally lead to a “history effect” (Kowler et al., 1984). This effect is observed when target motion varies from trial to trial and results in a weighted average of target motion from multiple previous trials in the internal memory. As a consequence of this history effect, anticipatory eye velocity will be stronger after two consecutive trials with fast target motion than after one trial with low and one trial with fast target motion (Poliakoff et al., 2005). Thus, our model was able to capture all aspects of anticipatory smooth eye movements thanks to an internal representation of target motion that was refined over the course of trials by a Kalman filter.

### Simulations of learning by observation

The building of the internal representation can be achieved without any eye movements. Indeed, it is known that anticipatory smooth pursuit eye movements can arise after viewing but not pursuing a moving target (Barnes et al., 1997). In our model, eye movements can be inhibited during target motion by setting *G*_{int} to 0. After viewing a moving target three times, the internal representation that was built can give rise to a clear predictive smooth pursuit response (Fig. 9, solid black line). This predictive response differs from the pursuit response observed in the absence of an internal representation (Fig. 9, dashed line).

### Simulations of smooth pursuit during target blanking

Another important instance of predictive smooth pursuit eye movements can be observed when the moving target is temporarily blanked for several hundred milliseconds (Orban de Xivry and Lefèvre, 2007; Orban de Xivry et al., 2008). We simulate this case of predictive pursuit in Figure 10 in a 1 s target blanking paradigm.

As in human subjects, simulated eye velocity decreased after target blanking until reaching an asymptotic velocity plateau. Both τ and *G*_{int} in the motion pathway (Fig. 3) determine the rate at which pursuit velocity decays toward the residual velocity plateau (Becker and Fuchs, 1985). The time constant is *G*_{int} = 0.5. This time constant was similar to the one measured in previous studies (Becker and Fuchs, 1985; Orban de Xivry et al., 2008). For the simulations in Figure 10*A*, we kept τ = 100 ms constant, and *G*_{int} was set to 0.6 during blanking periods (see Materials and Methods, Simulation parameters). In general, the smooth pursuit response was variable, even during the blanking period, despite the fact that the gain *G*_{int} was kept constant for all trials. This variability resulted from the noisy RS^{Pred}.

Two different simulations are highlighted in blue and red (Fig. 10*A*), differing only in the behavior near target reappearance. The red trace illustrates a trial in which there was no predictive recovery, i.e., the pursuit model did not anticipate target reappearance. On this trial, the large acceleration occurring after target reappearance was attributable to sensory signals (visually guided acceleration). However, experimentally, there is a clear predictive reacceleration before the target reappears on most trials of such paradigms (Bennett and Barnes, 2003). Such a predictive recovery is illustrated by the blue trace in Figure 10*A*. This recovery results from an increase in *G*_{int} before target reappearance. The time of this increase was varied from 250 to 50 ms before target reappearance. A late increase in *G*_{int} did not yield strong predictive recovery, whereas an early increase in *G*_{int} did. The existence of such an independent time control mechanism was confirmed by the specific impairment of predictive smooth pursuit eye movements in frontotemporal dementia patients (Coppe et al., 2012).

Different residual pursuit velocities have been reported in the literature for an individual subject (Orban de Xivry et al., 2006, their Fig. 8). Our model can reproduce these differences by changing the value of *G*_{int}. In Figure 10*B*, each simulation had a different value of *G*_{int} (ranging from 0.9 down to 0.4; see Materials and Methods), resulting in different residual velocities.

Given that the model possesses a dynamic internal representation of target motion (Orban de Xivry et al., 2008), it was also able to reproduce the behavior during the blanking of an accelerating target (Bennett and Barnes, 2006; Bennett et al., 2007; Fig. 11) or a sinusoidally moving target (Whittaker and Eaholtz, 1982; Kveraga et al., 2001; Fukushima et al., 2002; Fig. 12). When the target was accelerating during the blanking period, the residual eye velocity was slightly increasing, as is the case in human recordings (Fig. 11). Note that, in our simulations, the value of *G*_{int} was constant during the occlusion period, whereas other models with a static internal representation increased the value of the gain to obtain the eye acceleration during the blanking (Bennett and Barnes, 2006). Instead, in our model, the acceleration during the blanking period was attributable to the internal representation of target motion trajectory.

As mentioned above, the use of a static internal representation would fail to provide appropriate target velocity signals for predictive tracking if the eye velocity is zero when the target is blanked (Orban de Xivry et al., 2008). In this case, a pursuit model with a static memory would not be able to increase its pursuit velocity, because there is no more retinal information and no velocity stored into the short-term memory. A simple example of this situation is shown in Figure 12 in which the sinusoidally moving target disappears a bit before eye velocity is zero. When the visual information of target disappearance was available (80 ms after its disappearance), the eye velocity was equal to zero. In our model, the pursuit velocity increased after target blanking thanks to the dynamic internal representation of target motion. During the blanking period, the mean pursuit gain was not as large as before (because of a decrease of *G*_{int} from 1 to 0.7 during the blanking period), but pursuit was clearly larger than zero and synchronized with the target motion, showing that the internal dynamic representation of the target motion (i.e., based on past information of the target motion) correctly updated the RS information. In contrast, a static memory would predict zero eye velocity during the blanking period (Fig. 12, dashed trace). To the best of our knowledge, our new model is the first capable of simulating this behavior.
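The contrast between a static and a dynamic memory can be made concrete with a deliberately simplified sketch. It assumes an idealized sinusoidal target velocity, a static memory that merely holds the velocity sampled at blank onset, and a dynamic memory that keeps replaying the learned trajectory; both are scaled by a gain of 0.7, as in the blanking period described above. The function names, amplitude, and frequency are hypothetical, and sensory delays and noise are ignored.

```python
import math


def target_velocity(t, amp=20.0, freq=0.5):
    """Idealized sinusoidal target velocity (deg/s) at time t (s)."""
    return amp * math.sin(2.0 * math.pi * freq * t)


def static_memory(t, t_blank, gain=0.7):
    """Static memory: holds the velocity sampled at blank onset."""
    return gain * target_velocity(t_blank)


def dynamic_memory(t, t_blank, gain=0.7):
    """Dynamic memory: keeps replaying the learned trajectory over time."""
    return gain * target_velocity(t)


t_blank = 2.0  # blank onset at a zero crossing of target velocity
t_probe = 2.5  # half a second into the blanking period (velocity peak)

static_cmd = static_memory(t_probe, t_blank)
dynamic_cmd = dynamic_memory(t_probe, t_blank)
```

With blank onset at a zero crossing, the static memory stays at (numerically) zero velocity, whereas the dynamic memory follows the target at a reduced gain.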

In summary, the predictive pathway that is based on a trial-to-trial refinement of a dynamic internal representation of target motion allowed us to simulate the three different instances of predictive smooth pursuit that were thought to rely on different systems and to be primarily disconnected (Barnes, 2008), i.e., anticipatory smooth pursuit eye movements before target motion onset (Barnes and Asselman, 1991), predictive pursuit of a sinusoidal target motion (Stark et al., 1962), and predictive pursuit during transient target disappearance (Mitrani and Dimitrov, 1978).

### Weighting of the sensory and predictive response

The model is able to modulate the weights of the sensory and predictive pathways when there is a conflict between them. For instance, an internal representation of a target moving at 10°/s can be in conflict with a visual target moving at 15°/s (Fig. 13). In this case, the smooth pursuit response will be biased by the most reliable representation. If the sensory RS has normal noise, it will bias the smooth pursuit response toward the actual target velocity (Fig. 13, solid black line). In contrast, if the target is dimmed, the internal representation will primarily dominate the smooth pursuit response (Fig. 13, dashed black line). The noise parameters are critical for determining the level of eye velocity at the end of the trial. This reweighting of the sensory and predictive pathways arises from the optimal combination of their signals (Eq. 15).
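As an illustration of reliability-based weighting, the following sketch applies the standard inverse-variance rule for combining two Gaussian estimates. We assume this rule captures the spirit of Eq. 15; the function name `combine` and the variance values are hypothetical and chosen only to mimic the normal and dimmed-target cases of the 10°/s versus 15°/s conflict.

```python
def combine(x_sens, var_sens, x_pred, var_pred):
    """Inverse-variance weighted combination of two estimates.

    Returns the combined estimate and its variance. The weight of each
    signal is proportional to the reliability (1/variance) of that signal.
    """
    w_sens = var_pred / (var_sens + var_pred)
    estimate = w_sens * x_sens + (1.0 - w_sens) * x_pred
    variance = var_sens * var_pred / (var_sens + var_pred)
    return estimate, variance


# Visual target at 15 deg/s, internal representation at 10 deg/s.
normal, var_normal = combine(15.0, 1.0, 10.0, 4.0)   # reliable vision
dimmed, var_dimmed = combine(15.0, 16.0, 10.0, 4.0)  # dimmed target, noisy vision
```

With these illustrative variances, the combined estimate is 14°/s when the sensory signal is reliable and 11°/s when the target is dimmed, so the response shifts toward whichever source is less noisy.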

### Sensitivity to noise

The introduction of stochastic noise was one of the most important features of our new model. Changing the amplitude of the noise present in the system will obviously influence the pursuit response, but changing the noise can also be interpreted differently depending on whether the brain is aware of this change or not. For example, if one increases the noise present in the system (σ_{sens,add} and σ_{sens,mult} in Eq. 1) and the noise estimated by the brain (*D* and *R* in Eq. 3) in the same way, then the pursuit response will simply become slower, with a lower acceleration. This is attributable to the Kalman filter effectively taking longer to estimate target motion as a result of the increase in noise. However, if one only changes the system noise (Eq. 1) without adjusting the estimated noise, the story is different. In this case, the brain does not correctly estimate the noise present in the system, leading to suboptimality (Beck et al., 2012). Figure 14 illustrates how much a suboptimal estimation of the noise can affect the pursuit response. When the noise in the model was overestimated, the pursuit variability decreased (Fig. 14*B*, blue) but the average pursuit response was little affected (Fig. 14*A*, blue). In contrast, pursuit variability was dramatically amplified when the brain underestimated the noise present in the system, whereas the average pursuit response was again only slightly modulated (Fig. 14, red). Thus, our results would argue that, practically speaking, the brain only needs a reasonable approximation of the noise of the system but does not require precise knowledge of it.
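The direction of these effects can be checked analytically in a toy setting. The sketch below assumes a scalar random-walk Kalman filter at steady state observing a constant target velocity through additive noise only; the names `q`, `r_hat` (the noise variance assumed by the filter), and `r_true` (the actual noise variance) are hypothetical and do not correspond to *D*, *R*, or the multiplicative noise term of the model. It addresses steady-state variability only, not the time course of pursuit initiation.

```python
import math


def steady_state_gain(q, r_hat):
    """Steady-state Kalman gain for a scalar random-walk state.

    The prior variance p solves p**2 - q*p - q*r_hat = 0, where q is the
    assumed process variance and r_hat the assumed measurement variance.
    """
    p = 0.5 * (q + math.sqrt(q * q + 4.0 * q * r_hat))
    return p / (p + r_hat)


def output_variance(k, r_true):
    """Steady-state variance of x[t] = (1 - k) * x[t-1] + k * z[t].

    z[t] is the constant target velocity plus independent noise of
    variance r_true. The mean of x converges to the target for any
    0 < k < 1, so a wrong r_hat changes the variability, not the
    steady-state average.
    """
    return k * r_true / (2.0 - k)


q, r_true = 0.1, 1.0
var_under = output_variance(steady_state_gain(q, 0.1), r_true)     # noise underestimated
var_matched = output_variance(steady_state_gain(q, 1.0), r_true)   # noise correctly estimated
var_over = output_variance(steady_state_gain(q, 10.0), r_true)     # noise overestimated
```

In this toy setting, underestimating the noise raises the gain and amplifies output variability, whereas overestimating it lowers the gain and reduces variability, in qualitative agreement with Figure 14.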

## Discussion

We propose a novel model integrating sensory and predictive signals that was tested on smooth pursuit eye movements. In this model, Kalman filters provide an estimate of the retinal and extraretinal inputs and their associated uncertainty. This uncertainty is used to combine the sensory and internal memory signals in a weighted manner (Bayesian integration). Kalman filtering allows our model to naturally produce both predictive and sensory-driven pursuit characteristics using a single mechanism.

### Kalman filtering to estimate and combine retinal and extraretinal inputs

Previous studies (Osborne et al., 2005, 2007) have highlighted the importance of noise in the sensory signals, explaining most of the variability in the initiation of pursuit eye movements. It is also known that the nervous system lowers the impact of noise on the sensory estimation (Karmali and Merfeld, 2012). In the present model, noisy retinal inputs induce variability in pursuit latency and acceleration during pursuit initiation. During initiation, the integration of the noisy inputs through Kalman filtering helps reduce the noise and refine the internal representation of target motion.

The uncertainty associated with the output of the Kalman filter yields a straightforward way for combining retinal and extraretinal signals in a statistically optimal manner (Ronsse et al., 2009). This is consistent with experimental studies demonstrating the continuous presence of predictive signals despite the randomization of the target parameters (Kowler and McKee, 1987; Kao and Morrow, 1994). It argues against the existence of a switch between the sensory and predictive pathways (Barnes, 2008). This optimal weighting of two different noisy sources of information is a hallmark of brain function (van Beers et al., 1999; Ernst and Banks, 2002; Hillis et al., 2002; Körding and Wolpert, 2004; Vaziri et al., 2006), even at the single-neuron level (Fetsch et al., 2012). For the first time, our model combines a continuous flow of sensory and predictive information to drive motor behavior, whereas a previous model used Kalman filtering for online prediction only (Shibata et al., 2005). A possible implementation of Kalman filters in the brain has been proposed previously (Denève et al., 2007), but a full model based on Kalman filters that integrates aspects of predictive and reactive motor control had never been developed. Our particular, simple model based on Kalman filters could account for all the main dynamics of predictive and reactive smooth pursuit eye movements without switching between submodels and with the same model parameters.

In the Kalman filtering approach, it is hypothesized that the brain knows the properties of the noise (i.e., its variance; Orbán and Wolpert, 2011; Gallistel and Matzel, 2013), yet if the estimated noise properties are incorrect, the estimate of the sensory input might be either biased or overly variable (Beck et al., 2012). This perspective suggests that the processing of visual information through Kalman filtering might be unreliable because of the required knowledge of noise properties. In contrast, we demonstrate here that, in the pursuit system, the dynamics are quite robust with respect to this hypothesis and that incomplete knowledge of noise properties mostly influences the variability of the response but not its average. Therefore, Kalman filtering appears to be a robust solution for the processing of sensory inputs even when the noise properties are not known accurately.

### Using a dynamic representation of the expected target motion

Previous models always used a static (constant in time) memory module for the predictive component, which only allowed predictions at a fixed horizon (Shibata et al., 2005; Soechting et al., 2010) or which only allowed for constant velocity inputs (Bennett and Barnes, 2003; Churchland et al., 2003; Madelain and Krauzlis, 2003). For instance, Shibata et al. (2005) used a Kalman filter for online prediction of target trajectory. In contrast, in the present model, Kalman filtering was also used to build a dynamic memory from past information, even in the absence of active pursuit of target motion (Barnes et al., 1997). Indeed, the history of target motion can be used to predict future target motion when tracking a sinusoidally moving target or a similar periodic pattern (Dallos and Jones, 1963; Barnes et al., 1987; Barnes and Ruddock, 1989), when anticipating the timing of target motion (Barnes and Asselman, 1991; Wells and Barnes, 1998; Barnes and Donelan, 1999; Barnes et al., 2000), or when scaling the anticipatory eye velocity with the previous target velocity (Barnes and Collins, 2008). Our model demonstrates for the first time that all these features can stem from the existence of an internal representation of target motion and do not require separate mechanisms. The existence of such an internal dynamic representation of target motion is supported by target blanking experiments with nonlinear target motion (Orban de Xivry et al., 2008, 2009). It allowed us to simulate pursuit responses when tracking an accelerating target (Bennett et al., 2007, 2010) or a sinusoidally moving target (Figs. 11, 12). In our simulations, a default internal representation was improved on a trial-to-trial basis thanks to Kalman filtering. This trial-to-trial refinement has been used to explain the content of motor memories for motor learning (Krakauer et al., 2006; Körding et al., 2007). The presence of such a dynamic internal representation could also explain the predictive ability of human subjects in ball catching (Hayhoe et al., 2005; Diaz et al., 2013) or hitting (Land and McLeod, 2000).

In our model, the efference copy signal is essential to build the internal representation of target motion (Fig. 1). Schizophrenic patients cannot process the efference copy signal properly (Pynn and DeSouza, 2013). This impairment results in abnormalities in the awareness of actions (Frith et al., 2000) and in a deficit of sensory prediction (Shergill et al., 2005; Synofzik et al., 2010). The dependence of the internal representation on the efference copy signal might explain why schizophrenic patients are impaired in predictive smooth pursuit (Sereno and Holzman, 1993; Thaker et al., 1998, 2003; Trillenberg et al., 1998; Hong et al., 2005; Nagel et al., 2007; Adams et al., 2012) and in prediction of target trajectory (Hooker and Park, 2000; Spering et al., 2013).

The output of the predictive pathway had a lead of 150 ms with respect to the sensory signal (70 ms in advance of the actual target motion given 80 ms of sensory delay), yet we do not believe that a single time horizon for this prediction exists. Rather, we hypothesize that there are many predictions of the target motion for different time horizons, such as those used for simulations of walking (Azevedo et al., 2004; Aftab et al., 2012). Such optimal prediction of future target motion could ideally replace our prediction made on this arbitrary 150 ms horizon, although our results suggest that it might be a reasonable value.

### Neural substrate of the model

Sensory areas appear to filter noise coming from sensory inputs to improve the reliability of the input over time. A model based on Kalman filtering can account for neuronal activity in visual areas during vision of natural images (Rao and Ballard, 1997). In these areas, the prior is represented by spontaneous activity (Berkes et al., 2011). Neurons in the temporal cortex also accumulate evidence for face recognition (Perrett et al., 1998). Their response to a new sensory input is biased by previous sensory inputs, i.e., by a prior (Jellema and Perrett, 2003). This is true for motion processing areas as well. Area MT plays a major role in the integration of local motion signals (for review, see Born and Bradley, 2005). Osborne et al. (2004, 2007) found evidence of information accumulation in the activity of a large population of MT neurons. The dynamics of neural activity in area MT might therefore reflect the iterative computation performed by the Kalman filter (Pack and Born, 2001; Osborne et al., 2004). However, MT neurons did not seem to discharge when blanking the target (Newsome et al., 1988), which suggests that this area is located within the visual pathway of our model.

The predictive pathway could be part of a parietofrontal network as suggested for manual interception tasks in monkeys (Merchant and Georgopoulos, 2006). For these tasks, area 7a of the posterior parietal cortex and the motor cortex appear to encode different parameters of target motion. This internal representation of target motion is then used in a predictive manner by the motor cortex (Merchant et al., 2004). For eye movements, neurons in the frontal eye field (FEF) are active during transient disappearance of the tracking target and might represent some form of internal representation of target motion for manual target interception or saccades (Barborica and Ferrera, 2003, 2004; Xiao et al., 2007; Ferrera and Barborica, 2010). The part of the FEF dedicated to smooth pursuit has been hypothesized recently to store previous expectations of target motion that can be used for smooth pursuit initiation (Yang et al., 2012). Alternatively, this prior could be interpreted as the dynamic internal representation of past target motion and could thus bias smooth pursuit initiation in the same way, as postulated in our model. In summary, the interaction between area MT and FEF might be critical to form an estimate of the RS incorporating both sensory and predictive signals.

Finally, the cerebellum exerts a large influence on the predictive pathway. First, it acts as a forward model for eye movements (Ghasia et al., 2008). Second, it may also contain an internal representation of target motion (Miles and Fuller, 1975; Suh et al., 2000; Cerminara et al., 2009) that does not depend on retinal signal (Stone and Lisberger, 1990; Suh et al., 2000).

### Conclusion

Kalman filters elegantly describe in a functional manner how the brain may deal with noisy sensory inputs, how it may form memories, and how it may take advantage of uncertainty estimation to combine different signals optimally. These three principles allowed us to accurately describe visually guided and predictive smooth pursuit dynamics observed in a wide variety of tasks within a single theoretical framework.

## Footnotes

This work was supported by Natural Sciences and Engineering Research Council of Canada, Canada Foundation for Innovation, the Botterell Fund (Queen's University, Kingston, Ontario, Canada), Ontario Research Foundation, Fonds National de la Recherche Scientifique, Fondation pour la Recherche Scientifique Médicale, the Belgian Program on Interuniversity Attraction Poles initiated by the Belgian Federal Science Policy Office, Actions de Recherche Concertées (French Community, Belgium), and the European Space Agency of the European Union. J.J.O.d.X. is supported by the Brains Back to Brussels program from the Brussels Region.

The authors declare no competing financial interests.

- Correspondence should be addressed to Philippe Lefèvre, Avenue Georges Lemaitre, 4, B-1348 Louvain-La-Neuve, Belgium. philippe.lefevre{at}uclouvain.be