## Abstract

To want something now rather than later is a common attitude that reflects the brain's tendency to value the passage of time. Because the time taken to accomplish an action inevitably delays task achievement and reward acquisition, this idea was ported to neural movement control within the “cost of time” theory. This theory provides a normative framework to account for the underpinnings of movement time formation within the brain and the origin of a self-selected pace in human and animal motion. Then, how does the brain exactly value time in the control of action? To tackle this issue, we used an inverse optimal control approach and developed a general methodology allowing to squarely sample infinitesimal values of the time cost from experimental motion data. The cost of time underlying saccades was found to have a concave growth, thereby confirming previous results on hyperbolic reward discounting, yet without making any prior assumption about this hypothetical nature. For self-paced reaching, however, movement time was primarily valued according to a striking sigmoidal shape; its rate of change consistently presented a steep rise before a maximum was reached and a slower decay was observed. Theoretical properties of uniqueness and robustness of the inferred time cost were established for the class of problems under investigation, thus reinforcing the significance of the present findings. These results may offer a unique opportunity to uncover how the brain values the passage of time in healthy and pathological motor control and shed new light on the processes underlying action invigoration.

**SIGNIFICANCE STATEMENT** Movement time is a fundamental characteristic of neural motor control, but the principles underlying its formation remain little known. This work addresses that question within the inverse optimal control framework where the challenge is to uncover what optimality criterion underlies a system's behavior. Here we rely on the “cost of time” theory that finds its roots into the brain's tendency to discount the actual value of future reward. It asserts that the time elapsed until action completion entails a cost, thereby making slow moves nonoptimal. By means of a thorough theoretical analysis, the present article shows how to sample the infinitesimal values of the time cost without prior assumption about its hypothetical nature and emphasizes its sigmoidal shape for reaching.

## Introduction

Movement time (MT) is an inherent characteristic of motor control, but the profound principles underlying its formation, be they neural or computational, remain in practice little understood. Prediction of MT is, however, a critical issue in all fields concerned with biological movement, such as humanoid/rehabilitation robotics, neuroprosthetics, brain-machine interfaces, or computer animation. In neuroscience, investigating the underpinnings of one's own motion pace is of crucial importance as many motor disorders, such as Parkinson's disease, lead to bradykinesia (Berardelli et al., 2001). This symptomatic movement slowness, not reducible to musculoskeletal or sensorimotor deficits, seems to be due to a lack of implicit “motor motivation” (Mazzoni et al., 2007; Baraduc et al., 2013). Such a reduced action vigor mainly originates from a dysfunction of the basal ganglia (BG) and the dopamine system (Desmurget and Turner, 2010; Turner and Desmurget, 2010; Brücke et al., 2012), thereby showing that MT is not just an emergent property of the task but a planned quantity. Although it is well understood why movements cannot be too fast, as exemplified by the speed/accuracy trade-off (Fitts, 1954; Harris and Wolpert, 1998; Tanaka et al., 2006), few studies actually addressed the complementary question of why movements are not slower.

Recently, the theory of the cost of time (CoT) has emerged as a promising avenue to tackle the issue of action invigoration (Shadmehr, 2010; Shadmehr et al., 2010; Shadmehr and Mussa-Ivaldi, 2012). This theory was originally motivated by the natural tendency of the brain to decrease the actual value of future reward (Myerson and Green, 1995). Porting this psycho-economical decision making to motor control (Wolpert and Landy, 2012), Shadmehr and colleagues proposed that the purpose of any action would be to put the neural system in a more rewarding state. Therefore, the slower the movement, the smaller the acquired reward. In other words, slow movements may be undesirable because the very passage of time entails a cost per se, thus placing the problem within the normative framework of optimal control (OC) theory (Scott, 2004; Todorov, 2004). The generic concept of a CoT revealed itself sufficient to account for MT in a variety of motor tasks (Hoff, 1994; Liu and Todorov, 2007; Shadmehr et al., 2010; Shadmehr, 2010; Rigoux and Guigon, 2012). The actual shape of the time cost remains, however, rather elusive in neural motor control because it was chosen a priori in most existing investigations. Researchers have assumed linear (Hoff, 1994; Harris and Wolpert, 2006; Liu and Todorov, 2007), concave (hyperbolic, Shadmehr, 2010; Haith et al., 2012; or exponential, Huh et al., 2010; Rigoux and Guigon, 2012) and even convex (quadratic, Shadmehr et al., 2010) shapes for the CoT (Fig. 1*a*). This choice could be motivated by psychological/behavioral observations, imposed by the mathematical constraints of the problem's formulation or just guided by the researcher's intuition.

In the present study, we instead used an inverse OC approach where the goal is to automatically uncover what optimality criterion underlies a (presumably optimal) system's behavior. To this aim, we developed a complete and robust methodology for characterizing how the brain values time from basic experimental motion data. The hyperbolic nature of the temporal discounting of reward (Shadmehr et al., 2010; Haith et al., 2012) was reaffirmed for saccades but without making any prior assumption about its hypothetical nature. When applied to reaching, a striking sigmoidal shape of the CoT was discovered over the time interval of actual reach durations, thereby ruling out purely concave or convex time costs for limb movement control. These methodological and empirical breakthroughs may offer a unique opportunity to assess how the brain values time in neural movement control across tasks, individuals, or species.

## Materials and Methods

#### Theoretical analysis

##### Working hypotheses.

This work relies upon the general assumption that biological movement is optimal with respect to some cost function (Engelbrecht, 2001; Todorov, 2004). It generally supposes that the trajectories triggered by the CNS can be accounted for by a certain infinitesimal cost *h*(**x**, **u**, *t*) depending on the system state, the motor command, and the time, respectively. Originally, models of OC for biological movement were developed in fixed time where MT was typically taken from experimental measurements. This restriction can be released in free time OC whereby MT emerges from the optimality of behavior (Pontryagin et al., 1964; Kirk, 1970). In this context and in agreement with the CoT theory, we further assume that *h* can be separated into a term that values time only, *g*(*t*) (the infinitesimal CoT) and a term that depends on the state/control variables, *l*(**x**, **u**) (Fig. 1*b*). A mathematical treatment of that problem, given below, shows that it is actually possible to compute *g*(*t*) by resolving an OC problem in fixed time *t* with known initial/final states, given a system dynamics and trajectory cost. When the dynamics is linear with a single control variable and the trajectory cost is quadratic (i.e., a single-input linear quadratic [LQ] problem), there moreover exist mathematical properties of uniqueness and robustness for the CoT, guaranteeing that the CoT can be identified unequivocally for that class of problem. Our main hypothesis is thus about the additive separability of temporal and trajectory costs, which is consistent with the computational motor control literature and coherent with all the previous works devised in fixed time (see below for a justification of this assertion). An alternative formulation, often seen in reinforcement learning, may have assumed multiplicative separability (e.g., exponential discounting in an infinite-horizon setting) (Sutton and Barto, 1998). Although we do not have theoretical results in the latter case, Rigoux and Guigon (2012) showed that, under certain restrictive assumptions (e.g., reward acquired on a single time step and state), a similar free time additive setting can be recovered. However, assuming a multiplicative temporal cost would be a major change of paradigm with respect to the motor control literature as the presence of such a term could modify all the predictions made by classical fixed-time optimal control models.

##### General methodology for identifying the CoT.

Consider a system dynamics **ẋ = f(x, u)**, with state **x** ∈ ℝ* ^{n}* and control

**u**∈ ℝ

*, and fix a target*

^{m}**x**

*(*

^{f}*f*stands for

*final*throughout the text). A dot above a variable stands for its time derivative. Given an input

**u**(·) defined on an interval [0,

*t*

_{u}], we denote by

**x**

_{u}(·) the trajectory of

**ẋ(**

*t*

**) = f(x(**

*t*

**), u(**

*t*

**))**arriving at

**x**

*, i.e., satisfying*

^{f}**x**

_{u}(*t*

_{u}) = x*. We assume that the cost function writes as the combination of the CoT plus a trajectory cost, as follows: where the functions*

^{f}*g*and

*l*are non-negative. The function

*l*is classical to motor control and may capture various aspects of the trajectory, such as effort, energy, smoothness, or accuracy. It has been the subject of extensive investigations (e.g., Berret et al., 2011). It will be referred to as the trajectory cost throughout the study. What we know about

*l*is that it must account for the system trajectories in fixed time (i.e., when MT is known). To get persuaded of this, it suffices to realize that a model predicting MT accurately would be meaningless if it does not predict the correct system trajectories in fixed time as well. In what follows, we shall assume that

*l*is either known from the literature or can be identified via some inverse OC procedure performed in fixed time (e.g., Mombaur et al., 2010; Berret et al., 2011). The function

*g*is the infinitesimal (i.e., instantaneous) CoT we are looking for, whose antiderivative is the actual CoT and will be denoted by

*G*(

*t*) −

*G*(0) =

*G*(0) = 0).

We consider the following free time OC problems:

Given an initial condition **x**^{0}, minimize the cost *C*(**u**, *t*** _{u}**) among all inputs

**u**(·) and all times

*t*

**such that**

_{u}**x**and

_{u}(0) = x^{0}**x**

_{u}(*t*

_{u}) = x*(by definition of*

^{f}**x**).

_{u}The existence of minimal solutions **u**(·) with a finite time *t*_{u} may be guaranteed under some technical conditions on the dynamics and on the cost (typically, compactness of the set of controls and convexity of **f** and *l* with respect to **u**, or **f** linear and *l* quadratic) (e.g., Lee and Markus, 1967). We will assume that such conditions hold here.

Let **u**(·) be a minimal solution of the problem. Then there exists a curve **p**(*t*), *t* ∈ [0, *t*** _{u}**], in ℝ

*called the adjoint or costate vector, and a real number λ = 0 or 1, such that (*

^{n}**x**(·),

**p**(·),

**u**(·), λ) satisfies the conditions of the Pontryagin Maximum Principle (PMP) (e.g., Pontryagin et al., 1964; Lee and Markus, 1967; Todorov, 2006). In particular, defining the Hamiltonian of the problem is as follows: The adjoint vector satisfies the Hamiltonian equation

**ṗ = −**

**f**(

**x**,

**u**) (these controls are such that the linearization of the system around the associated trajectory

**x**is not controllable). We will assume that no such control exist for

_{u}**f**(

**x**,

**u**) (this is always true for controllable linear systems for instance), and so we can choose λ = 1 in ℋ. As a consequence, we obtain the following: On the other hand, it is obvious that

**u**(·) is also a minimal solution of the following OC problem in fixed time

*t*

**as follows:**

_{u}Minimize the cost
among all inputs **v**(·) such that **x _{v}(**0

**) = x**(and

^{0}**x**

_{v}(*t*

**). The Hamiltonian associated with this problem is as follows: ℋ**

_{u}) = x^{f}_{0}defines the same Hamiltonian equation than ℋ since

**p**(·) is also an adjoint vector for this OC problem in fixed time, and it is the only one since the adjoint vector associated with λ = 1 is unique.

Thus, for a given time *t*** _{u}** and a given cost

*l*, we can compute the value

*g*(

*t*

_{u}) in the following way. First, solve the fixed-time OC problem in time

*t*

_{u}. The result is a triple

**(x(·)**,

**p(·)**,

**u(·))**from which the Hamiltonian can be calculated. Second, set to get the value of the infinitesimal CoT at time

*t*

**. A flowchart illustrating the process is given in Figure 2.**

_{u}A more elegant way to obtain the same conclusion is to proceed as follows. Let *V*(*t*, **x ^{0})** be the value function of the OC problem in fixed-time

*t*, that is: where the infimum is taken among all inputs

**u**(·) such that

**x**0

_{u}(**) = x**(and

^{0}**x**

_{u}(*t*

**) = x**

*). It is the optimal cost of a motion in time*

^{f}*t*between

**x**

^{0}and

**x**

*. Then the time*

^{f}*t*

_{u}of an optimal solution

**u**(·) of the free time problem satisfies the following: and so necessarily

*V*is differentiable with respect to

*t*). It is well known from the Hamilton-Jacobi-Bellman theory that

*t*

**ℋ**

_{u}, x^{0}) =_{0}

^{★}(x(

*t*

_{u}), p(*t*

**1), where**

_{u}),**ℋ**

_{0}

^{★}(

**x, p**, 1) = max

_{v}ℋ

_{0}(

**x, p, v,**1). Since for the optimal control

**u**we have ℋ

_{0}

^{★}(

**x(**t

_{u}), p(*t*

**1**

_{u}),**) =**ℋ

_{0}(x(

*t*

_{u}), p(*t*

_{u}), u(*t*

**1), we recover in this way Equation 7. We did not use the standard way to define the value function: for a MT equal to**

_{u}),*t*, this is usually

*w*,

**x**(

*w*)) =

*V*(

*t*−

*w*,

**x**(

*w*)) =

*w*,

**x**(

*w*)), hence ∂

*V*/∂

*t*= −∂

*t*.

Remarkably, the latter analysis shows that the derivation extends to stochastic settings (Stengel, 1986; Todorov, 2006). In that case, the infinitesimal CoT is simply the partial time derivative of the expected value function of the stochastic OC problem in fixed time *t*** _{u}**. More precisely, for the stochastic dynamics

*t*. More precisely, for the stochastic dynamics

_{u}*d*

**x = f(x, u)**

*dt*

**+ g(x, u)**

*d*ξ where

*d*ξ is a standard Wiener process,

*g*(

*t*

**) results from the stochastic Hamilton-Jacobi-Bellman equation as follows: In the LQ Gaussian (LQG) case, the infinitesimal CoT can be easily computed because the value function has a parametric form whose parameters can be evaluated via the resolution of decoupled ordinary differential equations (e.g., Kappen, 2011).**

_{u}In summary, the above theoretical considerations show that *g*(*t*) can be calculated by solving a deterministic or stochastic OC problem in fixed time *t* for some initial/final states of the system (**x**^{0} and **x*** ^{f}*). Movement duration and initial/final states are basic information that can be retrieved from experimental data. Thus, given a trajectory cost

*l*and a model of the system dynamics, it is straightforward to sample the values of the mapping

*t*↦

*g*(

*t*) on some time interval [

*t*

_{min},

*t*

_{max}]. The boundary values correspond to the range where real observations can be taken in practice and come out experimentally. This is the empirical range of movement durations. As a consequence, the integral CoT (i.e.,

*G*) can only be identified up to a constant since the values of

*g*(

*t*) for

*t*<

*t*

_{min}are unknown/unobservable. This constant shift, however, does not affect either the solution of the free time OC problem or the predicted amplitude–duration relationship over the range [

*t*

_{min},

*t*

_{max}] (that will be exactly recovered on this interval). To infer the CoT outside of the range of actual measurements, we may devise at least two ways. First, the identified CoT may just be extrapolated for

*t*<

*t*

_{min}and

*t*>

*t*

_{max}. Alternatively, values of

*g*(

*t*) may be computed for unobserved times by relying on predictions of the empirical amplitude–duration relationship (e.g., obtained via regression analysis). Values inferred using the first and second methods will be displayed as dotted lines and solid lines, respectively, in all graphs depicting the CoT. Values inferred over the range of actual reach durations will be emphasized by filled circles added on the top of solid lines, which locate where the CoT was the most reliably identified.

##### Detailed solution for single-input LQ problems.

For deterministic single-input linear-quadratic problems, a more detailed account of the methodology can be given and closed-form solutions of the CoT can be obtained. This case is relevant as it includes basic eye and arm movements. The state of such systems can be described by **x** = (θ, …, θ^{(n−1)})^{◨} ∈ ℝ* ^{n}*, and then the dynamics has the form as follows:
which is a single-input linear system

**ẋ**=

*A*

**x**+

*Bu*,

*u*∈ ℝ. Typically,

*n*= 2 or

*n*= 3 for dynamic models of the arm or the eye (see below). Let us assume that a quadratic cost

*l*(

**x**,

*u*) = (

*u*−

**k**

^{T}**x**)

^{2}has been determined (e.g., an “effort” cost). We choose the final point

**x**

*as an equilibrium state of the system. Up to a translation in θ, we can always assume*

^{f}**x**

*= 0. We then choose a family of initial conditions*

^{f}**x**

^{0}(

*a*) = (

*a*, 0, …, 0), which are equilibrium states parameterized by the movement extent

*a*> 0 (i.e., the amplitude of the motion). For every amplitude

*a*> 0, we denote by

*t**(

*a*) the duration of the motion between

**x**

^{0}(

*a*) and

**x**

*(which can be estimated experimentally), and by*

^{f}*u*(·) a control minimizing the integral cost

^{a}*C*

_{t*(a)}(

*u*) =

*t**(

*a*) between

**x**

*(0) =*

_{u}**x**

^{0}(

*a*) and

**x**

*(*

_{u}*t**(

*a*)) =

**x**

*= 0. Applying the above methodology, we have to compute ℋ*

^{f}_{0}(

**x**(

*t**(

*a*)),

**p**(

*t**(

*a*)),

*u*(

*t**(

*a*)), 1) where: From the PMP,

*u*(

^{a}*t**(

*a*)) minimizes ℋ

_{0}(

**x**(

*t**(

*a*)),

**p**(

*t**(

*a*)),

*u*, 1) with respect to

*u*, which means ∂ℋ

_{0}/∂

*u*= 0, and replacing the solution in Equation 12, we obtain the following: Moreover, the value

*u*(

^{a}*t**(

*a*)) can be seen to depend linearly on

**x**

*(0) in the LQ case, and so it depends linearly on*

_{u}*a*since

**x**

*(0) =*

_{u}**x**

^{0}(

*a*) =

*a*

**x**, In other words,

^{0}(1)*u*(

^{a}*t**(

*a*)) =

*a*ϕ(

*t**(

*a*)), where the function ϕ( · ) is defined as follows: for every τ > 0, ϕ(τ) is the value

*u*

^{1}(τ) of the control minimizing the integral cost

*C*

_{τ}(

*u*) =

**x**

*(0) =*

_{u}**x**

^{0}(1) and

*x*(τ) = 0. The function ϕ(·) is a universal value of time that depends only on the system dynamics (

_{u}*A*,

*B*) and the trajectory cost (

**k**) and not on the specific behavior of an individual. This universal value of time can be computed explicitly thanks to the equations given by Ferrante et al. (2005). Applying formula given in Equation 7, we finally obtain

*g*(

*t**(

*a*)) = ϕ(

*t**(

*a*))

^{2}

*a*

^{2}.

Empirical observations show that the time *t**(*a*) is typically an increasing function of the amplitude, so that its inverse *a**(*t*) exists. We can then determine the function *g*(·) by *g*(*t*) = ϕ(*t*)^{2}*a**(*t*)^{2}. In particular, if it appears from experiments that the function *t** is approximately affine of the form *t**(*a*) = α*a* + β, then the infinitesimal CoT can be written as *g*(*t*) = ϕ(*t*)^{2}((1/α) *t* − β/α)^{2}. Hence, it suffices to compute the universal value of time ϕ(*t*), which can be done explicitly, to recover the actual infinitesimal CoT from the experimental duration/amplitude mapping.

Interestingly, this analysis also reveals the effect of globally rescaling *g* by multiplying it by some positive parameter κ. This would induce a new amplitude *t**(*a*) = α*a*/

##### Uniqueness and robustness properties for single-input LQ problems.

From the above considerations, it is clear that the values of the infinitesimal CoT may critically depend on the trajectory cost. This raises the question about the robustness and uniqueness of *g* in general as the cost *l* may be known only approximately or may even be quite different from the true cost without it being noticeable in the motion data under investigation. Although important, this question was not addressed in previous studies. The problem can be exemplified for planar point-to-point reaches where the torque change (Uno et al., 1989) and jerk (Flash and Hogan, 1985) trajectory costs are hardly distinguishable despite their different nature (dynamic vs kinematic costs). For the CoT theory, different trajectory costs may conceivably lead to divergent time costs. Thus, before identifying the CoT, the question of what trajectory cost underlies the system trajectories with known MT must be posed. It may be argued that the trajectory cost must be identified through the resolution of the following inverse OC problem in fixed time:

Given a set of recorded trajectories in time τ, find *l* such that every recorded trajectory **y**(*t*) is a minimum of the integral cost as follows:
among all trajectories **x**_{u} such that **x**_{u}(0) = **y**(0) and **x*** _{u}*(τ) =

**y**(τ).

However, this inverse problem is ill posed in general, which means that its solutions *l*(**x**, *u*) are neither unique nor stable with respect to perturbations of the recorded data (which are affected by sensorimotor and measurement noise and subject to interindividual variability). Within the single-input LQ framework, however, powerful results of uniqueness and robustness can be obtained, as explained hereafter.

We thus assume that the class of admissible trajectory costs is the class ℒ of quadratic costs ℒ = {*l*(**x**, *u*) = *u*^{2} + **x**^{T}Q**x** + 2**x*** ^{T}Su*}, with standard hypothesis on the matrices

*Q*and

*S*(i.e.,

*Q*=

*Q*≥ 0 and

^{T}*S*is such that

*l*is a positive semidefinite quadratic form). Given a cost

*l*∈ ℒ, the classical fixed-time LQ theory asserts that the solution of the corresponding OC problem in time τ satisfies

*u*(

*t*) =

*K*

_{τ}(

*t*)

**x**(

*t*), where the time-dependent matrices

*K*

_{τ}(

*t*) are determined through some Riccati equations. Alternatively, Ferrante et al. (2005) showed that all solutions in fixed time are determined by only two (1 ×

*n*) matrices

*K*

_{−},

*K*

_{+}(i.e., the set {

*K*

_{τ}(·), τ > 0} is uniquely characterized by a pair,

*K*

_{−},

*K*

_{+}). Following the latter approach, let us introduce the set 𝒦 of all such pairs, that is, (

*K*

_{−},

*K*

_{+}) belongs to 𝒦 if there exists a LQ problem associated with a cost

*l*∈ ℒ whose solutions are characterized by (

*K*

_{−},

*K*

_{+}). The following theorem is proved by Nori and Frezza (2004).

##### Theorem 1.

For any pair (*K*_{−}, *K*_{+}) ∈ 𝒦, there exists a unique vector **k** ∈ ℝ* ^{n}* such that (

*K*

_{−},

*K*

_{+}) describes the optimal solutions of the fixed-time OC problems associated with the following: Moreover, the map (

*K*

_{−},

*K*

_{+}) ↦

**k**is continuous (actually

**k = −**

*K*

_{+}

^{T}).

As a consequence, for single-input LQ problems, there exists a complete methodology for determining unequivocally and robustly the cost of the time *g*. The method is summarized as follows: (1) determine a pair (*K*_{−}, *K*_{+}) ∈ 𝒦 such that the associated optimal trajectories match accurately the recorded ones in fixed time (this can be written as a least-square problem); (2) set **k** = −*K*_{+}^{T} and *l*(**x**, *u*) = (*u* − **k**^{T}**x) ^{2}**; (3) compute the function ∂

*V*/∂

*t*(

*t*,

**x**) by using the Hamiltonian ℋ

_{0}associated with

*l*(

**x**,

*u*); (4) for every time

*t*, identify from experimental data an initial condition

**x***(

*t*) corresponding to a movement in time

*t*(if it exists for that

*t*); and (5) set

*g*(

*t*) = −∂

*V*/∂

*t*(

*t*,

**x***(

*t*)). The latter analysis proves that the CoT can be unequivocally and robustly identified from experimental data for single-input LQ problems. Although restricted, this class of problems covers most of the applications considered in the present study, which reinforces the significance of our findings. For more general problems, either the literature already provides some widely accepted trajectory cost or numerical techniques of inverse optimal control can be used to identify the cost function best replicating the system trajectories in fixed time (e.g., Berret et al., 2011).

#### Computational procedures

##### Models for saccades.

The eye plant model was taken from Shadmehr et al. (2010). Briefly, the eye dynamics was single-input linear system as follows:
where *c*_{1} = 1.165 × 10^{−5}, *c*_{2} = 3.9 × 10^{−3}, *c*_{3} = 0.241, and *c*_{4} = 1 for a human eye. The system state was **x** = (θ, θ̇, θ̈)^{◨}, and an effort cost *l*(**x**, *u*) = *u*^{2} was minimized. This problem falls within the single-input LQ framework described above.

We also performed simulations with a signal-dependent noise following Shadmehr et al. (2010) and identified the CoT in this stochastic context. Signal-dependent noise is a random variable with a normal distribution of mean zero and variance equal to *k*_{SDN} *u*^{2}. This problem then falls within the LQG framework, for which closed-form solutions are available to compute the value function and in particular its partial time derivative (for the equations, see Kappen, 2011). The default noise level (signal-dependent noise) was set to *k*_{SDN} = 0.0075, and a terminal cost accounting for accuracy of the form **x**^{◨}*D***x** with *D* = diag(10^{3}, 1, 10^{−2}) was added to the integral cost.

##### Models for reaching: Single degree-of-freedom (dof) limb.

For a 1-dof arm moving in the horizontal plane, the basic model used throughout the study was already described in numerous other studies (e.g., Hogan, 1984; Tanaka et al., 2006; Gentili et al., 2007; Gaveau et al., 2014) and is as follows:
where is θ the shoulder joint angle, τ is the muscle torque, *b* is the friction coefficient (*b* = 0.87 here), *I* is the moment of inertia of the arm with respect to the shoulder joint (value estimated based upon Winter's table for each participant) (Winter, 1990) and *u* is the single control variable.

For the trajectory cost, we typically considered canonical quadratic costs of the form *l*(**x**, *u*) = (*u* − **k**^{◨}**x**)^{2}, where **x**^{◨} = (θ, θ̇, θ̈) denotes the system state. The two most famous examples are the minimum torque change corresponding to *l*(**x**, *u*) = *u*^{2} (i.e., **k** = **0**) (Uno et al., 1989) and the minimum jerk corresponding to *l*(**x**, *u*) = **k**^{◨} = (0, 0, *b*)) (Flash and Hogan, 1985). However, other costs, possibly composite, may account for such planar movements. Therefore, additional perturbations of the effort cost were tested as follows: a torque-based cost (*l*(**x**, *u*) = 0.05*u*^{2} + τ^{2}) (Nelson, 1983), an energy-based cost (absolute work, *l*(**x**, *u*) = 0.05*u*^{2} + | τθ̇ |) (Berret et al., 2008), and an acceleration-based cost (*l*(**x**, *u*) = 0.05*u*^{2} + θ̈^{2}) (Ben-Itzhak and Karniel, 2008). All these costs were checked to predict symmetrical and smooth velocity profiles and thus to be plausible trajectory costs for such a type of movement under consideration. For visualization purposes, we considered when convenient *l*(**x**, *u*) = 0.05*u*^{2} and *l*(**x**, *u*) = 0.05

Sensorimotor noise is known to play a role on motor control during reaching, especially signal-dependent noise (Todorov and Jordan, 2002; van Beers et al., 2004). Therefore, we also tested signal-dependent noise among values in the following range *k*_{SDN} = [0; 0.05; 0.075; 0.1; 0.125]. In this stochastic context, a terminal cost accounting for accuracy of the form **x**^{◨}*D***x** with *D* = 5diag(10^{5}, 10^{3}, 10^{2}) was added to the integral cost.

##### 1-dof with muscle dynamics.

For the sake of completeness, we also tested a more advanced model of the arm, namely a rigid body actuated by two antagonist muscles modeled as second-order low-pass filters (Van der Helm and Rozendaal, 2000) with certain time constants. In that case, the single-joint arm dynamics could be written as follows:
where the subscripts 1 and 2 denote the flexor and extensor muscles, respectively, ε is the muscle excitation, *u _{i}* is the control variable, which can be thought of as motor neuron input, and σ is the time constant of the filter. Typically, the value is chosen ∼0.04 s in the literature for fast movements (e.g., Liu and Todorov, 2007; Guigon et al., 2007), but we tested values ranging from 0.03 s to 0.08 s to evaluate the effects of varying this parameter on the shape of the CoT. In the reported simulations, we minimized the following effort cost:

*l*(

**x**,

*u*) = 0.005(

*u*

_{1}

^{2}+

*u*

_{2}

^{2}).

##### 2-dof arm.

For a nonlinear multijoint planar arm model, a full description and the parameters used for the present simulations can be found in previously published articles (e.g., Berret et al., 2008, 2011). Briefly,
where θ = (θ_{1}, θ_{2})^{◨} and τ = (τ_{1}, τ_{2})^{◨} denote the joint angle and torque vectors, respectively. The quantities ℳ, 𝒦, and ℱ are the inertia, the Coriolis/centripetal, and the viscosity matrices, respectively.

Smooth torque signals were obtained by controlling the torque derivative to emulate the low-pass filter characteristics of muscles, which is a classical assumption done in other studies (e.g., Uno et al., 1989; Nakano et al., 1999) and is as follows:
where **u** is the control variable.

Anthropometric parameters were taken from Winter's tables (Winter, 1990), and values were adjusted to each participant.

##### Numerical optimal control.

To solve a free time optimal control problem, a naive strategy might consist of testing different MTs until the best duration is found, by solving fixed-time problems at each iteration. This approach would be of course computationally intensive and rather inefficient. However, it can be seen from the PMP that the free time setting just introduces an additional equation (see Eq. 3). Indeed, the Hamiltonian has to be zero along the optimal state/costate trajectory, which can be exploited to find the solution without resorting to any trial-error procedure and without additional computational complexity. For resolving general optimal control problems (fixed- or free time), we relied here a classical method that consists of transforming the OC problem into a nonlinear programming problem with constraints. More specifically, we used an orthogonal collocation method that is efficiently implemented in MATLAB (The MathWorks), called GPOPS (Benson et al., 2006; Garg et al., 2010; Rao et al., 2010). The nonlinear programming problem was solved by means of the well-established numerical software SNOPT (Gill et al., 2005). We checked a posteriori that the Hamiltonian was either zero (in free time) or constant (in fixed-time) along the resulting optimal state/costate trajectories. For LQ or LQG problems, the simulations were also performed by using the closed-form solutions of the problem, and we checked that both numerical methods yielded the same trajectories and time costs.

#### Experimental procedures

##### Main experimental task participants.

Twenty right-handed healthy adults, without neuromuscular diseases and with normal or corrected-to-normal vision, participated in this study (10 males, age: 30 ± 5 years, mass: 67 ± 12 kg). Written informed consent was obtained from each participant in the study as required by the Helsinki Declaration and the EA 4042 local Ethics Committee. All the participants were naive to the purpose of the experiment.

##### Motor task.

Participants performed visually guided single-joint movements (rotation around the shoulder joint) in the horizontal plane. Participants stood in front of a large vertical screen where spotlight targets (3 cm diameter) were displayed. We tested 10 amplitudes ranging from 5° to 95°, and 10 repetitions were recorded for each amplitude (5 in the rightward and 5 in the leftward direction). Participants started with the arm orthogonal to the screen. A sequence of 20 targets was then displayed (a new target appeared every 3.5 s). The starting arm position was varied across the experiment and coincided with the previous target location. For each participant, 5 blocks of 20 movements were recorded, for a total of 100 movements per participant. The sequence of trials was fully randomized within each block, however, ensuring that one leftward and one rightward movement of each amplitude was presented within each sequence of 20 trials.

##### Instructions.

Participants were instructed to move at a spontaneous, comfortable speed. They were asked to point toward the displayed spotlight target without trying to touch it because it would otherwise induce trunk movements. They were required to keep the arm fully extended, to minimize trunk rotations while performing the task and to perform one-shot movements without correction. They also had to support the weight of their arm by themselves and try to move in a transverse plane as much as possible. A familiarization phase consisting of 20 trials was performed before the recording of the 100 trials. Participants were allowed to rest and relax their arm every 20 trials for several minutes (a “pause” word appeared on the screen for that purpose).

##### Data collection and processing materials.

Arm and head motion of the dominant arm were recorded by means of a motion capture system (Optitrack device). Ten cameras were used to capture the movement of five retro reflective markers (14 mm in diameter), placed at well-defined anatomical locations on the moving arm and head (acromial process, humeral lateral condyle, apex of the index finger, left and right sides of the frontal bone).

##### Data analysis.

All the analyses were performed with custom software written in MATLAB from the recorded 3D positions of the markers (sampling frequency of 250 Hz). Recorded signals were low-pass filtered using a digital fifth-order Butterworth filter at a cutoff frequency of 10 Hz (MATLAB *filtfilt* function). Velocity profiles were computed via numerical differentiation.

The movement onset time was defined as the instant at which the linear tangential velocity of the fingertip exceeded 5% of its peak and the end of movement as the point at which the same velocity dropped below the 5% threshold. Movement duration was first inferred from those values as it is a traditional method. However, given the purpose of the paper, a second method was used where the velocity profiles were fitted to minimum jerk velocity profiles (Flash and Hogan, 1985) to infer the movement duration (in the spirit of Botzer and Karniel, 2009). To do so, a global optimization using MATLAB's *patternsearch* function was performed, and the parameters of the best-fitting minimum jerk problem were identified. On average, both methods gave similar results, but the second method revealed itself to be more robust to variations of motion amplitude and noise, and was therefore much more reliable in general. This second method was thus retained. Finally, the angular position of the arm was also evaluated from the shoulder and finger markers and using some trigonometry. The error between the theoretical amplitude and the one inferred from our recorded data was 2.0 ± 0.9 degrees on average across participants. This indicated that the participants fulfilled the task correctly by respecting the required amplitudes and that the stimuli displayed on the screen were correctly calibrated.

##### Additional experiment.

Four right-handed healthy adults were tested during an additional experiment addressing the case of small amplitudes/times (age: 32 ± 2 years, mass: 66 ± 4 kg). The same protocol as the main experiment was used, except that a smaller target width (0.3 cm instead of 3 cm) and a smaller fingertip marker (9 mm instead of 14 mm) were used because specific focus was put on small motion extents. A first block of 100 movements was recorded for amplitudes ranging from 5° to 95° as for the main task. A second block of 100 movements was recorded for amplitudes ranging from 1° to 5°. The order of the two blocks was counterbalanced across subjects. The smallest tested movements were ∼12 mm here, which was still much above the accuracy of the motion capture system (<0.5 mm from calibration). A total of 200 reaches per participant was thus available in this dataset, ranging from very small to very large motion extents.

## Results

### Illustration for saccades

We first considered the case of saccades to test the effectiveness and relevance of our methodology. In a series of papers, Shadmehr and colleagues (Shadmehr et al., 2010; Shadmehr, 2010) already investigated in great detail the extent to which the CoT theory could account for the duration of horizontal saccades and their variations with respect to the reward assigned by the brain to the task. They concluded that a hyperbolic CoT replicated better the amplitude–duration relationship than linear or quadratic alternatives. However, “the timescales [were] too short to allow [them] to dissociate between hyperbolic and exponential temporal discount functions” (Shadmehr, 2010). Considering intertrial intervals, they nevertheless concluded later that an exponential CoT was inconsistent with the data (Haith et al., 2012). Here we relied upon the same experimental data reported by Collewijn et al. (1988), and we used the same linear model of the oculomotor plant and trajectory cost (i.e., “effort” cost, quadratic in the control variable) to identify the CoT underlying human saccades (as described also by Harris and Wolpert, 1998; Tanaka et al., 2006; Kardamakis and Moschovakis, 2009; Shadmehr et al., 2010).

In contrast with Shadmehr et al. (2010), the present investigation did not require guessing a parametric form of the way the brain values the passage of time. Instead, we directly computed the infinitesimal CoT at different times based upon either the affine amplitude–duration relationship or the true data points exhibiting a growth larger than linear for ample saccades reported by Collewijn et al. (1988). We identified two similar yet slightly different time costs (Fig. 3*a*,*b*, black and gray traces). First, our results confirmed previous findings by showing that the CoT underlying saccades had undeniably a concave shape, thereby ruling out other time costs, such as linear and convex ones. When fitting the infinitesimal CoT on the range of actual durations (i.e., [36.5–295 ms]), we found that the root mean square error (RMSE) for the exponential fit was only slightly better than the hyperbolic one (0.010 vs 0.015). Fitting *g* on a larger interval would be possible, but the result would then depend heavily on the extrapolation method. Second, the reward assigned to the task could be estimated from the asymptotic value of *G*, without the need for a parametric guess of the CoT. However, when a parametric model of CoT was available (not based on blind intuition though but on the shape of *g*), the reward could be estimated via the parameter α (Fig. 3*c*), without resorting to any trial/error adjustments. In that case, the reward discount rate could be evaluated as well via the parameter β (Fig. 3*c*). Third, when modeling signal-dependent noise and thus switching to the stochastic context as in Shadmehr et al. (2010) or Harris and Wolpert (1998), the identified values of the infinitesimal CoT were found to be larger than for the deterministic case (Fig. 3*d*). This increase of the CoT was mainly a consequence of the larger expected trajectory cost due to the effects of sensorimotor noise. As the level of noise increased, the CoT approached a hyperbolic rather than an exponential discounting of reward (Fig. 3*d*, RMSE). Because our methodology provides an exact solution to the identification problem, the original amplitude–duration relationship was perfectly recovered from free time OC simulations as expected (using the inferred CoT, see Fig. 3*e*). The associated optimal saccade velocity profiles are reported in Figure 3*f*. The important result was that in all cases the instantaneous CoT was a decreasing function of time such that *G* (the integrated CoT) always had a clear concave shape. Importantly, our theoretical results for single-input LQ problems ensure that this inferred CoT is not just an artifact but does possess such well-identified characteristics. Our findings thus rule out nonconcave time costs for the control of saccades and confirm the hyperbolic shape of the CoT (Shadmehr et al., 2010; Haith et al., 2012).

### Application to reaching

We then investigated in depth the CoT underlying the control of reaching. As far as we know, inferring the CoT for reaching has not been attempted yet. To start, we considered the case of a single-joint arm moving in the horizontal plane whose linear musculoskeletal dynamics was described in several studies (Hogan, 1984; Tanaka et al., 2006; Berret et al., 2008; Gaveau et al., 2014) and detailed in Materials and Methods. We relied on the effort required to generate a reach as trajectory cost (i.e., a quadratic cost in the control variable). Incidentally, it coincided here with the torque change (Uno et al., 1989), a prominent model known to replicate well the bell shape of velocity profiles during single-joint rotations (Gottlieb et al., 1989). This problem also falls within the single-input LQ class for which we have powerful mathematical results.

To uncover the CoT underlying such self-paced arm pointing movements, experimental data were first gathered and parameters such as movement extent and duration were computed. For all the participants, the relationship between amplitude and duration was approximately linear, in agreement with the existing literature (Brown et al., 1990; Gentili et al., 2007). Figure 4 illustrates this relationship for 4 participants separately, which was quantified via linear regressions based on all the 100 trials recorded and depicted as single gray dots (*R*^{2} > 0.75 for all the 20 participants). Even though the task was fairly simple, interindividual differences were noticeable. In particular, both the slope and the intercept of the regression line varied across participants.

For each single movement, information about its amplitude and duration was eventually used to estimate the value of *g* for that time. Figure 5*a* displays the values of *g*(*t*) when sampled on such a single-trial basis (gray dots). Each dot in this figure corresponds to a dot in Figure 4. Because of sensorimotor noise and intertrial fluctuations (clearly visible in Fig. 4), the identified CoT was itself quite noisy when inferred from single trials. To get a more reliable and noiseless estimation of *g*, we then exploited the affine law linking amplitude and MT to infer the CoT on the time interval where actual measurements were taken in practice (Fig. 5*a*, black traces). The time interval containing the empirical reach durations is emphasized with filled circles added on the CoT traces. Its boundary values, corresponding, respectively, to the faster and slower motions observed during the experimental session, varied between participants as they were instructed to move at their own pace. Outside of this interval, the shape of the infinitesimal CoT was less reliable as it was either computed based upon regression equations (yielding predicted MTs for untested amplitudes) or just extrapolated by assuming that the initial and asymptotic values of *g* were both zero (Fig. 5*a*, solid lines and dotted lines, respectively). In Figure 5*b*, the corresponding integral CoTs, *G*, are depicted. The CoT exhibited a striking logistic shape for all the 20 participants, regardless of their spontaneous pace and anthropometric characteristics. Importantly, this conclusion could be drawn from the most reliable part of the curve, that is, over the range of MTs that were recorded in practice (and without resorting to any sort of extrapolation). This nontrivial shape would have been hardly predictable using a parametric approach as the CoT is neither concave nor convex but sigmoidal (at least on the time interval of interest, i.e., from ∼500–1500 ms). As a byproduct, these results show that for reaching the CoT is not purely hyperbolic or exponential.

#### Verification through direct optimal control simulations in free time

Before going further, we performed free time optimal control simulations by using the identified *g* to verify numerically the effectiveness of our methodology. As expected, the fitted relationship between amplitude and duration was exactly the same in simulation and experiment (Fig. 6*a*). This was expected given that the method seeks the CoT that will exactly account for the original amplitude–duration relationship. The velocity profiles were also bell-shaped and simply scaled with speed in accordance with well-known experimental observations (Fig. 6*b*) (e.g., Gottlieb et al., 1989). It is worth noting that solving free time optimal control problems is not computationally more intensive than solving their fixed-time counterparts. In particular, resolving a free time problem does not necessarily require a trial/error procedure as it is often assumed or done in the literature (e.g., van Beers, 2008; Shadmehr et al., 2010; Rigoux and Guigon, 2012; but see Hoff, 1994).

#### Consistency with respect to the fitting of the amplitude–duration relationship

In the preceding paragraphs, we assumed an affine relationship between amplitude and duration with the consequence of having non-zero MT at zero amplitude. Actually, other functions, such as concave ones, may fit the data equally well, with or without bias at the origin. To investigate the time cost consistency with respect to data fitting, we collected additional data involving reaching movements of amplitudes <5° because nonlinearity could mainly arise for small times/amplitudes (see additional experiment in Materials and Methods). While the shape of the amplitude–duration relationship was very close to linear for reaches >5°, these data allowed us to refine our analysis near the origin. Figure 7*a* shows the amplitude–duration relationship for amplitudes ranging from 1° to 5° (100 reaches) and 5° to 95° (100 reaches) for horizontal reaches with a fully extended arm. The data revealed that the relationship was more concave than linear in the vicinity of small amplitudes, even though an affine fit still performed relatively well overall (*R*^{2} > 0.85). Although it was not possible to decide between concave fits with or without intercept from the *R*^{2} values (light vs dark gray traces), the CoT appeared to be sigmoidal over the range of empirical MTs in any case (i.e., ∼300–1500 ms; see Fig. 7*b*). For all the participants, a similar bell shape for the function *g* starting from times ∼300 ms was found. The only noticeable difference between fits with or without intercept occurred for small times (corresponding to very short motion extents because the fitted curve was very steep near the origin) where *g* was found to increase as MT tended to zero in the latter case (dark gray traces). This finding is, however, hypothetical because it is not grounded on real data as we did not measure reaches with duration <300 ms in this experiment. Nevertheless, the integral CoT, *G*, exhibited a clear sigmoidal shape over the range of actual reach durations. The latter was fully characterized up to a constant shift corresponding to the (unknown) values of *g*(*t*) for *t* <300 ms. These values, however, do not affect the ability of the identified *G* to predict the correct amplitude–duration relationship for times of practical interest for reaching.

#### Consistency with respect to trajectory costs, motor noise, and muscle dynamics

Before concluding about the sigmoidal shape of the CoT on the time interval of empirical reach durations, we further tested its consistency with respect to changes of trajectory cost, motor noise, and muscle dynamics (Fig. 8). Thus far, we relied on a simple quadratic effort cost (i.e., the torque change here). However, because a variety of trajectory costs could replicate the bell-shaped velocity profiles of such single-joint arm rotations, we assessed the robustness of the CoT with respect to other plausible trajectory costs. We considered the minimum jerk (Flash and Hogan, 1985), torque (Nelson, 1983), absolute work (Berret et al., 2008), or acceleration (Ben-Itzhak and Karniel, 2008) models. The extracted time costs are reported in Figure 8*a*. This numerical analysis confirmed the consistency of the shape of *g* with respect to *l*. A candidate trajectory cost must at least replicate the arm trajectories when MT is known; therefore, this condition was ensured before extracting the CoT (i.e., the smooth bell shape of velocity profiles). We then tested the effect of motor noise as it is known play on role on arm movement control (Todorov and Jordan, 2002; van Beers et al., 2004). Results are reported in Figure 8*b*. Signal-dependent noise did not change the overall sigmoidal shape of the CoT but just tended to increase the CoT magnitude.

In the preceding simulations, a simple model of the arm mechanics and muscle dynamics was used. We tested the extent to which the sigmoidal shape of the CoT was dependent on the model of the musculoskeletal system. To this aim, we considered that the arm was driven by two antagonist muscles modeled as second-order low-pass filters with specific time constants. When still minimizing the torque change (but controlling the input to muscles), we found that the CoT remained very precisely the same whatever the time constants of the muscles (we tested 0.03–0.15 s). However, when the cost function was modified such that a quadratic cost (i.e., a “neural effort”) was minimized instead of the torque change, the CoT depended on the muscle characteristics (Fig. 8*c*). Importantly, although the precise values of the CoT obviously varied as a function of the muscle time constants, its sigmoidal shape was nevertheless preserved. Its rate of change *g* was robustly characterized by a steep increase until a peak was reached and followed by a slower decay on the time interval where the CoT could be reliably sampled. These tests further show that the neural effort cost, yet another plausible candidate trajectory cost known to predict smooth bell-shaped velocity profiles (e.g., Liu and Todorov, 2007; Guigon et al., 2007), also yields a sigmoidal CoT [especially for σ ≈ 0.04 s where velocity profiles are close to minimum jerk ones, which are known to well replicate human velocity profiles; Richardson and Flash (2002)].

#### Consistency with respect to multijoint dynamics, different initial/final states, and speed instructions

Experimentally, different initial/final arm configurations may lead to the same MT. From the identification standpoint, this may yield several possible values for *g*(*t*) where the CoT theory would assume only one. For the above planar single-joint arm movements, this was, however, not a problem. At the theoretical level, the invariance of *g* with respect to initial/final states follows trivially from the fact that the optimal solutions only depend on the amplitude *a*, and not specifically on the states **x**^{0} and **x*** ^{f}*. This fact is consistent with experimental observations where MT was found to mostly depend on

*a*and neither specifically on the relative starting hand position in the workspace nor on the leftward/rightward direction of the motion for such horizontal single-joint rotations (Gentili et al., 2007). In general, however, reaching involves a nonlinear plant and several degrees of freedom. The situation is thus more complex for a multiarticulated arm, but our methodology allows for sampling

*g*(

*t*) in the exact same way and effortlessly as soon as the associated fixed-time OC problem can be (numerically) resolved.

For a two-joint arm moving in the horizontal plane, the role of the state/control cost *l* is reinforced because the hand path depends on it. Attempting to predict MT would be completely irrelevant if the hand path cannot be accurately replicated when MT is known at least. Here we still considered the minimum torque change model for the trajectory cost because it is known to account well for both the quasi-straight path and bell-shaped velocity profile of the hand during planar point-to-point arm movements (Uno et al., 1989). We used the data of Young et al. (2009) and the reported equations for the amplitude–duration relationship to recover the CoT underlying such multijoint arm reaches. The task is illustrated in Figure 9*a*, and the amplitude–duration mappings at natural and quick speeds are given in Figure 9*b*. The CoT *g* was then sampled pointwise from these linear fits using our methodology (Fig. 9*c*,*d*), and it was extrapolated out of the range of real measurements. Because Young et al. (2009) did not report any leftward/rightward difference regarding MTs, we used the same affine relationship for leftward and rightward movements performed at natural speed. The infinitesimal CoT differed slightly depending on movement direction. Additionally, varying the starting posture of the arm also led to different instantaneous time costs for leftward hand displacements (Fig. 9*c*, left). These CoT variations are due to the arm's inertial anisotropy (Gordon et al., 1994) and the fact that the torque change is a dynamic cost. For the sake of completeness, we also identified the CoT for the hand jerk cost (Flash and Hogan, 1985), and this revealed that the CoT remained exactly the same regardless of movement direction and initial posture (data not shown).

The above variations of CoT were relatively small compared with what was observed during a change of speed instruction (Fig. 9*c*, right). When asking the participants to move quickly, one can observe that both the slope and the intercept of the affine relationship change drastically (Fig. 9*b*). This resulted in very different values for the function *g*. Not only the peak was multiplied by ∼30×, but MT was also penalized slightly earlier (onset time of *g*). This agrees with our theoretical analysis where we showed that globally rescaling the CoT cannot lead to a change in both slope and intercept of the amplitude–duration relationship.

Hitherto, we have identified the CoT from experimental data. We also assessed the extent to which the CoT identified in a given task could be used to predict MTs in another task. In Figure 9*e*, a center-out reaching task was thus considered as it has been the subject of extensive investigations (Gordon et al., 1994; van Beers et al., 2004). Using the CoT identified at natural speed for the central starting position and using the relationship between duration and extent for mediolateral movements, it was possible to predict the directional dependence of MT during a center-out task (Fig. 9*f*). The same could be observed at quick speed using the corresponding CoT. We eventually identified the values of *g* from the data of van Beers et al. (2004) at quick speed for the 16 movement directions and plotted the corresponding values in Figure 9*d* (black dots), where the CoT had been identified from the data of Young et al. (2009). The location of the dots around the CoT uncovered at quick speed confirmed the relative directional invariance of the infinitesimal CoT.

Above all, these findings reveal the robust sigmoidal shape of the CoT with respect to modifications of data fitting, trajectory cost, system dynamics, initial/final states, and task instructions, even though its precise values necessarily depend on all the details of modeling. These present results rule out purely concave or convex time costs for reaching, whereas its concavity (e.g., hyperbolic) was confirmed for saccades.

## Discussion

We addressed the issue of how the brain selects the duration of movement. We relied upon the “cost of time” theory that finds its roots in the brain's tendency to discount the actual value of future reward (Shadmehr, 2010; Shadmehr et al., 2010; Shadmehr and Mussa-Ivaldi, 2012). It asserts that the time elapsed until action completion entails a cost, thereby making slow moves nonoptimal. By means of a thorough mathematical analysis, we developed a complete, robust, and fairly general methodology to automatically identify the CoT from basic experimental data by sampling its infinitesimal values without making prior assumption about its hypothetical nature. An application to reaching revealed that an overall sigmoidal CoT underlies the planning of arm movement. In the following, we discuss the possible origin, rationale, and implication of such a CoT.

The CoT is a generic but appealing concept that has been recently popularized by Shadmehr's group. In particular, it is straightforward to interpret the CoT as a temporal discounting of reward by obvious mathematical equivalence (Shadmehr, 2010), thereby creating a link between the reward system and motor control (Wolpert and Landy, 2012). In most studies investigating this link, however, humans or animals were motivated via incentives, and gratification was often made explicit to them (Battaglia and Schrater, 2007; Dean et al., 2007; Niv et al., 2007; Hudson et al., 2008; Guitart-Masip et al., 2012) (e.g., food, monetary, token, or sensory information). For the neural control of movement, the actual reward assigned by the brain to any given action is rather an intrinsic and hidden quantity. For example, consider reaching for an ordinary object. It is difficult to predict what reward the brain actually assigns to such a task. Furthermore, if the object is a glass of water, the theory predicts that movement vigor should increase when the individual is thirsty, but exactly knowing the extent to which the intrinsic value of reward is modified seems tricky. Interestingly, the present methodology offers the possibility to robustly estimate and differentially compare such intrinsic rewards. Indeed, the reward attributed to the task can be inferred from the asymptotic value of *G*. Then it would be possible to record movements under different conditions and analyze how reward varies, all other things being equal.

However, time could be costly in motor control for reasons other than just a temporal discounting of reward. In particular, slow moves might also be undesirable because they monopolize a significant amount of neural resources due to the inevitable processing of the sensorimotor flow of information by the CNS, and neural processing of information is metabolically expensive (Laughlin, 2001; Lennie, 2003; Howarth et al., 2012). The attentional cost associated with slow-paced movement control could also augment and interfere with the proficiency of achieving other tasks, as suggested by dual-task experimental paradigms (Krampe et al., 2010). Hence, the signaling and processing of a quantity of information that likely grows with time during sensorimotor control might induce such a high cognitive load and large neural metabolic energy demand. Interestingly, a growing body of evidence also suggests that parkinsonian motor symptoms may be linked to neural metabolic deficits (Amano et al., 2014). The logistic shape identified for reaching may thus reflect this dual nature of the CoT. Indeed, its sole interpretation as a temporal discounting of reward might be insufficient to account for the inflection of *G*, and this secondary interpretation as a metabolic cost related to the engagement into a sensorimotor task may be speculated. This is what a recent study tends to confirm for planar arm reaching (Huang and Ahmed, 2012). A surprising constant metabolic cost related to the simple fact of being engaged in a reaching task was discovered. Therefore, the CoT we inferred here may also reflect this part of the metabolic cost that does not depend on the state of the musculoskeletal apparatus but on the underlying neural mechanisms that are involved in on-line movement control and that presumably depend on motion duration. An interesting topic for future research would consist of trying to disentangle these putative reward and metabolic parts of the CoT. This may be possible by tuning the task parameters adequately and by analyzing how the CoT varies accordingly (e.g., Klein-Flügge et al., 2015, where evidence for an inverse sigmoidal effort-related discounting of reward was reported).

It could still be argued that the CoT is just a mathematical epiphenomenon required to compensate for some missing features of the trajectory cost *l*. Besides a temporal discounting of reward, motion duration could indirectly reflect onto other costs, such as accuracy via constant noise accumulation (van Beers et al., 2004; van Beers, 2008) or metabolic energy expenditure (Alexander, 1997; Huang and Ahmed, 2012). For eye movement, it has been suggested that slow saccades are detrimental to movement accuracy as a consequence of constant noise that accumulates over time (van Beers, 2008). This would lead to a trajectory cost exhibiting a “U” shape as a function of MT because endpoint variability would not only increase at large speeds as a consequence of signal-dependent noise (Harris and Wolpert, 1998) but also at small speeds. This would yield optimal MTs without the need to resort to an explicit CoT. Although plausible for saccades and feedforward control, this view is unsustainable for visually guided reaching because sensory feedback is processed on-line to correct for motor errors. For reaching, this is rather the metabolic energy expenditure that may critically increase during slow-paced movement. Gravity could then be a predominant factor for MT formation. If so, MT should drastically increase in weightless conditions, but it is known that cosmonauts increase only slightly MT after adaptation to weightlessness (Papaxanthis et al., 1998). Moreover, criteria counting the effort required to counteract gravitational torques usually fail to predict realistic arm trajectories even in fixed time (Berret et al., 2011), and models minimizing the metabolic cost of muscle contraction had to resort to parameter tuning to be able to replicate empirical trajectories for a wide range of speeds, making it difficult to estimate metabolic energy expenditure across speeds (Alexander, 1997). Interestingly, the metabolic cost per arm movement reported in Huang and Ahmed (2012) exhibited a “U” shape with respect to motion speed, as for the case of walking (Ralston, 1958; Alexander, 1991), but with the difference that the subject's arm was fully supported by a robotic device. This observation further supports that the cost measured at rest or very low speed was mostly due to a metabolic energy expenditure associated with the fact of being actively involved into a reaching task (which would therefore depend on time only) rather than with gravity itself or with the complex energetics of the muscle contraction (e.g., Kushmerick, 2011). Unfortunately, it seems rather difficult from current knowledge to dissociate the part of metabolic energy expenditure that does depend on arm trajectories (and thus contributes to shaping them) from the one that does not depend on them but just on the passage of time.

Whatever the exact nature of this time cost is (temporal discounting of reward, neural/attentional metabolic cost or even something else, e.g., working memory decay), the present methodology allowed us to reliably infer its global shape for times corresponding to empirical reach and saccade durations. For reaching, the time interval approximately ranged from 300 to 1500 ms, whereas for saccades it ranged from ∼30 to 300 ms. Because the two CoTs exhibited different shapes, the question of their link can be posed. On the one hand, it seems conceivable to argue that the time costs underlying saccades and reaches are just independent because the two systems and their functions are different. On the other hand, it also seems possible to assume a single time cost. Indeed, the two CoTs may be juxtaposed as they only slightly overlap on their respective time intervals. Support for this idea was found when assuming zero duration at zero amplitude: in that case, the CoT for reaching had a shape reminiscent of the one for saccades in the vicinity of small times (typically <200 ms). However, such a similarity was only hypothetical because reach data were unavailable on such a range and whether or not the amplitude–duration relationship passed through the origin was undecidable on the present data. Actually, a regression intercept is often assumed in the literature (e.g., Fitts' law) (Young et al., 2009), but its relatively large value for reaching contrasts with the case of saccades. This discrepancy could be due to the role of feedback (which may require MT extensions to improve terminal reach accuracy) or to the incompressible duration of skeletal muscles activation/deactivation (which differ from extraocular muscles), making it impossible to attain zero duration as amplitude tends to zero. We nevertheless showed that this point was not critical regarding our main conclusion about the sigmoidal shape of the time cost over the range of real MTs. Further work, however, seems required to assess the extent to which temporal costs inferred from different motor tasks and systems (e.g., eye, wrist, arm, or whole-body) have compatible shapes on overlapping time intervals and whether or not they can be related.

Although the neural substrate of such a CoT is unknown, one may speculate that it is linked with the dopaminergic system because movement vigor is significantly altered in Parkinson's disease (Mazzoni et al., 2007). Indeed, dopamine plays a central role in the BG, a region related to the control of movement gain (extent and speed) and cost functions (Desmurget and Turner, 2008; Schmidt et al., 2008; Shadmehr and Krakauer, 2008; Turner and Desmurget, 2010; Amano et al., 2014). More specifically, a number of studies showed that transient inactivation or ablation of the globus pallidus internus, the principal output nucleus for the BG, reduced movement velocity and acceleration without altering hand path, reach accuracy, or movement sequencing (Anderson and Horak, 1985; Mink and Thach, 1991; Inase et al., 1996; Desmurget and Turner, 2010). In addition, the striatum could play a critical role in MT because it has been shown to be related to the temporal discounting of reward (Kobayashi and Schultz, 2008). Therefore, valuation of MT could be partly encoded in that area as it is a fundamental center to gauge effort-benefit situations (Croxson et al., 2009) even in the absence of extrinsic reward (Schouppe et al., 2014). It would be the purpose of future studies to investigate the neural correlates of time cost variations arising from task modifications. The present robust and automated methodology may offer such an opportunity by providing the genuine shape of CoT in neural motor control.

## Footnotes

This work was supported by a public grant overseen by the French National Research Agency as part of the Investissement d'Avenir program, through the iCODE Institute project funded by the IDEX Paris-Saclay, ANR-11-IDEX-0003-02. We thank Thomas Deroche, Carole Castanier, and Jean Jeuvrey for help during the acquisition of the experimental data; and François Bonnetblanc, Francesco Nori, and Jérémie Gaveau for useful comments on earlier versions of the manuscript.

The authors declare no competing financial interests.

- Correspondence should be addressed to Dr. Bastien Berret, Université Paris-Sud, Bâtiment 335, Laboratoire CIAMS, F-91405 Orsay, France. bastien.berret{at}u-psud.fr