Abstract
Although the orientation of an arm in space or the static view of an object may be represented by a population of neurons in complex ways, how these variables change with movement often follows simple linear rules, reflecting the underlying geometric constraints in the physical world. A theoretical analysis is presented for how such constraints affect the average firing rates of sensory and motor neurons during natural movements with low degrees of freedom, such as a limb movement and rigid object motion. When applied to nonrigid reaching arm movements, the linear theory accounts for cosine directional tuning with linear speed modulation, predicts a curl-free spatial distribution of preferred directions, and also explains why the instantaneous motion of the hand can be recovered from the neural population activity. For three-dimensional motion of a rigid object, the theory predicts that, to a first approximation, the response of a sensory neuron should have a preferred translational direction and a preferred rotation axis in space, both with cosine tuning functions modulated multiplicatively by speed and angular speed, respectively. Some known tuning properties of motion-sensitive neurons follow as special cases. Acceleration tuning and nonlinear speed modulation are considered in an extension of the linear theory. This general approach provides a principled method to derive mechanism-insensitive neuronal properties by exploiting the inherently low dimensionality of natural movements.
- 3-D object
- cortical representation
- visual cortex
- tuning curve
- motor system
- reaching movement
- speed modulation
- potential function
- gradient field
- zero curl
For natural movements, such as the motion of a rigid object or an active limb movement, many sensory receptors or muscles are involved, but the actual degrees of freedom are low because of geometric constraints in the physical world. For example, as illustrated in Figure 1, the rotation of an object alters many visual cues. How these cues vary in time is not arbitrary but is fully determined by the rigid motion, which has only 6 degrees of freedom. As a consequence, neuronal activity reflecting such natural movements also is likely to be highly constrained and to have only a few degrees of freedom.
This paper presents a theoretical analysis of how neuronal activity correlated with natural movements might be constrained by geometry. The basic theory, although essentially linear, can account for several key features of diverse neurophysiological results and generates strong predictions that are testable with current experimental techniques.
An emerging principle from this analysis is that neuronal activity tuned to movement often obeys simple generic rules as a first approximation, insensitive to the exact sensory or motor variables that are encoded and the exact computational interpretation. Such generic tuning properties are mechanism insensitive because they are better described as reflecting the underlying geometric constraints on movements rather than the actual computational mechanisms. This simplicity arises when sensory or motor variables represent changes in time rather than static values. In the example shown in Figure 1, the viewpoint was fixed and the object was rotated systematically around different axes. The focus is on how neuronal responses depend on the rotation axis in three-dimensional space, given approximately the same view of the object. It is possible to derive a simple cosine tuning rule for the rotation axis, although various visual cues may depend on the static geometrical orientation of the object in complex ways. Three-dimensional object motion is a specific example; the same principles also apply to several other biological systems, including nonrigid arm movement.
DIRECTIONAL TUNING FOR ARM MOVEMENT
Although the visual and the motor examples share similar mechanism-insensitive properties, the reaching arm movement has a simpler mathematical description and more supporting experimental results and will be considered first.
Ubiquity of cosine tuning
A directional tuning curve describes how the mean firing rate of a neuron depends on the reaching direction of the hand. As illustrated in Figure 2, broad cosine-like tuning curves are very typical in many areas of the motor system of monkeys, including the primary motor cortex (Georgopoulos et al., 1986), premotor cortex (Caminiti et al., 1991), parietal cortex (Kalaska et al., 1990), cerebellum (Fortier et al., 1989), basal ganglia (Turner and Anderson, 1997), and somatosensory cortex (Cohen et al., 1994; Prud’homme and Kalaska, 1994). Although the examples shown in Figure 2 are two-dimensional, cosine tuning holds as well for three-dimensional reaching movement (Georgopoulos et al., 1986; Schwartz et al., 1988).
The ubiquity of cosine tuning is a hint that this property is generic and insensitive to the exact computational function of these neurons. For example, coding of muscle shortening rate is one theoretical mechanism that can generate cosine tuning (Mussa-Ivaldi, 1988). As another example, many somatosensory cortical cells related to reaching had cosine directional tuning, probably because of the geometry of mechanical deformation of the skin during arm movement (Cohen et al., 1994; Prud’homme and Kalaska, 1994). Because a cosine tuning function implies a dot product between a fixed preferred direction and the actual reaching direction (Georgopoulos et al., 1986), cosine tuning by itself suggests a linear relation with reaching direction (Sanger, 1994), which could arise as an approximation to the activity in a nonlinear recurrent network (Moody and Zipser, 1998). Therefore, cosine tuning curves should be common in a theoretical model that is approximately linear.
Basic theory
In this section we derive a general tuning rule for motor neurons and then discuss its basic properties. This example illustrates what is meant by mechanism-insensitive properties and the general theoretical argument based on geometric constraints.
Consider stereotyped reaching movement in which the configuration of the whole arm is determined completely by the hand position (x, y, z) in space. In other words, such movements have only 3 degrees of freedom. Assume that the mean firing rate of a neuron relative to baseline is proportional to the time derivative of an unknown smooth function of hand position in space. In other words:

f = f0 + dΦ(x, y, z)/dt, (Equation 1)

where f is the firing rate, f0 is the baseline rate, and Φ is an arbitrary function of the hand position (x, y, z). A possible small time difference between the neural activity and the arm movement may also be included, as appropriate.
The function Φ(x, y, z) could have any form and could include any function of arm configuration, such as muscle length, joint angles, or any combination of those. Mussa-Ivaldi (1988) first used muscle length to demonstrate the appearance of cosine tuning in a two-dimensional situation and pointed out that the argument could be generalized to include other muscle variables. This interesting example illustrates how a cosine tuning property might emerge from some simple assumptions. The assumption in Equation 1 is more general and the formalism is simpler than that of Mussa-Ivaldi (1988) because joint angles are no longer used as intermediate variables in the derivation. This makes the interpretation easier and more flexible, and the curl-free condition more apparent (see below). The precise interpretation of Φ is not the focus of this paper; the only requirement is that it be a function fully determined by the hand position in the three-dimensional space.
We emphasize that although Equation 1 uses hand position as the only free variable, this does not require that the neuron directly encode the hand position or end point in particular, or kinematic variables in general. Stereotypical reaching movements have only 3 degrees of freedom and can be conveniently parameterized by the hand position (x, y, z), although other parameters can also be used without affecting the final conclusion (see below and Appendix). A neuron related to reaching arm movement should be sensitive to changes of arm posture, which can always be expressed equivalently as changes in some functions of the hand position (x, y, z). The simplest estimate of such changes is the first temporal derivative given in Equation 1. In other words, the above assumption only postulates a general dependence of the firing rate of a neuron on changing arm posture as a first approximation, regardless of which parameters are encoded and how they are encoded.
The assumption in Equation 1 implies that the mean firing rate of a neuron should follow the tuning rule:

f = f0 + p · v, (Equation 2)

where v = (x˙, y˙, z˙) is the instantaneous reaching velocity of the hand, and the vector p is the preferred reaching direction, given by the gradient:

p = ∇Φ = (∂Φ/∂x, ∂Φ/∂y, ∂Φ/∂z). (Equation 3)

The derivation of this result follows immediately from the chain rule:

dΦ/dt = (∂Φ/∂x)x˙ + (∂Φ/∂y)y˙ + (∂Φ/∂z)z˙ = p · v. (Equation 4)

For hand movements starting from the same position (x, y, z) in space, the tuning rule in Equation 2 implies cosine directional tuning and linear speed modulation (see Eq. 12). The preferred direction vector p = p(x, y, z) of the neuron may depend on the starting hand position. It can be regarded as a constant vector when the hand is close to its starting position.
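As a concrete illustration of Equations 1-4, the following minimal numerical sketch (not taken from the original work; the potential Φ below is an arbitrary made-up function) differentiates Φ along reaches in different directions from a fixed starting position and recovers p · v, that is, cosine directional tuning scaled linearly by speed.

```python
import numpy as np

# Illustrative sketch only: for an arbitrary smooth potential Phi of hand
# position, the rate change dPhi/dt along a reach equals p . v with
# p = grad Phi, giving cosine directional tuning modulated linearly by speed.

def phi(r):                       # hypothetical smooth function of hand position
    x, y, z = r
    return np.sin(x) + 0.5 * y * z + 0.2 * x * y

def grad_phi(r, h=1e-5):          # numerical gradient = preferred direction p
    return np.array([(phi(r + h * e) - phi(r - h * e)) / (2 * h)
                     for e in np.eye(3)])

r0 = np.array([0.3, -0.1, 0.2])   # starting hand position
p = grad_phi(r0)

speed = 0.8
for deg in range(0, 360, 45):     # reach in different directions in the x-y plane
    a = np.radians(deg)
    v = speed * np.array([np.cos(a), np.sin(a), 0.0])
    dphi_dt = (phi(r0 + 1e-6 * v) - phi(r0)) / 1e-6   # finite-difference dPhi/dt
    print(deg, round(dphi_dt, 4), round(p @ v, 4))    # the two columns agree
```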
For hand movements starting from different positions, the preferred direction vector may vary with the starting hand position (x, y, z) and thus can be visualized as a vector field (Caminiti et al., 1990; Moody and Zipser, 1998). It follows from the gradient formula in Equation 3 that this vector field of preferred direction must have zero curl:

∇ × p = 0, (Equation 5)

because of the equality of mixed second partial derivatives of Φ. This means that the components of the preferred direction cannot vary arbitrarily with the starting hand position. An equivalent integral formulation of the curl-free condition is that the path integral of p vanishes along any closed curve in three-dimensional space:

∮ p · dl = 0, (Equation 6)

with dl = (dx, dy, dz), assuming that there are no singularities in the vector field. This constrains how the preferred direction of a neuron should vary with the starting hand position. Any distribution with non-zero curl can be ruled out (Fig. 3).
Human eyes are not reliable at judging whether a vector field is curl-free (see Fig. 10), so numerical computation is needed (Mussa-Ivaldi et al., 1985; Giszter et al., 1993). See Appendix for more discussion. A vector field is curl-free if and only if it can be generated as the gradient of a potential function. A more intuitive interpretation of the curl-free condition is that when a vector field is regarded as the velocity field of a fluid, there is no net circulation along any closed path in space.
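The following sketch illustrates one such numerical test (illustrative only; the two example fields are made up and this is not the procedure of the cited studies): sample a preferred-direction field on a grid of starting positions and estimate the curl by finite differences.

```python
import numpy as np

# Sketch of a numerical curl test on sampled two-dimensional vector fields:
# a gradient field should give curl ~ 0; a circular field should not.

xs = ys = np.linspace(-1.0, 1.0, 21)
X, Y = np.meshgrid(xs, ys, indexing="ij")

# Hypothetical fields: gradient of Phi = x*y versus a rotational (circular) field
px_grad, py_grad = Y, X                # grad(x*y) = (y, x)  -> curl should vanish
px_rot,  py_rot  = -Y, X               # circular arrangement -> nonzero curl

def curl_z(px, py, dx):
    dpy_dx = np.gradient(py, dx, axis=0)
    dpx_dy = np.gradient(px, dx, axis=1)
    return dpy_dx - dpx_dy

dx = xs[1] - xs[0]
print(np.abs(curl_z(px_grad, py_grad, dx)).max())   # ~ 0
print(np.abs(curl_z(px_rot,  py_rot,  dx)).max())   # ~ 2
```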
Under the curl-free condition, the net spike count (the firing rate relative to baseline, integrated over time) can be used to recover the value of the unknown potential function:

∫ (f − f0) dt = Φ(x, y, z) − Φ(x0, y0, z0), (Equation 7)

where the integral is taken from time 0 to time T and depends only on the initial hand position (x0, y0, z0) at time 0 and the final position (x, y, z) at time T, not on the exact trajectory of hand movement. For each hand position, the firing rate is the largest when the hand moves along the local gradient of the potential function, which defines p.
Baseline firing rate
The theory in the preceding section does not constrain the baseline firing rate f0, which needs to be considered separately. By definition, the baseline firing rate is independent of the reaching direction, but it may be modulated by several other factors. For example, in the motor cortex, Kettner et al. (1988) have reported that the linear formula:

f0 = a0 + a1x + a2y + a3z (Equation 8)

approximately described the baseline firing rate while the hand was held fixed at position (x, y, z) in the three-dimensional work space, where a0, a1, a2, a3 are constant coefficients. For reaching at speed v, a more general linear formula for the baseline firing rate is:

f0 = a0 + a1x + a2y + a3z + av, (Equation 9)

where the coefficients a0, a1, a2, a3, a are independent of the hand position (x, y, z) and the speed v, but may vary with task conditions. For instance, the baseline firing rate when the hand is held still (Fig. 2, horizontal lines) differs from the baseline rate defined as the average of the cosine curve during reaching. Moran and Schwartz (1999) showed that a linear speed term for baseline rate should be included in the fitting formula, although their analysis used the square root of firing rate instead of the raw firing rate. Indirect evidence for a linear speed term in baseline rate is provided by the linear effect of reaching distance (see below).
Note that in Equation 9, the baseline firing rate contains information about both the static hand position (x, y, z) and its speed v. As shown by Kettner et al. (1988), the spatial gradient of the spontaneous firing rate for static hand position tends to be consistent with the preferred direction of the same neuron. In the current theory, this means that the preferred direction p = ∇Φ tends to point in the same direction as the vector (a1, a2, a3) in Equation 9. Therefore, if the potential function Φ = Φ(x, y, z) can be approximated as a linear function in x, y, z, we can replace Equation 9 by:

f0 = a0 + kΦ(x, y, z) + av, (Equation 10)

where k is a constant coefficient. In this case, the overall firing rate of a neuron that obeys the basic tuning rule in Equation 2 would convey two pieces of information: the baseline firing rate f0 would represent the static value of the potential function Φ, and the directionally tuned part p · v would represent the spatial gradient of the same potential function.
Linear theory without gradient
If we simply postulate that the firing rate of a neuron is linearly related to the components of reaching velocity v = (vx, vy, vz), we would have the same tuning rule:

f = f0 + px vx + py vy + pz vz = f0 + p · v, (Equation 11)

where the components of the preferred direction, (px, py, pz) ≡ p, are three arbitrary functions of the hand position (x, y, z). For a single starting hand position, this tuning rule is locally indistinguishable from the prediction of the gradient theory. The difference is that now the preferred direction field is not required to be the gradient of any potential function, so its global distribution in hand position space is not constrained at all. In other words, this vector field need not be curl-free. The nongradient theory is more general, allowing a circular distribution of the preferred directions as in Figure 3. The necessary and sufficient condition for the gradient theory to be true is that the preferred direction field is curl-free. The existing data cannot distinguish the two theories (see discussion below and Appendix).
Comparison with experimental results
Data from a wide range of motor-related brain areas largely confirm the tuning rule in Equation 2 as a reasonable approximation, together with its various ramifications as follows. Theoretical predictions such as the curl-free distribution remain to be tested.
Cosine directional tuning and multiplicatively linear speed modulation
The tuning rule in Equation 2 captures two main effects: cosine directional tuning and multiplicatively linear speed modulation, as clearly seen in its equivalent form:

f = f0 + pv cos α, (Equation 12)

where v = ‖v‖ is the reaching speed, the proportionality factor p = ‖p‖ is the length of the preferred direction vector p, and α is the angle between the instantaneous reaching direction and the preferred direction. Because the hand trajectory is approximately straight in normal reaching, the instantaneous velocity v is a vector that points in the same direction as the reaching direction. If a tuning function is cosine in three-dimensional space, it must also be cosine in any two-dimensional subspace, as in the examples in Figure 2.
A cosine function is a good approximation to the directional tuning data, although a circular normal function (Eq. EA17), with one more free parameter, tends to fit the data slightly better (Fig. 2). The residual can be roughly accounted for by an additional Fourier term, cos 2α, with an amplitude less than ∼10% of that of the original term, cos α (see further discussion in Appendix).
The speed modulation effect predicted by Equation 12 is multiplicative; that is, the firing rate should be higher for faster reaching speed without affecting the shape of the cosine tuning function. This is approximately true as shown by Moran and Schwartz (1999), who, however, used the square root of firing rate in analysis so that the linearity of speed modulation on raw firing rate was not directly quantified. Indirect evidence for linear speed modulation includes trajectory reconstruction and the curvature power law (see below).
Neuronal population vector
Suppose the firing rate of each neuron i in a population (i = 1, 2, … , N) follows the same tuning rule as considered above:

fi = f0i + pi · v. (Equation 13)

The population vector u is defined as the vector sum of the preferred directions pi weighted by firing rates relative to baselines (Georgopoulos et al., 1986):

u = Σi (fi − f0i) pi = Σi (pi · v) pi, (Equation 14)

where in the second step, Equation 13 is used. For the population vector u to be proportional to the true velocity v, namely:

u ∝ v, (Equation 15)

the necessary and sufficient condition is that the preferred directions satisfy:

Σi pi piT = λI, (Equation 16)

where λ is an arbitrary constant, I is the 3 × 3 identity matrix, each pi is a column vector, and piT is a row vector (Mussa-Ivaldi, 1988; Gaál, 1993; Salinas and Abbott, 1994; Sanger, 1994). In particular, when the pi are distributed uniformly, as is roughly true for cells in motor cortex (Georgopoulos et al., 1988), the condition in Equation 16 is satisfied so that Equation 15 follows as a consequence. Then the population vector approximates the reaching direction and reaching velocity (Moran and Schwartz, 1999).
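A short simulation can illustrate Equations 13-16 (a sketch with hypothetical units, not an analysis of real data): when unit preferred directions are drawn roughly uniformly on the sphere, the condition in Equation 16 is approximately satisfied and the population vector comes out proportional to the true velocity.

```python
import numpy as np

# Illustrative sketch: cosine-tuned units with roughly uniform preferred
# directions; the rate-weighted sum of preferred directions recovers the
# hand velocity up to a constant factor.

rng = np.random.default_rng(0)
N = 500
P = rng.normal(size=(N, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)     # unit preferred directions p_i

v = np.array([0.2, -0.5, 0.3])                    # true hand velocity
rates_minus_baseline = P @ v                      # Equation 13: f_i - f0_i = p_i . v
u = rates_minus_baseline @ P                      # Equation 14: population vector

print(u / v)        # roughly constant ~ N/3, i.e., u is proportional to v (Eq. 15)
print(P.T @ P / N)  # ~ (1/3) * identity, the condition in Equation 16
```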
Trajectory reconstruction
One implication of Equation 15 is that integrating the population vector over time can reconstruct the hand trajectory, up to a scaling constant:

∫ u dt ∝ r(t) − r(0), (Equation 17)

where the integral is taken from time 0 to time t, and r(t) is the hand position at time t. This is consistent with the finding that adding up the population vector head-to-tail approximately reproduced the shape of the hand trajectory (Schwartz, 1993, 1994), because head-to-tail addition is a discrete approximation to the continuous vector integration.
Curvature power law
While drawing, the hand moves more slowly when the trajectory is more highly curved, obeying a power law:

ω = Bκ^(2/3), (Equation 18)

where ω is the instantaneous angular velocity with respect to an instantaneous center determined by the local curvature κ of the trajectory, and B is a constant (Lacquaniti et al., 1983). Schwartz (1994) showed that the changing direction of the population vector of cells in motor cortex of monkeys followed the same power law during drawing. This is consistent with Equation 15, which requires that the population vector u be proportional to the instantaneous hand velocity v, up to a possible time difference. This form of the power law involves only the direction of the population vector u. To test its length u = ‖u‖ or the linearity of firing rate modulation by reaching speed, one may use the equivalent form of the power law:

v = Br^(1/3), (Equation 19)

where v = rω is the hand speed and r = 1/κ is the local radius of curvature of the trajectory. The length of the population vector u is proportional to the hand speed v if and only if the population vector follows the same power law in Equation 19.
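The equivalence of the two forms of the power law follows from v = rω and κ = 1/r; the small check below (illustrative only, with an arbitrary constant B) makes the algebra explicit.

```python
import numpy as np

# Quick check that omega = B * kappa**(2/3) with kappa = 1/r and v = r*omega
# is the same statement as v = B * r**(1/3).

B = 1.7
r = np.linspace(0.1, 5.0, 50)          # local radius of curvature
kappa = 1.0 / r
omega = B * kappa ** (2.0 / 3.0)       # Equation 18
v_from_omega = r * omega               # hand speed v = r * omega
v_power_law = B * r ** (1.0 / 3.0)     # Equation 19
print(np.allclose(v_from_omega, v_power_law))   # True
```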
Reaching distance
Fu et al. (1993) reported a nearly linear correlation between firing rates of cells in motor cortex and reaching distance. Although this result was somewhat confounded by faster reaching for longer distances, it raises the question of the general effect of reaching distance. A linear distance effect would be consistent with the basic model in Equation 2, which implies that:

∫ (f − f0) dt = p · d, (Equation 20)

where the integral is taken over the duration of the movement, and the vector d = r(T) − r(0) is the final displacement from the starting position r(0), assuming the preferred direction p is approximately constant along the movement trajectory. The dot product p · d implies a linear relation between the reaching distance and the total spike count above baseline, together with a cosine directional tuning, regardless of the exact time course of hand velocity.
Note that in Equation 20 the baseline rate f0 has been subtracted. Because the baseline rate itself may contain a linear speed component as in Equation 9 (Moran and Schwartz, 1999), its contribution to the total spike count should be:

∫ f0 dt = a0T + a ∫ v dt = a0T + a‖d‖, (Equation 21)

where, for simplicity, a1 = a2 = a3 = 0 has been assumed to ignore the effect of static hand position, and the last step uses the fact that for an approximately straight reach the path length equals the reaching distance ‖d‖. Because the last term is proportional to the reaching distance ‖d‖ but independent of the reaching direction, it might account for the observation that the modulation of overall firing rates by reaching distance was often linear but insensitive to the reaching direction (Fu et al., 1993; Turner and Anderson, 1997).
Curl-free distribution of preferred direction
Caminiti et al. (1990, 1991) reported that the preferred direction of a motor cortical neuron often varied with the starting point of hand movement. This is allowed by the gradient theory, provided that this vector field is curl-free, according to Equation 5 or 6. A constant preferred direction field is always allowed because it has zero curl. The curl-free condition constrains how the preferred direction of a neuron may vary in different parts of space. For example, it rules out the possibility of any circular arrangement of the preferred directions, such as that in the two-joint planar arm example shown in Figure 3. The existing data do not include enough points to compute the curl (see Appendix). Further experiments would be needed to test whether the prediction of the gradient theory is correct.
Elbow position
Scott and Kalaska (1997) found that the preferred directions of some motor cortical cells were altered when the monkey had to reach unnaturally with the elbow raised to shoulder level. In the current theoretical framework, adding elbow position as a free parameter is equivalent to adding one rotation variable ϕ, for example, the angle between the horizontal plane and the plane determined by the hand, elbow, and shoulder. The same theoretical argument yields the tuning rule:

f = f0 + p · v + K dϕ/dt, (Equation 22)

where K is a coefficient that may depend on both hand position and elbow position. This formula implies two new effects. The first is that now the preferred direction vector, both its direction and length, may depend on the elbow position ϕ as well as the hand position (x, y, z):

p = p(x, y, z, ϕ), (Equation 23)

as reported by Scott and Kalaska (1997). The second effect, a new prediction, is that the firing rate may contain a component proportional to the angular speed dϕ/dt of elbow rotation.
How does this case relate to our earlier results with hand position as the only free parameter? In the preceding sections, reaching was assumed to be “stereotypical” in the sense that the elbow position can be determined completely by the hand position, ignoring forearm rotation. This assumption may not be true if the final posture sometimes depends also on the initial hand position (Soechting et al., 1995). However, when comparing reaching movements starting from the same initial hand position, it is reasonable to assume that for stereotypical reaching, the elbow angle ϕ can be completely determined by the hand position (x, y, z), or ϕ = ϕ(x, y, z). Then the time derivative of ϕ, after expanding by the chain rule, can be absorbed into the term p · v, yielding the original basic tuning rule in Equation 2. In other words, the assumption of stereotypical movement reduces the total degrees of freedom to 3, eliminating the elbow position as an independent variable. Although the elbow angle can still be used as a free parameter, it is no longer independent of the hand position. Only three parameters are independent in this case, and their exact choice does not affect the general form of the tuning rule (see Appendix for more discussion on coordinate-system independence).
Summary and discussion of more complex cases
As shown above, the basic tuning theory can naturally account for several important experimental results without making any specific assumptions about the exact variables encoded or details of the encoding. These results are generic properties independent of the exact functional interpretations. This generality makes sense because during stereotypical movement, redundant variables are inevitably constrained by the geometry and become highly correlated, so that they are likely to show similar tuning properties of the same general type. The theory presented here has formalized this intuition.
The relationship between cosine tuning properties and geometric constraints is also apparent in the studies of muscle activities and actions during reaching and isometric tasks. Basic properties resembling those for motor cortical cells have been reported, including approximately cosine directional tuning curves (but often with a small secondary peak opposite the preferred direction), speed sensitivity, and posture dependence (Flanders and Soechting, 1990; Flanders and Herrmann, 1992; Buneo et al., 1997).
The basic theory needs to be generalized in situations where the hand position is not the only free parameter. For example, force is one variable that is often correlated with the activity of motor cortex; recent examples related to directional tuning include tasks with static load (Kalaska et al., 1989) and varying isometric forces (Georgopoulos et al., 1992; Sergio and Kalaska, 1997).
As another example, preparatory activity in motor cortex before onset of movement can reflect the upcoming reaching direction, as is especially evident during instructed delay (Georgopoulos et al., 1989a), and can change rapidly in tasks requiring mental rotation (Georgopoulos et al., 1989b) or target switching (Pellizzer et al., 1995).
Moreover, when sensory and motor components were decoupled, some neurons even from primary motor cortex were more closely related to the visual movement of a cursor on the computer screen than to the joystick position or hand movements, in both one-dimensional (Alexander and Crutcher, 1990) and two-dimensional tasks (Shen and Alexander, 1997a). By contrast, in virtual reality experiments with visual distortion, motor cortical activity mainly followed the actual limb trajectory rather than the animal’s visual perception (Moran et al., 1995).
In addition, some differences exist among the neural activity from different brain areas, although they all show approximate cosine directional tuning (compare Fig. 2). For instance, compared with neurons in the motor cortex in a reaching task, the preferred directions in the cerebellum are more variable in repeated trials (Fortier et al., 1989), neurons in the parietal cortex are less sensitive to static load (Kalaska et al., 1990), and neurons in the premotor cortex are activated earlier, more transiently (Caminiti et al., 1991; Crammond and Kalaska, 1996), and affected more frequently by visual cues (Wise et al., 1992; Shen and Alexander, 1997b). In the motor cortex and elsewhere, there also exist neurons with complex properties that are either not task-related or hard to describe but still could have useful functions in a distributed network (Fetz, 1992;Zipser, 1992; Moody et al., 1998).
In most of these cases, there are additional free variables besides hand position. The linear theory may still yield useful results in these more complex cases after including these additional variables. For example, the planned movement direction is an independent variable, which could be used to describe some preparatory activity before overt hand movement. These new variables should be included when deriving the tuning rule, as demonstrated in the preceding section by adding the elbow position as a free variable in abducted reaching.
REPRESENTING RIGID OBJECT MOTION
The same geometric argument for arm movement can be applied to moving rigid objects, which have additional rotational degrees of freedom around an axis in space (Fig. 1). In the following, we derive a general tuning rule for rigid motion, discuss its basic properties, and then contrast the results with concrete models of visual receptive fields.
Description of rigid object motion
Arbitrary instantaneous motion of a rigid object can always be described by a rotation plus a translation (Fig. 4), but given the same physical motion, this description is ambiguous up to an arbitrary parallel shift of the rotation axis. For example, the translational velocity can always be aligned instantaneously with the angular velocity to obtain a screw motion by passing the rotation axis through the point of zero velocity in a perpendicular plane (Fig. 4).
This ambiguity disappears when the rotation axis is always required to pass through the same reference center in the object, say, the center of mass. We assume that the reference center has been chosen so that a rigid motion can be described uniquely by a translational velocity and an angular velocity. We return to this topic later.
The static position and orientation of a rigid object can be specified by six independent parameters:

(x, y, z, θ1, θ2, θ3), (Equation 24)

where x, y, z describe the position of the reference center of the object with respect to a coordinate system fixed to the world, and θ1, θ2, θ3 are three angular variables that represent the object’s orientation. The translational velocity of the object is:

v = (x˙, y˙, z˙)T. (Equation 25)

The angular velocity ω = (ωx, ωy, ωz)T in world coordinates is always linearly related to the time derivatives of the orientation variables θ˙ = (θ˙1, θ˙2, θ˙3)T:

ω = Mθ˙, (Equation 26)

where M is an invertible 3 × 3 matrix that depends only on the orientation (θ1, θ2, θ3). For example, when Euler angles (θ, φ, ψ) are used to describe the orientation (Fig. 5), the relation takes the explicit form given in Equations 27 and 28, which is invertible as long as det M = sin θ ≠ 0 (Goldstein, 1980).
Only the abstract linear relation in Equation 26 is needed in the next section. The actual choice of (θ1, θ2, θ3) is unimportant here. Because the time derivatives of different sets of variables are linearly related by a Jacobian matrix, Equation 26 always holds regardless of the exact choice of the parameterization of orientation (see also Appendix on independence of the coordinate system).
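The following sketch illustrates this point numerically (an illustration under stated assumptions: a roll-pitch-yaw parameterization is used here simply because it is easy to write down, not because it matches the Euler angles of Figure 5). For any smooth parameterization R(θ1, θ2, θ3) of orientation, the matrix M of Equation 26 can be obtained column by column from the standard relation [ω]× = (dR/dt)Rᵀ.

```python
import numpy as np

# Sketch: for a generic smooth orientation parameterization R(t1, t2, t3),
# the world-frame angular velocity is linear in the parameter rates,
# omega = M(t) * tdot, with the skew matrix [omega]x = Rdot * R^T.
# The columns of M are obtained here by finite differences.

def Rz(a): c, s = np.cos(a), np.sin(a); return np.array([[c,-s,0],[s,c,0],[0,0,1]])
def Ry(a): c, s = np.cos(a), np.sin(a); return np.array([[c,0,s],[0,1,0],[-s,0,c]])
def Rx(a): c, s = np.cos(a), np.sin(a); return np.array([[1,0,0],[0,c,-s],[0,s,c]])

def R(t):                                  # hypothetical parameterization (roll-pitch-yaw)
    return Rz(t[0]) @ Ry(t[1]) @ Rx(t[2])

def vee(S):                                # inverse of the skew-symmetric operator
    return np.array([S[2, 1], S[0, 2], S[1, 0]])

def M(t, h=1e-6):
    cols = []
    for k in range(3):
        d = np.zeros(3); d[k] = h
        dR = (R(t + d) - R(t - d)) / (2 * h)
        cols.append(vee(dR @ R(t).T))      # column k of M, as in Equation 26
    return np.column_stack(cols)

t = np.array([0.4, -0.2, 1.1])
tdot = np.array([0.3, 0.1, -0.5])
print(M(t) @ tdot)                         # angular velocity omega in world coordinates
print(np.linalg.det(M(t)))                 # nonzero here, so M is invertible
```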
Tuning rule for rigid motion
Consider neuronal activity associated with motion of a rigid three-dimensional object. Assume that the mean firing rate of a neuron relative to baseline, with a possible time delay, is proportional to the time derivative of a smooth function of the position and orientation of the object in three-dimensional space. In other words:

f = f0 + dΦ(x, y, z, θ1, θ2, θ3)/dt, (Equation 29)

where f is the firing rate, f0 is the baseline rate, and Φ is an arbitrary function of object position (x, y, z) and orientation (θ1, θ2, θ3), as described in the preceding section. This equation is analogous to Equation 1.
The exact form of function Φ need not be specified here. It may depend on both the receptive field properties of the cell and the visual appearance of the object and its surroundings. This formulation is quite general. For example, all the visual cues of the object illustrated in Figure 1 are functions of the position and orientation of the object that completely determine how light is reflected from various surfaces, whether diffuse (uniform scattering in all directions) or specular (energy concentrated around the mirror reflection direction), giving rise to various visual effects such as shading, shadows, specular reflections, and highlights (Watt and Watt, 1992). Given that all sensory cues are determined completely by the position and orientation of the object, we expect a motion-sensitive neuron to respond to changes of these variables. The simplest way to estimate these changes is to compute the first temporal derivative.
The assumption in Equation 29 allows us to derive a general tuning rule for neurons sensitive to three-dimensional object motion. Given a three-dimensional object moving at instantaneous translational velocity v and angular velocity ω, the mean firing rate of a generic neuron should depend on these variables in a highly stereotyped way:

f = f0 + p · v + q · ω, (Equation 30)

where f0 is the background firing rate, p is the preferred translational direction, given by:

p = (∂Φ/∂x, ∂Φ/∂y, ∂Φ/∂z), (Equation 31)

and the vector q is the preferred rotation axis, given by:

q = q*M^−1, (Equation 32)

with the matrix M as in Equation 26, and:

q* = (∂Φ/∂θ1, ∂Φ/∂θ2, ∂Φ/∂θ3) (Equation 33)

is an intermediate vector variable, the transformed preferred rotation axis in the orientation angle space. Both the preferred translational direction p and the preferred rotation axis q are vectors in the physical space. They may depend on the object and its position and orientation but not on the translational velocity v and angular velocity ω. The derivation of Equation 30 follows from the chain rule:

dΦ/dt = (∂Φ/∂x)x˙ + (∂Φ/∂y)y˙ + (∂Φ/∂z)z˙ + (∂Φ/∂θ1)θ˙1 + (∂Φ/∂θ2)θ˙2 + (∂Φ/∂θ3)θ˙3 = p · v + q* · θ˙ = p · v + q*M^−1ω = p · v + q · ω, (Equation 34)

where Equations 25 and 26 and the definitions in Equations 31-33 have been used. The derivation of the tuning rule does not depend on which coordinate system is used (Appendix).
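A minimal numerical check of this derivation (with a made-up potential Φ; not from the original paper) confirms that the rate of change of Φ equals p · v + q* · θ˙, which is the same quantity as p · v + q · ω after the change of variables ω = Mθ˙.

```python
import numpy as np

# Illustrative check of Equations 30-34 with a hypothetical potential Phi of
# object position c = (x, y, z) and orientation parameters th = (th1, th2, th3).

def phi(c, th):                     # made-up smooth function of the object's view
    return np.sin(c[0] + th[2]) + c[1] * np.cos(th[0]) + 0.3 * c[2] * th[1]

def grad(f, x, h=1e-6):             # numerical gradient of a scalar function
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])

c  = np.array([0.2, -0.4, 1.0]);  th    = np.array([0.5, 0.1, -0.3])
v  = np.array([0.7, 0.2, -0.1]);  thdot = np.array([-0.2, 0.4, 0.6])

p      = grad(lambda x: phi(x, th), c)     # Equation 31
q_star = grad(lambda x: phi(c, x), th)     # Equation 33

dt = 1e-6
dphi_dt = (phi(c + dt * v, th + dt * thdot) - phi(c, th)) / dt
print(dphi_dt, p @ v + q_star @ thdot)     # the two values agree
```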
Before explaining the meaning of the tuning rule in the next section, first consider the baseline firing rate, which is not constrained by the present theory and thus requires separate consideration. The baseline firing rate may itself be modulated by several factors, and the simplest linear model is:

f0 = a0 + a1x + a2y + a3z + b1θ1 + b2θ2 + b3θ3 + av + bω, (Equation 35)

where ai, bi, a, b are constants, and the position (x, y, z) and the orientation (θ1, θ2, θ3) of the object are included as possibly relevant factors related to the static view, together with the translational speed v and the angular speed ω for object motion, which may also be relevant. This linear equation generalizes Equation 9 for motor neurons. Similarly, Equation 10 can also be generalized by including angular position and speed. This assumes that the baseline firing rate in general may contain information about both the static configuration of an object and its instantaneous motion.
Cosine tuning and multiplicative speed modulation
The basic tuning rule in Equation 30 can be rewritten in its equivalent form:

f = f0 + pv cos α + qω cos β, (Equation 36)

where v = ‖v‖ is the speed of translation, ω = ‖ω‖ is the angular speed of rotation, p = ‖p‖ is the length of the preferred direction vector, q = ‖q‖ is the length of the preferred rotation vector, α is the angle between the vectors p and v, and β is the angle between the vectors q and ω.
In other words, given the particular view of a particular object, the response above baseline should be the sum of two components, one translational and one rotational. The translational component is proportional to the cosine of the angle between a fixed preferred translational direction and the actual translational direction. In addition, it is also modulated linearly by the speed of translation, which does not alter the shape of the tuning curve. Similarly, the rotational component is proportional to the cosine of the angle between a fixed preferred rotation axis and the actual rotation axis. In addition, the rotational component is also modulated linearly by the angular speed of rotation.
Distribution of preferred direction and preferred axis
Thus far, the view of the given object is assumed to be fixed. That is, the cosine tunings for both translation and rotation are defined with respect to a particular view of the object. When the view of the object changes, the preferred translational direction p and preferred rotation axis q of a motion-sensitive neuron may also change.
The theory constrains this change because the preferred translational direction p and the transformed preferred rotation axis q* are derived as gradient fields in Equations 31 and 33. Here the intermediate vector q* is related to the preferred rotation axis q in physical space by:

q* = qM, (Equation 37)

according to Equation 32. In three-dimensional space, where curl is defined, the gradient structure implies that the field restricted to any three of the six variables (x, y, z, θ1, θ2, θ3) must be curl-free. For example, when the position (x, y, z) of the object is fixed, the distribution of the transformed preferred rotation axis in the orientation space (θ1, θ2, θ3) must be curl-free:

∇θ × q* = 0, (Equation 38)

where ∇θ = (∂/∂θ1, ∂/∂θ2, ∂/∂θ3). Any hypothetical neurons with non-zero curl can be ruled out by this condition (see below). For a gradient field, the zero curl is simply attributable to the equality of mixed second partial derivatives of the potential function, which holds also in higher dimensions. The equivalent path integral formulation is valid in all dimensions:

∮ (p · dl + q* · dθ) = 0, (Equation 39)

along any closed curve in the six-dimensional space, where dl = (dx, dy, dz) and dθ = (dθ1, dθ2, dθ3). Another equivalent formulation is that the potential function Φ can be constructed by the path integral:

Φ(ξ) = Φ(ξ0) + ∫ (p · dl + q* · dθ), (Equation 40)

where the integral is taken along a path from ξ0 to ξ and depends only on the end points, not on the exact path. Here ξ = (x, y, z, θ1, θ2, θ3) is an arbitrary point in the parameter space, and ξ0 is the value at a given initial point. Therefore, in the gradient theory, how the preferred translational direction and the preferred rotation axis of a neuron change with the view of a given object cannot be arbitrary but is highly constrained. This can provide testable predictions (see below).
Linear nongradient theory
A more general theory can be obtained by directly assuming a linear relationship between the firing rate and the components of the translational velocity v = (x˙, y˙, z˙) and the time derivatives of the angular variables θ˙ = (θ˙1, θ˙2, θ˙3). This yields the same tuning rule:

f = f0 + p · v + q* · θ˙ = f0 + p · v + q · ω, (Equation 41)

where p = (px, py, pz) and q* = (q*1, q*2, q*3) are arbitrary vector fields, not necessarily gradient fields, and Equations 26 and 32 are used in the last step. This tuning rule gives the same response properties predicted by the gradient theory for a single view of the object. The difference shows up when the view changes. The nongradient theory imposes no constraint on how the preferred translational direction and preferred rotation axis should vary with the view of the object. The gradient theory is more restrictive, and therefore makes stronger predictions.
Change of reference center
Because the description of the same physical motion of a rigid object is ambiguous up to a parallel shift of the rotation axis (Fig. 4), we have assumed in the above that the rotation axis always passes through the same reference point c = (x, y, z) in the object to ensure uniqueness of description. When a different reference center c′ is chosen, the form of the basic tuning rule in Equation 30 remains valid, but the preferred rotation axis is affected in a predictable way:

f = f0 + p′ · v′ + q′ · ω′, (Equation 42)

where v′ and ω′ are the translational velocity and angular velocity for the new reference center c′, and:

p′ = p, (Equation 43)

q′ = q + p × (c′ − c) (Equation 44)

are the new preferred translational direction and rotation axis. One can readily verify that Equation 42 is valid under Equations 43 and 44, using the relations:

v′ = v + ω × (c′ − c), (Equation 45)

ω′ = ω. (Equation 46)

Therefore, changing the reference center of an object has no effect on the preferred translational direction of a neuron (Eq. 43), whereas the preferred rotation axis is altered systematically in a completely predictable manner (Eq. 44). These relations arise purely from the ambiguity of the description of rigid motion, and thus apply to both the gradient and the nongradient theories.
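The bookkeeping behind Equations 42-46 can be verified directly. In the sketch below (illustrative, with random vectors; the particular form shown for Equation 44 is the one that follows from Equation 45, as reconstructed here), the predicted firing rate is unchanged by the shift of reference center.

```python
import numpy as np

# Check that with v' = v + omega x (c' - c), omega' = omega, p' = p, and
# q' = q + p x (c' - c), the quantity p.v + q.omega is left unchanged.

rng = np.random.default_rng(1)
p, q = rng.normal(size=3), rng.normal(size=3)      # preferred direction and axis
c, c_new = rng.normal(size=3), rng.normal(size=3)  # old and new reference centers

for _ in range(3):                                 # arbitrary rigid motions
    v, omega = rng.normal(size=3), rng.normal(size=3)
    v_new = v + np.cross(omega, c_new - c)         # Equation 45
    p_new = p                                      # Equation 43
    q_new = q + np.cross(p, c_new - c)             # Equation 44 (as reconstructed)
    print(np.isclose(p @ v + q @ omega, p_new @ v_new + q_new @ omega))  # True
```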
Summary
Simple assumptions have led to a general tuning rule for how the mean firing rate of a neuron should depend on the instantaneous motion of an arbitrary rigid object. For each given view of the object, the firing rate is predicted to be the sum of two terms, one for the translational motion component and one for the rotational motion component, both with cosine directional tuning and linear speed or angular speed modulation. In general, the preferred translational direction and the preferred rotation axis may depend on the identity of the object as well as its view. This tuning rule is a linear approximation to the geometry of rigid motion and therefore should obtain regardless of the exact computational mechanisms involved. In other words, this rule is expected to be a robust property for motion-sensitive neurons responding to realistic moving objects. When the view of the object changes, both the preferred translational direction and the preferred rotation axis of a neuron may change as well. The gradient theory provides additional constraints on such changes, whereas the nongradient theory imposes no further constraints. As a consequence, these two theories can be distinguished by further experiments. Finally, although the description of rigid motion is ambiguous up to a parallel shift of the rotation axis, the effects on the tuning rule are completely predictable and therefore convey no additional information about the response properties of a neuron.
Examples of motion-sensitive receptive field models
Many neurons in visual cortex, particularly in the dorsal stream leading to parietal cortex, respond selectively to visual motion. Here we consider three-dimensional rigid motion and examine several simple computational mechanisms that yield explicit analytical formulas for the preferred translational direction and the preferred rotation axis. For each fixed view of the object, the results are consistent with the basic tuning rule in Equation 30. For different views, however, the global gradient-field condition for the preferred axes can be violated by the idealized velocity component detectors. This shows that the neuronal behavior predicted by the gradient theory is not always identical to that of an optic-flow detector.
Velocity component detectors
As illustrated in Figure 6A, suppose an idealized neuron detects local motion on the image plane, with firing rate:

f = f0 + p · u, (Equation 47)

where f0 is the baseline firing rate, p is the preferred direction of visual motion, and u is the local velocity on the image plane inside a small receptive field of the detector. This local velocity detector resembles some neurons in the middle temporal area (MT) of monkey, as discussed in the section after Equation 72.
This idealized neuron obeys the basic tuning rule in Equation 30, namely:

f = f0 + p · v + q · ω, (Equation 48)

in response to a textured rigid object moving at translational velocity v and angular velocity ω. Here the preferred direction p is the same constant vector as in Equation 47, and the preferred rotation axis is given explicitly by:

q = (r − c) × p, (Equation 49)

where c is a fixed reference center in the object (center of the sphere in Fig. 6B), and r is the coordinate of the point in the object that happens to fall into the vanishingly small receptive field of the detector. As shown in Figure 6A, the geometry of the situation is quite simple, with the two orthogonal vectors q and r − c both lying on the horizontal plane. For fixed angular speed, using q as the rotation axis maximizes the response of the velocity detector.
To derive these formulas, note that a point in the object with coordinate r has the velocity:

u = v + ω × (r − c) (Equation 50)

on the image plane (Fig. 6A). The desired equations can be obtained by inserting this into Equation 47 and using the vector identity:

p · [ω × (r − c)] = ω · [(r − c) × p]. (Equation 51)

Next, consider a higher-order neuron whose response is the sum of the outputs of several local velocity component detectors:

f = f0 + Σi pi · ui, (Equation 52)

where pi is the preferred direction of detector i, and ui is the image velocity in its receptive field. This neuron also obeys the same basic tuning rule in Equation 48, with the preferred translational direction and preferred rotation axis given by:

p = Σi pi, (Equation 53)

q = Σi (ri − c) × pi. (Equation 54)

For example, when two detectors are arranged as shown in Figure 6B, the preferred rotation axis q is perpendicular to the image plane, whereas the preferred translational direction vanishes (p = 0) because the image size is constant under orthographic projection.
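A few lines of arithmetic (an illustrative sketch with random vectors, treating u as the full three-dimensional velocity of the object point) confirm the identity behind Equations 47-51: the detector output p · u equals p · v + q · ω with q = (r − c) × p.

```python
import numpy as np

# Illustrative check: a local velocity component detector with preferred
# direction p, looking at the object point r, responds as p . u with
# u = v + omega x (r - c); this equals p . v + q . omega with q = (r - c) x p.

rng = np.random.default_rng(2)
p = rng.normal(size=3)           # preferred direction of the detector
c = rng.normal(size=3)           # reference center of the object
r = rng.normal(size=3)           # object point inside the detector's receptive field

v, omega = rng.normal(size=3), rng.normal(size=3)
u = v + np.cross(omega, r - c)   # Equation 50: local velocity of the point
q = np.cross(r - c, p)           # Equation 49: preferred rotation axis

print(p @ u, p @ v + q @ omega)  # the two values agree
```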
Finally, the basic tuning rule in Equation 48 still holds for image motion of a rigid object under a perspective projection, which projects each point (x, y, z) in the real world toward the observer at the origin (0, 0, 0), leaving an image at (X, Y) in the image plane at z = η:

(X, Y) = (ηx/z, ηy/z) (Equation 55)

(Longuet-Higgins and Prazdny, 1980; Heeger and Jepson, 1990). On the image plane, suppose the velocity component detector i is located at (Xi, Yi) with preferred direction pi = (pi1, pi2); then the preferred direction and preferred rotation axis take the forms given in Equations 56 and 57, which involve the scaled relative coordinates (xi − x, yi − y, zi − z)/zi, where (x, y, z) is the reference center of the object and zi is the z-coordinate in the real world of the physical point that happens to activate detector i. Unlike the orthographic projection in Figure 6, this mechanism allows a neuron to respond to looming or shrinking images, because it does not confine the preferred translational direction p to the image plane.
Spatiotemporal receptive field
Now consider motion-sensitive linear spatiotemporal receptive fields that obey the basic tuning rule in Equation 30 with a known potential function. Let I(X, Y, t) describe the intensity of an image at location (X, Y) on the image plane at time t, ignoring color and stereo. Suppose the firing rate of a neuron with linear receptive field F(X, Y) is linearly related to how fast the overlap between the image and the receptive field is changing:

f = f0 + d⟨I, F⟩/dt, (Equation 58)

where the inner product is defined by:

⟨I, F⟩ = ∫∫ I(X, Y, t) F(X, Y) dX dY. (Equation 59)

This inner product can serve as the potential function postulated in Equation 29:

Φ = ⟨I, F⟩. (Equation 60)

Therefore, the basic tuning rule in Equation 30 must hold true, and the preferred translational direction and rotation axis are:

p = (∂⟨I, F⟩/∂x, ∂⟨I, F⟩/∂y, ∂⟨I, F⟩/∂z), (Equation 61)

q = q*M^−1 with q* = (∂⟨I, F⟩/∂θ1, ∂⟨I, F⟩/∂θ2, ∂⟨I, F⟩/∂θ3), (Equation 62)

following Equations 31-33. Here the partial derivatives are with respect to the implicit variables (x, y, z, θ1, θ2, θ3) for the position and orientation of a moving object that generates the image.
More generally, consider a neuron with an arbitrary linear spatiotemporal receptive field G so that its firing rate is:

f = f0 + ∫∫∫ G(X, Y, τ) I(X, Y, t − τ) dX dY dτ. (Equation 63)

This becomes identical to Equation 58 for the kernel function:

G(X, Y, τ) = F(X, Y) dδ(τ)/dτ, (Equation 64)

where δ is the Dirac delta function, which can be approximated by any narrow and normalized smooth function peaked at the origin. If Equation 64 is a reasonable approximation to the spatiotemporal receptive field, then the basic tuning rule in Equation 30, as well as p and q given by Equations 61 and 62, becomes valid. For nonlinear mechanisms, such as squaring (Adelson and Bergen, 1985) and normalization (Heeger, 1993), the above consideration may apply only after local linearization.
Existence of a global potential function
In all the concrete examples considered above, the basic tuning rule in Equation 30 holds true for each given view of an object. However, for a single view, the gradient and nongradient theories are indistinguishable. By assumption, the nongradient theory allows arbitrary preferred translational direction p and preferred rotation axis q. For a given view, the gradient theory can also generate any desired constant vectors p and q from the gradients of the following potential function:

Φ = p · c + q* · θ, (Equation 65)

where q* = qM is taken as a constant vector, c = (x, y, z) is the reference center of the object, and θ = (θ1, θ2, θ3) describes the orientation of the object.
The gradient theory is globally correct only when a potential function exists for all views of the object. This is the case for the linear spatiotemporal model, where the potential function can be given explicitly (Equation 60). By contrast, for the idealized velocity component detector, a global potential function in general does not exist, as shown in Example 1 below.
Because the existence of a potential function does not depend on the choice of the coordinate system (see Appendix), we only need to show that a potential function does not exist in the Euler angle space: (θ1, θ2, θ3) = (θ, φ, ψ), assuming that the center of the object is fixed. In this three-dimensional space, a potential function exists if and only if the distribution of the transformed preferred rotation axis q* is curl-free. Now consider two special examples that do not admit a global potential function:
Example 1: Constant preferred rotation axis fixed to the world. An explicit example is the model in Figure 6B, where the preferred rotation axis q is the same regardless of the orientation of the spherical object. Here it is assumed that the velocity component detector has two vanishingly small receptive fields that can nevertheless detect the true local velocity components regardless of the orientation of the object. Without loss of generality, take the preferred rotation axis as a unit vector in the negative Y-axis:

q = (0, −1, 0), (Equation 66)

and then compute its counterpart vector q* = qM in the Euler angle space (Equation 67). It is verified that curl q* as defined in Equation 38 does not vanish. This proves that the desired potential function cannot exist for this hypothetical neuron (Fig. 7).
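Example 1 can also be checked numerically. In the sketch below (stated assumptions: a roll-pitch-yaw parameterization is used instead of the Euler angles in the text, and M is computed by finite differences; because the existence of a potential function is coordinate independent, the conclusion is unaffected), the transformed axis q* = qM for a world-fixed q has a nonzero curl at generic orientations, so no global potential function exists.

```python
import numpy as np

# Numerical version of Example 1: for a preferred rotation axis q fixed in
# the world, the transformed axis q*(t) = q M(t) generally has nonzero curl
# in the orientation-parameter space.

def Rz(a): c, s = np.cos(a), np.sin(a); return np.array([[c,-s,0],[s,c,0],[0,0,1]])
def Ry(a): c, s = np.cos(a), np.sin(a); return np.array([[c,0,s],[0,1,0],[-s,0,c]])
def Rx(a): c, s = np.cos(a), np.sin(a); return np.array([[1,0,0],[0,c,-s],[0,s,c]])
def R(t): return Rz(t[0]) @ Ry(t[1]) @ Rx(t[2])
def vee(S): return np.array([S[2, 1], S[0, 2], S[1, 0]])

def M(t, h=1e-5):                            # omega = M(t) * tdot, as in Equation 26
    cols = []
    for k in range(3):
        d = np.zeros(3); d[k] = h
        cols.append(vee((R(t + d) - R(t - d)) / (2 * h) @ R(t).T))
    return np.column_stack(cols)

q = np.array([0.0, -1.0, 0.0])               # axis fixed to the world (Eq. 66)
q_star = lambda t: q @ M(t)                  # transformed axis, q* = q M (Eq. 37)

def curl(field, t, h=1e-4):                  # curl in the orientation-parameter space
    J = np.column_stack([(field(t + h * e) - field(t - h * e)) / (2 * h)
                         for e in np.eye(3)])
    return np.array([J[2,1]-J[1,2], J[0,2]-J[2,0], J[1,0]-J[0,1]])

print(curl(q_star, np.array([0.4, 0.3, -0.2])))   # nonzero in general
print(curl(q_star, np.array([-0.7, 0.5, 0.9])))   # nonzero in general
```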
Example 2: Constant preferred rotation axis fixed to the object. A possible example is a vestibular neuron receiving input from only a single semicircular canal, without other influences such as that from the otolith. Then the firing rate has a cosine tuning with respect to a preferred rotation axis fixed on the head of the animal, regardless of the orientation of the head in the world (Baker et al., 1984; Graf et al., 1993). Without loss of generality, let the preferred rotation axis be a unit vector along the positive Z′ axis of the object (head); in world coordinates this axis is given by Equation 68, and its counterpart vector in the Euler angle space is given by Equation 69. Here curl q* is not zero, proving the nonexistence of the desired potential function.
Therefore, there are simple computational mechanisms that can violate the gradient theory. As shown above, the gradient theory prohibits a neuron from having a truly invariant preferred rotation axis fixed either to the world or to the object. These neurons, however, are allowed by the nongradient theory. In particular, the prediction of the gradient theory can differ from that of an optic-flow sensor. As another example, the preferred translational direction field for a small moving dot may also have non-zero curl when measured at different regions inside a large MST-like receptive field that has circular arrangement of local preferred directions (Saito et al., 1986). These idealized examples demonstrate that the global property of the gradient theory is quite restrictive, which, however, makes its prediction strong and refutable. Experiments could be performed to test whether the gradient theory accounts for the neuronal responses to three-dimensional object motion.
COMPARISON WITH EXPERIMENTAL RESULTS OF SINGLE NEURONS
The tuning rule for reaching arm movement in Equation 2 is a special case of the general tuning rule in Equation 30 without the rotational terms. Biological evidence from the motor system in support of the tuning rule has already been considered in the preceding sections. In this section we examine several additional biological examples that are consistent with some special cases of the general tuning rule and then discuss more comprehensive tests for moving rigid objects.
One-dimensional example: hippocampal place fields on a linear track
For one-dimensional movement, the linear tuning theory predicts only linear speed modulation, without further constraint on the tuning function. The firing rate is given by:

f = f0 + φ(x)v, (Equation 70)

where x is the variable of interest, v = dx/dt is the speed, and φ(x) is the gradient of the function Φ. This is a special case of the general tuning rule in Equation 36, keeping only the translational term, with cos α = 1 and p = φ(x). The function φ(x) is allowed to be completely arbitrary because a potential function:

Φ(x) = ∫ φ(x) dx (Equation 71)

can always be constructed. For example, the firing rates of hippocampal place cells are modulated by running speed when a rat moves on a narrow track (McNaughton et al., 1983). In this one-dimensional problem, Equation 70 does not constrain the tuning function φ(x), here interpreted as a place field, describing the mean firing rate at spatial position x. When averaged over a population of simultaneously recorded place cells to get enough spikes, the average firing rate was indeed remarkably linearly related to the running speed, with a correlation coefficient >0.95 over the full range of speeds (Zhang et al., 1998), in agreement with Equation 70. A similar linear relationship was also reported recently for rats on a running wheel (Hirase et al., 1998). The theory gives a correct tuning rule without reference to the underlying biological mechanisms.
Two-dimensional example: local translational visual motion
Neurons in the middle temporal area (MT or V5) of monkeys respond selectively to the direction of local visual motion (Zeki, 1974; Maunsell and Van Essen, 1983; Albright, 1984), although they are also affected by other factors, such as surround motion (Allman et al., 1985; Tanaka et al., 1986; Raiguel et al., 1995), pattern motion (Movshon et al., 1985), transparency (Stoner and Albright, 1992; Qian and Andersen, 1994), and form cues (Albright, 1992). Consider the following formula obtained by keeping only the translational term in Equation 30:

f = f0 + p · v, (Equation 72)

which has been used for reaching arm movement (Eq. 2). This tuning rule also resembles the directional sensitivity of MT neurons (Zhang et al., 1993; Buračas and Albright, 1996), where p is the preferred direction of the neuron, and v is the velocity of the stimulus inside the receptive field.
This simple formula can capture two primary features of many MT neurons: a broad directional tuning curve, and speed modulation without changing the shape of the tuning curves (Rodman and Albright, 1987), while setting aside various other properties accounted for by more detailed models (Sereno, 1993; Nowlan and Sejnowski, 1995; Buračas and Albright, 1996; Simoncelli and Heeger, 1998). For many MT neurons, the tuning curves are often sharper than cosine, in which case a circular normal curve in Equation EA17 might provide a better fit because of its closeness to a Gaussian (Albright, 1984). Linear speed modulation is probably a reasonable approximation for some neurons when the velocity is slow, but firing rates typically decrease after reaching a peak at an optimal speed (Maunsell and Van Essen, 1983). It would be interesting to test whether speed modulation is linear when averaged over raw firing rates for a large population of neurons, especially under ecologically plausible stimulus conditions. The above consideration may also apply to many V4 neurons, which respond to visual motion in an MT-like manner (Cheng et al., 1994). Cosine tuning curves for translational motion have also been described in the cerebellum (Krauzlis and Lisberger, 1996) and the parietal area 7a (Siegel and Read, 1997).
Three-dimensional object motion
Spiral motion
No direct experimental data are available on how a neuron responds systematically to a realistic moving three-dimensional object with arbitrary translation and rotation. One closely related example is the broad tuning of some neurons to spiral visual motion, which may be generated plausibly by a large moving planar object facing the observer.
As shown in Figure 8, neurons in monkey visual medial superior temporal area (MST), which receive a major input from area MT, typically respond well to wide-field random-dot spiral motion patterns (Graziano et al., 1994). Most neurons in the ventral intraparietal area (VIP) are also sensitive to visual motion (Colby et al., 1993), and some have tuning properties to spiral motion similar to those in area MST (Schaafsma and Duysens, 1996), probably due to input directly from MST and/or integration of inputs from area MT. Area 7a is at a higher level than MST and might have more complex response properties for optic flows (Siegel and Read, 1997). In theory, it is possible to build an MST-like neuron from MT-like local motion inputs, even with position-invariance properties (Saito et al., 1986; Poggio et al., 1991; Sereno and Sereno, 1991; Zhang et al., 1993). Tuning to spiral motion was predicted based on Hebbian learning of optic flow patterns (Zhang et al., 1993) and by other unsupervised learning algorithms (Wang, 1995; Zemel and Sejnowski, 1998).
To explain spiral tuning in terms of rigid motion, regard the environment itself as a large rigid object, moving relative to the observer. For the experiments mentioned above, the environment may be considered as a finely textured screen, oriented vertically, facing the observer. Translating this screen toward or away from the observer induces expansion or contraction, whereas rotating the screen around a perpendicular axis induces circular motion. According to the basic tuning rule in Equation 36, a neuron should respond to arbitrary motion of this screen with the firing rate:

f = f0 + Pv + Qω, (Equation 73)

where v is the translational speed, ω is the angular speed, and:

P = p cos α, Q = q cos β (Equation 74)

are constants, where α is the angle between the preferred translational direction and the actual translational direction, and β is the angle between the preferred rotation axis and the actual rotation axis.
To see why this accounts for spiral tuning, write Equation 73 in the equivalent form f = f0 + w · s = f0 + |w| |s| cos γ (Equation 75), where s = (v, rω) is the spiral composition vector describing the stimulus, w = (P, Q/r) is the preferred spiral composition vector describing the response properties of the neuron, r is a constant length, introduced to make v and rω of the same units, and γ is the angle between s and w. In the polar diagrams in Figure 8, the horizontal and vertical axes correspond to the two components of the stimulus spiral composition s, whose length |s| = √(v² + r²ω²) (Equation 76) was fixed during the experiments. It follows from Equation 75 that the response should fall off smoothly when the stimulus spiral composition becomes different from the preferred composition, in proportion to the cosine of the angle γ between them. This broad tuning to spiral motion is generally consistent with the data (Graziano et al., 1994; Schaafsma and Duysens, 1996), although a circular normal function (Equation A17), with one more free parameter, may provide better fits than a cosine function (Fig. 8).
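This equivalence is easy to check numerically. The following minimal sketch assumes the reconstructed forms of Equations 73-76; the parameter values and the function name spiral_response are hypothetical, chosen only for illustration.

```python
import numpy as np

def spiral_response(f0, P, Q, v, omega):
    """Predicted rate for a frontoparallel textured screen translating in depth
    (speed v) and rotating about the line of sight (angular speed omega),
    following the reconstructed Equation 73: f = f0 + P*v + Q*omega."""
    return f0 + P * v + Q * omega

# Hypothetical neuron: baseline 20 spikes/s and preferred spiral composition w = (P, Q/r).
f0, P, Q, r = 20.0, 2.0, 1.5, 1.0

# Stimuli with a fixed spiral-composition length |s| = sqrt(v**2 + (r*omega)**2), as in the
# experiments, sweeping the composition angle through expansion, rotation, contraction,
# and the spirals in between.
s_len = 5.0
theta = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
v = s_len * np.cos(theta)            # expansion (+) / contraction (-) component
omega = s_len * np.sin(theta) / r    # rotation component

rates = spiral_response(f0, P, Q, v, omega)

# Equivalent form (Equation 75): f = f0 + |w| |s| cos(gamma).
w = np.array([P, Q / r])
s = np.column_stack([v, r * omega])
cos_gamma = np.clip(s @ w / (np.linalg.norm(s, axis=1) * np.linalg.norm(w)), -1.0, 1.0)
rates_eq75 = f0 + np.linalg.norm(w) * s_len * cos_gamma

assert np.allclose(rates, rates_eq75)    # the two forms agree: tuning is cosine in gamma
print(rates.round(2))
```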
The above interpretation implies that firing rate should scale linearly with translational speed and angular speed independently of the spiral tuning curve. The responses of most MST neurons do indeed depend on speed (Tanaka and Saito, 1989; Orban et al., 1995), and many are monotonically increasing (Duffy and Wurtz, 1997a). It would be interesting to test quantitatively how well the linearity holds when averaged over a population of cells, especially for an ecologically relevant range of motion.
The tuning rule in Equation 73 also implies that the response should depend on the focus of expansion or the translational component in the optic flow, which also occurs for many MST cells (Duffy and Wurtz, 1995, 1997b). Adding a translational velocity vector to the stimulus corresponds to translating the stimulus screen sideways, which affects the angle α in Equation 74 and thus the response in Equation 73. Changing the translational direction and the rotation axis of the stimulus screen can alter both angles α and β in Equation 74 and thus the predicted response in Equation 73.
Motion-sensitive neurons in area MST may be used for purposes such as estimating heading or self-motion (Perrone and Stone, 1994; Lappe et al., 1996) or segmenting multiple moving objects (Zemel and Sejnowski, 1998). MST responses can be affected by various factors, including, for example, surround motion (Tanaka et al., 1986; Eifuku and Wurtz, 1998), disparity (Roy et al., 1992), eye position and movement (Newsome et al., 1988; Bradley et al., 1996; Squatrito and Maioli, 1997), vestibular input (Thier and Erickson, 1992), form cues (Geesaman and Andersen, 1996), the presence of multiple objects (Recanzone et al., 1997), and attention (Treue and Maunsell, 1996). Most experiments used simplified stimuli, although more realistic stimuli were tested recently (Sakata et al., 1994; Pekel et al., 1996). Because most of these examples contain parameters other than the object’s position and orientation, additional variables are needed to account for all of these effects in a model.
Further experimental test
Given all the contributing factors mentioned above, it is natural to ask how a neuron would respond to a more naturally moving three-dimensional object. A simple geometric stimulus is easier to specify and present but may lack important sensory cues needed to predict the response of a neuron to a natural stimulus. Our analysis relies on varying the translational direction and rotation axis and might provide a convenient basic description of response properties in terms of a preferred translational direction and a preferred rotation axis.
To test directly the basic tuning rule in Equation 30 or 36, one should present realistic images of a moving three-dimensional object to motion-sensitive neurons. The simplest way to test the theory is to oscillate an object slightly around a fixed axis. The oscillation should be small enough that salient visual cues are not occluded. For sinusoidal oscillations with frequency Ω and amplitude ρ, the rotation angle is ϕ(t) = ρ sin(Ωt) (Equation 77), and the angular speed is ω = dϕ/dt in Equation 36. Thus the theory predicts that the firing rate as a function of time should be f(t) = f0 + qρΩ cos β cos(Ω(t − τ)) (Equation 78), where β is the angle between the actual rotation axis and the preferred rotation axis, q is a constant coefficient, and τ is the latency of the visual response. The translational term in Equation 36 vanishes because there is no translational motion (v = 0). Systematically changing the orientation of the rotation axis (angle β) while keeping the view fixed allows the tuning function and the preferred rotation axis of the neuron to be measured.
Similarly, the response to translation in three-dimensional space could be tested by oscillating the whole object along a straight line, x(t) = ρ sin(Ωt) (Equation 79). The translational speed is v = dx/dt in Equation 36, and the firing rate is f(t) = f0 + pρΩ cos α cos(Ω(t − τ)) (Equation 80), where α is the angle between the actual translational direction and the preferred translational direction, and p is a constant coefficient. The preferred translational direction can be measured by systematically changing the translational direction (angle α) while keeping the view fixed.
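The predicted time courses are straightforward to simulate. The sketch below assumes the reconstructed forms of Equations 77-80; the oscillation parameters, coefficients, and latency are arbitrary placeholders rather than measured values.

```python
import numpy as np

def rate_rotation(t, f0, q, rho, Omega, beta, tau):
    """Reconstructed Equation 78: response to a small sinusoidal rotation
    phi(t) = rho*sin(Omega*t), for which omega(t) = rho*Omega*cos(Omega*t)."""
    return f0 + q * rho * Omega * np.cos(Omega * (t - tau)) * np.cos(beta)

def rate_translation(t, f0, p, rho, Omega, alpha, tau):
    """Reconstructed Equation 80: response to oscillation along a line,
    x(t) = rho*sin(Omega*t), for which v(t) = rho*Omega*cos(Omega*t)."""
    return f0 + p * rho * Omega * np.cos(Omega * (t - tau)) * np.cos(alpha)

t = np.linspace(0.0, 2.0, 400)             # time (s)
f0, tau = 15.0, 0.08                       # baseline rate and response latency (assumed)
rho, Omega = np.deg2rad(5.0), 2 * np.pi    # 5 degree amplitude, 1 Hz oscillation

# Sweep the angle between the actual and the preferred rotation axis: the modulation
# depth should fall off as |cos(beta)| while the temporal waveform stays sinusoidal.
for beta_deg in (0, 45, 90, 135, 180):
    r = rate_rotation(t, f0, q=40.0, rho=rho, Omega=Omega,
                      beta=np.deg2rad(beta_deg), tau=tau)
    print(f"beta = {beta_deg:3d} deg -> modulation depth {r.max() - r.min():5.2f} spikes/s")
```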
For more efficient tests, the object could be rotated continuously with varying angular speed, covering all relevant views, first with respect to a fixed axis, and then systematically changing the axis. If the basic tuning rule is correct and the system is essentially linear, the tuning function and the preferred rotation axis could be computed for each view of the object. An even more efficient test is possible with a continuously time-varying rotation axis that generates tumbling movements of the object (Stone, 1998).
Eye position is one implicit factor that may affect the preferred translational direction and preferred rotation axis. The present theory allows an eye position effect but provides no additional constraints.
The linear response properties of a neuron for a given object are specified completely by its preferred translational direction, preferred rotation axis, and baseline firing rate for each given view of the object, as well as how these parameters depend on the view. All of these properties are experimentally testable and can be compared with the theoretical predictions in the preceding sections. For example, with the center of the object fixed, the curl-free condition for a given neuron should be tested by measuring its preferred rotation axis for four or more different orientations of the object. For a full test in six-dimensional space, both the preferred translational direction and the preferred rotation axis of the neuron should be measured for seven or more different positions and orientations of the object. See Appendix for further discussion.
If motion-sensitive neurons with similar tuning properties are clustered in the brain, then it might be possible to use functional magnetic resonance imaging techniques to test the predicted properties of the tuning rule in animal and human subjects using realistic images of moving 3-D objects as visual stimuli.
DISCUSSION
An explanation for cosine tuning
The remarkable ubiquity of approximately cosine tuning curves for a wide range of neural responses in the visual and motor systems suggests that there may be a common explanation that transcends the specific mechanisms that generate these response properties. We have shown that the low dimensionality of the geometric variables that underlie object motion and body movements could account for these observations. The gradient formulation of this general principle provides a rigorous framework for unifying the dependence of tuning curves on the axes and speeds of rotation and translation.
This theoretical framework makes a number of specific predictions. The primary prediction is the existence of preferred axes of rotation and translation for moving objects, which can be determined by systematically rotating and translating objects in the receptive field of cortical neurons. The firing rate of a neuron should fall off in proportion to the cosine of the angle between the preferred and actual rotation axes, and likewise between the preferred and actual translational directions. In addition, the speed and angular speed should modulate the firing rate multiplicatively without changing the shape of the directional tuning functions, somewhat related to the multiplicative gain fields in the parietal cortex for eye position (Andersen et al., 1997; Salinas and Abbott, 1997) and recent evidence for distance modulation of responses in visual cortex (Trotter et al., 1992; Dobbins et al., 1998). A secondary prediction is that the fields of preferred directions of rotation and translation for each individual neuron are curl free; these are global conditions on the overall pattern of vectors. The curl-free condition is relaxed in the nongradient theory, which provides a theoretical alternative that can be tested experimentally.
Cosine tuning for the direction of arm movement characterizes many neurons in the motor cortex as well as in other parts of the motor system. If this cosine tuning with direction mainly reflects the geometrical constraint of moving in a three-dimensional space, as we propose, then the specific functions of these neurons in guiding and planning limb movements must be sought in other properties. One way to obtain this information is to measure how the preferred direction varies in space for different hand positions, because this vector field completely specifies the properties of a neuron in a linear theory. When the preferred direction field is curl free, the underlying potential function that generates the gradient field can be constructed empirically.
The cosine tuning function is an approximation to biological data. For example, the averaged directional tuning curves in Figure 2 are all slightly sharper than a cosine function. Such systematic deviations can be accounted for only by nonlinear theories (see Appendix). Ultimately, the underlying neural mechanisms that generate the tuning properties need to be considered in more detailed theories. These tuning properties might be the outcome of learning processes based on correlated neuronal activities induced by movement and motion. In this paper, we have focused on several analytically tractable situations to emphasize the existence of general neuronal tuning properties that are insensitive to the actual mechanisms.
How preferred rotation axes may be used to update a static representation
If the cosine tuning of motion-sensitive neurons is determined essentially by geometric constraints regardless of the actual computational functions, then what is the value of these simple response properties? For three-dimensional object motion, a population of neurons tuned to translational direction and rotation axis should carry sufficient information to determine the instantaneous motion of any given object and therefore could be used to update the static view represented elsewhere. This allows future sensory and motor states to be predicted from the current static state.
Information about the static view of an object is represented in the ventral visual stream in the monkey cortex, leading to the inferotemporal (IT) area (Ungerleider and Mishkin, 1982). The response of a view-sensitive neuron in the IT area typically drops off smoothly as the object is rotated away from its preferred view, around either the vertical axis (Perrett et al., 1991) or other axes (Logothetis and Pauls, 1995). View-dependent representations for three-dimensional objects have been studied theoretically (Poggio and Edelman, 1990; Ullman and Basri, 1991) and have motivated several recent psychophysical experiments (Edelman and Bülthoff, 1992; Bülthoff et al., 1995; Liu et al., 1995; Sinha and Poggio, 1996). The general idea of a view-dependent representation in the IT area is consistent with recent neurophysiological results, including single-unit recordings (Perrett et al., 1991; Logothetis and Pauls, 1995; Logothetis et al., 1995) and optical imaging data (Wang et al., 1996).
Information about the instantaneous motion of an object is represented in the dorsal visual stream, including areas MT, MST, superior temporal polysensory area, and the parietal cortex, such as area 7a. Given a population of motion-sensitive neurons tuned to translation and rotation, it should be possible to extract complete information about the instantaneous motion of any object. For example, a six-dimensional population vector can be used to reconstruct rigid motion. More efficient reconstruction methods may also be used and implemented by a biologically plausible feedforward network (Zhang et al., 1998). The same set of neurons can extract the motion of different objects by combining the activities of input neurons differently. Instantaneous translation and rotation determine how the current view of this object is changing at the moment and could be used to update the static view representation in the IT area.
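As a rough illustration, the sketch below decodes a six-dimensional rigid motion from a simulated population whose units follow the basic linear tuning rule (reconstructed here as f = f0 + p · v + q · ω), using a simple population-vector readout. The population size, tuning parameters, and readout scale factor are our assumptions, and this is not the reconstruction method of Zhang et al. (1998).

```python
import numpy as np
rng = np.random.default_rng(0)

N = 2000
P = rng.normal(size=(N, 3)); P /= np.linalg.norm(P, axis=1, keepdims=True)  # preferred translation directions
Q = rng.normal(size=(N, 3)); Q /= np.linalg.norm(Q, axis=1, keepdims=True)  # preferred rotation axes
f0 = 10.0

def rates(v, w):
    """Assumed linear tuning rule for each unit: f = f0 + p.v + q.w."""
    return f0 + P @ v + Q @ w

# True rigid motion: translational velocity v_true and angular velocity w_true.
v_true = np.array([0.3, -0.1, 0.2])
w_true = np.array([0.05, 0.4, -0.2])
f = rates(v_true, w_true)

# Population-vector estimate: preferred directions weighted by the rate deviation from
# baseline. For uniformly distributed unit preferred directions the readout is unbiased
# up to a scale factor of about N/3, which is divided out here.
v_hat = (f - f0) @ P * 3.0 / N
w_hat = (f - f0) @ Q * 3.0 / N
print(v_hat, w_hat)   # approximately recovers v_true and w_true for large N
```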
Broad tuning to static views logically implies that each static view of an object elicits a certain activity pattern in the temporal cortex, and that as the view changes, the pattern of activity also changes smoothly, depending on the axis of rotation and direction of translation. A complete representation of the dynamic state of an object would require representing information about both the current view and how the view is changing, so that the system can effectively update its internal state in accordance with the movement of the object. Such motion information might help improve the speed and reliability of the responses of view-specific neurons to a three-dimensional object during natural movements.
Conclusion
We have shown that simple generic tuning properties arise when an encoded sensory or motor variable reflects changes rather than static configurations. By linearizing the system locally for movement-sensitive neurons, the analysis reveals mechanism-insensitive tuning properties that mainly reflect the geometry of the problem rather than the exact encoding mechanisms, which could be much more complicated. Although a nonlinear analysis is also considered (Appendix), the basic linear theory already captures some essential features of the biological data, such as sensory responses to visual pattern motion and directional tuning for reaching movements. The analysis predicts the existence of a preferred translational direction and a preferred rotation axis in space, with cosine tuning functions, for representing arbitrary three-dimensional object motion. For natural movements with intrinsically low dimensionality, combinations of variables become highly constrained and cannot be changed arbitrarily. It is precisely for these constrained movements that the mechanism-insensitive properties studied here may become useful. By contrast, the analysis may not apply to artificial movements, such as computer-generated visual motion stimuli, that do not satisfy the simplifying geometric constraints that occur in the real world. The brain should have more efficient representations for those stimulus features that are consistent with commonly encountered configurations in the real world. The analysis presented here may help to predict the tuning properties of motion-sensitive neurons in novel situations by providing a baseline description of expected properties against which more detailed characterizations, as well as potential deviations, can be compared.
EXTENDED THEORIES
We first reformulate the linear tuning theory for motion-sensitive neurons in general terms and then make nonlinear extensions. Here it is assumed that the natural movements of interest can be parameterized by a low-dimensional vector variable x = (x1, x2, … , xD) (Equation A1) with D degrees of freedom. For example, D = 3 for reaching with a stereotypical arm posture, and D = 6 for rigid object motion. D can be larger when more independent variables are included.
Linear gradient theory
Assume that the mean firing rate of a motion-sensitive neuron is linearly related to the time derivative of a potential function Φ(x) of the state variable x. This leads to the tuning rule f = f0 + dΦ(x)/dt = f0 + g(x) · v (Equation A2), where f0 is the baseline rate, the gradient g(x) = ∇Φ(x) = (∂Φ/∂x1, ∂Φ/∂x2, … , ∂Φ/∂xD) (Equation A3) is the generalized preferred direction, and v = dx/dt (Equation A4) is the generalized velocity. A gradient is often treated as a vector, although mathematically it is more properly called a one-form (Flanders, 1989), a distinction, however, that has no practical consequence for this paper. The necessary and sufficient condition for the preferred direction field g = (g1, g2, … , gD) to be a gradient field is that ∂gi/∂xj = ∂gj/∂xi for all i and j (Equation A5). This is equivalent to the condition that ∮ g · dx = 0 for any closed path (Equation A6), or that the potential function can be reconstructed from the preferred direction by Φ(x) = Φ(x0) + ∫ g · dx (Equation A7), where the integral runs from x0 to x and depends only on the end points x and x0, not on the path. In three-dimensional space, these conditions are equivalent to zero curl.
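A short numerical check of these relations is sketched below, assuming the reconstructed forms of Equations A2-A5 and using an arbitrary smooth potential chosen purely for illustration: the tuning rule reproduces f0 + dΦ/dt along a trajectory, and the mixed-partial (curl-free) condition holds for the gradient field.

```python
import numpy as np

def Phi(x):
    """A hypothetical smooth potential on a three-dimensional state variable."""
    return np.sin(x[0]) * x[1] + 0.5 * x[2] ** 2

def grad_Phi(x, eps=1e-6):
    """Numerical gradient: the generalized preferred direction g(x) (Equation A3)."""
    g = np.zeros(3)
    for i in range(3):
        d = np.zeros(3); d[i] = eps
        g[i] = (Phi(x + d) - Phi(x - d)) / (2 * eps)
    return g

f0 = 5.0

# Along a trajectory x(t), the tuning rule f = f0 + g(x).v (Equation A2) should equal
# f0 + dPhi/dt at every time step, up to discretization error.
t = np.linspace(0.0, 1.0, 1000)
dt = t[1] - t[0]
x_t = np.vstack([np.cos(t), t ** 2, np.sin(2 * t)]).T
v_t = np.gradient(x_t, dt, axis=0)
f_rule = f0 + np.array([grad_Phi(x) @ v for x, v in zip(x_t, v_t)])
f_chain = f0 + np.gradient(np.array([Phi(x) for x in x_t]), dt)
print(np.max(np.abs(f_rule - f_chain)))   # small: limited only by the finite differences

# Mixed-partial symmetry (Equation A5): dg_i/dx_j == dg_j/dx_i at a test point.
x0, eps = np.array([0.3, -0.7, 1.1]), 1e-4
J = np.zeros((3, 3))
for j in range(3):
    d = np.zeros(3); d[j] = eps
    J[:, j] = (grad_Phi(x0 + d) - grad_Phi(x0 - d)) / (2 * eps)
print(np.max(np.abs(J - J.T)))            # ~0: the gradient field is curl free
```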
Linear nongradient theory
Assume directly a linear relationship between the firing rate and the components of the generalized velocity v. This leads to the tuning rule f = f0 + g(x) · v (Equation A8), without further constraint on the distribution of the preferred direction g(x). In particular, it need not be the gradient of a potential function. Thus, in general, ∂gi/∂xj ≠ ∂gj/∂xi (Equation A9), and the path integral may depend on the path. For a fixed x, the local tuning property of this neuron is the same as that predicted by the gradient theory. The difference is that the nongradient theory allows an arbitrary global distribution of the preferred direction field, whereas the gradient theory admits only a gradient field.
Coordinate-system independence
Both the gradient and the nongradient theories are independent of which variables are chosen to parameterize the movements. Suppose the old vector variable x and a new variable x̃ are related by x = h(x̃) (Equation A10); then the velocities v = dx/dt and ṽ = dx̃/dt in the two coordinate systems are related linearly by v = J(x̃)ṽ (Equation A11), where J(x̃) = dh(x̃)/dx̃ is the Jacobian matrix.
The tuning rule has the same form in both coordinate systems, f = f0 + g̃(x̃) · ṽ (Equation A12), where the new preferred direction is g̃(x̃) = J(x̃)T g(x) (Equation A13). If a potential function exists such that the old preferred direction is a gradient field, g(x) = ∇Φ(x) (Equation A14), then the preferred direction field is still a gradient field with respect to the new variable x̃, g̃(x̃) = ∇Φ̃(x̃), with the gradient taken with respect to x̃ (Equation A15), where the new potential function is Φ̃(x̃) = Φ(h(x̃)) (Equation A16). The choice of variable is arbitrary, but convenience favors coordinates that make the results easiest to interpret.
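This invariance is easy to verify numerically. The sketch below assumes the reconstructed forms of Equations A10-A13; the map h and all numerical values are arbitrary choices for illustration.

```python
import numpy as np
rng = np.random.default_rng(1)

def h(xt):
    """An arbitrary smooth map from new coordinates xt to old coordinates x (Equation A10)."""
    return np.array([xt[0] + 0.2 * xt[1] ** 2, np.exp(0.3 * xt[1]), xt[2] + np.sin(xt[0])])

def jacobian(xt, eps=1e-6):
    """J(xt) = dh/dxt, computed by central differences."""
    J = np.zeros((3, 3))
    for j in range(3):
        d = np.zeros(3); d[j] = eps
        J[:, j] = (h(xt + d) - h(xt - d)) / (2 * eps)
    return J

xt = rng.normal(size=3)     # a point in the new coordinates
vt = rng.normal(size=3)     # generalized velocity in the new coordinates
g = rng.normal(size=3)      # preferred direction in the old coordinates (arbitrary)

J = jacobian(xt)
v = J @ vt                  # Equation A11: velocities transform with the Jacobian
g_new = J.T @ g             # Equation A13: preferred directions transform with its transpose

f0 = 7.0
print(f0 + g @ v, f0 + g_new @ vt)   # identical: the tuning rule keeps the same form (Equation A12)
```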
Nonlinear theory: circular normal tuning
The circular normal tuning function for firing rate has the general form f = A + B exp(K cos α) (Equation A17), which has one more free parameter and tends to fit data better (compare Figs. 2 and 8) than a cosine tuning function of the general form f = A′ + B′ cos α (Equation A18), where α is the angular variable of interest, and A, B, K, A′, B′ are parameters.
The circular normal function can mimic either a cosine or a Gaussian. When K is very small, exp(K cos α) ∼ 1 + K cos α, so that the circular normal function in Equation A17 approaches the cosine function in Equation A18 with A′ = A + B and B′ = BK. When K is large, cos α ∼ 1 − α²/2, so that the circular normal function approaches a narrow Gaussian function with variance 1/K.
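Both limits can be checked numerically, assuming the reconstructed forms of Equations A17 and A18; the parameter values below are arbitrary.

```python
import numpy as np

alpha = np.linspace(-np.pi, np.pi, 361)
A, B = 5.0, 10.0

def circular_normal(alpha, A, B, K):
    """Reconstructed Equation A17: A + B*exp(K*cos(alpha))."""
    return A + B * np.exp(K * np.cos(alpha))

# Small-K limit: approaches the cosine form A' + B'*cos(alpha) with A' = A + B, B' = B*K.
K = 0.05
cosine = (A + B) + B * K * np.cos(alpha)
print(np.max(np.abs(circular_normal(alpha, A, B, K) - cosine)))   # small

# Large-K limit: near the peak it approaches a Gaussian with variance 1/K.
K = 20.0
gaussian = A + B * np.exp(K) * np.exp(-K * alpha ** 2 / 2)
near_peak = np.abs(alpha) < 0.3
ratio = circular_normal(alpha, A, B, K)[near_peak] / gaussian[near_peak]
print(ratio.min(), ratio.max())   # both close to 1 near the peak
```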
How can we generate a circular normal tuning function? Because the time-derivative equation for firing rate, f = f0 + dΦ/dt (Equation A19), can yield a cosine tuning function, as shown before, the modified equation of the form f = f0 + a exp(b dΦ/dt) (Equation A20) should yield a circular normal tuning function, where a and b are parameters. The result is a tuning rule of the form f = f0 + a exp(pv cos α) (Equation A21), where angle α specifies movement direction and v is movement speed. This modification introduces the following testable effects on speed modulation. If the parameter p is very small, the tuning function is close to a cosine, and the firing rate increases approximately linearly with speed, as before. When the parameter p is larger, the tuning function is closer to a Gaussian, and the firing rate should increase faster than linearly with speed. The directional tuning also becomes narrower at higher speed. These effects are similar to those shown in Figure 9A for quadratic terms. See related discussion in the next section.
Although Equation A20 can lead to a circular normal function, it does not specify how this occurs. One plausible biological mechanism is a recurrent network with appropriate lateral connections, which can generate a tuning curve closer to a circular normal function than to a cosine function (Pouget et al., 1998).
Nonlinear theory: acceleration tuning and quadratic speed modulation
In this section the basic tuning theory is generalized by including the second temporal derivative. This leads to acceleration tuning, nonlinear speed modulation, and departure from perfect cosine directional tuning.
Assume that the firing rate of a neuron contains not only the first temporal derivative of a potential function but also the second temporal derivative of another potential function, f = f0 + dΦ(x)/dt + d²Ψ(x)/dt² (Equation A22), where f0 is the baseline rate, and Φ = Φ(x) and Ψ = Ψ(x) are two unknown potential functions of a vector variable x = (x1, x2, … , xD). The special case Ψ = 0 reduces to what has been considered before.
This assumption leads to the new tuning rule f = f0 + p1 · v + p2 · a + Σi,j hij vi vj (Equation A23), where v = (v1, v2, … , vD) = dx/dt is the generalized velocity, a = dv/dt is the generalized acceleration, p1 = ∇Φ is the preferred direction for velocity, p2 = ∇Ψ is the preferred direction for acceleration, and hij = ∂²Ψ/∂xi∂xj is the Hessian matrix of second derivatives.
Example 1: Reaching movement
Here the vector variable x = (x1, x2, x3) = (x, y, z) describes the hand position. The dot product terms in Equation A23 mean that both the velocity and the acceleration have preferred directions and cosine directional tuning functions, together with multiplicative linear modulation by the speed or the magnitude of acceleration. Ashe and Georgopoulos (1994) included acceleration terms in a different regression formula and found a small number of cells related to hand acceleration. Systematic tests are needed to determine whether the acceleration tuning predicted by Equation A23 really exists.
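One way such a test might be analyzed is an ordinary least-squares regression of firing rate on velocity and acceleration components, sketched below on simulated data. This is a generic fit under the reconstructed Equation A23 (without the quadratic terms), not the specific regression formula used by Ashe and Georgopoulos (1994).

```python
import numpy as np
rng = np.random.default_rng(2)

# Simulated trial or time-bin data: hand velocity V and acceleration A, and a rate
# generated from the reconstructed Equation A23 without quadratic terms, plus noise.
N = 500
V = rng.normal(size=(N, 3))
A = rng.normal(size=(N, 3))
f0_true = 12.0
p1_true = np.array([1.0, -2.0, 0.5])    # preferred direction for velocity (scaled)
p2_true = np.array([0.3, 0.1, -0.4])    # preferred direction for acceleration (scaled)
rate = f0_true + V @ p1_true + A @ p2_true + rng.normal(scale=1.0, size=N)

# Design matrix [1, vx, vy, vz, ax, ay, az] and least-squares fit.
X = np.column_stack([np.ones(N), V, A])
coef, *_ = np.linalg.lstsq(X, rate, rcond=None)
f0_hat, p1_hat, p2_hat = coef[0], coef[1:4], coef[4:7]
print(f0_hat, p1_hat, p2_hat)   # recovers the generating parameters

# The norms of p1_hat and p2_hat give the velocity and acceleration modulation depths;
# their directions are the corresponding preferred directions.
```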
Example 2: Rigid object motion
Here the vector variable x = (x1, x2, … , x6) = (x, y, z, θ1, θ2, θ3) describes the object’s position and orientation in space. By transforming (θ̇1, θ̇2, θ̇3) into the angular velocity in physical space, the tuning rule becomes f = f0 + p1 · v + q1 · ω + p2 · a + q2 · α + Σi,j (Hij vi vj + Iij vi ωj + Jij ωi ωj) (Equation A24), where v = (ẋ, ẏ, ż) is the translational velocity, a = dv/dt is the translational acceleration, ω = (ω1, ω2, ω3) is the angular velocity, α = dω/dt is the angular acceleration, and Hij, Iij, Jij are determined by the Hessian matrix and the matrix M as in Equation 26. Therefore, angular acceleration should also have a cosine tuning with a preferred direction q2. Although there is some evidence that some MT cells can provide information about acceleration of a visual target (Movshon et al., 1990; Lisberger et al., 1995), systematic tests for acceleration tuning with realistic objects have not been performed.
Effects of quadratic terms
The quadratic speed terms imply both nonlinear speed modulation and higher-order Fourier components for directional tuning that are speed dependent. To see this, consider a two-dimensional reaching example with the hand velocity v = (v1, v2) = (v cos φ, v sin φ) (Equation A25), where v is the speed and φ is the angle of movement direction. The quadratic terms can always be written as Σi,j hij vi vj = av² + bv² cos(2φ + ϕ0) (Equation A26), where h12 = h21, and the coefficients a, b and the phase shift ϕ0 depend only on hij. For comparison, the linear speed term can be written as p1 · v = pv cos(φ + ϕ1), with a phase shift ϕ1.
The second Fourier component can either sharpen or broaden the original cosine function, depending on its sign. As illustrated in Figure 9, if the tuning curve is sharpened by the second component, it becomes even sharper as the speed increases; if the tuning curve is broadened, it becomes even broader as the speed increases. However, the amplitude of the second component (cos 2φ) should be no more than one-fourth of that of the first one (cos φ) to ensure that the tuning curve has only a single peak. This limits the effects of the second Fourier term. For speed modulation, the quadratic speed factor produces only a slight bend (Fig. 9) and is too weak by itself to produce the ∪-shaped or ∩-shaped curves observed for some neurons in area MT (Maunsell and Van Essen, 1983; Rodman and Albright, 1987) and MST (Orban et al., 1995; Duffy and Wurtz, 1997a). Rodman and Albright (1987) also showed that the average tuning widths of MT neurons were insensitive to speed, although the typical experimental errors for individual neurons might mask the small effects described here. Thus, the second Fourier component with squared speed may help improve data fitting (compare Fig. 2), but probably only within a narrow range of speeds.
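These effects can be illustrated numerically under the reconstructed Equations A25 and A26; the coefficients below are arbitrary, and the half-width is only a crude index of tuning sharpness.

```python
import numpy as np

phi = np.linspace(-np.pi, np.pi, 721)

def tuning(phi, v, f0=20.0, p=2.0, a=0.1, b=0.05, phi0=0.0):
    """Directional tuning with linear and quadratic speed terms (reconstructed
    Equations A23, A25, A26): f = f0 + p*v*cos(phi) + v**2*(a + b*cos(2*phi + phi0))."""
    return f0 + p * v * np.cos(phi) + v ** 2 * (a + b * np.cos(2 * phi + phi0))

def half_width(curve, phi):
    """Width of the region above the midpoint between the maximum and minimum rate."""
    half = 0.5 * (curve.max() + curve.min())
    return np.ptp(phi[curve >= half])

for v in (1.0, 2.0, 4.0):
    w = half_width(tuning(phi, v), phi)
    print(f"speed {v}: half-width {np.degrees(w):6.1f} deg")   # narrows with speed (b > 0 here)

# Single-peak condition: for f0 + B*cos(phi) + C*cos(2*phi) the curve stays single peaked
# only while |C| <= B/4; with B = p*v and C = b*v**2 the bound fails above v = p/(4*b).
p, b = 2.0, 0.05
print("single peak up to speed", p / (4 * b))   # = 10 with these coefficients
```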
LINEAR VECTOR FIELD FOR DATA ANALYSIS
Linear vector field from experimental data
As shown in the main text, the preferred direction and the preferred rotation axis may be generated as gradient fields of a potential function. Here we consider how to test the gradient condition experimentally. In two and three dimensions, where the curl can be defined, a vector field generated as the gradient of any potential function is always curl free (Fig. 10). In particular, a two-dimensional vector field (u(x, y), v(x, y)) has both zero curl and zero divergence if and only if the complex function u(x, y) − iv(x, y) (Equation B1) is analytic or differentiable with respect to the complex variable z = x + iy, because of the Cauchy-Riemann differentiability conditions. One can use templates of curl-free vector fields, generated as gradients of potential functions, to interpolate arbitrary vector fields (Mussa-Ivaldi, 1992). Therefore, the curl-free condition alone is too flexible a test, because combinations of curl-free basis fields can accommodate any sparsely sampled vector field data.
For testing the gradient-field condition or the curl-free condition with sparsely sampled data points, an additional smoothness constraint is needed. Linearity is a reasonable smoothness requirement, at least for a local region, such as in measurement of local force fields (Mussa-Ivaldi et al., 1985; Giszter et al., 1993) and in local optic flow analysis (Koenderink and van Doorn, 1976). A linear vector field has the general form p(x) = Ax + b (Equation B2), where the column vector x = (x1, x2, … , xD)T contains the variables of interest, A is a constant D × D matrix with entries Aij (Equation B3), and b is a constant vector. If the linear vector field is the gradient of an unknown potential function, then this function should have the general quadratic form Φ(x) = ½xTBx + bTx + const (Equation B4), which has the gradient ∇Φ(x) = ½(B + BT)x + b (Equation B5), with b a constant column vector and T denoting transpose. To be consistent with Equation B2, the matrix A must be symmetric, AT = A (Equation B6), in which case we can set B = A. This symmetry condition in Equation B6 is the necessary and sufficient condition for a linear vector field to be a gradient field. It includes the curl-free condition for D = 3 as a special case, because the curl ∇ × (Ax + b) = (A32 − A23, A13 − A31, A21 − A12) (Equation B7) vanishes if and only if the matrix A is symmetric.
To determine A and b from data vectors p1, p2, … , pN sampled at positions r1, r2, … , rN, respectively, we require the total number of data points N ≥ D + 1 (Equation B8) in D-dimensional space, because the matrix A and the vector b contain D² + D scalar unknowns, whereas N data points provide DN scalar values. For example, the two-dimensional case requires a minimum of three data points, and the three-dimensional case requires a minimum of four data points.
The least-squares solution is given by Equation B9, where the Moore-Penrose pseudoinverse in Equation B10 is applied to the matrices in Equations B11 and B12, which are made of the column vectors pn and rn as shown, together with the definitions in Equations B13-B15. The above solution minimizes the total squared error Σn ‖Arn + b − pn‖² (Equation B16), which vanishes only when all the data points can be fit exactly by the linear model.
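Because the matrix notation of Equations B9-B15 is not reproduced here, the sketch below shows one standard way such a least-squares fit can be computed, using the Moore-Penrose pseudoinverse of an augmented position matrix; the data are simulated from a symmetric (gradient) field, so the fitted A should come out nearly symmetric.

```python
import numpy as np
rng = np.random.default_rng(3)

D, N = 3, 8                                    # dimension and number of sample points (N >= D + 1)
R = rng.normal(size=(D, N))                    # sampled positions r_1..r_N as columns
A_true = np.array([[1.0, 0.5, -0.2],
                   [0.5, -0.3, 0.1],
                   [-0.2, 0.1, 0.8]])          # symmetric: a gradient (curl-free) field
b_true = np.array([0.2, -0.1, 0.4])
P = A_true @ R + b_true[:, None] + 0.01 * rng.normal(size=(D, N))   # noisy preferred directions

# Least-squares fit of p = A r + b via the pseudoinverse of the augmented position matrix.
R_aug = np.vstack([R, np.ones(N)])             # (D + 1) x N
Ab = P @ np.linalg.pinv(R_aug)                 # D x (D + 1): [A | b]
A_fit, b_fit = Ab[:, :D], Ab[:, D]

asymmetry = np.max(np.abs(A_fit - A_fit.T))    # Equation B6: ~0 for a gradient field
residual = np.linalg.norm(A_fit @ R + b_fit[:, None] - P)   # total misfit (cf. Equation B16)
print(A_fit.round(2), b_fit.round(2), asymmetry, residual)
```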
Examples of the simplex method
In this section, we illustrate an alternative formulation of the curl-free condition with minimal data points in three-dimensional space, which can be extended readily to other dimensions. For local interpolation in three-dimensional space, a vector field should at a minimum be sampled at four locations 1, 2, 3, 4, not all lying in the same plane (Fig. 11). Suppose the coordinates of the four points are x1, x2, x3, x4, and the corresponding data vectors are p1, p2, p3, p4. The simplex method is based on the fact that any point x in three-dimensional space can be expressed as x = c1x1 + c2x2 + c3x3 + c4x4 (Equation B17), where the coefficients are unique, satisfying the constraint c1 + c2 + c3 + c4 = 1 (Equation B18). The linearly interpolated vector at this position is p(x) = c1p1 + c2p2 + c3p3 + c4p4 (Equation B19). Similar equations with fewer variables hold for the one- and two-dimensional cases.
To derive an integral formula for the curl-free condition, first integrate along the straight line segment from x1 to x2 with a linearly interpolated vector, yielding ∫ p · dx = ½(p1 + p2) · (x2 − x1) along that segment (Equation B20). To evaluate the closed path integral along the triangle 123 in Figure 11, use this formula and the equality a1 + a2 + a3 = 0, with a1 ≡ x3 − x2, and so on, to obtain ∮ p · dx = −½(a1 · p1 + a2 · p2 + a3 · p3) (Equation B21), which must vanish if the field is curl free. This formula is valid in both the two- and three-dimensional cases. For the tetrahedron in Figure 11, a similar closed path integral along the edges should vanish. Because there are only three independent loops, we can choose, for instance, the loops given in Equation B22. Although equivalent to the matrix symmetry in Equation B6, the path integral formulas are more intuitive and express the curl-free condition in terms of directly measurable quantities, as in Equation B21.
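The sketch below implements this path-integral test under the reconstructed Equations B20 and B21: for vectors sampled from a gradient (symmetric) linear field, the circulation around every triangular face of the tetrahedron vanishes, whereas adding an antisymmetric component produces a nonzero circulation.

```python
import numpy as np

def loop_integral(xs, ps):
    """Closed-path integral of the linearly interpolated field around a polygon: each
    segment contributes 0.5*(p_i + p_j).(x_j - x_i), as in the reconstructed Equation B20."""
    total = 0.0
    n = len(xs)
    for i in range(n):
        j = (i + 1) % n
        total += 0.5 * (ps[i] + ps[j]) @ (xs[j] - xs[i])
    return total

# Four non-coplanar sample points (a tetrahedron) and vectors from a gradient field
# p(x) = A x + b with symmetric A: the circulation around every face should be ~0.
X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
A_sym = np.array([[1.0, 0.4, 0.2], [0.4, -0.5, 0.3], [0.2, 0.3, 0.7]])
b = np.array([0.1, 0.0, -0.2])
P_grad = X @ A_sym.T + b

for face in [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]:
    print(face, loop_integral(X[list(face)], P_grad[list(face)]))   # all ~0

# A nonsymmetric A violates the curl-free condition and gives nonzero circulation.
A_rot = A_sym + np.array([[0.0, 0.5, 0.0], [-0.5, 0.0, 0.0], [0.0, 0.0, 0.0]])
P_rot = X @ A_rot.T + b
print(loop_integral(X[[0, 1, 2]], P_rot[[0, 1, 2]]))                # nonzero
```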
In the experiment by Caminiti et al. (1990), the preferred direction of a motor cortical neuron was sampled at three points at equal distance, similar to the case in the bottom diagram in Figure 11. Here the linearity of the vector field only entails that p2 = ½(p1 + p3) (Equation B23), which is a special case of Equation B19. The curl-free condition cannot be tested here. With linear interpolation, at least four starting hand positions should be sampled (Fig. 11).
Footnotes
We are grateful to T. D. Albright, G. T. Buračas, G. E. Hinton, R. J. Krauzlis, K. D. Miller, A. B. Schwartz, M. I. Sereno, M. P. Stryker, R. S. Turner, D. Zipser, and two anonymous reviewers for helpful comments on the analysis presented here.
Correspondence should be addressed to Dr. Terrence Sejnowski, Computational Neurobiology Lab, The Salk Institute, La Jolla, CA 92037.