Abstract
The planning and control of sensory-guided movements requires the integration of multiple sensory streams. Although the information conveyed by different sensory modalities is often overlapping, the shared information is represented differently across modalities during the early stages of cortical processing. We ask how these diverse sensory signals are represented in multimodal sensorimotor areas of cortex in macaque monkeys. Although a common modality-independent representation might facilitate downstream readout, previous studies have found that modality-specific representations in multimodal cortex reflect upstream spatial representations. For example, visual signals have a more eye-centered representation. We recorded neural activity from two parietal areas involved in reach planning, area 5 and the medial intraparietal area (MIP), as animals reached to visual, combined visual and proprioceptive, and proprioceptive targets while fixing their gaze on another location. In contrast to other multimodal cortical areas, the same spatial representations are used to represent visual and proprioceptive signals in both area 5 and MIP. However, these representations are heterogeneous. Although we observed a posterior-to-anterior gradient in population responses in parietal cortex, from more eye-centered to more hand- or body-centered representations, we do not observe the simple and discrete reference frame representations suggested by studies that focused on identifying the “best-match” reference frame for a given cortical area. In summary, we find modality-independent representations of spatial information in parietal cortex, although these representations are complex and heterogeneous.
Introduction
Reaching to pick up a coin or to transfer a coin from one hand to another without looking requires similar movements but uses different sources of sensory information. In the first case, vision of the coin enters the nervous system as a site of stimulation on the retina, with location defined in a retinotopic or eye-centered reference frame. In the second case, information about target hand position enters the nervous system as signals about muscle and joint states. These can be used to compute the position of the hand relative to the torso, a body-centered reference frame. These different sensory signals are processed in the primary visual or somatosensory cortices, respectively, before converging in the parietal sensorimotor cortex (Seltzer and Pandya, 1980; Caminiti et al., 1996; Johnson et al., 1996; Rizzolatti et al., 1997). Because the spatial information encoded by these two sensory streams must undergo costly (Sober and Sabes, 2005; Schlicht and Schrater, 2007; McGuire and Sabes, 2009) reference frame transformations to be compared (Soechting and Flanders, 1991), it is unclear what reference frame representation should be used in the parietal cortex.
The representation of movement plans in the superior parietal lobule (SPL) has been studied extensively for reaches to visual targets. The form and heterogeneity of spatial encoding in the SPL have been the subject of debate. However, there is agreement that area 5 receives more proprioceptive input and appears to have more hand- or body-centered coding compared with the medial intraparietal area (MIP), which receives more visual input and appears to have more eye-centered coding (Lacquaniti et al., 1995; Colby and Duhamel, 1996; Kalaska, 1996; Batista et al., 1999; Marconi et al., 2001; Buneo et al., 2002; Ferraina et al., 2009; Chang and Snyder, 2010).
These observations suggest that the reference frame used by an area to encode spatial variables depends, at least in part, on the reference frames of its sensory inputs. This idea is supported by studies of other multimodal cortical areas (Jay and Sparks, 1987; Stricanne et al., 1996; Avillac et al., 2005; Mullette-Gillman et al., 2005; Fetsch et al., 2007). Alternatively, it has been argued that a common representation across sensory modalities is important for movement planning (Cohen and Andersen, 2002; Stein and Stanford, 2008) (but see Avillac et al., 2005). Some evidence for modality-independent representations has been observed in the SPL: across studies, similar spatial representations were observed for reaches to auditory and visual targets (Batista et al., 1999; Cohen and Andersen, 2000). Still, we do not know the extent to which spatial representations in multimodal areas are shared across sensory modalities, nor do we know whether the answer differs across cortical areas.
To begin answering these questions, we directly compared the neural activity during reaches to visual and proprioceptive targets in area 5 and MIP and tested whether representations in these areas vary with sensory inputs or are an invariant property of the neurons or the cortical area. We found that the representations in these areas do not depend on the sensory information available, and neurons in both areas exhibit heterogeneous tuning that does not correspond to “pure” reference frame representations.
Materials and Methods
Experimental setup.
Two adult male rhesus macaque monkeys (12–15 kg) were used in this experiment. All procedures were approved by the University of California, San Francisco Institutional Animal Care and Use Committee and followed the National Institutes of Health guidelines for care and treatment of laboratory animals.
The monkeys were trained to make reaches in a virtual reality setup allowing control of visual information during the task (Fig. 1A). The monkeys were seated in a primate chair with an open front panel to allow arm movements. Head position was fixed (Adams et al., 2007), with animals facing a mirror through which visual targets and feedback about hand position were presented. A digital video projector (NEC HT1100) displayed visual stimuli on a rear projection screen located directly above the mirror. The mirror and screen were positioned so that all visual objects appeared in the plane of the upper horizontal table where the reaching arm rested. Eye position was monitored using an ISCAN infrared eye-tracking system. The monkeys were trained to wear a mesh jacket with stiff gloves that kept the hand prone. Radio-frequency sensors were attached to the gloves, and hand position was monitored with a Polhemus Liberty tracking system. The arm contralateral to the recording chamber was used for reaching. It rested on top of a thin (6 mm) horizontal table ∼15–16 cm below the eyes. The ipsilateral arm rested horizontally 5.5 cm below the upper table and was secured to a custom motor-driven sleigh that moved the arm passively between target locations. Behavioral and neural event times were recorded with a signal acquisition system that includes a programmable processor (Tucker Davis Technologies). Experiments were controlled with custom routines in Matlab (MathWorks).
Target modalities and array.
The monkeys were trained to reach to three different types of targets: visual (VIS), proprioceptive (the unseen ipsilateral hand, PROP), and visual and proprioceptive (the seen ipsilateral hand, VIS+PROP) targets (Fig. 1). Visual targets were presented as filled disks 2 cm in diameter. The disks were green during VIS trials and blue during VIS+PROP trials to distinguish purely visual trials from trials in which the visual target coincided with the position of the ipsilateral hand. The proprioceptive targets were located at the distal joints of the two middle fingers of the ipsilateral hand, which was moved to the correct location with the sleigh. Trials were performed for the same set of reach conditions (i.e., reach target location, fixation point, and start location) for each target modality.
Reach targets were located in an arc on the table equidistant from the cyclopean eye (Fig. 1A,B). The sleigh rotated about a point on the bottom table located approximately below the cyclopean eye (center was ∼1–2 cm forward from the animal's chest, ∼20–22 cm below the eyes). The distance of the targets from this point was determined by the exact position of the hand in the sleigh (average radius of 26 cm for monkey C and 22 cm for monkey E). Targets were positioned at 10° intervals along the arc from −30° to +30° from midline (spanning ∼26 cm in x for monkey C and 22 cm in x for monkey E) (Fig. 1B).
During the planning and execution of reaches, the monkeys were required to maintain fixation at one of two fixation points located ±10° from straight ahead (∼9 cm apart for monkey C and 7.5 cm apart for monkey E). The fixation point was a filled red disk 8 mm in diameter. For each fixation point, reaches were made to only six of the seven potential target locations (Fig. 1B).
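The stated workspace dimensions can be checked from the arc geometry. The following is a minimal sketch, assuming that the fixation points lie on the same arc as the reach targets (the text gives only their angles and separations) and using illustrative function names that are not part of the original methods.

```python
import math

TARGET_ANGLES = list(range(-30, 31, 10))  # seven targets, 10 deg apart
FIXATION_ANGLES = [-10, 10]               # two fixation points

def arc_positions(radius_cm, angles_deg):
    """Return (x, y) positions in cm for points on an arc about the origin.

    Angles are measured from the midline (0 deg = straight ahead);
    x is lateral and y is forward.
    """
    return [(radius_cm * math.sin(math.radians(a)),
             radius_cm * math.cos(math.radians(a))) for a in angles_deg]

def lateral_span(radius_cm, angles_deg):
    """Extent of the points along x, in cm."""
    xs = [x for x, _ in arc_positions(radius_cm, angles_deg)]
    return max(xs) - min(xs)

# Average arc radii reported for the two animals.
for monkey, radius in (("C", 26.0), ("E", 22.0)):
    print(monkey,
          round(lateral_span(radius, TARGET_ANGLES), 1),
          round(lateral_span(radius, FIXATION_ANGLES), 1))
```

Under this assumption, the computed target spans (26 and 22 cm) and fixation separations (about 9.0 and 7.6 cm) agree with the approximate values given in the text.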
All reaches were made from a visual start location, with initial visual feedback of the reaching hand: start location was a green disk, 2.4 cm in diameter, and feedback was a white disk, 1 cm in diameter, positioned on the distal joints of the two middle fingers. During most recording sessions, a single start location located on the midline was used for all trials (15 cm distal from the center of the target arc for monkey C and 11 cm distal for monkey E). In a subset of recording sessions, two additional start locations were used to examine the effects of initial hand position on neural responses. These were located ±20° from straight ahead (∼5 cm left and right of the central start for monkey C and 4 cm for monkey E) at the same distance from the origin as the midline start location. In those sessions, reaches from the two additional start locations were made to a limited subset of the reach targets (−30°, −10°, +10° targets for −20° start location and −10°, +10°, +30° targets for +20° start location).
Trial presentation order.
Trials were presented in blocks, with a single repetition of all reach conditions completed in each block. The target modalities were separated into sub-blocks: a repetition of each PROP reach was performed, then each VIS+PROP reach, and finally each VIS reach before starting the next block (repetition). This arrangement had the benefit that the animals could anticipate the target modality, reducing the possibility of uncertainty about trial type, in particular the possibility that, during VIS trials, the animals execute, or even tentatively plan, movements to their ipsilateral hand. Indeed, this arrangement was necessary to achieve good performance. Note that only a single repetition of each reach condition was completed in each sub-block. This ensured that, at a coarse scale, the three target modalities were evenly distributed throughout the recording session and facilitated the equalization of reward volumes across modalities (reward scheme is described below). Within each sub-block, the trial conditions (target position, fixation point, and start location) were presented in random order. Error trials were repeated before moving on to the next sub-block. All trial conditions had to be successfully completed or a maximum number of unsuccessful trials (typically five) had to be reached before the next sub-block began. The maximum number was adjusted daily to optimize performance, and the animals typically completed all trials before the maximum was reached.
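The block structure described above can be sketched as follows. This is an illustration only: `attempt_trial` is a hypothetical callback standing in for the behavioral task, and re-queuing a failed condition at the end of the current sub-block is an assumption, because the text states only that error trials were repeated before the next sub-block.

```python
import random

MODALITY_ORDER = ["PROP", "VIS+PROP", "VIS"]  # sub-block order within a block

def run_block(conditions, attempt_trial, max_errors=5, rng=None):
    """Run one block: one sub-block per modality, conditions in random order.

    attempt_trial(modality, condition) -> bool is a hypothetical callback
    that reports whether the trial succeeded. A sub-block ends when every
    condition has succeeded or max_errors failures have accumulated.
    """
    rng = rng or random.Random()
    log = []
    for modality in MODALITY_ORDER:
        queue = list(conditions)
        rng.shuffle(queue)
        errors = 0
        while queue and errors < max_errors:
            condition = queue.pop(0)
            ok = attempt_trial(modality, condition)
            log.append((modality, condition, ok))
            if not ok:
                errors += 1
                queue.append(condition)  # assumed: retry later in the sub-block
    return log
```

With a callback that always succeeds, each block yields exactly one trial per condition and modality, in the order PROP, VIS+PROP, VIS.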
Trial and reward structure.
To successfully complete a trial, the monkeys had to move their contralateral hand to the reach target without failing to complete any of the sequence of positional holds and delay periods enumerated here (Fig. 1C). (1) Start location acquisition: The monkey moved its hand to the visual start location and held position for 500 ms. (2) Target hand positioning: On PROP and VIS+PROP trials, the ipsilateral hand was moved to the target location. (3) Fixation point acquisition: The fixation target appeared and the monkey fixated it, maintaining fixation within a 10–12 mm window of the target in the x-coordinate (∼2.5° visual angle) and a 40 mm window in the y-coordinate. The relatively lax control of eye fixation in y was necessary because the oblique angle between the line of sight and the horizontal plane made the precision of eye tracking poor along the y-coordinate. Importantly, no difference in fixation across target modalities was observed. (4) Visual target presentation: After a 700 ms fixed delay, the visual target appeared on the VIS and VIS+PROP trials. The same delay was used in PROP trials, although no visual target appeared. (5) Instructed delay: The monkey maintained fixation and position at the start location for an additional variable delay of 500–1000 ms. (6) Go signal: A go tone sounded and the start-location disk was extinguished, indicating that the monkey should move the contralateral hand to the reach target. (7) Reaction time: The monkey began the reach after the go tone. When the hand first moved 1 cm from the initial position, feedback of the hand was extinguished to eliminate the possibility of stimulating cells with visual motion. (8) Movement: The monkey had to reach without stopping to a point within a set distance from the center of the reach target (monkey C: 4 cm VIS and VIS+PROP, 5 cm PROP; monkey E: 3 cm VIS and VIS+PROP, 4.5 cm PROP). (9) Target hold: The final position was held for 200 ms to successfully complete a trial.
(10) Reach feedback: On successful trials, the fixation point was extinguished and visual feedback for the reaching hand was turned back on for 500 ms, providing visual feedback about endpoint distance from the reach target. On PROP trials, feedback of the target was also turned on at this time. Animals almost always made a saccade toward the target location during this interval. On unsuccessful trials, a visual signal of 1 s duration indicated to the animal what error had occurred. (11) Reward: Monkey C received a water or fruit juice reward. Monkey E received a food reward in the form of a slurry of monkey biscuits, apple juice, and banana. Unsuccessful trials had no reward and a 1–5 s timeout before the next trial began.
To encourage accuracy, reward size depended in part on the distance from the reach endpoint to the center of the reach target, i.e., a graded reward schedule. Reaches within the inner third of the target window received a full reward. Outside this range the reward size scaled linearly from half of the full reward to zero at the edge of the target window, although when the linear schedule would have resulted in a reward below a set minimum value, the minimum was given. Reward size was controlled by the duration of the on-state of the delivery system (monkey C liquid reward delivered at ∼1–1.2 ml/s, monkey E slurry reward delivered at ∼2.6–3.4 ml/s). The minimum reward was set to a delivery time of 50 ms. The maximum reward time increased at predetermined intervals throughout the day to keep the monkeys motivated. Increases always occurred at block boundaries to ensure similar rewards across target modalities.
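The graded reward schedule can be written as a short function. This is a sketch under stated assumptions: the target window is treated as a radius around the target center, "inner third" is read as one third of that radius, and `full_ms` is a hypothetical name for the current maximum delivery time.

```python
def reward_duration_ms(error_cm, window_cm, full_ms, min_ms=50.0):
    """Delivery time (ms) for a reach ending error_cm from the target center.

    window_cm is the radius of the target window (assumed interpretation).
    Reaches outside the window are unsuccessful and earn no reward.
    """
    if error_cm > window_cm:
        return 0.0
    inner = window_cm / 3.0
    if error_cm <= inner:
        return full_ms  # full reward within the inner third
    # Linear ramp: half of the full reward at the inner boundary,
    # zero at the edge of the window, floored at the minimum reward.
    frac = (window_cm - error_cm) / (window_cm - inner)
    return max(min_ms, 0.5 * full_ms * frac)
```

For example, with a 3 cm window and a 400 ms full reward, a reach ending 2 cm from the target center falls halfway along the ramp and earns 100 ms, whereas a reach at the very edge of the window earns the 50 ms minimum.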
A small portion of trials (10% in monkey C and 5% in monkey E) served as catch trials. In these trials, the monkeys were rewarded without making a reach if they successfully maintained fixation through the final delay period (item 5 above), which was extended to 1.5 s on these trials. Catch trials served the dual purpose of encouraging the animals to maintain fixation and preventing them from anticipating the go signal on long delay trials.
Recording cylinders.
Both monkeys were trained extensively on the tasks before physiological recordings began. Before the start of recording, an 18-mm-inner-diameter titanium recording cylinder was placed over a craniotomy over the intraparietal sulcus (IPS), with the axis of the cylinder aligned approximately orthogonally to the dural surface below. The craniotomy and cylinder were positioned with the aid of a previously obtained structural magnetic resonance (MR) image (monkey C: 11 mm left, −4 mm posterior; monkey E: 12 mm right, −8 mm posterior, interaural stereotactic coordinates). Monkey E previously had chronic recording arrays (Blackrock Microsystems) implanted over its left motor and premotor cortices. All surgical procedures and postoperative care followed University of California, San Francisco and National Institutes of Health guidelines.
All recording sessions occurred within 10 months of cylinder implantation. Periodically, the dura mater was thinned to allow electrode penetration. Mitomycin C, an antimitotic agent, was applied to the recording chamber of monkey E to minimize tissue growth and reduce the frequency of dural thinning.
Neural recording.
Single electrode recordings were used for all data collection. A Narishige Microdrive was used to lower a 2 MΩ (nominal) Alpha Omega tungsten electrode into cortex. All well isolated neurons that appeared modulated by the task were recorded without preselection for directional tuning. Neurons were recorded until the monkey completed six to eight blocks (repetitions) or until isolation was lost. All neurons for which at least four blocks of trials were completed were included in additional analysis. Included neurons had anywhere from 4 to 13 blocks, with a mean of 6.2 blocks and a median of 6 blocks.
After recording, spikes were classified into identified neurons using Plexon Offline Sorter. The nominal boundaries between area 5 and MIP and between the SPL and inferior parietal lobule (IPL) were determined from a combination of the preimplantation MR images, the stereotactic coordinates used for cylinder implantation, and observed neural responses to eye and hand movements (Fig. 2) (data from the IPL are not considered further in this paper). Specifically, neurons recorded <2000 μm from the surface of the cortex in the SPL were nominally localized to area 5, whereas neurons recorded below this depth were nominally localized to MIP. This boundary was chosen because it typically corresponded to a region without units, in which the electrode was presumably passing through white matter. It is important to note that the naming conventions in the SPL are not standardized. The area we call area 5 is also referred to as area 5d or PE. The area we call MIP is also referred to as PEip and would be considered by some to span the border between MIP and area 5v.
Behavioral epochs for analysis of neural data.
Average neural firing rates were computed within three different behavioral epochs (Fig. 1C), defined as follows. (1) The instructed delay (“delay”) for VIS and VIS+PROP trials started at visual target onset and ended at the go signal. Delay for PROP trials started at fixation point acquisition and ended at the go signal. (2) The reaction time (“reaction”) started at the go signal and ended at the initiation of movement, defined as the first time the reaching hand velocity exceeded 10 mm/s after the go tone or the first time the hand moved >5 mm away from its start position. (3) The movement period (“move”) is from the start of movement to the end of movement, defined as the first time hand velocity dropped below 10 mm/s after movement start.
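The movement-onset and movement-end criteria can be illustrated with a short routine. This is a minimal sketch, assuming hand position is sampled as (x, y) pairs in millimeters, that speed is estimated by backward differences, and that the start position is the hand position at the go signal; none of these implementation details are specified in the text.

```python
import math

def speeds_mm_per_s(times_ms, positions_mm):
    """Backward-difference hand speed; the first sample is assigned 0."""
    out = [0.0]
    for i in range(1, len(times_ms)):
        dt_s = (times_ms[i] - times_ms[i - 1]) / 1000.0
        out.append(math.dist(positions_mm[i], positions_mm[i - 1]) / dt_s)
    return out

def movement_epoch(times_ms, positions_mm, go_ms,
                   speed_thresh=10.0, disp_thresh=5.0):
    """Return (onset_ms, end_ms); either may be None if not found.

    Onset: first sample at or after the go signal with speed above
    speed_thresh (mm/s) or displacement from the start above disp_thresh (mm).
    End: first later sample with speed below speed_thresh.
    """
    speed = speeds_mm_per_s(times_ms, positions_mm)
    go_idx = next(i for i, t in enumerate(times_ms) if t >= go_ms)
    start = positions_mm[go_idx]
    onset = None
    for i in range(go_idx, len(times_ms)):
        if (speed[i] > speed_thresh
                or math.dist(positions_mm[i], start) > disp_thresh):
            onset = i
            break
    if onset is None:
        return None, None
    for i in range(onset + 1, len(times_ms)):
        if speed[i] < speed_thresh:
            return times_ms[onset], times_ms[i]
    return times_ms[onset], None
```

In this scheme, the "reaction" epoch runs from `go_ms` to the returned onset time, and the "move" epoch runs from the onset time to the returned end time.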
Most of the analyses in the paper were performed separately for the activity of each cell during each behavioral epoch for trials with each target modality. We call this unit of analysis a “modality–epoch–cell.” For each cell, there are nine modality–epoch–cell analyses. Using this analysis unit, we also measure differences across epochs for each “modality cell” and differences across modalities for each “epoch cell.”
Tuning curve fits: reference frame shift and gain dependence on eye position.
A tuning curve function (firing rate vs target location and fixation point) was fit separately for each modality–epoch–cell, using only reaches from the central start location. Because our targets sampled only a limited range of positions compared with a standard center-out target array, our data would poorly constrain the standard Gaussian or cosine curves. Instead, we used a quadratic function that described response patterns well (Equation 1). Here, R is the firing rate on a given trial, and T and E are the reach endpoint and fixation point, respectively, both defined as angular positions along the target arc, with zero at the midline (supplemental Fig. 1A, left column, available at www.jneurosci.org as supplemental material), although alternative reference frames are considered below. The parameters α1, α2, α3, and δ were fit to the firing rates using nonlinear least squares optimization (Matlab function lsqcurvefit). The predicted value of R was allowed to be negative, and fits with negative R values were obtained in 18% of fits, although these values were almost all greater than −5 Hz and typically occurred only at the outer edges of the workspace. Although using either target position or reach endpoint for T provided very similar fits, reach endpoint generally yielded slightly higher R² values, and so this variable was used for all subsequent analyses.
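To make the shift fit concrete, the following sketch fits a quadratic tuning curve in the shifted variable T − δE. Two points are assumptions rather than the authors' method: the parameterization R = a1 + a2·x + a3·x², with x = T − δE, is one possible way of writing a quadratic with three coefficients and a shift term, and the optimizer is a grid search over δ combined with linear least squares, substituted for Matlab's lsqcurvefit so that the example needs only the standard library.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_quadratic(xs, rates):
    """Least-squares fit of rates ~ a1 + a2*x + a3*x**2; returns (coeffs, sse)."""
    S = [sum(x ** k for x in xs) for k in range(5)]
    A = [[S[i + j] for j in range(3)] for i in range(3)]  # normal equations
    b = [sum(r * x ** i for x, r in zip(xs, rates)) for i in range(3)]
    a = solve3(A, b)
    sse = sum((r - (a[0] + a[1] * x + a[2] * x * x)) ** 2
              for x, r in zip(xs, rates))
    return a, sse

def fit_shift_model(T, E, rates, n_grid=201):
    """Profile delta over the constrained range [-0.5, 1.5].

    Returns (delta, coeffs, sse) for the grid value with the lowest SSE.
    """
    best = None
    for i in range(n_grid):
        delta = -0.5 + 2.0 * i / (n_grid - 1)
        xs = [t - delta * e for t, e in zip(T, E)]
        coeffs, sse = fit_quadratic(xs, rates)
        if best is None or sse < best[2]:
            best = (delta, coeffs, sse)
    return best
```

Because the model is linear in the three quadratic coefficients once δ is fixed, profiling over δ reduces the nonlinear problem to a one-dimensional search, which also respects the constraint on δ by construction.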
Much of our analysis will focus on the tuning shift term (δ) of Equation 1, which serves as a continuously valued, unit-free, measure of reference frame. When δ = 0, firing rate varies as a function of T, corresponding to a hand- or body-centered reference frame (the two cannot be distinguished with a single start location). When δ = 1, firing rate varies as a function of T–E, corresponding to an eye-centered representation. During fitting, δ was constrained to the range −0.5 to 1.5. We computed 95% confidence intervals and SDs for tuning curve parameters using a bootstrapping procedure with resampled trials (Efron and Tibshirani, 1993). For each epoch cell, we determined that a significant difference in shift was present between two modalities when, for each modality, the 95% confidence intervals of the shift term did not include the value fit to the other modality. To be included further in the shift analysis, a modality–epoch–cell had to satisfy two criteria. First, reach-endpoint tuning had to be significant (p < 0.05/9 = 0.0056, Bonferroni's correction, 3 modalities × 3 epochs), as determined by a permutation test (Good, 2000), where T is the permuted variable and the sum square error (SSE) was the measure of fit. Second, the 95% confidence interval δ was required to have a width of <1.5 within the total allowable range of 2 (if one limit was equal to a boundary of the fitting range, the interval was assumed to be symmetric about the best-fit value). We chose this lax criterion to include more cells in the shift analysis; the results of the analysis were qualitatively the same when this range was decreased or increased.
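The inclusion and comparison rules for the shift term can be expressed compactly. The sketch below assumes that each confidence interval is available as a (lower, upper) pair; the function names are illustrative, and the way the bootstrap intervals themselves are computed is not covered.

```python
ALPHA_TUNING = 0.05 / 9       # Bonferroni: 3 modalities x 3 epochs
DELTA_BOUNDS = (-0.5, 1.5)    # constrained fitting range for delta
MAX_CI_WIDTH = 1.5            # inclusion criterion on the 95% CI width

def effective_ci_width(best, ci, bounds=DELTA_BOUNDS):
    """CI width; an interval clipped at a fitting bound is treated as
    symmetric about the best-fit value."""
    lo, hi = ci
    if lo <= bounds[0]:
        return 2.0 * (hi - best)
    if hi >= bounds[1]:
        return 2.0 * (best - lo)
    return hi - lo

def include_in_shift_analysis(p_tuning, best, ci):
    """Both criteria: significant endpoint tuning and a narrow enough CI."""
    return p_tuning < ALPHA_TUNING and effective_ci_width(best, ci) < MAX_CI_WIDTH

def shifts_differ(best_a, ci_a, best_b, ci_b):
    """True when neither modality's CI contains the other's best-fit shift."""
    a_excludes_b = not (ci_a[0] <= best_b <= ci_a[1])
    b_excludes_a = not (ci_b[0] <= best_a <= ci_b[1])
    return a_excludes_b and b_excludes_a
```

Note that the comparison rule is symmetric: a difference is declared only if each interval excludes the other modality's estimate, which is more conservative than testing one interval alone.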
We also considered an alternative model that includes an eye-position-dependent change in tuning curve gain instead of a tuning curve shift (Equation 2). The gain term, γ, served as a secondary measure of eye-position effects on firing rate. Some cells appeared to show both reference frame shift and gain effects, and so we considered a third tuning curve model that includes both shift (δ) and gain (γ) effects (Equation 3). Two separate analyses of eye-position effects were performed, one for comparing tuning curve shift effects and one for comparing gain-modulation effects. For the shift analysis, we fit the models in Equations 1 and 3 to the firing rates for each modality–epoch–cell. We determined whether the gain factor in Equation 3 added significant explanatory power using a permutation test (Good, 2000) in which the value of E in the gain factor was permuted across trials, and fit was measured with the SSE. Equation 3 was then used for an epoch cell if it significantly improved the fit for at least two of the three target modalities; otherwise, Equation 1 was used. This criterion meant that the same model was used across modalities for each epoch cell, ensuring that differences across modality are not attributable to differences in model selection. Equation 3 was used for 253 of 908 modality–epoch–cells in the shift analysis. The same procedure was used to select between Equations 2 and 3 for the gain analysis. In this case, the value of E in the shift terms of Equation 3 was permuted. Equation 3 was used for 261 of 1448 modality–epoch–cells in the gain analysis.
For the tuning curve analyses just described, the target and fixation angles in Equations 1–3 are defined with respect to a body-centered origin, located below the cyclopean eye (supplemental Fig. 1A, left column, available at www.jneurosci.org as supplemental material). By design, the targets lie on an arc about this origin, with uniform angular spacing and distance across the target array. Of course the target array looks different with respect to other reference frame origins. If the target angle were defined with respect to the initial hand location, inter-target angular spacing and target distance would vary across the target array (hand-centered origin) (supplemental Fig. 1A, middle column, available at www.jneurosci.org as supplemental material). If the target angles were defined with respect to the fixation point, the targets would lie approximately along a line passing through the origin rather than along an arc (gaze-centered origin) (supplemental Fig. 1A, right column, available at www.jneurosci.org as supplemental material). Thus, the sampling of space is substantially different across these reference frames. Our use of body-centered angles for the tuning curve analysis could therefore affect the power of our analysis and lead to inaccurate results in determining neural reference frames, if the true neural tuning were a function of hand-centered or gaze-centered target angles. To test for potential biases in our analysis, we generated artificial datasets with tuning curves defined with respect to each origin (body, hand, and eye), and then performed the tuning curve analysis described above on each of these datasets. We found that an incorrect assumption about the true reference frame origin can lead to a small bias in the estimated shift value, although the variability of the estimates was not affected (supplemental Fig. 1B, available at www.jneurosci.org as supplemental material).
Importantly, the potential biases were not large enough to qualitatively affect our results. Furthermore, a bias in the estimated tuning shift would equally affect all trial conditions and would therefore have no effect on comparisons across conditions. Thus, although the need to pick a particular angular origin for the tuning curve analysis potentially introduces a bias in our results, this potential does not affect the conclusions of the paper.
Direct rate comparison of reference frames.
It is possible that a poor tuning curve fit could result in miscategorization of the reference frame of a cell. We therefore wanted to examine the dependence of firing rate on eye position without relying on tuning curve fits. To do this, we asked whether firing rates were significantly different (permutation test; Bonferroni's correction, p < 0.05/3 = 0.0167) between condition pairs whose targets were matched (supplemental Fig. 10A, available at www.jneurosci.org as supplemental material) in the following: a hand/body-centered reference frame (targets aligned relative to the body), an eye-centered reference frame (targets aligned when plotted relative to fixation position along the target arc, i.e., shifted over two targets in Fig. 1B), or an intermediate reference frame (between hand/body-centered and eye-centered, i.e., shifted over one target in Fig. 1B). Each modality–epoch–cell was categorized in the reference frame (hand/body, intermediate, eye) or set of reference frames (hand/body and intermediate, or intermediate and eye) for which there were no significant differences between paired firing rates. The modality–epoch–cell was considered uncategorized if significant differences were found for none of the three comparisons (poor power) or all three (poor model fit). However, to account for eye-position-dependent gain modulation, putatively uncategorized modality–epoch–cells were retested after normalization for each eye position. For those cases, we corrected for six comparisons rather than three. Of the 870 modality–epoch–cells that were ultimately categorized with this analysis, the normalization step was performed on 275. Excluding these cases did not affect the qualitative pattern of reference frames observed.
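The categorization logic can be sketched as a permutation test followed by a lookup over the frames that were not rejected. Several details here are assumptions: the test statistic (absolute difference in mean rate), the reduction of each frame's matched condition pairs to a single p-value, and the treatment of the pattern in which only the intermediate frame is rejected, which the text does not address and which is left uncategorized below.

```python
import random

FRAMES = ("hand/body", "intermediate", "eye")

def permutation_p(a, b, n_perm=2000, rng=None):
    """Two-sided permutation test on the difference in mean firing rate
    between two sets of trials (assumed test statistic)."""
    rng = rng or random.Random(0)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def categorize(p_values, alpha=0.05 / 3):
    """Return the tuple of frames that could not be rejected, or None.

    p_values maps each frame name to the p-value of its matched-pair test.
    None means uncategorized: no frame rejected (poor power), all frames
    rejected (poor model fit), or a pattern not described in the text.
    """
    kept = tuple(f for f in FRAMES if p_values[f] >= alpha)
    if len(kept) in (0, 3):
        return None
    if kept == ("hand/body", "eye"):
        return None  # assumed: non-adjacent pair treated as uncategorized
    return kept
```

For the retest after normalization for eye position, the same function could be called with `alpha=0.05 / 6` to reflect the correction for six comparisons.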
Direct rate comparison of target versus movement vector representations.
We used an analogous test to categorize the response properties of cells recorded during reaches from multiple start locations. This subset of cells allowed us to determine whether responses were related to reach target position or movement vector (i.e., the combination of start position and target position) and the reference frame in which target or movement vector are encoded. Using a permutation test, we tested for significant differences in firing rate between reaches that had (1) matched body-centered movement vector but different eye positions, (2) matched eye-centered movement vector but different body-centered positions, or (3) matched target and eye position but different movement vectors (i.e., different start positions). These comparisons provided a measure of the dependence of responses on movement vector and target position. As above, cells could be categorized depending on the comparisons for which there were no significant differences between firing rates, i.e., depending on which reference frames could not be rejected (Fig. 3). Here, however, the three comparisons provide complementary information about tuning. If the responses for a modality–epoch–cell rejected eye-centered movement vector coding (comparison 2), but not body-centered movement vector coding (comparison 1) and not target coding with different movement vectors (comparison 3), then we categorize the pattern as body-centered target coding. Similarly, if only comparison 1 was rejected, the response was categorized as eye-centered target coding. If only comparison 3 was rejected, then we categorize the pattern as hand-centered movement vector, because neither position relative to body nor eye matter, but the relative locations of start and target positions do matter. As above, putatively uncategorized modality–epoch–cells were retested after normalization for each eye position, and, for these cases, we corrected for six comparisons. 
Of the 296 modality–epoch–cells that were ultimately categorized with this analysis, the normalization step was performed on 36. Excluding these cases did not affect the qualitative pattern of reference frames observed.
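The three response patterns named above map onto the set of rejected comparisons as follows. This is an illustrative lookup only: the label for any other rejection pattern is a placeholder, because the text describes just these three cases explicitly.

```python
# Keys are the sets of comparisons (1, 2, or 3) that showed a significant
# difference in firing rate, i.e., the codings that were rejected.
PATTERNS = {
    frozenset({2}): "body-centered target",
    frozenset({1}): "eye-centered target",
    frozenset({3}): "hand-centered movement vector",
}

def categorize_vector_test(rejected):
    """Map a collection of rejected comparison numbers to a response category.

    Patterns not described in the text return a placeholder label.
    """
    return PATTERNS.get(frozenset(rejected), "other or uncategorized")
```

For example, a modality–epoch–cell whose rates differ only when the start position changes (comparison 3) is labeled as hand-centered movement vector coding.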
Results
Behavioral performance
Both monkeys were trained until performance reached a plateau for all three modalities before neural recordings began. Monkey C achieved a typical success rate of 60–75% correct trials a day. The low percentage of correct trials was attributable primarily to a tendency to break fixation, particularly on visual reaches to the central target. Monkey E achieved a typical success rate of 90–97% correct trials a day. Both monkeys would typically perform 800–1400 total trials (including error trials) in a day.
The largest behavioral difference between the three target modalities was the increase in reach endpoint variance for PROP targets relative to VIS and VIS+PROP targets (Table 1). In principle, this difference could be attributable to use of different task strategies, for instance, reaching to remembered target locations for PROP trials rather than using the proprioceptive cues. However, there are several pieces of evidence that suggest this is not the case. First, there were slight variations in proprioceptive target location across trials and the animals' reach endpoints were significantly closer to the actual target-hand position than the nominal target position across trials (p < 0.0001, paired permutation test, both monkeys). Furthermore, across trials, the reach endpoint was correlated with the actual target-hand location, after conditioning on the nominal target location (r = 0.55 for monkey C, r = 0.41 for monkey E, p < 0.0001). Next, the differences in endpoint variance across target types are similar to those observed in humans using visual versus proprioceptive information for reaching (McGuire and Sabes, 2009) and match the differences expected as a result of the differential reliability of the sensory modalities specifying target location (van Beers et al., 1998, 1999). Finally, VIS+PROP endpoint variability is significantly smaller than either VIS or PROP endpoint variability (p < 0.0001, t test, both monkeys). For these reasons, it seems probable that the differences in endpoint variability across target modality reflect differences in the reliability of the sensory signals and not differences in the way the animals were performing the task. Furthermore, these results suggest that the monkeys were using both sources of sensory information about target location, to some behavioral benefit in VIS+PROP trials.
It should also be noted that the endpoint variance for all modalities was smaller than the distance between targets, so the reach endpoint distributions for neighboring targets did not overlap.
Aside from movement variability, reaching behavior was quite similar across the three target modalities, with the animals performing smooth, rapid reaches with bell-shaped velocity profiles (supplemental Fig. 2, available at www.jneurosci.org as supplemental material). Although we found slight differences in the peak velocity and reaction times between the target modalities, the differences are only a fraction of the within-modality SD for each measure (Table 1). Similarly, the graded reward schedule (as a function of movement accuracy; see Materials and Methods) successfully resulted in approximately equal rewards across the three modalities. From these results, we conclude that differences in behavior and reward are unlikely to account for any potential differences in neural activity observed between the three trial types. Differences in neural responses to VIS, VIS+PROP, and PROP reaches should instead reflect the effects of sensory modality of the target.
Cell tuning across areas and modalities
We recorded from two areas in the posterior parietal cortex, area 5 and MIP, which were distinguished by depth of recording (see Materials and Methods) (Fig. 2). We recorded a total of 375 cells: 193 cells from area 5 (101 for monkey C, 92 for monkey E) and 182 cells from area MIP (95 for monkey C, 87 for monkey E). Of those cells, 160 cells had significant tuning in area 5 (78 for monkey C, 82 for monkey E), and 164 cells had significant tuning in MIP (86 for monkey C, 78 for monkey E). Results were qualitatively the same across the two monkeys so data are combined across animals for all subsequent analyses.
Cells that had significant tuning for reach endpoint (p < 0.05/9 = 0.006, permutation test, Bonferroni's correction; see Materials and Methods) were typically tuned for multiple modalities and multiple trial epochs, as illustrated in Figure 4. Within a given epoch, there were slight differences in the proportion of cells tuned for the three modalities, but most of these differences were not significant (p > 0.4, χ2 test) (Fig. 4). The one exception is tuning during delay in area 5, which showed significantly fewer VIS tuned cells than PROP or VIS+PROP tuned cells (p = 0.001 < 0.05/6 = 0.008, χ2 test, Bonferroni's correction). Many cells were tuned in more than one modality in a given epoch, and the proportion of cells tuned in all three modalities was significantly greater than expected by chance given the tuning for each modality (p < 0.001 < 0.008 in area 5 and MIP for all epochs, χ2 test, Bonferroni's correction). Many cells were also tuned in more than one behavioral epoch (Fig. 4, right column) (63% in area 5 and 88% in MIP), and the proportion tuned for all three epochs was significantly greater than would be expected by chance (p < 0.001 < 0.05/2 = 0.025 in area 5, p = 0.0178 < 0.025 in MIP, χ2 test, Bonferroni's correction). The smaller proportion of cells tuned for multiple epochs and modalities in area 5 may simply reflect the fact that on average firing rate modulation was smaller than that observed in MIP (supplemental Fig. 3, available at www.jneurosci.org as supplemental material). These results suggest that most cells were tuned across target modalities and epochs, but, in many cases, the modulation did not achieve statistical significance. In any case, the large number of cells tuned across multiple modalities and epochs provides ample data to examine modality- and epoch-dependent differences in neural activity.
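The corrected thresholds quoted in this section (0.05/9, 0.05/6, 0.05/3, and 0.05/2) all follow the same arithmetic, which the short sketch below makes explicit. The function names are hypothetical and are used only for illustration.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-test significance threshold under Bonferroni's correction."""
    return alpha / n_comparisons

def is_significant(p_value, alpha, n_comparisons):
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_comparisons)
```

For example, `bonferroni_threshold(0.05, 9)` is approximately 0.0056, which is reported as 0.006 in the text, and a p value of 0.0178 passes a two-comparison correction (threshold 0.025).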
Reference frame of neural responses
We next quantify the reference frames in which target information is represented and compare these reference frames across target modality, trial epoch, and cortical area. Because reference frames within the parietal cortex have been the subject of debate (Lacquaniti et al., 1995; Andersen et al., 1998; Caminiti et al., 1998; Batista et al., 1999; Burnod et al., 1999; Buneo et al., 2002, 2008; Battaglia-Mayer et al., 2003; Fattori et al., 2005; Mullette-Gillman et al., 2005, 2009; Buneo and Andersen, 2006; Ferraina et al., 2009; Chang and Snyder, 2010), we used three different analyses to examine reference frame and eye-position effects. We first used a tuning-curve-based analysis that estimates the reference frame by the fit shift value (see Materials and Methods). We next used a similar approach that focuses on eye-position-dependent gain modulation rather than tuning shift. Finally, we analyzed reference frames using a direct rate comparison, without first fitting tuning curves (see Materials and Methods). These three analyses identified qualitatively similar modality, epoch, and area effects.
Tuning curve shift: example tuning curves
We fit a tuning curve to the response of each cell in a given modality and epoch (modality–epoch–cell) (see supplemental Fig. 4 for distribution of R2 values, available at www.jneurosci.org as supplemental material). These fits captured the reference frame of the response in terms of a single parameter, the tuning shift (δ; see Materials and Methods). A value of δ = 0 indicates hand- or body-centered coding, and δ = 1 indicates eye-centered coding. Importantly, however, shift is a continuous variable that can capture “intermediate” reference frames. When fitting the shift to the data, the value was allowed to range between −0.5 and 1.5, which encompassed almost all of the best-fit values for our data (see below). Estimates of the variability and bias in the best-fit values of δ are given in the supplemental data (supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
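To make the meaning of the shift parameter concrete, the following is a minimal sketch under stated assumptions. The actual model equations (Eqs. 1–3) are given in Materials and Methods and are not reproduced here. The sketch assumes a one-dimensional Gaussian tuning curve evaluated at the target position minus δ times the eye position, in arbitrary units, and it fits δ by grid search over the range −0.5 to 1.5 with all other parameters held at known values. The Gaussian form, the parameter names, and the grid search are all illustrative assumptions.

```python
import math

def shifted_rate(target_body, eye, delta, pref=0.0, width=20.0, gain=30.0, base=5.0):
    """Firing rate for a hypothetical Gaussian tuning curve with shift delta.

    delta = 0: rate depends only on body-centered target position.
    delta = 1: rate depends only on target position relative to the eye.
    """
    x = target_body - delta * eye
    return base + gain * math.exp(-0.5 * ((x - pref) / width) ** 2)

def fit_shift(targets, eyes, rates, step=0.05):
    """Grid search for delta in [-0.5, 1.5] minimizing squared error.

    Simplification: the other tuning parameters are assumed known.
    """
    best_delta, best_sse = None, float("inf")
    n_steps = int(round(2.0 / step))
    for i in range(n_steps + 1):
        delta = -0.5 + i * step
        sse = sum((shifted_rate(t, e, delta) - r) ** 2
                  for t, e, r in zip(targets, eyes, rates))
        if sse < best_sse:
            best_delta, best_sse = delta, sse
    return best_delta
```

In this parameterization, an intermediate value such as δ = 0.5 produces tuning curves that align in neither body-centered nor eye-centered coordinates, which is the sense in which shift can capture "intermediate" reference frames.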
Figure 5 shows two examples of this analysis. For the example cell from area 5, the VIS and VIS+PROP targets (Fig. 5A,C) have tuning curves for the two fixation points that are well aligned when plotted as a function of hand/body-centered reach endpoint, and the best-fit shift values are near zero. For PROP targets (Fig. 5E), the responses are not well aligned as a function of either hand/body-centered or eye-centered endpoint, and the shift value takes on an intermediate value. Intermediate tuning is seen for all three target modalities for the example cell from MIP (Fig. 5B,D,F). These examples illustrate three important features that are common across the dataset and that will be quantified below. First, we observe many cases in which responses are best described by intermediate shift values. Second, tuning curves are typically similar across modalities, in terms of both preferred directions and shift values. Nonetheless, we do observe small differences in shift across modalities, and these differences are sometimes significant (as in the example from area 5). Third, the shift values observed in area 5 are typically smaller than those observed in MIP. However, the neural populations in both cortical areas were highly heterogeneous. The supplemental data includes additional examples of “typical” cells from area 5 (supplemental Fig. 5, available at www.jneurosci.org as supplemental material) and MIP (supplemental Fig. 6, available at www.jneurosci.org as supplemental material), as well as examples of atypical response patterns (supplemental Fig. 7, available at www.jneurosci.org as supplemental material).
Tuning curve shift: comparison across target modality
We performed the shift analysis for all modality–epoch–cells. Only modality–epoch–cells that had a well-defined best shift (see Materials and Methods) were included in subsequent analyses of the shift parameter. Figure 6 shows pairwise comparisons of these shift values across target modality. We first asked whether, at a population level, tuning shift varies across target modalities for cells tuned in both modalities. One might expect that reference frame would change with target modality depending on the native reference frames of each modality. Specifically, reference frame shifts for VIS might be closer to δ = 1 (more eye-centered) and reference frame shifts for PROP might be closer to δ = 0 (more hand/body-centered). However, this expectation is clearly not met: in Figure 6A, neither the mean shift (p = 0.543, paired permutation test) nor the shift distribution (p = 0.150, Kolmogorov–Smirnov test) is different between the VIS and PROP modalities. The same results are obtained when the data are analyzed separately for each epoch and area (supplemental Table 1, available at www.jneurosci.org as supplemental material). Thus, there appears to be no difference at a population level between the shift values observed for the three target modalities, and, more specifically, the shift values do not reflect the reference frames in which the sensory information is naturally represented.
We next analyzed differences in shift value between target modality for individual epoch–cells. Across epoch–cells, large and highly significant correlations are observed in shift value between modalities (Spearman's correlation, p < 0.001 < 0.05/3 = 0.017 with Bonferroni's correction) (Fig. 6). For the majority of cases, the shift is not significantly different across target modality (open vs filled data points in Fig. 6; p = 0.05, no correction for multiple comparisons; see Materials and Methods), showing that reference frame is essentially conserved across modalities for individual cells. Nonetheless, a third of the epoch–cells in this analysis exhibited significant differences in shift between the VIS and PROP conditions, and we briefly consider the origin of these differences. We first note that the distribution of these differences is not significantly biased toward either side of the identity line (binomial test, p = 0.19). The distribution of differences in shift could be attributable to slight differences in the monkeys' behavior between VIS and PROP trials, coupled with heterogeneous tuning properties across cells. For example, passive movement of the target arm during proprioceptive reach trials could cause the animals to make small movements with that arm, which might then affect tuning curves in ipsilateral cortex (Chang et al., 2008). However, comparison of the VIS+PROP shifts with the unimodal values shows that any such movements could at most account for only small effects on shift (Fig. 6B,C). Specifically, if the differences were attributable to movement of the target arm, then the VIS and VIS+PROP comparisons should have differences similar to the VIS and PROP comparisons, whereas the PROP and VIS+PROP comparisons should be very similar. In fact, Figure 6, B and C, shows the opposite pattern, with VIS and VIS+PROP shifts showing the greatest similarity and the presence or absence of visual feedback having the largest effect on shift values.
These small unsystematic differences notwithstanding, the principal observation of Figure 6 is that the shift values are well correlated between modalities. In other words, cells use similar reference frames across target modalities.
Tuning curve shift: comparison of population histograms across modality, epoch, and area
The analyses above showed that there are no significant differences in shift distribution across modality when cells are tuned for the two modalities being compared. This leaves open the possibility that, if we include epoch–cells with responses that are tuned for only one of the VIS or PROP modalities, we might see shifts that are preferentially biased toward the reference frame of the current target modality. In fact, we found no significant differences in the mean (p > 0.3, permutation test) or distribution (p > 0.2, Kolmogorov–Smirnov test) of shifts across the three modalities in either area 5 or MIP (Fig. 7A,C). This again shows that the reference frame of neural responses is not biased toward the reference frame of the sensory input.
In contrast, there is a slight trend in both areas toward higher shift values in later behavioral epochs relative to the delay responses (Fig. 7B,D). This trend does not reach significance in area 5 (p = 0.049 > 0.05/6 = 0.008, delay vs reaction; p = 0.017 > 0.008, delay vs move, permutation test, Bonferroni's correction) but is significant in MIP (p < 0.001 < 0.008 delay vs move, permutation test, Bonferroni's correction).
The largest effect in Figure 7 is the difference in the distribution of shifts between area 5 and MIP (Fig. 7, compare top and bottom rows). Area differences in both mean (p < 0.001, permutation test) and distribution (p < 0.001, Kolmogorov–Smirnov test) are highly significant. Area 5 has a more hand/body-centered representation, whereas MIP has a more eye-centered representation (mean shift value of δ = 0.25 in area 5, δ = 0.51 in MIP). Note that the heterogeneity of tuning shifts in both areas is not simply attributable to experimental variability, corrupting our view of a “pure” reference frame representation (δ = 1 or 0). First, even accounting for the potential biases in our shift analyses, the distribution of tuning shifts is not at all what we would expect for pure representations (compare Fig. 7 and supplemental Fig. 1, available at www.jneurosci.org as supplemental material). Second, the distribution of shift values observed in these areas is reflected in the information that can be read out from the respective populations (supplemental Fig. 8, available at www.jneurosci.org as supplemental material). Specifically, when the cells from area 5 and MIP were separately used to decode target position in either hand/body- or eye-centered reference frames, we found that area 5 contained more information about body-centered targets, whereas MIP performed similarly for both target reference frames. Thus, although the representations are clearly different across the two areas, neither area is well described by a single reference frame. The neural populations were highly heterogeneous with considerable overlap in the distribution of shift values between areas.
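The two population-level comparisons used throughout this section, a difference in mean shift and a difference in shift distribution, can be sketched as follows. This is an illustrative sketch rather than the study's analysis code. It computes the two-sample Kolmogorov–Smirnov statistic directly and, as a simplification, obtains p values for both statistics by shuffling group labels instead of using the analytical Kolmogorov–Smirnov distribution. All function names and the number of permutations are assumptions.

```python
import random

def ks_statistic(a, b):
    """Largest vertical distance between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a + b)):
        cdf_a = sum(1 for x in a if x <= v) / len(a)
        cdf_b = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def mean_difference(a, b):
    """Absolute difference between the two sample means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

def label_shuffle_p(a, b, stat, n_perm=2000, seed=0):
    """Permutation p value for stat(a, b) under random reassignment of labels."""
    rng = random.Random(seed)
    observed = stat(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here `a` and `b` would be the best-fit shift values from two groups of modality–epoch–cells, for example those recorded in area 5 and those recorded in MIP.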
The majority of modality–epoch–cells shown in Figures 6 and 7 have shift values that fall between or are not significantly different than 0 or 1. However, a fraction of cases exhibit shifts that are significantly outside that range (69 of 908 tuning shifts; 5.8% in area 5, 8.8% in MIP) (supplemental Fig. 7B, available at www.jneurosci.org as supplemental material). Although many of these cases may be attributable to experimental variability (supplemental Fig. 1, available at www.jneurosci.org as supplemental material), it is likely that two other factors play a role. Some of these cells could represent “complex interactions” between eye position and reach endpoint (Mullette-Gillman et al., 2005, 2009), meaning that the response is not well described by a shift-and-gain model and would not be expected to yield a shift between 0 and 1. A simpler alternative explanation for these extreme shift values is that the eye-position effects for these cells principally take the form of gain modulation. In the present analysis, the tuning shift parameter was given preference over the gain parameter (we fit and compared Eqs. 1 and 3; see Materials and Methods). In this case, there is no a priori reason why the true gain changes should project to a shift value between 0 and 1. Indeed, many of our cells do not exhibit sufficient curvature in their tuning to permit a clear distinction between reference-frame shift and gain modulation. Although we cannot fully disambiguate gain and shift effects, in the next section we ask whether any potential misidentification of these effects would have led to different conclusions.
Tuning curve gain modulation
In the analysis above, we found that target modality had no net effect on shift values, suggesting that the eye-position effects are essentially modality independent. To ensure that our conclusions were not biased by focusing on tuning shift, we repeated the analysis above with a focus on gain modulation (fitting and comparing Eqs. 2 and 3; see Materials and Methods). The results of this analysis are shown in supplemental Figure 9 (available at www.jneurosci.org as supplemental material). We found that, when eye-position effects are measured as gain modulation, gain does not appear to depend on target modality. Furthermore, nearly identical percentages of modality–epoch–cells exhibit significant differences across target modalities, as seen for the shift analysis. The gain parameters were highly correlated between modalities, and the correlation coefficients were nearly identical to those obtained in the shift analysis. Thus, our main conclusions are independent of the model we use to account for eye-position effects.
Direct rate comparison of reference frames
Many researchers have attempted to determine the reference frame of neural responses by comparing firing rates across trial pairs that are the same in one reference frame but not others (Batista et al., 1999; Cohen and Andersen, 2000; Buneo et al., 2002; Mullette-Gillman et al., 2005, 2008). This approach provides a means of characterizing reference frame without having to fit a particular tuning curve model. To ensure that our results are not dependent on tuning curve fit, we compared the results of our tuning curve shift analysis to those obtained by direct rate comparisons. The results of the two analyses were in good agreement for the modality–epoch–cells in which both analyses successfully assigned a reference frame (supplemental Fig. 10, available at www.jneurosci.org as supplemental material). The direct rate comparison also yields results consistent with the tuning curve analyses on the dependence of reference frame on target modality, behavioral epoch, and cortical area. There was no significant difference in the direct rate reference frame across modalities (p = 0.806, Kruskal–Wallis test), and there were significant differences in reference frame across behavioral epochs (more eye-centered coding in later epochs, p < 0.001 < 0.05/3 = 0.017, Kruskal–Wallis test, Bonferroni's correction) and between area 5 and MIP (more eye-centered coding in MIP, p < 0.001 < 0.017, Kruskal–Wallis test, Bonferroni's correction). Thus, the main findings of this study are supported by both approaches to characterizing neural reference frames.
Tuning curve shape compared across modalities
We showed above that reference frames are typically independent of target modality; here we ask the more general question of whether the overall tuning curve shape is also independent of modality. The majority of cells we recorded had similar tuning curves and preferred directions across target modality (see the examples in Fig. 4 and supplemental Figs. 5, 6, available at www.jneurosci.org as supplemental material). However, counterexamples were also observed (supplemental Fig. 7C, available at www.jneurosci.org as supplemental material). As a quantitative measure of the similarity in the shape and preferred direction of tuning curves in VIS and PROP trials, we computed the correlation between these modalities in mean firing rates across target and eye position for each epoch–cell (Mullette-Gillman et al., 2005). In both area 5 and MIP, these correlations were mostly strong and positive (Fig. 8). To interpret the distribution of correlation coefficients, we compare it against two limiting cases. First, to estimate the distribution we would expect if tuning curves were completely unrelated across modalities, we computed correlations between pairs of modality–epoch–cells randomly selected from the dataset. The resulting distributions (Fig. 8, red histograms) are quite different from the actual data, which are clustered toward large, positive values. Second, to estimate the distribution we would expect if the tuning curves were identical across modalities (modulo a gain factor or offset term), we computed for each modality–epoch–cell the correlation between two tuning curves obtained by randomly dividing the trials into two halves. The resulting distributions (Fig. 8, blue histograms) are similar to those obtained from the empirical data. This indicates that tuning curve shape is primarily conserved across target modality for most cells in both area 5 and MIP.
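The correlation measure and its split-half control can be sketched as below. This is a minimal illustration under stated assumptions, not the code used in the study: trials are assumed to be stored as a dictionary that maps each condition (a target and eye position pair) to a list of firing rates, the split into halves is performed within each condition, and all names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def condition_means(trials):
    """Mean rate per condition, in a fixed (sorted) condition order."""
    return [sum(trials[c]) / len(trials[c]) for c in sorted(trials)]

def cross_modality_correlation(trials_vis, trials_prop):
    """Correlation of mean rates across conditions between two modalities."""
    return pearson(condition_means(trials_vis), condition_means(trials_prop))

def split_half_correlation(trials, rng):
    """Correlation between tuning curves built from two random halves of the trials."""
    first, second = {}, {}
    for cond, rates in trials.items():
        shuffled = list(rates)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first[cond] = shuffled[:half]
        second[cond] = shuffled[half:]
    return pearson(condition_means(first), condition_means(second))
```

Because the Pearson correlation is unchanged by a positive gain factor or an additive offset, two tuning curves that differ only in those terms yield a correlation of 1, which is why the split-half distribution serves as the "identical tuning" limiting case.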
Encoding multiple movement parameters
Up to this point, we have focused exclusively on the representation of target position. However, neural responses in the SPL are also known to encode other parameters about the movement, such as initial hand position and movement vector, although the extent of these effects and their representations are a matter of ongoing debate (Lacquaniti et al., 1995; Buneo et al., 2002; Chang et al., 2009; Ferraina et al., 2009; Chang and Snyder, 2010). Here, we ask whether including these variables affects our conclusions about neural reference frames in area 5 and MIP.
To determine whether cells are better characterized as encoding target or movement vector and in which reference frame, we added additional start locations to the list of task conditions for a subset of recording sessions (see Materials and Methods). We collected these extended datasets for 87 cells in area 5 (20 for monkey C, 67 for monkey E) and 57 cells in MIP (12 for monkey C, 45 for monkey E). Of these, 62 area 5 cells and 51 MIP cells were significantly tuned (p < 0.05/9 = 0.006, ANOVA, Bonferroni's correction across epochs and modalities).
We characterized each of these modality–epoch–cells using a direct rate comparison. For this analysis, three sets of rate comparisons were performed, each on pairs of conditions that were spatially matched for a given reference frame or variable: same body-centered movement vector but different eye position, same eye-centered movement vector but different location with respect to body, and same target (with respect to both eye and body) but different start positions. The responses of cells were then categorized based on the comparison or set of comparisons that were rejected because putatively matched pairs had significantly different rates (Fig. 3) (see Materials and Methods). A portion of modality–epoch–cells could not be characterized with this analysis (Fig. 9A), because either none of the candidate reference frame comparisons were rejected (insufficient power, 28%) or all comparisons were rejected (incorrect model, 3%). However, for most modality–epoch–cells, only one or two of the comparisons were rejected, and so responses could be identified with one of six candidate reference frames. Three of these reference frames encode the movement vector of the reach, conditioned on the body-centered or eye-centered location of the movement or independent of both (a “hand-centered” movement vector representation). The remaining three reference frames encoded target position independent of initial hand position in body-centered, eye-centered, or mixed eye and body reference frames (response depends on both eye and body coordinates).
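The categorization logic can be illustrated with a small lookup, with an important caveat: the authoritative mapping from rejected comparisons to reference frames is defined in Figure 3 and Materials and Methods, which are not reproduced here. The mapping below is one plausible reading inferred from the description above, and the function name, argument names, and category labels are hypothetical.

```python
def classify_response(eye_rejected, body_rejected, start_rejected):
    """Map a pattern of rejected rate comparisons to a candidate category.

    ASSUMED mapping, inferred from the text rather than taken from Fig. 3.
    eye_rejected:   rates differ for the same body-centered movement vector
                    at different eye positions.
    body_rejected:  rates differ for the same eye-centered movement vector
                    at different locations with respect to the body.
    start_rejected: rates differ for the same target at different start positions.
    """
    if not (eye_rejected or body_rejected or start_rejected):
        return "uncharacterized: insufficient power"
    if eye_rejected and body_rejected and start_rejected:
        return "uncharacterized: incorrect model"
    if start_rejected:
        # Dependence on start position is read here as movement vector coding.
        if eye_rejected:
            return "movement vector, eye-centered location"
        if body_rejected:
            return "movement vector, body-centered location"
        return "movement vector, hand-centered"
    if eye_rejected and body_rejected:
        return "target, mixed eye and body"
    if eye_rejected:
        return "target, eye-centered"
    return "target, body-centered"
```

Three binary outcomes give eight possible patterns, which matches the six candidate reference frames plus the two uncharacterized cases described in the text.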
We first focus on the coarse comparison between movement vector and target coding. There are no significant differences between the target modalities in the distribution of modality–epoch–cells that encode movement vector versus target (Fig. 9B) (p = 0.204, χ2 test). A comparison of coding across trial epochs shows a trend toward increasing target coding and decreasing movement vector coding later in the trial (p = 0.006 < 0.05/6 = 0.008, χ2 test, Bonferroni's correction). The finer six-category analysis (supplemental Fig. 11, available at www.jneurosci.org as supplemental material) indicates that the largest component of this increase in target coding is an increased eye-centered target, a trend also observed in the tuning shift (Fig. 7B,D) and direct rate comparison analyses above.
The most striking difference in the distribution of reference frame categories is between cortical areas. Area 5 has a tendency toward more movement vector coding, whereas MIP shows more target coding (Fig. 9D). Furthermore, the finer six-category analysis shows that area 5 has a distinct peak at hand-centered movement vector coding, whereas MIP has a distinct peak at eye-centered target coding (Fig. 9E). The difference between areas is significant for both the six-category comparison and the coarser movement vector versus target comparison (p < 0.001 < 0.008, χ2 test, Bonferroni's correction). The distribution of responses across area 5 and MIP is consistent with the reference frame analyses presented above in that area 5 shows more hand/body-centered coding and MIP shows more mixed and eye-centered coding. However, even with the rough categorizations used in Figure 9, both areas show a range of reference frame representations. Thus, any attempt to define the coding of these areas in a single reference frame will miss important properties of the population responses.
Discussion
The main purpose of this study was to investigate whether reference frames in the SPL depend on sensory modality, as seen in other cortical areas (Jay and Sparks, 1987; Stricanne et al., 1996; Avillac et al., 2005; Mullette-Gillman et al., 2005; Fetsch et al., 2007). To investigate this possibility, we compared reaches to visual targets and proprioceptive targets (the ipsilateral hand) in two parietal areas, nominally area 5 and MIP, which are involved in sensory integration for movement planning. We found that, although reach representations differ between these two cortical areas, they do not appear to depend on the sensory modality used to specify the movement in either area.
We found that neural representations of reaching in area 5 and MIP are not consistent with a simple reference frame framework: responses are heterogeneous in both areas, and although most activity patterns are described by a gaze-dependent tuning curve, they typically do not fall into the neat categories of eye- or body-centered tuning. Our results are grossly consistent with previous findings that area 5 exhibits primarily body-centered (Lacquaniti et al., 1995) or hand-centered (Ferraina et al., 2009) coding, whereas activity within the IPS exhibits mixed or intermediate eye and hand (Chang and Snyder, 2010) or eye- and head/body-centered (Mullette-Gillman et al., 2005, 2009) coding. Cells in the IPS have also been shown to have gain fields for eye and hand position (Chang et al., 2009), suggesting similar mixed coding schemes. Other studies have claimed that reach-related areas in the SPL use single, consistent reference frames: an eye-centered target representation for the parietal reach region (PRR) (which includes part of MIP) (Batista et al., 1999) and an eye-centered movement vector representation for area 5 (Buneo et al., 2002). Although we do find a plurality of eye-centered target encoding cells in MIP (Fig. 9E) and a majority of cells with movement vector coding in area 5 (although most do not have eye-centered movement vector coding) (Fig. 9D,E), our results provide strong counterevidence for claims of single consistent reference frames within these cortical areas. The divergence between our observations and those of the previous papers is likely attributable to the overly regimented characterization of neural responses by previous studies, which emphasized the identification of a best reference frame. As we showed here for area 5 and MIP and as Chang and Snyder (2010) showed for PRR, when a more suitable, continuous measure of reference frame is used, a wide distribution of responses is seen.
There is an increasing body of evidence that such mixed or intermediate reference frames are used throughout the motor system and that there is a gradient of representations across areas within the cortical reach circuit (Lacquaniti et al., 1995; Colby and Duhamel, 1996; Duhamel et al., 1997; Kalaska et al., 1997; Caminiti et al., 1998; Burnod et al., 1999; Graziano, 2001; Kakei et al., 2001, 2003; Mullette-Gillman et al., 2005, 2009; Pesaran et al., 2006; Wu and Hatsopoulos, 2006; Batista et al., 2007; Wu and Hatsopoulos, 2007; Ferraina et al., 2009; Chang and Snyder, 2010). In particular, there appears to be a continuous gradient in the distribution of reference frames across the SPL (Burnod et al., 1999), with broadly eye-centered coding in the posterior IPS (PRR) (Chang and Snyder, 2010), intermediate eye and body coding further forward in the IPS (MIP), and more hand/body-centered coding on the dorsal surface (area 5). Indeed, a distribution of reference frames may provide some computational benefit for motor planning, for example, reducing variability (McGuire and Sabes, 2009). An explanation for the absence of pure reference frames comes from modeling studies showing that mixed representations arise naturally in networks of neurons performing reference frame transformations (Deneve et al., 2001; Avillac et al., 2005). Thus, although simple discrete reference frames are an attractive hypothesis, they do not appear to be used in the cortical reach circuit, and from a theoretical standpoint they may not be necessary for the reliable encoding of movement variables.
We found that the distribution of representations shifted slightly over the course of movement planning and execution, with trends toward more target coding, especially eye-centered target coding, in later behavioral epochs. Although two previous studies saw no change in reference frame during reach planning and initiation (Buneo et al., 2008; Chang and Snyder, 2010), these studies did not include an analysis of activity during movement, in which we see the biggest change. The epoch effect that we observed could be an artifact attributable to the saccade to the target location that animals were permitted to make after the 200 ms post-reach hold period. Alternatively, these changes might reflect the fact that the initial hand position becomes less relevant as the hand approaches the target, or might reflect the change from movement planning to feedback control of movement in these areas (Desmurget et al., 1999; Desmurget and Grafton, 2000; Mulliken et al., 2008; Archambault et al., 2009). Whatever the underlying cause, the differences seen across epochs were small compared with the broad distribution of representations seen in area 5 and MIP.
Our finding that reference frame is independent of sensory modality is consistent with the results of two previous studies (Groh and Sparks, 1996; Cohen and Andersen, 2000). However, these observations are difficult to reconcile with a number of other studies showing that representations are skewed toward the reference frame of the incoming sensory information (Jay and Sparks, 1987; Stricanne et al., 1996; Avillac et al., 2005; Mullette-Gillman et al., 2005; Fetsch et al., 2007). These studies all focus on spatial representations that are critical for movement, and so there appears to be no general rule about the modality dependence of spatial representation for movement planning. However, one common feature among the studies showing modality-independent representation is that they all involve proprioceptive localization of the arm, either to move it (Cohen and Andersen, 2000), as a movement target (Groh and Sparks, 1996), or both (our study).
We argue that, for the SPL, the utility of a modality-independent representation may be related to two of the principal functions of the region: eye–hand coordination and sensory integration. It has been shown previously that neurons in the SPL have common preferred tuning directions for eye and hand movements, which may represent a strategy for coordinating the movements of multiple effectors in space (Burnod et al., 1999; Battaglia-Mayer et al., 2003; Mascaro et al., 2003). This “global tuning” is conceptually similar to the modality-independent representations that we observed and suggests that common representations may be important in areas involved in coordinating movements in space. The SPL is also important for integrating vision and proprioception of the hand (Graziano et al., 2000) and for maintaining calibration between these sensory inputs (Clower et al., 1996). It has been suggested that integrating multimodal information for movement planning requires a common representation (Cohen and Andersen, 2002; Stein and Stanford, 2008), and it has been shown that neurons in the dorsal medial superior temporal area that exhibit modality-independent tuning are more likely to contribute to multimodal behavior (Gu et al., 2008). We have argued previously that modality-independent representations in multiple reference frames are beneficial for sensory integration during reach planning (McGuire and Sabes, 2009). Thus, the key benefit of common effector- and modality-independent representations may be in the ability for downstream areas to use a common readout of this information independent of the sensory origin or intended use.
Finally, we have made frequent use of the term “reference frame,” because it is standard in the literature. However, we do not find this term to be clarifying. From a physiological perspective, it evokes the idea of a canonical eye- or body-centered representation, something that is rarely seen in higher sensorimotor areas. From a mathematical standpoint, the term suggests that the various representations differ in a trivial manner, by nothing more than a “rotation.” Ultimately, the term conflates two separable issues: the sensory information that is contained in a representation and the functional form of that representation. The principal aim of this work was to study the former: the difference between eye- and body-centered representations is whether eye-position information is integrated, and the difference between target and movement vector coding is whether initial hand position is also encoded. As more information is added to a representation, the representational demand grows exponentially, yet cortical sensorimotor circuits are able to integrate many streams of information and use this information to plan and execute movements. The finding of common, modality-independent representations for reaching may be an important clue for understanding this ability.
Footnotes
- This work was supported by National Eye Institute Grants R01 EY015679 and T32-EY007120, National Institute of Mental Health Conte Center Grant P50-MH-077970, Whitehall Foundation Grant 2004-08-81-APL, and a National Science Foundation Graduate Fellowship (L.M.M.M.).
- Correspondence should be addressed to Philip N. Sabes, Department of Physiology, 513 Parnassus Avenue, University of California, San Francisco, San Francisco, CA 94143-0444. sabes{at}phy.ucsf.edu