Abstract
Neural networks of the brain involved in the planning and execution of grasping movements are not fully understood. The network formed by macaque anterior intraparietal area (AIP) and hand area (F5) of the ventral premotor cortex is implicated strongly in the generation of grasping movements. However, the differential role of each area in this frontoparietal network is unclear. We recorded spiking activity from many electrodes in parallel in AIP and F5 while three macaque monkeys (Macaca mulatta) performed a delayed grasping task. By analyzing neural population activity during action preparation, we found that state space analysis of simultaneously recorded units is significantly more predictive of subsequent reaction times (RTs) than traditional methods. Furthermore, because we observed a wide variety of individual unit characteristics, we developed the sign-corrected average rate (SCAR) method of neural population averaging. The SCAR method was able to explain at least as much variance in RT overall as state space methods. Overall, F5 activity predicted RT (18% variance explained) significantly better than AIP (6%). The SCAR methods provides a straightforward interpretation of population activity, although other state space methods could provide richer descriptions of population dynamics. Together, these results lend support to the differential role of the parietal and frontal cortices in preparation for grasping, suggesting that variability in preparatory activity in F5 has a more potent effect on trial-to-trial RT variability than AIP.
SIGNIFICANCE STATEMENT Grasping movements are planned before they are executed, but how is the preparatory activity in a population of neurons related to the subsequent reaction time (RT)? A population analysis of the activity of many neurons recorded in parallel in macaque premotor (F5) and parietal (AIP) cortices during a delayed grasping task revealed that preparatory activity in F5 could explain a threefold larger fraction of variability in trial-to-trial RT than AIP. These striking differences lend additional support to a differential role of the parietal and premotor cortices in grasp movement preparation, suggesting that F5 has a more direct influence on trial-to-trial variability and movement timing, whereas AIP might be more closely linked to overall movement intentions.
Introduction
In the sport of fencing, rapid actions are required on the millisecond scale. Small rotations of the wrist can make the difference between a hit and a complete miss. The response of athletes to various attacks is highly variable, despite the rigorously trained nature of their skill set. What are the factors that contribute to the variability of such complex actions? It is known that voluntary movements are prepared before they are executed (Kutas and Donchin, 1974; Wise, 1985; Ghez et al., 1997). A benefit of longer preparation is a reduction in reaction times (RTs), which is the time between a go signal and the initiation of a movement (Rosenbaum, 1980; Riehle and Requin, 1989). Nevertheless, RT varies even for similar amounts of preparation.
The ideal design for studying motor preparation is the delayed reaching task, in which a movement must be planned and withheld for a certain time. Studies have shown that population activity of neurons in the dorsal premotor cortex (PMd) of the primate brain, recorded either sequentially (Riehle and Requin, 1993) or in parallel (Churchland et al., 2006c; Afshar et al., 2011; Churchland, 2015), can explain a large portion of the variability in reach RT and reach velocity (Churchland et al., 2006a,b). Similar results have been obtained using sequential recordings in the parietal reach region (Snyder et al., 2006) and lateral intraparietal area (Janssen and Shadlen, 2005). However, a comparative study of the frontoparietal network has not been undertaken.
To analyze RT variability, an understanding of preparatory activity is vital. A number of models have been proposed to explain how preparation of movement is undertaken. Initial models related the preparatory activity of single neurons to behavior by suggesting that subpopulations of neurons may hold activity near a movement threshold that is crossed to initiate movement (Riehle and Requin, 1993; Hanes and Schall, 1996; Erlhagen and Schöner, 2002), whereas more recent models use a state space framework of population activity. In the latter framework, the firing of each neuron represents a dimension in a high-dimensional space of all neurons. Hence, the firing of all neurons at a particular time represents a single point in the state space, de-emphasizing the importance of tuning properties of individual neurons (Fetz, 1992). The “optimal subspace” hypothesis posits that a preparatory state is achieved during planning and that deviations from this state may delay movement (Churchland et al., 2006c; Churchland and Shenoy, 2007a). The “initial condition hypothesis” augmented this view by further stating that trial-to-trial fluctuations in the neural trajectory are correlated with RT (Afshar et al., 2011).
To elucidate the role of the frontoparietal network in preparation, the established hand grasping circuit (Luppino et al., 1999) consisting of the hand area (F5) of the ventral premotor cortex (PMv) and the anterior intraparietal area (AIP) were investigated using a delayed grasping task. Neural activity in these areas is modulated strongly by visual object properties (Murata et al., 1997, 2000), extrinsic goals (Kakei et al., 2001), performed grip types (Baumann et al., 2009; Fluet et al., 2010), and preparatory activity in these areas can be used to decode the visual properties of objects and complex hand shapes required to grasp a diverse range of objects (Carpaneto et al., 2011; Townsend et al., 2011; Schaffelhofer et al., 2015).
In the current study, we analyzed population activity in a delayed grasping task with multiple grip types to evaluate how population activity of simultaneously recorded units in F5 and AIP might inform subsequent behavior. Preparatory activity in F5 could explain up to 18% of the variability in trial-to-trial RT, a significant finding, whereas AIP could explain only up to 6%. By demonstrating a significant advantage of F5 over AIP in RT prediction, our results support the concept that the encoding of RT is represented primarily in the frontal and not the parietal lobe, at least when grasping in the dark.
Materials and Methods
Basic procedures.
Neural activity was recorded simultaneously from area F5 and area AIP in one male and two female rhesus macaque monkeys (Macaca mulatta, animals B, S, and Z; body weight, 11.2, 9.7, and 7.0 kg, respectively). Animal care and all experimental procedures were conducted in accordance with German and European law and were in agreement with the Guidelines for the Care and Use of Mammals in Neuroscience and Behavioral Research (National Research Council, 2003).
Basic experimental methods have been described previously (Townsend et al., 2011; Schaffelhofer et al., 2015). We trained animals to perform a delayed grasping task. They were seated in a primate chair and trained to grasp a handle with the left hand (animals B and Z) or the right hand (animal S; Fig. 1D). This handle was placed in front of the monkey at chest level and in the vertical position at a distance of ∼26 cm, i.e., the monkeys had to reach a distance of 26 cm to grasp the handle. The handle could be grasped either with a power grip (opposition of fingers and palm) or precision grip (opposition of index finger and thumb; Fig. 1E). Two clearly visible recessions on either side of the handle contained touch sensors that detected thumb and forefinger contact during precision grips, whereas power grips were detected using an infrared light barrier inside the handle aperture. The monkey was instructed which grip type to make by means of two colored LED-like light dots projected from a thin-film transistor (TFT) screen (CTF846-A; screen size, 8 inches, digital; resolution, 800 × 600; refresh rate, 75 Hz) onto the center of the handle via a half mirror positioned between the animal's eyes and the target. A mask preventing a direct view of the image was placed in front of the TFT screen and two spotlights placed on either side could illuminate the handle. Apart from these light sources, the experimental room was completely dark. In addition, one or two capacitive touch sensors (model EC3016NPAPL; Carlo Gavazzi) were placed at the level of the animals' midtorso and functioned as hand-rest buttons. The non-acting arm of animals B and Z were placed in a long tube, preventing it from interacting with the handle. Monkey S was trained to keep its non-acting hand on an additional hand-rest button.
Eye movements were measured using an infrared optical eye tracker (model AA-ETL-200; ISCAN) via a heat mirror directly in front of the monkey's head. To adjust the gain and offset, red calibration dots were shown at different locations at the beginning of each session for 25 trials that the animal fixated for at least 2 s.
Eye tracking and the behavioral task were controlled by custom-written software implemented in LabView Realtime (National Instruments) with a time resolution of 1 ms. An infrared camera was used to monitor behavior continuously throughout the entire experiment.
Behavioral paradigm.
Animals B and S performed Task 1 (Fig. 1E), whereas animal Z performed Task 2 (Fig. 1F). The following is an explanation of the trial course of Task 1. Trials started after the monkey placed the acting hand on the resting position and fixated a red dot (fixation period). The animal was required to keep the acting hand, or both hands (animal S), completely still on the resting position until after the go cue. After 400–700 ms, two flashlights illuminated the handle for 300 ms, followed by 600 ms of additional fixation. In the cue period, a second light dot was then shown next to the red one to instruct the monkey about the grip type for this trial (grip cue). Either a green or white dot appeared for 300 ms, indicating a power or a precision grip, respectively. After that, the monkey had to memorize the instruction for a variable memory period. This memory period lasted for 0–1300 ms (i.e., the go cue could appear simultaneously with the grip cue), in discrete memory period bins of 0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, or 1300 ms, which were pseudorandomly sampled with an equal number of trials from each condition. Regardless of memory period length, the grip cue was always shown for 300 ms. Switching off the fixation light then cued the monkey to reach and grasp the target (movement period) to receive a liquid reward. Animals were required to hold the appropriate grip for 300 ms. Additionally, catch trials were interleaved randomly (∼8% of trials), in which a go cue was never shown and the animal only received a reward if it maintained fixation and the hands on the hand rests for 2000 ms after the grip cue. All trials were interleaved randomly and in total darkness.
The differences between Task 1 and Task 2 are as follows. In Task 2, there was only one fixation period that lasted for 600–1000 ms. In Task 2, the illumination of the handle took place at the time of grip cue. In the instructed version of Task 2, the grip cues were identical to Task 1. In the free-choice version, both a green and white dot appeared simultaneously, indicating that the monkey was free to choose between the two grip types. This was followed by a memory period lasting 400–600 ms, and then either the green or white dot reappeared for 300 ms in 50% of all free-choice trials, which turned the free-choice task into a delayed-instructed task and was followed by a second memory period (duration, 400–600 ms). In all other trials (instructed or free choice), only the red fixation dot was shown during the second cue period, making it impossible to distinguish the first and second memory periods. The hold period in Task 2 was 200 ms as opposed to 300 ms in Task 1. Importantly, during free-choice trials, the reward was reduced every time the monkey repeatedly chose the same grip type.
Surgical procedures and imaging.
After completion of behavioral training, each animal received an MRI scan to locate anatomical landmarks for subsequent chronic implantation of microelectrode arrays. Each monkey was sedated (e.g., 10 mg/kg ketamine and 0.5 mg/kg xylazine, i.m.) and placed in the scanner (GE Healthcare 1.5T or Siemens Trio 3T) in a prone position. T1-weighted volumetric images of the brain and skull were obtained as described previously (Baumann et al., 2009). We measured the stereotaxic location of the arcuate and intraparietal sulci to guide placement of the electrode arrays.
Chronic electrode implantation.
An initial surgery was performed to implant a head post (titanium cylinder; diameter, 18 mm). After recovery from this procedure and subsequent training of the task in the head-fixed condition, each animal was implanted with floating microelectrode arrays (FMAs; MicroProbes for Life Science) in a separate procedure. Animal S was implanted with 32 electrode FMAs and received two arrays in each area (Fig. 1B). The arcuate sulcus of animal S did not present a spur, but in the MRI a small indentation was visible in the posterior bank, ∼2 mm medial to the knee, which we treated as the spur. We placed both anterior FMAs lateral to that mark. Animals B and Z were implanted with six electrode arrays in the right hemisphere, each with 32 electrodes (Fig. 1A,C). Two such arrays were implanted in area F5, two in area AIP, and two in the primary motor cortex (M1). FMAs consisted of nonmoveable monopolar platinum–iridium electrodes with initial impedances ranging from 300 to 600 kΩ at 1 kHz measured before implantation. Post-implantation measurements in the first months after implantation confirmed these values in vivo. Lengths of electrodes were 1.5–7.1 mm.
All surgical procedures were performed under sterile conditions and general anesthesia (e.g., induction with 10 mg/kg ketamine, i.m., and 0.05 mg/kg atropine, s.c., followed by intubation, 1–2% isoflurane, and analgesia with 0.01 mg/kg buprenorphene, s.c.). Heart and respiration rates, electrocardiogram, oxygen saturation, and body temperature were monitored continuously, and systemic antibiotics and analgesics were administered for several days after each surgery. To prevent brain swelling while the dura was open, the animal was hyperventilated mildly (end-tidal CO2, ∼30 mmHg), and mannitol was kept at hand. Animals were allowed to recover fully (∼2 weeks) before behavioral training or recording experiments commenced.
Neural recordings and spike sorting.
Signals from the implanted arrays were amplified and stored digitally using a 128 channel recording system (sampling rate, 30 kS/s; 0.6–7500 Hz hardware filter; Cerebus; Blackrock Microsystems). Data were first filtered using a median filter (window length, 3 ms), and the result was subtracted from the raw signal. Afterward, the signal was low-pass filtered with a causal Butterworth filter (5000 Hz; fourth order). To eliminate movement noise (i.e., common component induced by reference and ground), principal component analysis (PCA) artifact cancellation was applied for all electrodes of each array (as described by Musial et al., 2002). To ensure that no individual channels were eliminated, PCA dimensions with any coefficient >0.36 (with respect to normalized data) were retained. Spike waveforms were extracted and semiautomatically sorted using a modified version of the offline spike sorter Wave_clus (Quiroga et al., 2004; Kraskov et al., 2009).
Unit isolation was evaluated using four criteria: (1) the absence of short (1–2 ms) intervals in the interspike interval histogram for single units, (2) the degree of homogeneity of the detected spike waveforms, (3) the separation of waveform clusters in the projection of the first 17 features detected by Wave_clus, and (4) the uniqueness of the shape of the interspike interval distribution.
After the semiautomatic sorting process, redetection of the average waveforms (templates) was done to detect overlaid waveforms (Gozani and Miller, 1994). Filtered signals were convolved with the templates starting with the biggest waveform. Independently for each template, redetection and resorting was run automatically using a linear classifier function (MATLAB function classify). After the identification of the target template, the shift-corrected template (achieved by up and down sampling) was subtracted from the filtered signal of the corresponding channel to reduce artifacts for detection of the next template. This procedure allowed a detection of templates up to an overlap of 0.2 ms. As a control, unit isolation was evaluated again as described previously to determine the final classification of all units into single units or multiunits. In case of ambiguity, a unit was not classified as single. Stationarity of firing rate was checked for all units, and, in case the firing rate was not stable over the entire recording period (>30% change in firing rate between the first 10 min and the last 10 min of recording), the unit was excluded from additional analyses (<3% of all single units).
Data preprocessing.
In all datasets trials with outlying RTs, >700 ms in Task 1 and >500 ms in Task 2 and <200 ms in either task were excluded. In animals B and S, these trials comprised <1% of the data and <3% in animal Z. Clearly, all animals were careful to wait for the appropriate go cue and did not act preemptively. We used this conservative check on outlier RTs to safely exclude the possibility that animals were acting in anticipation of the go cue.
Crucially, for all analyses of Task 1, trials with memory periods <500 ms were excluded from analysis. These short memory period trials were removed to ensure that animals had sufficient time to fully plan the movement before acting. Such an exclusion criteria was not used in Task 2, because the animal never had <700 ms to plan (delayed-instructed condition) and was trained for many months to acquire this timing scheme.
All recorded units (single unit and multiunit) were used in our main analyses. After spike sorting, spike events were binned in overlapping 100 ms windows and sampled every millisecond to produce a continuous firing rate signal (1 kHz). This means that firing rates at the time of the go cue considered spikes occurring 50 ms before to 50 ms after the go cue. Because it is unlikely that (sensory) responses to the go cue would be represented in AIP or F5 already at 50 ms after presentation, we believe this binning does not bias the predictive power of RTs. In fact, our conclusions do not change when using a binning that does not extend beyond the go cue (data not shown).
Dimensionality reduction.
Dimensionality reduction was performed for the purposes of visualization only. All quantitative analyses relied on the full dimensionality of the data. Gaussian-process factor analysis (GPFA) was performed on the neural data from cue presentation to movement onset (Yu et al., 2009). This method performs smoothing of spike trains and dimensionality reduction simultaneously within a common probabilistic framework. It assumes that the activity of each unit is a linear function (plus noise) of a low-dimensional neural state whose evolution in time is well described by a Gaussian process. This methods allows for better visualization on the single-trial level than other published methods (Yu et al., 2009). The data were reduced to 12 dimensions (the optimal number of latent dimensions in the data as determined by cross-validation) using 20 ms nonoverlapping spike bins to produce the trajectories in Figure 3A. In this reduction, the three displayed dimensions explain 63% of the total variance. In this figure, a rotation of the first three latent dimensions is shown (equivalent to a linear combination of the three dimensions explaining the most variance overall).
Similarly, neural trajectories in Figure 9 were generated by performing PCA on the peristimulus time histograms of all units for each grasp condition separately. All individual trials were then transformed into the two principal components explaining the most variance and binned into slow, medium, and fast RTs. All trials were aligned to the go cue and plotted from 350 ms before to 280 ms after the go cue.
Projection methods.
As can be seen in Figure 3A, trials of the same condition tend to follow a stereotypical trajectory through neural space. Following the study by Afshar et al. (2011), we reasoned that the farther the neural state had advanced along the mean neural path at the time of the go cue would be predictive of subsequent performance. To test this hypothesis, we projected neural activity of individual trials at the go cue on the mean neural trajectory of similar trials (excluding the tested trial of the same condition). The projection is denoted in Figure 3B with the symbol α. The vector formed between the mean firing rate at the go cue and the firing rate at the go cue of an individual trial is projected onto the vector between the mean firing at go and the mean firing at go ± some Δt. The data were tested empirically to determine the optimal Δt values over all datasets. Selected Δt values ranged from 300 ms before to 300 ms after the go cue.
Additionally, as depicted in Figure 3D, the instantaneous velocity of the neural data, [tgo − (tgo − 20)], in the high-dimensional neural space of individual trials was projected onto the mean neural trajectory. Similar to the projection method, the velocity projection method hypothesized that trials in which the neural space is changing in the direction of the mean trajectory will have shorter RTs. Importantly, trials were segregated into 100–200 ms bins based on the length of the memory period to minimize the effect of memory period length on neural position, i.e., the mean trajectory used as a reference for each trial was calculated solely on other trials within the same memory period bin.
Euclidian distance method.
The Euclidian distance method was performed also equivalently to the study by Afshar et al. (2011). Single-trial RT was correlated with the Euclidian distance between the high-dimensional firing rate at the go cue on the single trial and the mean high-dimensional firing rate of all other trials of the same condition at some time offset, Δt, as depicted in Figure 3C.
The optimal subspace method, as originally reported by Churchland et al. (2006c), was also performed. It is equivalent to the Euclidian distance method with a time offset of Δt = 0 ms. Both of these methods are based on the hypothesis that trials in which firing rates are close to the mean rates observed for similar trials have shorter RTs.
Average rate method.
The average rate (AR) method is based on the simple hypothesis that trials during which particular units have higher firing rates will be associated with shorter RTs. This method posits that neural activity increases during preparation and crosses a movement threshold to initiate a movement, also known as the rise-to-threshold hypothesis (Erlhagen and Schöner, 2002). Under the assumptions of this method, higher preparatory activity would always be associated with shorter RTs. Four implementations of this method were tested initially. The trial-by-trial RT was correlated with the following: (1) the signed difference between firing rate at go cue and at cue onset (i.e., an approximation of baseline firing), averaged across all units; (2) the same method but using the unsigned difference (absolute value); (3) the average firing rate at the go cue across all units; and (4) the average firing rate at the go cue across all units for their preferred grip type only. The third version, which does not rely on baseline firing rate or unit preferences, was the best performing (data not shown) and was therefore the one used for additional analysis. For clarity, we opted to name our implementation of the rise-to-threshold hypothesis as the AR method.
Sign-corrected average rate method.
As hypothesized by the AR method, if units that increase their activity (relative to the mean) during movement preparation are associated with trials having short RTs, then they are negatively correlated with RT. However, if the activity of some units were in fact reduced (relative to the mean) for trials with short RT, this would result in a positive correlation. If many of each of these types exists in the same population, which is averaged to produce an RT prediction, these two inverted populations would be in conflict and cancel out each other, thereby causing poor RT prediction.
To overcome this obstacle, we introduced the sign-corrected average rate (SCAR) method. It is identical to the simple AR method as described in the previous section; however, the signal of all units was first multiplied with a sign-correction vector. That is, units that were correlated positively with RT were inverted to produce a negative correlation. To decide which units were to receive a negative value in the sign-correction vector, (1) the firing rates at the go cue of individual units were correlated with RT (twofold cross-validated) over all conditions. (2) Units received a −1 value in the sign-correction vector if they possessed on average (over all conditions) an r value >0. All other units received a value of 1 in the correction vector. This method preserves the absolute magnitude of the mean firing rate across trials because no normalization is performed. A number of inversion criteria were tested; however, we found that it was sufficient to invert only units with a positive r value (data not shown). It is important to note that units were not tested separately for each condition (grip type/decision condition), i.e., a unit that was inverted for a precision grip would also be inverted during a power grip. Testing on each condition separately would have increased RT prediction further.
RT correlations.
When correlating single-trial neural metrics, i.e., the previously described methods excluding the AR and SCAR methods, with RT, we did not include the neural data from that trial in the calculation of the mean neural trajectory used for that prediction, as in the study by Afshar et al. (2011). The predicted and observed RTs were then correlated with each other. This technique, termed leave-one-out cross-validation, ensured that predictions of the RT of each trial were not based on movement activity from that trial.
Whenever average RT variance explained was calculated across an average of datasets, each average was weighted by the number of trials in each dataset.
Partial RT correlations.
In our tasks, memory period length was correlated highly with RT (Fig. 2). To disentangle the relationship of memory period length to RT and the relationship of our neural prediction metrics on RT, partial correlation, which bares much similarity to multiple regression, was performed (Cramér, 1946). Partial correlation is a method for determining the correlation between two variables while controlling for one or more other variables. The partial correlation between two variables, while controlling for a single other variable, is described by where ρ is the partial correlation of a neural prediction metric (N) with RT (B), while controlling for the effect of the length of the memory period (M). rXY is the standard Pearson's correlation between vectors X and Y.
Cross-validation.
The results of all methods were twofold cross-validated. All trials of each dataset were first randomly segregated into two sets of equal size and methods performed separately on each set. Furthermore, the SCAR method required the preevaluation of preparatory correlations with RT to determine which units should have their firing rates inverted. To avoid false-positive results, SCAR was first trained on a training set of trials and always tested on trials that were not used for training. All analyses were twofold cross-validated by flipping the role of both sets. Segregating the data into more than two cross-validation folds would severely reduce the number of test trials in each condition and therefore the reliability of prediction.
Multiple linear regression.
To determine whether a combination of the tested methods could improve the amount of variance explained in RT, a number of regressions was performed. Multiple regression was performed using the leave-one-out technique, in which regressing on all other trials generated the prediction for each trial, and this process was repeated for each individual trial. First, the same model as described by Afshar et al. (2011) was used, which consisted of the projection method on both the pre-go and post-go cue axes, as well as the velocity projection method on both the pre-go and post-go cue axes. Alternatively, a number of simpler combinations were tested, although most are not presented here because they yielded poor results.
To test whether or not a multivariate model could explain significantly more variance than a simpler model, the F test was used. The F test is ideally suited to compare models (regressions) that use nested predictors, that is, models that use a subset of predictors of a more complex model. However, because we wanted to compare models over a number of conditions (each with varying degrees of freedom), we had to generate a nonstandard F distribution for testing. Therefore, the F statistic comparing each pair of models was calculated separately and then summed. Additionally, because each cross-validation fold contained different trials, each fold was considered as a separate condition for a total of four conditions in Task 1 (two behavioral, two folds) and 12 conditions (six behavioral, two folds) in Task 2. To generate a testing distribution the probability density functions (pdfs) of each corresponding F statistic were convolved with each other to form a new distribution. We then calculated the likelihood of observing the calculated sum of F statistics and from there derived the p value.
To extend this test over all datasets and reach general conclusions, the sum of F statistics was summed across all conditions and datasets and tested on an F distribution of convolved pdfs over all conditions and datasets.
Chance-level calculation.
Many individual correlations were computed in the current study. To ensure that all relevant methods were truly identifying relationships between neural data and RT, all correlations were tested against a chance distribution. For each method and condition, chance distributions were generated by correlating the prediction of each method with a corresponding vector of randomly shuffled RTs (1000 repetitions). We could then calculate the probability of observing the empirical R2 given our shuffled distributions and use this as a p value. In the case in which a significance calculation was required over multiple conditions and datasets, the generated chance distributions were convolved with each other to form a new distribution, precisely as with the F statistic for testing multiple linear regression. The p value for significance was fixed at 0.01.
For the SCAR method, an additional control was performed. The SCAR method involved the inversion of the firing rates of some units. To ensure that this inversion did not artificially produce our results, the following control was performed. A random sample of units of the same size as in the real data was inverted and the method performed as normal (1000 repetitions, permutation test). The resulting chance-level distributions could be tested against the empirical results as was done for the other chance-level calculations.
Variance selection.
All recorded units were included in the main analyses. To determine whether one could select a subset of units that would perform equally or better than the entire population, a variance selection was performed. The units with higher variances in spike count (at the go cue) across trials were preferentially included first. In addition, a random unit selection was performed alongside the first analysis with the same number of units per test. The random selection of units was performed 1000 times per percentage value. Data were interpolated to the range of 0–100% to facilitate averaging between datasets.
Significance testing was performed by summing the R2 over all datasets and testing the likelihood of obtaining this value against the distribution of convolved pdfs over all datasets as generated by the random unit selection, precisely as was done with the F statistic for testing multiple linear regression. The significance level was set at 0.05 and Bonferroni's corrected for the number of percentages tested (100).
Bayesian information criterion.
The Bayesian information criterion (BIC) is a well known model selection criterion (McQuarrie and Tsai, 1998). It is described by the following: where L̂ is the posterior likelihood of the data given the best-fit model, p is the number of parameters used to generate the model, and N is the number of observations used. A smaller BIC is associated with a better explanatory model. BICs were calculated for single conditions and averaged either over conditions or over conditions and datasets.
Results
Behavior
All three animals performed the task successfully. After initiating trials to the point of obtaining task information, i.e., receiving a grip cue, animals B, S, and Z successfully completed those trials 96, 98, and 95% of the time, respectively. Catch trials in which the animal was required to withhold movement were included in Task 1. Animals B and S completed these catch trials successfully 95 and 98% of the time, respectively. Figure 2 plots the RTs of all animals as a function of memory period length. The memory period in Task 1 lasted 0–1300 ms, whereas the memory period in Task 2 was relatively longer (a minimum of 1400 ms in the instructed condition including the grip cue) to facilitate a second cue period in the delayed-instructed condition. RTs were reduced during longer memory periods, consistent with the established hypothesis that motor preparation improves over time (Rosenbaum, 1980; Riehle and Requin, 1989). The exception to this was the 1300 ms memory condition in Task 1, in which RT slightly increased, likely because of the expectation of a catch trial, which appeared periodically and lasted 2000 ms. For animals B, S, and Z, the correlation coefficients over all datasets between memory period length and RT were −0.55, −0.57, and −0.33, respectively. Similar experiments have shown that saturation of RT, i.e., the minimum length of memory period after which RT does not significantly improve, is at least 100–200 ms (Churchland et al., 2006c) in a reaching task. In Task 1, we observed RT saturation, but we did not observe this in Task 2.
In contrast, there was no significant correlation between memory period length and movement time, which is the time between the hand leaving the hand-rest button and making contact with the handle, indicating that animals only initiated a movement when the movement was fully prepared. The only exceptions are the movement times of animal B, which were slightly longer in the 1300 ms memory period condition than in shorter memory periods, potentially an effect of decreased attention for long memory periods. For all animals, the hands remained completely stationary on the hand-rest buttons before the go cue. The experiments from which these data were collected were originally designed to assess hypotheses that are not presented here and will be addressed elsewhere.
Neural recordings
The analyzed datasets include a collection of 18 recording sessions, six from each animal. In animal B, an average ± SD of 63 ± 17 and 28 ± 18 units were recorded in F5 and AIP, respectively, as well as in animal S (mean ± SD, 132 ± 15 and 131 ± 26) and animal Z (85 ± 18 and 81 ± 24). An average of 483 trials per dataset met the inclusion criteria, as described in Materials and Methods. This corresponded to an average of 77 trials per condition and cross-validation fold overall. In animals S and Z, there was no significant difference between the two brain regions in the number of units recorded per dataset (p = 1 and p = 0.56, Wilcoxon's signed-rank test). However, in animal B, significantly more units were obtained in area F5 (p = 0.03), which may have affected the quality of RT decoding in area AIP. The majority of units obtained in all animals were identified as multiunits (52% in animal B, 60% in animal S, 70% in animal Z). All recorded single units and multiunits were included in additional analyses.
Although the response characteristics of each individual unit are not analyzed here in detail, it is worth noting that the overall tuning characteristics of units in F5 and AIP were very similar regardless of the task design used (Task 1 or 2). Furthermore, both tasks were able to elicit strong grip type tuning in both F5 and AIP. An average of 32% of recorded units were significantly tuned for grip type during the late memory period in F5, whereas 26% were tuned in AIP (p < 0.05, two-sample t test), which did not differ between areas (p = 0.09, Kruskal–Wallis ANOVA), although differences were seen between animals (p = 0.002, Kruskal–Wallis ANOVA), with animal B showing slightly less tuning overall (24% in F5 and 18% in AIP). This finding is particularly robust when considering that there are no visual cues present in the memory period, and, therefore, grip type tuning tends to reach a minimum during this epoch.
Low-dimensional visualization of single-trial trajectories
To visualize how neural data evolves on single trials, a low-dimensional representation of the full neural space of both brain areas combined is shown in Figure 3A for an exemplar dataset (instructed precision grip, dataset Z120829). Dimensionality reduction was performed using GPFA, as described in Materials and Methods. Single trials tended to evolve from cue onset to a preparation state and further to a movement state after the go cue. Conversely, it did not appear that variability between trials decreased in a systematic way when comparing the size of the confidence ellipses at cue onset, go cue, and movement onset. To determine whether the trajectory of an individual trial could be related to RT, three methods were formulated, as depicted in Figure 3B–D. These three methods, the projection method, Euclidean distance method, and velocity projection method, are presented here virtually identical to how they were performed by Afshar et al. (2011) and are described in detail in Materials and Methods. Although GPFA aids the visualization of single trials, in the following section, these RT prediction methods are tested in the high-dimensional state space of all recorded units.
Finding optimal reference points for trajectory-based methods
As can be seen in Figure 3B–D, each of the three high-dimensional state space methods relies on a reference time point, or Δt, on the mean trajectory. To find the optimal reference point, the time domain from 500 ms before to 500 ms after the go cue was tested. For purpose of determining optimal Δt values, this time window was further limited to ±300 ms, because reference points become more unreliable between conditions and datasets at large offsets. Each method was performed with this range of Δt values, and the predictions correlated with RT, as seen in Figure 4. The mean of all conditions is shown with a thick trace, and the used offsets before and after the go cue are marked with open circles (limited within ±300 ms). Inset histograms show all individual correlation coefficients [datasets (6) × conditions (2–6) × cross-validation folds (2)] before squaring and averaging and for each animal separately. The darker bars indicate correlations that are statistically significant (p < 0.05, Pearson's correlation). The results from animals B and Z are very similar and use identical offsets, whereas the results of animal S differed significantly.
As seen in Figure 4A, the projection method using references both before and after the go cue have correlation distributions with nonzero median in F5: one distribution is shifted to the negative and one to the positive. This finding is consistent with our hypothesis, because trials that are farther along a mean trajectory going forward in time should lead to shorter RTs and therefore an overall negative correlation between our neural predictor and RT. In AIP the projection method also performed significantly, although the resulting R2 is much lower than in F5 (Fig. 4B). Based on this analysis, the best positive and negative Δt values, which were then used in all subsequent analysis, were −290 and 60 ms in F5 for animals B and Z and −170 and 260 ms for animal S. In AIP, values of −210 and 200 ms were used for animals B and Z, and values of −40 and 60 ms were used for animal S.
The Euclidian distance method performed similarly to the projection method but explained overall less variance in RT (Fig. 4C,D). In both F5 and AIP, reference points generally produced correlation histograms that were shifted significantly from zero. In most cases the pre-go distribution was shifted to the negative direction and the post-go to the positive direction, again consistent with the hypothesis that trials that are closer to the state of the network after the go cue will have shorter RTs, with the notable exception of animal B on the pre-go axis, a point that is returned to later. Additionally, when using a time offset of 0 ms, identical to the so-called optimal subspace method (Churchland et al., 2006c), the correlation distribution tended to be only marginally significantly shifted from zero in F5 and AIP. Despite this, the optimal subspace method was not used in additional analyses, because the Euclidean distance method outperformed it in every case. Based on this analysis, the Δt values that were used in additional analysis were −300 and 170 ms in F5 for animals B and Z and −270 and 270 ms for animal S. In AIP, values of −90 and 300 ms were used for animals B and Z and values of −100 and 300 ms for animal S.
The velocity projection method performed poorly overall, accounting for <1% of the variance in RT (Fig. 4E,F). Only rarely do reference points in F5 or AIP have correlation distributions significantly shifted from zero. Furthermore, accounting for the effect of memory period length on RT using partial correlation completely eliminates this effect (data not shown). Therefore, for most of our additional analyses, the velocity projection method was excluded. It should be noted that trials were segregated into bins based on memory period length, as described in Materials and Methods. However, when all trials are pooled together, the resulting predictions of RT are still nonsignificant (data not shown).
Population firing rate-based methods
In addition to our high-dimensional trajectory-based methods, we also tested simpler methods based on averaging the activity of all units around the go cue. Such methods still depend on simultaneously recorded units, because they require an estimation of the population neural state for each trial. The first method we tested is the AR at go, which is our implementation of the rise-to-threshold hypothesis, as described by Afshar et al. (2011). The correlation histograms obtained by the AR method are shown in Figure 4, G for F5 and I for AIP. The median of the correlation distribution is significantly shifted negatively in two of the three animals in F5 (Wilcoxon's signed-rank test), suggesting that higher firing rates around the go cue led to shorter RTs. However, in AIP, the distribution was only shifted for one of the three animals. However, in all cases in which a significant shift was present, this shift was in the negative direction, suggesting that higher firing rate tended to be related to shorter RTs.
The AR method relies on averaging. Therefore, if some units in the population are correlated negatively with RT whereas others are correlated positively, these effects could cancel out at the population level. To deal with this issue, we first correlated the firing rate at the go cue of each unit with RT on a set of training trials. Then, as described in Materials and Methods, on a set of testing trials we inverted the firing rates of units that had a positive correlation in the pretesting (twofold cross-validated). Briefly, the process consists of multiplying the firing of all units by a sign-correction vector (see Materials and Methods). This new method was termed the SCAR method. The correlation histograms of the SCAR method are shown in Figure 4, H and J, for F5 and AIP, respectively. In both areas the median of the correlation distribution was shifted strongly into the negative domain (three of three animals in both areas, Wilcoxon's signed-rank test). Over all datasets, the average number of units whose activity was inverted was 38% in F5 and 42% in AIP, a large portion of the total unit count. The number of units inverted was less for animal S, in which the performance of the AR method was already considerably high.
Pooling of multiunits and single units does not bias RT prediction
To ensure that the previous results were not attributable to the sole contribution of either multiunits or single units, we repeated the analysis using only multiunits or single units. Results are presented as a performance ratio of average fraction of RT variance explained using only single units or multiunits versus the pool of all units (where 1 represents identical performance). In F5, when including only single units, the pre-go and post-go projection methods produced ratios of 0.81 and 0.84, respectively. In AIP, the pre-go and post-go projection methods produced ratios of 0.76 and 0.91, respectively. The pre-go and post-go Euclidean distance methods produced ratios of 0.65 and 0.95 for F5 and 0.72 and 0.81 for AIP single units.
When including only multiunits, in F5, the pre-go and post-go projection methods produced ratios of 0.64 and 0.69 in F5 and 0.77 and 0.69 in AIP. The Euclidean distance methods produced ratios of 0.87 and 0.87 in F5 and 1.13 and 0.59 in AIP.
The same analysis was done for the population-based methods (AR and SCAR). The AR method had performance ratios of 0.74 and 0.79 when using single units only in F5 and AIP, respectively, whereas multiunit only performance was 1.25 and 1.07. The SCAR method had performance ratios of 0.76 and 0.87 when using single units only in F5 and AIP, respectively, whereas multiunit only performance was 0.66 and 0.63.
In almost every case, including only multiunits or single units in the analysis reduced the overall performance. Using only single units caused a performance reduction of 9–36%. Similarly, using multiunits caused a reduction of 13–41%, with the exception of the Euclidean distance method (before go cue) in AIP and the AR method in both areas, which increased slightly. Overall, the prediction of RT cannot be explained based solely on the contribution of either single units or multiunits. However, the AR method seems to perform best using multiunits, suggesting that compounding single-unit responses stabilizes the performance of this method.
Neural activity predicts trial-by-trial RT
Together, the results of all methods based on optimal Δt selection are shown in Figure 5. The mean R2 is plotted for all 18 datasets in both F5 and AIP. Open bars mark methods that did not perform above chance level, as described in Materials and Methods. The average R2 over all methods is significantly higher in F5 than AIP (p < 0.001, Kruskal–Wallis ANOVA), suggesting that the population activity in F5 better encodes the variability in grasping plans. However, it is important to note that the current tasks include a large reaching component, which is also represented strongly in F5 and AIP (Lehmann and Scherberger, 2013), although more so in AIP. Therefore, some similarities between the behavioral and neural results of the current study and previous reaching studies are expected.
Not all methods achieved the same level of performance overall (p < 0.001, Kruskal–Wallis ANOVA). The best method on average, SCAR, was able to explain 18% of the variance in RT in F5 and 6% in AIP. The SCAR method and the projection (before go cue) method performed best in F5 for animals B and Z, explaining 18 and 16% of the variance in RT, respectively. In animal S, this pattern differed in F5, because the best performing methods were SCAR and Euclidean distance (after go cue), explaining 17 and 13% of variance in RT, respectively. The mean RT prediction for each animal is summarized in Figure 6A. There was no effect of grip type (p = 0.69, Kruskal–Wallis ANOVA) in all animals, suggesting that RT could be predicted equally well regardless of grip. Additionally, there was no effect of cross-validation fold (p = 0.93, Kruskal–Wallis ANOVA), confirming that segregating the data into training and testing trials did not introduce inconsistencies into the results.
As described previously, Task 2 contained different task types (instructed, free choice, and delayed instructed). There was a significant effect of task type on RT prediction (p < 0.001, Kruskal–Wallis ANOVA) over all methods, although the effect size was very small (effect size, η2 = 0.018). The worst performing decision condition was the delayed-instructed condition, in which a second cue was presented later in the memory period. This small, but significant, effect on RT prediction is likely attributable to the disruptive effect of a second cue close to the end of the memory period. Interestingly, there was no difference in RT prediction between the instructed condition and the free-choice condition (p = 0.80, Wilcoxon's rank-sum test), suggesting that the way in which a motor plan is selected does not affect the relationship between preparatory activity and RT.
To summarize the number of individual correlations that have significant p values (p < 0.05; equivalent to the black bars in the histograms of Fig. 4), the total fraction of significant correlations is plotted in Figure 6C. In F5, between 21 and 96% of the correlations were significant for each method, whereas this range was between 4 and 67% in AIP, therefore confirming the overall better predictability of RT in F5.
Given the success of the SCAR method, an interesting question arises. If it is effective to predict RT by calculating a weighted mean of all units, in which the weights are either exactly −1 or 1, would performance improve if weights were not restricted in any way? This idea can be tested directly by using linear regression to fit a set of weights to all units (MATLAB function regress). The results of this analysis, when cross-validated identically to the main analysis (twofold), show that a linear regression over all individual units can explain 3.9% of variance in RT in F5 and 2.2% in AIP, with a significant advantage of F5 over AIP (p < 0.001, Kruskal–Wallis ANOVA). However, this performance is only one-quarter of the projection or SCAR methods overall. Because the number of available units frequently outnumbers the number of available trials, coefficients cannot be ideally identified. For this reason, the regression often excluded up to 15% of the units in each dataset by assigning them a coefficient of zero.
To deal with the small number of trials available, it is also possible to use stepwise linear regression to add or remove units based on how their inclusion affects the model (MATLAB function stepwisefit). To produce an optimal solution, the model was initialized with only a constant term and units were subsequently added if they significantly improved the model (F statistic, p < 0.05). The results of this analysis show that a stepwise linear regression over all individual units can explain 12.1% of variance in RT in F5 and 3.4% in AIP, with a significant advantage of F5 over AIP (p < 0.001, Kruskal–Wallis ANOVA). However, in this analysis, between 77 and 97% of units were excluded from the model to produce an optimal fit. Together, the linear regression results are consistent with previous analyses showing an advantage of F5 over AIP and are similar to the results obtained by selecting units by variance over trials (see Fig. 8). However, their usefulness is limited, at least in datasets with a restricted number of trials.
Removing the effect of the memory period does not eliminate RT prediction
The length of the memory period was strongly negatively correlated with RT in all tasks and animals (Fig. 2). To ensure that a straightforward encoding of the memory period in the firing rates of individual units was not responsible for our findings, all methods were retested using partial correlation. As described in Materials and Methods, partial correlation allows for the correlation of two variables while controlling for the linear effects of one or more additional variables. Here we controlled for the effect of memory period length on RT. Figure 6B shows the mean R2 over all datasets while controlling for the effect of memory period length. Partial correlation reduces the performance of all methods, but almost all methods remain above chance level in F5. In AIP, all methods are reduced to chance level in at least one animal, with the exception of the SCAR method. The largest reduction in performance caused by partial correlation was 66% over all methods in animal B, suggesting a strong reliance on the memory period length and consistent with the unexpected direction of the shift in the correlation coefficient distribution of animal B in Figure 4C. The smallest reduction in performance was 25% in animal Z. In AIP, results of each animal were never reduced by >38%. For comparison, the mean R2 using the standard correlation metric is shown in Figure 6A. Similarly, the number of significant correlations was reduced when using partial correlation as illustrated in Figure 6D.
Anterior AIP outperforms posterior AIP
A number of recent studies have highlighted that the anterior (aAIP) and posterior (pAIP) subdivisions of AIP differentially encode visual task parameters (Baumann et al., 2009; Romero and Janssen, 2014) and differ drastically in their effective connectivity (Premereur et al., 2015). Because it is not well understood how these two areas differ in their contribution to preparatory activity for grasping, we further segregated our units into aAIP and pAIP corresponding to the anterior and posterior implanted arrays, respectively, and repeated the main analyses.
Unlike the comparison between F5 and AIP, the number of units recorded on each array within AIP differed significantly for all animals (p < 0.05, Wilcoxon's signed-rank test). Therefore, for each dataset, units were discarded randomly from the larger set until an equal number of units were present from each subarea (stratification).
If the same RT prediction methods used in the main analysis are applied to subdivisions of AIP, there is a small, but significant, advantage of aAIP over pAIP (p = 0.021, Kruskal–Wallis ANOVA). Most of this advantage comes from the projection (pre-go) method, with an average R2 of 0.031 in aAIP and 0.019 in pAIP (p < 0.01, Wilcoxon's signed-rank test). In agreement with the main results, there was no significant difference in RT prediction between grip conditions or cross-validation folds (p = 0.36 and p = 0.86, Kruskal–Wallis ANOVA). These findings are in line with the emerging view that a gradient of visual to motor processing exists between pAIP and aAIP.
Multiple regression does not improve RT prediction
By combining multiple prediction methods in a multiple regression, it is possible to capitalize on the potential orthogonality between different predictors. To test whether a multiple regression could increase overall prediction of RT, we first replicated the regression described by Afshar et al. (2011), which consists of a regression of the pre-go and post-go cue versions of the projection and the velocity projection methods. Because the velocity projection method performed poorly in our analysis, it was not expected for this regression to significantly improve RT prediction. In fact, this four-factor multiple regression only out-performed simpler unimodal and bimodal regressions consisting of subsets of these factors in 16.7% of all datasets in F5 and 11.1% in AIP (F test). Furthermore, this regression never achieved a lower BIC score than more parsimonious regressions in any dataset or brain area, suggesting that combining these four factors in a regression is not justified in our dataset.
A number of other regressions were tested, but in no case were >50% of datasets in F5 and 16.7% of datasets in AIP able to significantly outperform simpler regressions (F test). Furthermore, none of these multiple factor regressions achieved a lower BIC in >11.1% of datasets in F5 and in none of the datasets in AIP.
Because multiple regression performs best when individual variables are independent, it would be unlikely to explain significantly more variance in RT if our predictors are highly correlated. In fact, most methods are highly correlated with one another in our dataset (minimum R2 > 0.14), with the exception of the velocity projection method (R2 < 0.03), which performed poorly in the main analysis.
No alternative reference point can outperform SCAR
The SCAR method relies on first correlating the firing rate of each unit with RT and then inverting based on the resulting correlation coefficient. Because this method relies on cross-validation, it would be preferable to perform a method that does not rely on previous information. To ensure that this alternative was not possible, a control was performed. The mean firing rate at multiple time points (up to 2 s) before the go cue was subtracted from the firing rate of each single trial, and the absolute value of the resulting signal was taken. Subsequently, the firing rate on each trial was averaged over units and correlated with RT. This method has the effect of inverting the activity of each unit relative to the mean firing rate at some previous time point. In no case was such a method able to explain more or equal variance in RT than the SCAR method, suggesting that there exists no trivial alternative to pretesting each individual unit with respect to measured RT. The same was true if grip cue-aligned activity was used for reference.
Consistency of the sign-correction vector during movement
Because the SCAR method relies on previous information of the relationship between firing rate and RT, we wondered whether the learned sign-correction vector, which is used to invert the activity of positively-correlated units, revealed a natural alignment of firing rates across time or whether it simply fit the data well at the time of the go cue. To test this, we used the sign-correction vectors determined in the main analysis to repeat the SCAR method using neural activity not just from the go cue but also at variable time points from 500 ms before to 800 ms after the go cue. The results of this analysis are depicted in Figure 7. As can be seen in Figure 7C for animal S in F5, maximal RT prediction is achieved shortly before median movement onset (R2 = 0.56). If neural activities of many units peak shortly before movement onset, then the SCAR method should perform best at this time. Such a result would suggest that trials on which activity drifted toward the firing rate observed at movement onset were more likely to be trials with a short RT, in line with the rise-to-threshold hypothesis (Erlhagen and Schöner, 2002).
However, as can be seen in Figure 7A, the results of animals B and Z differed significantly in F5. In this case, the peak RT prediction occurs precisely around the time of go cue. In contrast to the results of animal S, a peak at the go cue suggests that, although our sign correction was able to properly align the firing of each unit at the go cue, it does not necessarily represent a consistent pattern in the firing of the underlying units.
In AIP, peak RT prediction was achieved in all animals shortly before median movement onset, i.e., a higher (sign-corrected) firing rate on single trials tended to lead to shorter RTs. Maximal RT prediction before movement onset can be explained by the idea that activity either rises during the memory period to achieve peak activity during the movement or that activity is decreased during the memory period to reach a minimum during movement. This result in AIP is consistent with a study showing significant RT prediction from activity in AIP shortly before movement onset (Verhoef et al., 2015).
Variance selection allows high performance with a subset of units
Because all recorded units were included in the previous analysis, we were curious whether a subset of units could be selected that performed equally well or better than the entire population. To test this, a variance selection of units was performed. Units were discarded from the analysis in order of increasing variance in spike count (at the go cue) across trials. This way, units with higher variances were preferentially included. For the two best performing methods, SCAR and projection (before go cue), the variance selection performed significantly better than chance (p < 0.05, Bonferroni's corrected) in F5 for all animals (Fig. 8A,C). In AIP, only variance selection using the SCAR method outperformed chance (Fig. 8B,D).
In all cases, selecting units by variance did not improve maximal performance, as expected. In fact, when comparing performance using all units to a smaller subset in F5, using a variance-selected subset of only 32 or 18% of recorded units, for the SCAR and projection (before go cue) methods, respectively, suffered only a 5% decrease in performance. For the SCAR method, it was only necessary to use a subset of 23% of the available units in F5 to attain 95% of maximal performance. Together, these results suggest that, when units are selected by variance at the go cue, only relatively small subsets of the recorded units are required to attain virtually maximal performance. More importantly, including all units in the population does not appear to add noise to these methods, because maximal performance is achieved when including all units, suggesting that they properly describe the relationship between preparatory activity and RT at the population or network level.
Variability of RT axis from day to day and animal to animal
When considering each dataset separately, it became clear that the day-to-day variability in RT prediction for each method is relatively high (Fig. 5). To elucidate how neural trajectories, which are presumably very similar over sessions, could explain very different amounts of variance in RT, we visualized a few individual sessions using PCA. This second dimensionality reduction method was introduced, in addition to the GPFA used in Figure 2A, to visualize average trajectories as opposed to single trials. In contrast to GPFA, which applies many different and sometimes large smoothing kernels, PCA allows more direct control over the amount of smoothing over time. In Figure 9, the first two principal components of individual conditions of individual recording sessions are shown. The mean trajectory over all trials is depicted along with the mean trajectory of trials binned into slow, medium, and fast RT trials. In every subplot, a visualization of the projection (after go cue) method is presented from the data. In this visualization, the position of single trials along the dashed projection axis would determine our measurement for how far along the mean neural trajectory this trial is. The subsequent length of the projection of each single trial onto this axis would then be used to predict RT. In Figure 9A, trajectories of a power grip condition are shown from dataset B140509. It appears that the fast and slow RT trials are located distantly to each other along the projection axis, suggesting that this axis would be valuable in explaining trial-to-trial RT variability. This was in fact the case, because the projection (after go cue) method was able to explain 27% of the variance in RT in the main analysis of this dataset. However, note that the position of the fastest RT trials is less far along the mean trajectory than slow trials, directly contradicting the predictions of our hypothesis. As we noted in Figure 6B, much of the RT prediction obtained in animal B was eliminated by controlling for the effect of memory period length. Based on the trajectory in Figure 9A, it seems that trials with longer memory periods tended to continue along the projection axis instead of lingering near the mean trajectory. Because longer memory periods led to slower RTs for the most extreme memory period lengths (1300 ms), trials that have progressed farther along the mean indicated slower RT trials.
Plotted in Figure 9B is the mean trajectory of the precision grip on the very next dataset (B140515). The mean trajectory for this condition is very similar to that of Figure 9A. However, the orientation of the projection axis is approximately orthogonal to that of an axis running through the slow and fast RT trials, suggesting that this axis would explain only a small amount of variability in trial-to-trial RT. The projection (after go cue) method performed at chance level for this dataset, only explaining 3% of the variance in RT. Therefore, it seems plausible that, even when trajectories are similar, it is possible for RT variance to be rotated relative to the mean trajectory, suggesting that the mean trajectory may not always be an ideal reference. In this case, the Euclidean distance (after go cue) method performed significantly better because trials with shorter RT were located closer to the movement onset state in the state space.
The trajectory in Figure 9C shows a precision grip from dataset S1209013. It seems to differ substantially from the other trajectories. In this case, there is no clear progression of the preparatory trajectory near the go cue, and there is also no abrupt change in the directionality of the trajectory after the go cue. Therefore, it is not surprising that both projection methods performed quite poorly on this dataset. Only by selecting a Δt for the projection (after go cue) method that was quite large (210 ms) could improve RT prediction. In this case, the projection (after go cue) and Euclidean distance (after go cue) methods performed similarly, which is not surprising because projection and distance become mathematically similar for large Δt values.
Figure 9D represents an ideal trajectory of a power grip from dataset Z120921. In this case, trials that are farther along the projection axis correspond to trials with shorter RTs, in line with the predictions of the projection method.
Discussion
Using simultaneous neural recordings from three animals, we have shown that preparatory activity in both premotor and parietal cortices is correlated with trial-to-trial variability in RT. However, the activity in F5 is far more predictive of RT than in AIP. Although the length of the memory period facilitated RT predictability, our findings cannot be explained purely based on this relationship. The use of a state space framework, made possible by the parallel recording of many units, represents a major step forward in understanding the relationship between preparatory activity and behavioral parameters.
Trial-to-trial RT prediction
Although response characteristics and tuning properties of AIP and F5 neurons can be very similar (Baumann et al., 2009; Fluet et al., 2010), we have shown that their trial-to-trial relationship with RT differs greatly (Fig. 5). The current result is not trivial, because F5 and AIP are densely and reciprocally connected (Luppino et al., 1999; Borra et al., 2008). However, the level of RT predictability found in the preparatory activity of AIP units is consistent with previous studies in nearby areas (Janssen and Shadlen, 2005; Snyder et al., 2006).
Together, the relative advantage of F5 over AIP is not altogether surprising given the fact that F5, which has projections to the spinal cord (He et al., 1993; Borra et al., 2010) and a facilitation effect on M1 (Shimazu et al., 2004), must naturally be involved in the transition between preparation and movement execution.
Comparing prediction methods
High performance of the projection method, matched only by the SCAR method, is consistent with the “initial conditions” hypothesis formulated by Afshar et al. (2011) from activity in PMd. Even after controlling for the effect of memory period length, the projection method still performs above chance level; however, the SCAR method can outperform the projection method, especially in Task 1. SCAR offers an alternative explanation for the relationship between single-unit firing and RT. In this framework, most individual units have a consistent relationship with RT, i.e., higher or lower firing rates before the go cue are associated with shorter RT. Controlling for the sign of this relationship was able to increase RT prediction up to four times and follows well from the observation that the preparatory activity in PMd is both positively and negatively correlated with RT (Riehle and Requin, 1993), as in F5.
Could subpopulations of these units explain prediction of RT? If this were the case, we would expect units that fire more during the delay would continue to rise during the movement. In two of three animals, the SCAR method peaked in RT prediction at the go cue, suggesting that this sign correction was a local property and not a consistent property of each unit (Fig. 7A,C). Indeed, activity is often higher in the delay period than during the movement (Crammond and Kalaska, 2000), suggesting that the relationship between firing during preparation and movement is complex (Churchland and Shenoy, 2007b).
Interestingly, maximal performance is always achieved for the projection and SCAR methods when including all recorded units and not a variance-selected subset in F5 (Fig. 8), supporting the conclusion that both methods accurately describe population-level features and are not simply dominated by specific subpopulations of units.
Previously, the best performing method was a multiple regression of projection and velocity projection components (Afshar et al., 2011). We did not find significant performance of the velocity projection method or any multiple regression. When examining our neural trajectories, it seems that in many cases the speed of change in neural signal remained high or even increased during the memory period, especially in Task 1 (our unpublished data). This may represent an interesting quality of F5 activity that differs from activity in PMd. Such memory-related activity could mask relationships between trial-to-trial neural velocity and RT, especially after factoring in the length of the memory period.
Differences between PMv and PMd
To our knowledge, the preparatory activity recorded in F5 in our study explains more trial-to-trial variance in RT than any other published study. However, the results obtained in nearby PMd are quantitatively comparable (Afshar et al., 2011). A number of studies have systematically contrasted PMv and PMd (for review, see Hoshi and Tanji, 2007). It may be that F5 is more involved in the specific timing and execution of reaching movements than PMd, as evidenced by chemical inactivation (Kurata and Hoffman, 1994). Nevertheless, both PMv and PMd are essential for grasping movements (Raos et al., 2004, 2006). Electrical microstimulation in PMd during preparation (Churchland and Shenoy, 2007a), and potentially F5 (Gerits et al., 2012), delays movement onset.
Although PMd and PMv are part of relatively distinct parietofrontal networks, they both have an important effect on behavioral timing. Additionally, both PMd and PMv project to similar locations within M1 and lack a clear hierarchy (Dum and Strick, 2005), suggesting that their roles are complementary and not sequential.
Limitations
To rule out premature muscle contractions as an explanation for RT prediction obtained during a delay, electromyographic recording of relevant muscles has been used in the past (Churchland et al., 2006c; Afshar et al., 2011). Such recordings were not undertaken in the current study; however, we do not believe that premature muscle contractions are a likely cause of the RT prediction observed here for three reasons. First, the hands of all animals remained completely still on the hand rest buttons until after the go cue had been given, as confirmed by infrared monitoring. Second, the RTs of all animals were well above 200 ms in at least 97% of trials, suggesting that they appropriately awaited the go cue. Third, animals successfully withheld movement during the catch trials, suggesting that they were properly awaiting the go signal.
Although the primary interest of the current study was grasping actions, all movements included a large reaching component as well. It remains a possibility that the relative advantage of F5 over AIP could in part be attributable to a larger role of F5 in reaching than AIP. However, previous studies dissociating reaching and grasping have shown that PMv is greatly involved in the representation of grasping without a reach component (Hepp-Reymond et al., 1994) and is potentially even less involved in reach encoding than AIP (Lehmann and Scherberger, 2013). Therefore, finding higher RT prediction accuracy in F5 rather suggests a larger influence of the grasping component in the neural signal.
Implications for models of motor preparation
It is clear that the most dominant factor in the neural trajectories of animal B is the length of the memory period itself (Fig. 9A), which seems to act counter to the notion of an optimal subspace, because trials do not congregate within an area of low variability. It has been shown that variability is decreased by external stimuli, which was observed in PMd (Churchland et al., 2006c) and a number of other cortical areas (Churchland et al., 2010). If F5 neurons were multiplexing many factors in addition to a motor plan such as anticipation of the go cue, similar to hazard rate (Janssen and Shadlen, 2005), or variability in attention over longer periods of time, trial-to-trial variability might be increased at go cue. Furthermore, encoding of the length of the memory period clearly increased RT predictability in F5 and AIP, as evidenced by the decrease in predictability when using partial correlation. Additional work is needed to determine the extent to which F5 and AIP encode cue anticipation or attention-related factors.
Alternatively, it could be that the subspace required to successfully complete the grasping movement is sufficiently large to allow trajectories to lie in a relatively wide space. The absence of a static prepare-and-hold state is consistent with the augmented view of the initial conditions hypothesis posited by Ames et al. (2014), who showed that the memory state is bypassed in PMd when time to prepare an action is lacking. In this view, the subspace required to successfully complete an action, i.e., with no penalty in movement generation, but a possible penalty in RT, could be quite broad. However, it is clear that F5 firing rates do not necessarily congregate in a specific part of the state space given enough time, as would be predicted by an attractor model of preparatory dynamics. The interesting question of determining whether such a prepare-and-hold state is necessary in F5 or AIP, along with whether the observed preparatory processes set the initial conditions of a dynamical system, as they do in PMd and M1 (Churchland et al., 2012; for review, see Shenoy et al., 2013), are left to future works.
Recently, the ability to record activity from many neurons simultaneously has opened up new possibilities in the investigation of the motor and premotor cortices (for review, see Churchland et al., 2007). The current study explores the relationship between preparatory activity in large populations of neurons and subsequent behavior, shedding light on the differential role of parietal and frontal cortices in this process.
Footnotes
This work was supported by German Research Foundation Research Grant SCHE 1575/1-1. We thank Natalie Nazarenus, Ricarda Ahlert, and Matthias Dörge for technical assistance.
The authors declare no competing financial interests.
- Correspondence should be addressed to Hansjörg Scherberger, Neurobiology Laboratory, German Primate Center, Kellnerweg 4, D-37077 Göttingen, Germany. HScherberger{at}dpz.eu