Abstract
Animals adapt action-selection policies when the relationship between possible actions and associated outcomes changes. Prefrontal cortical neurons vary their discharge patterns depending on action choice and rewards received and likely play a pivotal role in maintaining and adapting action policies. Here, we recorded neurons from the medial precentral subregion of mouse prefrontal cortex to examine neural substrates of goal-directed behavior. Discharge patterns were recorded after animals developed stable action-selection policies, wherein four possible action sequences were invariably related to different reward magnitudes and during adaptation to changes in the action–reward contingencies. During the adaptation period, when the same action sequence resulted in different reward magnitudes, many neurons (38%) exhibited significantly different discharge patterns for identical action sequences, well before reaching the reward site. In addition, trial-to-trial reliability of ensemble pattern production leading up to reward was found to vary both positively and negatively with increases and decreases in reward magnitude, respectively. Pairwise analyses of simultaneously recorded neurons revealed that decreased reliability in part reflected fluctuations between different ensemble activity patterns as opposed to within-pattern variability. Increases in reliability were related to an increased probability of both selecting highly rewarding actions and completing such actions without pause or reversal, whereas decreases in reliability were associated with the opposite pattern. Thus, we suggest that both the spatiotemporal pattern and fidelity of prefrontal cortical discharge are impacted by action–outcome relationships and that each of these features serves to adapt action choices and maintain behaviors leading to reward.
Introduction
Prefrontal cortical neurons integrate inputs from widespread brain regions to produce firing patterns discriminating many features of goal-directed behaviors. For example, prefrontal neurons signal reward expectancy, the ordering of actions, action selection, reward delivery, and sensory cue presentation (Jung et al., 1998; Schoenbaum et al., 1998; Shima and Tanji, 1998; Procyk et al., 2000; Pratt and Mizumori, 2001; Shidara and Richmond, 2002; Matsumoto et al., 2003). Furthermore, prefrontal discharge patterns are suggestive of both mnemonic and planning functions in that they reflect the recent history of sensory cues or motor actions and predict behavioral choices (Fuster and Alexander, 1971; Kubota and Niki, 1971; Funahashi et al., 1989; Miller et al., 1996; Baeg et al., 2003). Thus, through efferents to basal ganglia and motor cortex, prefrontal ensembles appear well suited to organize goal-directed action sequences (Sesack et al., 1989; Gu et al., 1999; Middleton and Strick, 2002; Dum and Strick, 2005).
With respect to goal-directed behavior, an outstanding question is how the outcome of an action sequence effects changes in prefrontal firing patterns such that action selection probability yields maximal reward. Firing pattern alterations may take the form not only of changes in discharge rate but also trial-to-trial discharge reliability. In motor cortex, increased discharge reliability complements rate change as a key component of motor skill learning (Kargo and Nitz, 2004), and variability in premotor cortex activity negatively correlates with response preparedness (Churchland et al., 2006). Thus, it may be crucial to consider the reliability with which prefrontal single-unit activity is generated from trial to trial. Furthermore, when change in the discharge variability of a single neuron is observed, one should also consider whether its source derives from group-dependent fluctuations between distinct activity patterns at the level of neuronal ensembles or is group independent. The latter possibility would suggest a stochastic process intrinsic to the individual neuron.
We addressed each issue by analyzing the firing properties of mouse medial precentral (MPC) neurons during performance of a navigational task. This subregion of prefrontal cortex is unique in bearing efferents to motor cortex while obtaining afferents from sensory and parietal regions of cortex and ventral tegmental dopamine neurons (Reep et al., 1984; Sesack et al., 1989; Conde et al., 1995). From a common start site, mice were tracked while traversing any of four equal-length paths leading to goal sites. Reward amounts delivered at goal sites were systematically varied.
Even for identical action sequences, expected reward magnitude altered MPC discharge rates. In addition, trial-to-trial rate variability was flexibly related to expected reward magnitude; action sequences leading to large rewards were associated with greater discharge reliability. Increased reliability reflected a decreased probability of ensemble pattern fluctuation as opposed to variability intrinsic to individual neurons. Higher discharge variability along a given path was related to both a lower probability of selecting that path and a higher probability of reversing course. The findings indicate that reward expectation drives changes in MPC activity capable of increasing the probability of selecting and maintaining execution of serial actions leading to a goal.
Materials and Methods
Presurgical training.
Male mice (C57 white strain; n = 8) were trained on a double T-maze task (Fig. 1A). Mice were food restricted for 3 d before training so that body weights on the first day of training were 90–95% of the pretraining weight. Chocolate food pellets (20 mg; Bio-Serv, Frenchtown, NJ) were introduced to the mice during this period. On training days, mice were placed at the start site on the maze. Mice entered the maze and were faced with a decision to turn left or right at the first T-junction (referred to as turn A in Fig. 1A) and then were faced with another decision at a second T-junction (turn B or C). Mice were not allowed to back up or reverse directions once a decision had clearly been made. Depending on the action sequence performed (i.e., the series of left–right turn choices made), mice received a large reward (three chocolate pellets, designated HR for high reward), a small reward (one pellet, designated SR), or no reward (NR). By blocking the path leading to the goal site, mice were then forced to return to the start site by paths along the maze perimeter. Mice were not forced to stop at the start site and no external cues were provided to mark the onset of a new lap. Thus, the task had a continuous structure. Mice were initially trained on a configuration in which right–right (R–R) action sequences resulted in HR, left–right sequences (L–R) resulted in SR, and the other action sequences [left–left and right–left (L–L and R–L)] resulted in NR. Mice were trained for a maximum of 30 min/d or until 20 good trials were performed for 7 consecutive days. Training was performed in the light, between 11:00 A.M. and 3:00 P.M. The criteria for good trials differed between days 1–3 and subsequent days. On days 1–3, a good trial was one in which the animal ran from the start site to the end zone without pausing for >3 s at any point (measured manually) and with no reversals. Thereafter, a good trial was one in which the animal made no obvious pause or reversal.
Surgery.
Trained mice were anesthetized with isoflurane on the day of surgery (n = 8). Mice were placed in a stereotaxic instrument, and isoflurane levels were maintained throughout the surgery based on the breathing rate and responsiveness to foot pinch. The skull was exposed, fascia was removed, and a small hole was drilled 2.0 mm rostral and 1.00 mm lateral to bregma on the right side. Approximately eight miniature screws (Small Parts, Miami Lakes, FL) were inserted into the skull around its perimeter. One screw was placed over left forelimb motor cortex (0.5 mm rostral and 1.50 mm lateral to bregma) and was connected to serve as a ground wire. A small portion of the exposed dura was removed. An array of either four or six stereotrode wires was placed stereotaxically through the hole, 1.00 mm deep into the right precentral region of the prefrontal cortex. Agarose gel was placed over the hole to cover the exposed brain. Each stereotrode consisted of two twisted tungsten wires (Kargo and Nitz, 2003, 2004). The stereotrode wires and ground wire were soldered into miniature connectors (Omnetics, Minneapolis, MN). The entire exposed complex consisting of the array connector and screws was covered with dental cement leaving only the female portion of the connector exposed. Animals were given antibiotic (Bacitril) intraperitoneally and 1.0 ml of saline subcutaneously and allowed 4 d to recover.
Postsurgical training (stable choice vs reversal learning).
After recovering from surgery, mice were trained for an additional 5 d on the same reward configuration. Mice were trained twice daily and ran for 30 min or for 20 good trials per session. The first session was run in the morning between 9:00 A.M. and 10:00 A.M., and the second session was run in the late afternoon, between 4:00 and 5:00 P.M. On the sixth day of postsurgical training, reversal learning was assessed on the task. The new HR sequence was L–L, the SR sequence was R–L, and the two NR sequences were R–R and L–R. Mice were trained with this new reward configuration for 3 d.
Data acquisition.
On each postsurgical training session, a tethered wire system was plugged into the array connector on the head. The tethered wires had preamplifiers (NBLabs, Denison, TX) to increase the signal-to-noise ratio of the stereotrode recordings. Voltage signals were led to an amplifier deck (Neuralynx, Tucson, AZ). Amplified voltage signals from individual stereotrode wires were bandpass filtered (0.6–6 kHz), and amplitude thresholds for triggering data collection were individually set for each wire. An amplitude-crossing event on one wire triggered data collection from the stereotrode pair. Data were stored at 32 kHz on hard disk using custom-written software (Matt Wilson and Loren Frank, Massachusetts Institute of Technology, Cambridge, MA). An array of light-emitting diodes (LEDs) was also present at the end of the tether system and was used to track animal position. A tracking system (DragonTracker, Boulder, CO) detected the LEDs and sent two-dimensional position data to hard disk. The positioning of the camera resulted in a pixel/distance ratio of ∼1 pixel/cm. Position data were collected at 60 Hz.
Spike discrimination.
Single units were discriminated off-line, primarily on the basis of the relative amplitudes of spikes recorded on the stereotrode wires (a full description of our cluster cutting methodology can be found in the work of Kargo and Nitz, 2004). Briefly, the cluster cutting method involves extraction of a set of spike waveform parameters for each spike from the two stereotrode wires and the separation of units on the basis of these parameters using interactive graphics software (Matt Wilson, Massachusetts Institute of Technology). Different combinations of parameter pairs were projected as two-dimensional scatter plots. When this was performed, points derived from single cells formed recognizable clusters (see examples in Kargo and Nitz, 2004). The spikes within a cluster were enclosed in a polygon drawn using the computer mouse. The data points were then projected into new two-dimensional plots in which the earlier partitions of the data were preserved by color coding the points lying within the polygon boundaries. This process was performed until a multidimensional set of boundaries was established that provided the subjectively best separation of spike waveform clusters. The times of spikes in each cluster (i.e., for each neuron) were exported to Matlab (MathWorks, Natick, MA) for analysis. For 88% of defined clusters, there was little or no overlap between cluster waveforms and waveforms not defined in clusters (i.e., non-isolable multiunit activity).
For spike waveforms isolated as single units, mean peak amplitude was taken as the voltage difference between the spike origin and peak. These values were compared with a “noise” measure calculated as the mean amplitude of the remaining spike waveforms (i.e., those for which single-unit isolation was not possible based on various waveform parameters). Note that this measure of noise is much larger than background voltage fluctuations observed in the absence of obvious spiking activity. The mean ± SD ratio of peak amplitude to noise was 3.37 ± 1.43. Mean ± SD peak amplitude was 115.9 ± 58.1 μV.
Data analysis.
LED position data were imported into Matlab. We first identified continuous and noncontinuous runs. We verified that animals did not pause or make path reversals or errors on each examined lap. There were two criteria for continuous runs: no obvious pause and no path reversal. Pauses were assessed by whether successive data points clustered at the same position (±2 pixels) for more than three consecutive points or >180 ms. Path reversals were determined subjectively by examining the time series of positions of the animal in Matlab. We determined the speed of the animal for each good lap (speed = sqrt[(x_{t+1} − x_t)² + (y_{t+1} − y_t)²]/0.060 s). We also determined the time points at which the animal first crossed (virtual) lines positioned at the start (marked S in Fig. 1A), turns A and B (or alternatively A and C depending on the choice made at A), and at the entrance to the reward zones (HR, SR, and NR) in which reward pellets, if appropriate, were delivered.
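The speed and pause criteria above can be illustrated with a minimal Python sketch. It is not the Matlab code used in the study. The function names, the default arguments, and the choice to measure clustering relative to the first sample of a run are assumptions made only for illustration; the 0.060 s interval and the ±2 pixel, three-point criterion are taken from the text.

```python
import math

FRAME_S = 0.060  # sampling interval taken from the speed formula in the text


def speeds(xs, ys, dt=FRAME_S):
    """Speed between successive tracked positions (pixels/s, ~cm/s at 1 pixel/cm)."""
    return [
        math.hypot(x1 - x0, y1 - y0) / dt
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    ]


def has_pause(xs, ys, tol=2.0, max_points=3):
    """True if more than max_points consecutive samples stay within +/- tol pixels.

    Assumption: clustering is measured relative to the first sample of the run.
    """
    anchor = 0  # index of the sample that starts the current cluster
    for i in range(1, len(xs)):
        if abs(xs[i] - xs[anchor]) <= tol and abs(ys[i] - ys[anchor]) <= tol:
            if i - anchor + 1 > max_points:
                return True
        else:
            anchor = i  # animal moved; start a new candidate cluster here
    return False
```

A lap would count as continuous under this sketch only if `has_pause` returns `False`; path reversals were judged subjectively in the study and are not modeled here.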
We quantified positional deviations from an idealized path based on the method described by Schmitzer-Torbert and Redish (2004). Briefly, idealized straight-line paths were constructed from the start to turn A, from turn A to turn B or C, and from turns B or C to the end zone crossing lines. We calculated the minimum distance of the animal's position from the idealized paths at each time point and the mean distance for each lap.
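As a hedged illustration of the deviation measure, the sketch below computes the minimum distance from each tracked position to a piecewise-linear idealized path and averages it over a lap. The helper names and the waypoint representation are hypothetical; the original analysis followed Schmitzer-Torbert and Redish (2004) and may differ in detail.

```python
import math


def dist_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def mean_path_deviation(positions, waypoints):
    """Mean over a lap of the minimum distance to the idealized path.

    waypoints: e.g. [start, turn A, turn B, end-zone line] as (x, y) pairs.
    """
    segments = list(zip(waypoints, waypoints[1:]))
    per_point = [
        min(dist_to_segment(p, a, b) for a, b in segments) for p in positions
    ]
    return sum(per_point) / len(per_point)
```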
Cell firing rates were normalized using a technique similar to Daw et al. (2003). We needed to address the problem of visualizing and analyzing data during a self-paced task in which the durations of navigational epochs varied somewhat from trial to trial [Fig. 2B; for example, the duration between S and A, between A and B (or A and C), and between B or C and the end zone crossing line]. For all continuous runs, we computed the firing rate of cells from −3.0 s relative to crossing the start point (S) to +3.0 s relative to crossing the end zone lines. We used a technique that calculates instantaneous firing rates using partial binning (Schwartz and Adams, 1995; Kargo and Nitz, 2003, 2004). We used bin sizes of 200 ms. We computed the mean duration of each interval for a training session. Five navigational intervals were defined: (1) from −3.0 s before S to S; (2) from S to A; (3) from A to B (or A to C); (4) from B or C to E; and (5) from the reward line (R) to +3.0 s after R. Intervals 1 and 5 were constant for each trial, whereas intervals 2–4 varied. Time-varying firing rates within an interval on a single trial were then resampled (i.e., stretched or compressed) to match the mean length of the interval (Fig. 2B, middle). Only continuous runs with no pauses were counted as good trials, so interval durations were fairly consistent. Differences in trial durations are exaggerated in the top panels of Figure 2B to increase the clarity of method presentation. Nevertheless, we compared interval durations (and duration variability) for different action sequences (e.g., HR vs SR) as well as mean firing rates within non-normalized intervals using standard t tests.
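The stretching and compressing of per-interval firing rates can be sketched as follows. This is an illustrative Python version, not the original Matlab routine. Linear interpolation is an assumption (the text specifies only that rates were resampled), and `target_bins` is a hypothetical argument standing for the mean interval durations expressed in 200 ms bins.

```python
def resample(values, n_out):
    """Linearly interpolate a sequence onto n_out evenly spaced points."""
    if n_out < 1 or len(values) == 0:
        raise ValueError("need at least one input value and one output point")
    if len(values) == 1 or n_out == 1:
        return [float(values[0])] * n_out
    scale = (len(values) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * scale                       # fractional index into values
        lo = min(int(pos), len(values) - 2)   # left neighbour, kept in range
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


def normalize_trial(interval_rates, target_bins):
    """Warp each interval of one trial to its target bin count and concatenate.

    interval_rates: list of per-interval rate sequences for a single trial.
    target_bins: number of bins matching the session-mean interval durations.
    """
    warped = []
    for rates, n_bins in zip(interval_rates, target_bins):
        warped.extend(resample(rates, n_bins))
    return warped
```

After warping, every trial has the same number of bins, so rates can be averaged or compared bin by bin across trials.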
For each neuron and for each time bin, paired t tests were used to compare mean firing rates between (1) HR and SR trials on stable choice days, (2) HR trials during early and late adaptation periods on reversal days, and (3) NR trials during early and late adaptation periods on reversal days. Early and late adaptation periods were arbitrarily determined as the first 10 HR (or NR) and last 10 HR (or NR) trials on reversal days.
As a control for the possibility that our dataset could be biased toward the detection of a greater number of significant t values than expected based on our probability threshold (0.05), we also calculated t values based on randomized data. In this procedure, data from HR and SR runs (as well as early and late runs) were randomly interleaved before determining the number of neurons exhibiting significant differences at any particular bin. This procedure was repeated 100 times to generate, for each bin, the mean number of neurons expected to exhibit significant differences based on chance alone.
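A minimal sketch of this randomization control, for a single time bin, is given below. It assumes equal trial counts in the two conditions (for example, 10 early and 10 late traversals), so a fixed critical value for df = 9 stands in for a full p value computation. All function names are hypothetical, and the details of the original Matlab procedure may differ.

```python
import math
import random

T_CRIT_DF9 = 2.262  # two-tailed critical t at p = 0.05 for 10 pairs (df = 9)


def paired_t(a, b):
    """Paired t statistic for two equal-length lists of firing rates."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def n_significant(cond_a, cond_b, t_crit=T_CRIT_DF9):
    """Number of neurons whose |t| exceeds the critical value in this bin."""
    return sum(abs(paired_t(a, b)) > t_crit for a, b in zip(cond_a, cond_b))


def chance_count(cond_a, cond_b, n_shuffles=100, seed=0):
    """Mean count of 'significant' neurons after randomly interleaving trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_shuffles):
        shuf_a, shuf_b = [], []
        for a, b in zip(cond_a, cond_b):
            pooled = list(a) + list(b)
            rng.shuffle(pooled)  # condition labels no longer carry information
            shuf_a.append(pooled[:len(a)])
            shuf_b.append(pooled[len(a):])
        total += n_significant(shuf_a, shuf_b)
    return total / n_shuffles
```

Here `cond_a` and `cond_b` each hold one list of single-bin firing rates per neuron; repeating the call for every bin would give the per-bin chance level.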
We also examined the variability of firing patterns on both stable choice days and reversal learning days. We calculated the firing rate variance (σ²) and mean firing rate (μ) of each neuron during each training session. We found a log–log relationship between σ² and μ, i.e., log(σ²) = k × log(μ) + b, where k = 1.30 and b = 0.21 (r² = 0.78; p < 0.001). Thus, when the mean firing rate of a cell was high, the trial-to-trial variance was correspondingly larger compared with cells that fired at lower rates. We examined this relationship because we were interested in testing whether the coefficient of firing rate variance [CV, or SD/mean, a measure similar to the Fano factor (Baddeley et al., 1997)] was increased during one action sequence compared with a different one, e.g., HR versus SR. The mean and SD of CVs across all cells were determined for each time bin. Paired t tests were then used to compare CVs on (1) HR and SR trials on stable choice days, (2) HR trials during early and late adaptation periods on reversal days, and (3) NR trials during early and late adaptation periods on reversal days. Because the number of HR and SR path traversals differed on stable choice days, we performed a control analysis to rule out the possibility that differences in the HR and SR CVs were simply a function of increased sampling of HR runs. To do this, we randomly subsampled the HR runs a total of 50 times. For each bin, the mean CV for HR runs was then determined.
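The CV and the log–log fit can be written out in a few lines. The sketch below is illustrative only: the function names are hypothetical, natural logarithms are assumed (the text does not state the base), and ordinary least squares is assumed for the fit.

```python
import math


def mean_var(rates):
    """Mean and sample variance of trial-by-trial firing rates."""
    n = len(rates)
    mu = sum(rates) / n
    var = sum((r - mu) ** 2 for r in rates) / (n - 1)
    return mu, var


def cv(rates):
    """Coefficient of variation, SD / mean."""
    mu, var = mean_var(rates)
    return math.sqrt(var) / mu


def loglog_fit(means, variances):
    """Least-squares k and b in log(variance) = k * log(mean) + b.

    Assumption: natural logarithms; the base changes b but not k.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, my - k * mx
```

Because variance grows with mean rate across cells, dividing the SD by the mean gives a reliability measure that can be compared between neurons firing at different rates.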
Histological analysis.
Animals were anesthetized intraperitoneally with Nembutal (70 mg/kg) and perfused intracardially with 50 ml of saline and then with 150 ml of 4% paraformaldehyde. Brains were removed and postfixed in 4% paraformaldehyde for 2–3 d. Brains were frozen and cryosectioned into 50 μm slices. We examined successive frontal cortical slices for depth of electrode tracks, mediolateral location, and extent of cortical damage. All brains exhibited minimal cortical damage. Figure 2A shows a histological section from one animal as well as, in schematic form, the area sampled in the eight mice. Recording sites overlapped considerably and so are presented as a single region (black circle) that corresponds to the medial precentral region of the prefrontal cortex but overlaps slightly with the anterior cingulate.
Results
Goal-directed selection of action sequences
Mice developed action policies that reflected knowledge of the action–outcome contingencies of the navigational task described in Figure 1A. On a given trial, mice traversed one of four differentially rewarded paths determined by whether they made left or right turns at two different T-junctions. The probabilities of selecting any of the four possible action sequences (i.e., paths) were computed within overlapping 10-trial segments across days (e.g., for trials 1–10, 2–11, … 131–140; n = 8) (Fig. 1B). On days 1–3, probabilities for selection of all four paths were near chance (25%). The probability of selecting the HR path thereafter increased up to day 6 and remained stable across days 6–7 (trial blocks 115–139). Over this period, mice selected the HR sequence at 53 ± 9% (mean ± SD), the SR sequence at 27 ± 11%, and the two NR sequences at 12 ± 4 and 8 ± 4%.
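The overlapping-segment computation of selection probabilities can be sketched as below. The path labels and the function name are hypothetical, and the `step` argument is included only so that windows advancing by more than one trial can be expressed with the same code.

```python
def selection_probabilities(choices, window=10, step=1):
    """Fraction of trials on each path within overlapping trial windows.

    choices: per-trial path labels in order, e.g. "RR", "LR", "LL", "RL"
    (hypothetical encoding of the four action sequences).
    """
    paths = sorted(set(choices))
    out = []
    for start in range(0, len(choices) - window + 1, step):
        block = choices[start:start + window]
        out.append({p: block.count(p) / window for p in paths})
    return out
```

With `window=10` and `step=1`, the first entry covers trials 1–10, the second trials 2–11, and so on.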
Electrode implantation did not affect choice. Animals were implanted with a stereotrode array on day 8, and training was resumed on day 13 after 5 d of rest. The set of path selection probabilities on days 13–17 was not significantly different from that before surgery (mean ± SD; HR, 55 ± 9%; SR, 26 ± 8%; NR, 12 ± 5 and 7 ± 4%). The mean trial duration, or the time from crossing the start line to crossing into the end zone, returned to presurgical levels by day 15 (mean ± SD duration for days 6–7, 3.1 ± 0.8 s; mean ± SD duration for days 15–17, 3.3 ± 1.0 s).
Characterization of single-unit recording sites
Histological analysis revealed that stereotrode tips in all mice were found within the MPC region of the prefrontal cortex. A representative track left by the stereotrode array in one animal is depicted in Figure 2A (top). In the bottom plot, the black circle comprises the area within which the arrays in all animals were found.
Differential neural activity and action preparation
We sought to determine how reward magnitude might affect the development of neural activity patterns that discriminate action sequences such as the right–right and left–right turn sequences that define selection of the HR and SR paths, respectively. To do this, we first analyzed the discharge rates of MPC neurons as a function of the animal's position along HR versus SR paths. Recordings were obtained from 110 MPC neurons during days 16 and 17 (animal m2, 19 neurons; m3, 14 neurons; m4, 9 neurons; m5, 5 neurons; m6, 16 neurons; m7, 13 neurons; m8, 23 neurons; m9, 11 neurons). Of these neurons, 93 met a test of minimal responsiveness during the task, i.e., exhibited a mean firing rate >3 Hz at some point within the time period spanning from 2 s before crossing the start line to 2 s after crossing the end zone.
Across all neurons, peak discharge rates were distributed evenly over all epochs of task performance. Figure 3A shows the mean firing pattern of each of the 93 cells during execution of the HR (i.e., right turn–right turn) and SR (i.e., left turn–right turn) action sequences. Each row represents the firing rate of an individual neuron across the time-normalized space between the start and goal positions (for details of the normalization procedure, see Fig. 2B and Materials and Methods). For purposes of display only (Fig. 3A), discharge rate was normalized to a maximum of 1.0, with deep red and darkest blue representing, respectively, maximal and zero activity. For the sake of comparison, neurons in both the HR (top image) and SR (bottom image) paths are arranged from top to bottom based on when the peak firing rate occurred during the HR sequence. By using the same ordering for both conditions, it is possible to discern how activity is distributed across task performance as well as differences in discharge patterns across HR and SR paths. Vertical white lines represent the times in which the animals crossed, respectively, the start line, first T-junction (turn A), second T-junction (turn C), and the reward line (positioned ∼3 cm short of the actual site of reward in the end zone). The preferred response times of the population of cells tiled evenly across the task timeline. Because of the rule by which neurons were ordered, this is most easily observed for the HR data but was also true for SR data when ordering was based on peak firing times during SR path traversals (data not shown). The sequential tiling was not a result of pooling data from different animals (supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
Consistent with the idea that MPC neurons play a role in selecting particular action sequences, the activity of most MPC neurons discriminated the HR and SR paths. Figure 3B depicts the activity of two such neurons. Mean firing rate for the neuron depicted in the top plot peaked near the time at which the animal reached the goal site of both HR and SR path traversals despite the fact that these goal sites were positioned at opposite corners of the maze. Nevertheless, peak activity was twice as great on arrival at the goal site corresponding to the SR path. In the bottom plot, a second neuron exhibited peak discharge just after turn B of the HR path. In contrast, discharge rate was relatively unchanged across the space of the SR path.
For each neuron, we quantified the points along the path at which firing rates were significantly different between HR and SR actions. This analysis is summarized in Figure 3C in which each gray dot represents a time point for which the mean firing rate of an individual neuron, for all HR versus SR path traversals, differed at p < 0.05. A total of 55 of 93 neurons (59.1%) exhibited significant changes in discharge rate at some point between the start and goal positions. The superimposed blue line represents the percentage of neurons that had significantly different firing rates at the corresponding time.
For the HR and SR path segments between turn A and the goal sites, differences in firing rate were expected given that traversal of these segments is associated with different spatial positions, fields of view, and directions of motion. However, ∼10% of all neurons exhibited discharge rate differences within the path segment bounded by the start line and turn A. This segment is common to both paths. In total, 17.2% of all neurons exhibited firing rate differences at some point within this path segment. At all points, the number of neurons firing differentially for the HR and SR paths was higher than that expected by chance. Chance levels were determined from control analyses in which individual HR and SR path traversals were randomly interspersed (see Materials and Methods). Using this bootstrapping procedure, the percentage of neurons expected to differ by chance varied slightly around a mean of 4.4% (full and dotted red lines of Fig. 3C represent, respectively, the mean and 95% confidence intervals of the percentage of neurons expected to differ by chance).
Changes in discharge pattern exhibited by individual neurons could not be explained by the overall firing rate of the MPC neuron population, which was relatively constant across the full length of both HR and SR paths (Fig. 3D, left). Similarly, changes in discharge patterns could not be explained by differences in locomotor variables accompanying HR and SR path traversals. Both movement velocity and variability in movement velocity differed minimally between HR and SR path traversals (Fig. 3D, right). Finally, the stereotypy with which mice traversed the HR and SR paths was similar. There was no significant difference in the degree to which mice deviated from the idealized HR and SR paths (positional variability for the HR and SR paths, respectively, was 3.49 ± 2.28 and 3.60 ± 2.38 cm², mean ± SD; p = 0.4).
From the foregoing analysis, it is clear that MPC neurons discriminate action sequences, but it is difficult to know whether these differences in activity are related to different reward expectations or to different planned or prepared movements regardless of the expected outcome. Given that prefrontal cortical neurons can exhibit activity reflective of both preceding and ensuing navigational behaviors (Baeg et al., 2003), this complication applies even to the pre-start and start–turn A task epochs in which maze position and behavior for HR and SR path traversals were most alike. Therefore, we examined neural activity during changes in action–outcome contingencies on day 18. If motor actions are the same (e.g., head velocity and position) for a particular action sequence, then changes in neural activity should reflect primarily the changes in reward expectation.
Action–outcome contingencies were altered by changing the amount of reward given at the end of the four possible paths. A total of 40–50 trials were run to allow sufficient time for adaptation. Mice adapted action policies quickly in response to contingency changes (Fig. 1B). The probabilities for selection of each of the four paths on the last trial block of the adaptation session were as follows (mean ± SD): HR sequence, 62 ± 11%; SR sequence, 20 ± 8%; NR sequences, 14 ± 4% (previous HR sequence) and 4 ± 4% (previous SR sequence).
Even for the same action sequence covering the same environmental spaces, changes in reward expectation drove alterations in the positional firing patterns of MPC neurons. Neuronal activation patterns were compared between the first 10 (early) and last 10 (late) traversals after the reward contingency change (i.e., after making one of the former NR paths into the new HR path). Of 85 neurons that were recorded on day 18, 76 met a test of minimal responsiveness during the task. For 29 of these neurons (38.2%), a significant change in discharge rate was observed at some point between the start and goal positions. Figure 4A shows the mean firing patterns of these cells during the first 10 (top) and last 10 (bottom) HR trials. Neurons are arranged from top to bottom in both panels based on when the preferred response times occurred during the last 10 HR trials. As for the presentation of HR and SR data on stable-choice days (Fig. 3A), maintaining the same ordering for presentation of early and late HR data permits visualization of the changes in activity patterns occurring across these time periods. The preferred response times of many cells were changed as evident by the smooth tiling of responsiveness across the timeline of the late HR trials compared with the early ones.
Figure 4B depicts the activity patterns of two neurons that exhibited statistically significant changes in firing rate between the early and late running periods on the new HR path. In the top plot, a discharge increase found just after turn B is greatest during the final 10 trials when a strong bias to the new HR path had developed. In the bottom plot, a peak in discharge rate for a second neuron is found between turn B and the goal site for the first 10 traversals of the new HR path. As the animal developed a bias for this path, discharge rate in this same region of the maze dropped to near zero.
We quantified such changes in discharge rate by determining the points in time across the HR path for which firing rates were significantly different between early and late traversals. These times are shown in Figure 4C in which gray dots represent time points for which firing rates differed at p < 0.05. The superimposed blue line represents the percentage of cells that had significantly different firing rates at the corresponding time. The full and dotted red lines indicate, respectively, the mean and 95% confidence interval for the number of neurons expected to differ by chance based on control analyses in which early and late path traversals were randomly interspersed (see Materials and Methods). At any given point along the path, a greater number of neurons than expected by chance were found to exhibit significant rate changes. Moreover, all of these changes occurred in the absence of differences in the overall firing rate of the neuronal population as well as movement velocity and variability in movement trajectory (Fig. 4D).
Differences in pre-start activity during the adaptation period were related directly to differences in the probability of selecting the new HR sequence. Figure 5A shows the mean positional firing rates of a neuron during early (gray) and late (black) HR runs. The amplitudes of firing rate increases before both the start and reward lines significantly increased over the course of the adaptation period. In Figure 5B, black dots represent the peak firing rate of the pre-start burst calculated in partially overlapping seven-trial blocks (e.g., for trials 1–7, 4–10, 7–13; r² = 0.73, p < 0.001). The probability of selecting the new HR sequence increased over time in parallel with the increased firing rate of this cell (gray squares represent the probability calculated in the seven-trial blocks; r² = 0.62, p < 0.005).
Coefficient of firing rate variance and the maintenance of trajectories toward goals
A critical aspect of action sequence selection concerns the trial-to-trial reliability of neural activation patterns that discriminate different action sequences. If the reliability with which an action sequence-specific pattern of activity is produced is low, then selection may be more stochastic. Here, we examine the reliability of these patterns by comparing the variances in trial-to-trial firing rate patterns for HR and SR path traversals and early versus late HR and NR path traversals on the adaptation day (day 18 in Fig. 1B).
First, we examined the extent to which MPC neurons exhibited trial-to-trial variability about the mean activation pattern for both HR and SR action sequences. As has been observed previously for primary motor cortical neurons (Kargo and Nitz, 2004), we obtained a log–log relationship between the means and variances of discharge rate for MPC neurons (Fig. 6A) (variance ≈ k × mean rate; k = 1.3, r2 = 0.78, n = 93 cells). Thus, when the mean firing rate of a cell was high, the trial-to-trial variance was correspondingly larger compared with cells that fired at lower rates. As a result, we were able to use the coefficient of firing rate variance (CV, or SD/mean) to determine whether discharge reliability across a population of neurons differed between HR and SR path traversals.
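As a minimal sketch of the reliability measure, the CV at each path position can be computed from the trial-by-position firing rates of a single neuron. The data layout, the function names, and the use of the sample (n − 1) standard deviation are assumptions made here; the text specifies only that CV is SD/mean.

```python
from statistics import mean, stdev


def cv_per_bin(rates_by_trial):
    """Coefficient of variation (SD/mean) across trials for each positional bin.

    rates_by_trial: list of trials, each a list of firing rates per bin.
    Returns None for bins whose mean rate is zero (CV undefined).
    """
    cvs = []
    for bin_rates in zip(*rates_by_trial):  # iterate over bins, not trials
        m = mean(bin_rates)
        cvs.append(stdev(bin_rates) / m if m > 0 else None)
    return cvs


def reliability(cv):
    """1/CV, the quantity plotted for display; None when CV is undefined or zero."""
    return 1.0 / cv if cv else None
```

Because variance scales with mean rate, dividing by the mean is what allows values from cells with very different firing rates to be pooled across the population.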
CVs were lower (i.e., discharge was more reliable) during HR sequences than during SR sequences (Fig. 6B, black, HR runs; gray, SR runs). For the purposes of display, we plot 1/CV such that increases in the reliability with which particular firing rates occur across all path positions correspond to greater y-axis values. Significant increases in reliability were evident before crossing the start line, during the approach to the reward, and at the reward site (black horizontal bars represent time points in which CV differences were significant at p < 0.05; mean CVs for HR and SR were significantly different at p < 0.01) (Fig. 6C). Similar results were obtained when the number of laps used to calculate CVs was matched for the HR and SR paths (Fig. 6B, dotted line). In addition, control analyses presented in supplemental Figure 3 (available at www.jneurosci.org as supplemental material) demonstrate that increased reliability for HR traversals was dependent on neither the increased amount of within-day experience on this path nor the tendency for the animal to make consecutive traversals along this path. Finally, this result does not reflect corresponding differences in mean firing rates, movement velocity, variability in movement velocity (Fig. 3D), or positional variability.
Increased reliability in MPC discharge, which occurred during HR sequences, was associated with fewer navigational errors. We quantified two types of errors: (1) HR errors and (2) SR errors. HR errors included the following: (1) an incorrect decision at the second T-junction where the animal made an R–L sequence leading to an unrewarded (NR) end zone; (2) path reversal errors at the second T-junction where the animal incorrectly went R–L and then reversed directions to go R–R to the HR end zone (for how path reversal errors were determined, see Materials and Methods); and (3) path reversal errors at the first T-junction where the animal went right but aborted the potential high-reward run (R–R) and reversed directions to go left. SR errors were the opposite: (1) an incorrect decision at the second T-junction where the animal made an L–L sequence leading to an NR end zone; (2) path reversal errors at the second T-junction where the animal incorrectly went left and then reversed directions to go right to the SR end zone; and (3) path reversal errors at the first T-junction where the animal went left en route to the low reward but then reversed directions to go right. We determined the number of path errors for all eight animals on days 16 and 17. The number of SR errors was more than triple the number of HR errors (significant difference at p < 0.01) (Fig. 6C).
Trial-to-trial variability levels might still be related to subtle motor or sensory differences between HR and SR runs. To address this, we examined CVs during the adaptation period on day 18. CVs were compared for early versus late traversals to the new (previously unrewarded) HR path and for early versus late traversals of an unrewarded (NR) path that had previously been the HR path.
During the adaptation period, we observed significant increases in reliability across the new HR path. These increases occurred from 2.0 s before crossing the start line, at each T-junction, during approach to the reward, and at the reward site (Fig. 6D, full gray lines, early HR; full black lines, late HR; horizontal black bars represent time points in which CV differences were significant at p < 0.05). The opposite effect occurred for the now unrewarded (or NR) action sequence that previously led to the HR end zone, namely a worsening of CVs during adaptation. Discharge reliability decreased for the last 10 NR trials compared with the first 10 NR trials (Fig. 6D, dashed gray lines, early NR; dashed black lines, late NR; horizontal gray bars represent time points in which CV differences were significant at p < 0.05). Figure 6E depicts mean 1/CV values for all cells across all path positions. 1/CV values for the new NR path decreased significantly with experience, whereas those for the new HR path increased (*p < 0.001). As was the case for changes in the firing patterns of individual neurons, neither of these differences can be accounted for by corresponding alterations in the mean firing rate of the MPC neuron population (Fig. 4D), the movement velocity of the animals (Fig. 4D), or the degree of behavioral stereotypy across a path trajectory.
In summary, CVs associated with different paths to potential goal sites were closely related to the probability of choosing those paths. The development of biases toward paths leading to greater reward was paralleled by decreases in the variability of discharge patterns, whereas the avoidance of paths leading to no reward was associated with increased variability.
Characterization of rate variability components
The foregoing results demonstrate that discharge pattern variability is sensitive to reward contingencies and predictive of action selection. However, the increased discharge pattern variability observed during traversals of SR and NR paths could arise from more than one source. Firing rate variance could reflect intrinsic variability in the integration of afferent inputs by individual neurons or, alternatively, could reflect fluctuations in the activity patterns, or representations, generated by populations of functionally connected neurons. At one extreme, deviations from the mean lap-to-lap firing rates for neurons recorded simultaneously could arise in an independent manner. At the other extreme, such deviations could be strongly correlated such that, on individual trials (i.e., laps), an increase in firing rate for one neuron relative to its mean is associated with a proportional increase, as opposed to a decrease, for another neuron relative to its own mean (Zohary et al., 1994; Lee et al., 1998). Such relationships between rate fluctuations relative to the mean are termed “noise correlations” and may be positive or negative in sign depending on whether the two neurons in question maintain positive or negative correlations of their mean firing rates across a series of stimuli [also known as “signal correlations” (for an excellent review of this subject, see Averbeck et al., 2006)]. To determine the relative strength of these two variability sources in our data, we calculated correlations between deviations from mean rates for all simultaneously recorded neuron pairs (n = 222).
For the present set of analyses, signal correlations are simply the correlations between mean firing rates of two neurons across all positions (bins) and traversals of a particular path. To illustrate, firing rate data for two neurons exhibiting a strong, positive signal correlation are depicted in the first two graphs of Figure 7A. Here, firing rates across the 40 positional bins of 10 different traversals of the SR pathway are laid end to end (dark blue trace; dashed vertical lines mark the end of each trial). For comparison, the mean firing rate calculated for each positional bin across all 10 traversals is repeated (light blue trace). Trial-specific (red trace) and mean (pink trace) firing rates for another neuron are given in the middle graph. The correlation between the light blue and pink traces was 0.75. The bottom graph depicts, for each position and for each traversal, the deviations from mean firing rate exhibited by each neuron. The correlation between rate deviations (i.e., the noise correlation) was 0.67. Thus, deviations from mean firing rate for the two neurons were highly correlated in both sign and degree, which suggests a common source of firing rate variability.
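The two correlations can be sketched for a single neuron pair as follows. This is an illustrative sketch rather than the analysis code used for the paper: the function names and the trial-by-bin data layout are assumptions, and a plain Pearson correlation is assumed for both measures.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bin_means(trials):
    """Mean firing rate in each positional bin across traversals."""
    return [mean(col) for col in zip(*trials)]


def deviations(trials):
    """Trial-specific rate minus the bin mean, with traversals laid end to end."""
    means = bin_means(trials)
    return [rate - m for trial in trials for rate, m in zip(trial, means)]


def signal_and_noise(trials_a, trials_b):
    """trials_a, trials_b: rates for two simultaneously recorded neurons,
    indexed [traversal][bin], with matching traversals and bins."""
    signal = pearson(bin_means(trials_a), bin_means(trials_b))
    noise = pearson(deviations(trials_a), deviations(trials_b))
    return signal, noise
```

Correlating the mean traces once gives the same value as correlating the mean traces repeated for every traversal, as drawn in Figure 7A, because repetition does not change a Pearson correlation.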
Negative signal and noise correlations were also observed for both HR and SR paths. An example of one is given in Figure 7B. The peaks of activity for HR path traversals were offset for the two neurons depicted. Mean activity for the second neuron (middle trace) peaked at the end of runs, whereas a broader distribution of activity across the first two-thirds of the path was observed for the first (top trace). As a result, the signal correlation between these neurons was negative (r = −0.37). A significant negative correlation was also observed for the fluctuations of firing rate for each neuron around their respective means. Positive deviations from mean rate for the first neuron tended to be associated with simultaneous negative deviations for the second (Fig. 7B, arrow). Overall, the noise correlation for this pair of neurons was −0.22.
Across all neuron pairs, signal and noise correlations were strongly related. Plotted in Figure 7C are the signal (“rate r”) versus noise (“dev. r”) correlations for all neuron pairs during traversals of the HR path (left) and SR path (right). In each case, the correlation of these correlations was significant, indicating a robust relationship between the signs and strengths of signal and noise correlations for many neuron pairs.
Next we compared separately the average signal and noise correlations for the same neuron pairs during traversal of the HR and SR paths. Depicted in Figure 7D are the mean absolute values for signal (“rate”) and noise (“dev.”) correlations of all neuron pairs for the HR (gray) and SR (black) paths (error bars indicate SD). Overall, mean noise correlations were somewhat greater than those reported in previous experiments (Zohary et al., 1994; Lee et al., 1998), although they still account for only a fraction of the total variability. Although mean signal correlations were not different for the two paths, noise correlations were significantly higher across SR path traversals (n = 222; *p < 0.001). Thus, the higher total variance associated with SR compared with HR path traversals may result not from increases in group-independent noise (i.e., random, uncorrelated rate fluctuations) but rather from increases in the tendency of groups of neurons to fluctuate in concert. Such group-dependent firing rate deviations may be interpreted not as noise per se but instead as fluctuations in the firing patterns developed among groups of functionally related neurons.
The same analysis was applied to data collected on day 18 when the reward contingencies associated with paths were altered. Correlations between signal and noise correlations for both early and late traversals of the new HR and new NR paths remained strongly positive (early NR, 0.77; late NR, 0.82; early HR, 0.60; late HR, 0.56). However, no robust differences in either signal or noise correlations were observed between the early and late traversals of the new HR and new NR paths.
Discussion
The present findings indicate that (1) MPC neurons exhibit task-related firing patterns depending strongly on expected reward magnitude and (2) the probability of selecting and maintaining execution of action sequences is reflected in the reliability of MPC ensemble pattern production. Together, the findings indicate that MPC neurons generate rapidly adaptable activity patterns capable of biasing behavioral decisions toward attainment of greater reward. Such a role for the MPC is consistent with its projections to motor cortices (Sesack et al., 1989; Gu et al., 1999) and data demonstrating that prefrontal lesions, including MPC, produce response selection deficits at specific spatial locations (Kesner et al., 1996).
Several features of the task and data enabled the patterning and reliability of MPC activity to be related to reward magnitude. Discharge was correlated with reward attainment but also with actions leading to reward (Jung et al., 1998; Chang et al., 2000; Pratt and Mizumori, 2001; Averbeck et al., 2002; Matsumoto et al., 2003) and was, in some cases, sensitive to the ordering of previous and upcoming behaviors (supplemental Fig. 2, available at www.jneurosci.org as supplemental material) (Fuster and Alexander, 1971; Kubota and Niki, 1971; Funahashi et al., 1989; Miller et al., 1996; Baeg et al., 2003). Path selection probabilities were biased toward receipt of the highest reward, but mice also traversed SR paths (Fig. 1B). That is, behavioral choices obeyed the “matching” law (Herrnstein, 1961). Because many SR path traversals were uninterrupted, MPC activity could be compared under conditions in which incentive varied but locomotor behavior did not (Fig. 3D).
MPC discharge patterns differentiated HR, SR, and NR action sequences in two distinct ways. First, most neurons exhibited significantly different firing rates at some point between the start and goal positions of different paths (Fig. 3C). Second, discharge reliability was higher across traversals of high-reward paths (Fig. 6B). By altering the reward magnitudes associated with each path, differences in MPC activity patterns were found to be driven solely by changes in reward magnitude as opposed to movement direction or spatial position. When only the reward value of a path changed, a high percentage (38.2%) of neurons still exhibited firing rate changes (Fig. 4C). Thus, the mapping of MPC activity to all action sequence epochs is adaptable to the context of expected reward magnitude.
Remarkably, altering the spatial distribution of reward also revealed that variability in ensemble patterns varies both positively and negatively with reward magnitude. Mean firing rates across all MPC neurons did not differ between paths. Thus, alterations in CVs were attributable to changes in discharge reliability. Discharge reliability increased for paths in which the associated reward magnitude increased and decreased for paths in which it decreased (Fig. 6D). Notably, reliability decreases occurred on a path for which animals had extensive experience. Thus, experience can, under these circumstances, either enhance or depress reliability (or “fidelity”) within MPC activity patterns.
The reliability of pattern production was flexibly related to path selection probabilities. Variability was lowest for high-reward paths (Fig. 6C), decreased as animals learned to select a new high-reward path (Fig. 6D), and increased with learned avoidance of a newly unrewarded path (Fig. 6D). Furthermore, the more reliably observed patterns of HR paths were associated with a reduced tendency for the animal to reverse course (i.e., to “change its mind”).
The latter finding calls into question whether deviations from mean rate solely reflect stochastic processes within individual neurons or are indicative of fluctuations in ensemble activity patterns among functionally connected neurons. To address this question, we compared correlations between rate deviations among simultaneously recorded neuron pairs. Correlations between mean rate deviations (i.e., noise correlations) depended on the strength and sign of rate correlations (i.e., signal correlations) (Fig. 7C). Neuron pairs with positive signal correlations (i.e., similar activity patterns across path space) yielded positive noise correlations. Similarly, negative signal correlations were associated with negative noise correlations (Fig. 7B). An explanation for this finding is that neurons with similar response profiles are members of a functional group within which the activity of individual members enhances that of others. Neurons outside the group are, at the same time, inhibited.
The relationship between signal and noise correlations held for both HR and SR paths (Fig. 7C), yet noise correlations were higher during traversals of the SR path (Fig. 7D). Mean signal correlations did not differ. This suggests that SR path traversal was associated with a greater propensity for MPC ensembles to fluctuate between distinct activity patterns. Such fluctuations would increase the total variance when summed with variance sources attributable to behavioral variability and group-independent error in the responsiveness of individual neurons to their afferents. Thus, a portion of the increased total variance on the SR path derives not from group-independent noise but instead from fluctuations in firing patterns generated by groups of neurons acting in concert. The lower rate variability found for HR path traversals appears to be dominated more by group-independent rate fluctuations. Although speculative, one interpretation of this result is that the higher group-dependent rate variance for SR path traversals is directly related to the increased number of reversal errors associated with this path (Fig. 6C).
The mechanisms by which MPC activity patterns are altered by reward magnitude are unknown. One possibility is that dopaminergic neurons signaling changes in expected reward (Fiorillo et al., 2003; Morris et al., 2004; Bayer and Glimcher, 2005) force activity alterations despite unchanging sensory inputs and motor outputs. Indirectly, dopamine signaling could effect the same process through interactions with basal ganglia neurons signaling reward or reward expectation (Hikosaka et al., 1989; Schultz et al., 1992; Wiener, 1993; Lauwereyns et al., 2002; Arkadir et al., 2004; Schmitzer-Torbert and Redish, 2004; Samejima et al., 2005; Watanabe and Hikosaka, 2005). Alternatively, altered MPC firing patterns could reflect changes in the spatially modulated activity of hippocampal neurons (Jones and Wilson, 2005). Because changes in goal position within an environment alter spatial firing patterns for both PFC and hippocampal neurons (Markus et al., 1995; Fyhn et al., 2002; Hok et al., 2005), this explanation applies equally to MPC activity changes observed between early and late traversals of the new HR path across which locomotor behaviors and allocentric positions were unchanged.
Intuitively, one might assume that changes in discharge reliability reflect synaptic plasticity (e.g., increased synaptic efficacy would increase spiking probability in response to excitatory afferents). In this respect, it is notable that activity of dopamine neurons could modulate synaptic plasticity through interactions with NMDA receptors (Otani et al., 2003; Chen et al., 2004; Huang et al., 2004). However, dopamine signaling could, without affecting synaptic strengths in the long term, produce the observed changes in discharge reliability. Dopamine, acting through D1 receptors, selectively enhances synaptic input to layer V PFC neurons and specifically enhances their task-related activation or discharge suppression (Williams and Goldman-Rakic, 1995; Durstewitz et al., 2000; Seamans et al., 2001). This action could serve to stabilize interactions between neurons acting as a functional group such that fluctuations between competing ensemble activity patterns would be minimized. Indeed, it has been suggested that D1 receptor stimulation increases the “energy barrier” between different network attractor states (Durstewitz, 2006). Ultimately, the impact of dopamine on neuronal response reliability will likely be clarified by experiments examining the dynamics of dopamine release in animals performing complex tasks.
The reliability of pattern production could also be influenced by basal ganglia output reaching the MPC. Work in monkeys demonstrates that caudate neurons exhibit greater task-dependent discharge during trials associated with higher reward magnitude (Hollerman et al., 1998; Kawagoe et al., 1998; Tremblay and Schultz, 1999; Hassani et al., 2001). If the absolute amount of task-related caudate activity in rodents is similarly affected by reward magnitude, its convergence with cortical sensory, self-motion, and route-position information (Conde et al., 1995; Nitz, 2006) could boost the probability of task-related MPC discharge.
In summary, the amount of reward resulting from a preceding action sequence drives at least two forms of change in MPC neuronal discharge on subsequent choice trials. The first, changes in firing patterns across task space, takes place even when actions and the environment in which they are executed remain unchanged. The second, alterations in discharge reliability, may explain the development of response selection biases and the ability of animals to execute a series of behaviors in an uninterrupted manner.
Footnotes
- The present work was supported by the Neurosciences Research Foundation as well as the G. Harold and Leila Y. Mathers Foundation. W.J.K. is the Clayson Fellow in Motor Control. We thank Glen Davis and Kara Papaefthimiou for help with histology, unit analyses, and organizing the manuscript.
- Correspondence should be addressed to Douglas Nitz, The Neurosciences Institute, 10640 John J. Hopkins Drive, San Diego, CA 92121. nitz{at}nsi.edu