Abstract
Multiple memory systems are distinguished by different sets of neuronal circuits and operating principles optimized to solve different problems across mammalian species (Tulving and Schacter, 1994). When a rat selects an arm in a plus maze, for example, the choice can be guided by distinct neural systems (White and Wise, 1999) that encode different relationships among perceived stimuli, actions, and reward. Thus, egocentric or stimulus–response associations require striatal circuits, whereas spatial or episodic learning requires hippocampal circuits (Packard et al., 1989). Although these memory systems function in parallel (Packard and McGaugh, 1996), they can also interact competitively or synergistically (Kim and Ragozzino, 2005). The neuronal mechanisms that coordinate these multiple memory systems are not fully known, but converging evidence suggests that the prefrontal cortex (PFC) is central.
The PFC is crucial for abstract, rule-guided behavior in primates and for switching rapidly between memory strategies in rats. We now report that rat medial PFC neuronal activity predicts switching between hippocampus- and caudate-dependent memory strategies. Prelimbic (PL) and infralimbic (IL) neuronal activity changed as rats switched memory strategies, even while the rats performed identical behaviors, but did not change when rats learned new contingencies using the same strategy. PL dynamics anticipated learning performance, whereas IL dynamics lagged, suggesting that the two regions help initiate and establish new strategies, respectively. These neuronal dynamics suggest that the PFC contributes to the coordination of memory strategies by integrating the predictive relationships among stimuli, actions, and reward.
Introduction
Human cognition is organized by abstract rules, including sets or strategies that specify relationships among actions and contingencies (Fuster et al., 2000). Rule- or strategy-guided behavior is demonstrated when identical stimuli elicit different responses or different stimuli elicit the same response, as in the Wisconsin card sorting task (Milner, 1982). Prefrontal cortex (PFC) damage impairs normal rule learning in humans (Owen et al., 1991; Gershberg and Shimamura, 1995; Levine et al., 1998; Bunge et al., 2005) and disrupts rule-guided behavior in monkeys (Dias et al., 1996; Bussey et al., 2001; Gaffan et al., 2002). Consistent with the effects of lesions, monkey PFC neurons encode abstract rules (Wise et al., 1996; White and Wise, 1999), categories (Freedman et al., 2001), and strategies (Genovesio et al., 2005), so that neuronal activity depends more on interpretation and goal than on either physical stimulus attributes or responses (White and Wise, 1999).
Although the structural homology of the rodent and primate PFC remains controversial, strategy switching provides a compelling functional homology: rats trained to use one rule to find food (e.g., approach a place) adapt to changing task contingencies and use a new rule (e.g., make a body turn). To perform well, the animals must distinguish different relationships among the same stimuli and responses, and strategy switching is impaired by medial PFC (mPFC) inactivation (Ragozzino et al., 1999; Rich and Shapiro, 2007). Because place and response strategies are implemented by coactive and dissociable brain systems that operate in parallel, strategy switching provides an important model for investigating the neural mechanisms of PFC function and its role in coordinating multiple memory systems. The neuronal coding mechanisms that support strategy switching in rodents remain unknown.
The present experiment recorded prelimbic (PL) and infralimbic (IL) neuronal activity while rats switched between place and response strategies in a plus maze paradigm in which performance is impaired by mPFC inactivation (Rich and Shapiro, 2007). The experimental design distinguished the influence of strategy from other sensory–behavioral correlates by assessing neuronal activity as rats followed the same consistent path from start to goal before, during, and after switching (see Fig. 1). For example, the path from the south start arm to the east goal arm was identical in both the “go east” spatial and the “turn right” response tasks. If PL/IL neurons code task strategy rather than specific goals, rewards, or overt behaviors, then their activity (1) should change when a new strategy is adopted, even if the two strategies are expressed in identical behavior, and (2) should be stable when the same strategy is expressed by different behaviors. Both of these predictions were met. Strategy coding predominated in the rat PL and IL and predicted the time course of learning, even while the rats performed identical behaviors. The results show that the rat mPFC, like the primate PFC, helps organize behavior by encoding abstract rules that describe the higher-order relationships among percepts and actions. They also suggest that “rule coding” by the prefrontal cortex may be conserved across mammalian species and that these rules can influence multiple memory systems.
Materials and Methods
Animals
Four Long–Evans rats weighing 275–325 g at the beginning of the experiment were housed individually in a colony room on a 12 h light/dark cycle. After acclimating to the colony room for at least 5 d, rats were food restricted to no less than 85% of their ad libitum body weight and maintained on a restricted diet for the duration of the experiment.
Recording equipment
Drive assemblies of 12 independently movable tetrodes (Neuro-hyperdrive; David Kopf Instruments) were constructed and implanted using standard stereotaxic methods. Tetrodes were fabricated by twisting together four 12.7-μm-diameter nickel–chromium wires (RO-800; Kanthal Precision Technology). The Cheetah data acquisition system (Neuralynx) recorded simultaneously from the 12 tetrodes, as well as the animal's location and heading, which were signaled by light-emitting diodes on the drive assembly, detected by an overhead video camera, and stored as time-stamped x–y coordinates (640 × 480 camera pixels, 16.7 ms/sample). Custom COM (Microsoft) programs running alongside the Cheetah acquisition software allowed the insertion of event flags that marked variables such as trial type, strategy, or correct versus incorrect trials. Data were sorted offline according to these event flags.
Maze
An elevated plus maze was made of wood with four arms (65 cm long, 6.25 cm wide, with outside edges 2.5 cm high) meeting at 90° angles. Food wells were drilled at the ends of all arms to hold cereal reward. The bottoms of the wells were made of mesh screen, below which was an inaccessible food reward to minimize the influence of odor cues in the task. A waiting platform was placed next to the maze. The maze and waiting platform were open to the testing room, which had several stationary visual cues on the walls. East (E) and west (W) maze arms were designated “goal” arms, and north (N) and south (S) arms were designated “start” arms. Only one start arm was open on a given trial, and a wooden block prevented access to the unused start arm. In each trial, both goal arms were open but only one was baited.
Behavior analysis
In all tasks, the rat was placed on a start arm and trained to approach the choice point and then enter a goal arm for food reward (half of a piece of Froot Loops cereal; Kellogg's). In “place” tasks, rats learned to approach one of the two goal arms (east or west) from both start arms. In “response” tasks, rats learned to make either a right or a left body turn on every trial to enter the rewarded goal arm. Thus, in place tasks, rats learned to find food by approaching the same location at the end of one of two goal arms by turning in opposite directions from each of two opposing start arms. In response tasks, rats learned to find food by turning in the same direction from each start arm and thus enter opposite goal arms.
A key feature of strategy switching in the plus maze allowed dissociation of rule coding from other sensory–behavioral correlates: one “consistent path” from start to goal always remained constant before, during, and after switching. To clarify this important feature, “paths” (trajectories), “tasks” (contingencies), and “strategies” (abstract rules) were defined as hierarchically related, increasingly abstract levels of behavior organization in the plus maze (Fig. 1). Rats learned four possible paths, specific trajectories through the maze from start to goal, e.g., north to west (N-W), north to east (N-E), etc. Two paths together defined a task, which stipulated stimulus–response–reward contingencies; for example, N-E and S-E paths were rewarded in the “go east” task, and N-E and S-W paths were rewarded in the “go left” task. In this experiment, tasks were also guided by strategies, abstract rules that did not stipulate reward contingencies but defined the cognitive approach relevant to solving a task. Both “go east” and “go west” thus required spatial navigation and described a place strategy, whereas “go left” and “go right” required stereotyped body turns and described an egocentric response strategy. Different paths could thereby serve the same strategy; for example, the N-W and N-E paths could be guided by different “go west” and “go east” tasks, but both required approaching a place. The key comparison was provided by consistent paths that were shared by both place and response strategies; for example, the N-W path was correct during both the “go west” place task and the “turn right” response task.
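The path/task/strategy hierarchy can be made concrete with a small lookup. The sketch below assumes rats start on the N or S arm facing the maze center (so, starting from N, east lies on the rat's left) and encodes each task by its last word (“east”/“west” for place tasks, “left”/“right” for response tasks). It derives the rewarded paths of a task, the consistent path shared by two tasks, and the accuracy expected from a rat that keeps following its old task. All function names are hypothetical and for illustration only.

```python
# Body turn -> goal arm for each start arm. Rats start facing the maze center,
# so from the N arm (facing south) east is on the left; from S (facing north) west is.
TURN_TO_GOAL = {("N", "left"): "E", ("N", "right"): "W",
                ("S", "left"): "W", ("S", "right"): "E"}


def strategy(task):
    """'go east'/'go west' -> place; 'turn left'/'turn right' (or 'go left') -> response."""
    return "place" if task.split()[-1] in ("east", "west") else "response"


def rewarded_goal(task, start):
    """Baited goal arm ('E' or 'W') for a given task and start arm ('N' or 'S')."""
    direction = task.split()[-1]
    if strategy(task) == "place":
        return direction[0].upper()          # same location from both start arms
    return TURN_TO_GOAL[(start, direction)]  # same body turn from both start arms


def correct_paths(task):
    """The two rewarded paths (start-goal) that define a task."""
    return {f"{s}-{rewarded_goal(task, s)}" for s in ("N", "S")}


def consistent_paths(old_task, new_task):
    """Paths that stay correct across a change in contingencies."""
    return correct_paths(old_task) & correct_paths(new_task)


def perseverative_accuracy(old_task, new_task):
    """Expected % correct if the rat keeps following old_task, both starts sampled equally."""
    hits = sum(rewarded_goal(old_task, s) == rewarded_goal(new_task, s) for s in ("N", "S"))
    return 100.0 * hits / 2
```

For example, `consistent_paths("go west", "turn right")` yields `{"N-W"}`, and a rat that perseverates after a “go east” to “turn left” switch is expected to score 50%, whereas after a “go east” to “go west” reversal it is expected to score 0%.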
At the start (∼20 trials) of each session, task contingencies were unchanged from the previous day (“before” phase), and performance levels were high, demonstrating that rats recalled previous training (mean ± SE, 96.2 ± 0.4% correct). Contingencies then changed without warning, either from a place task to a response task or vice versa during switches (e.g., “go east” to “turn left”) or from one place task to the other during reversals (e.g., “go east” to “go west”) (Fig. 2). When contingencies were changed, the rats initially continued to follow the previous task and made errors, and then improved performance throughout the remainder of the session. During switches, one path remained correct before and after the switch (e.g., “go west” and “turn right” both require N-W paths) (Fig. 2e), so that 50% performance reflected correct choices on consistent paths and errors on “changing paths”; during reversals, both paths changed, so initial performance was worse (0% correct). Rats learned new tasks through trial and error within one continuous session. After reaching criterion performance (eight consecutive correct trials), 20–30 additional trials tested the newly acquired task (“after” phase). Rats learned and followed new contingencies with proficiency equal to the before phase in both switches and reversals (performance, 96.5 ± 0.4%; ANOVA; effect of training phase, F(1,30) = 0.08, p = 0.78; effect of session type, F(1,30) = 2.106, p = 0.16; interaction of phase and session type, F(1,30) = 0.186, p = 0.67) (Fig. 2d). Switch or reversal sessions were separated by at least 2 d of stable performance (SP) sessions in which rats performed the most recently learned task for 30–50 trials (at >80% correct).
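The learning criterion (eight consecutive correct trials) can be located in a sequence of trial outcomes with a simple run counter. This is a minimal sketch, assuming outcomes are coded 1 (correct) and 0 (incorrect) and that only trials after the contingency change are passed in; the function name is hypothetical.

```python
def criterion_trial(outcomes, run_length=8):
    """Return the 0-based index of the trial that completes `run_length`
    consecutive correct trials, or None if criterion is never reached."""
    run = 0
    for i, correct in enumerate(outcomes):
        run = run + 1 if correct else 0  # any error resets the run
        if run == run_length:
            return i
    return None
```

Trials after the returned index would then form the pool from which the 20–30 “after” phase trials are drawn.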
Maze acclimation
Before surgery, all rats were handled and acclimated to the testing environment by allowing them to forage for food randomly scattered on the maze.
Surgery
Rats were anesthetized with continuous-flow isoflurane and mounted in a stereotaxic frame. Rectal temperature was monitored, and core temperature was maintained with a heating pad. The scalp was shaved, scrubbed with Betadine, and incised. A single burr hole was drilled at stereotaxic coordinates +3.2 mm anterior, −0.5 mm lateral from bregma. Dura was removed, and the electrode bundle was lowered to the cortical surface. Two ground wires were attached to skull screws, and the drive assembly was affixed to the skull with dental acrylic. All electrodes were immediately lowered 1.27 mm into the cortex. Rats were allowed to recover for 5–10 d after surgery before beginning maze training.
During pretraining, tetrodes were lowered to their target depths. In each rat, 6 of the 12 recording tetrodes were aimed at the PL region, whereas the other 6 were aimed at IL. Because the tetrode bundle is large (1 mm diameter), tetrodes aimed at each region were interspersed in the bundle and counterbalanced across rats to avoid systematic differences in mediolateral or anteroposterior positioning. Initially, tetrodes were lowered by ∼0.6 mm/d, in 0.3 mm increments. When tetrodes reached the depth of the upper PL (∼2.5 mm) or IL (∼3.8 mm), they were adjusted with fine movements until a stable signal was detected. Tetrodes were lowered throughout the experiment to collect data from new cell populations but were not moved within 48 h before a switch or reversal.
Pretraining
After surgery, rats reacclimated to the maze with another foraging session. The next day, cereal reward was placed only in the food cups on both goal arms, and one start arm was blocked. Rats were placed on each start arm twice and given access to both goal arms, for four trials total. The direction of the first turn was recorded for each trial, and three or more turns in the same direction were noted as a turning bias. If a rat displayed a turning bias, then the first response task that rat learned was opposite to its turning bias.
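The bias rule above can be sketched as follows, assuming the four first turns are recorded as "left" or "right". The text does not state how the first response task was chosen for rats without a bias, so the sketch returns None in that case; function names are hypothetical.

```python
def turning_bias(first_turns, threshold=3):
    """Return 'left' or 'right' if at least `threshold` of the first turns
    went that way, else None (no bias)."""
    for side in ("left", "right"):
        if sum(turn == side for turn in first_turns) >= threshold:
            return side
    return None


def first_response_task(first_turns):
    """Assign the response task opposite to the bias; None when no bias was found."""
    bias = turning_bias(first_turns)
    if bias is None:
        return None
    return "turn left" if bias == "right" else "turn right"
```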
Training
Each rat was assigned pseudorandomly to either “go east” or “go west” as an initial task. In each trial, the rat was placed at the distal end of a start arm (north or south) facing the center of the maze and allowed to enter one goal arm (east or west). Rats were not allowed to correct errors; if they attempted to backtrack at any point, the trial ended with no reward. If the rat chose the correct arm, it was allowed to consume the food and then placed on the waiting platform until the start of the next trial. If the rat entered the incorrect arm, then it was returned to the waiting platform with no reward. Intertrial intervals were ∼5–8 s, and criterion performance was eight consecutive correct trials.
During initial training, trials occurred from each of the two start arms, alternating every two trials for up to 40 trials. If rats reached criterion on the first day, training ended and the next day the same initial strategy and task were performed in a session of 40 trials from pseudorandomly assigned start arms (such that no more than three consecutive trials occurred from the same start arm). If rats did not reach criterion on the first day, they were trained using the alternating pattern of start arms until they reached criterion, and then they progressed to pseudorandom start arm trials. Each rat performed 40 trials of the initial strategy from pseudorandom start arms, achieving >80% correct trials for 2 d, and then switch or reversal training began.
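The two start-arm schedules described above (alternating every two trials during initial training, then pseudorandom with no more than three consecutive trials from the same arm) can be generated as below. This is a sketch under the assumption that a run-length cap is enforced by flipping the arm whenever a fourth repeat would occur; the original randomization procedure is not specified, and the function names are hypothetical.

```python
import random


def alternating_starts(n_trials, block=2, first="N"):
    """Start arms that alternate every `block` trials, e.g. N, N, S, S, N, N, ..."""
    other = "S" if first == "N" else "N"
    return [first if (i // block) % 2 == 0 else other for i in range(n_trials)]


def pseudorandom_starts(n_trials, max_run=3, seed=None):
    """Random N/S start arms with no more than `max_run` consecutive repeats."""
    rng = random.Random(seed)
    seq = []
    for _ in range(n_trials):
        arm = rng.choice("NS")
        # If the last `max_run` trials all used this arm, switch to the other arm.
        if len(seq) >= max_run and all(a == arm for a in seq[-max_run:]):
            arm = "S" if arm == "N" else "N"
        seq.append(arm)
    return seq
```

A 40-trial stable performance session would use something like `pseudorandom_starts(40)`.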
Each rat performed multiple switches and reversals on the plus maze over the course of ∼8 weeks. PL/IL activity is required to perform at least three task switches but not spatial reversals (Rich and Shapiro, 2007). For the first change in task contingencies, two rats learned a spatial reversal, and two learned to switch from a spatial to a response task. Each switch or reversal session started with a before phase in which rats performed a block of trials in their previously learned task, with start arms presented pseudorandomly. Task contingencies then changed without warning or unusual delay between trials. Start arms were then alternated every two trials to ensure equal sampling of both paths during the switch or reversal. When rats reached a criterion of eight consecutive correct trials in the new task, they were required to continue performing the newly acquired task, and start arms were again presented pseudorandomly. All data included in the final analyses were collected during switches or reversals acquired in one training session.
Twenty-four hours after a switch or reversal session, rats performed the new task again for at least 40 trials, constituting a stable performance session. All switches/reversals were separated by a minimum of two stable performance sessions and therefore occurred at least 3 d apart (usually 7–10 d).
Sessions selected for analysis
Only spatial reversals were included in the analysis because rats did not reliably learn response reversals within one training/recording session. Stable performance sessions were only included after the task had been performed above criterion for at least 1 d. Twelve switch, eight stable performance, and five reversal sessions met all criteria for additional analysis.
Recording
Data acquisition began for each trial when the rat was placed at the distal end of a start arm. Recording was paused while rats consumed the reward and were placed on the waiting platform.
Unit discrimination.
The activity of multiple single cells (units) was recorded with tetrodes, which provide more accurate discrimination of individual cells than single-electrode recordings (Harris et al., 2000). To assign waveforms to single units, custom computer programs calculated parameters from digitized waveforms and displayed these parameter values as points in a multidimensional space, with each dimension defined by one parameter (e.g., peak or valley time and amplitude, peak-to-valley distance, valley-to-valley distance, etc.). Points were assigned to clusters offline with semi-automated, nonlinear elliptical cluster-cutting software. The software allows manual adjustment, which dramatically increases the accuracy of unit discrimination (Harris et al., 2000). All clearly separated clusters were included in subsequent analysis. The same computer program assessed potential drift by displaying sequential spike waveforms in bins of spikes sampled throughout the trial and by calculating average waveforms for selected trial subsets (supplemental Section 1, supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
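The parameter space used for cluster cutting can be illustrated by computing a few of the listed features for every spike on every tetrode wire. This sketch assumes waveforms are stored as an array of shape (spikes, 4 channels, samples) and computes only peak amplitude, valley amplitude, peak-to-valley amplitude, and peak-to-valley time; the exact feature set and the elliptical cluster-cutting step of the original software are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def waveform_features(spikes):
    """Per-channel features for tetrode spikes.

    spikes: array of shape (n_spikes, 4, n_samples), one waveform per wire.
    Returns (n_spikes, 16): columns 0-3 peak amplitude, 4-7 valley amplitude,
    8-11 peak-to-valley amplitude, 12-15 peak-to-valley time (in samples).
    """
    w = np.asarray(spikes, dtype=float)
    peak, valley = w.max(axis=2), w.min(axis=2)
    peak_t, valley_t = w.argmax(axis=2), w.argmin(axis=2)
    return np.concatenate([peak, valley, peak - valley, valley_t - peak_t], axis=1)
```

Each row is one point in the multidimensional space; units then correspond to clusters of rows.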
Behavior analysis.
Learning curves were generated for each switch session using an algorithm developed by Smith et al. (2004) that calculates the probability of the rat making a correct choice as a function of trial number. This approach takes the viewpoint of an ideal observer, such that performance in the entire session is considered when calculating the probability of a correct trial at each point. The algorithm has been compared with other methods (e.g., moving averages) and has been shown to be more accurate for estimating learning (Smith et al., 2004). The algorithm requires two constants to be specified. The initial probability (before learning) of a correct outcome (p0) was set to 0.5 for switches and 0.0 for reversals. These values were chosen because the rats initially followed the previously used task, which would result in 50% correct trials in the new strategy during switches or 0% correct during reversals. The maximum value for the data was specified as 1, because trial outcomes were coded as follows: 0, incorrect; 1, correct. A conditional start flag estimated the amount of bias in the initial conditions. Because rats had been trained previously on a strategy, the value specifying the most bias (start flag, 2) was used.
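The idea behind the state-space learning curve can be sketched with a forward filter and backward smoother for binary outcomes, in the spirit of Smith et al. (2004). This is a simplified sketch, not the published implementation: it assumes a fixed random-walk variance `sigma_eps2` (the original estimates it by expectation maximization), fixes the initial state at zero, ignores the start flag, clips p0 away from 0 so its log-odds is finite, and derives the 95% bounds from a Gaussian approximation on the latent state. The function names are hypothetical.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def learning_curve(outcomes, p0=0.5, sigma_eps2=0.005, newton_iters=20):
    """Smoothed P(correct) per trial with lower/upper 95% bounds.

    outcomes: sequence of 0 (incorrect) / 1 (correct).
    Latent state x_k follows a random walk; P(correct) = logistic(mu + x_k).
    """
    n = np.asarray(outcomes, dtype=float)
    K = n.size
    p0 = min(max(p0, 1e-3), 1 - 1e-3)   # p0 = 0 would make mu = -inf
    mu = np.log(p0 / (1 - p0))

    x_pred, v_pred = np.zeros(K), np.zeros(K)
    x_filt, v_filt = np.zeros(K), np.zeros(K)
    x_prev, v_prev = 0.0, sigma_eps2    # assumed initial state and variance
    for k in range(K):
        x_pred[k], v_pred[k] = x_prev, v_prev + sigma_eps2
        x = x_pred[k]
        for _ in range(newton_iters):   # solve for the posterior mode by Newton's method
            p = logistic(mu + x)
            f = x - x_pred[k] - v_pred[k] * (n[k] - p)
            x -= f / (1.0 + v_pred[k] * p * (1 - p))
        p = logistic(mu + x)
        x_filt[k] = x
        v_filt[k] = 1.0 / (1.0 / v_pred[k] + p * (1 - p))
        x_prev, v_prev = x_filt[k], v_filt[k]

    # Backward (fixed-interval) smoother: uses the whole session at every trial.
    x_s, v_s = x_filt.copy(), v_filt.copy()
    for k in range(K - 2, -1, -1):
        a = v_filt[k] / v_pred[k + 1]
        x_s[k] = x_filt[k] + a * (x_s[k + 1] - x_pred[k + 1])
        v_s[k] = v_filt[k] + a ** 2 * (v_s[k + 1] - v_pred[k + 1])

    sd = np.sqrt(v_s)
    return (logistic(mu + x_s),
            logistic(mu + x_s - 1.96 * sd),
            logistic(mu + x_s + 1.96 * sd))
```

The backward pass is what gives the ideal-observer property described above: each trial's estimate is informed by trials both before and after it. For a switch session one would call `learning_curve(outcomes, p0=0.5)`; for a reversal, `p0=0.0` is clipped to a small positive value.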
Blocks of trials were defined by the lower bound of the 95% confidence interval (CI) for the probability of a correct outcome, as follows. For the initial analysis, each curve was divided into four phases: before, early, late, and after. The before phase included all trials before task contingencies changed. The early phase included trials from the time the contingencies changed until the learning curve showed improving performance (the lower CI surpassed 0.30 during switches and 0.10 during reversals). The late phase included the remaining trials before performance reached criterion (lower CI above 0.30 and below 0.60 for both switches and reversals). The after phase included all trials after high performance was achieved in the new task (lower CI >0.60). Additional analysis of switches divided each curve into 11 trial blocks. Before and after phases were each divided into three blocks of trials, so that blocks 1–3 consisted of trials before the switch and blocks 9–11 consisted of trials after criterion performance was reached. Blocks 4 (lower CI of 0.30–0.40) and 5 (lower CI <0.29) quantified performance after contingencies changed and before the learning curve began to increase; blocks 6–8 assessed improving performance (block 6, lower CI of 0.30–0.39; block 7, lower CI of 0.40–0.49; block 8, lower CI of 0.50–0.59). Average performance per block was calculated as the proportion of correct trials within a defined block. Repeated-measures ANOVA assessed changes in average performance across blocks. All post hoc comparisons used the Bonferroni method, so that the corrected α = 0.05.
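The four-phase split can be sketched from the per-trial lower CI values. Because the curve can sit above 0.30 immediately after the change and then dip, this sketch assumes that “improving performance” is measured from the lowest post-change point: the late phase starts at the first trial after that trough whose lower CI exceeds the early threshold (0.30 for switches, 0.10 for reversals), and the after phase starts at the first subsequent trial whose lower CI exceeds 0.60. The trough rule and the function name are assumptions, not the authors' stated procedure.

```python
def assign_phases(lower_ci, change_trial, early_thr=0.30, after_thr=0.60):
    """Label each trial 'before', 'early', 'late', or 'after'.

    lower_ci: lower 95% bound of P(correct) per trial.
    change_trial: index of the first trial under the new contingencies.
    """
    lo = list(lower_ci)
    labels = ["before"] * change_trial + ["early"] * (len(lo) - change_trial)
    post = lo[change_trial:]
    trough = change_trial + post.index(min(post))   # lowest point after the change
    late_start = next((k for k in range(trough, len(lo)) if lo[k] > early_thr), None)
    if late_start is not None:
        after_start = next((k for k in range(late_start, len(lo)) if lo[k] > after_thr),
                           len(lo))
        for k in range(late_start, len(lo)):
            labels[k] = "late" if k < after_start else "after"
    return labels
```

For a reversal session, the same function would be called with `early_thr=0.10`.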
Spatial behavior analyses
Because the maze arms were narrow (6.25 cm wide), the rats' movements were primarily restricted to specific trajectories. In rare instances, a rat failed to make a smooth trajectory through the maze during an individual trial (e.g., pausing in the middle of a trial, leaning over the edge of the maze, etc.). To exclude these aberrant behaviors and their potential influence on single-unit activity, each trial was replayed and observed on the computer monitor. Any trial with aberrant behavior was excluded from analysis.
To further ensure that overt behavior did not change during consistent paths in two strategies, consistent path trials occurring before and after a switch were parsed from the recording session and saved in separate computer files. Custom software extracted and analyzed the rat's position data using location arrays. Because higher-resolution grids (larger n) can reveal smaller statistical differences than low-resolution grids, we used a relatively high-resolution, 50 × 50 grid (∼5 cm²/grid unit) to assess behavior. To quantify behavior during these consistent path trials before and after switches, the rat's position, running speed, and direction of movement were analyzed statistically. Position was quantified by the number of visits to each ∼5 cm grid unit per trial, and correlation of these values before and after switching assessed the spatial variability of the rat's movement across trial blocks. Similarly, the rat's heading direction and running speed in each visited grid unit were correlated across trial blocks. Average running speed was also compared across trial blocks using t tests; the t tests assessed differences in the overall magnitude of speed, whereas the correlations assessed the spatial distribution of those speeds. The statistical analyses examined behavior on the whole maze and then separately examined the three critical subregions of the maze: the start arm, choice point, and goal arm, defined by the grid units in these regions. Separate t tests and correlations were performed for each subregion as described above to ensure that unit activity was assessed only during homogeneous behavior.
To compare switches to identical behaviors during stable performance, one of two paths was selected at random from each stable performance session, and the first and last five trials of the selected path were parsed from the session and analyzed as above.
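The position and speed comparisons can be sketched as follows. The sketch assumes each trial is a pair of x and y pixel arrays sampled every 16.7 ms in a 640 × 480 frame, counts a “visit” as entering a grid unit, correlates occupancy over the whole 50 × 50 grid, and correlates mean speed only over units visited in both blocks. Heading (a circular variable) and the subregion analyses are omitted, and the function names are hypothetical.

```python
import numpy as np

GRID = 50                     # 50 x 50 grid used for the behavior analysis
FRAME_W, FRAME_H = 640, 480   # camera resolution in pixels
DT = 0.0167                   # seconds per video sample (16.7 ms)


def to_bins(x, y, grid=GRID):
    """Map camera-pixel coordinates to integer grid-unit indices."""
    bx = np.clip((np.asarray(x) * grid // FRAME_W).astype(int), 0, grid - 1)
    by = np.clip((np.asarray(y) * grid // FRAME_H).astype(int), 0, grid - 1)
    return bx, by


def behavior_maps(trials, grid=GRID):
    """Visits per trial and mean running speed (pixels/s) in each grid unit."""
    visits = np.zeros((grid, grid))
    speed_sum = np.zeros((grid, grid))
    n_samples = np.zeros((grid, grid))
    for x, y in trials:
        x, y = np.asarray(x, float), np.asarray(y, float)
        speed = np.hypot(np.diff(x), np.diff(y)) / DT
        bx, by = to_bins(x[1:], y[1:], grid)
        np.add.at(speed_sum, (by, bx), speed)
        np.add.at(n_samples, (by, bx), 1)
        cells = by * grid + bx
        entered = np.r_[True, cells[1:] != cells[:-1]]   # a visit = entering a unit
        np.add.at(visits, (by[entered], bx[entered]), 1)
    with np.errstate(invalid="ignore"):
        mean_speed = np.where(n_samples > 0, speed_sum / n_samples, np.nan)
    return visits / len(trials), mean_speed


def compare_blocks(before, after):
    """Pearson r of occupancy (whole grid) and of speed (units visited in both blocks)."""
    vb, sb = behavior_maps(before)
    va, sa = behavior_maps(after)
    r_position = np.corrcoef(vb.ravel(), va.ravel())[0, 1]
    both = ~np.isnan(sb) & ~np.isnan(sa)
    r_speed = np.corrcoef(sb[both], sa[both])[0, 1] if both.sum() > 2 else np.nan
    return r_position, r_speed
```

The same maps restricted to the grid units of the start arm, choice point, or goal arm would give the subregion comparisons.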
Unit analysis.
Behavioral correlates of unit activity were analyzed using temporal (supplemental Section 2, supplemental Fig. 2, available at www.jneurosci.org as supplemental material) and spatial tests, and both yielded the same overall pattern of results. The initial analysis examined changes in firing rate in the start and goal arms by dividing the maze into a 28 × 28 grid (∼9.3 cm²/grid unit) and calculating the average firing rate in each grid unit. Sessions were divided into four phases (before, early, late, and after; stable performance sessions were divided into four equal trial blocks), each phase was separated into paths originating from each start arm, and each path was separated into start (six grid units) and goal (seven grid units) regions. The average on-maze firing rate was calculated for each cell during each phase. “Silent cells” with firing rates <0.01 Hz during all four phases were excluded from additional analysis. Phase × path × maze region ANOVAs were performed for each neuron. Bonferroni-corrected post hoc comparisons were performed for all phase-responsive cells, and cells were classified as either persistently changing (if before and after phases differed) or transiently changing (if firing rates differed between phases, but before and after phases did not differ). To ensure that changes in firing rates observed during switches were not attributable to tetrode movement during recording, waveforms were monitored throughout each recording session, and average waveforms were compared statistically before and after switching (supplemental Section 1, supplemental Fig. 1, available at www.jneurosci.org as supplemental material).
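The classification of phase-responsive cells can be sketched as a decision rule. This sketch assumes that the p-value for the phase main effect (from the phase × path × maze region ANOVA) and the uncorrected pairwise post hoc p-values among the four phases are already available; it applies a Bonferroni adjustment over the six pairwise comparisons and uses α = 0.01 for the ANOVA, as reported in the Results. The function names and the dictionary layout are hypothetical.

```python
from itertools import combinations

PHASES = ("before", "early", "late", "after")


def bonferroni(raw_p):
    """Bonferroni-adjust a dict of p-values (capped at 1)."""
    m = len(raw_p)
    return {key: min(1.0, p * m) for key, p in raw_p.items()}


def classify_cell(p_phase, raw_posthoc, alpha_anova=0.01, alpha=0.05):
    """Classify one neuron from its phase main effect and pairwise post hoc tests.

    raw_posthoc: uncorrected p-values keyed by phase pairs, e.g. ('before', 'after').
    """
    if p_phase >= alpha_anova:
        return "not phase-responsive"
    adjusted = bonferroni(raw_posthoc)
    if adjusted[("before", "after")] < alpha:
        return "persistently changing"
    if any(p < alpha for p in adjusted.values()):
        return "transiently changing"
    return "main effect only"
```

Here `combinations(PHASES, 2)` enumerates the six phase pairs, so a complete `raw_posthoc` dictionary has one entry for each.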
To directly compare changes in neuron activity and behavior under the same constraints, correct trials at the beginning (before) and end (after) of switch, reversal, and stable performance sessions were divided into paths originating from each start arm. The same high-resolution grids (50 × 50) described above were used here to assess the same spatial resolution in both unit activity and behavior. The mean firing rate of each neuron was calculated for each grid unit excluding those never visited by the rat. The resulting firing rate maps for each neuron recorded before and after the switch were then compared with t tests to determine whether cells significantly changed firing rates.
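A firing rate map and a before/after comparison can be sketched as below. The sketch assumes spike counts and occupancy time have already been binned on the same 50 × 50 grid, excludes unvisited units by marking them NaN, and, as an assumption since the test is not fully specified above, uses a paired t statistic across grid units visited in both blocks. It returns only the statistic and its degrees of freedom; a p-value would come from the t distribution (e.g., via SciPy). Function names are hypothetical.

```python
import numpy as np


def rate_map(spike_counts, occupancy_s):
    """Firing rate (Hz) per grid unit; units never visited (zero occupancy) are NaN."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(occupancy_s > 0, spike_counts / occupancy_s, np.nan)


def paired_t(map_before, map_after):
    """Paired t statistic across grid units visited in both blocks, with its df."""
    both = ~np.isnan(map_before) & ~np.isnan(map_after)
    d = map_after[both] - map_before[both]
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1
```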
Temporal dynamics of neuronal activity were analyzed during switches by comparing the firing rates of persistently changing cells (those with a main effect of phase by ANOVA and a significant before vs after post hoc test, as described above) during consistent paths. Each session was divided into 11 trial blocks (described above), and the average firing rate per grid unit was calculated for each block. For each neuron, the spatial distribution of firing rates was standardized by calculating a z-score based on the mean and SD of firing rates across all visited locations. This normalization ensured that each cell in an array contributed equally to the population analysis, independent of its mean firing rate. For each cell, a vector of 22 z-scores was created by averaging the z-scored firing rates across grid units within each maze region for each trial block (11 trial blocks × start and goal regions). The PL and IL arrays were each submitted to principal components analysis (supplemental Fig. 3, available at www.jneurosci.org as supplemental material) and correlation analyses.
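The construction of each cell's 22-element vector and the population correlation matrix can be sketched as follows. The sketch assumes each cell's rates are stored as an 11 × (visited grid units) array, z-scores them against the mean and SD over all blocks and visited units, averages within start and goal regions in the order block 1 start, block 1 goal, block 2 start, and so on, then correlates the 22 population vectors across cells and applies Fisher's transform (arctanh). Clipping r before the transform (so the diagonal stays finite) is an implementation choice, and the function names are hypothetical.

```python
import numpy as np


def cell_vector(rate_maps, start_mask, goal_mask):
    """22 z-scores for one cell.

    rate_maps: (11, n_units) firing rates per trial block and visited grid unit.
    start_mask, goal_mask: boolean masks selecting start- and goal-arm units.
    """
    rates = np.asarray(rate_maps, dtype=float)
    z = (rates - rates.mean()) / rates.std()
    vec = []
    for block in z:                       # ordered block 1 start, block 1 goal, ...
        vec.append(block[start_mask].mean())
        vec.append(block[goal_mask].mean())
    return np.array(vec)


def population_fisher_z(vectors):
    """vectors: (n_cells, 22). Returns the 22 x 22 Fisher-Z correlation matrix."""
    r = np.corrcoef(np.asarray(vectors, dtype=float).T)  # correlate columns across cells
    r = np.clip(r, -0.999999, 0.999999)                  # keep arctanh finite on the diagonal
    return np.arctanh(r)
```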
Correlation matrices were analyzed by converting each Pearson's r to Fisher's Z, which normalizes the distribution of r values (Fisher, 1924). Because trials before the switch were divided into three trial blocks, the first six values of each vector represent measures taken before the switch (three trial blocks in start and goal regions), and the first six rows (or columns) of the correlation matrix represent correlations of each successive trial block with the before phase (supplemental Figs. 4, 5, available at www.jneurosci.org as supplemental material). Therefore, the values in the first six rows were compared across trial blocks (start and goal pixels were included in the same trial block) using repeated-measures ANOVA with a factor of brain region (PL or IL). Because the correlation matrix is symmetric about its diagonal, some measures in the first six rows repeat in the top left corner. For comparisons between groups, these measures were excluded, so that correlations from the start of the switch (trial block 4) through the end of the matrix (trial block 11) were submitted to analysis (supplemental Fig. 4, available at www.jneurosci.org as supplemental material). These calculations assessed the decay of the original code, r_before. The same procedure was followed for the last six rows of the correlation matrices, which represent the population correlations with the phase after the switch. Here, measures in the bottom right repeat and were excluded, so that correlations from the beginning of the matrix (trial block 1) through the end of the switch (trial block 8) were submitted to analysis. These calculations assessed the development of the new code, r_after. Changes across blocks within each population were also assessed using a repeated-measures multivariate ANOVA (MANOVA) and orthogonal contrasts (Systat Software).
For this analysis, the before block was defined by the unique entries in the first six rows in the top left (i.e., start and goal arm blocks 1,2; 1,3; 2,3), and the after block was defined by the unique entries in the last six rows (i.e., start and goal arm blocks 9,10; 9,11; 10,11). In this way, before was compared with each block from 4 to 11, and after was compared with each block from 1 to 8. All post hoc comparisons used the Bonferroni method so that the corrected α = 0.05.
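The extraction of r_before and r_after from a 22 × 22 Fisher-Z matrix can be sketched as follows, assuming the columns are ordered block 1 start, block 1 goal, ..., block 11 goal. For illustration the sketch averages the start and goal entries belonging to each trial block into a single value; the analysis described above instead submits these values to repeated-measures ANOVA and MANOVA. The function name is hypothetical.

```python
import numpy as np


def r_before_after(Z):
    """Mean Fisher Z with the before rows (blocks 1-3) for blocks 4-11,
    and with the after rows (blocks 9-11) for blocks 1-8."""
    Z = np.asarray(Z, dtype=float)
    block_of = np.repeat(np.arange(1, 12), 2)   # trial block of each of the 22 columns
    before_rows, after_rows = Z[:6], Z[-6:]
    r_before = {b: before_rows[:, block_of == b].mean() for b in range(4, 12)}
    r_after = {b: after_rows[:, block_of == b].mean() for b in range(1, 9)}
    return r_before, r_after
```

Restricting r_before to blocks 4–11 and r_after to blocks 1–8 drops the entries that repeat in the top left and bottom right corners of the symmetric matrix.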
Results
Rats learned plus maze strategies with switches and reversals
Rats (n = 4) were trained to perform strategy switches and reversals on a plus maze (Fig. 2a) (see Materials and Methods). Before a switch or reversal, rats were pretrained in one task. On testing days, rats performed a series of trials in the pretrained task (the before phase), and then task contingencies changed without warning and new contingencies were learned through trial and error. In the same testing session, rats were then tested on a series of trials using the new task (the after phase). Performance was quantified by computing a learning curve for each switch or reversal using a state-space smoothing algorithm that calculates the probability of a correct choice as a function of trial number (Smith et al., 2004). To compare neuron activity during equivalent learning epochs across rats and sessions, trials within each session were grouped into performance-matched blocks for analysis (Fig. 2b,c). Performance was identical before and after switching (Fig. 2c, compare blocks 1–3, 9–11) but differed statistically during the trial-and-error phase, when performance was poor (blocks: repeated-measures ANOVA, F(10,110) = 25.26; p < 0.001). Performance improved measurably by block 7, when mean performance was typically ≥70% correct.
PL/IL neurons encoded strategy switches
After initial training, the rats were implanted with 12 recording tetrodes, six targeted to PL and six to IL (Fig. 3). A total of 374 neurons (208 PL and 166 IL) were analyzed: 184 from strategy switches, 83 from reversals, and 107 from sessions in which one task was performed consistently with no switch or reversal (stable performance sessions). Similar proportions of PL and IL neurons were recorded across the three session types (χ²(2 df) = 3.17; p = 0.20). The mean ± SEM firing rate of PL/IL neurons was 1.34 ± 0.18 Hz. PL and IL fired with similar rates overall (PL, 1.40 ± 0.20; IL, 1.27 ± 0.33; t(371 df) = 0.34; p = 0.73), and the average was maintained during switches (1.44 ± 0.31), reversals (1.28 ± 0.31), and stable performance (1.00 ± 0.23; ANOVA, F(2,371) = 0.55; p = 0.58).
Behavioral correlates of PL/IL activity resembled those reported previously [e.g., many neurons fired during goal approach (Jung et al., 1998; Pratt and Mizumori, 2001) (supplemental Fig. 2, available at www.jneurosci.org as supplemental material)]. The most striking and new results emerged, however, when we analyzed single-unit activity changes during strategy switches. Dynamic, strategy-related coding predominated in both PL and IL when quantified by both perievent time histograms (supplemental Section 2, available at www.jneurosci.org as supplemental material) and the spatial distribution of firing rates (below). In contrast, coding was essentially unchanged during stable performance and reversal learning. Thus, mPFC neurons coded switches among memory systems but not reversals within a single system.
To quantify the influence of different task variables on single units, three-way ANOVAs compared firing rates across four learning phases (before, early, late, and after), two paths, and two maze regions (start and goal arms) during switch, reversal, and stable performance sessions (Table 1). More neurons responded to learning phase (significant ANOVA effect at α < 0.01) during strategy switches than during either reversals or stable performance (χ²(2) = 17.15; p < 0.001). Similar proportions of PL and IL neurons were phase responsive during switches (z = 1.62; p = 0.10). Most phase-responsive neurons were persistently changing cells (59.2%, 45 neurons), which fired with significantly different rates before and after the switch (post hoc tests, p < 0.05) (Figs. 4a, 5a,c). Firing rates decreased in 62.5% (25) and increased in another 37.5% (15) of persistently changing neurons. Other phase-responsive cells (35.5%, 27 neurons) changed transiently, firing differently during early or late phases but similarly before and after the switch. The remaining 5.3% (four neurons) showed only a main effect of phase. Simultaneously recorded ensembles obtained from individual sessions also had higher proportions of phase-responsive cells during switches than in other conditions (Fig. 4c) (proportion of cells with ANOVA effect of session type, F(2,22) = 8.77, p = 0.002; post hoc comparisons: switch vs reversal, p = 0.05; switch vs stable performance, p = 0.002; reversal vs stable performance, p = 1.0). Thus, PL and IL firing rates distinguished strategy switches from stable performance and reversal learning sessions by developing new codes as rats acquired the switch.
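The three categories of phase-responsive neurons amount to a decision rule applied to each cell's post hoc comparisons. A minimal sketch, assuming each neuron's post hoc outcomes have already been reduced to two booleans (the class and field names are hypothetical); applied to the counts reported above, it reproduces the stated percentages.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PhasePostHoc:
    """Post hoc outcomes (p < 0.05) for one phase-responsive neuron (hypothetical fields)."""
    before_vs_after: bool        # rate differs before vs after the switch
    early_or_late_differs: bool  # rate in early or late phase differs from before/after


def classify(cell: PhasePostHoc) -> str:
    if cell.before_vs_after:
        return "persistent"        # lasting new rate after the switch
    if cell.early_or_late_differs:
        return "transient"         # changes only while the switch is being learned
    return "main_effect_only"      # phase effect without a significant pairwise contrast


def category_percentages(cells):
    counts = Counter(classify(c) for c in cells)
    return {name: round(100.0 * n / len(cells), 1) for name, n in counts.items()}


# Counts taken from the text: 45 persistent, 27 transient, 4 main-effect-only (76 cells).
cells = ([PhasePostHoc(True, False)] * 45
         + [PhasePostHoc(False, True)] * 27
         + [PhasePostHoc(False, False)] * 4)
print(category_percentages(cells))
# {'persistent': 59.2, 'transient': 35.5, 'main_effect_only': 5.3}
```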
Coding dynamics reflected strategy switches during identical behaviors
If PL/IL coding dynamics specifically reflect switching memory strategies rather than other behavioral variables, then coding should change when rats learn to use different strategies even as overt behavior remains identical. To test this, neuronal activity recorded exclusively in correct consistent paths throughout strategy switching was parsed and analyzed (e.g., N-W paths as a rat switched from the “go west” place task to the “turn right” response task) (Fig. 5a). Because PL/IL neurons can respond to running speed (Poucet, 1997; Pratt and Mizumori, 2001) or deviations in spatial trajectories (Euston and McNaughton, 2006), we compared the reliability of behavior statistically (see Materials and Methods), inspected trials replayed offline, and removed those in which behavior varied (e.g., a pause in the middle of a trial or an initial turn toward the incorrect goal). Behavior tended to be highly stereotyped, with zero to one trials removed from most trial blocks (mean, 0.64 per block). Among the remaining trials, position, running speed, and movement direction never differed between before and after phases (all p > 0.05) and were highly correlated across phases (running speed, all r > 0.59, p < 0.001; spatial position, all r > 0.65, p < 0.001; direction, all r > 0.34, p < 0.005). To measure behavior on a finer scale, the same variables were analyzed separately in start, choice point, and goal subregions, and again behavior in the consistent path was indistinguishable before and after switching (supplemental Section 3, supplemental Figs. 6, 7, available at www.jneurosci.org as supplemental material).
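The check that running speed was highly correlated across phases can be sketched as a Pearson correlation between mean speed profiles. A minimal illustration, assuming speed has already been averaged into the same spatial bins on every retained trial of the consistent path; the six-bin profiles below are synthetic values, not recorded data, and the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_profile(trials):
    """Average per-trial speed profiles bin by bin (all trials share the same bins)."""
    return [sum(col) / len(col) for col in zip(*trials)]


def phase_similarity(before_trials, after_trials):
    """Correlation between the mean before-phase and after-phase speed profiles."""
    return pearson_r(mean_profile(before_trials), mean_profile(after_trials))


# Synthetic speed profiles (cm/s in six spatial bins along one consistent path).
before = [[10, 30, 20, 35, 25, 8], [12, 28, 18, 33, 27, 10]]
after = [[11, 29, 21, 34, 24, 9], [9, 31, 19, 36, 26, 7]]
print(round(phase_similarity(before, after), 3))
```

A high correlation alone does not rule out offset differences (e.g., uniformly faster running), which is why the text also tests whether the variables differ between phases.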
Coding changes were prominent during strategy switches even in these highly filtered data, with many neurons (22.9%, 40 of 175) firing at markedly different rates (α < 0.01) before and after strategy switches. By comparison, fewer neurons fired differentially in consistent paths at the beginning versus the end of stable performance, when strategies were identical (12.6%, 13 of 103; χ²(1) = 4.4; p < 0.04) (Fig. 5b). Firing rate changes were similarly common in place-to-response and response-to-place switches (z = 0.95; p = 0.34) and similarly rare when place or response tasks were performed stably (z = 0.60; p = 0.55). Across conditions, similar proportions of PL and IL neurons changed firing rate (switches, z = 0.88, p = 0.38; stable performance, z = 1.55, p = 0.12). Thus, consistent with the previous analysis, PL/IL neurons responded dynamically to changing memory strategies even when behavior was identical and remained relatively stable when abstract rules were constant.
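The comparison of changing-cell proportions above is a 2 × 2 contingency test. As an arithmetic check, a minimal sketch of a Pearson χ² test without continuity correction (an assumption; it happens to reproduce the reported value) applied to the counts in the text, 40 of 175 neurons during switches versus 13 of 103 during stable performance. For 1 df, the upper-tail p value equals `erfc(sqrt(chi2 / 2))`. The function name is hypothetical.

```python
import math


def chi2_2x2(changed_a, n_a, changed_b, n_b):
    """Pearson chi-square (1 df, no continuity correction) comparing two proportions."""
    a, b = changed_a, n_a - changed_a
    c, d = changed_b, n_b - changed_b
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


chi2, p = chi2_2x2(40, 175, 13, 103)  # switches vs stable performance
print(f"chi2(1) = {chi2:.2f}, p = {p:.3f}")  # consistent with chi2(1) = 4.4, p < 0.04
```

The same helper applied to the other 2 × 2 comparisons in this section gives values close to those reported (e.g., 36 of 173 switch cells vs 9 of 83 reversal cells yields about 3.85).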
Neuronal coding was consistent in different behaviors guided by the same strategy
If PL/IL neurons encode changing memory strategies rather than changes to behavior or task contingencies, then different behaviors using the same rule should be coded similarly. To assess this hypothesis, we compared different paths guided by the same strategy before a switch (e.g., N-E vs S-E in the “go east” task). Most neurons (91.3%, 157 of 172) active during this phase fired with similar patterns even when rats used two different paths to follow the same strategy, and only 8.5% (15 of 172) fired differently (α < 0.01). In contrast, 22.6% (40 of 172 neurons) of the same neuronal population had distinct codes when different strategies guided the same path (χ²(1) = 13.5; p < 0.001). Overall, the activity of 19.8% (35 of 172 neurons) distinguished strategies, 5.6% (10 of 172) distinguished paths, and 2.5% (5 of 172 neurons) distinguished both path and strategy (Fig. 5d). These results provide strong evidence that PL/IL coding dynamics accompanied strategy switching between memory systems rather than changes to overt behavior guided by the same memory system.
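The overlap between strategy-coding and path-coding cells amounts to cross-tabulating two per-neuron significance tests. A minimal sketch, assuming each neuron has one p value for the strategy comparison (same path, different strategies) and one for the path comparison (same strategy, different paths), each judged at α = 0.01; the function names are hypothetical and the p values are placeholders arranged to match the reported counts, not recorded data.

```python
from collections import Counter

ALPHA = 0.01  # per-neuron criterion used in the text


def code_type(p_strategy: float, p_path: float, alpha: float = ALPHA) -> str:
    """Classify one neuron from two comparisons (hypothetical helper)."""
    strategy = p_strategy < alpha  # same path, different strategies
    path = p_path < alpha          # same strategy, different paths
    if strategy and path:
        return "both"
    if strategy:
        return "strategy_only"
    if path:
        return "path_only"
    return "neither"


def tally(p_values):
    """Count neurons per category from (p_strategy, p_path) pairs."""
    return Counter(code_type(ps, pp) for ps, pp in p_values)


# Placeholder p values arranged to match the reported counts (35, 10, 5 of 172).
example = ([(0.001, 0.5)] * 35 + [(0.5, 0.001)] * 10
           + [(0.001, 0.001)] * 5 + [(0.5, 0.5)] * 122)
counts = tally(example)
print(counts["strategy_only"] + counts["both"], counts["path_only"] + counts["both"])
```

The marginal totals (40 strategy-distinguishing and 15 path-distinguishing cells) are the two proportions compared by the χ² test in the text.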
If PL/IL neurons encode switching memory strategies rather than changing contingencies, then firing should be similar when task contingencies change but the same strategy is followed, as in reversal sessions. We therefore compared switches, in which task contingencies and strategies changed, with reversals, in which task contingencies changed but strategies did not. To ensure that similar behavior dynamics were compared across sessions, only changing paths were analyzed here. Proportionally fewer neurons changed firing rates during reversals (10.8%, 9 of 83) than switches (21%, 36 of 173; χ²(1) = 3.85; p < 0.05) (Fig. 5b,c). Neither PL nor IL neurons were strongly affected by the reversal (PL, 9.6%, n = 5; IL, 12.9%, n = 4; z = 0.46; p = 0.65). Furthermore, proportions of changing cells were similar during reversals (10.8%) and stable performance (12.6%), suggesting that altered contingencies alone had no more influence on PL/IL coding than ongoing consistent behavior. Thus, PL/IL coding accompanied changes in strategy from one memory system to another more than changes in overt behavior or reward contingency guided by the same memory system.
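The PL versus IL comparison above is a two-proportion z test. A minimal sketch using a pooled standard error (an assumption; the text does not state which variant was used). The PL and IL denominators are inferred, not stated: 5 of 52 PL and 4 of 31 IL neurons are the counts consistent with the reported 9.6% and 12.9% of the 83 reversal-session cells. The function name is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z test for a difference between two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_two_sided


z, p = two_proportion_z(5, 52, 4, 31)  # PL vs IL changing cells during reversals
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these inferred counts the result (z of about 0.47, p of about 0.64) is close to the reported z = 0.46, p = 0.65; an unpooled standard error would give a slightly smaller z.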
The different firing patterns observed across learning conditions provide strong evidence that neither waveform instability nor variance in maze-running behavior can account for the selective changes in PL/IL coding during strategy switching. If, for example, firing rate changes across phases were attributable to tetrode movement or other sources of waveform instability, then phase-sensitive neurons should be equally distributed across the three behavioral treatments, but they were not. Similarly, if small differences in maze-running behavior altered firing rates across learning phases, then firing in consistent paths should be more similar than firing in changing paths, but it was not. Rather, firing rates differed most when strategies changed, indicating that neither unit instability nor the details of maze behavior accounted for PL/IL coding dynamics.
PL established new population codes before IL
The behavioral correlates of PL/IL activity above could either influence or reflect more widely distributed strategy switching mechanisms. If the neuronal dynamics contribute to switching strategies, then neuronal activity changes should precede the onset of switching behavior; conversely, if the correlates reflect “upstream” mechanisms, then changes in behavior would be expected to precede those in neuronal activity. We therefore analyzed the temporal pattern of population coding during switching in persistently changing neurons, the best candidates for coding abstract rules. Each recording session was divided into 11 performance-based trial blocks using learning curves (see Materials and Methods). The firing rate of each cell was normalized to its session mean, and z-scores were computed for each trial block in both the start and goal arms. Thus, the activity of each neuron was represented by a vector of 22 z-scores (11 trial blocks × start and goal arms). These vectors were grouped into two arrays, one composed of PL and one of IL neurons, and a correlation matrix was computed from each array (supplemental Fig. 5, available at www.jneurosci.org as supplemental material). Only correct consistent path trials were included to ensure that changes in population activity were attributable to the strategy switch and not other behavioral variables. Both PL and IL population codes were stable from one trial block to the next when rats performed well using one strategy, i.e., during stable performance before and after switching (supplemental Fig. 5, available at www.jneurosci.org as supplemental material), but the population codes changed significantly as the switch was acquired.
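The population analysis above can be sketched as follows. Several details are assumptions not stated in the text: each neuron's 22 block-by-arm rates are z-scored with the mean and standard deviation of those same 22 values, and the correlation matrix compares trial blocks by correlating population vectors (each neuron's start- and goal-arm z-scores for that block, concatenated across neurons). All names are hypothetical.

```python
import math

N_BLOCKS = 11


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscore_vector(rates):
    """rates[b] = [start_rate, goal_rate] (Hz) for block b.

    Returns 22 z-scores ordered block by block as (start, goal)."""
    flat = [r for block in rates for r in block]
    m = sum(flat) / len(flat)
    sd = math.sqrt(sum((r - m) ** 2 for r in flat) / (len(flat) - 1))
    if sd == 0:
        return [0.0] * len(flat)  # a neuron with constant rates carries no pattern
    return [(r - m) / sd for r in flat]


def block_correlation_matrix(population):
    """population: one 22-element z-score vector per neuron.

    Returns an 11 x 11 matrix of correlations between block population vectors."""
    def block_vector(b):
        # Start- and goal-arm z-scores of every neuron in block b.
        return [z for vec in population for z in vec[2 * b: 2 * b + 2]]

    vecs = [block_vector(b) for b in range(N_BLOCKS)]
    return [[pearson_r(vecs[i], vecs[j]) for j in range(N_BLOCKS)]
            for i in range(N_BLOCKS)]
```

High off-diagonal correlations among blocks recorded under one strategy, and low correlations between blocks recorded under different strategies, correspond to the stable-then-changing population codes described in the text.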
To assess the dynamics of strategy-related population coding, we measured (1) the decay of the original, pre-switch activity patterns once the switch was imposed, and (2) the emergence of new codes as performance of the new strategy improved (Fig. 6). We thus calculated two curves for each population: r_before measured the average correlation between activity before the switch and each successive trial block and represented the decay of pre-switch codes (Fig. 6); r_after measured the average correlation between activity after the switch and each successive block and represented the emergence of new codes (Fig. 6). As shown by the population matrices (supplemental Fig. 5, available at www.jneurosci.org as supplemental material), PL and IL coding was stable before switching, and r_before was high. The initial PL and IL population codes decayed rapidly after contingencies were changed but with different dynamics (r_before interaction of population and trial blocks, F(7,154) = 4.17, p < 0.001; block 7, contrasts, p = 0.001) (Figs. 6, 7b) (supplemental Fig. 8, available at www.jneurosci.org as supplemental material). The initial PL code decayed significantly from block 5 onward, whereas the IL code did not decay significantly until block 7 (MANOVA of blocks before: PL, F(8,40) = 28.6, p < 0.001; IL, F(8,40) = 9.4, p < 0.001; paired contrasts: PL, before vs blocks 5–11, all F(1,5) > 45.8, p < 0.05; IL, before vs blocks 7–11, all F(1,5) > 17.1, p < 0.05). New PL codes emerged in block 5 and thus anticipated switching performance (MANOVA of blocks after, PL, F(8,40) = 18.8, p < 0.001; paired contrasts after vs blocks 5–11, F(1,5) < 11.5, p > 0.1). Moreover, PL population codes for the new strategy emerged faster than IL codes (Figs. 6, 7c) (supplemental Figs. 2, 5, 8, available at www.jneurosci.org as supplemental material) (r_after interaction of population and trial block, F(7,154) = 9.05, p < 0.001; PL vs IL blocks 1–3 vs 4–7, r_after post hoc contrasts for all blocks, F(1,22) > 7.3, p < 0.05). New IL codes emerged only in block 8, after performance began improving (MANOVA of blocks after: IL, F(8,40) = 10.9, p < 0.001; paired contrasts after vs blocks 8–11, F(1,5) < 14.9, p > 0.1). Because performance began improving significantly at block 7 and reached an average of 83.7% correct in block 8, new IL codes emerged relatively late in learning. The different dynamics suggest that PL and IL may contribute differently to rule coding. PL population changes preceded switching, suggesting that these neurons contribute to mechanisms that initiate strategy switching. In contrast, IL population codes began to change only after the onset of behavioral switching, suggesting that these neurons do not initiate strategy switches, although they may help to establish the persistence of new memory strategies.
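Given a block-by-block correlation matrix for one population, the two curves can be sketched as below. Assumptions: blocks 1–3 serve as the "before" reference and blocks 9–11 as the "after" reference (the performance-matched blocks identified earlier as stable before and after switching), a block is never compared with itself, and indices are zero-based. The crossover summary is a hypothetical descriptive measure added for illustration; it is not the MANOVA and contrast procedure reported above.

```python
BEFORE = [0, 1, 2]   # blocks 1-3 (zero-based indices)
AFTER = [8, 9, 10]   # blocks 9-11


def mean_corr_with(matrix, block, reference):
    """Average correlation between one block and the reference blocks, excluding itself."""
    vals = [matrix[block][r] for r in reference if r != block]
    return sum(vals) / len(vals)


def r_before_after(matrix):
    """Return (r_before, r_after) curves, one value per trial block."""
    n = len(matrix)
    r_before = [mean_corr_with(matrix, b, BEFORE) for b in range(n)]
    r_after = [mean_corr_with(matrix, b, AFTER) for b in range(n)]
    return r_before, r_after


def crossover_block(r_before, r_after):
    """First block (1-based) where the new code correlates better than the old one,
    or None. Illustrative summary only, not the statistical criterion in the text."""
    for b, (old, new) in enumerate(zip(r_before, r_after), start=1):
        if new > old:
            return b
    return None
```

Under this sketch, an earlier crossover for one population than for another would correspond to the pattern described in the text, in which new PL codes emerged before new IL codes.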
Discussion
PL/IL activity encodes strategy switches
By varying memory strategies while keeping behavior constant and varying behavior or task contingencies while keeping strategies constant, we found that PL/IL neurons respond specifically when rats switch strategies in the plus maze. Five important results emerged. PL/IL coding was stable (1) when both behavior and strategies were unchanged, (2) when the same strategy guided different paths, and (3) when different tasks were adopted during reversal learning. In contrast, PL/IL coding (4) changed rapidly when rats changed strategies even when overt behavior was identical. (5) PL coding changes anticipated the rats' behavioral adoption of the new strategy, whereas IL coding lagged. Together, the results show that PL and IL neurons rapidly encode strategy changes. We suggest that this coding facilitates coordination of multiple memory systems and extends Miller and Cohen's (2001) guided activation theory to include the rodent PFC.
The present results dovetail with lesion and inactivation studies showing that PL/IL dysfunction selectively impairs strategy switching and set shifting but not initial task acquisition or reversal learning (Ragozzino et al., 1999; Birrell and Brown, 2000). Similarly, mPFC activity (Mulder et al., 2003) and functional connectivity (Baeg et al., 2007) change during learning, and tasks with different rules are coded differently by these neurons (Jung et al., 1998). However, this investigation is the first to directly compare learning-related changes between a task that requires mPFC neurons (switching) and tasks that do not (stable performance and reversals).
Previous experiments have shown that PL/IL neurons can be especially sensitive to behavioral variability (Euston and McNaughton, 2006; Cowen and McNaughton, 2007). In the present study, however, neural activity was remarkably stable when the same strategy was used, even if behavior varied. Indeed, most neurons (>90%) responded similarly even when the rat followed different paths, as long as the paths were guided by the same strategy (Fig. 5). That neurons were relatively insensitive to changes in overt behavior compared with previous reports (Euston and McNaughton, 2006) may have resulted in part from different cognitive demands. We recorded mPFC neurons during strategy switching precisely because inactivation of these neurons selectively impaired this cognitive ability (Rich and Shapiro, 2007), and we predicted that PL/IL neurons would encode strategy-related information. Indeed, when cognitive demands required the PL/IL, the rules guiding goal-directed behavior were coded more prominently than the sensorimotor variables that distinguish different paths or goal locations.
Population dynamics in PL and IL reflected strategy switching. Although both populations established new firing patterns during strategy switching, new PL codes anticipated learning, whereas IL established new stable codes only after performance was near proficient. The different dynamics imply that PL and IL play related but distinct roles in strategy switching. Early responses in PL may contribute to switching mechanisms by promoting new strategies, inhibiting old strategies, or both. Selective PL lesions reduce behavioral flexibility in operant tasks when reward values change (Killcross and Coutureau, 2003) and impair the ability to resolve ambiguous situations based on previous experience (Haddon and Killcross, 2006). In the plus maze, PL neurons may facilitate switching by integrating multiple task contingencies during goal-directed learning (Corbit and Balleine, 2003), a process required to flexibly update abstract, strategy-guided behavior. IL activity also changed prominently during strategy switches, but the transition to the new stable state occurred only as performance approached criterion. The correlation between representational stability and switching performance suggests that IL coding is important later in the learning process, perhaps helping to establish the new strategy for future selection. Indeed, IL dysfunction impairs the persistence of extinction in pavlovian tasks (Rhodes and Killcross, 2004), including fear conditioning (Quirk et al., 2006), with the same temporal pattern as described for task switching (Rich and Shapiro, 2007), suggesting that this area helps establish new learning in both tasks. The effects of selective PL and IL lesions on strategy switching in the plus maze remain unknown, and future experiments are needed to determine their independent functions.
Among neurons responding to strategy switches, persistently changing cells revealed a candidate mechanism for rule encoding, because their firing patterns were stable within each strategy but distinct between strategies. If PL/IL neurons encode rules per se, then they should exhibit distinct, reliable, and predictable responses when rats switch repeatedly from one strategy to another. Alternatively, the neurons may respond to transitions between rules, so that reliable patterns of activity correlate with switching to or away from a strategy. Finally, the activity may encode switching per se, signaling that a new rule must be learned without specifying any particular one. On this view, distinct patterns of activity may not recur predictably with repeated rule switching beyond the certainty that the pattern will change as a new rule is learned. Experiments that record activity across multiple task changes [e.g., an ABA design, in which rats switch from strategy A to strategy B and then back to A] should distinguish among these hypotheses.
PFC function in rodents and primates
The human PFC contributes to the use of rules and strategies of varying levels of complexity (Owen et al., 1991; Gershberg and Shimamura, 1995; Levine et al., 1998; Bunge et al., 2005). PFC damage impairs tasks that require contextually appropriate rules (Milner, 1982; Shallice and Burgess, 1991; Levine et al., 1998), including the Wisconsin card sorting task (Grant and Berg, 1948). PFC activation is associated with maintenance of task-setting contextual knowledge (MacDonald et al., 2000; Sakai and Passingham, 2003) and with the presentation of cues indicating task-solving strategies (Brass and von Cramon, 2004). The monkey prefrontal cortex is crucial for rapid acquisition and retention of rules, and neuronal activity in the PFC reflects these functions (Wise et al., 1996). The lateral PFC in primates, which may be functionally homologous to the rat mPFC (Birrell and Brown, 2000; Brown and Bowman, 2002), is crucial for set shifting (Dias et al., 1996) and self-ordered strategy selection tasks (Bussey et al., 2001; Gaffan et al., 2002). Lateral PFC neurons encode learned strategies or rules (White and Wise, 1999; Asaad et al., 2000; Fuster et al., 2000; Wallis et al., 2001; Genovesio et al., 2005) and perceptual categories (Freedman et al., 2001). The elaboration of cortical circuitry and cognition is more complex in humans and other primates than in rats, but the basic computations required for “rule learning” may derive from PFC mechanisms common across species.
Coordinating multiple memory systems
Strategy switching, set shifting, and reversal learning each require behavioral flexibility, are impaired by PFC disruption, and reveal the role of the PFC in cognitive control (Miller and Cohen, 2001). Set shifting requires rats trained to discriminate one dimension of complex stimuli (e.g., odor or texture) to discriminate a different dimension of the same complex stimulus (Birrell and Brown, 2000). Reversal learning entails exchanging responses to two stimuli. All three require recognizing contingency changes, withholding responses to old contingencies, and learning to follow new contingencies through trial and error. Despite these similarities, important biological differences exist. Set shifting and strategy switching are impaired by mPFC disruption; reversal learning is not. We observed relatively stable PL/IL activity during reversals in the present experiment, and changing contingencies had no more influence on PL/IL activity than stable performance of one task (Fig. 5c). The dissociation between set shifting and strategy switching on the one hand and reversal learning on the other provides an important approach for investigating how coding mechanisms across PFC regions and associated brain systems support cognition (de Bruin et al., 1994; Ragozzino et al., 1999, 2003; Birrell and Brown, 2000; Rich and Shapiro, 2007). Reversal learning is impaired by orbital PFC (OFC) inactivation (Kim and Ragozzino, 2005), a treatment that does not impair strategy switching. In contrast to the “memory strategy” coding described here for mPFC neurons, rat OFC neurons signal “expected outcomes,” the affective valence associated with a given stimulus (Schoenbaum et al., 2003).
The present results begin to show how mPFC coding mechanisms may help coordinate interactions among memory systems. Place- and response-guided memory strategies have been doubly dissociated in the plus maze (Packard and McGaugh, 1996). The hippocampus- and caudate-based memory systems normally function in parallel but can compete for the control of behavior in the plus maze, as when hippocampal inactivation facilitates response learning (Chang and Gold, 2003). We previously proposed that mPFC activity during strategy switching selectively modulates these different memory systems, so that when a rat is placed in a familiar situation, the memory system that most recently predicted reward will guide behavior (Rich and Shapiro, 2007). By recording mPFC neurons and distinguishing paths, tasks, and strategies as hierarchically linked descriptions of behavior in the plus maze, we can now extend this proposal. Changes in paths or tasks can be guided by different representations within each memory system. For example, hippocampal neurons fire differently during “go east” and “go west” trials in the plus maze, and prospective coding in common start arms predicts memory performance (Ferbinteanu and Shapiro, 2003). mPFC coding changed minimally during such spatial reversals, which do not entail changes between memory systems, although OFC neurons are predicted to strongly code such reversal learning (Schoenbaum et al., 2003). In contrast, mPFC activity changed dramatically with shifts between memory systems, which in this experiment are operationally identical to memory strategies or abstract rules. On this view, the mPFC facilitates switching among brain modules rather than switching between coding patterns within modules.
This interpretation predicts the following: (1) other paradigms that involve abstract rule learning (e.g., set shifting) require mPFC function because such learning requires representational shifts among functional (e.g., cortical) modules rather than “remapping” within modules; (2) the same paradigms will engage PL/IL neurons; and (3) paradigms that do not require changes among functional modules will do neither.
Footnotes
- This work was supported by National Institutes of Health Grants MH065658 and MH073689 and the Mount Sinai School of Medicine.
- Correspondence should be addressed to Matthew Shapiro, Fishberg Department of Neuroscience, Mount Sinai School of Medicine, One Gustave Levy Place, Box 1065, New York, NY 10029. matthew.shapiro@mssm.edu