Abstract
The prefrontal cortex (PFC) is thought to learn the relationships between actions and their outcomes. But little is known about what changes to population activity in PFC are specific to learning these relationships. Here we characterize the plasticity of population activity in the medial PFC (mPFC) of male rats learning rules on a Y-maze. First, we show that the population always changes its patterns of joint activity between the periods of sleep either side of a training session on the maze, regardless of successful rule learning during training. Next, by comparing the structure of population activity in sleep and training, we show that this population plasticity differs between learning and nonlearning sessions. In learning sessions, the changes in population activity in post-training sleep incorporate the changes to the population activity during training on the maze. In nonlearning sessions, the changes in sleep and training are unrelated. Finally, we show evidence that the nonlearning and learning forms of population plasticity are driven by different neuron-level changes, with the nonlearning form entirely accounted for by independent changes to the excitability of individual neurons, and the learning form also including changes to firing rate couplings between neurons. Collectively, our results suggest two different forms of population plasticity in mPFC during the learning of action–outcome relationships: one a persistent change in population activity structure decoupled from overt rule-learning, and the other a directional change driven by feedback during behavior.
SIGNIFICANCE STATEMENT The PFC is thought to represent our knowledge about what action is worth doing in which context. But we do not know how the activity of neurons in PFC collectively changes when learning which actions are relevant. Here we show, in a trial-and-error task, that population activity in PFC is persistently changing, regardless of learning. Only during episodes of clear learning of relevant actions are the accompanying changes to population activity carried forward into sleep, suggesting a long-lasting form of neural plasticity. Our results suggest that representations of relevant actions in PFC are acquired by reward imposing a direction onto ongoing population plasticity.
Introduction
Among the myriad roles assigned to the mPFC, a common thread is that it learns a model for the statistics of actions and their expected outcomes, to guide or monitor behavior (Alexander and Brown, 2011; Euston et al., 2012; Holroyd and McClure, 2015; Khamassi et al., 2015; Starkweather et al., 2018; Wang et al., 2018). One way to probe this role is to use rule-switching tasks that depend on trial-and-error to uncover the statistics of each new action–outcome association. Previous work has shown that inactivating mPFC impairs the learning of new rules (Ragozzino et al., 1999a,b; Rich and Shapiro, 2007; Floresco et al., 2008), and single pyramidal neurons change their firing times relative to ongoing theta-band oscillations only with successful rule learning (Benchenane et al., 2010). In well-trained animals, a shift in their behavioral strategy in response to a rule change is preceded by a shift in population activity in PFC (Durstewitz et al., 2010; Karlsson et al., 2012; Powell and Redish, 2016), consistent with a change to a statistical model of the current action–outcome dependencies.
We know little, though, about how PFC population activity changes during the initial learning of rules (Peyrache et al., 2009; Tavoni et al., 2017; Maggi et al., 2018). The changes to population activity could be continuous or constrained only to periods of overt learning. And these changes could be modulations of firing rates, of firing correlations, or of precise co-spiking between neurons. Knowing the continuity and form of plasticity in population activity would provide strong constraints on theories for how statistical models of the world are acquired and represented by mPFC.
To address these questions, here we analyze the continuity and form of population plasticity in the PFC of rats learning rules on a Y-maze (Peyrache et al., 2009). We report that the structure of the population's activity markedly changes between the periods of sleep either side of training on the maze. This turnover in neural activity occurs whether or not there is behavioral evidence of learning during training, and can be accounted for entirely by changes to the excitability of individual neurons, with no contribution from changes to correlations. Unique to bouts of learning is that changes to the structure of population activity in training are carried forward into the following periods of sleep. These conserved activity states are created by a combination of changes to individual neurons' excitability and to rate, but not spike, correlations between neurons. Thus, PFC population activity undergoes constant plasticity, but this plasticity only has a persistent direction during learning.
Materials and Methods
Task and electrophysiological recordings.
Four Long–Evans male rats with implanted tetrodes in prelimbic cortex were trained on a Y-maze task (see Fig. 1A). Each recording session consisted of a 20–30 min sleep or rest epoch (pretraining epoch), in which the rat remained undisturbed in a padded flowerpot placed on the central platform of the maze, followed by a training epoch, in which the rat performed for 20–40 min, and then by a second 20–30 min sleep or rest epoch (post-training epoch). Figure 1B shows the structure of these three epochs in the 10 identified learning sessions. Every trial in the training epoch started when the rat left the beginning of the departure arm and finished when the rat reached the end of one of the choice arms. Correct choice was rewarded with drops of flavored milk. Each rat had to learn the current rule by trial-and-error: go to the right arm, go to the cued arm, go to the left arm, or go to the uncued arm. To maintain consistent context across all sessions, the extra-maze light cues were lit in a pseudo-random sequence across trials, whether they were relevant to the rule or not.
The data analyzed here were from a total set of 50 experimental sessions taken from the study of Peyrache et al. (2009), representing training sessions starting from naive until either the final training session, or until choice became habitual across multiple consecutive sessions (consistent selection of one arm that was not the correct arm). The 4 rats, respectively, had 13, 13, 10, and 14 sessions. From these, we have used here 10 learning sessions and up to 17 “stable” sessions (see below).
Tetrode recordings were spike-sorted only within each recording session for conservative identification of stable single units. In the sessions we analyze here, the populations ranged in size from 15 to 55 units. Spikes were recorded with a resolution of 0.1 ms. For full details on training, spike-sorting, sleep identification, and histology see Peyrache et al. (2009).
Session selection and strategy analysis.
We primarily analyze here data from the 10 learning sessions in which the previously defined learning criteria (Peyrache et al., 2009) were met: the first trial of a block of at least three consecutively rewarded trials after which the performance until the end of the session was >80%. In later sessions, the rats reached the criterion for changing the rule: 10 consecutive correct trials or one error in 12 trials. By these criteria, each rat learned at least two rules.
We also sought sessions in which the rats made stable choices of strategy. For each session, we computed P(rule) as the proportion of trials in which the rat's choice of arm corresponded to each of the three rules (left, right, cued-arm). Whereas P(left) and P(right) are mutually exclusive, P(cued-arm) is not, and has an expected value of 0.5 when it is not being explicitly chosen because of the random switching of the light cue. A session was deemed to be “stable” if P(rule) was greater than some threshold θ for one of the rules, and the session contained at least 10 trials (this removed only two sessions from consideration). Here we tested both θ = 0.9 and θ = 0.85, giving 13 and 17 sessions, respectively. These also, respectively, included 2 and 4 of the rule-change sessions. For the time-series in Figure 1C, E, F, we estimated P(rule) in windows of 7 trials, starting from the first trial, and sliding by one trial.
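For concreteness, this sliding-window estimate of P(rule) can be sketched in Python (an illustrative reimplementation; the function name and input layout are ours, and the released analysis code may differ):

```python
import numpy as np

def p_rule_series(rule_matches, window=7):
    """Sliding-window estimate of P(rule): the fraction of trials in each
    window (7 trials, sliding by one trial) whose arm choice is
    consistent with the candidate rule.

    rule_matches: boolean sequence, one entry per trial.
    """
    m = np.asarray(rule_matches, dtype=float)
    if len(m) < window:
        return np.array([])
    # moving average over the window = proportion of rule-consistent trials
    return np.convolve(m, np.ones(window) / window, mode="valid")
```

Note that for the cued-arm rule, `rule_matches` would be True whenever the chosen arm was lit, so its expected value is 0.5 under random cue switching even when the rule is not being followed.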
Characterizing population activity as a dictionary.
For a population of size N, we characterized the instantaneous population activity from time t to t + δ as an N-length binary vector or word. The ith element of the vector was a 1 if at least one spike was fired by the ith neuron in that time bin, and 0 otherwise. Throughout, we test bin sizes covering two orders of magnitude, with δ ranging from 1 to 100 ms. For a given bin size, the set of unique words that occurred in an epoch defined the dictionary of that epoch. The probability distribution for the dictionary was compiled by counting the frequency of each word's occurrence in the epoch and normalizing by the total number of time bins in that epoch.
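The binarization into words and the compilation of a dictionary's probability distribution can be sketched as follows (a minimal illustration under our own naming; details of the released code may differ):

```python
import numpy as np

def build_dictionary(spike_times, n_neurons, t_start, t_stop, delta):
    """Binarize a population recording into N-length binary 'words'.

    spike_times: list of arrays, one per neuron, of spike times (s).
    delta: bin size (s).
    Returns a dict mapping each word (tuple of 0/1) to its probability,
    i.e. its occurrence count normalized by the number of time bins.
    """
    edges = np.arange(t_start, t_stop + delta, delta)
    n_bins = len(edges) - 1
    binary = np.zeros((n_bins, n_neurons), dtype=int)
    for i, spikes in enumerate(spike_times):
        counts, _ = np.histogram(spikes, bins=edges)
        # 1 if the neuron fired at least one spike in the bin, else 0
        binary[:, i] = counts > 0
    words, counts = np.unique(binary, axis=0, return_counts=True)
    return {tuple(w): c / n_bins for w, c in zip(words, counts)}
```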
For each session, we constructed three dictionaries per bin size, and their corresponding probability distributions P(Epoch): pretraining sleep P(Pre), post-training sleep P(Post), and trials during training P(Trials). To unambiguously identify sleep periods, and for comparisons with previous reports of replay in PFC (Euston et al., 2007; Peyrache et al., 2009), we used slow-wave sleep bouts for the pretraining and post-training sleep dictionaries.
We built dictionaries using the number of recorded neurons N, up to a maximum of 35 for computational tractability. The number of neurons used in each analysis is listed in Tables 1 and 2; where we needed to use fewer than the total number of recorded neurons, we ranked them according to the coefficient of variation of their firing rate between the three epochs, and chose the N least variable; in practice, this sampled neurons from across the full range of firing rates. Only two learning sessions and six stable sessions were capped in this way.
Comparing dictionaries between epochs.
We quantified the distance D(P | Q) between two dictionaries' probability distributions P and Q using the Hellinger distance, defined by DH(P | Q) = (1/√2)[Σi (√pi − √qi)²]^1/2. To a first approximation, this measures, for each pair of probabilities (pi, qi), the distance between their square roots. In this form, DH(P | Q) = 0 means the distributions are identical, and DH(P | Q) = 1 means the distributions are mutually singular: all positive probabilities in P are 0 in Q, and vice versa.
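This definition translates directly into code, summing over the union of the two dictionaries' words (words absent from one dictionary contribute probability 0); the function name is ours:

```python
from math import sqrt

def hellinger(P, Q):
    """Hellinger distance between two word-probability dictionaries.

    P, Q: dicts mapping words to probabilities. The sum runs over the
    union of words; a word missing from a dict has probability 0.
    """
    words = set(P) | set(Q)
    s = sum((sqrt(P.get(w, 0.0)) - sqrt(Q.get(w, 0.0))) ** 2
            for w in words)
    return sqrt(s / 2.0)
```

The 1/√2 normalization ensures the two limiting cases described above: 0 for identical distributions, 1 for mutually singular ones.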
To understand whether a pair of pretraining and post-training sleep dictionaries meaningfully differed in their structure, we compared the distance between them, D(Pre | Post), with the predicted distance if they had an identical underlying probability distribution (in which case D(Pre | Post) > 0 would be solely due to finite sampling effects). We used a resampling test to estimate the predicted distance. We first created a single probability distribution P(sleep) for a session by calculating the probability of each word's appearance in all sleep bouts across both pretraining and post-training sleep epochs. We then sampled P(sleep) to create new time-series of pretraining and post-training sleep words, matching the number of emitted words in each epoch in the original data. By then reconstructing the dictionaries in each epoch from the resampled data, we obtained a prediction for the distance D(Pre* | Post*), where the asterisk indicates the estimate from the resampled data. Repeating the resampling 20 times gave us a distribution of expected distances assuming an identical underlying probability distribution for words. The sampling distribution's mean and its 99% CI are plotted for each session in Figure 3D, E: the intervals are too small to see on this scale.
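A sketch of this resampling test, under our own naming (here each epoch is represented as the sequence of words emitted per time bin, encoded as hashable strings):

```python
import numpy as np

rng = np.random.default_rng(1)

def to_dist(words):
    """Probability distribution of a word sequence."""
    vals, counts = np.unique(words, return_counts=True)
    return dict(zip(vals, counts / counts.sum()))

def hellinger(P, Q):
    keys = set(P) | set(Q)
    s = sum((np.sqrt(P.get(k, 0.0)) - np.sqrt(Q.get(k, 0.0))) ** 2
            for k in keys)
    return np.sqrt(s / 2.0)

def null_distances(words_pre, words_post, n_resamples=20):
    """Distances D(Pre*|Post*) expected if both sleep epochs were
    drawn from one underlying distribution P(sleep).

    Pools the words from both epochs, then repeatedly redraws epochs
    of matched length from the pooled distribution.
    """
    pooled = np.concatenate([words_pre, words_post])
    out = []
    for _ in range(n_resamples):
        pre_star = rng.choice(pooled, size=len(words_pre), replace=True)
        post_star = rng.choice(pooled, size=len(words_post), replace=True)
        out.append(hellinger(to_dist(pre_star), to_dist(post_star)))
    return np.array(out)
```

Any residual distance in the resampled pairs then reflects finite-sampling effects alone, giving the null distribution against which the data distance is compared.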
We quantified the relative convergence of the training dictionary X with the dictionaries in sleep by [D(Pre | X) − D(Post | X)]/[D(Pre | X) + D(Post | X)]. Convergence > 0 indicates that the distance between the training epoch [P(X)] and post-training sleep [P(Post)] distributions was smaller than that between the training and pretraining sleep [P(Pre)] distributions.
Testing hypotheses for changes in dictionary structure.
To understand what drove the observed changes in the structure of population activity, we tested three hypotheses: (1) independent changes in the excitability of neurons; (2) changes in firing rate covariations between neurons; and (3) shifts in precise co-spiking between neurons. We tested these hypotheses in two steps:
We tested whether dictionaries constructed from independently firing neurons could account for the observed changes in the structure of population activity, with two possible outcomes:
Yes: Then we could conclude that changes in the data were due to independent changes to the excitability of the recorded neurons.
No: This implied that the correlations between neurons were also changed.
To then identify the types of those correlations, we turned to dictionaries constructed from spikes jittered a little in time, and asked whether they could account for the observed changes:
No: Then we would have evidence that precise co-spiking between neurons contributed to the changes in population activity structure.
Yes: Then changes to population activity did not depend on precise co-spiking and could be accounted for by changes to covariations in rate between neurons.
For the independent neuron dictionaries, we shuffled interspike intervals for each neuron independently, and then constructed words at the same range of bin sizes. As both the training and sleep epochs were broken up into chunks (of trials and slow-wave sleep bouts, respectively), we only shuffled interspike intervals within each chunk. This procedure kept the same interspike interval distribution for each neuron but disrupted any correlation between neurons during a trial or during a sleep bout, thus testing for dictionary changes that could be accounted for solely by changes to independent neurons. We repeated the shuffling 20 times.
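The within-chunk interspike-interval shuffle for one neuron can be sketched as follows (applied separately to each trial or sleep bout; naming is ours). Because the shuffled intervals sum to the original span, the first and last spikes stay in place and the train remains inside its chunk:

```python
import numpy as np

rng = np.random.default_rng(7)

def shuffle_isis(spikes):
    """Shuffle one neuron's interspike intervals within a chunk.

    Keeps the neuron's ISI distribution and spike count, but destroys
    any correlation with other neurons in the same chunk.
    """
    spikes = np.sort(np.asarray(spikes))
    if len(spikes) < 2:
        return spikes.copy()
    isis = rng.permutation(np.diff(spikes))
    # rebuild the spike train from the first spike onward
    return spikes[0] + np.concatenate([[0.0], np.cumsum(isis)])
```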
For any given data statistic sdata for a single session, we compute the same statistic sshuffle for each shuffled dataset, and plot the difference δ = sdata − E(sshuffle), where E() is the mean over the shuffled data's statistics. CIs at 99% for all δ were smaller than the size of the plotted symbol for δ, so are omitted for clarity.
For the jittered dictionaries, each spike was jittered in time by a random amount drawn from a Gaussian of mean 0 and SD σ. We tested σ from 2 to 50 ms. For each σ, we constructed 20 jittered datasets. Words were constructed from each using 5 ms bins here, both as this time-scale would capture millisecond-precise spike timing between neurons and because the biggest effects in the data were most consistently seen at this bin size.
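The jittering step itself is a one-line perturbation of the spike train (naming ours); words are then rebuilt from the jittered trains exactly as for the data:

```python
import numpy as np

rng = np.random.default_rng(3)

def jitter_spikes(spikes, sigma):
    """Jitter each spike time by Gaussian noise with SD sigma (s).

    Preserves each neuron's firing rate and approximate rate
    covariations, but destroys millisecond-precise co-spiking.
    """
    noise = rng.normal(0.0, sigma, size=len(spikes))
    return np.sort(np.asarray(spikes) + noise)
```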
We illustrated changes in the rate covariation between neurons using the coupling between single-neuron and ongoing population activity (Okun et al., 2015). Each neuron's firing rate was the spike density function fi obtained by convolving each spike with a Gaussian of 100 ms SD. Population coupling for the ith neuron is the Pearson's correlation coefficient: ci = corr(fi, P≠i), where P≠i is the population rate obtained by summing all firing rate functions, except that belonging to the ith neuron.
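Given the spike density functions as a (neurons × time) array, the coupling computation can be sketched as (naming ours; the smoothing into spike density functions is assumed done beforehand, per the text):

```python
import numpy as np

def population_coupling(rates):
    """Population coupling c_i = corr(f_i, sum of f_j for j != i).

    rates: array of shape (n_neurons, n_timepoints) holding each
    neuron's spike density function (spikes convolved with a
    100 ms SD Gaussian).
    """
    total = rates.sum(axis=0)
    # correlate each neuron with the population rate excluding itself
    return np.array([np.corrcoef(rates[i], total - rates[i])[0, 1]
                     for i in range(rates.shape[0])])
```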
Relationship of location and change in word probability.
To examine the spatial correlates of word occurrence, the maze was linearized, and normalized (0 = start of departure arm; 1 = end of the chosen goal arm). The location of every occurrence of a word during the training epoch's trials (“trial word”) was expressed as a normalized position on the linearized maze, from which we computed the word's median location and corresponding interquartile interval. Histograms of median word location were constructed using kernel density estimation, with 100 equally spaced points between 0 and 1.
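The per-word location summary is a simple robust statistic over that word's occurrences (naming ours):

```python
import numpy as np

def word_location_summary(positions):
    """Median and interquartile interval of one trial word's positions.

    positions: normalized linearized-maze positions (0 = start of
    departure arm, 1 = end of the chosen arm) for every occurrence
    of the word during trials.
    """
    pos = np.asarray(positions)
    return (np.median(pos),
            (np.percentile(pos, 25), np.percentile(pos, 75)))
```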
We tested whether the trial words closer in probability to post-training than pretraining sleep were from any specific locations, which would suggest a changing representation of a key location. For each word, we computed the difference in its probability between training and pretraining sleep δpre = |p(pre) − p(trial)|, and the same for post-training sleep δpost = |p(post) − p(trial)|, and from these computed a closeness index: (δpre − δpost)/(δpre + δpost). Closeness is 0 if the word is equidistant from training to both sleep epochs, 1 if it has an identical probability between training and post-training sleep, and −1 if it has an identical probability between training and pretraining sleep.
When assessing identified maze segments, words were divided into terciles by thresholds on the closeness index at [−0.5, 0.5]; similar results were obtained if we used percentile bounds of [10%, 90%]. We counted the proportion of words in each tercile whose median position fell within specified location bounds on the linearized maze. CIs on the proportions were computed using 99% Jeffrey's intervals (Brown et al., 2001).
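The Jeffreys interval for a proportion k/n is the central interval of the Beta(k + 1/2, n − k + 1/2) posterior under the Jeffreys prior (Brown et al., 2001); a sketch using SciPy's Beta quantile function (naming ours):

```python
from scipy.stats import beta

def jeffreys_interval(k, n, alpha=0.01):
    """99% (alpha = 0.01) Jeffreys interval for a proportion k/n.

    Uses the equal-tailed interval of the Beta(k + 1/2, n - k + 1/2)
    posterior, with the conventional boundary fixes at k = 0 and k = n.
    """
    lo = beta.ppf(alpha / 2, k + 0.5, n - k + 0.5) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 0.5, n - k + 0.5) if k < n else 1.0
    return lo, hi
```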
Statistics.
Quoted measurement values are mean x̄ and CIs for the mean [x̄ − tα/2,nSE, x̄ + tα/2,nSE], where tα/2,n is the value from the t distribution at α = 0.05 (95% CI) or α = 0.01 (99% CI), and given the number n of data points used to obtain x̄. For testing the changes in convergence, we used the Wilcoxon signed-rank test for a difference from 0; for differences in population coupling correlations, we used the Wilcoxon signed-rank paired-sample test. Throughout, we have n = 10 learning sessions and n = 17 stable sessions.
Data and code availability.
The spike-train and behavioral data that support the findings of this study are available at www.CRCNS.org (Peyrache et al., 2018). The sessions meeting our learning and stable criteria are listed in Tables 1 and 2.
Code to reproduce the main results of the paper is available at https://github.com/mdhumphries/PfCDictionary.
Results
Signatures of rule-learning on the Y-maze
Rats with implanted tetrodes in the prelimbic cortex learned one of four rules on a Y-maze: go right, go to the randomly cued arm, go left, or go to the uncued arm (Fig. 1A). Rules were changed in this sequence, unsignaled, after the rat did 10 correct trials in a row, or 11 correct trials of 12. Each rat learned at least two of the rules, starting from a naive state. Each training session was a single day containing 3 epochs totaling typically 1.5 h: pretraining sleep/rest, behavioral training on the task, and post-training sleep/rest (Fig. 1B). Here we consider bouts of slow-wave sleep throughout, to unambiguously identify periods of sleep. Tetrode recordings were spike-sorted within each session, giving populations of single-neuron recordings ranging between 12 and 55 per session (for details of each session and each epoch within a session, see Tables 1 and 2).
To test for the effects of learning on the structure of joint population activity, we need to compare sessions of learning with those containing no apparent learning as defined by the rats' behavior. In the original study containing this dataset, Peyrache et al. (2009) identified 10 learning sessions as those in which three consecutive correct trials were followed by at least 80% correct performance to the end of the session; the first trial of the initial three was considered the learning trial. By this criterion, the learning trial occurs before the midpoint of the session (mean 45%; range 28%–55%). We first check that this criterion corresponds to clear learning: Figure 1C, D shows that each of the 10 sessions has an abrupt step change in reward accumulation around the identified learning trial, corresponding to a switch to a consistent, correct strategy within that session (Fig. 1E).
We further identify a set of 17 sessions with a stable behavioral strategy throughout, defined as a session with the same strategy choice (left, right, cue) on >85% of trials (Fig. 1F). This set includes four sessions in which the rule changed. Setting this criterion to a more conservative 90% reduces the number of sessions to 13 (including two rule change sessions) but does not alter the results of any analysis; we thus show the 85% criterion results throughout.
Constant plasticity of population activity between sleep epochs
We want to describe the joint population activity over all N simultaneously recorded neurons with minimal assumptions, so that we can track changes in population activity, however they manifest. Dividing time into bins small enough that each neuron either spikes (1) or does not spike (0) gives us the instantaneous state of the population as the N-element binary vector or word in that bin (Fig. 2). The dictionary of words appearing in an epoch and their probability distribution together describe the region of joint activity space in which the population is constrained. Comparing dictionaries and their probabilities between epochs will thus reveal if and how learning changes this region of joint activity.
If learning during training correlated with changes to the underlying neural circuit in PFC, then we might reasonably expect population activity in post-training sleep to also be affected by these changes, and so differ from activity in pretraining sleep. We thus compare the dictionaries in pretraining and post-training sleep for the learning sessions, and then check whether any detected changes also appear during sessions of stable behavior.
A first check is simply whether the dictionary content changed during learning and not stable behavior. We find that the words common to both sleep epochs (Fig. 3A) account for almost all of each epoch's activity (Fig. 3B) at bin sizes up to 20 ms. Consequently, there are no differences between learning and stable behavior in the overlap of dictionary contents between sleep epochs (Fig. 3A) or in the proportion of activity accounted for by words common to both sleep epochs (Fig. 3B). We could thus rule out that learning changes the dictionary content between sleep epochs compared with stable behavior. Any learning-specific change ought then to be found in the structure of the population activity.
We capture this structure by the respective distributions P(Pre) and P(Post) for the probability of each word appearing in pretraining or post-training sleep. Changes to the detailed structure of the pretraining and post-training sleep dictionaries are then quantified by the distance between these probability distributions (Fig. 3C). These distances will vary according to both the number of neurons N and the duration of each epoch. So interpreting them requires a null model for the distances expected if P(Pre) and P(Post) have the same underlying distribution P(Sleep), which we approximate using a resampling test (see Materials and Methods). In this null model, any differences between P(Pre) and P(Post) are due to the finite sampling of P(Sleep) forced by the limited duration of each epoch.
In learning sessions, the distance between pretraining and post-training sleep probability distributions always exceeds the upper limit of the null model's prediction (Fig. 3D). This was true at every bin size (Fig. 3F), even at small bin sizes where the dictionaries were nearly identical between the sleep epochs (Fig. 3A). Thus, the probability distributions of words consistently differ between pretraining and post-training sleep epochs in learning sessions.
However, Figure 3E, F shows that this consistent difference is also true for the sessions with stable behavior. There is quantitative agreement too, as the gap between the data and predicted distances has the same distribution for both learning and stable behavior (Fig. 3F). We conclude that the probabilities of words do systematically change between sleep epochs either side of training but do so whether there is overt learning or not.
Learning systematically updates the dictionary
This leaves open the question of whether changes in population activity between sleep epochs are a consequence of changes during training. If the population changes between sleep epochs are unrelated to population activity in training, then the probability distribution of words in training will be equidistant on average from that in pretraining and post-training sleep. Alternatively, changes to population activity during training may carry forward into post-training sleep, possibly as a consequence of neural plasticity during the trials changing the region of joint activity space in which the population is constrained. A prediction of this neural-plasticity model is that the directional change would thus occur predominantly during learning sessions, so that only in these sessions is the distribution of word probabilities in training closer to that in post-training sleep than in pretraining sleep.
Unpicking the relationship between the sleep changes and training requires that the dictionary in training also appears in the sleep epochs; otherwise, changes to word probabilities during training could not be tracked in sleep. We find that the structure of population activity in training is highly conserved in the sleep epochs (Fig. 4A), both in that the majority of words appearing in trials also appear in the sleep epochs and that these common words account for almost all of the total duration of the trials. This conservation of the training epoch population structure in sleep allows us to test the prediction of a learning-driven directional change in population structure (Fig. 4B).
To do so, we take the dictionary of words that appear during training, and compute the distance between its probability distribution and the probability distribution of that dictionary in pretraining sleep (D(Pre | Learn)), and between training and post-training sleep (D(Post | Learn)) (Fig. 4C). The prediction of the directional change model is then D(Pre | Learn)>D(Post | Learn). This is exactly what we found: D(Pre | Learn) is consistently larger than D(Post | Learn) at small bin sizes, as illustrated in Figure 4D for 5 ms bins.
If these directional changes are uniquely wrought by learning, then it follows that we should not see any systematic change to the dictionary in the stable behavior sessions (Fig. 4B). To test this prediction, we similarly compute the distances D(Pre | Stable) and D(Post | Stable) using the dictionary of words from the training epoch, and test whether D(Pre | Stable) ≈ D(Post | Stable). Again, this is exactly what we found: D(Pre | Stable) was not consistently different from D(Post | Stable) at any bin size, as illustrated in Figure 4E for 5 ms bins.
It is also useful to consider not just which sleep distribution of words is closer to the training distribution, but how much closer. We express this as a convergence ratio C = [D(Pre | X) − D(Post | X)]/[D(Pre | X) + D(Post | X)], given the training distribution X = {Learn, Stable} in each session. So computed, C falls in the range [−1, 1], with a value >0 meaning that the training probability distribution is closer to the distribution in post-training sleep than the distribution in pretraining sleep. Figure 4G shows that, for learning sessions, the word distribution in training is closer to the post-training than the pretraining sleep distribution across an order of magnitude of bin sizes. For stable sessions, the absence of relative convergence is consistent across two orders of magnitude of bin size (Fig. 4G). Both qualitatively and quantitatively, the structure of PFC population activity shows a relative convergence between training and post-training sleep that is unique to learning.
Changes to neuron excitability account for changes between sleep epochs
What then is the main driver of the observed changes in the structure of population activity? These could arise from changes to the excitability of independent neurons, to covariations in rate over tens to hundreds of milliseconds, or to the millisecond-scale precise timing of coincident spiking between neurons. We first examine the drivers of the changes between sleep epochs we saw in Figure 3.
Individual sessions show a rich spread of changes to neuron excitability between the sleep epochs (Fig. 5A). We thus begin isolating the contribution of these three factors by seeing how much of the change in population structure between sleep epochs can be accounted for by independent changes to neuron excitability. Shuffling interspike intervals within each epoch gives us null model dictionaries for independent neurons by removing both rate and spike correlations between them, but retaining their excitability (at least, as captured by their interspike interval distribution).
When we analyze the changes between sleep epochs for independent neuron dictionaries, the strong similarity with the results from the data dictionaries is compelling. We illustrate this in Figure 5B–D, by repeating the analyses in Figure 3D–F, but now using the independent neuron dictionaries, and see that the results are essentially the same. The departure from the null model of a single probability distribution in sleep is almost identical between the data and independent neuron dictionaries, illustrated in Figure 5E, F for 5 ms bins. And while the data dictionaries tend to depart further from the null model, this excess is negligible, being on the order of 0.1% of the total departure from the null model (Fig. 5G).
A potential confound in searching for the effects of correlation here is that words coding for two or more active neurons are infrequent at small bin sizes, comprising less than 10% of words (Fig. 5H). As a consequence, any differences between the independent neuron and data dictionaries that depend on correlations between neurons in the data could be obscured. To check for this, we repeat the same analyses of the changes between sleep epochs for both the data and independent neuron dictionaries when they are restricted to include only coactivity words. As Figure 5I shows, this did not uncover any hidden contribution of correlation between neurons in the data; indeed, for coactivity words alone, the difference between the data and the independent model is ∼0. Thus, the changes in word probabilities between pretraining and post-training sleep can be almost entirely accounted for by independent changes to the excitability of individual neurons (Fig. 5A).
Learning-driven changes to the dictionary include rate covariations
Can independent changes to individual neuron excitability also account for the relative convergence of dictionaries in learning? Repeating the comparisons of training and sleep epoch activity using the independent neuron dictionaries, we observe the same learning-specific convergence of the training and post-training sleep dictionaries, illustrated in Figure 6A for 5 ms bins (compare Fig. 4D,E). Figure 6B shows that the difference in convergence score between the data and independent neuron dictionaries is close to 0 at most bin sizes. This suggests that the changes in population activity during the trials that are carried forward to the post-training sleep can also be accounted for by the changing excitability of individual neurons.
To check this conclusion, we again account for the relative infrequency of coactivity words at small bin sizes by recomputing the distances between sleep and training epochs using dictionaries of only coactivity words. Now we find that, unlike the changes between sleep epochs, the relative convergence between training and post-training sleep for the data dictionaries is greater than for the independent neuron dictionaries (Fig. 6C). We conclude that changes to the correlations between neurons during the trials of learning sessions are also detectably carried forward to post-training sleep.
These correlations could take the form of covariations in rate, or precise coincident spikes on millisecond time-scales. To test for precise co-spiking, we construct new null model dictionaries: we jitter the timing of each spike and then build dictionaries using 5 ms bins to capture spike alignment. If precise co-spiking is contributing to the correlations between neurons, then relative convergence should be smaller for these jittered dictionaries than the data dictionaries. As Figure 6D shows, this is not what we found: across a range of time-scales for jittering the spikes, the difference in relative convergence between the data and jittered dictionaries was ∼0. The changed correlations between neurons are then rate covariations, not precise co-spiking.
Figure 6E–H gives some intuition for these changes in rate covariation. We measure the coupling of each neuron's firing to the ongoing population activity (Fig. 6E) as an approximation of each neuron's rate covariation (because population coupling is computed at a single time-scale, it captures only part of the covariation structure represented by the dictionaries of words). The distribution of population coupling across the neurons varied between epochs (Fig. 6F,G), signaling changes to the covariations in rate between neurons. Consistent with changes to rate covariations, the distribution of coupling tended to be more similar between training and post-training sleep than between training and pretraining sleep (Fig. 6H).
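A simple stand-in for population coupling is the correlation between each neuron's binned spike count and the summed count of all other neurons. The sketch below assumes this definition; the published analyses may differ in detail.

```python
import numpy as np

def population_coupling(spike_counts):
    """Per-neuron correlation between its binned spike count and the summed
    count of all *other* neurons (a simple stand-in for population coupling).
    spike_counts: (n_bins, n_neurons)."""
    n_neurons = spike_counts.shape[1]
    pop_total = spike_counts.sum(axis=1)
    coupling = np.empty(n_neurons)
    for i in range(n_neurons):
        rest = pop_total - spike_counts[:, i]       # exclude neuron i itself
        coupling[i] = np.corrcoef(spike_counts[:, i], rest)[0, 1]
    return coupling
```

Comparing the distribution of these coupling values between epochs is then a direct test for changes in rate covariation at the chosen bin size.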
Locations of dictionary sampling during learning
The changes to population activity in training carried forward to post-training sleep may correspond to learning-specific elements of the task. We check for words linked to task elements by first plotting where each word in the training dictionary occurs on the maze during trials. Words cluster at three maze segments, as illustrated in Figure 7A for 3 ms bins: immediately before the choice area, at its center, and at the end of the chosen arm. This clustering is consistent across all bin sizes (Fig. 7B).
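The location analysis just described can be sketched as follows. The linearised-position convention (0 at maze start, 1 at arm end) and the segment count are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def occurrence_histogram(words, positions, n_segments=10):
    """Histogram over linearised maze position (0-1) of the bins in which
    a coactivity word (>= 2 active neurons) was emitted.
    words: (n_bins, n_neurons) binary array; positions: (n_bins,) in [0, 1]."""
    words = np.asarray(words)
    positions = np.asarray(positions)
    coactive = (words > 0).sum(axis=1) >= 2
    hist, _ = np.histogram(positions[coactive], bins=n_segments,
                           range=(0.0, 1.0))
    return hist
```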
Repeating this location analysis using the independent neuron dictionaries gives the same three clusters (Fig. 7A,B, gray lines). This suggests that the clustering of words at particular locations can be largely attributed to the amount of time the animals spent at those locations. The only departures are that the choice region is slightly underrepresented in the data dictionaries, and the arm-end slightly overrepresented. These departures are potentially interesting, as they correspond to key points in the task: the area of the maze at which the goal arm has to be chosen, and the arrival at the goal arm's reward port.
We thus check whether words in these three segments are more likely to have their probabilities in training carried forward to post-training sleep. Figure 7C shows that, when we plot the closeness of each word's probability in training and sleep, we obtain an approximately symmetrical distribution of locations for words closer to pretraining and post-training sleep. At the three maze segments, we indeed find that a word's probability in training is equally likely to be closer to pretraining sleep, equidistant from both sleep epochs, or closer to post-training sleep (Fig. 7D–F). We obtain the same results if we use just coactivity words, or if we divide the closeness distribution into pre/equidistant/post by percentiles rather than the fixed ranges we use in Figure 7D–F (data not shown). There is, then, no evidence in this analysis that words representing specific maze locations, and putatively key task elements, have their changes in training carried forward to post-training sleep. Rather, changes to the structure of population activity during learning are distributed over the entire maze.
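The per-word classification used here can be sketched as follows; the tolerance band defining "equidistant" is an illustrative choice (the paper tested both fixed ranges and percentiles), and the function name is hypothetical.

```python
def classify_words(p_pre, p_train, p_post, tol=0.1):
    """For each word, is its training probability closer to its pretraining-
    or post-training-sleep probability? Returns 'pre', 'equidistant', or
    'post' per word. tol: fractional band counted as equidistant."""
    labels = []
    for pre, train, post in zip(p_pre, p_train, p_post):
        d_pre, d_post = abs(train - pre), abs(train - post)
        span = max(d_pre, d_post, 1e-12)          # guard exact ties at zero
        if abs(d_pre - d_post) / span < tol:
            labels.append('equidistant')
        elif d_pre < d_post:
            labels.append('pre')
        else:
            labels.append('post')
    return labels
```

Tallying these labels within each maze segment then tests whether words at key task locations are preferentially carried forward to post-training sleep.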
Independent neurons capture the majority of structure in PFC population activity
The above analyses have shown that independently firing neurons capture much of the changes to and location dependence of population activity in mPFC. This implies that independent neurons can account for much of the population activity structure within each epoch. We take a closer look at this conclusion here.
A useful measure of the overall structure of the population spiking activity is the proportion of 1's that encode two or more spikes. The occurrence rates of these “binary errors” across different bin sizes tell us about the burst structure of the neural activity. Figure 8A shows that increasing the bin size applied to the data interpolates between words of single spikes and words of spike bursts in both training and sleep epochs. At bin sizes <10 ms, almost all 1's in each word are single spikes; at bin sizes >50 ms, the majority of 1's in each word are two or more spikes and so encode a burst of spikes from a neuron.
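The "binary error" measure reduces to a short computation on binned spike counts; a minimal sketch, with the function name assumed:

```python
import numpy as np

def binary_error_rate(spike_counts):
    """Fraction of active entries (the 1's in the binary words) that stand
    for two or more spikes in the underlying bin.
    spike_counts: (n_bins, n_neurons) array of per-bin spike counts."""
    active = spike_counts > 0
    if active.sum() == 0:
        return 0.0
    return (spike_counts > 1).sum() / active.sum()
```

Sweeping the bin size used to build `spike_counts` reproduces the interpolation described above: near zero at small bins, rising as bins grow long enough to contain bursts.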
Dictionaries of independent neurons largely recapitulate these bin size dependencies for all epochs (Fig. 8B–D). Their only departure is ∼5% more binary errors than in the data at bin sizes >20 ms (Fig. 8D). Because, by construction, each neuron has the same number of spikes in the data and independent neuron dictionaries, this implies that the data contain more spikes per burst on 50–100 ms time-scales (and hence fewer bins containing bursts in total).
A useful summary of the joint structure of population activity is the fraction of emitted words that code for two or more active neurons. For the data, increasing the bin size increases the fraction of emitted words that contain more than one active neuron (Fig. 8E), from ∼1% of words at 2 ms bins to all words at ≥50 ms bins. There are consistently more of these coactivity words in training epochs than sleep epochs for the same bin size, pointing to more short time-scale synchronous activity during movement along the maze than in sleep.
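The coactivity-word fraction is similarly direct to compute. In this sketch, "emitted words" is assumed to mean bins with at least one active neuron (all-silent bins are ignored); that reading is an assumption.

```python
import numpy as np

def coactivity_fraction(spike_counts):
    """Fraction of emitted (non-silent) words with two or more active
    neurons. spike_counts: (n_bins, n_neurons)."""
    active_per_bin = (spike_counts > 0).sum(axis=1)   # active neurons per bin
    emitted = active_per_bin > 0                      # ignore all-silent bins
    if emitted.sum() == 0:
        return 0.0
    return (active_per_bin >= 2).sum() / emitted.sum()
```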
Dictionaries of independent neurons also recapitulate these bin size and epoch dependencies of neural coactivity (Fig. 8F–H). Figure 8H shows that the independent neuron dictionaries have more coactivity words at small bin sizes. It might be tempting here to conclude that the data dictionaries are constrained to fewer coactivity words than predicted by independent neurons, but these differences are equally consistent with a shadowing effect from spike sorting, where one or more near-simultaneous spikes from neurons on the same electrode are missed (Harris et al., 2000; Bar-Gad et al., 2001): when the data are shuffled, more near-simultaneous spikes between neurons are possible. Nonetheless, above bins of 5 ms, the disagreement between the data and independent neuron dictionaries is proportionally negligible (Fig. 8H). Consequently, much of the population activity in mPFC is well captured by an independent-neuron model, perhaps pointing to a high-dimensional basis for neural coding.
Discussion
We studied here how the structure of population activity in mPFC changes during rule-learning. We found that the structure of instantaneous population activity in sleep always changes after training, regardless of any change in overt behavior during training. This plasticity of population activity could be entirely accounted for by independent changes to the excitability of individual neurons. Unique to learning is that changes to the structure of instantaneous population activity during training are carried forward into the following bouts of sleep. Population plasticity during learning includes both changes to individual neuron excitability and to covariations of firing rates between neurons. These results suggest two forms of population plasticity in mPFC: one a constant form unrelated to learning and the other correlated with the successful learning of action–outcome associations.
To isolate learning and nonlearning changes, we found useful the “strong inference” approach of designing analyses to decide between simultaneous hypotheses for the same data. We identified separable sessions of learning and stable behavior to contrast the hypothesis that population structure would only change during overt learning against the hypothesis that population structure is always changing regardless of behavior. Similarly, we contrasted three hypotheses for what drove those changes in population structure: changes to excitability of independent neurons, changes in brief covariations of rates, and changes in precise co-spiking.
A dictionary of cortical activity states
Characterizing the joint activity of cortical neurons is a step toward understanding coding and computation in the cortex (deCharms and Zador, 2000; Wohrer et al., 2013; Yuste, 2015). One clue is that the joint activity of a cortical population seems constrained to visit only a subset of all the possible states it could reach (Tsodyks et al., 1999; Luczak et al., 2009; Sadtler et al., 2014; Jazayeri and Afraz, 2017), in part determined by the connections into and within the network of cortical neurons (Fernández Galán, 2008; Marre et al., 2009; Ringach, 2009; Buesing et al., 2011; Habenschuss et al., 2013; Kappel et al., 2015). This view predicts that changing the network connections through learning would change the set of activity states (Battaglia et al., 2005).
We see hints of this prediction in our data. We found changes to the probability of words in training that are detectable in post-training sleep, consistent with the idea that reinforcement-related plasticity of the cortical network has persistently changed the constrained set of activity states. But changing the network's connections should change not just the set of activity states, but also their sequences or clustering in time (Tkacik et al., 2014; Ganmor et al., 2015). This suggests that further insights into population plasticity with these data could be found by characterizing the preservation of word sequences or clusters in time between training and sleep epochs, and comparing those to suitable alternative hypotheses for temporal structure.
Excitability drives constant population plasticity
A change in the statistics of a population's neural activity is not in itself evidence of learning (Okun et al., 2012). Indeed, we saw here a constant shifting in statistical structure between sleep epochs, regardless of whether the rats showed any evidence of learning in the interim training epoch. As these shifts between sleep epochs could be seen at all the time-scales of words we examined, and were recapitulated by dictionaries of independent neurons, they are most consistent with a model of independent changes to the excitability of individual neurons.
Excitability changes could arise from the spontaneous remodeling of synaptic connections onto a neuron, whether from remodeling of dendritic spines (Fu et al., 2012; Hayashi-Takagi et al., 2015), or changes of receptor and protein expression within a synapse (Wolff et al., 1995; Ziv and Brenner, 2018). Alternatively, these changes could arise from long-lasting effects on neuron excitability of neuromodulators accumulated in mPFC during training (Seamans and Yang, 2004; Tierney et al., 2008; Dembrow et al., 2010; Benchenane et al., 2011). A more detailed picture of this constant population plasticity will emerge from stable long-term population recordings at millisecond resolution (Jun et al., 2017) of the same PFC neurons throughout rule-learning.
Learning correlates with directional population plasticity
Unique to learning a new rule in the Y-maze was that changes to word probability in training were carried forward to post-training sleep. As this persistence of word probability occurred most clearly for short time-scale words (≤20 ms), and was partly driven by changes in rate covariations, it is most consistent with a model of synaptic changes to the PFC driven by reinforcement. A possible mechanism here is that reinforcement-elicited bursts of dopamine permitted changes of synaptic weights into and between neurons whose coactivity preceded reward (Izhikevich, 2007; Benchenane et al., 2011). Such changes in synaptic weights would also alter the excitability of the neuron itself, accounting for the changes between pretraining and post-training sleep epochs in learning sessions.
A particularly intriguing question is how the constant and learning-specific plasticity of population activity are related. Again, stable long-term recordings of spiking activity in the same population of neurons across learning would allow us to test whether neurons undergoing constant changes in excitability are also those recruited during learning (Lee et al., 2012; Hayashi-Takagi et al., 2015). Another question is how the carrying forward of training changes of population activity into sleep depends on an animal's rate of learning. In each learning session here, the identified learning trial was before the halfway mark, meaning that the majority of words contributing to the training dictionary came from trials after the rule was acquired. It is an open question as to whether the same relationship would be seen in sessions of late learning, or in tasks with continual improvement in performance rather than the step changes seen here.
Replay and dictionary sampling
The increased similarity of word probability in training and post-training sleep suggests an alternative interpretation of “replay” phenomena in PFC (Euston et al., 2007; Peyrache et al., 2009). Replay of neural activity during waking in a subsequent episode of sleep has been inferred by searching for matches of patterns of awake activity in sleep activity, albeit at much coarser time-scales than used here. The better match of waking activity with subsequent sleep than preceding sleep is taken as evidence that replay is encoding recent experience, perhaps to enable memory consolidation. However, our observation that the probabilities of words in stable sessions' trials are not systematically closer to those in post-training sleep (Fig. 4) is incompatible with the simple replay of experience-related activity in sleep. Rather, our results suggest that learning correlates with persistent changes to the cortical network, such that words have more similar probabilities of appearing in training and post-training sleep than in training and pretraining sleep. In this view, replay is a signature of activity states that appeared in training being resampled in post-training sleep (Battaglia et al., 2005).
Population coding of statistical models
What constraints do these changes to mPFC population activity place on theories for acquiring and representing statistical models of actions and their outcomes? One view is that the joint activity of the population during the trials represents something like the joint probability P(a, o | state) of action a and outcome o given the current state of the world (Alexander and Brown, 2011); or, perhaps more generally, a model for the transitions in the world caused by actions, P(state(t + 1) | a, state(t)). Such models could support the proposed roles of mPFC in guiding action selection (by querying the outcomes predicted by the model), or monitoring behavior (by detecting unexpected deviations from the model). The changes in the structure of population activity during learning are consistent with updating such models based on reinforcement.
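As a toy illustration of the kind of statistical model described here (entirely hypothetical, and not a claim about how mPFC implements it): a running frequency estimate of P(a, o | state), updated trial by trial from experienced action–outcome pairs.

```python
from collections import defaultdict

class ActionOutcomeModel:
    """Toy running estimate of P(action, outcome | state), updated from
    experience. State, action, and outcome labels are illustrative."""
    def __init__(self):
        # counts[state][(action, outcome)] -> number of observations
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, action, outcome):
        """Record one experienced trial."""
        self.counts[state][(action, outcome)] += 1

    def prob(self, state, action, outcome):
        """Empirical estimate of P(action, outcome | state)."""
        total = sum(self.counts[state].values())
        if total == 0:
            return 0.0
        return self.counts[state][(action, outcome)] / total
```

Reinforcement-driven learning then corresponds to this conditional distribution sharpening around the rewarded action, which is the sense in which the population changes during learning could reflect model updating.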
Our results show these dictionary changes are carried forward to the spontaneous activity of sleep, suggesting that the encoded statistical model is present there, too. One explanation for this stems from the sampling hypothesis for probability encoding. In this hypothesis, a population encodes a statistical model in the joint firing rates of its neurons, so that the pattern of activity across the population at each moment in time is a sample from the encoded distribution (Fiser et al., 2010; Berkes et al., 2011). This hypothesis predicts that spontaneous activity of the same neurons must still represent samples from the statistical model: but in the absence of external input, these are then samples from the “prior” probability distribution over the expected properties of the world.
According to this hypothesis, our finding that learning-driven changes to population structure are conserved in post-training sleep is consistent with the statistical model now reflecting well-learned expected properties of the world: namely, that a particular set of actions on the maze reliably leads to reward. In other words, the prior distribution for the expected properties of the world has been updated. Further, the sampling hypothesis also proposes a role for the constant changes of excitability without obvious direction: that such spontaneous plasticity explores possible configurations of the network and so acts as a search algorithm to optimize the encoded statistical model (Kappel et al., 2015; Maass, 2016). These links, although tentative, suggest the utility of exploring models for probabilistic codes outside of early sensory systems (Fiser et al., 2010; Pouget et al., 2013).
Footnotes
M.D.H. was supported by Medical Research Council Senior Non-Clinical Fellowship Award MR/J008648/1. A.S. and M.D.H. were supported by Medical Research Council Grant MR/P005659/1. A.P. was supported by Canada Research Chair Tier 2 (154808). The original data were obtained through funding from the EU Framework 6 ICEA project. We thank Silvia Maggi and Rasmus Petersen for comments on early drafts of this manuscript; and the M.D.H. laboratory (Javier Caballero, Mat Evans) for discussions.
The authors declare no competing financial interests.
Correspondence should be addressed to Mark D. Humphries at mark.humphries{at}nottingham.ac.uk