Sleep Spindle Activity is Associated with the Integration of New Memories and Existing Knowledge

Sleep spindle activity has been associated with improvements in procedural and declarative memory. Here, for the first time, we looked at the role of spindles in the integration of newly learned information with existing knowledge, contrasting this with explicit recall of the new information. Two groups of participants learned novel spoken words (e.g., cathedruke) that overlapped phonologically with familiar words (e.g., cathedral). The sleep group was exposed to the novel words in the evening, followed by an initial test, a polysomnographically monitored night of sleep, and a second test in the morning. The wake group was exposed and initially tested in the morning and spent a retention interval of similar duration awake. Finally, both groups were tested a week later at the same circadian time to control for possible circadian effects. In the sleep group, participants recalled more words and recognized them faster after sleep, whereas in the wake group such changes were not observed until the final test 1 week later. Following acquisition of the novel words, recognition of the familiar words was slowed in both groups, but only after the retention interval, indicating that the novel words had been integrated into the mental lexicon following consolidation. Importantly, spindle activity was associated with overnight lexical integration in the sleep group, but not with gains in recall rate or recognition speed of the novel words themselves. Spindle activity appears to be particularly important for overnight integration of new memories with existing neocortical knowledge.


Introduction
Sleep appears to consolidate new memories by strengthening and integrating them with existing memories (Payne et al., 2008). It is unclear which aspects of the neural architecture of sleep drive these effects. Although earlier research examined the role of sleep stages, interest in sleep spindles (11-15 Hz oscillations lasting up to 3 s) is comparatively recent. Two-step theories of memory consolidation suggest that storage is initially hippocampally mediated, but gradually gains neocortical representation through dialogue between the two structures (McClelland et al., 1995; Buzsáki, 1998). Slow oscillations (<1 Hz) allow synchronization between neocortical activity and hippocampal ripples, which are crucial to memory consolidation (Diekelmann and Born, 2010). Spindles increase during the up-state of slow oscillations and are temporally aligned with hippocampal ripples (Siapas and Wilson, 1998; Sirota et al., 2003), implicating them in the plasticity of the hippocampal-neocortical consolidation process.
Spindle activity is associated with memory performance, including consolidation of declarative memories (Schabus et al., 2004; Clemens et al., 2005; Schmidt et al., 2006). For example, word-pair learning before sleep induced higher sleep spindle activity than a nonlearning task, and spindle activity correlated positively with recall after sleep.
These studies suggest that spindles are implicated in consolidation of memory for studied items, but little is known about whether spindles are involved in integrating new memories with existing knowledge. In this study, we used a word learning paradigm that assessed both recall of novel words and their integration into the existing body of lexical knowledge (Gaskell and Dumay, 2003). When a newly learned spoken word (e.g., cathedruke) is added to the mental lexicon, it should compete during recognition with similar-sounding familiar words (e.g., cathedral), exhibiting an inhibitory effect on the response time (RT) to the familiar word. This impact on recognition of familiar words is dissociable from measures of memory for the novel word itself: a delaying effect of the novel competitor on the existing word can only occur if the novel word has been integrated into the recognition system for existing words.
The emergence of such lexical competition coincides with a shift from hippocampal to neocortical processing. Davis et al. (2009) examined brain activation to familiar words, newly learned, and unknown novel words using fMRI. Half the newly learned words were learned on the scanning day, and half 1 d before. A strong hippocampal response was seen in response to unknown items encountered for the first time in the scanner compared with newly learned words that had been introduced that day. The strength of this blood oxygenation level-dependent response predicted subsequent memory, suggesting hippocampal involvement in initial encoding of novel words. At the neocortical level, unknown novel words showed an elevated response relative to familiar words. For the newly learned novel words, only the items learned the previous day showed behavioral lexical competition effects, and only these stimuli had the neocortical activation profile of existing words, demonstrating that novel words show a word-like neocortical response only after some consolidation. Another study found sleep to facilitate the emergence of lexical competition and novel word recall (Dumay and Gaskell, 2007). Here we examine the sleep-related electrophysiological basis of lexical integration and novel word recall. We focus on spindle activity and attempt to identify the type of learning it most benefits.

Materials and Methods
Participants and design. Of 60 native English-speaking participants, 30 were randomly allocated to the sleep group (7 males; mean age, 20.3 years), and 30 to the wake group (10 males; mean age, 20.5 years). All were required to abstain from alcohol and drugs for 24 h before training, to avoid caffeine during the day of training and between session 1 (S1) and session 2 (S2), and to maintain a regular sleep schedule for 3 d before the experiment. Exclusion criteria included medication affecting sleep and a history of sleep or mental disorders.
Wake group participants performed S1 in the morning, with the test phase beginning between 9:15 and 10:00 A.M. (Fig. 1). They spent the day normally (refraining from alcohol, caffeine, and sleep) and returned between 6:00 and 8:00 P.M. to carry out S2 (average time between S1 and S2 test, 9.5 h). They returned a week later (mean, 6.5 d; range, 3-7 d) in the evening for session 3 (S3), identical to S2. Sleep group participants arrived in the evening, were wired for polysomnographic (PSG) recording, and completed S1 with the test phase beginning between 10:00 and 10:30 P.M. They then prepared for bed, slept in the laboratory (mean, 8.0 h; range, 6.0-8.7 h), and were awakened at 7:45 A.M. After breakfast, S2 began between 8:10 and 8:40 A.M. (average time between S1 and S2 test, 10.0 h). They returned a week later (mean, 6.9 d; range, 3-11 d) in the morning for S3.
Materials. Critical stimuli were 60 triplets (Tamminen and Gaskell, 2008) consisting of a familiar base word (e.g., cathedral), a fictitious novel word derived from the base word (e.g., cathedruke), and a similar-sounding non-word foil to be used in old-new categorization (e.g., cathedruce). Base words were bisyllabic or trisyllabic, with an average length of 8.0 phonemes. Mean frequency was 4.5 occurrences per million. The triplets were divided into two matched lists of 30; each participant was trained on novel words from one list, and the base words of the other list acted as controls. The magnitude of lexical competition was calculated by subtracting RTs to control base words for which no new competitor was learned from RTs to test base words for which a competitor was learned. Assignment of lists to conditions was counterbalanced.
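The competition measure described above can be illustrated with a minimal sketch. It assumes the subtraction is applied to per-participant mean RTs in milliseconds; the function name and the RT values are hypothetical and are not taken from the study data.

```python
from statistics import mean


def competition_effect(test_rts_ms, control_rts_ms):
    """Mean RT to test base words minus mean RT to control base words (ms).

    Positive values indicate slower responses to words that gained a novel
    competitor (inhibition); negative values indicate facilitation.
    """
    return mean(test_rts_ms) - mean(control_rts_ms)


# Hypothetical RTs (ms) for one participant in one session.
test_rts = [912.0, 948.0, 930.0]
control_rts = [900.0, 915.0, 906.0]
effect = competition_effect(test_rts, control_rts)
```

Under this sign convention, the facilitation reported immediately after training would appear as a negative value and the later competition effect as a positive one.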
Procedure. Participants were exposed to each novel word 30 times in random order, and asked to detect visually presented target phonemes. This phoneme monitoring task ensures reliable learning (Gaskell and Dumay, 2003). Four blocks of a verbal repetition task were included to allow practice in pronouncing the novel words. Participants were informed that memory for the stimuli would be tested.
The test phase included four subtests, adapted from previous studies (Gaskell and Dumay, 2003; Dumay and Gaskell, 2007). In the auditory lexical competition test, participants made speeded word/non-word decisions (lexical decision) to 30 test base words, 30 control base words, 60 filler real words, and 120 filler non-words (pronounceable but meaningless words), in random order. The equal numbers of words and non-words prevented guessing strategies. RTs were measured from word onset, with a response deadline of 2.5 s from word offset.
In free recall, participants had 3 min to recall as many novel words as possible. In cued recall, the first two or three phonemes of a novel word were presented, and 10 s was given to recall and produce the complete novel word. In old-new categorization, the newly learned novel words intermixed with foils were presented. Half the novel words preceded and half followed their corresponding foil, with at least four intervening stimuli. Participants decided whether a stimulus was familiar or not, with RTs measured from word onset with 3 s response deadline. Only half of the newly learned novel words were included in this task in S1, to determine whether performance in the other test tasks in S2 and S3 was affected by this extra exposure in S1. No such effect was found. Test tasks were presented in fixed order to ensure that the task involving presentation of the novel words (old-new categorization) was always last. Stimuli were delivered through headphones and manual responses were made by pressing response keys on a laptop keyboard.
EEG recording and analysis. A Grass-Telefactor Technologies system recorded EEG (200 Hz sampling rate). Four scalp electrodes were positioned using the international 10-20 system (F3, F4, C3, C4) with contralateral mastoid references. Two electro-oculographic channels monitored eye movements, and one chin electromyographic channel monitored muscle tone. Epochs (30 s) of sleep data were categorized into stages (Rechtschaffen and Kales, 1968). Spindle analysis involved artifact-rejected non-rapid eye movement (NREM) sleep [stage 2 and slow-wave sleep (SWS)]. Raw EEG data were bandpass filtered (11-15 Hz) using a linear finite impulse response filter. Automated detection (Ferrarelli et al., 2007) derived the number of discrete spindle events: for each channel, amplitude fluctuations in the filtered time series exceeding a predetermined threshold counted as spindles. Thresholds were calculated relative to the mean channel amplitude (eight times average amplitude). This algorithm is similar to others and was also used by Nishida and Walker (2007).
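The thresholding step can be sketched as follows. This is a simplified illustration, not the published detection algorithm: it assumes that an amplitude envelope of the 11-15 Hz filtered signal has already been computed for one channel, it applies a single threshold of eight times the mean amplitude, and it counts each contiguous supra-threshold run as one event. The function name and the toy envelope are hypothetical.

```python
from statistics import mean


def count_spindles(envelope, threshold_factor=8.0):
    """Count contiguous runs where the envelope exceeds factor * mean amplitude.

    `envelope` is assumed to be a non-negative amplitude envelope of the
    band-passed EEG for a single channel (an assumption of this sketch).
    """
    threshold = threshold_factor * mean(envelope)
    count = 0
    above = False
    for amplitude in envelope:
        is_above = amplitude > threshold
        if is_above and not above:
            count += 1  # rising edge marks the start of a new event
        above = is_above
    return count


# Toy envelope: flat baseline with two brief high-amplitude bursts.
toy_envelope = [1.0] * 200
toy_envelope[40:45] = [50.0] * 5
toy_envelope[120:125] = [50.0] * 5
n_events = count_spindles(toy_envelope)
```

A fuller implementation would also need the band-pass filter, artifact rejection, and any duration criteria, none of which are shown here.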

Behavioral measures
Free recall
Due to equipment failure, one participant's data were lost in free recall and four in cued recall. Percentages of words recalled were arcsine-transformed for analyses to better meet the assumption of normality; Figure 2 shows untransformed data. A mixed ANOVA (Greenhouse-Geisser corrected where necessary) with participant group (sleep vs wake) and test session (S1, S2, S3) as variables showed a main effect of session (F(2,110) = 3.31, p = 0.04) and an interaction between group and session (F(2,114) = 8.41, p < 0.001). Planned comparisons showed that recall rates in the sleep group improved significantly overnight (F(1,28) = 16.17, p < 0.001), whereas there was no such improvement in the wake group during this interval. In contrast, recall rates improved between the two final sessions in the wake group (F(1,27) = 12.32, p = 0.002), but not in the sleep group. The S1-S3 contrast failed to reach significance in both groups. Equivalent analyses for all tasks using difference scores are presented in the supplemental material, available at www.jneurosci.org.
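For readers unfamiliar with the arcsine transformation, a minimal sketch follows. It assumes the common arcsine-square-root variant applied to proportions; the text does not state which variant was used, and the function name and values are hypothetical.

```python
import math


def arcsine_sqrt(percent):
    """Arcsine-square-root transform of a percentage (0-100), in radians."""
    proportion = percent / 100.0
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("percent must lie between 0 and 100")
    return math.asin(math.sqrt(proportion))


# Hypothetical recall percentages.
recall_percent = [0.0, 25.0, 100.0]
transformed = [arcsine_sqrt(value) for value in recall_percent]
```

The transform stretches values near 0% and 100%, which is why it is often applied when recall rates cluster near the floor or ceiling.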

Old-new categorization
RTs to novel words were analyzed with extremes (outside 500-3000 ms; 0.7% of the data) and errors (25% of the data) removed. Each remaining data point was log transformed (arcsine-transformation is appropriate for percentages/proportions only) to better meet the assumption of normality and reduce effects of remaining outliers. Data in Figure 2 have been back-transformed to raw units. The ANOVA showed a significant main effect of session (F(2,112) = 33.01, p < 0.001) and a marginally significant interaction (F(2,112) = 3.02, p = 0.053). In the sleep group, RTs became significantly faster overnight (F(1,28) = 21.98, p < 0.001); no significant S1-S2 change was seen in the wake group. No S2-S3 change was seen in the sleep group, whereas the wake group showed significantly faster RTs between these sessions (F(1,28) = 21.18, p < 0.001). The S1-S3 contrast was significant in both groups (sleep group, F(1,28) = 31.77, p < 0.001; wake group, F(1,28) = 25.99, p < 0.001).
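The RT preprocessing described above (error removal, trimming, log transformation, and back-transformation for plotting) can be sketched as below. The natural logarithm is an assumption, since the base of the log is not stated, and the trial data and function names are hypothetical.

```python
import math
from statistics import mean


def log_rts(trials, low_ms=500.0, high_ms=3000.0):
    """Return log RTs for correct trials within [low_ms, high_ms].

    `trials` is a list of (rt_ms, is_correct) pairs.
    """
    return [
        math.log(rt)
        for rt, is_correct in trials
        if is_correct and low_ms <= rt <= high_ms
    ]


def back_transformed_mean(log_values):
    """Mean of the log RTs, converted back to milliseconds."""
    return math.exp(mean(log_values))


# Hypothetical trials: one too fast, one error, one too slow, two retained.
trials = [(450.0, True), (800.0, True), (1250.0, False),
          (3200.0, True), (1800.0, True)]
kept = log_rts(trials)
mean_rt_ms = back_transformed_mean(kept)
```

Note that a back-transformed mean of log RTs is a geometric mean, so it will generally be somewhat lower than the arithmetic mean of the same raw RTs.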

Lexical competition
Log-transformed RTs to base words in the test (familiar words with a novel competitor) and control (familiar words with no new competitors) conditions were analyzed following removal of errors (11.9%) and outliers (outside 300-2500 ms; 0.15% of the data). A mixed ANOVA with session, base word condition (test or control), and participant group was calculated, with a dummy variable for counterbalancing list. We found main effects of session (F(2,112) = 15.85, p < 0.001) with mean RTs falling with each session, and base word condition (F(1,56) = 9.39, p = 0.003) with slower overall RTs to test base words. There was also an interaction between session and base word condition (F(2,112) = 26.46, p < 0.001). Immediately after training, both groups showed a significant facilitatory effect for test base words (sleep group, F(1,28) = 5.49, p = 0.026; wake group, F(1,28) = 5.43, p = 0.027) (Fig. 2). In S2, the effect changed into an inhibitory lexical competition effect, with slower responses to test than control base words, both in the sleep (F(1,28) = 10.13, p = 0.004) and wake (F(1,28) = 10.41, p = 0.003) groups. A significant lexical competition effect was also observed in S3 in both groups (sleep, F(1,28) = 12.36, p = 0.002; wake, F(1,28) = 21.23, p < 0.001). An interaction was found between session and group (F(2,112) = 4.55, p = 0.017).
In the wake group, overall RTs became significantly faster between S1 and S2 (F(1,28) = 24.62, p < 0.001), whereas no such change was seen in the sleep group.

Subjective measures of sleepiness
The Stanford Sleepiness Scale was administered before test sessions. An ANOVA with session and group showed a main effect of session (F(2,116) = 72.83, p < 0.001), but not of group. Participants felt more tired in S1 than in S2 (t(59) = 8.69, p < 0.001) or S3 (t(59) = 11.00, p < 0.001). This likely reflects fatigue from the exposure phase in S1. As both groups showed the same pattern, sleepiness cannot explain differences between the groups over the retention intervals.

Sleep stage analysis
We calculated correlations between time spent in stage 2 sleep, SWS, rapid eye movement sleep, and overnight improvement in the recall/recognition tasks, as well as overnight change in the lexical competition effect. We also evaluated correlations between sleep stages and performance in S1, S2, and S3. For each task, p values were corrected for multiple comparisons (Bonferroni). SWS duration correlated with overnight improvement in old-new categorization speed, with more SWS associated with greater decreases in RT (r = 0.52, p = 0.04; uncorrected p = 0.003). No other tasks revealed significant correlations (supplemental Table S1, available at www.jneurosci.org as supplemental material).
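As a reminder of how a Bonferroni correction operates, the sketch below multiplies each uncorrected p value by the number of comparisons in its family and caps the result at 1. The size of each family of comparisons is not stated here, so the three p values and the function name are purely hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a family of p values (each multiplied by the family size)."""
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]


# Hypothetical family of three uncorrected p values.
raw_p = [0.003, 0.2, 0.6]
corrected_p = bonferroni(raw_p)
```

Reporting both the corrected and the uncorrected value, as is done in the text, lets readers judge the effect under either criterion.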

Sleep spindle analysis
Noisy channels were removed before analysis. Average spindle counts were calculated for each participant over the remaining channels. Correlation coefficients were calculated for the number of NREM sleep spindles and overnight change in performance (see supplemental materials for spindle density measures, available at www.jneurosci.org) and level of performance in each session. No significant correlations were found in recall or recognition (supplemental Table S2, available at www.jneurosci.org as supplemental material). Importantly, overnight change in the magnitude of the lexical competition was highly correlated with spindle count (r = 0.59, p = 0.004; uncorrected p = 0.001), with larger increases in competition associated with greater spindle activity (Fig. 3). A correlation was also found between the magnitude of the competition effect in S1 and spindle counts (r = -0.50, p = 0.02; uncorrected p = 0.005), with larger S1 facilitation effects associated with higher spindle counts during the following night. Since participants who showed more facilitation in S1 tended to undergo a larger increase in the magnitude of the competition effect overnight (r = -0.76, p < 0.001), we wanted to establish whether spindle activity was associated with one of these effects when the other was controlled for and to partial out NREM sleep duration. A partial correlation between spindle count and competition change overnight while holding the S1 facilitation effect and NREM sleep duration constant showed a significant correlation (r = 0.38, p = 0.047, uncorrected). A partial correlation looking at the relationship between spindle count and S1 facilitation while holding overnight change and NREM sleep duration constant showed no correlation (p = 0.52).
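The logic of the partial correlations can be sketched with the standard recursive formula, which removes one control variable at a time. This is an illustrative implementation only, not the analysis software used in the study; all variable names and data values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation of x and y with every variable in `controls` held constant."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values for six participants.
spindle_count = [310, 420, 380, 500, 450, 360]
competition_change = [5, 30, 12, 48, 35, 20]
s1_effect = [-10, -35, -15, -50, -30, -5]
nrem_minutes = [400, 380, 420, 410, 395, 405]
r_partial = partial_corr(spindle_count, competition_change,
                         [s1_effect, nrem_minutes])
```

With two control variables, as in the analysis above, the recursion first removes one covariate from every pairwise correlation and then applies the same formula to remove the other.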
To confirm that spindle activity was associated with overnight change only in lexical competition when taking the other tests into account, we ran a multiple regression on the spindle counts using overnight change in all tests as regressors. Only lexical competition was associated with spindle activity (β = 0.62, p < 0.001). Tolerance statistics indicated no problems regarding multicollinearity. Independent analyses of stage 2 and SWS spindles followed the same pattern (supplemental materials, available at www.jneurosci.org). Correlations between spindle activity and change in the lexical competition effect overnight were calculated separately for each scalp electrode to see if the effect was carried by specific sites (e.g., left hemisphere sites may be crucial in language learning tasks). All electrodes showed a significant correlation (corrected p values < 0.05) apart from F4, which did not survive Bonferroni correction. However, there may have been too few sites to uncover a topographical effect.
Spectral analysis revealed no significant correlations with any word learning measures in any of the frequency bands.

Discussion
We looked simultaneously at consolidation of item memory for newly learned words and their integration in the mental lexicon. Recall of the novel words improved overnight but not during the day. This benefit was probably dependent on a combination of synaptic and systems-level consolidation. More distinctively, novel items also showed an inhibitory influence on recognition of previously known (neocortically represented) words. Thus, the incorporation of the new material altered the way existing neocortical recognition systems operated. This integrative effect is likely to be a pure measure of systems-level consolidation, as revealed by a shift from hippocampal to neocortical representation (Takashima et al., 2006; Gais et al., 2007; Davis et al., 2009). Higher spindle counts predicted a larger increase overnight in the magnitude of the lexical competition effect, suggesting that sleep spindles are central to this sleep-mediated, systems-level consolidation.
The close temporal correlation between hippocampal ripples and neocortical spindles suggests that these events link neocortical and hippocampal cell assemblies, a key feature of two-step models of memory consolidation (Siapas and Wilson, 1998; Sirota et al., 2003). Spindles are also involved in triggering neural plasticity. Rosanova and Ulrich (2005) showed that a natural neural firing pattern recorded in vivo during spindles induced short-term potentiation and long-term potentiation (LTP) in rat pyramidal cells in vitro. Similarly, inducing LTP increases the reliability with which spindles can be electrically evoked (Werk et al., 2005). As spindles are temporally grouped by slow oscillations, spindle-related correlations may also reflect these oscillatory events, although the absent correlation between spectral power in the slow oscillation band and lexical integration suggests this was not the case here. We found no correlations between spindle activity and overnight novel word recall/recognition measures, suggesting that spindles are less important in strengthening explicit recall, which benefitted more from SWS.
High baseline spindle activity has been linked with higher IQ, with the exception of verbal IQ (Fogel et al., 2007). Although we did not collect IQ measures, we evaluated the relationship between word learning ability and spindles by considering correlations between spindle count and our explicit tasks measuring word learning success. No significant correlations emerged, suggesting that spindle activity here did not merely identify intelligent participants but was associated with lexical integration specifically.
We found no evidence of the novel words engaging in lexical competition immediately after exposure. As reported in some previous studies (Gaskell and Dumay, 2003), a facilitatory pattern was seen in S1, probably due to the test base words having been primed by exposure to the similar-sounding novel words during training. We found a competition effect in S2 in both the sleep (27 ms) and wake (19 ms) groups; however, Dumay and Gaskell (2007) found it only in their sleep group, a discrepancy that may be due to differences in materials, tasks, or training schedule. Although sleep may be the optimal brain state for consolidation, some degree of lexical integration can also occur during wakefulness. This has been observed in particular when training is distributed over the course of a day rather than delivered at once (Lindsay and Gaskell, 2009). Such effects might also arise spontaneously. For example, sufficient numbers of our wake participants may have rehearsed the novel words over the course of the day, leading to some consolidation. Such wake-related consolidation probably has a different neural basis from the sleep-related consolidation that is the focus of our PSG data, and remains an avenue for future research.
As the purpose of language is to convey meaning, it is striking that the lexical system integrates novel words that have no given meaning. Meaning may be inconsequential for our measure of lexical integration, but participants nonetheless tend to infer meaning from the overlap between the novel word and familiar words; for example, cathedruke is often considered to be associated with cathedrals (Dumay et al., 2004). Research is underway looking at whether knowledge of novel word meanings benefits from consolidation.
Sleep seems to play an active role in integrating new words in the lexicon through spindle activity, and it also enhanced explicit recall and recognition. In the sleep group, improvements were found after a night of sleep. In the wake group, no improvements occurred after a day of wake, but improvements comparable to those in the sleep group emerged in S3, presumably due to sleep occurring after S2. Sleep seems to provide the optimal neural environment for consolidation of explicit knowledge of new vocabulary (Gais et al., 2006). A potential criticism of the sleep-specific nature of these improvements might involve circadian influences, as performance may simply be better in the morning. However, S2 and S3 took place at the same circadian time; performance in these sessions should therefore not differ if circadian effects alone were operating. Yet the wake group showed an S2-S3 change, so performance gains were determined by timing of sleep rather than mere circadian influences. Measures of subjective sleepiness also revealed no difference between the groups.
Converging evidence for an active role of sleep in explicit recall/recognition improvement comes from the PSG data. Gains in recognition speed overnight were predicted by SWS duration: more SWS was associated with greater overnight decreases in RT. These data join other findings suggesting that SWS is crucial in consolidating declarative memories (Diekelmann and Born, 2010). SWS did not correlate with the recall measures, although this is likely due to low levels of recall that may not provide enough variability to detect an association.
To allow an organism to make use of newly acquired information effectively, new memories must be integrated in networks of knowledge already in place. This integration process has so far been assessed in terms of the extent to which separate new pieces of information are combined into a more structured whole (Ellenbogen et al., 2007). The integration we have observed here goes further in that it requires newly learned material to be related to material that is fully consolidated in neocortical lexical memory. As such, we suggest that it is the clearest evidence to date of a function for sleep in memory integration. Furthermore, we identify NREM spindles as central to this crucial component of memory consolidation.