Testing is much more than a measure of memory; taking a test, or retrieval practice, modifies memory (for a recent review, see Roediger & Butler, 2011). A well-studied demonstration of this phenomenon is seen in the testing effect, the finding that retrieval enhances subsequent retention (e.g., Roediger & Karpicke, 2006). However, retrieval may have another enhancing effect that has been largely overlooked: It may enhance subsequent encoding, an effect known as test-potentiated learning (Arnold & McDermott, 2012; Izawa, 1971).

The dearth of research on test-potentiated learning is especially notable for free recall tests, possibly in part because of a conclusion that Tulving (1967) came to decades ago. Tulving claimed that in multitrial free recall paradigms, tests and study trials have equivalent effects on learning. He concluded that subsequent recall “depends primarily on the total amount of time spent on the task, and that it is relatively little affected, if at all, by the distribution of this time between studying and recalling the material” (p. 181). That is, practicing recalling items has the same effect on learning as does studying the material.

However, Tulving (1967) also observed that the mechanisms underlying the equivalent effects of study and test trials were not the same. In a condition with three successive tests between study trials, forgetting occurred after the first test, but this loss was counteracted by a large increase in recall after each study trial. That is, in conditions with consecutive test trials, more learning seemed to occur on subsequent study trials than in conditions without consecutive test trials. These results suggest the additional tests may have potentiated learning during study.

Later researchers demonstrated that the distribution of study and test trials does affect learning, and that taking free recall tests between study trials enhances performance (Donaldson, 1971; Karpicke & Roediger, 2007; Lachman & Laughery, 1968; Roediger & Smith, 2012; Rosner, 1970). Although this research has shown that free recall tests enhance performance, how these tests do so is still unknown. Is learning enhanced because tests directly improve retention of the retrieved items (i.e., the testing effect), as was suggested by Donaldson? Or is performance enhanced because free recall tests potentiate subsequent learning, an indirect effect of testing, as was suggested by Rosner?

By manipulating both the number of prior tests and whether or not the material was restudied, we can identify whether the benefit incurred from restudying is enhanced by a recall test preceding the restudy phase. That is, do prior recall tests boost, or potentiate, the enhancement seen from restudying the material? To answer this question, in Experiment 1 we varied the number of initial tests (zero or three) and whether or not a restudy opportunity occurred following the initial tests. Taking initial tests boosted the amount of information acquired from the restudy phase (i.e., the difference in final recall between the groups who did and did not receive restudy was greater for participants who had taken initial tests). In Experiment 2, these findings were extended and replicated in an Internet sample to ensure the replicability and generalizability of the initial findings. Finally, in Experiment 3 we explored the role that enhanced organization may play in test-potentiated learning. Previous research had shown that testing enhances organization and that this enhancement partially underlies the testing effect (Zaromb & Roediger, 2010). We asked whether enhanced organization from testing might also underlie the test-potentiated learning effect.

Experiment 1

Do free recall tests potentiate learning during subsequent study? This question was addressed in an undergraduate population using a between-subjects design.

Method

Participants

One hundred and seventy-three Washington University in St. Louis undergraduate students participated in exchange for class credit or $10. The policies of the University’s Human Studies Committee were followed for all experiments.

Design

A 2 (initial tests: zero, three) × 2 (restudy, no restudy) between-subjects factorial design was used (see Fig. 1). Each participant was randomly assigned to one of four between-subjects conditions: initial tests and restudy (n = 43), initial tests but no restudy (n = 43), no initial tests but restudy (n = 44), or no initial tests and no restudy (n = 43).

Fig. 1. Designs of Experiments 1 and 2. Each row represents a between-subjects condition. S, study trial; T, test trial; subscripts indicate the numbers of pictures studied

Prior to this main task, all of the participants completed a separate recall task, which was included as a baseline measure of individual memory performance.

Materials

For the baseline memory task, three lists of 15 related words from Roediger and McDermott’s (1995) report served as the stimuli. Each list contained words (e.g., nurse, sick, and lawyer) related to one target item (e.g., doctor) that never appeared in the list. All words were unrelated to the images used in the main experiment. The lists were studied in an order randomized across participants. Within each list of related words, items were presented in a random order.

In the main task, the stimuli were 40 line drawings taken from the Snodgrass and Vanderwart (1980) norms. The images were chosen for their high name agreement (86%–100%) and depicted simple concrete nouns (e.g., a carrot). On study trials, the order of images was randomized for each participant.

Procedure

The participants were tested in groups of one to four. Instructions were presented on the computer and were the same for all participants.

In Part 1 (the baseline measure), participants studied 45 words. Each word was presented individually on the screen for 3 s, with a 500-ms interstimulus interval. Participants were then given 3 min to recall as many of the items as they could in any order via the computer keyboard.

During Part 2 (the main experiment), participants first studied 40 pictures, which were presented individually on the screen for 3 s, with a 250-ms interstimulus interval. The participants then worked on math problems for 30 s to eliminate primary memory effects. Next, those in conditions with initial testing were given 3 min to recall as many of the pictures as they could, in any order, by typing the name of the object depicted in each picture. All of the responses remained onscreen for the duration of the test. This procedure was repeated twice, for a total of three tests. During this time, the participants in the no-initial-test conditions played three games of Tetris, each lasting 3 min. Next, half of the participants restudied the pictures, and the other half worked on math problems. Restudy followed the same procedure as the initial study. All of the participants then worked on math problems for an additional 30 s before taking a final test. The participants were given 5 min to recall as many pictures as possible.

Results and discussion

Baseline measure

A one-way analysis of variance (ANOVA) between the four conditions revealed no significant differences, F < 1, and therefore, the baseline measure was not included in the remaining analyses.

Main experiment

The final-test data (see Footnote 1) were analyzed using a 2 (initial tests: zero, three) × 2 (restudy, no restudy) between-subjects ANOVA. Restudying enhanced later recall relative to not restudying (M = 0.61 vs. 0.43; see Fig. 2), F(1, 169) = 50.87, p < 0.001, ηp² = 0.23. Similarly, taking initial tests enhanced later recall relative to not taking initial tests (M = 0.57 vs. 0.47), F(1, 169) = 15.07, p < 0.001, ηp² = 0.08.
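As a quick consistency check on the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the two main effects reported above. The function name is ours, chosen for illustration; it is not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F values and degrees of freedom are taken from the text of Experiment 1.
restudy_effect = partial_eta_squared(50.87, 1, 169)
testing_effect = partial_eta_squared(15.07, 1, 169)

print(round(restudy_effect, 2))  # 0.23
print(round(testing_effect, 2))  # 0.08
```

Both values agree with the reported effect sizes after rounding.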

Fig. 2. Proportions of items recalled on the final test in Experiment 1. The mean differences between the restudy and no-restudy conditions for the two test conditions are displayed above the respective bars. Error bars represent standard errors of the means

Test-potentiated learning would be substantiated by an interaction between initial tests and restudy conditions. That is, the difference between the restudy and no-restudy conditions should be larger in the condition with initial tests, indicating that the benefit of having a restudy trial was enhanced when initial tests had been taken. As can be seen in Fig. 2, this pattern emerged; the difference between the restudy and no-restudy conditions was larger when initial tests had been taken (M = 0.24) than when they had not been taken (M = 0.11), F(1, 169) = 8.21, p = 0.005, ηp² = 0.05, indicating that the initial tests potentiated learning during the restudy trial.

Another way to interpret this interaction would be to consider that a testing effect occurred only when participants restudied. There was no difference between the zero- and three-test conditions when participants had not restudied, t < 1. In contrast, when participants had restudied, recall was greater in the three-test condition (M = 0.69) than in the zero-test condition (M = 0.53), t(85) = 4.83, p < 0.001, d = 1.03. One might wonder whether this lack of a testing effect is problematic, in that a robust literature has demonstrated testing effects in the absence of a restudy condition. These testing effects, however, often do not emerge until after a delay (Roediger & Karpicke, 2006), and the present experiment was performed within a single session.
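The interaction described above amounts to a difference of differences, which can be made explicit with simple arithmetic. The sketch below uses only the rounded values reported in the text. The no-restudy cell means are not reported directly; they are back-calculated here from the restudy means and the restudy benefits, so they should be read as approximations. The dictionary keys are illustrative labels.

```python
# Rounded restudy cell means and restudy benefits, as reported in the text.
restudy_mean = {"three_tests": 0.69, "zero_tests": 0.53}
restudy_benefit = {"three_tests": 0.24, "zero_tests": 0.11}

# Back-calculated no-restudy cell means (approximate, because the inputs are rounded).
no_restudy_mean = {
    cond: round(restudy_mean[cond] - restudy_benefit[cond], 2)
    for cond in restudy_mean
}

# Interaction contrast: how much larger the restudy benefit is after initial tests.
interaction_contrast = round(
    restudy_benefit["three_tests"] - restudy_benefit["zero_tests"], 2
)

print(no_restudy_mean)       # {'three_tests': 0.45, 'zero_tests': 0.42}
print(interaction_contrast)  # 0.13
```

The small gap between the back-calculated no-restudy means is in line with the absence of a testing effect without restudy, whereas the restudy means differ by 0.16.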

Experiment 2

Experiment 1 demonstrated that taking free recall tests potentiated learning during a subsequent restudy trial. Experiment 2 was designed to replicate this finding and extend it to a paradigm in which restudy was manipulated within participants. Furthermore, a more diverse population was used: Rather than testing college undergraduates, we recruited participants through the Internet using the Amazon Mechanical Turk.

Method

Participants

One hundred and eighty participants completed the experiment on the Internet via the Amazon Mechanical Turk in exchange for $2. Thirty-three of the participants were excluded from the analyses because, on a postexperimental question, they indicated that they had written words and/or picture names during the study phases. Additionally, seven participants were excluded because they failed to follow the instructions. After these exclusions, 140 participants remained in the final analyses.

Design and materials

Prior to the main task, a separate recall test, identical to that used in Experiment 1, was given as a baseline measure of memory performance.

The main experiment had a 2 (initial tests: zero, two) × 2 (restudy, no restudy) mixed factorial design, with initial tests manipulated between-subjects and restudy manipulated within-subjects. The participants were randomly assigned to the initial-tests (n = 71) or the no-initial-tests (n = 69) condition. The stimuli were the same as in Experiment 1.

Procedure

The procedure was similar to that of Experiment 1, with two major changes: (1) Participants were tested online and (2) all participants restudied half of the pictures. Which pictures were restudied was determined randomly for each participant.

Additional changes were made to accommodate the online testing format. Participants answered demographic questions prior to the beginning of the experiment, as well as postexperimental questions after completing the study. Furthermore, the participants studied each picture for 4 s (rather than 3 s), with a 500-ms (rather than 250-ms) interstimulus interval. Also, those in the initially tested condition took two (rather than three) initial free recall tests. Similarly, those who were not given initial tests played two (rather than three) games of Tetris. Finally, on the final test, the participants had 3 min (rather than 5 min) to recall the pictures.

Results and discussion

Baseline measure

A t test revealed a marginally significant difference in the proportions of words recalled in the baseline measure by participants who did (M = 0.38) and did not (M = 0.43) subsequently receive initial tests in the main experiment, t(138) = 1.86, p = 0.07, d = 0.32. Because of this marginal effect, in analyses of the main experiment, baseline memory performance was used as a covariate to ensure that any differences between conditions were not due to preexperimental differences. However, including this baseline measure as a covariate did not change any conclusions (see the supplemental materials).

Main experiment

The final-test data were analyzed using a 2 (initial tests: zero, two) × 2 (restudy, no restudy) mixed analysis of covariance (ANCOVA). The proportion of words recalled on the baseline measure was used as the covariate.

Restudied pictures were more likely to be recalled than those that were not restudied (M = 0.62 vs. 0.26; see Fig. 3), F(1, 137) = 446.54, p < 0.001, ηp² = 0.77. Unlike in Experiment 1, we found no main effect of testing; taking initial tests did not significantly enhance recall relative to not taking initial tests (M = 0.45 vs. 0.43), F < 1 (however, three tests had been used in Experiment 1, rather than two).

Fig. 3. Adjusted means indicating the proportions of items recalled on the final test in Experiment 2. Adjusted mean differences between the restudy and no-restudy conditions for the two test conditions are displayed above the respective bars. Error bars represent standard errors of the means

Test-potentiated learning would be indicated by a greater benefit of restudying pictures in the initial-tests condition than in the no-initial-tests condition. As Fig. 3 illustrates, this pattern was found; a larger difference was apparent between the proportions of restudied and not-restudied pictures recalled in the initial-tests condition than in the no-initial-tests condition (M = 0.42 vs. 0.31), F(1, 137) = 9.32, p < 0.003, ηp² = 0.06.

As in Experiment 1, this interaction can also be interpreted as indicating a significant difference between the initial-test conditions for items that were restudied, F(1, 137) = 6.09, p = 0.02, ηp² = 0.04, but not for items that were not restudied, F(1, 137) = 1.30, p = 0.26, ηp² = 0.009. These results again show that a significant testing effect occurred only when participants were given the opportunity to restudy.

Experiment 3

Experiments 1 and 2 demonstrated that taking initial free recall tests can potentiate learning during a subsequent restudy trial. In Experiment 3, we explored why tests have this potentiating effect. Several previous researchers have suggested that tests enhance learning by improving the organization of already-learned material (e.g., Donaldson, 1971; Lachman & Laughery, 1968; Rosner, 1970). More recent work by Zaromb and Roediger (2010) provided evidence that free recall tests improve organization, and that this improvement partially underlies the testing effect in free recall. Does this enhancement also underlie the test-potentiated learning effect in free recall? Improving the organization of already-learned material may increase the ability to encode new items, by creating or improving a structure, or schema, that can be used to incorporate new items with already-learned items.

To test this hypothesis, in Experiment 3 we used categorized words. This change allowed organization to be measured through clustering, or recalling members of the same category together. Our measurement of clustering was adjusted for the total number of items recalled by using the adjusted ratio of clustering (ARC; Roenker, Thompson & Brown, 1971). If enhanced organization underlies test-potentiated learning, organization on an initial test should be related to the amount of information learned during a subsequent restudy trial. That is, more organized recall prior to restudying should be related to more learning during restudy. The relationship between prior organization and subsequent learning was tested in a correlational study.

Method

Participants

Sixty-two participants completed the experiment via the Amazon Mechanical Turk in exchange for $3. Seven of the participants were excluded from the final analysis because they reported writing down words and/or picture names. After these exclusions, 55 participants remained in the final analysis.

Design

This was a correlational study. A baseline task was given prior to the main task, and in that main task, all participants took three initial tests and restudied all items.

Materials

For the baseline task, 30 line drawings of easily identifiable nouns (all unrelated to the categorized words) were chosen from the Snodgrass and Vanderwart (1980) norms.

For the main task, five medium-frequency words from eight categories (total of 40 words) were chosen from the expanded and updated version of the Battig and Montague word norms (Van Overschelde, Rawson & Dunlosky, 2004).

Procedure

As in Exp. 2, the participants were tested online via the Amazon Mechanical Turk and answered demographic questions before and postexperimental questions after the experiment.

The procedure was similar to that of Experiment 1—specifically, to the initial-tests-with-restudy condition—with two differences: On the baseline task, participants learned 30 (rather than 45) items, and on the final test, participants had 3 min (rather than 5 min) to recall items.

Results and discussion

The role that organization may play in test-potentiated learning was examined by measuring the correlation between organization on the test prior to the restudy trial and learning during the restudy trial. Organization on the test prior to the restudy trial was measured using ARC scores, which can range from −1.0 to 1.0, with 1.0 indicating perfect organization, 0 indicating chance-level organization, and negative scores indicating below-chance-level organization.
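To make the measure concrete, the following is a minimal sketch of the ARC computation as defined by Roenker et al. (1971): ARC = (R − E(R)) / (maxR − E(R)). Here R is the number of observed category repetitions (adjacent recalled items from the same category), E(R) = (Σ n_i²) / N − 1 is the number expected by chance, and maxR = N − k, where n_i is the number of items recalled from category i, N is the total number recalled, and k is the number of categories represented in recall. The function name, the input format (one category label per recalled item, in output order, with intrusions and repetitions already removed), and the example categories are illustrative assumptions; this is not the scoring script used in the experiment.

```python
from collections import Counter
from typing import Optional, Sequence


def arc_score(recall_categories: Sequence[str]) -> Optional[float]:
    """Adjusted ratio of clustering for one recall protocol.

    `recall_categories` holds the category label of each recalled item,
    in the order the items were recalled. Returns None when the score is
    undefined (empty recall, or maximum equal to chance).
    """
    n_total = len(recall_categories)
    if n_total == 0:
        return None

    counts = Counter(recall_categories)

    # Observed repetitions: adjacent pairs drawn from the same category.
    observed = sum(
        first == second
        for first, second in zip(recall_categories, recall_categories[1:])
    )
    # Repetitions expected by chance, given how many items came from each category.
    expected = sum(n * n for n in counts.values()) / n_total - 1
    # Maximum possible repetitions: every category recalled as one unbroken run.
    maximum = n_total - len(counts)

    if maximum == expected:
        return None
    return (observed - expected) / (maximum - expected)


clustered = ["fruit", "fruit", "fruit", "tool", "tool", "tool"]
alternating = ["fruit", "tool", "fruit", "tool", "fruit", "tool"]

print(arc_score(clustered))    # 1.0
print(arc_score(alternating))  # -1.0
```

The two hypothetical protocols mark the extremes of the scale: fully clustered recall scores 1.0, and strictly alternating recall scores −1.0 for this particular list.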

Learning on the restudy trial was estimated using a conditional probability measure: the proportion of items recalled on the final test, given that they had not been recalled on any previous test. If organization of already-learned material improves subsequent learning, greater organization prior to restudying should be related to more learning during the restudy trial. As can be seen in Fig. 4, this pattern was found (see Footnote 2): Higher ARC scores on the test prior to the restudy trial were associated with a greater proportion of items learned during the restudy trial, r(49) = 0.31, p = 0.03.
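The conditional probability measure can be sketched as follows, assuming each participant's recall protocols are available as lists of item names. The function name, argument names, and the example items are hypothetical and serve only to illustrate the definition given above.

```python
from typing import Iterable, Optional, Sequence


def restudy_learning(
    studied_items: Iterable[str],
    initial_tests: Sequence[Iterable[str]],
    final_recall: Iterable[str],
) -> Optional[float]:
    """P(recalled on final test | not recalled on any initial test).

    Returns None when every studied item was already recalled on an
    initial test, because the conditional proportion is then undefined.
    """
    # Items recalled at least once before the restudy trial.
    previously_recalled = set().union(*initial_tests)
    # Items that could only have been learned during restudy.
    not_yet_recalled = set(studied_items) - previously_recalled
    if not not_yet_recalled:
        return None
    newly_recalled = not_yet_recalled & set(final_recall)
    return len(newly_recalled) / len(not_yet_recalled)


# Hypothetical participant: five studied items, three initial tests.
studied = ["apple", "hammer", "sofa", "tiger", "piano"]
tests = [["apple", "hammer"], ["apple"], ["hammer", "sofa"]]
final = ["apple", "sofa", "tiger"]

print(restudy_learning(studied, tests, final))  # 0.5
```

In this example two items (tiger, piano) were never recalled before restudy, and one of them appears on the final test, giving 0.5.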

Fig. 4. Scatterplot of the proportions of words recalled on the final test given that they had not been recalled on any previous test, for each participant in Experiment 3, graphed as a function of adjusted-ratio-of-clustering (ARC) scores from the test prior to the restudy trial

However, this relationship could be driven by a third variable. Specifically, individuals with better “memory ability” could tend to both have better organization and learn more during study trials. To test this possibility, the baseline memory measure was used as a covariate. When controlling for this estimate of memory ability, the correlation remained significant, r(48) = 0.34, p = 0.02. That is, higher ARC scores were still associated with greater learning, suggesting that differences in memory ability did not drive this relationship.
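Assuming the covariate-adjusted correlation corresponds to a first-order partial correlation (an assumption on our part, though the drop from 49 to 48 degrees of freedom fits it, since a partial correlation with one covariate has n − 3 degrees of freedom), it can be computed from the three pairwise correlations. The sketch below uses hypothetical correlation values, because the pairwise correlations involving the baseline measure are not reported in the text.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical values, for illustration only (not the experiment's data):
# x = ARC score, y = restudy learning, z = baseline memory.
adjusted = partial_correlation(r_xy=0.50, r_xz=0.30, r_yz=0.40)
print(round(adjusted, 3))

# When the covariate is unrelated to both variables, nothing is removed.
print(partial_correlation(r_xy=0.50, r_xz=0.0, r_yz=0.0))  # 0.5
```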

General discussion

The primary finding in this report is that free recall tests potentiate learning during subsequent restudy trials. The benefit of restudying the material was enhanced when initial free recall tests had been taken. This pattern was obtained when restudy was manipulated both between-subjects (Exp. 1) and within-subjects (Exp. 2), and in both an undergraduate population (Exp. 1) and a more diverse population recruited online (Exp. 2).

This potentiating effect may at least in part be due to enhanced organization. Previous research has shown that testing improves organization (Zaromb & Roediger, 2010). Experiment 3 indicated that better organization prior to restudying was associated with more learning during restudying. This relationship does not seem to be explained by differences in memory ability. Although this finding is only correlational, it suggests that tests may potentiate learning by enhancing organization prior to learning.

Other explanations of test-potentiated learning

Other hypotheses have been proposed to explain test-potentiated learning. These alternative hypotheses are not mutually exclusive with the enhanced-organization hypothesis. Multiple factors may contribute to test-potentiated learning.

One such hypothesis is that test-potentiated learning may be driven by enhanced metacognitive knowledge. That is, tests may increase metacognitive accuracy (Roediger & Karpicke, 2006), which could be used to improve restudy strategies. For instance, testing may allow participants to better determine which items they cannot remember, and therefore which items they should focus on during the next restudy opportunity (Lachman & Laughery, 1968).

Recent work using functional magnetic resonance imaging has suggested another possible underlying mechanism for the enhancing effect of tests. Nelson, Arnold, Gilmore, Najjar, Finn and McDermott (2012) observed greater activation in the left posterior inferior parietal lobule during restudy of word pairs that had been tested on a cued recall test than during restudy of pairs not previously tested. That specific region has previously been associated with successful recognition memory (McDermott, Szpunar & Christ, 2009; Nelson et al., 2010), an observation that led the authors to suggest that the initial tests may have increased the tendency for study-phase retrieval, or remindings (Hintzman, 2004), during subsequent restudy. Although this work involved a different set of procedures than those used here, the results provide an intriguing possibility.

Several hypotheses have been proposed to explain the finding of enhanced encoding following a failed generation attempt (Grimaldi & Karpicke, 2012; Hays, Kornell & Bjork, 2012; Kornell, Hays & Bjork, 2009; see also Slamecka & Fevreiski, 1983), or what could be called generate-potentiated learning. In this paradigm, there is no initial study, and participants are asked either to guess the target of a cue word (e.g., tide–?) before studying the complete pair (e.g., tide–beach) or to study the pair without the initial guess. Final recall is enhanced for pairs that have an initial guess. The hypothesis favored by Grimaldi and Karpicke (2012), which they called the search set theory, posits that attempting to guess the target initiates a search process that activates related items. The experimentally defined target and items related to the target may become activated even if they are not given as the response, and this activation may enhance encoding when the target is subsequently presented.

However, this theory does not seem to generalize to free recall learning, especially in a paradigm in which the stimuli are unrelated to each other, as was the case in the first two experiments presented here. Furthermore, Grimaldi and Karpicke (2012) proposed that this activation process is very short-lived and that the enhancing effect only occurs when the target is presented immediately after the retrieval attempt. In the experiments presented here, the delay between the initial tests and restudy suggests that any activation would have already dissipated.

Conclusion

These experiments provide strong evidence that free recall tests have a potentiating effect on subsequent study and suggest that enhanced organization may underlie this potentiating effect. They introduce a new paradigm for studying test-potentiated learning and provide the first steps to a better understanding of the role of retrieval in learning. Not only does retrieval directly benefit future recall, but it also prepares the learner for future learning. In short, subsequent memory is enhanced by retrieval practice and by repeated study, and the combination of the two is an especially potent memory enhancer.