

Retrieval (Sometimes) Enhances Learning: Performance Pressure Reduces the Benefits of Retrieval Practice

SCOTT R. HINZE1,2* and DAVID N. RAPP1

1 Department of Psychology and School of Education & Social Policy, Northwestern University, Evanston, USA
2 Department of Psychology, Virginia Wesleyan College, Norfolk, USA

Summary: Academic testing has received substantial support as a useful educational activity with robust retention benefits, given that tests can promote retrieval practice. However, testing can also instantiate performance-related pressure and anxiety that may misappropriate the resources responsible for producing learning benefits. The current project examined the effects of performance pressure on retrieval practice. In two experiments, we instantiated performance pressure with either high-stakes or low-stakes quizzes, and assessed memory and comprehension of content on both quizzes and final tests. Quiz performance was equivalent under high-stakes and low-stakes conditions, demonstrating that learners adapted to high-pressure quizzes. However, final test performance was better after low-stakes versus high-stakes quizzes, and only low-stakes quizzes led to a performance advantage over a rereading control group. Participants additionally exhibited some sensitivity to the difficulty of retrieving under pressure. These data highlight the benefits of retrieval practice but indicate that they can be disrupted under pressure-driven conditions. Copyright © 2014 John Wiley & Sons, Ltd.

Testing effects occur when completing a quiz enhances performance on a follow-up test on the same or related items (Roediger & Karpicke, 2006a). In a typical experiment, after an initial study period, participants practice retrieving studied information from memory with a quiz or simply restudy the information. The restudy condition serves as an important control, isolating any effects of quizzing to the retrieval of information rather than simply to extra exposure or study time (Carrier & Pashler, 1992). Following a delay, performance on a final test is assessed. Participants consistently remember quizzed information better than information that was merely restudied (Butler, 2010; Hinze & Wiley, 2011; Roediger & Karpicke, 2006b). These findings suggest that people learn as a function of retrieval practice.

To date, discussions of this effect have focused on the benefits of retrieval practice, with researchers recommending frequent low-stakes quizzes as an educational intervention (e.g., Pashler et al., 2007). But generally speaking, tests influence learners in ways that go beyond solely cognitive effects. For example, tests can moderate student attitudes and affect (e.g., anxiety, motivation, and self-efficacy; Crooks, 1988), depending on the conditions and pressures involved with the testing situation. The negative effects of test anxiety are well established, with stress and worry impeding performance on tasks including problem solving, IQ assessments, math evaluations, classroom performance, and standardized exams (Cassady & Johnson, 2002; Hembree, 1988; Lang & Lang, 2010; Ramirez & Beilock, 2011; Tobias, 1985). These negative effects have been attributed to disruptions in executive functioning or attentional control (Eysenck, Derakshan, Santos, & Calvo, 2007), as commonly emerge in situations involving test anxiety (Schwarzer & Jerusalem, 1992). By these accounts, intrusive thoughts, self-evaluations, and worries about performing poorly arise from anxiety, capturing resources that under optimal conditions would be devoted to thinking about and answering test questions.

Given evidence-based recommendations for the use of testing in educational settings, it is important to determine whether test-related anxiety may attenuate retrieval practice benefits. Any interactions between students’ affective responses to testing and the effect of tests on retention would necessitate careful consideration as to the support that tests may offer for learning. For example, participants with low executive resources, but high trait test anxiety, seem to retain less information than their counterparts following retrieval practice (Tse & Pu, 2012), suggesting that retrieval practice may be less effective for some participants than others. But an additional consideration is that the conditions under which a test is taken (e.g., high-stakes or low-stakes quizzes) may prove a critical point of concern for any recommendations. It is an open question how well the benefits of retrieval practice generalize to high-stakes situations, given that most existing implementations of retrieval practice have intentionally focused on low-stakes rather than high-stakes quizzing (e.g., Carpenter, Pashler, & Cepeda, 2009; McDaniel, Agarwal, Huelser, McDermott, & Roediger, 2011). It may be the case that frequent low-stakes quizzes specifically, and not just quizzes and tests in general, support student learning by providing practice with test formats and encouraging self-efficacy (Bangert-Drowns, Kulik, & Kulik, 1991). The purpose of the current study was to explore the influence of test anxiety, particularly responses to high-stakes quizzes, on learning from retrieval practice.

Test anxiety is consistently associated with poor performance and can influence processing before, during, and after a test is taken (Cassady, 2004; Naveh-Benjamin, 1991). Recent projects have shown that manipulations of performance-related anxiety (i.e., state rather than trait anxiety) analogously diminish performance on cognitively demanding tasks (Beilock, Kulp, Holt, & Carr, 2004; Hayes, MacLeod, & Hammond, 2009). One method of instantiating such anxiety involves inducing worry about one’s performance relative to peers (DeCaro, Thomas, Albert, & Beilock, 2011). For instance, Beilock et al. (2004) induced performance pressure by informing participants that they could earn a monetary reward for themselves and a (fictional) partner if they improved their math performance over time. These pressure-laden instructions hurt participants’ performance on subsequent trials, specifically for difficult problems. This method allows for experimentally manipulating performance-related anxiety during quizzes that is reasonably representative of the types of pressure students report feeling in academic contexts (Sarason & Sarason, 1990).

*Correspondence to: Scott Hinze, Department of Psychology, Virginia Wesleyan College, Norfolk, VA 23502, USA. E-mail: [email protected]

Applied Cognitive Psychology, Appl. Cognit. Psychol. 28: 597–606 (2014). Published online 22 April 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/acp.3032

In this study, we assessed the role of performance pressure during quizzes on the subsequent retention of information. We were particularly interested in induced pressure during quizzes to determine whether evidence-based recommendations for repeated testing should be limited to low-stakes, formative quizzes (e.g., McDaniel et al., 2011) or whether higher-stakes summative quizzes would be equally effective. For this reason, we varied pressure during quizzing and assessed long-term retention under optimal low-stakes conditions. We used texts on biology topics and final tests assessing multiple levels of comprehension, in order to ensure that the results would be representative of educationally relevant learning demands. With identical retrieval instructions, participants engaged in either low-stakes or high-stakes quizzing, or reread the presented material. After a 7-day delay, performance on final transfer tests provided a measure of the long-term effects of pressure on retrieval practice benefits. The rereading condition in Experiment 1 allows for a replication of the ‘testing effect’, with any long-term advantage for low-stakes quizzing over rereading attributed to retrieval practice rather than simple exposure time. Beyond replication, Table 1 presents competing hypotheses for the effects of performance pressure on quizzes and final tests that we describe in detail next.

Performance pressure could potentially reduce the benefits of quizzing in at least two ways. First, the resource-demanding effects of state anxiety may prevent participants from successfully retrieving information during quizzes. Because performance on final tests is highly conditional on successful retrieval during practice (e.g., Kang, McDermott, & Roediger, 2007; McDaniel & Masson, 1985), disrupting retrieval success should have negative consequences for final test performance. This retrieval disruption hypothesis predicts poorer performance under high-stakes than low-stakes conditions for both quizzes and final tests.

A second possibility is that pressure may reduce retention on final tests independent of any effects on quizzes. This prediction assumes that, beyond searching for and recalling or recognizing content from memory, retrieval practice involves additional processes that help to change and/or stabilize memory representations. Some accounts hold that elaborative or effortful processing during retrieval practice is crucial for long-term retention (Carpenter, 2009; Johnson & Mayer, 2009; Pyc & Rawson, 2009), as tests provide an opportunity to reorganize (Zaromb & Roediger, 2010) or elaborate (Carpenter, 2011; Hinze, Wiley, & Pellegrino, 2013) memory traces over time. Consistent with these accounts, neurological evidence suggests that testing effects may be dependent on processes associated with re-encoding, in addition to retrieval (Wing, Marsh, & Cabeza, 2013). Finally, some accounts hold that retrieval practice involves the refinement of memory traces (Karpicke & Smith, 2012), an active process that is differentiable from the mere generation of content (Karpicke & Zaromb, 2010). These effortful activities may be especially dependent on the executive control processes disrupted by performance pressure (Beilock & Carr, 2005) and anxiety (Ashcraft & Kirk, 2001). While performance pressure may or may not disrupt retrieval success during quizzes, that pressure may still interrupt other processes thought to be responsible for the enhancement of long-term learning. Based on this reasoning, the learning disruption hypothesis predicts that the effects of pressure during quizzes will be apparent on final tests, independent of any effects on quiz performance, with decrements observed for participants who are quizzed under high-stakes relative to low-stakes conditions.

In contrast to the predictions of reduced final test performance following performance pressure, it is possible that the pressure manipulation could beneficially influence participants’ motivation to engage with the quizzes. If participants respond positively to the performance incentives, then this could support performance on quizzes. And this success would potentially have downstream effects on the final tests. Thus, a retrieval motivation hypothesis predicts superior performance under high-stakes as compared with low-stakes conditions for both quizzes and final tests. We regarded this hypothesis as unlikely, however, given that the types of performance pressure instantiated here have generally not shown positive effects in prior research.
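The three hypotheses differ only in the pattern they predict for quizzes and for final tests, so they can be written down as a small lookup. The sketch below is illustrative only and is not part of the authors' materials; the names `Pattern`, `HYPOTHESES`, and `consistent_hypotheses` are hypothetical, and the learning disruption entry follows the "no difference" quiz prediction listed in Table 1.

```python
from enum import Enum


class Pattern(Enum):
    """Direction of a high-stakes vs. low-stakes comparison."""
    LOW_BETTER = "low-stakes > high-stakes"
    NO_DIFFERENCE = "no difference"
    HIGH_BETTER = "high-stakes > low-stakes"


# (quiz prediction, final test prediction) for each hypothesis, as in Table 1.
HYPOTHESES = {
    "retrieval disruption": (Pattern.LOW_BETTER, Pattern.LOW_BETTER),
    "learning disruption": (Pattern.NO_DIFFERENCE, Pattern.LOW_BETTER),
    "retrieval motivation": (Pattern.HIGH_BETTER, Pattern.HIGH_BETTER),
}


def consistent_hypotheses(quiz, final_test):
    """Return the hypotheses whose predictions match an observed pattern."""
    observed = (quiz, final_test)
    return [name for name, predicted in HYPOTHESES.items() if predicted == observed]


# A hypothetical outcome: equal quiz scores, but better final tests after
# low-stakes quizzes, matches only the learning disruption hypothesis.
print(consistent_hypotheses(Pattern.NO_DIFFERENCE, Pattern.LOW_BETTER))
```

Laid out this way, quiz performance alone separates the learning disruption hypothesis from the other two, whereas final test performance alone cannot distinguish retrieval disruption from learning disruption.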

EXPERIMENT 1

Method

Participants

Sixty-three Northwestern undergraduates participated in the experiment. Data from two participants were excluded for noncompliance and computer error, respectively, leaving 61 participants (39 female; M_age = 19.25, SD_age = 1.25). In general, participants sampled from this selective university population were high-achieving, with high self-reported ACT scores (n = 42, M = 32.19, SD = 2.88) and grade point average (n = 57, M = 3.49 out of 4, SD = 0.33), where available. Upon recruiting, participants were informed that they would receive $15 with an opportunity to earn a $5 bonus. In reality, all participants were paid $20.

Table 1. Hypotheses for the effects of performance pressure during quizzes on long-term retention

Hypothesis                        Quiz prediction             Final test prediction
Retrieval disruption hypothesis   Low-stakes > high-stakes    Low-stakes > high-stakes
Learning disruption hypothesis    No difference               Low-stakes > high-stakes
Retrieval motivation hypothesis   High-stakes > low-stakes    High-stakes > low-stakes


Materials and procedure

Each participant completed a learning session and a final test session, separated by a 7-day delay. At the beginning of the learning session, participants were informed that they would read texts and take tests on the materials. Participants then completed the 20-item trait portion of the state/trait anxiety inventory (STAI-T; Spielberger, Gorsuch, & Lushene, 1988) and the 20-item test anxiety inventory (TAI; Spielberger, 1980). These measures were collected to ensure any group differences in state anxiety emerged in response to the experimental manipulations rather than to trait anxiety.

All participants read four biology texts on the topics of vision, viral reproduction, the respiratory system, and metabolism, presented in counterbalanced order. The texts were of similar length (373 to 454 words), presented sequentially on a computer screen. Participants read each text for 5 minutes and then immediately rated their interest in and prior knowledge of the content, followed by judgments of learning (JOLs). For JOLs, participants estimated the number of items (from 0 to 10) they would correctly answer on a 10-question test based on each topic. Interest ratings were provided on a 4-point scale (1 = Very Boring, 4 = Very Interesting). Prior knowledge was rated on a 4-point scale based on the range of information the participants estimated having known prior to reading each text (with endpoints 1 = 0–25% and 4 = 75–100%). Scores on all rating scales were averaged across the four texts to provide a continuous overall measure for each judgment type.

After reading and rating all texts, participants were informed about the potential $5 bonus. Participants in the rereading and low-stakes quiz groups were told the bonus was not dependent on future performance. They were promised the bonus in order to maintain a similar reimbursement structure for all participants during recruitment and prior to initial reading. Participants in the high-stakes quiz group received performance pressure instructions adapted from Beilock et al. (2004); they learned that they would earn the bonus only if they and a partner both scored higher than the university average on the quizzes. They were also informed that their partner had already completed the task and scored above average, putting the onus on the participant to earn them the bonus money. In actuality, the partner was fictional. Regardless of condition, participants were encouraged to use the rereading or quizzes to prepare for the final tests. After the performance pressure manipulation, participants completed the 20-item state portion of the State-Trait Anxiety Inventory (STAI-S; Spielberger et al., 1988), assessing current feelings of anxiety, as a manipulation check.

Next, participants in the low-stakes and high-stakes quiz groups spent 5 minutes attempting to recall the content from each text in the same presentation order. The quizzes included cues to each paragraph of the text, a support employed in previous studies with these materials to encourage adequate quiz performance (Hinze et al., 2013). Participants in the rereading group restudied each text for the same 5-minute period. In all conditions, participants made metacognitive judgments as to their interest and JOLs immediately following each reread/quiz task. We assessed these metacognitive judgments again because we were interested in whether participants were sensitive to the potential costs and benefits of high-stakes and low-stakes quizzes relative to rereading. As an additional manipulation check, participants also rated the pressure they experienced during each reread/quiz task on a 4-point scale (with endpoints 1 = No pressure and 4 = Very much pressure). All ratings were averaged across texts to provide continuous measures of metacognitive judgments and performance pressure.

After 7 days, participants returned for the final test session. To relieve performance pressure before the final test, all participants were assured that they would earn the $5 bonus regardless of performance. Participants then completed the final tests on computer. In order to assess the influence of quizzing on deep learning, participants responded to novel (transfer) items addressing both retention and understanding of the materials (for discussion of the importance of assessing transfer outcomes, see Butler, 2010; Carpenter, 2012; Karpicke & Blunt, 2011). For each topic, participants answered 10 multiple choice questions (Hinze et al., 2013), with half assessing retention of specific facts and terms (detail items) and half assessing integration across multiple text sections (inference items). An open-ended application item (scored as partially or completely correct) was also presented for each text that asked participants to apply concepts in a new situation. For example, for the vision text, the application item asked, ‘Based on what you read about vision, give two reasons why some animals might be able to see better in the dark than others.’ Answers to these open-ended items were never directly offered in the text, requiring participants to make an inference, for example, based on their understanding of how light passes through the eye or is received by photoreceptors in the retina. The order of the three final test item types was counterbalanced, but the topics were presented in the same order as in the learning session.

Design

Initial Study/Quiz condition varied between-participants (rereading, low-stakes quiz, high-stakes quiz). The between-participant design eliminated the possibility of psychological and physiological responses to the pressure manipulation carrying over from the high-stakes condition to other conditions. The effect of Study/Quiz condition was assessed for both quizzes and final tests, but because the reread condition did not include quizzes, only the quizzed groups were compared in the analysis of quiz performance. Test Item Type for final tests (detail, inference, application) varied within-participants.

Results and discussion

No significant effects of order were obtained for text presentation or final test item type, so all results are presented collapsed across order. All results were significant at an alpha level of .05 unless otherwise specified.

Manipulation check

We conducted two manipulation checks to determine whether the performance pressure manipulation was effective. First, we analyzed self-reported state anxiety as measured by the STAI-S immediately after the pressure manipulation and before the study/quiz tasks. We performed an ANCOVA including scores on the STAI-T as a covariate to control for any differences in trait anxiety. There was a significant effect of Study/Quiz condition on STAI-S scores, F(2, 57) = 11.81, ηp² = .29. The high-stakes group reported higher state anxiety (M = 36.09, SD = 7.91) than did the low-stakes group [M = 30.82, SD = 7.29, F(1, 38) = 16.81, ηp² = .31] and the rereading group [M = 30.91, SD = 9.11, F(1, 37) = 20.30, ηp² = .35]. The low-stakes and rereading groups did not differ, F < 1. Participants in the Study/Quiz groups did not differ significantly on trait anxiety measures based on the TAI or STAI-T (Fs < 1.51). Second, we assessed self-reported pressure experienced during the study/quiz activities, averaged across the four texts. An ANCOVA controlling for STAI-T levels revealed a main effect of Study/Quiz condition, F(2, 58) = 11.05, ηp² = .28. Self-reported pressure was higher in the high-stakes (M = 2.36, SD = 0.74) than in the low-stakes [M = 1.89, SD = 0.58, F(1, 38) = 7.62, ηp² = .17] and rereading group [M = 1.60, SD = 0.45, F(1, 37) = 18.44, ηp² = .33]. Pressure in the low-stakes group was higher than the rereading group [F(1, 38) = 4.15, ηp² = .10], suggesting that participants may have felt some pressure simply from taking a quiz, although this pressure was significantly higher when the quiz was high-stakes rather than low-stakes.
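The partial eta squared values reported with these F ratios can be cross-checked from each F value and its degrees of freedom. The sketch below is not the authors' analysis code; it assumes the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error), and the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses eta_p^2 = SS_effect / (SS_effect + SS_error), rewritten in terms
    of F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F ratios and degrees of freedom taken from the manipulation checks above.
reported = [
    ("STAI-S, omnibus", 11.81, 2, 57),
    ("STAI-S, high- vs. low-stakes", 16.81, 1, 38),
    ("STAI-S, high-stakes vs. rereading", 20.30, 1, 37),
    ("Pressure, omnibus", 11.05, 2, 58),
]
for label, f_value, df_effect, df_error in reported:
    print(f"{label}: {partial_eta_squared(f_value, df_effect, df_error):.2f}")
```

Rounded to two decimals, each recomputed value agrees with the effect size reported in the text.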

Quiz performance

The retrieval disruption hypothesis predicts that pressure should reduce quiz performance in the high-stakes as compared with the low-stakes condition. The retrieval motivation hypothesis predicts that pressure should increase quiz performance in the high-stakes as compared with the low-stakes condition. The learning disruption hypothesis, in contrast, makes predictions only for final test performance and not for quiz performance. We began by calculating the number of words in the recalls as a measure of recall length. Inconsistent with the two hypotheses, participants in the low-stakes group provided recalls of similar length (M = 100.12, SD = 27.38) to the high-stakes group (M = 96.18, SD = 26.54), t < 1. We also analyzed the recalls for the quality of explanation presented in each protocol, based on coding by Hinze and colleagues (2013; see also Lehman, Schraw, McCrudden, & Hartley, 2007). This measure of recall quality has been shown to reliably predict learning from quizzes independent of measures of recall length (Hinze et al., 2013). The coding was calculated on a scale from 1 (Very Poor) to 5 (Very Good). Again inconsistent with the two hypotheses, we observed little difference in recall quality scores between the low-stakes (M = 2.91, SD = 0.87) and high-stakes (M = 2.86, SD = 0.61) groups, t < 1.

Final test performance

Table 2 shows means and standard deviations for final test performance on detail, inference, and open-ended application items. We conducted a 3 × 3 ANOVA including the effects of Study/Quiz condition (reread, low-stakes quiz, high-stakes quiz) and Test Item Type (detail, inference, application). We observed a main effect of Test Item Type, F(2, 116) = 57.96, ηp² = .50, with performance on open-ended application items lower than performance on multiple choice detail or inference items. More importantly, there was also a main effect of Study/Quiz condition [F(2, 58) = 4.41, ηp² = .13], moderated by an interaction with Test Item Type [F(4, 116) = 2.90, ηp² = .09]. To examine this interaction, we conducted systematic follow-up tests separately for the three final test outcomes, focused on two hypothesized effects: the testing effect, for which the low-stakes quiz group should outperform the rereading group, and the learning disruption hypothesis, for which the high-stakes quiz group should perform worse than the low-stakes quiz group, even though earlier quiz performance across these groups did not differ. Finally, we compared performance in the high-stakes quiz and rereading groups to determine the extent to which performance pressure reduced or eliminated the benefits of retrieval practice.

We found evidence for a testing effect depending on the type of final test. The low-stakes quiz group performed significantly better than the rereading group on application items [t(39) = 2.10, d = 0.65] and marginally better on inference items [t(39) = 1.72, p = .09, d = 0.53]. No differences emerged with respect to performance on detail items [t(39) = −0.71], likely due to high performance in both the rereading and low-stakes quiz groups.

Critically for this experiment, there was uniform evidence for the learning disruption hypothesis across item types. The high-stakes quiz group performed worse than the low-stakes quiz group on detail items [t(39) = 2.11, d = 0.66], inference items [t(39) = 2.73, d = 0.85], and application items [t(39) = 2.97, d = 0.92]. Additionally, the high-stakes quiz group scored numerically lower than rereading on all item types, although this difference only reached significance for detail items [t(38) = 2.65, d = 0.84; all other ts < 1]. The relatively low performance of participants in the high-stakes quiz condition is consistent with the learning disruption hypothesis. As a further test of this hypothesis, in two ANCOVAs, we compared overall final test performance for low-stakes and high-stakes groups after controlling for quiz recall length and recall quality. These analyses demonstrated a strong effect of performance pressure on delayed final test performance; participants performed more poorly on the final test after high-stakes as compared with low-stakes quizzes independent of any differences in recall length [F(1, 38) = 10.65, ηp² = .22] or recall quality [F(1, 37) = 10.57, ηp² = .22].

In sum, performance pressure during quizzes had a generally negative influence on long-term retention and understanding, and this effect was apparently not due to any influences of pressure on previous quiz performance. Pressure specifically reduced the traditionally observed benefits of retrieval practice for performance on inference and application questions, and actually led to even worse performance than that of participants who merely reread on detail items.
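The Cohen's d values attached to the between-group t-tests above can be approximated from each t statistic and the two group sizes. The sketch below is an illustration rather than the authors' computation: it assumes the common conversion d = t × sqrt(1/n1 + 1/n2) for independent groups, and it assumes group sizes of 21 (low-stakes), 20 (rereading), and 20 (high-stakes), which are inferred from the reported degrees of freedom and are not stated directly in the text. The function name `cohens_d_from_t` is hypothetical.

```python
import math


def cohens_d_from_t(t_value, n1, n2):
    """Approximate Cohen's d from an independent-samples t statistic."""
    return t_value * math.sqrt(1.0 / n1 + 1.0 / n2)


# Assumed group sizes, inferred from the reported degrees of freedom.
N_LOW, N_REREAD, N_HIGH = 21, 20, 20

comparisons = [
    ("low-stakes vs. rereading, application", 2.10, N_LOW, N_REREAD),
    ("low-stakes vs. high-stakes, detail", 2.11, N_LOW, N_HIGH),
    ("low-stakes vs. high-stakes, inference", 2.73, N_LOW, N_HIGH),
    ("rereading vs. high-stakes, detail", 2.65, N_REREAD, N_HIGH),
]
for label, t_value, n1, n2 in comparisons:
    print(f"{label}: d is approximately {cohens_d_from_t(t_value, n1, n2):.2f}")
```

Under these assumptions the recomputed values fall within about 0.01 of the reported effect sizes; the small discrepancies are consistent with rounding of the published t statistics.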

Table 2. Final test performance based on study/quiz activities and type of final test item in Experiment 1

Condition          MC-detail items   MC-inference items   Application items
Rereading          0.87 (0.12)       0.81 (0.16)          0.56 (0.27)
Low-stakes quiz    0.84 (0.12)       0.88 (0.10)          0.73 (0.18)
High-stakes quiz   0.75 (0.17)       0.76 (0.17)          0.54 (0.22)

Note: Numbers in parentheses are standard deviations. MC, multiple choice.

Copyright © 2014 John Wiley & Sons, Ltd. Appl. Cognit. Psychol. 28: 597–606 (2014)

Metacognitive judgments

We analyzed participants’ metacognitive judgments of their knowledge, interest, and JOLs after initially reading the texts and after study/quiz tasks (Table 3). We found no differences based on Study/Quiz condition for prior knowledge (F < 1), suggesting that the groups were similarly familiar with the content before the intervention. Next, we analyzed interest ratings using a mixed ANOVA with Time as a within-participant variable (post-reading, post-study/quiz) and Study/Quiz condition as a between-participant variable. Overall, there was a main effect of Time as interest ratings decreased after rereading or quizzing, as compared with post-reading [F(1, 58) = 39.49, ηp² = .41], with no main effect of Study/Quiz condition (F < 1). There was a significant interaction between Study/Quiz condition and Time [F(2, 58) = 7.23, ηp² = .20]. Pairwise t-tests revealed significant decreases in interest ratings after rereading [t(19) = 6.00, d = 0.87] and high-stakes quizzes [t(19) = 2.83, d = 0.40] but not after low-stakes quizzes [t(20) = 1.60, p = .13, d = 0.17]. Thus, participants generally became less interested in the materials with practice, but low-stakes quizzes held their interest most effectively. Finally, for JOLs, a mixed ANOVA assessed the effects of Study/Quiz condition and Time. There was no main effect of Study/Quiz condition [F(2, 58) = 1.32, p = .28] or Time (F < 1), but the interaction between these variables was significant [F(2, 58) = 12.99, ηp² = .31]. Pairwise t-tests indicated JOLs significantly increased after rereading [t(19) = 5.89, d = 0.89], stayed relatively stable after low-stakes quizzes [t(20) = 1.23, p = .23], and marginally decreased after high-stakes quizzes [t(19) = 1.96, p = .07, d = 0.39]. These data are consistent with reports that participants overestimate learning from rereading relative to low-stakes quizzes (Karpicke, Butler, & Roediger, 2009) while also indicating that participants display some sensitivity to the difficulties of high-stakes quizzes.

EXPERIMENT 2

The results of Experiment 1 represent the first demonstration that we are aware of showing that high-stakes retrieval practice is less effective than low-stakes retrieval practice. Experiment 2 attempted to replicate these findings while also eliminating several alternative explanations for the effects. One concern with the previous procedure is that participants in the high-stakes quiz condition were not relieved of performance pressure until immediately before the final test. Thus, it is possible that participants (i) sustained performance-related anxiety during the delay and/or (ii) responded negatively to the revelation of deception provided immediately before the final test. These considerations are problematic given that the learning disruption account relies on differences in pressure experienced specifically during the quizzes. In Experiment 2, participants in the high-stakes quiz condition were relieved of performance pressure immediately after completing the quizzes, with the goal of eliminating sustained anxiety or responses to in-the-moment deception.

A second concern is that the amount or quality of retrieval practice may have been constrained given that participants were cued to recall the content of individual paragraphs on the quizzes. That is, it is possible that participants in the low-stakes quiz condition were able to respond with longer recalls but did not, given the specific requirement to respond to cues. Or these participants could have used a strategy to structure their responses more coherently, which could have led to higher recall quality scores, but were unable to do so given the quiz requirements. These possibilities are problematic if the quiz constraints prevented real group differences from emerging in our assessments of recall length or quality. To address this concern, we changed the format of the quizzes from a cued recall task to a free recall task. This modification afforded participants more freedom to choose their own retrieval strategies (see Hinze et al., 2013), potentially allowing for the observation of any differences in quiz performance that were not apparent in the cued format from Experiment 1. One possible outcome of this manipulation is that participants in the low-stakes quiz condition, relative to the high-stakes condition, might organize their free recall more effectively (Zaromb & Roediger, 2010) and/or engage in more elaborative retrieval practice (Carpenter, 2011), which would be evidenced by higher recall quality scores (Hinze et al., 2013).

Method

Participants

Fifty-seven Northwestern undergraduates (41 female; Mage = 19.94, SDage = 1.96) completed the experiment, none of whom participated in Experiment 1. As with Experiment 1, participants from this selective university were generally high-achieving, with high self-reported ACT scores (n = 38, M = 32.24, SD = 0.32) and grade point average (n = 49, M = 3.55 out of 4, SD = 0.31), where available. One participant’s metacognitive judgment responses were unavailable because of a computer error.

Materials and procedure

The materials and procedure were identical to Experiment 1 with the following exceptions. First, we did not include a rereading group given specific interest in comparing the high-stakes and low-stakes quiz groups. Second, the topic-sentence cues provided in the quiz packets were removed to create open-ended free recall quizzes. Finally, participants in the high-stakes quiz condition were informed immediately after completing the final quiz that they would earn the $5 bonus regardless of performance. We also assessed working memory capacity (WMC) using the automated version of the Operation Span task (Unsworth, Heitz, Schrock, & Engle, 2005), presented at the end of the procedure in the final test session. We did not observe any effects of WMC on quiz or final test performance. However, this may have been due to a significantly skewed distribution of scores, with several participants committing few, if any, errors. As such, we do not discuss this measure or the null findings associated with it further.

Table 3. Mean metacognitive appraisals before and after study/quiz activities in Experiment 1

                  Mean prior knowledge (1–4)        Mean interest (1–4)               Mean judgment of learning (1–10)
Condition         Post-reading     Post-study/quiz  Post-reading     Post-study/quiz  Post-reading     Post-study/quiz
Rereading         2.69 (0.98)      NA               2.80 (0.55)      2.25 (0.71)      7.50 (0.89)      8.38 (1.07)
Low-stakes quiz   2.85 (0.98)      NA               2.76 (0.70)      2.64 (0.71)      7.55 (1.25)      7.33 (1.53)
High-stakes quiz  2.93 (0.77)      NA               2.80 (0.64)      2.56 (0.66)      7.66 (1.05)      7.13 (1.60)
Mean              2.82 (0.91)                       2.79 (0.62)      2.49 (0.70)      7.57 (1.06)      7.61 (1.50)

Design

Performance Pressure Condition (low-stakes quiz, high-stakes quiz) varied between-participants. Final Test Item Type (detail, inference, application) varied within-participants.

Results and discussion

No significant effects of order were observed for text presentation or final test, so all results are presented collapsed across order.

Manipulation check

We conducted two manipulation checks to determine whether the performance pressure manipulation increased self-reported anxiety (STAI-S) and pressure, after controlling for trait anxiety (STAI-T). For state anxiety, the high-stakes group reported numerically higher levels of state anxiety (M = 32.71, SD = 9.89) than the low-stakes group (M = 30.08, SD = 7.27), but this effect did not reach significance [F(1, 54) = 2.20, p = .14, ηp² = .04]. However, there was a main effect of Performance Pressure Condition on self-reported pressure [F(1, 53) = 8.20, ηp² = .13]. Self-reported pressure was higher in the high-stakes group (M = 2.23, SD = 0.65) than the low-stakes group (M = 1.79, SD = 0.51). This provides some evidence that the manipulation of performance pressure was effective. While it was expected that performance pressure would influence both STAI-S scores and self-reported pressure, these measures were taken at slightly different time points. STAI-S was assessed immediately after the instructions regarding the stakes of the quizzes, while self-reported pressure was assessed immediately after completing the quizzes. Thus, self-reported pressure provides a more proximal assessment of responses to high-stakes and low-stakes quizzes. Performance Pressure groups did not differ significantly on trait anxiety measures based on the TAI or STAI-T (ts < 1.04).

Quiz performance

As with Experiment 1, we assessed recall length and quality to determine whether performance pressure had any disruptive or motivational effects on retrieval practice. The high-stakes quiz group provided numerically longer recalls (M = 126.16, SD = 27.23) than did the low-stakes quiz group (M = 116.88, SD = 23.10), although this difference was not significant, t(55) = 1.38, p = .17. Scores of recall quality were also similar in the high-stakes (M = 3.35, SD = 0.63) and low-stakes (M = 3.22, SD = 0.53) groups, t < 1. These findings are inconsistent with the retrieval disruption and retrieval motivation hypotheses as participants performed similarly under high-stakes and low-stakes conditions.

Final test performance

Table 4 shows means and standard deviations for final test performance on detail, inference, and application tests. We conducted a 2 × 3 ANOVA to examine the effects of Performance Pressure Condition (low-stakes quiz, high-stakes quiz) and Test Item Type (detail, inference, application). Overall, we observed a main effect of Test Item Type [F(2, 110) = 48.66, ηp² = .47], with open-ended application performance lower than performance on multiple choice detail or inference items. More importantly, there was a significant main effect of Performance Pressure Condition [F(1, 55) = 4.32, ηp² = .07], with no interaction between Test Item Type and Performance Pressure Condition (F < 1). This main effect was characterized by the high-stakes quiz group overall performing worse than the low-stakes quiz group. In combination with the equivalent performance of these groups on their quizzes, the results are consistent with the learning disruption hypothesis. As a further test of this hypothesis, as in Experiment 1, we controlled for quiz recall length and recall quality in two ANCOVAs. These analyses revealed a strong effect of Performance Pressure Condition on final test performance; participants performed more poorly on the final test after high-stakes as compared with low-stakes quizzes after controlling for any differences in quiz recall length [F(1, 54) = 11.56, p = .001, ηp² = .18] or recall quality [F(1, 54) = 7.79, p = .007, ηp² = .13]. In sum, as in Experiment 1, high-stakes pressure during quizzes had a negative influence on participants’ long-term retention and understanding of scientific content. And as before, these results were not dependent upon the effects of performance pressure on quiz performance.
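Partial eta squared can be recovered from an F ratio and its degrees of freedom through the identity ηp² = (F × df_effect) / (F × df_effect + df_error). The minimal sketch below applies that identity to the final test statistics reported for Experiment 2 as a consistency check. The function name and the layout of the input tuples are illustrative choices, and the sketch is not the analysis code used in the study.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# (label, F, df_effect, df_error) as reported for Experiment 2
reported = [
    ("Test Item Type", 48.66, 2, 110),
    ("Performance Pressure Condition", 4.32, 1, 55),
    ("Pressure, controlling for recall length", 11.56, 1, 54),
    ("Pressure, controlling for recall quality", 7.79, 1, 54),
]

for label, f_value, df_effect, df_error in reported:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared ~= {eta:.2f}")
```

The same function can be applied to the other F tests in the paper, given each test's effect and error degrees of freedom.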

Table 4. Final test performance based on study/quiz activities and type of final test item in Experiment 2

Condition         MC-detail items  MC-inference items  Application items
Low-stakes quiz   0.87 (0.09)      0.86 (0.08)         0.69 (0.21)
High-stakes quiz  0.82 (0.12)      0.81 (0.15)         0.59 (0.24)

Note: Numbers in parentheses are standard deviations. MC, multiple choice.

Metacognitive judgments

Means for participants’ metacognitive judgments appear in Table 5. With regard to prior knowledge, we found no differences based on Performance Pressure Condition (t < 1), suggesting that the groups were similarly familiar with the content prior to reading. Next, we analyzed interest ratings using a mixed ANOVA with Time of rating as a within-participant variable (post-reading, post-quiz) and Performance Pressure Condition as a between-participant variable. Overall, a main effect of Time showed interest ratings decreased significantly after the quiz activity as compared with post-reading [F(1, 54) = 4.24, ηp² = .07]. There was no main effect of Performance Pressure Condition (F < 1). Unlike Experiment 1, there was no interaction between Performance Pressure Condition and Time (F < 1). Participants did show decreased interest after practice tasks, but unlike before, low-stakes quizzes did not hold participants’ interest more than did high-stakes quizzes.

Finally, we analyzed participants’ JOLs after reading and quizzes to determine whether they were aware of the costs and benefits of high-stakes and low-stakes quizzing. There was no significant main effect of Performance Pressure Condition [F(1, 54) = 2.33, p = .13] but a main effect of Time [F(1, 54) = 5.28, ηp² = .09]. JOLs were lower after the quizzes than after initial reading. The interaction between Time and Performance Pressure Condition was not significant (F < 1). These data, then, did not replicate the finding that learners’ JOLs are sensitive to the difficulties of high-stakes, relative to low-stakes, quizzing.

GENERAL DISCUSSION

In two experiments, manipulations of performance pressure during quizzing resulted in detrimental effects on long-term transfer test performance, relative to quizzing conducted without such pressure. These effects emerged despite preserved performance on the quizzes, suggesting that pressure did not substantially disrupt (or support) participants’ ability to retrieve content from memory, despite influencing the long-term learning gains associated with retrieval practice. And the effects were not due to existing differences in trait test anxiety, influencing performance aggregated across characteristically test-anxious and not test-anxious students. We observed some evidence that participants were aware of the difficulties associated with performance pressure, as indicated by relatively high levels of self-reported state anxiety and pressure. In Experiment 1, but not Experiment 2, the pressure manipulation was also related to low ratings of interest and JOLs, suggesting that high-stakes quizzes may have a negative influence on learners’ attitudes toward the materials.

The effects of the pressure manipulation on retention were substantial and replicated, suggesting the need for determining the practical implications of these findings. To consider the effect descriptively, we converted participants’ total scores into grade-level equivalents (‘A’ = 89.5% to 100%, etc.). Across experiments, the most striking finding was that participants in the low-stakes quiz groups rarely scored in the ‘D’ (n = 4 out of 49) or ‘F’ range (n = 1 out of 49). But participants in the high-stakes quiz groups frequently provided scores in these ranges (n = 11 out of 49 for each of the ‘D’ and ‘F’ ranges). In total, 44% of the participants in the high-stakes quiz groups scored the equivalent of a ‘D’ or an ‘F’, compared with only 10% from the low-stakes quiz groups. We note that 9 out of 20 participants in the rereading condition from Experiment 1 (45%) also scored the equivalent of a ‘D’ or an ‘F’. These findings suggest that the rereading and high-stakes quiz conditions could have resulted in problematic assessment outcomes had the performance on these final tests been a crucial consideration for students’ grades. In light of the current findings, it appears appropriate that calls for implementations of retrieval practice in classrooms emphasize low-stakes rather than high-stakes quizzes (e.g., McDaniel et al., 2011).
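The percentages in this descriptive comparison follow directly from the counts reported above. As a minimal sketch (function and variable names are illustrative, not from the original analysis), the arithmetic can be reproduced as follows. Note that 22 of 49 is approximately 44.9%, so the 44% reported for the high-stakes groups may reflect truncation rather than rounding; that reading is an assumption.

```python
def percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Counts of 'D' and 'F' grade equivalents as reported in the text.
low_stakes_d_or_f = percent(4 + 1, 49)     # low-stakes quiz groups, both experiments
high_stakes_d_or_f = percent(11 + 11, 49)  # high-stakes quiz groups, both experiments
rereading_d_or_f = percent(9, 20)          # rereading group, Experiment 1 only

print(f"Low-stakes quiz:  {low_stakes_d_or_f:.1f}% scored 'D' or 'F'")
print(f"High-stakes quiz: {high_stakes_d_or_f:.1f}% scored 'D' or 'F'")
print(f"Rereading:        {rereading_d_or_f:.1f}% scored 'D' or 'F'")
```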

The observed differences in performance emerged precisely from the processing demands associated with the quiz activities. Performance pressure is hypothesized to reduce the effective allocation or recruitment of executive processing resources, similar to findings from divided-attention manipulations (see Eysenck et al., 2007). In the current project, performance was not affected during quizzes, as has been similarly shown in other divided-attention studies involving initial retrieval success accompanied by disruption of secondary tasks (Craik et al., 1996; Naveh-Benjamin, Craik, Gavrilescu, & Anderson, 2000; cf. Rohrer & Pashler, 2003). The disruption of executive processes during retrieval instead exhibited downstream effects on long-term retention, which it turns out is also in line with previous divided-attention projects. For example, delayed recognition and source memory performance is worse for information quizzed under divided attention as compared with full attention conditions (Dudukovic, DuBrow, & Wagner, 2009). These convergent findings indicate that the adaptive application of attentional resources in response to demands during retrieval might nonetheless be associated with comprehension decrements that subsequently emerge over time.

The pattern of differences observed during quiz and final test performance proves critical for interpreting the current findings. The low-stakes as compared with the high-stakes groups demonstrated more robust retention and understanding of quizzed information, consistent with the learning disruption hypothesis. But high-stakes and low-stakes groups performed similarly on quizzes, providing little in the way of evidence for the retrieval disruption or retrieval motivation hypotheses. While participants under performance pressure were able to immediately retrieve and produce content on quizzes, this retrieval was less effective for long-term retention after high-stakes pressure potentially because other executive control processes were not available. Indeed, executive control processes have been identified as critical for the kinds of effortful and elaborative processing associated with effective learning experiences. Participants might engage these processes with the goal of (re)organizing their responses, differentiating and integrating information across texts and text segments, constructing inferences based on the text and prior knowledge, and making metacognitive judgments during learning. Explicit learning is dependent on these control processes (e.g., Unsworth & Engle, 2005), and learning from retrieval practice seems no exception. Emerging accounts of retrieval practice effects have reinforced the notion that memorial benefits arise from effortful processing of the target content to be retrieved (Carpenter, 2009; Johnson & Mayer, 2009; Pyc & Rawson, 2009). Under conditions that recruit or disturb the allocation of executive resources, such as divided attention, stress, pressure, and anxiety, the benefits of retrieval practice may thus be reduced.

Table 5. Metacognitive appraisals before and after study/quiz activities in Experiment 2

                  Mean prior knowledge (1–4)  Mean interest (1–4)         Mean judgments of learning (1–10)
Condition         Post-reading  Post-quiz     Post-reading  Post-quiz     Post-reading  Post-quiz
Low-stakes quiz   2.55 (0.87)   NA            2.76 (0.71)   2.61 (0.64)   7.89 (1.08)   7.72 (1.14)
High-stakes quiz  2.51 (0.84)   NA            2.70 (0.51)   2.63 (0.68)   7.43 (1.12)   7.20 (1.56)
Mean              2.52 (0.85)                 2.73 (0.61)   2.62 (0.65)   7.65 (1.11)   7.45 (1.39)

One limitation of the current study is that we were unable to identify the specific differences in processing during quizzes that account for the observed learning disruption. Our data indicate that the quality of the responses does not mediate the effect, suggesting that performance pressure did not prevent participants from forming a coherent, elaborated response (cf. Carpenter, 2011; Hinze et al., 2013). But it is possible that performance pressure disrupted elaboration by limiting the activation of related semantic knowledge that could be used as a retrieval cue (Carpenter, 2009; Pyc & Rawson, 2009) even if this did not manifest as differences in response quality. Future research using other paradigms may be needed to further test whether performance pressure influences elaborative retrieval. For example, the influence of pressure may be different for quizzes formatted as deep comprehension questions (Johnson & Mayer, 2009) or explanation prompts (Hinze et al., 2013). Additionally, it is possible that performance pressure disrupts the re-encoding of information following retrieval (Wing et al., 2013) or disrupts the refinement and differentiation of memory representations (see Karpicke & Smith, 2012). The current results do not allow us to differentiate between these or other accounts of retrieval practice benefits. However, the results are consistent with the idea that learning from retrieval practice is dependent on more than just the production or generation of content during retrieval and that evidence for differences in learning from quizzes may emerge only with a delay.

A second limitation is that the current experiments restricted the manipulation of performance pressure to quizzes. Final tests served only as assessments of long-term retention and comprehension, and were low-stakes to allow a relatively pure assessment. It is certainly the case that final tests are often high-stakes rather than low-stakes, and future studies would be needed to explore performance pressure at multiple time and testing points. The current data speak only to the role of performance pressure on learning from quizzes, not to the influence of test-stakes on summative assessments more generally.

While the main purpose of this study was to compare the effects of high-stakes and low-stakes quizzes, Experiment 1 also included a rereading condition, as has traditionally been employed with demonstrations of testing effects (Carrier & Pashler, 1992). Any difference in retention following retrieval practice as compared with restudying cannot simply be attributed to study time, and a testing effect represents this specific comparison. The testing effect from Experiment 1 held for some, but not all, types of final test outcomes. Hypotheses as to the differences observed between rereading and low-stakes quizzes prove informative for explaining the observed effects. The multiple choice items that tested for details relied heavily on familiarity with specific facts and key terms from the text. Given that the rereading group was allowed to restudy each text, it is possible that they experienced more exposures to key terms than did participants in the tested groups in which terms were written down only once. In fact, the null testing effect for detail items seems to be driven by very high performance in the rereading group (87%) rather than by low performance of participants in the low-stakes quiz group (84%). In addition, the quizzed groups did not receive feedback and were only allowed one attempt at retrieval. Nevertheless, the low-stakes participants showed more successful performance on application and inference items than did the rereading group, providing informative demonstrations of the power and scope of low-stakes quizzing. Combined with a growing body of work demonstrating the benefits of retrieval practice especially when combined with feedback, these results, again, empirically validate earlier calls for low-stakes quizzing (for recent reviews, see Carpenter, 2012; Karpicke & Grimaldi, 2012; Roediger & Butler, 2011).

The benefits observed for low-stakes quizzes could likely be further influenced by manipulating characteristics of the stimuli and experience, including offering repeated retrieval attempts (Karpicke & Roediger, 2007) and informative feedback (Kang et al., 2007), or by manipulating the difficulty of the materials. In fact, it might be the case that including such features in a retrieval practice experience would support subsequent performance in the face of even high-stakes quizzing conditions. Regardless of whether high-stakes conditions might outperform rereading with other types of manipulations, though, the current data indicate that high-stakes quiz conditions can lead to less optimal learning than low-stakes conditions. This reinforces the need for careful articulation and explanation of the use of quizzes in regular classroom experiences, including not just focus on the processing and comprehension consequences of repeated retrieval but also attention to the role of learners’ commonplace responses to tests (e.g., stress and anxiety), even when the quizzes are of limited consequence for grading and evaluation (Cassady, 2004; Tse & Pu, 2012). Future work would do well to consider potential individual differences, as participants may respond more or less effectively to performance pressure. While individual differences were not the target of the current study, we explored but did not observe any consistent interactions between performance pressure and trait test anxiety, self-reported prior knowledge, ratings of topic interest, or (in Experiment 2) WMC. Nevertheless, we would not be surprised to find that these or other individual difference factors moderate the effects of performance pressure in a larger-scale (e.g., classroom-based) study.

The findings from this study connect with contemporary concerns about the controversial role of high-stakes testing (Tuerk, 2005; Walton & Spencer, 2009). One recommendation derived from the current findings is that quizzes, if intended to be used as learning events rather than summative assessments, should be low-stakes to support the greatest learning benefits. This is not meant to suggest that students should receive only ungraded exams without any performance-related feedback. Rather, effective feedback can be provided that avoids encouraging focus on performance relative to peers, instead providing students with formative information to help them understand the material and update knowledge (see Bangert-Drowns, Kulik, Kulik, & Morgan, 1991; Black & Wiliam, 1998). It may also be possible to reduce performance-related anxiety and its problematic effects through manipulations such as expressive writing (Ramirez & Beilock, 2011) or by priming thoughts of competence (Lang & Lang, 2010). These manipulations may occasionally be necessary given the practical reality that some assessments are inherently high-stakes or, even when they are not, may be perceived as such by students. When considering these implications, though, we note that the laboratory-based manipulation of performance pressure in the current experiment may not align perfectly to the kinds of real-world pressure experienced during classroom or standardized tests. In fact, the pressure experienced based on a $5 prize may underestimate the pressure experienced in response to the retrieval demands of a quiz worth a substantial part of one’s grade or to a standardized college entrance exam. It is also possible that these real-world retrieval scenarios instill qualitatively different pressure on students, relative to the monetary prize offered here. These suggestions further highlight the need for careful empirical evaluations and pragmatic considerations of the impact of pressure on potential testing experiences.

In conclusion, retrieval practice is a powerful tool for supporting learning, but the practical contingencies of learning experiences, including anxiety about tests, performance, and their consequences, may influence any beneficial effects. While the current study demonstrates that individuals can overcome the effects of performance pressure during quizzing, retrieval practice was most effective for long-term learning under low-stakes as compared with high-stakes conditions. Any application of retrieval practice, or quizzing, as a targeted learning intervention would benefit from considering the amount of performance pressure induced by the task and, more generally, any extraneous demands placed on learners during retrieval practice.

ACKNOWLEDGEMENTS

We wish to thank Lauren Linzmeier, Craig Kopulsky, Jacqueline Gallo, and Benedict Dungca for their help with data collection and coding.

REFERENCES

Ashcraft, M. H., & Kirk, E. P. (2001). The relationships among workingmemory, math anxiety, and performance. Journal of Experimental Psy-chology. General, 130, 224–237. doi: 10.1037//0096-3445.130.2.224

Bangert-Drowns, R. L., Kulik, J. A., & Kulik, C.-L. C. (1991). Effects offrequent classroom testing. Journal of Educational Research, 85, 89–99.

Bangert-Drowns, R. L., Kulik, C.-L. C., Kulik, J. A., & Morgan, M. (1991).The instructional effect of feedback in test-like events. Review of Educa-tional Research, 61, 213–238. doi: 10.2307/1170535

Beilock, S. L., & Carr, T. H. (2005). When high-powered people fail: Work-ing memory and “choking under pressure” in math. PsychologicalScience, 16, 101–105. doi: 10.1111/j.0956-7976.2005.00789.x

Beilock, S. L., Kulp, C. a., Holt, L. E., & Carr, T. H. (2004). More on thefragility of performance: Choking under pressure in mathematicalproblem solving. Journal of Experimental Psychology. General, 133,584–600. doi: 10.1037/0096-3445.133.4.584

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assess-ment in Education: Principles, Policy, & Practice, 5, 7–75.

Butler, A. C. (2010). Repeated testing produces superior transfer of learningrelative to repeated studying. Journal of Experimental Psychology:Learning, Memory, and Cognition, 36, 1118–1133.

Carpenter, S. K. (2009). Cue strength as a moderator of the testing effect:The benefits of elaborative retrieval. Journal of Experimental Psychol-ogy: Learning, Memory, and Cognition, 35, 1563–1569.

Carpenter, S. K. (2011). Semantic information activated during retrievalcontributes to later retention: Support for the mediator effectiveness hy-pothesis of the testing effect. Journal of Experimental Psychology:Learning, Memory, and Cognition, 37, 1547–1552.

Carpenter, S. K. (2012). Testing enhances the transfer of learning. CurrentDirections in Psychological Science, 21, 279–283.

Carpenter, S. K., Pashler, H., & Cepeda, N. J. (2009). Using tests to enhance8th grade students’ retention of U.S. history facts. Applied Cognitive Psy-chology, 23, 760–771.

Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention.Memory and Cognition, 20, 632–642.

Cassady, J. C. (2004). The impact of cognitive test anxiety on text compre-hension and recall in the absence of external evaluative pressure. AppliedCognitive Psychology, 18, 311–325. doi: 10.1002/acp.968

Cassady, J., & Johnson, R. E. (2002). Cognitive test anxiety and academicperformance. Contemporary Educational Psychology, 27, 270–295. doi:10.1006/ceps.2001.1094

Craik, F. I. M., Govoni, R., Naveh-Benjamin, M., & Anderson, N. D.(1996). The effects of divided attention on encoding and retrieval pro-cesses in human memory. Journal of Experimental Psychology: General,125, 159–180. doi: 10.1037/0096-3445.125.2.159

Crooks, T. J. (1988). The impact of classroom evaluation practices on stu-dents. Review of Educational Research, 58, 438–481. doi: 10.3102/00346543058004438

Decaro, M. S., Thomas, R. D., Albert, N. B., & Beilock, S. L. (2011). Chok-ing under pressure: Multiple routes to skill failure. Journal of Experimen-tal Psychology. General, 140, 390–406. doi: 10.1037/a0023466

Dudukovic, N. M., DuBrow, S., &Wagner, A. D. (2009). Attention during mem-ory retrieval enhances future remembering.Memory&Cognition, 37, 953–961.

Eysenck, M. W., Derakshan, N., Santos, R., & Calvo, M. G. (2007). Anxiety and cognitive performance: Attentional control theory. Emotion, 7, 336–353. doi: 10.1037/1528-3542.7.2.336

Hayes, S., MacLeod, C., & Hammond, G. (2009). Anxiety-linked task performance: Dissociating the influence of restricted working memory capacity and increased investment of effort. Cognition & Emotion, 23, 753–781. doi: 10.1080/02699930802131078

Hembree, R. (1988). Correlates, causes, effects, and treatment of test anxiety. Review of Educational Research, 58, 47–77. doi: 10.3102/00346543058001047

Hinze, S. R., & Wiley, J. (2011). Testing the limits of testing effects using completion tests. Memory, 19, 290–304. doi: 10.1080/09658211.2011.560121

Hinze, S. R., Wiley, J., & Pellegrino, J. W. (2013). The importance of constructive comprehension processes in learning from tests. Journal of Memory and Language, 69, 151–164. doi: 10.1016/j.jml.2013.03.002

Johnson, C. I., & Mayer, R. E. (2009). A testing effect with multimedia learning. Journal of Educational Psychology, 101, 621–629. doi: 10.1037/a0015183

Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19, 528–558.

Karpicke, J. D., & Blunt, J. R. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331, 3–4.

Karpicke, J. D., Butler, A. C., & Roediger, H. L. (2009). Metacognitive strategies in student learning: Do students practice retrieval when they study on their own? Memory, 17, 471–479.


Copyright © 2014 John Wiley & Sons, Ltd. Appl. Cognit. Psychol. 28: 597–606 (2014)


Karpicke, J. D., & Grimaldi, P. J. (2012). Retrieval-based learning: A perspective for enhancing meaningful learning. Educational Psychology Review, 24, 401–418.

Karpicke, J. D., & Roediger, H. L. (2007). Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language, 57, 151–162.

Karpicke, J. D., & Smith, M. A. (2012). Separate mnemonic effects of retrieval practice and elaborative encoding. Journal of Memory and Language, 67, 17–29.

Karpicke, J. D., & Zaromb, F. M. (2010). Retrieval mode distinguishes the testing effect from the generation effect. Journal of Memory and Language, 62, 227–239.

Lang, J. W. B., & Lang, J. (2010). Priming competence diminishes the link between cognitive test anxiety and test performance. Implications for the interpretation of test scores. Psychological Science, 21, 811–819. doi: 10.1177/0956797610369492

Lehman, S., Schraw, G., McCrudden, M., & Hartley, K. (2007). Processing and recall of seductive details in scientific text. Contemporary Educational Psychology, 32, 569–587.

McDaniel, M. A., & Masson, M. E. J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 371–385.

McDaniel, M. A., Agarwal, P. K., Huelser, B. J., McDermott, K. B., & Roediger, H. L., III. (2011). Test-enhanced learning in a middle school science classroom: The effects of quiz frequency and placement. Journal of Educational Psychology, 103, 399–414.

Naveh-Benjamin, M. (1991). A comparison of training programs intended for different types of test-anxious students: Further support for an information-processing model. Journal of Educational Psychology, 83, 134–139. doi: 10.1037/0022-0663.83.1.134

Naveh-Benjamin, M., Craik, F. I. M., Gavrilescu, D., & Anderson, N. D. (2000). Asymmetry between encoding and retrieval processes: Evidence from divided attention and a calibration analysis. Memory & Cognition, 28, 965–976.

Pashler, H., Bain, P., Bottge, B., Graesser, A., Koedinger, K., McDaniel, M., & Metcalfe, J. (2007). Organizing instruction and study to improve student learning (NCER 2007–2004). Washington, DC: National Center for Education Research, Institute of Education Sciences, U.S. Department of Education. Retrieved March 16, 2014, from: http://ncer.ed.gov

Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? Journal of Memory and Language, 60, 437–447.

Ramirez, G., & Beilock, S. L. (2011). Writing about testing worries boosts exam performance in the classroom. Science, 331, 211–213. doi: 10.1126/science.1199427

Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15, 20–27.

Roediger, H. L., & Karpicke, J. D. (2006a). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255.

Roediger, H. L., & Karpicke, J. D. (2006b). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.

Rohrer, D., & Pashler, H. (2003). Concurrent task effects on memory retrieval. Psychonomic Bulletin & Review, 10, 96–103.

Sarason, I. G., & Sarason, B. R. (1990). Test anxiety. In H. Leitenberg (Ed.), Handbook of social and evaluation anxiety (pp. 475–495). New York: Plenum Press.

Schwarzer, R., & Jerusalem, M. (1992). Advances in anxiety theory: A cognitive process approach. In K. A. Hagtvet & T. B. Johnsen (Eds.), Advances in test anxiety research (Vol. 7, pp. 475–495). Swets & Zeitlinger: Lisse, The Netherlands.

Spielberger, C. D. (1980). Test anxiety inventory. Palo Alto, CA: Consulting Psychologists Press.

Spielberger, C. D., Gorsuch, R. L., & Lushene, R. E. (1988). STAI: Manual for the State Trait Anxiety Inventory (3rd edn). Palo Alto, CA: Consulting Psychologists Press.

Tobias, S. (1985). Test anxiety: Interference, defective skills, and cognitive capacity. Educational Psychologist, 20, 135–142.

Tse, C. S., & Pu, X. (2012). The effectiveness of test-enhanced learning depends on trait test anxiety and working-memory capacity. Journal of Experimental Psychology: Applied, 18, 253–264.

Tuerk, P. W. (2005). Psychology and Washington: Research in the high-stakes era: Achievement, resources, and no child left behind. Psychological Science, 16, 419–425.

Unsworth, N., & Engle, R. W. (2005). Individual differences in working memory capacity and learning: Evidence from the serial reaction time task. Memory & Cognition, 33, 213–220.

Unsworth, N., Heitz, R. P., Schrock, J. C., & Engle, R. W. (2005). An automated version of the operation span task. Behavior Research Methods, 37, 498–505.

Walton, G. M., & Spencer, S. J. (2009). Intellectual ability of negatively stereotyped students. Psychological Science, 20, 1132–1139.

Wing, E. A., Marsh, E. J., & Cabeza, R. (2013). Neural correlates of retrieval-based memory enhancement: An fMRI study of the testing effect. Neuropsychologia, 51, 2360–2370.

Zaromb, F. M., & Roediger, H. L. (2010). The testing effect in free recall is associated with enhanced organizational processes. Memory & Cognition, 38, 995–1008.

