
Pupil perceptions of national tests in science: perceived importance, invested effort, and test anxiety


Hanna Eklöf & Mikaela Nyroos

Received: 19 September 2011 / Revised: 22 March 2012 / Accepted: 12 April 2012 / Published online: 11 May 2012
© Instituto Superior de Psicologia Aplicada, Lisboa, Portugal and Springer Science+Business Media BV 2012

Abstract Although large-scale national tests have been used for many years in Swedish compulsory schools, very little is known about how pupils actually react to these tests. The question is relevant, however, as pupil reactions in the test situation may affect test performance as well as future attitudes towards assessment. The question is relevant also in light of the changing assessment culture in Sweden and other European countries. The main purpose of the present study was to explore how a sample of grade 9 pupils perceived their first encounter with national tests in science, mainly in terms of perceived importance of the test, reported invested effort, and feelings of test anxiety, and how these aspects were related to test performance. Results show that a majority of the pupils seemed to perceive the test as rather important and claimed that they spent effort on the test. There was, however, also a fair group of students who did not perceive the test as very important. Ratings of perceived importance and invested effort and motivation were positively related to performance, particularly for the boys. Many students also reported that they felt anxious before and during the test, but the relationship between test anxiety and test performance was rather weak. Findings illuminate how pupils may perceive and behave in the assessment situation, and point to the need for further studies investigating the psychology of test-taking.

Keywords Test-taking psychology · Large-scale assessment · Motivation · Anxiety · Pupil perspective

Eur J Psychol Educ (2013) 28:497–510
DOI 10.1007/s10212-012-0125-6

H. Eklöf (*) · M. Nyroos
Department of Applied Educational Science, Umeå University, 901 87 Umeå, Sweden
e-mail: [email protected]

Introduction

In any assessment situation, it is the test-taker and not the test developer, test administrator, or test user who produces the scores that any inference will be based on. It follows that the test-takers’ reactions in the assessment situation are important to acknowledge, as these reactions may have an impact on the test outcome, and also on how future assessments are perceived by the person being assessed. In Sweden, there is currently a trend towards more assessment in, as well as of, school, and more large-scale national tests have been implemented in more subjects and grades than ever before. There are also indications that the stakes of the national assessment system will rise in the coming years. Recently, the aims of the national tests have changed, and the previous emphasis on formative purposes has been replaced with an emphasis on summative and accountability purposes. How the pupils taking the tests perceive and react to this changing assessment context is, however, not known. Therefore, it seems highly important to explore how tests like the national tests are perceived by the pupils, how pupils behave in the test situation, and how these perceptions and behaviors interact with test performance. It is also of interest to investigate whether these perceptions are different for boys and girls, or for different school subjects. The purpose of the present study was to start exploring this issue, mainly by looking at two aspects of the psychology of test-taking, test motivation and test anxiety, in a sample of Swedish grade 9 pupils taking a national test in biology, physics, or chemistry.

The Swedish national assessment system

National tests have been used in Swedish compulsory schools and Swedish upper secondary schools for many years. Sweden was in fact one of the first countries to introduce them (Eurydice 2009). From the 1960s to 1994, Sweden had a norm-referenced system for assessment and grading, and standardized national tests were used in this system mainly as an instrument for grade calibration. Since 1994, Sweden has had a goal- (criterion-) referenced system for assessment and grading, and in this system, national tests are used in a number of subjects and grades. Until 2009, national tests were administered in Swedish, English, and mathematics on two occasions in compulsory school: grade 5 (optional and no English test) and grade 9 (mandatory). In the spring semester 2009, national tests in biology, physics, and chemistry were used for the very first time in the last year of compulsory school. From 2010, national tests will be mandatory in grades 3, 6, and 9, and in the coming years, national tests in the social sciences will also be implemented.

The result on the national test is not formally decisive for pupil grades, and the teacher is free to decide what weight he or she wants to put on the national test result. Hence, the tests are not high-stakes tests in a formal sense. Still, on an aggregated level, there is a strong correlation between national test results and pupil grades (Swedish National Agency for Education 2007), schools get publicly criticized if the final grades differ too much from the national test results, and the general impression is that the national tests are perceived as high stakes by pupils and teachers. Also, as noted, very recently the official purposes of the national tests have changed, and the tests now have a more explicit role as a tool for grading and as a tool for evaluating school practice.

The psychology of test-taking and assessment validity

Without sufficient knowledge, pupils cannot perform well on achievement tests. Without sufficient motivation, pupils may choose not to do their best even if they possess the necessary knowledge. With a high level of test anxiety, on the other hand, pupils might be unable to demonstrate their true proficiency level even if they possess sufficient knowledge and sufficient motivation to do their best. Thus, psychological aspects of test-taking might lead to a test result that is a poor estimate of the pupils’ actual level of knowledge.

Assessments of knowledge in the educational system in general aim at generating information about individuals or groups of individuals, information that can then be used for different purposes. That the information is used for one purpose or the other in different contexts and with different consequences makes the issue of validity, that is, the quality, relevance, and meaning of the assessment, very important to consider (Benson 1998; Messick 1995). Two major threats to validity are construct-irrelevant variance and construct underrepresentation. Construct-irrelevant variance means that the obtained results are a result of more than the variable of interest; that the result is contaminated by other, irrelevant variables and that we end up measuring more than we think we are measuring. Construct underrepresentation means that the measure fails to capture the entire construct of interest, but only parts of it. If psychological aspects in the assessment situation, such as motivation or anxiety, affect performance and the impact of these variables is unknown, this would be a case of construct-irrelevant variance, and the subsequent interpretation and possibly also the use of assessment results will not be valid.

Still, we actually do not know very much about how pupils react and behave in different assessment situations and how these reactions and behaviors interact with achievement and with future attitudes towards school, different school subjects, and assessment in school. The pupil perspective is often forgotten, although it is the pupils who are struggling with the tests in school, and the pupils who often suffer the consequences of a good or a poor result. As the 2009 administration was the pupils’ very first encounter with national tests in science, and as no studies of this kind have been performed before in the Swedish national test context, it seems important to learn how the pupils reacted to these tests.

Previous research on test-taking motivation, test anxiety, and test performance

Assessing the knowledge and proficiency of individuals is a complex process. Many variables are involved, there are many potential threats to assessment validity, and a number of examinee characteristics could be relevant to consider in the assessment situation. Studies have, however, shown that test-taking motivation, in terms of reported invested effort, and test anxiety are among the most influential (Eccles and Wigfield 2002; Zeidner 1998). For example, in a study in the PISA (Programme for International Student Assessment) context including a large number of motivational variables, self-reported effort and worry were the most powerful predictors of test achievement, together explaining 28% of the variance (Baumert and Demmrich 2001). In the Swedish TIMSS Advanced context, reported effort was one of rather few significant predictors of test performance, with an effect corresponding to that of socio-economic status or language spoken at home (Swedish National Agency for Education 2009).

Several studies have shown that reported motivation and effort vary with the stakes of the test as well as with test format, and that a low level of motivation to spend effort on a test is detrimental to test performance (Brown and Wahlberg 1993; Sundre and Kitsantas 2004; Wise and DeMars 2005), also when ability is controlled for (Thelk et al. 2009).

A high degree of test anxiety has also been shown to be detrimental to test performance in a large number of studies (cf. Gumora and Arsenio 2002; Owens et al. 2008). Not surprisingly, motivation and anxiety have been found to co-vary. When a test is perceived as very important and motivation is high, anxiety also tends to be elevated. Further, motivation and anxiety have been shown to have opposite effects on performance, and the optimal state of mind for an optimal performance is a high level of motivation but a low level of anxiety (Wolf and Smith 1995). Research has further shown that the influence of test anxiety is most pronounced when the testing situation is competitive (Wolf and Smith 1995) and in assessment situations with high intrinsic value or importance (Covington and Omelich 1987). However, test anxiety seems to vary as a function of age, gender, and achievement level. Older subjects, compared to younger ones, tend to score higher on test anxiety scales (Zeidner 1998), females report higher values of test anxiety than males (Chapell et al. 2005; Inzlicht and Ben-Zeev 2000), and low achievers are more affected by test anxiety (Birenbaum and Gutvirtz 1993; Wigfield and Eccles 1989). School subject is probably also a variable influencing experienced levels of test anxiety. For example, mathematics stands out as a very stressful subject, while English and history seem less stressful (Boekaerts et al. 2003).

By assessing motivation, perceived importance of the test, reported level of effort as well as reported feelings of test anxiety in connection with the first administration of the national tests in science in grade 9, we can get a first impression of how the pupils perceive these tests and how these test-psychological aspects are related to test performance. Hence, the purpose of the present study was to explore the following questions: Did the pupils perceive the national test as an important test? Did the pupils feel motivated to do their best on the test? Were the pupils nervous and anxious about failing when taking the test? How were importance, effort, and anxiety related to each other and to performance? Do patterns look the same for boys and girls and for different school subjects?

Method

Participants

The sample consisted of 1,189 students in the ninth grade (15–16 years old), 612 girls (51.5%) and 577 boys (48.5%). The sample was a random sample of Swedish schools with pupils in the ninth grade. A total of 27 different schools participated in the study. The sampling of schools was done by the Swedish National Agency for Education. All students in the sample took a national test in either biology (n = 342), physics (n = 363), or chemistry (n = 484).

Measures

The questionnaire

The questionnaire used in the present study is based on the propositions of the expectancy-value theory of achievement motivation (Atkinson 1957; Eccles and Wigfield 2002; Pintrich and De Groot 1990). Although often applied to general achievement motivation, the expectancy-value model is valid also for the specific test-taking motivation construct, as shown by previous studies (Eklöf 2006; Sundre and Moore 2002; Thelk et al. 2009). The value component in the model, in terms of perceived importance and invested effort, is the component that has most often been stressed in research on test-taking motivation, but some have also suggested that an affective component in terms of test anxiety should be included in the task-specific expectancy-value model (Pintrich and De Groot 1990; Wolf and Smith 1995).

The questionnaire accordingly contained items that asked for perceived importance of the test, motivation, invested effort, and test anxiety. The items asking for perceived importance, motivation, and invested effort were mainly adapted from the Student Opinion Scale (SOS) (see Sundre and Moore 2002). The SOS is a ten-item scale with two five-item subscales (Importance and Effort). The SOS has been used in the USA for a number of years and in a large number of studies, and there is ample support for the reliability, validity, and usefulness of the scale (cf. Thelk et al. 2009). It should be noted, however, that the measure used in the present study is not identical to the SOS (see Eklöf, submitted, for a description of the development and psychometric evaluation of the present scale). The items asking for test anxiety were adapted from the Children’s Test Anxiety Scale (the CTAS) (Wren and Benson 2004). The CTAS is a well-established and validated 30-item scale that has shown satisfactory reliability and high practicality in real school settings (Zeidner 2007). The entire scale consisting of 30 items has previously been analyzed and used in a Swedish context (Nyroos et al., accepted). In the present study, a small number of items that have shown desirable properties were selected from the scale in order to obtain a brief measure of test anxiety in the grade 9 national test context. All items were measured on a four-point scale with ratings ranging from a highly unfavorable attitude to a highly favorable attitude (1 = disagree a lot, 4 = agree a lot). Pupils completed the questionnaire directly after completion of the national test.

The national test

The achievement test in the present study was the national test in either biology, physics, or chemistry that was administered to all Swedish grade 9 pupils (participating in regular schooling) for the very first time in spring 2009. The test consists of two parts. The first part is a paper-and-pencil test which contains mainly constructed-response items, but also a number of multiple-choice items. Test items are assumed to cover important curricular goals. The pupils have 150 min to complete this part of the test. The second part is a laboratory part and the allocated time for this part is 60 min. The test scores used in the present study are scores from the paper-and-pencil test only. The national tests are externally developed, but scored by the teachers. The teachers are also the ones who proctor the test administration. The stakes of the test in this particular administration could be considered as “semi-high” or possibly “varying”. From a system perspective, this first administration of the tests was regarded as a try-out. However, teachers could use the test results as a support for grading if they wanted to. Hence, it is possible that different schools and different teachers had different opinions about the importance of these national tests. From 2010, schools have to use the results from these tests as a support for grading and the stakes of the test will therefore rise in coming administrations. It can be noted that this national test is not the only national test the pupils took in the given semester. In the spring semester in grade 9, all pupils also have national tests in mathematics, Swedish, and English.

Data analysis

All data analyses were made in SPSS version 18.0 (PASW). All analyses were done for the total sample, gender-wise and subject-wise. Where the pattern of results was similar for boys and girls, and/or for the different subjects, respectively, only the results for the total sample are presented in the text. Means and standard deviations were calculated on the item and scale level, as was the percentage of students agreeing or disagreeing with each item. Possible differences in mean ratings on the scale level between boys and girls and between pupils taking national tests in different subjects were analyzed through t tests and univariate analysis of variance (ANOVA) followed by Bonferroni post hoc analysis of mean differences. Where significant differences were flagged, effect sizes (Cohen’s d) were calculated in order to estimate the magnitude of the difference between mean values. All tests of significance were two-tailed. Inter-item correlations and internal consistency reliability (Cronbach’s alpha) were calculated for the derived scales. No reliability estimates were obtained for the national tests as we did not have access to raw data but only the test scores.
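As an illustration of the scale construction described above, the following minimal Python sketch shows how negatively worded items on a four-point scale can be reverse-scored and how Cronbach's alpha can be computed from item-level ratings. It is not the authors' SPSS procedure; the function names `reverse_code` and `cronbach_alpha` and the ratings in `raw` are hypothetical and chosen only for illustration.

```python
from statistics import variance


def reverse_code(rating, low=1, high=4):
    """Reverse a rating on a low..high scale (1 <-> 4, 2 <-> 3 on a four-point scale)."""
    return low + high - rating


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, one row of item ratings per pupil."""
    k = len(responses[0])  # number of items in the scale
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (NOT data from the study): 5 pupils x 4 items,
# where the last item is negatively worded and must be reversed first.
raw = [
    [4, 4, 3, 1],
    [3, 3, 3, 2],
    [2, 2, 1, 4],
    [4, 3, 4, 1],
    [1, 2, 2, 3],
]
scored = [row[:3] + [reverse_code(row[3])] for row in raw]
alpha = cronbach_alpha(scored)
print(round(alpha, 2))
```

Reversing before computing alpha matters: a negatively worded item left unreversed correlates negatively with the other items and pulls the coefficient down.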


Results and analysis

The results section is structured as follows: First, descriptive data on student responses on the item level are presented and briefly discussed. With some exceptions, results on the item level are presented for the total sample only. There were some statistically significant differences between groups, but effect sizes were generally small and the practical relevance of these differences was judged to be negligible. Scale means and reliabilities for the Importance, Effort, and Anxiety scales are reported together with tests of differences of scale means (for the total sample, boys and girls, respectively). Second, relationships between perceived importance of the test, reported invested effort, and test anxiety are presented, followed by the relationships between ratings on these scales and test performance. This is done for the total sample, for boys and girls, and for the different subjects, respectively. Where relevant, pupil ratings are compared with student ratings from a study of test-taking motivation performed in the low-stakes TIMSS (Trends in International Mathematics and Science Study) Advanced context (see Eklöf 2010).

Perceived importance of the national test

Whether an individual perceives a test as important or not is theoretically assumed, and has been empirically demonstrated, to be related to the individual’s motivation to spend effort on the test (Thelk et al. 2009). Increasing the stakes on tests may, however, also have a negative influence on the testing situation (Firestone et al. 1998) and on feelings of test anxiety. In Table 1, results from five items asking whether the pupils perceived the national test as an important test are presented (Items I1–I5).

About half the pupils reported that the national test was an important test to them, while the other half reported that the national test was not very important to them. A majority of pupils, however, thought that it was important to get a good result on the national test. A majority of the pupils further reported that they were curious about the score they received on the test and that they cared about their result on the test. Still, few pupils seem to have perceived the national tests as very high-stakes, and there is a fair group of pupils who do not seem to perceive the national test as important. This group is somewhat larger among the pupils taking the chemistry test than among the pupils taking the biology or the physics test. Thus, from a pupil perspective the national tests appear not to be fully of high-stakes character. As noted above, it is possible that different teachers have put different weight on these tests when presenting them to the pupils. Also, these pupils took national tests in other school subjects just weeks before the national test in science, and it is possible that the stakes of the science national test were perceived as lower than the stakes of the other national tests, which, on this particular occasion, they also were.

Nevertheless, compared to a sample of grade 12 students participating in the TIMSS Advanced 2008 field test (a true low-stakes test), the pupils in the present sample found the national test an important test. In the TIMSS Advanced sample, 93% of the students perceived the test as unimportant, 68% of the sample did not care about the result they got on the test, and 64% claimed that it was not important for them to do their best on the test (see Eklöf 2010).

The five items in Table 1 were supposed to form one Importance scale. However, one of the items (I5) was basically unrelated to the other four items according to correlation analysis, exploratory factor analysis, and reliability analysis, so this item was not included in the scale. Thus, four items in Table 1 form a scale measuring perceived importance of the test. Table 2 presents mean values on this scale for the total group, boys and girls, and by subject. The internal consistency reliability of this four-item Importance scale was α = 0.78, which is acceptable but somewhat lower than the reliabilities typically reported from the original SOS scale (see Thelk et al. 2009). On a scale level, there were no statistically significant differences in perceived importance of the test between pupils taking the biology, physics, or chemistry test. Girls reported a higher level of perceived importance on the scale level than the boys did, but the gender difference was not statistically significant [t(1,087) = 1.695, ns].

Effort invested on the national test

To investigate pupils’ reported level of motivation and effort when completing the national test, six items asking for different aspects of motivation and effort were included in the questionnaire. Results on these items are presented in Table 1 (Items E1–E6) in terms of percentage of pupils in each response category. Results are not entirely consistent. A majority of pupils reported that they did their best on the test, and a majority of pupils also reported that they felt motivated to do their best. Still, more than 30% of the pupils reported that they did not feel motivated to do their best on the test, and more than half the sample reported that they could have tried harder on the test. Doing one’s best is perhaps interpreted by the pupils as “doing enough given the significance of the assessment context”. Again, it is possible that the present test was not perceived as high-stakes, depending on the teacher’s attitude towards the test and possibly due to the fact that the pupils had had other important tests previously during the spring semester.

Table 1 Items asking for perceived importance of the test, reported motivation and effort, and test anxiety, in terms of percentages in each response category for the total group

Item  Statement                                                                            Disagree   Disagree      Agree      Agree
                                                                                          a lot (%)        (%)        (%)  a lot (%)
I1    This was an important test to me                                                         17.9       34.3       33.6       14.2
I2    It was important to me to get a good result on this test                                  7.8       30.5       42.5       19.1
I3    I am very curious about the result I received on this test                               10.2       22.4       38.9       28.5
I4    I do not care about the results I receive on this test (a)                               34.4       43.0       15.3        7.2
I5    I am not interested to compare my result on the test to those of my classmates (a)       13.0       28.1       37.0       21.9
E1    I did my best on this test                                                                2.5        9.7       43.3       44.5
E2    I worked with all items in the test without giving up                                     4.7       23.4       49.7       22.2
E3    I felt motivated to do my best on this test                                               6.8       25.9       45.0       22.3
E4    I was not concentrating on the task when I worked with this test (a)                     21.4       43.0       27.2        8.4
E5    I could have tried harder on this test (a)                                               10.9       28.3       42.9       17.9
E6    I spent more effort on this test than I do on other tests we have in school              29.3       48.7       17.5        4.6
A1    I felt calm and secure while taking this test (a)                                        28.5        7.8       14.9       48.8
A2    Before taking this test, I worried about how difficult it would be                       11.5       18.3       36.9       33.3
A3    I was scared of failing on this test                                                     14.5       26.8       31.7       27.0
A4    I was so nervous when I took this test that I forgot things that I usually know          19.9       38.9       28.4       12.8

(a) Reversed before analysis. Items I5 and E6 were not included in the derived scales

Pupil ratings on the Effort items were compared to ratings on corresponding items among a sample of grade 12 students participating in the low-stakes TIMSS Advanced 2008 field test. The comparison shows that the pupils in the present sample reported more invested effort than the students in the TIMSS sample. In the TIMSS Advanced sample, almost 80% of the students claimed that they tried less hard on the TIMSS test than they do on other tests in school; more than 80% claimed that they could have tried harder on the test, and less than 50% reported that they felt motivated to do their best on the test.

The six items in Table 1 were assumed to form one scale. However, one item (E6) was weakly related to the other items in the scale according to correlation analysis, factor analysis, and reliability analysis, so this item was not included in the scale. Thus, the Effort scale consisted of five items. The internal consistency reliability of this five-item Effort scale was α = 0.72 in the present sample, which is acceptable but lower than the reliability typically reported for the original SOS Effort scale, where reliabilities in the mid-.80s are common (see Thelk et al. 2009). Note, however, that the present Effort scale was not identical to the SOS Effort scale. Table 2 presents mean values on this scale for the total group, boys and girls, and by subject. There was a statistically significant difference in terms of mean level of reported effort, where pupils taking the physics test reported more effort than pupils taking the chemistry test [t(782) = 2.704, p = 0.007]. However, the effect size was small, d = 0.19. Boys on average reported a higher level of motivation and invested effort than the girls did, but the gender difference was not significant according to the t test that was performed [t(1,091) = −1.866, ns].

Reported test anxiety

Test anxiety is a psychological aspect of test-taking that can affect achievement in a given assessment situation as well as future attitudes and feelings towards assessment and evaluation more generally. To investigate pupils’ reported level of test anxiety when completing the national test, four items asking for different aspects of test anxiety were administered. Results on these items are displayed in Table 1 (Items A1–A4).

Results suggest that many pupils experienced a certain amount of test anxiety before and during test-taking. Although a majority of pupils reported that they felt calm and secure during the test session, more than 35% of the pupils still reported that they did not feel calm during the test. Further, more than 70% of the pupils reported that they worried before the test, more than half the pupils reported that they were scared of failing on the test, and more than 40% of the pupils reported that they were so nervous when they took the test that they forgot things that they usually know. These figures can be compared to results obtained from the low-stakes TIMSS Advanced test. In TIMSS Advanced, 93.5% of the students in the sample (n = 163) claimed that they did not feel nervous before the test and 83% of the students claimed that they felt calm while taking the test. Further, only 3% of the TIMSS students agreed a lot (and 9% agreed) with the statement that they felt so nervous during the test that they could not achieve at their best. The corresponding figures in the present sample are 14% and 29%, respectively. Hence, the problem with test anxiety seems more pronounced in the national test context than in the low-stakes TIMSS Advanced context.

Table 2 Means and standard deviations for the Effort scale, Importance scale, and Anxiety scale as well as mean values for the different subjects for the total group, boys and girls, respectively

                      All           Boys          Girls         Biology       Physics       Chemistry
                      n = 1,168     n = 571       n = 597       n = 328       n = 362       n = 478
                      M      SD     M      SD     M      SD     M      SD     M      SD     M      SD
Test-taking effort    2.83   0.58   2.86   0.59   2.80   0.56   2.84   0.56   2.89   0.59   2.77   0.57
Perceived importance  2.77   0.71   2.74   0.72   2.81   0.69   2.81   0.68   2.81   0.69   2.72   0.73
Test anxiety          2.56   0.68   2.42   0.66   2.70   0.66   2.54   0.66   2.59   0.70   2.56   0.67
Biology test score    16.51  6.86   15.24  6.53   17.48  6.95
Physics test score    18.55  6.05   19.03  6.01   18.04  6.06
Chemistry test score  16.62  8.55   16.36  8.69   16.88  8.42

Possible values in the test-psychological scales ranged from 1 to 4. Raw scores are used in the Swedish national test results and the different tests did not have the same maximum score (biology = 33, physics = 37, chemistry = 45)

The internal consistency reliability of this four-item anxiety scale was α = 0.67, a rather low reliability coefficient. Table 2 presents means and standard deviations for the total group, for boys and girls, and by subject. At the scale level, there were no statistically significant differences in test anxiety between pupils taking tests in different subjects. However, the girls on average reported a significantly higher level of test anxiety than the boys did: t(1,106) = 7.055, p < 0.001, d = 0.42. This finding is in line with previous research (e.g., Chapell et al. 2005).
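As an illustration of how the reported effect size relates to the summary statistics in Table 2, the following minimal sketch recomputes Cohen's d for the gender difference in test anxiety. The function names are hypothetical, and the group sizes are the full-sample values from Table 2, which is an assumption: the degrees of freedom of the reported t-test suggest that slightly fewer pupils entered the actual comparison.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2)."""
    return (m1 - m2) / pooled_sd(n1, sd1, n2, sd2)

# Test anxiety in Table 2: girls M = 2.70, SD = 0.66; boys M = 2.42, SD = 0.66.
# ASSUMPTION: group sizes are the full-sample n values (597 girls, 571 boys).
d = cohens_d(2.70, 0.66, 597, 2.42, 0.66, 571)
print(round(d, 2))  # 0.42
```

Under these assumptions the sketch gives the same value as the d reported in the text, which corresponds to a small-to-moderate difference by conventional benchmarks.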

Relationships between the importance, effort, and anxiety scales

As studies have shown that anxiety tends to go up when the intrinsic value or perceived importance of the test is high (Covington and Omelich 1987; Wolf and Smith 1995), a positive and at least moderate relationship between reported test anxiety and perceived importance of the test was expected. As perceived importance is assumed to be associated with a higher level of motivation to spend effort on the test, a positive relationship between these two scales was also expected. Results indeed showed a positive correlation between the Anxiety scale and the Importance scale (r = 0.34). Hence, those pupils who perceived the national test as important (desirable) were also more likely to feel anxious before and during the test (less desirable). Still, the relationship was somewhat weaker than expected. It can be noted, however, that the relationship between perceived importance of the test and test anxiety was stronger for the girls (r = 0.44) than for the boys (r = 0.23). Test anxiety also seems to be more strongly related to perceived importance of the task than to reported effort and motivation: the Effort scale and the Anxiety scale were uncorrelated (r = −0.02). The correlation between the Importance scale and the Effort scale was positive and moderate in size (r = 0.55). These findings make theoretical sense and are in line with findings from other studies (Eklöf 2010; Thelk et al. 2009; Wolf and Smith 1995).
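The difference between the importance-anxiety correlations for girls and boys is described above without a formal test. One common way to compare two correlations from independent groups is Fisher's r-to-z transformation; the sketch below illustrates that technique and is not part of the analyses reported in this paper. The function name is hypothetical, and the group sizes are taken from Table 2 as an assumption, since the exact number of pupils behind each correlation is not stated.

```python
import math

def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher z-test for the difference between two independent correlations.

    Returns the z statistic and the two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal p-value
    return z, p

# Importance-anxiety correlations from the text: girls 0.44, boys 0.23.
# ASSUMPTION: group sizes from Table 2 (597 girls, 571 boys).
z, p = compare_independent_correlations(0.44, 597, 0.23, 571)
print(f"z = {z:.2f}, two-sided p = {p:.5f}")
```

The test assumes that the two groups are independent and that the scale scores are approximately bivariate normal, so the result should be read as a rough check rather than a definitive inference.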

Relationships between importance, effort, anxiety, and test performance

Table 3 displays the relationships, in terms of correlation coefficients (Pearson’s r), between the three derived scales and test score. This is done for the total sample, for boys and girls, and for the different subjects, respectively.

Results show that in the total sample and in all sub-samples, there is a positive relationship between reported test-taking effort and test score as well as between perceived importance of the test and test score. The relationship between reported test anxiety and test performance is negative, as expected, although somewhat weaker than expected, particularly for some sub-samples (e.g., the girls and the pupils taking the chemistry test). In most instances, the relationship between effort and test score is stronger than the relationship between importance and test score, which has also been found in other studies (cf. Thelk et al. 2009). Some gender differences are worth noting: the relationship between reported effort and test performance was stronger for the boys than for the girls, as was the relationship between test anxiety and test performance. As concerns test score, the girls scored significantly better than the boys on the biology test, while there were no gender differences in physics or chemistry score. In general, the pupils did very poorly on the chemistry test, while they scored somewhat better in biology and physics.
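Because the three tests had different maximum scores (see the note to Table 2), the raw means are easier to compare when expressed as a share of the maximum score. The following minimal sketch does this with the total-group means from Table 2; the variable names are illustrative only.

```python
# Maximum scores and total-group mean raw scores, as given in Table 2.
max_score = {"biology": 33, "physics": 37, "chemistry": 45}
mean_score = {"biology": 16.51, "physics": 18.55, "chemistry": 16.62}

# Mean score as a percentage of the maximum possible score.
percent_of_max = {
    subject: 100 * mean_score[subject] / max_score[subject]
    for subject in max_score
}

for subject, pct in percent_of_max.items():
    print(f"{subject:<10} {pct:.1f}% of maximum")
```

On this metric, the biology and physics means lie at about half of the maximum score, while the chemistry mean is clearly lower, which is consistent with the description of the chemistry test as difficult for the pupils.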

Concluding discussion

The present paper has presented a first impression of how a sample of Swedish grade 9 pupils perceived the newly implemented national tests in science. The presentation has focused on psychological aspects of test-taking in terms of perceived importance of the test, reported effort and motivation, and reported level of test anxiety. These are important issues to learn about, for at least three reasons. The first has to do with the quality of the assessment. In order to make valid interpretations of test results and to use the results in a valid way, we need to know whether the results are good measures of pupil knowledge or whether we are in fact confounding knowledge with motivation and/or anxiety. Secondly, we need to learn more about pupil reactions and behaviors in the assessment situation in order to better prepare the pupils for these situations and help create a positive atmosphere around tests and assessments, so that the pupils feel motivated to do their best without feeling too anxious about failing the test. This may have beneficial effects on test performance then and there, but also on future attitudes towards tests and assessments as well as towards the school subject as such. Thirdly, the Swedish national assessment system is changing, the consequences of these changes need to be monitored, and it is important to start benchmarking now how pupils perceive and react to these and other tests.

Overall, findings from the present study largely support theoretical assumptions and previous empirical findings in other assessment contexts. Many pupils in the present study seem to have perceived the new national test as an important test, and many pupils reported that they were motivated to do their best on the test and that they spent effort on the test items. This also seems to be coupled with a certain amount of test anxiety for some of the pupils. On the other hand, a somewhat surprisingly large proportion of pupils in the sample did not seem to experience the national tests as important. As noted previously, this might have to do with the fact that this first administration was regarded as a pilot administration from a system perspective, and it will be interesting to follow pupil reactions and behaviors in future administrations of these national tests. Still, compared to previous findings in the low-stakes TIMSS context, the pupils in the present sample scored much higher on the Importance, Effort, and Anxiety scales. This was expected, as the stakes of the national test are higher than those of the TIMSS test and, according to theoretical assumptions, the perceived value of a good performance should rise.

Table 3 Correlations between reported test-taking effort, perceived importance of the test, test anxiety, and national test score (Pearson’s r), for the total group, gender-wise and subject-wise

| | All | Boys | Girls | Biology | Physics | Chemistry |
|---|---|---|---|---|---|---|
| Test-taking effort | 0.25* | 0.32* | 0.18* | 0.25* | 0.29* | 0.22* |
| Perceived importance | 0.20* | 0.22* | 0.19* | 0.21* | 0.20* | 0.21* |
| Test anxiety | −0.10* | −0.16* | −0.06 | −0.14* | −0.19* | −0.04 |

*Correlation is significant at the 0.01 level. Note, however, that sample size was relatively large (n > 300) in all comparisons, which is why even rather low coefficients reach statistical significance
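The note to Table 3 points out that, with samples of more than 300 pupils, even low correlations become statistically significant. The sketch below illustrates this by computing the approximate smallest absolute correlation that reaches two-sided significance at the 0.01 level for a given sample size. It uses a normal approximation to the t distribution, and it assumes that the group sizes in Table 2 also apply to the correlations in Table 3; both are simplifying assumptions, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def critical_r(n, alpha=0.01):
    """Approximate smallest |r| that is significant (two-sided) at level alpha.

    Uses the normal approximation to the t distribution, which is
    reasonable for samples of several hundred observations.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / math.sqrt(z**2 + n - 2)

# ASSUMPTION: sample sizes are the group sizes reported in Table 2.
groups = [("All", 1168), ("Boys", 571), ("Girls", 597),
          ("Biology", 328), ("Physics", 362), ("Chemistry", 478)]

for label, n in groups:
    print(f"{label:<10} n = {n:>5}  critical |r| = {critical_r(n):.3f}")
```

For the total sample the threshold is below 0.08, so a coefficient as low as −0.10 is flagged as significant even though it accounts for only about 1% of the variance in test score.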

The relationship between scores on the Importance and Effort scales, respectively, and test score was positive and significant. The strength of the correlations is similar to findings in a number of previous studies in low-stakes assessment contexts (cf. Eklöf 2006; Eklöf et al., under review; Wise and DeMars 2005). The correlations may not seem very high but might still be practically important. They indicate that reported effort and perceived importance of the test may have effects on test performance and that these might be important variables to consider, also in the national test context. The relationship between ratings of test anxiety and test score, on the other hand, was somewhat unexpectedly low. A fair proportion of pupils reported feeling anxious before and during the test, and several previous studies have shown that test anxiety can have a strong detrimental effect on test performance. The relationship between the Anxiety scale and the Importance scale was also somewhat weaker than could have been expected. It is possible that the items measuring test anxiety in the present study were not adequate. The reliability of the test anxiety scale was rather low, and the number of items in the scale was small. It is also possible that test anxiety did not have very strong effects on performance in this particular test administration. Either way, this issue needs further attention in future studies, as there are good reasons to believe that test anxiety will be an important variable to monitor over time in a system with an increasing number of tests and indications of rising test stakes.
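To illustrate why a small number of items matters for reliability, the following sketch applies the Spearman-Brown prophecy formula to the reported reliability of the four-item anxiety scale (α = 0.67). This is a textbook projection, not an analysis from the present study, and it assumes that any added items would be parallel to the existing ones, which may not hold in practice. The function name is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale is lengthened by `length_factor`.

    Assumes the added items are parallel to the existing ones.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

alpha_four_items = 0.67  # reported reliability of the four-item anxiety scale

for n_items in (4, 8, 12):
    factor = n_items / 4  # how many times longer than the original scale
    predicted = spearman_brown(alpha_four_items, factor)
    print(f"{n_items:>2} items: predicted reliability = {predicted:.2f}")
```

Under the parallel-items assumption, doubling the scale to eight items would be expected to raise the reliability to roughly 0.80, which suggests that the low coefficient may partly reflect the brevity of the scale.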

In general, there were small differences between groups of pupils taking tests in different subjects in the present study. Pupils taking the chemistry test were somewhat more negative in terms of perceived importance and invested effort, and the correlations between effort and test score and anxiety and test score were somewhat lower than for the other subjects. The chemistry test was also very difficult for the pupils in the sample, which may have impacted on the relationship between effort, anxiety, and test score. Overall, however, the same pattern emerged for all three subjects. As concerns gender differences, it was shown that ratings of effort and anxiety were more strongly related to performance for the boys than for the girls. This difference has been observed in previous studies as well (Eklöf 2007; Eklöf et al., under review). It is possible that boys’ performances are more affected by feelings of test-taking motivation and test anxiety; that the boys to a higher degree need to feel motivated in order to do their best. It is also possible that boys and girls have different response styles when completing questionnaires. From the design of the present study, these differences cannot be explained, but the finding is interesting and the issue is worthy of further attention.

Motivation and anxiety can be viewed as two sides of the same coin, and it is a pedagogical as well as a psychological challenge to motivate the pupils to do their best and at the same time make them comfortable in the assessment situation. The way teachers and proctors interpret and present the purpose of the national tests likely contributes to pupils’ perception of them. Creating a safe and supportive classroom climate, and preparing pupils to cope with possible feelings of anxiety, while at the same time emphasizing the importance of doing one’s best, is a delicate task for teachers to pursue.

The present study has a number of limitations, and one has to do with uncertainties concerning the reliability of the measures used. The reliabilities of the new national tests are not yet known to us, and the internal consistency reliabilities of the Effort, Importance, and Anxiety scales were somewhat lower than expected, which may have impacted the power to observe differences. Hence, in this respect, results from this study must be interpreted with caution. Gathering psychometric information on the national tests is an important task to pursue in future studies, as is further investigation of the psychometric properties of the scales used in the present study. Further, the information we have about pupils’ reactions to these tests is based solely on self-reports on questionnaire items. The present study is limited also in that we know little about pupil background and school setting, and nothing about how individual teachers, who are also the ones who proctor these tests, have presented the tests to the pupils. We also do not know how these national tests were perceived compared to the national tests in, for example, mathematics, which have been in use in grade 9 for many years.

Still, although the present study has a number of limitations and is restricted to one national sample, we believe that it taps into issues that are urgent and important to consider from an assessment validity perspective as well as an assessment policy perspective. This is true not least in the light of the changing assessment climate in Sweden and in other European countries. We do not know what effects the increasing number of tests will have on pupil attitudes towards tests, towards the subject tested, or towards future studies. We also do not know if and how the increasing number of tests will affect feelings of test anxiety and pupils’ motivation to do their best on each single test. One could imagine that pupils, with an increasing number of tests, will not be motivated to spend effort on every test, but that they will prioritize and select where to invest effort and where not. One could also imagine that an increasing number of high-stakes tests will lead to larger problems with test anxiety. This could be a problem not only for the pupils but also for teachers, schools, and policy makers, particularly if test results are used for accountability purposes. It thus feels urgent to keep monitoring these processes as they happen.

Further exploration of the psychology of test-taking in a variety of assessment contexts, using a variety of methods, could contribute valuable theoretical understanding as well as applied knowledge about what happens when pupils face assessment tasks, and how the outcomes of these assessment tasks could be validly interpreted and used.

References

Atkinson, J. W. (1957). Motivational determinants of risk-taking behavior. Psychological Review, 64, 359–372.

Baumert, J., & Demmrich, A. (2001). Test motivation in the assessment of student skills: the effects of incentives on motivation and performance. European Journal of Psychology of Education, 16, 441–462.

Benson, J. (1998). Developing a strong program of construct validation: a test anxiety example. Educational Measurement: Issues and Practice, 17, 10–17.

Birenbaum, M., & Gutvirtz, Y. (1993). The relationship between test anxiety and seriousness of errors in algebra. Journal of Psychoeducational Assessment, 11(1), 12–19.

Boekaerts, M., Otten, R., & Voeten, R. (2003). Examination performance: are student’s causal attributions school-subject specific? Anxiety, Stress, and Coping, 16(3), 331–342.

Brown, S. M., & Wahlberg, H. J. (1993). Motivational effects on test scores of elementary students. The Journal of Educational Research, 86, 133–136.

Chapell, M. S., Blanding, B., Silverstein, M. E., Takahashi, M., Newman, B., Gubi, A., & McCann, N. (2005). Test anxiety and academic performance in undergraduate and graduate students. Journal of Educational Psychology, 97(2), 268–274.

Covington, M. V., & Omelich, C. L. (1987). “I knew it cold before the exam”: a test of the anxiety-blockage hypothesis. Journal of Educational Psychology, 79, 393–400.

Eccles, J. S., & Wigfield, A. (2002). Motivational beliefs, values, and goals. Annual Review of Psychology, 53, 109–132.

Eklöf, H. (2006). Development and validation of scores from an instrument measuring student test-taking motivation. Educational and Psychological Measurement, 66, 643–656.

Eklöf, H. (2007). Test-taking motivation and mathematics performance in TIMSS 2003. International Journal of Testing, 7, 311–326.

Eklöf, H. (2010). Reported motivation and effort in the TIMSS Advanced 2008 Field Study. Gothenburg: Paper presented at the 4th IEA International Research Conference. July 2010.


Eurydice. (2009). National testing of Pupils in Europe: Objectives, Organisation and Use of Results. Downloaded 2010-03-16 from: http://www.eurydice.org

Firestone, W. A., Mayrowetz, D., & Fairman, J. (1998). Performance-based assessment and instructional change: the effects of testing in Maine and Maryland. Educational Evaluation and Policy Analysis, 20(2), 95–113.

Gumora, G., & Arsenio, W. F. (2002). Emotionality, emotion regulation, and school performance in middle school children. Journal of School Psychology, 40(5), 395–413.

Inzlicht, M., & Ben-Zeev, T. (2000). A threatening intellectual environment: why females are susceptible to experiencing problem-solving deficits in the presence of males. Psychological Science, 11(5), 365–371.

Messick, S. (1995). Validity of psychological assessment: validation of inferences from persons’ responses and performance as scientific inquiry into score meaning. American Psychologist, 50, 741–749.

Owens, M., Stevenson, J., Norgate, R., & Hadwin, J. A. (2008). Processing efficiency theory in children: working memory as a mediator between trait anxiety and academic performance. Anxiety, Stress, and Coping, 21(4), 417–430.

Pintrich, P. R., & De Groot, E. V. (1990). Motivational and self-regulated learning components of classroom academic performance. Journal of Educational Psychology, 82, 33–40.

Swedish National Agency for Education. (2007). Provbetyg–Slutbetyg–Likvärdig bedömning? En statistisk analys av sambandet mellan nationella prov och slutbetyg i grundskolan, 1998–2006 [Test grade–final grade–fair assessment? A statistical analysis of the relationship between national tests and final grades in compulsory school, 1998–2006]. Stockholm: Swedish National Agency for Education. Report no. 300.

Swedish National Agency for Education. (2009). TIMSS Advanced 2008. Svenska gymnasieelevers kunskaper i avancerad matematik och fysik i ett internationellt perspektiv [TIMSS Advanced 2008. Swedish students’ proficiency in advanced mathematics and physics in an international perspective]. Stockholm: Swedish National Agency for Education. Report 336, 2009.

Sundre, D. L., & Kitsantas, A. (2004). An exploration of the psychology of the examinee: can examinee self-regulation and test-taking motivation predict consequential and non-consequential test performance? Contemporary Educational Psychology, 29, 6–26.

Sundre, D. L., & Moore, D. L. (2002). The student opinion scale: a measure of examinee motivation. Assessment Update, 14, 8–9.

Thelk, A. D., Sundre, D. L., Horst, S. J., & Finney, S. J. (2009). Motivation matters: using the Student Opinion Scale to make valid inferences about student performance. The Journal of General Education, 58, 129–151.

Wigfield, A., & Eccles, J. S. (1989). Test anxiety in elementary and secondary school students. Educational Psychologist, 24(2), 159–183.

Wise, S. L., & DeMars, C. (2005). Low examinee effort in low-stakes assessment: problems and possible solutions. Educational Assessment, 10(1), 1–17.

Wolf, L. F., & Smith, J. K. (1995). The consequence of consequence: motivation, anxiety, and test performance. Applied Measurement in Education, 8, 227–242.

Wren, D. G., & Benson, J. (2004). Measuring test anxiety in children: scale development and internal construct validation. Anxiety, Stress, and Coping, 17(3), 227–240.

Zeidner, M. (1998). Test anxiety: the state of the art. New York: Kluwer.

Zeidner, M. (2007). Test anxiety in educational contexts: concepts, findings, and future directions. In P. A. Schutz & R. Pekrun (Eds.), Emotion and education (pp. 165–184). San Diego: Elsevier.

Hanna Eklöf. Department of Applied Educational Science, Umeå University, 901 87 Umeå, Sweden. E-mail: [email protected]; Web site: www.edusci.umu.se

Current themes of research:

Hanna Eklöf does research in the field of educational psychology, psychometrics, and large-scale national and international assessment, with a particular focus on motivational issues in the testing situation and on validity issues. She currently has a research grant from the Swedish Research Council for the project “Pupils’ perceptions of and attitudes towards tests in school”. She also recently received an award from Umeå University with funding for the project “The tests and the test-takers”.


Most relevant publications:

Eklöf, H. (2006). Development and validation of scores from an instrument measuring student test-taking motivation. Educational and Psychological Measurement, 66, 643–656.

Eklöf, H. (2007). Test-taking motivation and mathematics performance in TIMSS 2003. International Journal of Testing, 7, 311–326.

Eklöf, H. (2010). Skill and will: test-taking motivation and assessment quality. Assessment in Education: Principles, Policy, & Practice, 17, 345–356.

Mikaela Nyroos. Department of Applied Educational Science, Umeå University, 901 87 Umeå, Sweden. E-mail: [email protected]; Web site: www.edusci.umu.se

Current themes of research:

Large-scale assessment and its implications for learning, test anxiety. Involved in a research project exploring the effects of national examination in mathematics. The aim is to develop profound knowledge about how national examinations in mathematics influence young pupils’ mathematical learning in compulsory education. The project pays particular attention to cognitive implications for learning in mathematics, and pupils’ experiences of assessment and test situations.

Most relevant publications:

Nyroos, M. (2011). Introducing national examination in Swedish primary education: implications for test anxiety. Paper presented at the China–Sweden Symposium on Science and Humanities Education in the 21st Century, Southwest University, Chongqing, China

Nyroos, M. (2010). Young pupils and national test: cognitive implication for learning in mathematics. Paper presented at the 16th European Conference on Educational Research, Helsinki, Finland

Nyroos, M., Rönnberg, L. & Lundahl, L. (2004). A Matter of Timing: time use, freedom and influence in school from a pupil perspective. European Educational Research Journal Vol 3 (4), 743–758
