
E-Assessment in Denmark – E-exams and CAT as a Pedagogic Tool

Jakob Wandall, Chief Adviser, Skolestyrelsen (Danish National School Agency)

Tallinn, Estonia, 14 October 2010

Danish Assessment in Schools

• Background
• Tests and exams in Denmark
  – The exams
  – The story behind the testing system
  – The design of the adaptive system
  – The differences between Danish tests and exams
• Challenges & future possibilities

Background: The Danish School system

Facts about public primary and lower secondary education in Denmark, the "Folkeskole":

– 600,000 pupils, 60,000 per form
– 1,300 ordinary folkeskoler, 600 other institutions (e.g. for special education)
– Schools governed by decentralized local governments; significant differences from municipality to municipality
– 4–5 pupils per newer computer (less than 3 years old)
– High-speed internet connection almost everywhere

Infrastructure suited for IT-based assessment

Background: the Danish School system

Teachers in Denmark

• Until 2004: no common objectives between forms 1 and 9 in the Folkeskole; significant differences from school to school
• Weak tradition for standardized assessment of pupils – e.g. little experience with testing and with using test results
• Strong culture of independent, self-governing teachers with a focus on "soft" evaluation methods
• Weak tradition for leadership in schools
• Anxiety about the control aspect of the test system

Background: the Danish School system

Types of assessment

Formal – provided by the Ministry of Education:
• Final exams in different subjects at the end of form 9
• National tests in different subjects/forms

Informal – locally provided:
• Tests and other kinds of assessment and evaluation

Background: the Danish School system

Tests & Exams - in the Danish School system

Long tradition for exams
• Preparation for confirmation from 1736 – schooling was provided by the church
• First school exams defined by law in 1814, in the decree on peasant schools (the precursor of the Folkeskole Act)
• Revisions of the examination regulations in 1899, 1903, 1937, 1958, 1975, 1993 and 2006
• From 1975 to 2006 exams were optional – mandatory from 2006, with the possibility of dispensation

Weak tradition for using tests
• Increasing use of tests during the last decade
• National tests introduced in legislation in 2006

School leaving exams
• Final exams at the end of form 9, compulsory in public schools from 2006
• Renewed, modernized and increased in number (more subjects, and some e-based exams from 2005)
• Ensure a better foundation for completing a post-compulsory education
• Each student must sit a minimum of 9 exams
• Assessed by an external examiner and the teacher
• No failing criteria – but formally qualifying: a requirement for further education
• Aggregated results are published for each school

Summative evaluation, high stake

3 types of exams:

• Verbal/practical exams (7)

• Written exams – e.g. essays (2 + 3)

• Standardised test exams – including 2 e-based (5 + 3)

School leaving exams in form 9

• E-based exams have been administered in science (biology and geography) since 2005

• Delivered via the internet (online)

• Linear format – 20 tasks, 50–60 items, half an hour

• Traditional item types – mostly multiple choice

• Items not pre-tested on pupils

• Large diversity in content (no IRT)

E-based standardised exams

Example: Biology – summer 2009

Example: Biology – summer 2009

Example: Geography – summer 2009

Exams: scale and activity

The scale is comparable with the ECTS scale:
12 = A
10 = B
7 = C
4 = D
02 = E
00 = Fx
−3 = F

Activity: 700,000 exams in 2008/09

High rate of completion:
Completed: 98.3%
Exempted: 0.7%
Absentees: 1.0%
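The grade correspondence above is a simple lookup; a minimal sketch (the helper name is hypothetical, the mapping is the official Danish 7-point scale):

```python
# Official mapping of the Danish 7-point scale to ECTS letters.
DANISH_TO_ECTS = {12: "A", 10: "B", 7: "C", 4: "D", 2: "E", 0: "Fx", -3: "F"}

def to_ects(danish_grade):
    """Translate a Danish grade (e.g. 7) to its ECTS letter (e.g. 'C')."""
    return DANISH_TO_ECTS[danish_grade]
```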

The national testing system – a tool for improving the culture of evaluation

Background to the Danish national tests

1. PISA surveys (2000–2006)

2. OECD review of Denmark (2004) and national reports on assessment in schools (2004)

3. Government initiatives following the OECD recommendations

The Danish PISA results, 2000 - 2006

[Chart: Danish PISA scores in 2000, 2003 and 2006 for reading, mathematics, science and problem solving (score scale 450–530), compared with the OECD average for 2006.]

Review of National Policies for Education, Denmark, OECD 2004:

"Denmark has one of the most expensive education systems in the world, and for years perceived it to be one of the best in the world. However, the disappointing results of recent international tests to measure schooling outcomes confirmed earlier evidence that the system is actually underperforming."

Strengths, weaknesses, 35 recommendations

Some weaknesses:
• Poor tradition of pupil assessment
• Insufficient teacher qualifications in assessment techniques
• Insufficient exchange of best practices between teachers

Some recommendations for the Minister:
• Development of criteria-based tests
• Development of different assessment methods and materials
• Carry out a policy based on the principle that test results are not published for ranking purposes

OECD review 2004

Government initiatives to improve the culture of evaluation

[Diagram: initiatives arranged around the central theme "Culture of evaluation":]
• Pupil plans (written)
• National Board
• International surveys
• Local government supervision
• Quality reports
• Evaluation homepage
• National tests
• Compulsory exams
• Common Objectives

Test and assessment systems in some countries

Reference countries for Denmark

Implementation of OECD’s recommendations

• The government initiatives followed the OECD recommendations with some modifications – e.g. in the testing system

• The OECD team recommended criteria-based tests – a recommendation based on a different tradition of testing/evaluation

• Background: the OECD team came from England, Canada and Finland

Two different traditions for assessment/use of test results

1. The Nordic/continental tradition – terminology of Nordic/German origin (Danish: prøve, vurdering, bedømmelse, opgave, karakter)

2. The Anglo-American tradition – terminology of English origin (Danish loanwords: test, score, item)

Traditions for assessment/testing: why test? What are the results used for?

Traditions:        Nordic/continental          English/American
Purpose:           Learning                    Analysis
                   Pupil                       The "system"/professionals
                   Equality and solidarity     Ambitions and elite
                   Solicitude                  Fairness
Use of results:    Pedagogy                    Control of outcome
                   Didactics/teaching          Financing/grants
                   Formative/low-stake         Summative/high-stake

[Diagram: the use of the test results, spanning from learning (pupil, teacher) to control/financing (analysis).]

The purpose of testing

Traditions for testing: why test pupils, and what are the results used for?

Pedagogy/Teaching

• Tests in combination with other assessment tools
  – Easy for the teacher to use
  – Flexible systems from the school's point of view
  – Low/no cost for the school

• Priority to pedagogical purposes
  – Formative – low stake
  – For the teacher's assessment of the pupils
  – The teacher sets the rules

• Effective self-correcting tests
  – Valid
  – Reliable
  – Detailed results
  – Max. 45 minutes (one lesson)

→ Centrally administered, internet-based computer-adaptive testing – with focus on pedagogy

Criteria for choice of strategy

The Danish testing system

The national tests

[Table: the 10 compulsory tests, by subject and the forms in which each test can be used.]

Main features of the Danish national tests

• IT-based, automatically scored, provided free of charge by the National School Agency
• Pedagogical purpose: the teacher sets the rules for the test
• The teacher books and administers the tests and interprets the test results
• Different feedback at various levels (teacher, headmaster, municipality)
• The teacher gets detailed results online for
  – the class,
  – the individual pupil, and
  – details of the tests, including the items in the individual tests

CAT (Computer Adaptive Testing)

• Adaptive = adapts to the individual pupil's ability:
  Correct answer → more difficult questions
  Wrong answer → easier questions
  The test is most efficient when item difficulty = pupil ability

• More effective testing → more detailed results: adaptive within 3 "profile areas" → 3 tests in 1 (e.g. English: reading, vocabulary and language usage)

• Simple principles, but a few tricky conditions:
  – Extensive demands on the technology – both capacity and stability
  – Very large item pools with exactly the right mix of high-quality items
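The adapt-up/adapt-down rule can be sketched as a toy loop. This is a hypothetical simplification: a real CAT system re-estimates ability with an IRT model rather than moving the estimate by a fixed step.

```python
def pick_item(pool, ability):
    """Choose the unused item whose difficulty lies closest to the current
    ability estimate (the point where the test is most efficient)."""
    return min(pool, key=lambda difficulty: abs(difficulty - ability))

def run_adaptive_test(pool, answer_fn, step=0.5):
    """Minimal CAT loop: a correct answer moves the ability estimate up
    (next item harder), a wrong answer moves it down (next item easier)."""
    pool = list(pool)   # item difficulties on a common scale
    ability = 0.0       # start from a neutral/average estimate
    while pool:
        item = pick_item(pool, ability)
        pool.remove(item)
        ability += step if answer_fn(item) else -step
    return ability
```

With a simulated pupil who answers every item below difficulty 1.5 correctly, the estimate drifts towards that threshold as the pool is consumed.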

Creating the item pools

Conditions for adaptive testing:
• The difficulty of the items is well defined and stable over time (homogeneity)
• No differential item functioning
• A sufficient number of items, so that even the fastest pupils don't run out of items
• The items are evenly distributed across item difficulty – challenges for all pupils

Therefore the following requirements have to be met:
• Minimum 540 items per test (180 per profile area), evenly distributed across difficulty levels
• All items are tested on a large number of pupils (500–700)
• The items are required to fit a Rasch model
• No more than 3 runs per pupil/test (including the compulsory test)
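The Rasch model mentioned above relates one ability parameter per pupil and one difficulty parameter per item. A minimal sketch, assuming the standard one-parameter logistic form (the bisection estimator is an illustration, not the agency's actual scoring code):

```python
import math

def rasch_p(theta, b):
    """Rasch model: probability that a pupil of ability theta answers an
    item of difficulty b correctly (both on the same logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, lo=-6.0, hi=6.0, tol=1e-6):
    """Maximum-likelihood ability estimate by bisection on the Rasch score
    equation (observed score == expected score). `responses` is a list of
    (item_difficulty, answered_correctly) pairs; all-correct or all-wrong
    patterns simply run into the search bounds."""
    score = sum(1 for _, correct in responses if correct)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(rasch_p(mid, b) for b, _ in responses) < score:
            lo = mid   # expected score too low -> true ability is higher
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because the expected score is strictly increasing in theta, bisection on the score equation is enough; no gradient machinery is needed.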

An example: adaptive test – English reading

[Chart: item number (0–50) on the x-axis against item difficulty / pupil ability level (0–100) on the y-axis.]

Adaptive test – English reading

[Chart: item number against item difficulty / pupil ability level, annotated with the stopping rule per profile area: continue while ≤ 5 items, or more than 5 items with SEM > 0.3; stop when SEM ≤ 0.3.]
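The SEM ≤ 0.3 stopping criterion can be made concrete under the Rasch model, where each administered item contributes information p·(1−p) and the standard error of measurement is the reciprocal square root of the total information. A sketch under those assumptions (the Danish system's exact rule is not detailed on the slide):

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sem(theta, administered):
    """Standard error of measurement at ability theta, given the
    difficulties of the items administered so far:
    SEM = 1 / sqrt(sum of p * (1 - p) over administered items)."""
    info = sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b)) for b in administered)
    return 1.0 / math.sqrt(info)

def keep_testing(theta, administered, min_items=5, max_sem=0.3):
    """Stopping rule mirroring the chart annotations: continue while at
    most min_items items have been given, or SEM is still above max_sem."""
    return len(administered) <= min_items or sem(theta, administered) > max_sem
```

An item at the pupil's own ability contributes the maximum information of 0.25, so even with perfectly targeted items roughly 45 are needed before SEM drops to 0.3.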

Adaptive test – 3 profile areas

[Chart: item number against item difficulty / pupil ability level for all three profile areas, with wrong and correct answers marked: reading → R result, vocabulary → V result, language usage → L result.]

The test results

• Different feedback at various levels (parents, teacher, headmaster, municipality)
• The teacher gets detailed results online for
  – the class,
  – the individual pupil, and
  – details of the tests, including the items in the individual tests

Presentation of the results: only online for teachers (no print-outs)

Test results (average of the pupils' results) for the class (translated by Google)

Graphical illustration of the class results

Presentation of the results:

Presentation of the results: results of the individual pupils and the class

Presentation of the results: details of the individual pupils

Presentation of the results: a letter for the parents

Presentation of the results: details of the individual pupils

Presentation of the results: detailed information on the pupil's test course

Presentation of the results: view the individual responses

[Timeline 2007–2010: contract signed; website launched; the first 3 tests launched; expert review; contract prolonged.]

Status of the implementation: lesson learned – it takes time!

• The first 3 tests were launched in 2007, with reduced item banks
• An expert review was conducted; it showed that the psychometric standard was very high, but that the quality of the items and the size of the item pools were insufficient
• Consequence: the planned full-scale launch in March 2008 was delayed – instead, development and testing of items in 2008 and 2009
• Full-scale test of the test system in autumn 2009
• Test system launched in March 2010

[Timeline 2009–2011: development and testing of items; test system launched.]

Folkeskolen's final exams
• High-stake, criteria-based, goal-oriented, a mixture of standardised and non-standardised exams. "Teaching to the test" is not a problem, as the test objectives are the same as the objectives for the teaching (the final national Common Objectives).
• Precise rules for how to conduct an exam – same items, same time, same conditions (e.g. permitted aids etc.) for all pupils
• Formal documentation of the education – school results are published

The differences between tests and exams

National tests
• Low-stake, norm-referenced, standardised proficiency tests with a diagnostic element, which can also be used for progress / added-value testing
• The teacher decides almost everything – when, how, where, how long, how many items etc.
• Internal assessment tool

Prioritised purposes:
1. Pedagogic tool for teachers and headmasters – access only for those who need the results
2. Documentation of the academic level

Exam and test items

Items look alike, but their purposes are different:
• The main goal of exams is to provide documentation with legal effect.
• The main goal of tests is to provide an assessment of the student's academic level for pedagogical purposes.

General requirement for items – distractors must be both plausible and unambiguous:

• Examination items: focus on unambiguity (one correct answer!) – accessible to the public / the right to complain

• Test items: focus on plausibility (more or less correct answers give challenges to pupils → more difficult questions)

Quality control
• Exams are designed by exam commissions – skilled members with teacher backgrounds and long experience
• Test items are written by experienced teachers, quality-controlled by experts, tested on 500–700 pupils, and must fit the Rasch model

Present challenges & future possibilities

Challenges for Denmark

• Technique: to make the IT systems work flawlessly (booking, testing and results)

• Test development: to further develop the test system (e.g. only one item pool per subject; including a time dimension in the difficulty of decoding items)

• Expanding item pools: to maintain and further develop high-quality item pools

• Reporting results: to develop user-friendly ways to present the results

• Guidance in good practice: to convince teachers/schools to use the information – and describe how to do it!

Three generations of testing

• Exams
  – Purpose: summative – assessment of learning (to prove)

• National testing
  – Purpose: formative – assessment for learning (to improve)

• Next generation: future testing
  – Purpose: optimizing learning – assessment embedded in training

Future possibilities

Existing technology:

• Example 1: IT-based books, articles and newspapers with individually auto-generated cloze items – testing and practising reading simultaneously (the American Lexile test).

• Example 2: tests with items that renew themselves (changing the text/numbers in items) and require a correct answer (counting the number of tries instead of the number of correct answers).

• Example 3: IT-based learning-style analysis based on pupils' responses to questionnaires.
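Example 2 can be sketched as a toy generator: the item text is fixed but the numbers are redrawn on every attempt, and scoring counts tries rather than correct answers. All names here are hypothetical illustrations, not part of the Danish system.

```python
import random
import re

def renew_item(rng):
    """Hypothetical self-renewing item: fixed wording, but the numbers
    are redrawn on every attempt, so a retry is a fresh question."""
    a, b = rng.randint(2, 9), rng.randint(2, 9)
    return f"What is {a} x {b}?", a * b

def tries_until_correct(rng, answer_fn, max_tries=10):
    """Score by counting tries until the first correct answer, instead
    of counting correct answers, as the slide suggests."""
    for attempt in range(1, max_tries + 1):
        question, solution = renew_item(rng)
        if answer_fn(question) == solution:
            return attempt
    return max_tries

def parse_and_answer(question):
    """A 'perfect pupil' for demonstration: parses the two numbers out
    of the question text and multiplies them."""
    a, b = (int(n) for n in re.findall(r"\d+", question))
    return a * b
```

A pupil who answers correctly needs one try; a pupil who never does exhausts the try budget, so the try count itself becomes the score.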

Technology around the corner

Developing game-like tests that:
• are built around problem solving,
• ask about the outcomes and goals,
• ask how to get there, particularly by invoking appropriate challenges to keep the user on task,
• place a premium on user proficiencies to create, innovate and produce,
• can involve others in competitive or collaborative actions to solve problems and create 'personal bests',
• collect information on users as they develop through the game,
• track information across time,
• integrate learning and assessment,
• provide feedback during the task so that the user becomes more efficient and effective,
• are equitable, in that they do not favor slow or fast learners and ignore background variables such as home, socio-economic status, or race.

"So many directions for more effective testing that engages learners, emphasises testing students' strategies – and so much learning can accrue from this involvement in testing."

(John Hattie, 2010)

Guidelines on e-based testing:
http://www.intestcom.org/guidelines/index.php

Thank you for your attention!

Check it out: evaluering.uvm.dk