E-Assessment in Denmark – E-exams and CAT as a Pedagogic Tool
Jakob Wandall, Chief Adviser, Skolestyrelsen (Danish National School Agency)
Tallinn, Estonia, 14 October 2010
Danish Assessment in schools
• Background
• Tests and exams in Denmark
– The exams
– The story behind the testing system
– The design of the adaptive system
– The differences between Danish tests and exams
• Challenges & future possibilities
Background: The Danish School system
Facts about public primary and lower secondary education in Denmark, "Folkeskolen":
– 600,000 pupils, 60,000 per form
– 1,300 ordinary folkeskoler, 600 other institutions (e.g. for special education)
– Schools governed by decentralized local governments; significant differences from municipality to municipality
– 4–5 pupils per computer (less than 3 years old)
– High-speed internet connection almost everywhere
The infrastructure is well suited for IT-based assessment.
Background: the Danish School system
Teachers in Denmark
• Until 2004: no common objectives between 1st and 9th form in "Folkeskolen"; significant differences from school to school
• Weak tradition for standardized assessment of pupils – e.g. weak tradition for testing and for using test results
• Strong culture of independent, self-governing teachers with a focus on "soft" evaluation methods
• Weak tradition for leadership in schools
• Anxiety about the control aspect of the test system
Background: the Danish School system
Types of assessment
Formal – provided by the MoE:
• Final exams in different subjects at the end of 9th form
• National tests in different subjects/forms
Informal – locally provided:
• Tests and other kinds of assessment and evaluation
Background: the Danish School system
Tests & Exams - in the Danish School system
Long tradition for exams
• Preparation for confirmation in 1736 – schooling was provided by the church
• First school exams defined by law in 1814, in the decree on peasant schools (precursor of the act on Folkeskolen)
• Revisions of the examination regulations in 1899, 1903, 1937, 1958, 1975, 1993 and 2006
• 1975–2006: exams were optional – mandatory from 2006, with the possibility of dispensation
Weak tradition for using tests
• Increasing use of tests during the last decade
• National tests introduced in the legislation in 2006
School leaving exams
• Final exams at the end of form 9, compulsory in public schools from 2006
• Renewed, modernized and increased in number (more subjects, and some e-based exams from 2005)
• Ensure a better foundation for completing a post-compulsory education
• Each student must sit a minimum of 9 exams
• Assessed by an external examiner and the teacher
• No failing criteria – but formally qualifying: a requirement for further education
• Aggregated results are published for each school
Summative evaluation, high stakes
3 types of exams:
• Oral/practical exams (7)
• Written exams – e.g. essays (2 + 3)
• Standardised test exams – including 2 e-based (5 + 3)
School leaving exams in form 9
• E-based exams have been administered in science (biology and geography) since 2005
• Delivered via the internet (online)
• Linear format – 20 tasks, 50–60 items, ½ hour
• Traditional item types – mostly multiple choice
• Items not pre-tested on pupils
• Large diversity in content (no IRT)
E-based standardised exams
Example: Biology – summer 2009
Example: Biology – summer 2009
Example: Geography – summer 2009
Exams: scale and activity
The scale is comparable with the ECTS scale:
12 = A, 10 = B, 7 = C, 4 = D, 02 = E, 00 = Fx, -3 = F
Activity: 700,000 exams in 2008/09
High rate of completion:
• Completed: 98.3%
• Exempted: 0.7%
• Absentees: 1.0%
The National testing system- a tool for improving the culture of evaluation
Background for the Danish National tests
1. PISA surveys (2000–2006)
2. OECD review of Denmark (2004) and national reports on assessment in school (2004)
3. Government initiatives following the OECD recommendations
The Danish PISA results, 2000 - 2006
[Chart: Danish PISA scores for 2000, 2003 and 2006 in reading, mathematics, science and problem solving, compared with the OECD average 2006; score scale approx. 450–530]
Review of National Policies for Education, Denmark, OECD 2004:
"Denmark has one of the most expensive education systems in the world, and for years perceived it to be one of the best in the world. However, the disappointing results of recent international tests to measure schooling outcomes confirmed earlier evidence that the system actually is underperforming."
Strengths, weaknesses, 35 recommendations
Some weaknesses:
• Poor tradition of pupil assessment
• Insufficient teacher qualifications in assessment techniques
• Insufficient exchange of best practices between teachers
Some recommendations for the Minister:
• Development of criteria-based tests
• Development of different assessment methods and materials
• Carry out a policy based on the principle that test results are not published for ranking purposes
OECD-review 2004
Government initiatives to improve the culture of evaluation
Elements of the culture of evaluation:
• Pupil plans (written)
• National Board
• International surveys
• Local government supervision
• Quality reports
• Evaluation homepage
• National tests
• Compulsory exams
• Common Objectives
Test and assessment systems in some countries
Reference countries to Denmark
Implementation of OECD’s recommendations
• The Government initiatives followed the OECD recommendations with some modifications – e.g. in the testing system
• The OECD team recommended criteria-based tests – a recommendation based on a different tradition of testing/evaluation
• Background: The OECD-team came from England, Canada and Finland
Two different traditions for assessment/use of test results
1. The Nordic/continental: terminology of Nordic/German origin (Danish: prøve, vurdering, bedømmelse, opgave, karakter)
2. The Anglo-American: terminology of English origin (Danish: test, score, item)
Traditions for assessment/testing, Why testing? What are the results used for?
Traditions: Nordic/continental vs. English/American
• Purpose: Learning vs. analysis
• Focus: The pupil vs. the "system"/professionals
• Focus on equality and solidarity vs. focus on ambitions and elite
• Solicitude vs. fairness
• Use of results: Pedagogy vs. control of outcome
• Didactics/teaching vs. financing/grants
• Formative/low-stake vs. summative/high-stake
[Diagram: two axes – "The purpose of testing" (learning vs. control/financing) and "The use of the test results" (pupil/teacher vs. analysis)]
Traditions for testing, Why test pupils – What are the results used for
Pedagogy/Teaching
• Tests in combination with other assessment tools
– Easy to use for the teacher
– A flexible system from the school's point of view
– Low/no cost for the school
• Priority to pedagogical purposes
– Formative – low stake
– For the teacher's assessment of the pupils
– The teacher sets the rules
• Effective self-correcting tests
– valid,
– reliable,
– detailed results,
– max. 45 minutes (1 lesson)
Centrally administered, internet-based computer adaptive testing – with a focus on pedagogy
Criteria for the choice of strategy
The Danish testing system
The national tests
[Table: the 10 compulsory tests – subjects and the forms where tests can be used]
Main features of the Danish national tests
• IT-based, automatically scored, provided free of charge by the National School Agency
• Pedagogical purpose: the teacher sets the rules for the test
• The teacher books and administers the tests and interprets the test results
• Different feedback at various levels (teacher, headmaster, municipality)
• The teacher gets detailed results online for:
– the class,
– the individual pupil, and
– details of the tests, including the items in the individual tests
CAT (Computer Adaptive Testing)
• Adaptive = adapts to the individual pupil's ability:
– Correct answer → more difficult questions
– Wrong answer → easier questions
The test is most efficient when item difficulty = pupil ability.
• More effective testing → more detailed results: adaptive within 3 "profile areas" → 3 tests in 1 (e.g. English: reading, vocabulary and language usage)
• Simple principles, but a few tricky conditions:
– Extensive demands on the technology – both capacity and stability
– Very large item pools with exactly the right mix of high-quality items
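The adapt-up/adapt-down principle above can be sketched in a few lines of Python. This is an illustrative toy, not the agency's implementation: the Rasch-style success probability, the nearest-difficulty selection rule and the fixed step size for the ability update are all assumptions, and every name is hypothetical.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under a Rasch (1-parameter logistic) model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def pick_next_item(ability: float, pool: list[float]) -> int:
    """Pick the item whose difficulty is closest to the current ability
    estimate - where the test is most informative (p is closest to 0.5)."""
    return min(range(len(pool)), key=lambda i: abs(pool[i] - ability))

def update_ability(ability: float, correct: bool, step: float = 0.5) -> float:
    """Stepwise update: harder items follow a correct answer,
    easier items follow a wrong one."""
    return ability + step if correct else ability - step
```

In a real CAT engine the update would be a maximum-likelihood or Bayesian ability estimate rather than a fixed step, but the control flow is the same: select, administer, update, repeat.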
Creating the item pools
Conditions for adaptive testing:
• The difficulty of the items is well defined and stable over time (homogeneity)
• No differential item functioning
• A sufficient number of items, so that even the fastest pupils don't run out of items
• The items are evenly distributed over item difficulty → challenges for all pupils
Therefore, the following requirements have to be met:
• Minimum 540 items per test (180 per profile area), evenly distributed over difficulty levels
• All items are tested on a large number of pupils (500–700)
• The items are required to fit a Rasch model
• Not more than 3 runs per pupil/test (including the compulsory test)
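The pool requirements above (a minimum number of items per profile area, spread evenly over the difficulty range) can be expressed as a simple validation check. The bin count and thresholds below are illustrative assumptions, not the official acceptance criteria:

```python
def check_item_pool(difficulties: list[float], n_bins: int = 10,
                    min_total: int = 180, min_per_bin: int = 10) -> bool:
    """Check that a profile area's item pool is large enough and that
    items are spread evenly across the difficulty range (0-100 scale)."""
    if len(difficulties) < min_total:
        return False
    bins = [0] * n_bins
    for d in difficulties:
        # clamp each difficulty into a bin on the 0-100 scale
        idx = min(int(d / (100 / n_bins)), n_bins - 1)
        bins[idx] += 1
    # every difficulty band must be represented well enough
    return all(count >= min_per_bin for count in bins)
```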
[Chart: an example adaptive test – English reading. Item difficulty / pupil ability level (0–100) plotted against item number (0–50).]
[Chart: adaptive test – English reading, annotated with the stopping rule: continue while ≤ 5 items per profile area, or while > 5 items and SEM > 0.3; stop when SEM ≤ 0.3.]
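The stopping rule described above (at least 5 items per profile area, then stop once the standard error of measurement drops to 0.3 or below) could look roughly like this under a Rasch model, where test information is the sum of p(1−p) over the administered items and SEM = 1/√information. This is a sketch under those assumptions, not the production algorithm:

```python
import math

def should_stop(n_answered: int, item_difficulties: list[float],
                ability: float, min_items: int = 5,
                max_sem: float = 0.3) -> bool:
    """Stop testing a profile area once at least `min_items` items are
    answered and the SEM of the ability estimate is <= `max_sem`."""
    if n_answered < min_items:
        return False
    # Rasch test information: sum of p*(1-p) over administered items
    info = 0.0
    for d in item_difficulties:
        p = 1.0 / (1.0 + math.exp(d - ability))
        info += p * (1.0 - p)
    if info == 0.0:
        return False
    sem = 1.0 / math.sqrt(info)
    return sem <= max_sem
```

With perfectly targeted items (p = 0.5, information 0.25 each), reaching SEM ≤ 0.3 takes about 45 items, which matches the order of magnitude shown in the charts.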
Adaptive test – 3 profile areas
[Chart: item difficulty / pupil ability level against item number, with wrong and correct answers marked, yielding a Reading result, a Vocabulary result and a Language usage result.]
The test results
• Different feedback at various levels (parents, teacher, headmaster, municipality)
• The teacher gets detailed results online for:
– the class,
– the individual pupil, and
– details of the tests, including the items in the individual tests
Presentation of the results: only online for teachers (no print-outs)
Test results (average of pupils results) from the class (translated by Google)
Graphical illustration of the class results
Presentation of the results: Results of the individual pupils and the class
Presentation of the results: Details of the individual pupils
Presentation of the results: A letter for the parents
Presentation of the results: Details of the individual pupils
Presentation of the results: Detailed information on the pupil's test course
Presentation of the results: View the individual responses
[Timeline 2007–2010: contract signed; the first 3 tests launched; review; website launched; contract prolonged]
Status on the implementation: lessons learned – it takes time!
• The first 3 tests were launched in 2007, with reduced item banks
• An expert review found that the psychometric standard was at a very high level, but the quality of the items and the size of the item pools were insufficient
• Consequence: the planned full-scale launch in March 2008 was delayed – instead, items were developed and tested in 2008 and 2009
• Full-scale trials of the test system in autumn 2009
• Test system launched March 2010
[Timeline continued 2009–2011: development and testing of items; test system launched]
Folkeskolen's final exams
• High-stake, criteria-based, goal-oriented, a mixture of standardised and non-standardised exams. "Teaching to the test" is not a problem, as the test objectives are the same as the objectives for the teaching (final national Common Objectives).
• Precise rules for how to conduct an exam – same items, same time, same conditions (e.g. permitted aids etc.) for all pupils.
• Formal documentation of the education – school results are published
The differences between tests and exams
National tests
• Low-stake, norm-referenced, standardised proficiency tests with a diagnostic element, which can be used for progress / added-value testing
• The teacher decides almost everything – when, how, where, how long, how many items, etc.
• Internal assessment tool
Prioritised purposes:
1. Pedagogic tool for teachers and headmasters – access only for those who need the results
2. Documentation of the academic level
Exam- and test items
Items look alike, but their purposes are different:
• The main goal of exams is to provide documentation with legal effect.
• The main goal of tests is to provide an assessment of the pupil's academic level for pedagogical purposes.
General requirement for items – distractors must be both plausible and unambiguous:
• Examination items: focus on unambiguity (one correct answer!) – accessible to the public / the right to complain
• Test items: focus on plausibility (more or less correct answers – challenging the pupils → more difficult questions)
Quality control:
• Exams are designed by exam commissions – skilled members with a teacher background and long experience
• Test items are written by experienced teachers, quality-controlled by experts, tested on 500–700 pupils, and required to fit the Rasch model
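A crude version of the "fits the Rasch model" check could compare observed outcomes against the model's expectations over the pupils who trialled the item. Operational systems use proper fit statistics (infit/outfit mean squares), so the mean-absolute-gap measure below is only an illustrative stand-in, and all names are hypothetical:

```python
import math

def rasch_p(ability: float, difficulty: float) -> float:
    """Expected probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def item_misfit(difficulty: float,
                responses: list[tuple[float, bool]]) -> float:
    """Mean absolute gap between observed outcomes (0/1) and the Rasch
    expectation, over (pupil_ability, answered_correctly) pairs.
    A large value suggests the item does not behave as the model predicts."""
    gaps = [abs((1.0 if correct else 0.0) - rasch_p(a, difficulty))
            for a, correct in responses]
    return sum(gaps) / len(gaps)
```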
Present challenges&
Future possibilities
Challenges for Denmark
• Technique: to make the IT systems work flawlessly (booking, testing and results)
• Test development: to further develop the test system (e.g. only one item pool per subject; including the time dimension in the difficulty of decoding items)
• Expand item pools: to maintain and further develop high-quality item pools
• Reporting results: to develop user-friendly ways to present the results
• Guidance in good practice: to convince teachers/schools to use the information – and describe how to do it!
3 generations of testing
• Exams – purpose: summative – assessment of learning (to prove)
• National testing – purpose: formative – assessment for learning (to improve)
• Next generation: future testing – purpose: optimize learning – assessment embedded in training
Future possibilities
Existing technology:
• Example 1: IT-based books, articles and newspapers with individually auto-generated cloze items – testing and practising reading simultaneously (the American Lexile test).
• Example 2: Tests with items that renew themselves (changing the text/numbers in the items) and require a correct answer (counting the number of tries instead of the number of correct answers).
• Example 3: IT-based learning-style analysis based on pupils' responses to questionnaires.
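Example 2 above – self-renewing items scored by counting tries rather than correct answers – might be sketched like this. The arithmetic item type and all names are hypothetical, chosen only to show the mechanism:

```python
import random

def make_arithmetic_item(rng: random.Random) -> tuple[str, int]:
    """Generate a fresh variant of the same item template by drawing
    new numbers each time (hypothetical example item)."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return f"What is {a} + {b}?", a + b

def tries_until_correct(answer_func, rng: random.Random) -> int:
    """Score by counting attempts: re-present renewed variants of the
    item until the pupil's answer matches the solution."""
    tries = 0
    while True:
        question, solution = make_arithmetic_item(rng)
        tries += 1
        if answer_func(question) == solution:
            return tries
```

Because every retry shows a fresh variant, a pupil cannot pass by memorising the answer; the number of tries becomes the measure of proficiency.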
Technology around the corner – developing game-like tests. Games:
• are built around problem solving,
• ask about the outcomes and goals,
• ask how to get there, particularly by invoking appropriate challenges to keep the user on task,
• place a premium on user proficiencies to create, innovate and produce,
• can involve others in competitive or collaborative actions to solve problems and create 'personal bests',
• collect information on users as they develop through the game,
• track information across time,
• integrate learning and assessment,
• provide feedback during the task so that the user becomes more efficient and effective,
• are equitable in that they do not favor slow or fast learners and ignore background variables such as the home, socio-economic status, or race.
So many directions for more effective testing that engages learners and emphasizes testing students' strategies – and so much learning can accrue from this involvement in testing.
(John Hattie, 2010)
Guidelines on e-based testing:
http://www.intestcom.org/guidelines/index.php
Thank you for your attention!
Check it out: evaluering.uvm.dk