1. Designing an assessment system Presentation to the Scottish Qualifications Authority, August 2007 Dylan Wiliam Institute of Education, University of

Designing an assessment system

Presentation to the Scottish Qualifications Authority, August 2007

Dylan Wiliam

Institute of Education, University of London

www.dylanwiliam.net

Overview

• The purposes of assessment
• The structure of the assessment system
• The locus of assessment
• The extensiveness of the assessment
• Assessment format
• Scoring models
• Quality issues
• The role of teachers
• Contextual issues

Functions of assessment

Three functions of assessment:

• For evaluating institutions (evaluative)
• For describing individuals (summative)
• For supporting learning
  – Monitoring learning: Whether learning is taking place
  – Diagnosing (informing) learning: What is not being learnt
  – Forming learning: What to do about it

No system can easily support all three functions
• Traditionally, we have grouped the first two, and ignored the third
  – Learning is sidelined; summative and evaluative functions are weakened
• Instead, we need to separate the first (evaluative) from the other two

The Lake Wobegon effect

[Figure: scores plotted against time]

“All the women are strong, all the men are good-looking, and all the children are above average.” (Garrison Keillor)

Goodhart’s law

All performance indicators lose their usefulness when used as objects of policy
• Privatization of British Rail
• Targets in the Health Service
• “Bubble” students in high-stakes settings

Reconciling different pressures

The “high-stakes” genie is out of the bottle, and we cannot put it back

The clearer you are about what you want, the more likely you are to get it, but the less likely it is to mean anything

The only thing left to us is to try to develop “tests worth teaching to”

This is fundamentally an issue of validity.

Validity

Validity is a property of inferences, not of assessments

“One validates, not a test, but an interpretation of data arising from a specified procedure” (Cronbach, 1971; emphasis in original)

• No such thing as a valid (or indeed invalid) assessment
• No such thing as a biased assessment
• A pons asinorum for thinking about assessment

Threats to validity

Inadequate reliability

Construct-irrelevant variance
• The assessment includes aspects that are irrelevant to the construct of interest
  – the assessment is “too big”

Construct under-representation
• The assessment fails to include important aspects of the construct of interest
  – the assessment is “too small”

With clear construct definition all of these are technical (not value) issues

Two key challenges

Construct-irrelevant variance
• Sensitivity to instruction

Construct under-representation
• Extensiveness of assessment

Sensitivity to instruction

[Figure: Distribution of attainment on an item highly sensitive to instruction; label: 1 year]

Sensitivity to instruction (2)

[Figure: Distribution of attainment on an item moderately sensitive to instruction; label: 1 year]

Sensitivity to instruction (3)

[Figure: Distribution of attainment on an item relatively insensitive to instruction; label: 1 year]

Sensitivity to instruction (4)

[Figure: Distribution of attainment on an item completely insensitive to instruction; label: 1 year]

Consequences (1)

SD = chronological age/10

Consequences (2)

SD = chronological age/5

Consequences (3)

SD = chronological age/4
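
The three "Consequences" slides give only the ratio between the spread of attainment and chronological age. The sketch below works through one possible reading, which is an assumption and not something the slides spell out: attainment in an age cohort is normally distributed, measured in years of typical progress, with mean equal to the cohort's age and SD equal to age divided by 10, 5, or 4. The function name, the normality assumption, and the example age of 10 are all illustrative.

```python
from statistics import NormalDist


def share_above_next_cohort_mean(age_years: float, divisor: float) -> float:
    """Fraction of a cohort scoring above the mean of the cohort one year older.

    Assumption (not from the slides): attainment is normal with
    mean = age_years and SD = age_years / divisor, in years of progress.
    """
    sd = age_years / divisor
    cohort = NormalDist(mu=age_years, sigma=sd)
    # The older cohort's mean sits one year of progress higher.
    return 1.0 - cohort.cdf(age_years + 1.0)


# Compare the three ratios from the slides for a hypothetical 10-year-old cohort.
for divisor in (10, 5, 4):
    share = share_above_next_cohort_mean(age_years=10, divisor=divisor)
    print(f"SD = age/{divisor}: {share:.1%} already above next year's mean")
```

Under these assumptions, the wider the spread relative to one year's progress, the larger the share of students who already exceed the average of the year group above them, and so the smaller one year of instruction looks against the within-cohort variation.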

Insensitivity to instruction

Primarily attributable to the fact that learning is slower than assumed

Exacerbated by the normal mechanisms of test development

Leads to erroneous attributions about the effects of schooling

A sensitivity to instruction index

Test                          Sensitivity index
IQ-type test (insensitive)    0
NAEP                          6
TIMSS                         8
ETS “STEP” tests (1957)       8
ITBS                          10
Completely sensitive test     100

Extensiveness of assessment

Using teacher assessment in certification is attractive:
• Increases reliability (increased test time)
• Increases validity (addresses aspects of construct under-representation)

But problematic:
• Lack of trust (“Fox guarding the hen house”)
• Problems of biased inferences (construct-irrelevant variance)
• Can introduce new kinds of construct under-representation
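
The slide does not say how the reliability gain from extra assessment time would be quantified. One standard tool from classical test theory, not named in the slides, is the Spearman-Brown prophecy formula, sketched below. It assumes the added assessment is parallel to the existing test (same construct, same quality), which teacher assessment may not satisfy, so the result is better read as an upper-end estimate than a prediction.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`.

    Assumes the extra material is parallel to the original test.
    `reliability` is the current reliability coefficient (between 0 and 1).
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a test with reliability 0.80, doubled in length.
print(round(spearman_brown(0.80, 2), 3))
```

The formula shows diminishing returns: each further doubling of assessment time adds less reliability than the previous one.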

The challenge

To design an assessment system that is:

• Distributed
  – So that evidence collection is not undertaken entirely at the end
• Synoptic
  – So that learning has to accumulate

A possible model

All students are assessed at test time

Different students in the same class are assigned different tasks

The performance of the class defines an “envelope” of scores, e.g.
• Advanced: 5 students
• Proficient: 8 students
• Basic: 10 students
• Below basic: 2 students

Teacher allocates levels on the basis of whole-year performance
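
A minimal sketch of how the envelope model could work mechanically. The slides do not specify how the teacher's judgment is recorded or how strictly the envelope binds, so two assumptions are made here: the teacher supplies a ranking of students by whole-year performance, and the number of students at each level must match the test-day envelope exactly. The function names and the four-student class are hypothetical.

```python
from collections import Counter

# Levels from the slide, highest first.
LEVELS = ["Advanced", "Proficient", "Basic", "Below basic"]


def class_envelope(test_levels):
    """Count how many students reached each level on the test-day tasks."""
    counts = Counter(test_levels)
    return {level: counts.get(level, 0) for level in LEVELS}


def allocate_levels(envelope, teacher_ranking):
    """Give out the envelope's levels, best first, following the teacher's ranking.

    `teacher_ranking` lists students from strongest to weakest, judged on
    whole-year performance (an assumed input format).
    """
    if sum(envelope.values()) != len(teacher_ranking):
        raise ValueError("envelope size must match the number of ranked students")
    ranked = iter(teacher_ranking)
    allocation = {}
    for level in LEVELS:
        for _ in range(envelope[level]):
            allocation[next(ranked)] = level
    return allocation


# Hypothetical class of four: test-day results fix only the counts per level.
envelope = class_envelope(["Advanced", "Basic", "Proficient", "Basic"])
allocation = allocate_levels(envelope, ["Cat", "Ann", "Dee", "Bob"])
print(envelope)
print(allocation)
```

In this sketch an individual student's test-day result never determines that student's own level; it only contributes to the class totals.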

Benefits and problems

Benefits
• The only way to teach to the test is to improve everyone’s performance on everything (which is what we want!)
• Validity and reliability are enhanced

Problems
• Students’ scores are not “inspectable”
• Assumes student motivation

The effects of context

Beliefs about what constitutes learning;

Beliefs in the reliability and validity of the results of various tools;

A preference for and trust in numerical data, with bias towards a single number;

Trust in the judgments and integrity of the teaching profession;

Belief in the value of competition between students;

Belief in the value of competition between schools;

Belief that test results measure school effectiveness;

Fear of national economic decline and education’s role in this;

Belief that the key to schools’ effectiveness is strong top-down management.

Conclusion

There is no “perfect” assessment system anywhere. Each nation’s assessment system is exquisitely tuned to local constraints and affordances.

Assessment practices have impacts on teaching and learning which may be strongly amplified or attenuated by the national context.

The overall impact of particular assessment practices and initiatives is determined at least as much by culture and politics as it is by educational evidence and values.

Conclusion (2)

It is probably idle to draw up maps for the ideal assessment policy for a country, even although the principles and the evidence to support such an ideal might be clearly agreed within the ‘expert’ community.

Instead, focus on those arguments and initiatives which are least offensive to existing assumptions and beliefs, and which will nevertheless serve to catalyze a shift in them while at the same time improving some aspects of present practice.

Questions?

Comments?

Institute of Education
University of London
20 Bedford Way
London WC1H 0AL

Tel +44 (0)20 7612 6000
Fax +44 (0)20 7612 6126
Email [email protected]