Updated 11/16/06

©1996 & forthcoming, Bachman & Palmer & OUP Page 1

The Place of Intended Impact in Assessment Use Arguments*

Lyle F. Bachman

Department of Applied Linguistics

U.C.L.A., Los Angeles, California

Adrian Palmer

Department of Linguistics

University of Utah, Salt Lake City, Utah

*The material in this presentation and handout is based upon the books Language Testing in Practice, Lyle F. Bachman & Adrian Palmer. © Oxford University Press (1996) and Language Assessment in Action, Oxford University Press (forthcoming) as well as on various other articles by Lyle F. Bachman.

References

• Bachman, L. F. "Building and supporting a case for assessment use." Language Assessment Quarterly, 2(1). 2005.

• Bachman, Lyle F and Adrian Palmer. Language Testing In Practice. Oxford University Press. 1996. http://www.oup.co.uk/

• Bachman, Lyle F and Adrian Palmer. Language Assessment In Action. Oxford University Press. Forthcoming.

• Toulmin, S. E. The Uses of Argument. Cambridge: Cambridge University Press. 2003.

• Watson, Jenny Peterson & Sindhvananda, Kanchana. "Notes on the Thammasat University English Program". Bangkok: Thammasat University Faculty of Liberal Arts. 1972.

• Palmer, Adrian. "Procedures for student classification and grading in courses I-IV". Bangkok: Thammasat University Faculty of Liberal Arts. 1972.

Outline of Presentation

• How to make an Assessment Use Argument to justify using a test to have specific types of intended impact in a specific situation.

• How to use this argument to argue for two different testing options (different methods of testing).

• How to go about making a decision to use one option or the other.

Four Qualities of Useful Language Assessments

1. Reliability: consistency of measurement

2. Construct validity: the meaningfulness of the interpretations that we make on the basis of assessment scores

3. Authenticity: the degree of correspondence between the characteristics of a given assessment task and the characteristics of a relevant non-assessment language use task

4. Intended Impact: the intended effects that administering and taking an assessment, and using assessment results, have on students, teachers, educational systems, and society

Qualities of Usefulness Associated With Links in Assessment Use Argument

Bachman & Palmer (Forthcoming)

1. Performance on Assessment Tasks
2. Results/Scores
3. Interpretation of Results
4. Uses/Decisions

Warrants associated with the links:
• Reliability Warrants
• Construct Validity Warrants
• Authenticity Warrants
• Intended Impact Warrants

Summary of Reasoning in Example Assessment Use Argument

4. USE/DECISIONS: Assign grades at end of grammar unit.

3. INTERPRETATION: Numbers are interpreted as students' knowledge of grammar.

2. RESULTS/SCORES: Numbers are assigned to performance.

1. PERFORMANCE ON ASSESSMENT TASK: Students select answers on M-C Grammar Test Tasks.

Reliability: For the following reasons… we can consistently associate grammar scores with students' performance on M-C tasks.

Construct Validity: For the following reasons… scores can be interpreted in terms of "knowledge of grammar."

Authenticity: For the following reasons… the M-C task is appropriate for measuring the students' knowledge of grammar in this situation.

Intended Impact: For the following reasons… using the interpretations of the students' knowledge of grammar to assign grades will have the intended impact on test takers and test users.
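The structure of this example argument can be written down as data, which makes it easy to see which warrants still lack backing. The following is only a minimal sketch: the class name `Warrant`, the constants `STEPS` and `WARRANTS`, and the helper `missing_backing` are hypothetical names chosen for illustration and do not come from Bachman & Palmer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Warrant:
    """A reason offered in support of one part of the argument."""
    quality: str                      # quality of usefulness the warrant addresses
    reasoning: str                    # what the warrant claims
    backing: List[str] = field(default_factory=list)  # supporting evidence, if any


# The four steps of the example argument, from performance up to the decision.
STEPS = [
    "Students select answers on M-C grammar test tasks",
    "Numbers are assigned to performance",
    "Numbers are interpreted as students' knowledge of grammar",
    "Assign grades at end of grammar unit",
]

# One warrant per quality, paraphrasing the summary slide.
WARRANTS = [
    Warrant("Reliability",
            "Grammar scores are consistently associated with performance on M-C tasks"),
    Warrant("Construct Validity",
            "Scores can be interpreted in terms of knowledge of grammar"),
    Warrant("Authenticity",
            "The M-C task is appropriate for measuring knowledge of grammar here"),
    Warrant("Intended Impact",
            "Using the interpretations to assign grades has the intended impact"),
]


def missing_backing(warrants):
    """Return the qualities whose warrants have no supporting evidence yet."""
    return [w.quality for w in warrants if not w.backing]


if __name__ == "__main__":
    # Every warrant starts without backing; evidence is attached as it is collected.
    print(missing_backing(WARRANTS))
```

Appending a piece of evidence to a warrant's `backing` list removes that quality from the result of `missing_backing`, which mirrors the idea that each warrant needs its own supporting evidence.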

Backing (Supporting Evidence) for Warrants (Reasoning)

2. RESULTS/SCORES: Scores (numbers) are assigned to performance.

1. PERFORMANCE ON ASSESSMENT TASKS: Students check answers on M-C answer sheet.

Reliability Warrants (reasons)

Backing (supporting evidence)

Kinds of Backing

• Prior research

• Evidence specifically collected for this purpose

• Accepted community social practice and values

• Government regulations

• Laws

• Legal precedents

Example of Backing (Evidence) for Specific Reliability Warrant (Reasoning)

2. RESULTS/SCORES: Scores (numbers) are assigned to performance.

1. PERFORMANCE ON ASSESSMENT TASKS: Students mark answers on M-C grammar test.

Backing: On 2/34/06, measured test/retest reliability = .91

Reliability Warrant: Scores are consistent from one administration to another.
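A test/retest coefficient of this kind is commonly computed as the Pearson correlation between scores from two administrations of the same test. The sketch below assumes that definition. The function name `pearson_r` and the two score lists are hypothetical, and the scores are made-up values for illustration, not data from any real administration.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(first, second))
    ss_x = sum((x - mean_x) ** 2 for x in first)
    ss_y = sum((y - mean_y) ** 2 for y in second)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary in both administrations")
    return cross / sqrt(ss_x * ss_y)


# Made-up scores for five hypothetical test takers (illustration only).
first_administration = [52, 61, 70, 78, 90]
second_administration = [55, 60, 72, 75, 93]

if __name__ == "__main__":
    print(round(pearson_r(first_administration, second_administration), 2))
```

A coefficient close to 1 indicates that test takers keep roughly the same relative standing across the two administrations, which is the kind of evidence the reliability warrant calls for.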

Complete Assessment Use Argument

Bachman & Palmer (Forthcoming)

1. Performance on Assessment Tasks
2. Results/Scores
3. Interpretation of Results
4. Uses/Decisions

Warrants, each supported by backing:
• Reliability Warrants (Backing)
• Construct Validity Warrants (Backing)
• Authenticity Warrants (Backing)
• Intended Impact Warrants (Backing)

Thammasat University Proficiency Test (TUPT)

Kanchana Sindhvananda, J. Peterson, A. Palmer, and Thammasat Faculty of Liberal Arts Ajarns. (1971)

• High-stakes test used to make decisions affecting all students in Thammasat University

• Purpose
   – Measure knowledge of
      • grammar
      • vocabulary
      • reading comprehension
   – To make decisions about
      • exemption from university ESL courses primarily involving reading
      • placement in required ESL courses primarily involving reading
      • grading in required ESL courses primarily involving reading

Criteria for Student Classification and Grading in Courses I-IV

Intended Impact & Options

Situation 1: Thammasat 1971
Test method: Multiple-choice
Intended impact:
1. Efficient and hassle-free placement and grading in reading-based ESL program

Situation 2: Thammasat 1973 (hypothetical), Option 1
Test method: Multiple-choice
Intended impact:
1. Efficient and hassle-free placement and grading in reading and writing-based ESL program
2. Washback: teachers and students

Situation 2: Thammasat 1973 (hypothetical), Option 2
Test method: Multiple-choice and essay
Intended impact:
1. Efficient and hassle-free placement and grading in reading and writing-based ESL program
2. Washback: teachers and students

Intended Impact Argument Warrants

4. Use/decisions
   1. Exempt highly proficient students from ESL classes
   2. Place remaining students in appropriate ESL classes
   3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)

Intended Impact Warrants
1. Individuals
   a. Eliminating unnecessary instruction frees students to take other courses.
   b. Instruction at appropriate level is more effective.
   c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
   a. Relevance of construct to decisions: University courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately (common practice).
   b. Regularized instruction at different levels over time and across classes maximizes use of resources.

3. Interpretations of results
   1. Knowledge of grammar, vocabulary, and reading comprehension

Intended Impact Argument Backing

Intended Impact Warrants
1. Individuals
   a. Eliminating unnecessary instruction frees students to take other courses.
   b. Instruction at appropriate level is more effective.
   c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
   a. Relevance of construct to decisions: University courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately.
   b. Regularized instruction at different levels over time and across classes maximizes use of resources.

Backing
1. Individuals
   a. Documented communication from advanced students (see …)
   b. Standard practice
   c. Documented communication from teachers and students on fairness of grades (see …)
2. Systems
   a. Standard practice.
   b. Documented teacher feedback on time spent in class preparation and assessment (see …)

Authenticity Argument Warrants

4. Use/decisions
   1. Exempt highly proficient students from ESL classes
   2. Place remaining students in appropriate ESL classes
   3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)

Authenticity Warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and reading comprehension teaching points.
2. Correspondence of instructional task and test task characteristics: Reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected responses and limited constructed responses.

3. Interpretations of results
   1. Knowledge of grammar, vocabulary, and reading comprehension

Authenticity Argument Backing

Backing
1. Examples of instructional reading passages and instructional tasks can be found in the following course texts (references here).
2. Reading difficulty formulas have been used to calculate difficulty of reading passages in instructional materials and calibrate difficulty of test passages (see TUPT manual). Both instructional and test passages are based upon topics involving general (non-technical) background knowledge and selected and limited constructed responses.

Authenticity Warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and rhetorical organization teaching points.

2. Correspondence of instructional task and test task characteristics: Reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected and limited constructed responses.
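The backing for this argument mentions reading difficulty formulas without naming one. Purely as an illustration, the sketch below uses the Flesch Reading Ease formula, which is an assumption and may not be the formula described in the TUPT manual. The function names, the word, sentence, and syllable counts, and the `tolerance` value are all hypothetical.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease score; higher scores indicate easier text."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def similar_difficulty(score_a, score_b, tolerance=5.0):
    """Treat two passages as comparable if their scores differ by at most tolerance.

    The tolerance is an arbitrary illustrative threshold, not a published standard.
    """
    return abs(score_a - score_b) <= tolerance


if __name__ == "__main__":
    # Hypothetical counts for one instructional passage and one test passage.
    instructional = flesch_reading_ease(100, 5, 150)
    test_passage = flesch_reading_ease(120, 6, 185)
    print(similar_difficulty(instructional, test_passage))
```

Comparing the scores of instructional and test passages in this way is one possible form of evidence for the warrant that test passages are similar in difficulty to instructional passages.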

Construct Validity Warrants

3. Interpretations of results
   1. Knowledge of grammar, vocabulary, and reading comprehension

Construct Validity Warrants
1. The constructs "grammar, vocabulary, and reading comprehension" have been carefully defined.
2. The selected response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.

2. Results/Scores: Total number of correct responses

Construct Validity Backing

Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, tasks designed to test grammar do not involve difficult vocabulary as well.

Construct Validity Warrants
1. The constructs "grammar, vocabulary, and reading comprehension" have been carefully defined.

2. The selected response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.

Reliability Warrants

2. Results/Scores: Total number of correct responses

1. Performance on Assessment Tasks: Test takers check M-C answers

Reliability Warrants

1. Scoring criteria and procedures are consistent across administrations and tasks.

2. Task characteristics are consistent across multiple tasks.

3. Scores are consistent across test administrations.

Reliability Backing

Backing
1. Single criterion is used for scoring each set of test tasks (vocab, gram, and reading comprehension). Test is machine scored, so procedures are identical for all test tasks.
2. All tasks in each section of the test consist of stems and alternatives with specified characteristics as described in test manual.
3. Measured test/retest reliability (March, 1971).

              Form A    Form B
   Mean       86.21     88.48
   SD         20.49     19.47
   N          164
   Pearson r  .93

Reliability Warrants
1. Scoring criteria and procedures are consistent across administrations and tasks.
2. Task characteristics are consistent across multiple tasks.
3. Scores are consistent across administrations.

Situation 2: Same as for Situation 1 With The Following Additions

• Purpose
   – Also to measure knowledge of the following constructs in tasks involving essay writing:
      • grammar
      • vocabulary
      • rhetorical organization
   – To make decisions about…
      • exemption from new university ESL writing courses
      • placement in new required ESL writing courses
      • grading in new required ESL writing courses

• Additional intended impact: promote positive washback on writing teachers and students in writing courses

Additional Intended Impact Argument Warrants

4. Additional Use/Decisions
   1. Exempt highly proficient students from ESL writing classes
   2. Place remaining students in appropriate ESL writing classes
   3. Assign grades of A and B in ESL writing courses (lower grades to be assigned using other measures)

3. Additional Interpretations of Results
   1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing

Additional Intended Impact Warrants
1. Individuals
   a. No additional warrants
2. Systems
   a. Relevance of construct to decisions: New university writing courses focus on knowledge of grammar, vocabulary & rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.

Additional Intended Impact Argument Backing

Additional Intended Impact Warrants

1. Individuals
   a. No additional warrants
2. Systems
   a. Relevance of construct to decisions: New university writing courses focus on knowledge of grammar, vocabulary & rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.

Additional Backing
1. Individuals
2. Systems
   a. Documented feedback from instructors that students who control grammar and vocabulary in reading tasks cannot necessarily perform well on tasks involving essay writing

Additional Authenticity Argument Warrants

Additional Authenticity Warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.
2. Correspondence of assessment task / instructional task characteristics: Assessment essay topics are similar to topics involving general knowledge used in instructional tasks. Length of assessment essay tasks is similar to length of instructional essay tasks.

3. Interpretations of results
   1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing

4. Use/decisions
   1. Exempt highly proficient students from new ESL essay writing classes
   2. Place remaining students in appropriate new ESL essay writing classes
   3. Assign grades of A and B in new essay writing courses (lower grades to be assigned using other measures)

Additional Authenticity Argument Backing

Backing
1. Description of curriculum.
2. Example instructional materials and proposed essay test blueprint.

Additional Authenticity Warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.

2. Correspondence of assessment task / instructional task characteristics: Assessment essay topics are similar to topics involving general knowledge used in instructional tasks. Length of assessment essay tasks is similar to length of instructional essay writing tasks.

Additional Construct Validity Warrants

3. Interpretations of results

1. Knowledge of grammar, vocabulary, and rhetorical organization

Construct Validity Warrants

1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.

2. The extended production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization

2. Results/Scores: Rating levels.

Additional Construct Validity Backing

Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, essay-writing tasks involve topical knowledge common to all test takers.

Construct Validity Warrants
1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.

2. The extended production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization.

Comparative Assessment Use Arguments

Assessment Use Argument For Option #1

1. Performance on Assessment M-C Tasks
2. Results/Scores
3. Interpretation of Results
4. Uses/Decisions

Warrants: Reliability Warrants, Construct Validity Warrants, Authenticity Warrants, Intended Impact Warrants

Assessment Use Argument For Option #2

1. Performance on Assessment M-C and Essay Tasks
2. Results/Scores
3. Interpretation of Results
4. Uses/Decisions

Warrants: Reliability Warrants, Construct Validity Warrants, Authenticity Warrants, Intended Impact Warrants

How to Decide Between Alternatives

• Describe additional decisions and intended impact

– Program directors need to make the following decision: Should they add an essay writing task to the English test given to all students entering Thammasat University?

– Program directors want to increase students' ability to write essays because essay writing is an ability that students currently lack. This ability is needed both in instructional and real-life language use tasks that the students need to perform.

• To make this decision, they need to develop Assessment Use Arguments for two alternatives:

1. Do not add an essay writing task. Continue to use only the M-C tasks to place and grade students in essay writing classes.

2. Add an additional essay writing task and use this to place and grade students in essay writing classes.

• Then decide

1. which argument they prefer and can live with…

2. on the basis of whether developing the test according to the preferred argument is worth the cost.
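One way to lay the two alternatives side by side is to list the interpretations the decisions require and check which of them each option's test tasks are designed to support. The sketch below is a simplified reading of the two options, not a procedure from the presentation. The names `REQUIRED`, `OPTIONS`, and `unsupported_interpretations` are hypothetical, and the assumption that M-C tasks alone do not support essay-writing interpretations is this sketch's reading of the instructor feedback cited as additional backing.

```python
# Interpretations needed for the Situation 2 decisions (reading and writing courses).
REQUIRED = {
    "grammar, vocabulary, and reading comprehension",
    "grammar, vocabulary, and rhetorical organization in essay writing tasks",
}

# Interpretations each option is assumed to support (an assumption of this sketch).
OPTIONS = {
    "Option 1: M-C only": {
        "grammar, vocabulary, and reading comprehension",
    },
    "Option 2: M-C and essay": {
        "grammar, vocabulary, and reading comprehension",
        "grammar, vocabulary, and rhetorical organization in essay writing tasks",
    },
}


def unsupported_interpretations(required, supported):
    """Return the required interpretations an option does not support, sorted."""
    return sorted(required - supported)


if __name__ == "__main__":
    for name, supported in OPTIONS.items():
        gaps = unsupported_interpretations(REQUIRED, supported)
        print(name, "->", gaps if gaps else "no gaps")
```

A listing like this only shows where each argument is weaker. Whether closing the gap is worth the cost of developing and scoring an essay task remains a judgment for the program directors.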