The Place of Intended Impact in Assessment Use Arguments*
Lyle F. Bachman
Department of Applied Linguistics
U.C.L.A., Los Angeles, California
Adrian Palmer
Department of Linguistics
University of Utah, Salt Lake City, Utah
*The material in this presentation and handout is based on the books Language Testing in Practice (Lyle F. Bachman & Adrian Palmer, Oxford University Press, 1996) and Language Assessment in Action (Oxford University Press, forthcoming), as well as on various other articles by Lyle F. Bachman.
References
• Bachman, L. F. "Building and supporting a case for assessment use." Language Assessment Quarterly 2(1), 2005.
• Bachman, L. F., & Palmer, A. Language Testing in Practice. Oxford University Press, 1996. http://www.oup.co.uk/
• Bachman, L. F., & Palmer, A. Language Assessment in Action. Oxford University Press, forthcoming.
• Toulmin, S. E. The Uses of Argument. Cambridge: Cambridge University Press, 2003.
• Watson, J. P., & Sindhvananda, K. "Notes on the Thammasat University English Program." Bangkok: Thammasat University Faculty of Liberal Arts, 1972.
• Palmer, A. "Procedures for student classification and grading in Courses I-IV." Bangkok: Thammasat University Faculty of Liberal Arts, 1972.
Outline of Presentation
• How to make an Assessment Use Argument to justify using a test to achieve specific types of intended impact in a specific situation.
• How to use this argument to argue for two different testing options (different methods of testing).
• How to go about making a decision to use one option or the other.
Four Qualities of Useful Language Assessments
1. Reliability: consistency of measurement
2. Construct validity: the meaningfulness of the interpretations that we make on the basis of assessment scores
3. Authenticity: the degree of correspondence between the characteristics of a given assessment task and the characteristics of a relevant non-assessment language use task
4. Intended impact: the intended effects that administering and taking an assessment, and using assessment results, have on students, teachers, educational systems, and society
Qualities of Usefulness Associated with Links in the Assessment Use Argument (Bachman & Palmer, forthcoming)

Links in the argument:
1. Performance on assessment tasks
2. Results/scores
3. Interpretation of results
4. Uses/decisions

Warrants associated with the links: authenticity warrants, reliability warrants, construct validity warrants, and intended impact warrants.
Summary of Reasoning in Example Assessment Use Argument

1. Performance on assessment task: students select answers on M-C grammar test tasks.
2. Results/scores: numbers are assigned to performance.
3. Interpretation: numbers are interpreted as students' knowledge of grammar.
4. Use/decisions: assign grades at end of grammar unit.

Warrants supporting the links:
• Reliability: for the following reasons… we can consistently associate grammar scores with students' performance on M-C tasks.
• Construct validity: for the following reasons… scores can be interpreted in terms of "knowledge of grammar."
• Authenticity: for the following reasons… the M-C task is appropriate for measuring the students' knowledge of grammar in this situation.
• Intended impact: for the following reasons… using the interpretations of the students' knowledge of grammar to assign grades will have the intended impact on test takers and test users.
Backing (Supporting Evidence) for Warrants (Reasoning)

1. Performance on assessment tasks: students check answers on M-C answer sheet.
2. Results/scores: scores (numbers) are assigned to performance.

The reliability warrants (reasons) linking performance to scores are themselves supported by backing (supporting evidence).
Kinds of Backing
• Prior research
• Evidence specifically collected for this purpose
• Accepted community social practice and values
• Government regulations
• Laws
• Legal precedents
Example of Backing (Evidence) for a Specific Reliability Warrant (Reasoning)

1. Performance on assessment tasks: students mark answers on M-C grammar test.
2. Results/scores: scores (numbers) are assigned to performance.

Reliability warrant: scores are consistent from one administration to another.
Backing: on 2/34/06, measured test/retest reliability = .91.
Complete Assessment Use Argument (Bachman & Palmer, forthcoming)

1. Performance on assessment tasks
2. Results/scores
3. Interpretation of results
4. Uses/decisions

Each link is supported by warrants (authenticity, reliability, construct validity, and intended impact warrants), and each set of warrants is in turn supported by its own backing.
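To make this structure concrete, here is a minimal Python sketch of the four links, the warrants attached to them, and the backing behind each warrant, instantiated with the grammar-test example from the summary slide. It is our own illustration, not Bachman & Palmer's notation; class and field names such as `AssessmentUseArgument`, `Warrant`, and `Backing` are hypothetical.

```python
# A minimal sketch of an Assessment Use Argument as a data model.
# All names here are hypothetical illustrations, not the authors' notation.
from dataclasses import dataclass, field

@dataclass
class Backing:
    kind: str         # e.g. "prior research", "evidence collected for this purpose"
    description: str  # what the evidence actually is

@dataclass
class Warrant:
    quality: str      # "reliability", "construct validity", "authenticity", or "intended impact"
    statement: str    # the reasoning the warrant asserts
    backing: list[Backing] = field(default_factory=list)

@dataclass
class AssessmentUseArgument:
    performance: str      # 1. performance on assessment tasks
    results: str          # 2. results/scores
    interpretation: str   # 3. interpretation of results
    use: str              # 4. uses/decisions
    warrants: list[Warrant] = field(default_factory=list)

# The grammar-test example from the summary slide, with one backed warrant.
aua = AssessmentUseArgument(
    performance="Students select answers on M-C grammar test tasks",
    results="Numbers are assigned to performance",
    interpretation="Numbers are interpreted as students' knowledge of grammar",
    use="Assign grades at end of grammar unit",
    warrants=[
        Warrant(
            quality="reliability",
            statement="Scores are consistent from one administration to another",
            backing=[Backing("evidence collected for this purpose",
                             "Measured test/retest reliability = .91")],
        ),
    ],
)

# A warrant with an empty backing list is reasoning that still needs evidence.
unsupported = [w.quality for w in aua.warrants if not w.backing]
```

Representing the argument this way makes gaps visible: any warrant whose backing list is empty is reasoning that still needs evidence of one of the kinds listed on the "Kinds of Backing" slide.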
Thammasat University Proficiency Test (TUPT)
Kanchana Sindhvananda, J. Peterson, A. Palmer, and Thammasat Faculty of Liberal Arts Ajarns (1971)

• High-stakes test used to make decisions affecting all students at Thammasat University
• Purpose
– Measure knowledge of
  • grammar
  • vocabulary
  • reading comprehension
– To make decisions about
  • exemption from university ESL courses primarily involving reading
  • placement in required ESL courses primarily involving reading
  • grading in required ESL courses primarily involving reading
Criteria for Student Classification and Grading in Courses I-IV
Intended Impact & Options

Situation 1 (Thammasat, 1971)
• Test method: multiple-choice
• Intended impact: efficient and hassle-free placement and grading in a reading-based ESL program

Situation 2 (Thammasat, 1973, hypothetical), Option 1
• Test method: multiple-choice
• Intended impact:
  1. Efficient and hassle-free placement and grading in a reading- and writing-based ESL program
  2. Washback on teachers and students

Situation 2 (Thammasat, 1973, hypothetical), Option 2
• Test method: multiple-choice and essay
• Intended impact:
  1. Efficient and hassle-free placement and grading in a reading- and writing-based ESL program
  2. Washback on teachers and students
Intended Impact Argument Warrants

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and reading comprehension

4. Use/decisions
1. Exempt highly proficient students from ESL classes
2. Place remaining students in appropriate ESL classes
3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)

Intended impact warrants
1. Individuals
a. Eliminating unnecessary instruction frees students to take other courses.
b. Instruction at the appropriate level is more effective.
c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
a. Relevance of construct to decisions: university courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately (common practice).
b. Regularized instruction at different levels over time and across classes maximizes use of resources.
Intended Impact Argument Backing

Intended impact warrants
1. Individuals
a. Eliminating unnecessary instruction frees students to take other courses.
b. Instruction at the appropriate level is more effective.
c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
a. Relevance of construct to decisions: university courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately.
b. Regularized instruction at different levels over time and across classes maximizes use of resources.

Backing
1. Individuals
a. Documented communication from advanced students (see …)
b. Standard practice
c. Documented communication from teachers and students on fairness of grades (see …)
2. Systems
a. Standard practice
b. Documented teacher feedback on time spent in class preparation and assessment (see …)
Authenticity Argument Warrants

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and reading comprehension

4. Use/decisions
1. Exempt highly proficient students from ESL classes
2. Place remaining students in appropriate ESL classes
3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)

Authenticity warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and reading comprehension teaching points.
2. Correspondence of instructional task and test task characteristics: reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected responses and limited constructed responses.
Authenticity Argument Backing

Authenticity warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and rhetorical organization teaching points.
2. Correspondence of instructional task and test task characteristics: reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected and limited constructed responses.

Backing
1. Examples of instructional reading passages and instructional tasks can be found in the following course texts (references here).
2. Reading difficulty formulas have been used to calculate the difficulty of reading passages in instructional materials and to calibrate the difficulty of test passages (see TUPT manual). Both instructional and test passages are based on topics involving general (non-technical) background knowledge and selected and limited constructed responses.
Construct Validity Warrants

2. Results/scores: total number of correct responses

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and reading comprehension

Construct validity warrants
1. The constructs "grammar," "vocabulary," and "reading comprehension" have been carefully defined.
2. The selected-response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.
Construct Validity Backing

Construct validity warrants
1. The constructs "grammar," "vocabulary," and "reading comprehension" have been carefully defined.
2. The selected-response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.

Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in the test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, tasks designed to test grammar do not also involve difficult vocabulary.
Reliability Warrants

1. Performance on assessment tasks: test takers check M-C answers
2. Results/scores: total number of correct responses

Reliability warrants
1. Scoring criteria and procedures are consistent across administrations and tasks.
2. Task characteristics are consistent across multiple tasks.
3. Scores are consistent across test administrations.
Reliability Backing

Reliability warrants
1. Scoring criteria and procedures are consistent across administrations and tasks.
2. Task characteristics are consistent across multiple tasks.
3. Scores are consistent across administrations.

Backing
1. A single criterion is used for scoring each set of test tasks (vocabulary, grammar, and reading comprehension). The test is machine scored, so procedures are identical for all test tasks.
2. All tasks in each section of the test consist of stems and alternatives with specified characteristics, as described in the test manual.
3. Measured test/retest reliability (March 1971):
           Form A    Form B
   Mean     86.21     88.48
   SD       20.49     19.47
   N = 164; Pearson r = .93
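As a worked illustration of warrant 3 and its backing, the sketch below computes test/retest reliability as a Pearson correlation between two administrations of the same test. The score lists are invented placeholders, not the TUPT data summarized above.

```python
# A minimal sketch of test/retest reliability as a Pearson correlation.
# The scores below are hypothetical placeholders, not the 1971 TUPT data.
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical paired scores for the same students on Form A and Form B.
form_a = [82, 91, 77, 95, 88, 73, 99, 85]
form_b = [84, 93, 75, 97, 90, 70, 98, 88]
print(f"test/retest r = {pearson_r(form_a, form_b):.2f}")
```

A high correlation, like the r = .93 reported above, serves as backing for the warrant that scores are consistent across administrations.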
Situation 2: Same as Situation 1, with the Following Additions

• Purpose
– Also to measure knowledge of the following constructs in tasks involving essay writing:
  • grammar
  • vocabulary
  • rhetorical organization
– To make decisions about…
  • exemption from new university ESL writing courses
  • placement in new required ESL writing courses
  • grading in new required ESL writing courses
• Additional intended impact: promote positive washback on writing teachers and students in writing courses
Additional Intended Impact Argument Warrants

3. Additional interpretations of results
1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing

4. Additional use/decisions
1. Exempt highly proficient students from ESL writing classes
2. Place remaining students in appropriate ESL writing classes
3. Assign grades of A and B in ESL writing courses (lower grades to be assigned using other measures)

Additional intended impact warrants
1. Individuals
a. No additional warrants
2. Systems
a. Relevance of construct to decisions: new university writing courses focus on knowledge of grammar, vocabulary, and rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.
Additional Intended Impact Argument Backing

Additional intended impact warrants
1. Individuals
a. No additional warrants
2. Systems
a. Relevance of construct to decisions: new university writing courses focus on knowledge of grammar, vocabulary, and rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.

Additional backing
1. Individuals: none (no additional warrants)
2. Systems
a. Documented feedback from instructors that students who control grammar and vocabulary in reading tasks cannot necessarily perform well on tasks involving essay writing
Additional Authenticity Argument Warrants

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing

4. Use/decisions
1. Exempt highly proficient students from new ESL essay writing classes
2. Place remaining students in appropriate new ESL essay writing classes
3. Assign grades of A and B in new essay writing courses (lower grades to be assigned using other measures)

Additional authenticity warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.
2. Correspondence of assessment task / instructional task characteristics: assessment essay topics are similar to topics involving general knowledge used in instructional tasks. The length of assessment essay tasks is similar to the length of instructional essay tasks.
Additional Authenticity Argument Backing

Additional authenticity warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.
2. Correspondence of assessment task / instructional task characteristics: assessment essay topics are similar to topics involving general knowledge used in instructional tasks. The length of assessment essay tasks is similar to the length of instructional essay writing tasks.

Backing
1. Description of curriculum.
2. Example instructional materials and proposed essay test blueprint.
Additional Construct Validity Warrants

2. Results/scores: rating levels

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and rhetorical organization

Construct validity warrants
1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.
2. The extended-production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization.
Additional Construct Validity Backing

Construct validity warrants
1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.
2. The extended-production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization.

Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in the test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, essay-writing tasks involve topical knowledge common to all test takers.
Comparative Assessment Use Arguments

Assessment Use Argument for Option #1
1. Performance on assessment M-C tasks
2. Results/scores
3. Interpretation of results
4. Uses/decisions
Warrants: authenticity, construct validity, reliability, and intended impact

Assessment Use Argument for Option #2
1. Performance on assessment M-C and essay tasks
2. Results/scores
3. Interpretation of results
4. Uses/decisions
Warrants: authenticity, construct validity, reliability, and intended impact
How to Decide Between Alternatives

• Describe additional decisions and intended impact
– Program directors need to make the following decision: should they add an essay writing task to the English test given to all students entering Thammasat University?
– Program directors want to increase students' ability to write essays because essay writing is an ability that students currently lack. This ability is needed in both instructional and real-life language use tasks that the students need to perform.
• To make this decision, they need to develop Assessment Use Arguments for two alternatives:
1. Do not add an essay writing task. Continue to use only the M-C tasks to place and grade students in essay writing classes.
2. Add an essay writing task and use it to place and grade students in essay writing classes.
• Then decide
1. which argument they prefer and can live with…
2. on the basis of whether developing the test according to the preferred argument is worth the cost (a schematic sketch of this two-step decision follows).
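One way to picture the two-step decision is the minimal sketch below. The argument-strength scores, costs, and budget are hypothetical placeholders standing in for the program directors' own qualitative judgment of the two Assessment Use Arguments; nothing here comes from the presentation itself.

```python
# A minimal sketch of the two-step decision: prefer the option with the
# stronger Assessment Use Argument, then check it is worth the cost.
# All figures are hypothetical placeholders, not data from the presentation.

def choose(options, budget):
    # Step 1: prefer the option whose argument the directors find strongest.
    preferred = max(options, key=lambda o: o["argument_strength"])
    if preferred["cost"] <= budget:
        return preferred["name"]
    # Step 2: if the preferred argument is too expensive to act on,
    # fall back to the strongest affordable option (or none at all).
    affordable = [o for o in options if o["cost"] <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda o: o["argument_strength"])["name"]

options = [
    {"name": "Option 1: M-C only",      "argument_strength": 0.6, "cost": 1000},
    {"name": "Option 2: M-C and essay", "argument_strength": 0.9, "cost": 4000},
]
print(choose(options, budget=5000))  # -> Option 2: M-C and essay
```

The point of the sketch is only the ordering of the two questions: which argument can we live with, and is acting on it worth the cost.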