The Place of Intended Impact in Assessment Use Arguments*
Lyle F. Bachman
Department of Applied Linguistics
U.C.L.A., Los Angeles, California
Adrian Palmer
Department of Linguistics
University of Utah, Salt Lake City, Utah
*The material in this presentation and handout is based on the books Language Testing in Practice (Lyle F. Bachman & Adrian Palmer, Oxford University Press, 1996) and Language Assessment in Action (Oxford University Press, forthcoming), as well as on various other articles by Lyle F. Bachman.
References
• Bachman, L. F. "Building and supporting a case for assessment use." Language Assessment Quarterly 2(1), 2005.
• Bachman, L. F., & Palmer, A. Language Testing in Practice. Oxford University Press, 1996. http://www.oup.co.uk/
• Bachman, L. F., & Palmer, A. Language Assessment in Action. Oxford University Press, forthcoming.
• Toulmin, S. E. The Uses of Argument. Cambridge: Cambridge University Press, 2003.
• Watson, J. P., & Sindhvananda, K. "Notes on the Thammasat University English Program." Bangkok: Thammasat University Faculty of Liberal Arts, 1972.
• Palmer, A. "Procedures for student classification and grading in Courses I-IV." Bangkok: Thammasat University Faculty of Liberal Arts, 1972.
Outline of Presentation
• How to make an Assessment Use Argument to justify using a test to achieve specific types of intended impact in a specific situation.
• How to use this argument to argue for two different testing options (different methods of testing).
• How to go about making a decision to use one option or the other.
Four Qualities of Useful Language Assessments
1. Reliability: consistency of measurement
2. Construct validity: the meaningfulness of the interpretations that we make on the basis of assessment scores
3. Authenticity: the degree of correspondence between the characteristics of a given assessment task and the characteristics of a relevant non-assessment language use task
4. Intended impact: the intended effects that administering and taking an assessment, and using assessment results, have on students, teachers, educational systems, and society
Qualities of Usefulness Associated with Links in the Assessment Use Argument (Bachman & Palmer, forthcoming)

Links in the argument:
1. Performance on assessment tasks
2. Results/scores
3. Interpretation of results
4. Uses/decisions

Warrants associated with the links: authenticity warrants, reliability warrants, construct validity warrants, and intended impact warrants.
Summary of Reasoning in Example Assessment Use Argument

1. Performance on assessment task: students select answers on M-C grammar test tasks.
2. Results/scores: numbers are assigned to performance.
3. Interpretation: numbers are interpreted as students' knowledge of grammar.
4. Use/decisions: assign grades at end of grammar unit.

Warrants supporting the links:
• Reliability: for the following reasons… we can consistently associate grammar scores with students' performance on M-C tasks.
• Construct validity: for the following reasons… scores can be interpreted in terms of "knowledge of grammar."
• Authenticity: for the following reasons… the M-C task is appropriate for measuring the students' knowledge of grammar in this situation.
• Intended impact: for the following reasons… using the interpretations of the students' knowledge of grammar to assign grades will have the intended impact on test takers and test users.
Backing (Supporting Evidence) for Warrants (Reasoning)

1. Performance on assessment tasks: students check answers on M-C answer sheet.
2. Results/scores: scores (numbers) are assigned to performance.

The reliability warrants (reasons) linking performance to scores are themselves supported by backing (supporting evidence).
Kinds of Backing
• Prior research
• Evidence specifically collected for this purpose
• Accepted community social practice and values
• Government regulations
• Laws
• Legal precedents
Example of Backing (Evidence) for a Specific Reliability Warrant (Reasoning)

1. Performance on assessment tasks: students mark answers on M-C grammar test.
2. Results/scores: scores (numbers) are assigned to performance.

Reliability warrant: scores are consistent from one administration to another.
Backing: on 2/34/06, measured test/retest reliability = .91.
Complete Assessment Use Argument (Bachman & Palmer, forthcoming)

1. Performance on assessment tasks
2. Results/scores
3. Interpretation of results
4. Uses/decisions

Each link is supported by warrants (authenticity, reliability, construct validity, and intended impact warrants), and each set of warrants is in turn supported by its own backing.
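To make this structure concrete, here is a minimal Python sketch of the four links, the warrants attached to them, and the backing behind each warrant, instantiated with the grammar-test example from the summary slide. It is our own illustration, not Bachman & Palmer's notation; class and field names such as `AssessmentUseArgument`, `Warrant`, and `Backing` are hypothetical.

```python
# A minimal sketch of an Assessment Use Argument as a data model.
# All names here are hypothetical illustrations, not the authors' notation.
from dataclasses import dataclass, field

@dataclass
class Backing:
    kind: str         # e.g. "prior research", "evidence collected for this purpose"
    description: str  # what the evidence actually is

@dataclass
class Warrant:
    quality: str      # "reliability", "construct validity", "authenticity", or "intended impact"
    statement: str    # the reasoning the warrant asserts
    backing: list[Backing] = field(default_factory=list)

@dataclass
class AssessmentUseArgument:
    performance: str      # 1. performance on assessment tasks
    results: str          # 2. results/scores
    interpretation: str   # 3. interpretation of results
    use: str              # 4. uses/decisions
    warrants: list[Warrant] = field(default_factory=list)

# The grammar-test example from the summary slide, with one backed warrant.
aua = AssessmentUseArgument(
    performance="Students select answers on M-C grammar test tasks",
    results="Numbers are assigned to performance",
    interpretation="Numbers are interpreted as students' knowledge of grammar",
    use="Assign grades at end of grammar unit",
    warrants=[
        Warrant(
            quality="reliability",
            statement="Scores are consistent from one administration to another",
            backing=[Backing("evidence collected for this purpose",
                             "Measured test/retest reliability = .91")],
        ),
    ],
)

# A warrant with an empty backing list is reasoning that still needs evidence.
unsupported = [w.quality for w in aua.warrants if not w.backing]
```

Representing the argument this way makes gaps visible: any warrant whose backing list is empty is reasoning that still needs evidence of one of the kinds listed on the "Kinds of Backing" slide.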
Thammasat University Proficiency Test (TUPT)
Kanchana Sindhvananda, J. Peterson, A. Palmer, and Thammasat Faculty of Liberal Arts Ajarns (1971)

• High-stakes test used to make decisions affecting all students at Thammasat University
• Purpose
– Measure knowledge of
  • grammar
  • vocabulary
  • reading comprehension
– To make decisions about
  • exemption from university ESL courses primarily involving reading
  • placement in required ESL courses primarily involving reading
  • grading in required ESL courses primarily involving reading
Criteria for Student Classification and Grading in Courses I-IV
Intended Impact & Options

Situation 1 (Thammasat, 1971)
• Test method: multiple-choice
• Intended impact: efficient and hassle-free placement and grading in a reading-based ESL program

Situation 2 (Thammasat, 1973, hypothetical), Option 1
• Test method: multiple-choice
• Intended impact:
  1. Efficient and hassle-free placement and grading in a reading- and writing-based ESL program
  2. Washback on teachers and students

Situation 2 (Thammasat, 1973, hypothetical), Option 2
• Test method: multiple-choice and essay
• Intended impact:
  1. Efficient and hassle-free placement and grading in a reading- and writing-based ESL program
  2. Washback on teachers and students
Intended Impact Argument Warrants

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and reading comprehension

4. Use/decisions
1. Exempt highly proficient students from ESL classes
2. Place remaining students in appropriate ESL classes
3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)

Intended impact warrants
1. Individuals
a. Eliminating unnecessary instruction frees students to take other courses.
b. Instruction at the appropriate level is more effective.
c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
a. Relevance of construct to decisions: university courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately (common practice).
b. Regularized instruction at different levels over time and across classes maximizes use of resources.
Intended Impact Argument Backing

Intended impact warrants
1. Individuals
a. Eliminating unnecessary instruction frees students to take other courses.
b. Instruction at the appropriate level is more effective.
c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
a. Relevance of construct to decisions: university courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately.
b. Regularized instruction at different levels over time and across classes maximizes use of resources.

Backing
1. Individuals
a. Documented communication from advanced students (see …)
b. Standard practice
c. Documented communication from teachers and students on fairness of grades (see …)
2. Systems
a. Standard practice
b. Documented teacher feedback on time spent in class preparation and assessment (see …)
Authenticity Argument Warrants

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and reading comprehension

4. Use/decisions
1. Exempt highly proficient students from ESL classes
2. Place remaining students in appropriate ESL classes
3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)

Authenticity warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and reading comprehension teaching points.
2. Correspondence of instructional task and test task characteristics: reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected responses and limited constructed responses.
Authenticity Argument Backing

Authenticity warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and rhetorical organization teaching points.
2. Correspondence of instructional task and test task characteristics: reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected and limited constructed responses.

Backing
1. Examples of instructional reading passages and instructional tasks can be found in the following course texts (references here).
2. Reading difficulty formulas have been used to calculate the difficulty of reading passages in instructional materials and to calibrate the difficulty of test passages (see TUPT manual). Both instructional and test passages are based on topics involving general (non-technical) background knowledge and selected and limited constructed responses.
Construct Validity Warrants

2. Results/scores: total number of correct responses

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and reading comprehension

Construct validity warrants
1. The constructs "grammar," "vocabulary," and "reading comprehension" have been carefully defined.
2. The selected-response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.
Construct Validity Backing

Construct validity warrants
1. The constructs "grammar," "vocabulary," and "reading comprehension" have been carefully defined.
2. The selected-response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.

Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in the test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, tasks designed to test grammar do not also involve difficult vocabulary.
Reliability Warrants

1. Performance on assessment tasks: test takers check M-C answers
2. Results/scores: total number of correct responses

Reliability warrants
1. Scoring criteria and procedures are consistent across administrations and tasks.
2. Task characteristics are consistent across multiple tasks.
3. Scores are consistent across test administrations.
Reliability Backing

Reliability warrants
1. Scoring criteria and procedures are consistent across administrations and tasks.
2. Task characteristics are consistent across multiple tasks.
3. Scores are consistent across administrations.

Backing
1. A single criterion is used for scoring each set of test tasks (vocabulary, grammar, and reading comprehension). The test is machine scored, so procedures are identical for all test tasks.
2. All tasks in each section of the test consist of stems and alternatives with specified characteristics, as described in the test manual.
3. Measured test/retest reliability (March 1971):
           Form A    Form B
   Mean     86.21     88.48
   SD       20.49     19.47
   N = 164; Pearson r = .93
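As a worked illustration of warrant 3 and its backing, the sketch below computes test/retest reliability as a Pearson correlation between two administrations of the same test. The score lists are invented placeholders, not the TUPT data summarized above.

```python
# A minimal sketch of test/retest reliability as a Pearson correlation.
# The scores below are hypothetical placeholders, not the 1971 TUPT data.
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical paired scores for the same students on Form A and Form B.
form_a = [82, 91, 77, 95, 88, 73, 99, 85]
form_b = [84, 93, 75, 97, 90, 70, 98, 88]
print(f"test/retest r = {pearson_r(form_a, form_b):.2f}")
```

A high correlation, like the r = .93 reported above, serves as backing for the warrant that scores are consistent across administrations.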
Situation 2: Same as Situation 1, with the Following Additions

• Purpose
– Also to measure knowledge of the following constructs in tasks involving essay writing:
  • grammar
  • vocabulary
  • rhetorical organization
– To make decisions about…
  • exemption from new university ESL writing courses
  • placement in new required ESL writing courses
  • grading in new required ESL writing courses
• Additional intended impact: promote positive washback on writing teachers and students in writing courses
Additional Intended Impact Argument Warrants

3. Additional interpretations of results
1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing

4. Additional use/decisions
1. Exempt highly proficient students from ESL writing classes
2. Place remaining students in appropriate ESL writing classes
3. Assign grades of A and B in ESL writing courses (lower grades to be assigned using other measures)

Additional intended impact warrants
1. Individuals
a. No additional warrants
2. Systems
a. Relevance of construct to decisions: new university writing courses focus on knowledge of grammar, vocabulary, and rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.
Additional Intended Impact Argument Backing

Additional intended impact warrants
1. Individuals
a. No additional warrants
2. Systems
a. Relevance of construct to decisions: new university writing courses focus on knowledge of grammar, vocabulary, and rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.

Additional backing
1. Individuals: none (no additional warrants)
2. Systems
a. Documented feedback from instructors that students who control grammar and vocabulary in reading tasks cannot necessarily perform well on tasks involving essay writing
Additional Authenticity Argument Warrants

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing

4. Use/decisions
1. Exempt highly proficient students from new ESL essay writing classes
2. Place remaining students in appropriate new ESL essay writing classes
3. Assign grades of A and B in new essay writing courses (lower grades to be assigned using other measures)

Additional authenticity warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.
2. Correspondence of assessment task / instructional task characteristics: assessment essay topics are similar to topics involving general knowledge used in instructional tasks. The length of assessment essay tasks is similar to the length of instructional essay tasks.
Additional Authenticity Argument Backing

Additional authenticity warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.
2. Correspondence of assessment task / instructional task characteristics: assessment essay topics are similar to topics involving general knowledge used in instructional tasks. The length of assessment essay tasks is similar to the length of instructional essay writing tasks.

Backing
1. Description of curriculum.
2. Example instructional materials and proposed essay test blueprint.
Additional Construct Validity Warrants

2. Results/scores: rating levels

3. Interpretations of results
1. Knowledge of grammar, vocabulary, and rhetorical organization

Construct validity warrants
1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.
2. The extended-production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization.
Additional Construct Validity Backing

Construct validity warrants
1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.
2. The extended-production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization.

Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in the test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, essay-writing tasks involve topical knowledge common to all test takers.
Comparative Assessment Use Arguments

Assessment Use Argument for Option #1
1. Performance on assessment M-C tasks
2. Results/scores
3. Interpretation of results
4. Uses/decisions
Warrants: authenticity, construct validity, reliability, and intended impact

Assessment Use Argument for Option #2
1. Performance on assessment M-C and essay tasks
2. Results/scores
3. Interpretation of results
4. Uses/decisions
Warrants: authenticity, construct validity, reliability, and intended impact
How to Decide Between Alternatives

• Describe additional decisions and intended impact
– Program directors need to make the following decision: should they add an essay writing task to the English test given to all students entering Thammasat University?
– Program directors want to increase students' ability to write essays because essay writing is an ability that students currently lack. This ability is needed in both instructional and real-life language use tasks that the students need to perform.
• To make this decision, they need to develop Assessment Use Arguments for two alternatives:
1. Do not add an essay writing task. Continue to use only the M-C tasks to place and grade students in essay writing classes.
2. Add an essay writing task and use it to place and grade students in essay writing classes.
• Then decide
1. which argument they prefer and can live with…
2. on the basis of whether developing the test according to the preferred argument is worth the cost (a schematic sketch of this two-step decision follows).
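One way to picture the two-step decision is the minimal sketch below. The argument-strength scores, costs, and budget are hypothetical placeholders standing in for the program directors' own qualitative judgment of the two Assessment Use Arguments; nothing here comes from the presentation itself.

```python
# A minimal sketch of the two-step decision: prefer the option with the
# stronger Assessment Use Argument, then check it is worth the cost.
# All figures are hypothetical placeholders, not data from the presentation.

def choose(options, budget):
    # Step 1: prefer the option whose argument the directors find strongest.
    preferred = max(options, key=lambda o: o["argument_strength"])
    if preferred["cost"] <= budget:
        return preferred["name"]
    # Step 2: if the preferred argument is too expensive to act on,
    # fall back to the strongest affordable option (or none at all).
    affordable = [o for o in options if o["cost"] <= budget]
    if not affordable:
        return None
    return max(affordable, key=lambda o: o["argument_strength"])["name"]

options = [
    {"name": "Option 1: M-C only",      "argument_strength": 0.6, "cost": 1000},
    {"name": "Option 2: M-C and essay", "argument_strength": 0.9, "cost": 4000},
]
print(choose(options, budget=5000))  # -> Option 2: M-C and essay
```

The point of the sketch is only the ordering of the two questions: which argument can we live with, and is acting on it worth the cost.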