Inference & Culture Slide 1 April 29, 2003
Argument Substance and Argument Structure in Educational Assessment
Robert J. Mislevy
Department of Measurement, Statistics, & Evaluation
University of Maryland, College Park, MD
April 29, 2003
Presented at Conference on Inference, Culture, and Ordinary Thinking in Dispute Resolution, Benjamin N. Cardozo School of Law, Yeshiva University, New York, New York, April 27-29, 2003. This work builds on research with Linda Steinberg and Russell Almond at Educational Testing Service on the structure of educational assessments.
Central Points
Educational assessment has changed considerably over the last century.
Why? Strikingly different psychological perspectives on nature of learning and knowledge.
Can be seen as elaborations of the same argument structure.
» Wigmore, Toulmin
Messick (1994) on assessment design:
[B]egin by asking what complex of knowledge, skills, or other attribute should be assessed, presumably because they are tied to explicit or implicit objectives of instruction or are otherwise valued by society.
Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors?
Thus, the nature of the construct guides the selection or construction of relevant tasks as well as the rational development of construct-based scoring criteria and rubrics.
Toulmin's (1958) structure for arguments
Reasoning flows from data (D) to claim (C) by justification of a warrant (W), which in turn is supported by backing (B). The inference may need to be qualified by alternative explanations (A), which may have rebuttal evidence (R) to support them.
[Diagram: Toulmin argument structure]
D, so C
since W, on account of B
unless A, which R supports
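As a minimal sketch, the six elements of the Toulmin structure can be held in a small record type. The class name `ToulminArgument`, its field names, and the `render` helper are illustrative assumptions rather than part of Toulmin's notation; the example values paraphrase the Pet Shop argument used in this talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToulminArgument:
    """Illustrative container for the elements of a Toulmin diagram."""
    claim: str                       # C: what we want to conclude
    data: List[str]                  # D: observations the claim rests on
    warrant: str                     # W: why the data support the claim
    backing: str = ""                # B: what justifies the warrant
    alternatives: List[str] = field(default_factory=list)  # A: rival explanations
    rebuttals: List[str] = field(default_factory=list)     # R: evidence supporting A

    def render(self) -> str:
        """Spell the argument out using the diagram's connective words."""
        parts = [" and ".join(self.data), f"so {self.claim}", f"since {self.warrant}"]
        if self.backing:
            parts.append(f"on account of {self.backing}")
        if self.alternatives:
            parts.append("unless " + " or ".join(self.alternatives))
        if self.rebuttals:
            parts.append("(supported by: " + "; ".join(self.rebuttals) + ")")
        return ", ".join(parts)


arg = ToulminArgument(
    claim="Sue has a high value of Analytical Reasoning",
    data=["Sue answered the Pet Shop item correctly",
          "the item has a known logical structure and contents"],
    warrant="students high on Analytical Reasoning tend to do well on such puzzles",
    backing="empirical studies of AR test scores",
    alternatives=["Sue made a lucky guess"],
    rebuttals=["Sue spent less than 10 seconds on this item"],
)
print(arg.render())
```

Keeping alternatives and rebuttals as lists reflects that an inference may face several rival explanations at once.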
Perspectives on learning and knowledge
Trait/differential (~1900 - )
Behaviorist (~1950 - 1980)
Information-processing (~1970 - )
Sociocultural (~1980 - )
Trait/Differential Perspective
A relatively stable characteristic of a person—an attribute, enduring process, or disposition—which is consistently manifested to some degree when relevant, despite considerable variation in the range of settings and circumstances. (Messick, 1989)
Interest in people's differential status on common traits
Useful in selection, prediction, and educational decisions—not so much for instruction
Spearman’s “Theorem of indifference of the indicator”
This means that, for the purpose of indicating the amount of g possessed by a person, any test will do just as well as any other, provided only that its correlation with g is equally high. ...
Another consequence of the indifference of the indicator consists in the significance that should be attached to personal estimates of “intelligence” made by teachers and others. However unlike may be the kinds of observation from which these estimates may have been derived, still insofar as they have a sufficiently broad basis to make the influence of g dominate over that of the s’s [specific factors], they will tend to measure precisely the same thing.
An Analytical Reasoning Item: Pet Shop Display
Arturo is planning the parakeet display for his pet shop. He has five parakeets, Alice, Bob, Carla, Diwakar, and Etria. Each is a different color; not necessarily in the same order, they are white, speckled, green, blue, and yellow. Arturo has two cages. The top cage holds three birds, and the bottom cage holds two. The display must meet the following additional conditions:
Alice is in the bottom cage.
Bob is in the top cage and is not speckled.
Carla cannot be in the same cage as the blue parakeet.
Etria is green.
The green parakeet and the speckled parakeet are in the same cage.
If Carla is in the top cage, which of the following must be true?
a) The green parakeet is in the bottom cage.
b) The speckled parakeet is in the bottom cage.
c) Diwakar is in the top cage.
d) Diwakar is in the bottom cage.
e) The blue parakeet is in the top cage.
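The logical structure of the item can be checked by brute force. The sketch below enumerates every assignment of birds to cages and colors, keeps those that satisfy the stated conditions, and reports which options hold in every remaining display where Carla is in the top cage. The function and variable names are illustrative assumptions. It reads "Carla cannot be in the same cage as the blue parakeet" as also ruling out Carla being blue herself.

```python
from itertools import combinations, permutations

BIRDS = ["Alice", "Bob", "Carla", "Diwakar", "Etria"]
COLORS = ["white", "speckled", "green", "blue", "yellow"]


def cage_of(color_name, cage, color):
    """Return the cage ('top' or 'bottom') holding the bird of the given color."""
    bird = next(b for b, c in color.items() if c == color_name)
    return cage[bird]


def valid_displays():
    """Yield (cage, color) assignments that satisfy every stated condition."""
    for top in combinations(BIRDS, 3):            # top cage holds three birds
        cage = {b: ("top" if b in top else "bottom") for b in BIRDS}
        for perm in permutations(COLORS):         # each bird is a different color
            color = dict(zip(BIRDS, perm))
            if cage["Alice"] != "bottom":
                continue
            if cage["Bob"] != "top" or color["Bob"] == "speckled":
                continue
            if cage["Carla"] == cage_of("blue", cage, color):
                continue
            if color["Etria"] != "green":
                continue
            if cage_of("green", cage, color) != cage_of("speckled", cage, color):
                continue
            yield cage, color


OPTIONS = {
    "a": lambda cage, color: cage_of("green", cage, color) == "bottom",
    "b": lambda cage, color: cage_of("speckled", cage, color) == "bottom",
    "c": lambda cage, color: cage["Diwakar"] == "top",
    "d": lambda cage, color: cage["Diwakar"] == "bottom",
    "e": lambda cage, color: cage_of("blue", cage, color) == "top",
}

# Keep only displays consistent with the question's premise.
worlds = [(cage, color) for cage, color in valid_displays() if cage["Carla"] == "top"]
must_be_true = [label for label, holds in OPTIONS.items()
                if all(holds(cage, color) for cage, color in worlds)]
print(must_be_true)  # prints ['d']
```

Exhaustive enumeration is practical here because there are only 10 cage splits and 120 colorings to check.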
LSAT on AR Items
LSAT's description of AR takes a trait perspective:
"Analytical reasoning items are designed to measure the ability to understand a structure of relationships and to draw conclusions about the structure."
AR items are in the LSAT not because either lawyers or law students routinely have to solve problems just like these in their jobs or their studies, but because there is evidence that students who can solve these kinds of puzzles tend to perform better in law school than students who don't.
[Diagram: Toulmin argument for the Pet Shop item]
C: Sue has a high value of Analytical Reasoning.
W: Students who are high on Analytical Reasoning tend to do well on logical puzzles that query relations that follow from explicit relations and constraints.
B: Empirical studies show high correlations between AR test scores and college grades, open-ended problem solving tasks, and ratings of employees' reasoning skills on the job.
A: Sue answered correctly as a result of a lucky guess.
R: Sue spent less than 10 seconds on this item.
D1: Sue answered the Pet Shop item correctly.
D2: Logical structure and contents of Pet Shop item.
Connectives: D1 and D2, so C; since W; on account of B; unless A; R supports A.
1) Note that the warrant requires a conjunction of data about the nature of Sue's performance and the nature of the performance situation.
[Diagram: Toulmin argument for the Pet Shop item, with the data expanded]
C: Sue has a high value of Analytical Reasoning.
W, B, A, R: shown by label only.
D1: Sue answered the Pet Shop item correctly.
  D11: Sue's marks on the answer sheet for Pet Shop item.
  D12: Answer key for the Pet Shop item.
  W1: Correspondence of darkest mark and keyed response means correct answer.
  Connectives: D11 and D12, so D1; since W1.
D2: Logical structure and contents of Pet Shop item.
  D22: Particular content of Pet Shop item.
  W2: Elements in schemas for valid AR items.
  Connectives: D22, so D2; since W2.
Connectives: D1 and D2, so C; since W; on account of B; unless A; R supports A.
2) A closer look at the “data”:
Must reason from unique work products and item materials, to aspects addressed in the general warrant.
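A minimal sketch of the step from Sue's marks to "answered correctly": take the darkest mark on the answer sheet as the response and compare it with the keyed response. The function name, the darkness scale from 0 to 1, and all example values (including the key) are hypothetical; real scanning and scoring rules are more involved.

```python
def answered_correctly(mark_darkness, keyed_response):
    """Apply the 'darkest mark matches the key' rule to one item.

    mark_darkness:   {option label: darkness reading in [0, 1]} (hypothetical scale)
    keyed_response:  option label from the answer key
    """
    # The response is taken to be the option with the darkest mark.
    response = max(mark_darkness, key=mark_darkness.get)
    return response == keyed_response


# Hypothetical work product and answer key for a single item.
sue_marks = {"a": 0.05, "b": 0.10, "c": 0.02, "d": 0.93, "e": 0.00}
d1 = answered_correctly(sue_marks, keyed_response="d")
print(d1)
```

Even this small rule can fail, for example when an erasure leaves two dark marks, which is one reason the argument carries alternative explanations.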
Multiple pieces of evidence of the same kind
[Diagram: one Toulmin argument with n pairs of data]
C: Sue has a high value of Analytical Reasoning.
W: Students who are high on Analytical Reasoning tend to do well on logical puzzles that query relations that follow from explicit relations and constraints.
B: ...
A: ...
R: ...
D11: Sue's answer to Item 1, and D21: structure and contents of Item 1
...
D1n: Sue's answer to Item n, and D2n: structure and contents of Item n
Connectives: data pairs, so C; since W; on account of B; unless A; R supports A.
Multiple pieces of evidence of different kinds
[Diagram: one claim supported by several lines of argument, each with its own warrant and alternatives]
C: Sue has a high value of Analytical Reasoning.
A0: ... (unless)
Line 1:
  D11: Sue's answer to Item 1, and D12: Structure & content of Pet Shop item
  W1: [[Warrant re logic puzzles]]
  A: [[Alternatives re logic puzzles]]
...
Line n:
  Dn1: Teacher recommendation about Sue, and Dn2: Conditions of observation for recommendation
  Wn: [[Warrant re recommendations]]
  A: [[Alternatives re recommendations]]
Connectives: each data pair, so C; since its warrant; unless its alternatives.
Statistical Modeling of Assessment Data
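[Diagram: student-model variable θ with arrows to observable variables X1, X2, X3. Distributions shown: p(θ), p(X1|θ), p(X2|θ), p(X3|θ).]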
Claims in terms of values of unobservable variables in student model (SM)--characterize student knowledge.
Data modeled as depending probabilistically on SM vars.
Estimate conditional distributions of data given SM vars.
Bayes theorem to infer SM variables given data.
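The four steps above can be sketched for the simplest case: one discrete student-model variable, here called theta, and right/wrong observations assumed conditionally independent given theta. The function name, the two proficiency levels, and every probability below are hypothetical values chosen for illustration, not estimates from any assessment.

```python
def posterior(prior, p_correct, responses):
    """Bayes theorem for a discrete student-model variable.

    prior:      {theta: p(theta)}
    p_correct:  {theta: [p(X_j = 1 | theta) for each item j]}
    responses:  list of observed item scores X_j, each 0 or 1
    """
    joint = {}
    for theta, p_theta in prior.items():
        likelihood = 1.0
        # Conditional independence: multiply item likelihoods given theta.
        for p, x in zip(p_correct[theta], responses):
            likelihood *= p if x == 1 else 1.0 - p
        joint[theta] = p_theta * likelihood
    total = sum(joint.values())          # p(X1, X2, X3)
    return {theta: value / total for theta, value in joint.items()}


# Hypothetical prior and conditional probabilities for three items.
prior = {"low": 0.5, "high": 0.5}
p_correct = {"low": [0.2, 0.3, 0.1], "high": [0.8, 0.7, 0.6]}

post = posterior(prior, p_correct, responses=[1, 1, 0])
print(post)  # posterior probability of "high" is about 0.81
```

In operational work the conditional distributions would themselves be estimated from data rather than fixed by hand, as the third step notes.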
Behaviorist Perspective
The educational process consists of providing a series of environments that permit the student to learn new behaviors or modify or eliminate existing behaviors and to practice these behaviors to the point that he displays them at some reasonably satisfactory level of competence and regularity under appropriate circumstances. …
The evaluation of the success of instruction and of the student’s learning becomes a matter of placing the student in a sample of situations in which the different learned behaviors may appropriately occur and noting the frequency and accuracy with which they do occur.
D.R. Krathwohl & D.A. Payne, 1971, pp. 17-18.
[Diagram: Toulmin argument under the behaviorist perspective]
C: Sue's probability of correctly answering a 2-digit subtraction problem with borrowing is p.
W: Sampling theory machinery for reasoning from observed proportion of r correct responses in n targeted situations, to true proportion p.
A: [e.g., observational errors, data errors, misclassification of responses or performance situations, distractions, etc.]
D1j: Sue's answer to Item j, and D2j: structure and contents of Item j (one pair for each item j)
Connectives: data pairs, so C; since W; unless A.
The warrant encompasses definitions of the class of stimulus situations, response classifications, and sampling theory.
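One simple instance of such sampling-theory machinery is a binomial estimate of p from r correct responses in n targeted situations. The sketch below reports the observed proportion, its standard error, and a Wilson score interval; the choice of interval, the function name, and the example counts are assumptions made for illustration, not the method used in the talk.

```python
import math


def estimate_true_proportion(r, n, z=1.96):
    """Estimate p from r correct responses in n independent targeted situations.

    Returns (observed proportion, standard error, Wilson score interval).
    z = 1.96 corresponds to an approximate 95% interval.
    """
    p_hat = r / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return p_hat, standard_error, (center - half_width, center + half_width)


# Hypothetical record: 14 correct out of 20 borrowing problems.
p_hat, se, (low, high) = estimate_true_proportion(r=14, n=20)
print(p_hat, se, low, high)
```

The interval narrows as n grows, which is the formal counterpart of asking for more observations of the targeted kind.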
The claim addresses the expected value of performance of the targeted kind in the targeted situations.
The task data address the salient features of the stimulus situations (i.e., tasks).
The student data address the salient features of the responses.
The Information-Processing Perspective
Epitomized in Newell and Simon’s (1972) Human Problem Solving.
Examines the procedures by which people acquire, store, and use knowledge to solve problems.
Modeling problem-solving in terms of the capabilities and the limitations of human thought and memory.
Importance of knowledge structures, relationships, procedures in learning domains.
Use of rules, production systems, task decompositions, and means-ends analyses.
Responses consistent with the "subtract smaller from larger" bug
821 - 285 = 664
885 - 221 = 664
63 - 15 = 52
17 - 9 = 12
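The bug can be written as a procedure: in each column, subtract the smaller digit from the larger, regardless of which number it belongs to. The sketch below is one possible rendering of that rule; the function name is an assumption, and it expects the top number to have at least as many digits as the bottom number.

```python
def buggy_subtract(top, bottom):
    """Column-wise subtraction with the 'subtract smaller from larger' bug.

    Each column yields the absolute difference of its two digits, so the
    student never borrows. Assumes top has at least as many digits as bottom.
    """
    top_digits = str(top)
    bottom_digits = str(bottom).rjust(len(top_digits), "0")  # pad with leading zeros
    columns = (abs(int(t) - int(b)) for t, b in zip(top_digits, bottom_digits))
    return int("".join(str(d) for d in columns))


for top, bottom in [(821, 285), (885, 221), (63, 15), (17, 9)]:
    print(top, "-", bottom, "=", buggy_subtract(top, bottom))
```

The second problem needs no borrowing, so the buggy procedure and the correct one agree on it; only items with borrowing can distinguish them.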
[Diagram: two-level Toulmin argument under the information-processing perspective]
C: Sue's configuration of production rules for operating in the domain (knowledge and skill) is K.
W0: Theory about how persons with configurations {K1,...,Km} would be likely to respond to items with different salient features.
Supporting sub-arguments, one per item class:
Class 1:
  C: Sue's probability of answering a Class 1 subtraction problem with borrowing is p1.
  W: Sampling theory for items with feature set defining Class 1.
  D11j: Sue's answer to Item j, Class 1, and D21j: structure and contents of Item j, Class 1
...
Class n:
  C: Sue's probability of answering a Class n subtraction problem with borrowing is pn.
  W: Sampling theory for items with feature set defining Class n.
  D1nj: Sue's answer to Item j, Class n, and D2nj: structure and contents of Item j, Class n
Connectives: within each class, data pairs, so class-level C, since W; the class-level claims together, so C about K, since W0.
Like behaviorist inference at level of behavior in classes of structurally similar tasks.
Patterns among behaviorist claims are data for inferences about unobservable production rules that govern behavior.
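A minimal sketch of this second-level inference: compare how well two candidate configurations, correct subtraction and the "subtract smaller from larger" bug, predict a pattern of responses. The two-configuration set, the uniform prior, and the slip probability are simplifying assumptions made for illustration; all names are hypothetical.

```python
def buggy_subtract(top, bottom):
    """Subtract the smaller digit from the larger in every column (no borrowing)."""
    top_digits = str(top)
    bottom_digits = str(bottom).rjust(len(top_digits), "0")
    return int("".join(str(abs(int(t) - int(b)))
                       for t, b in zip(top_digits, bottom_digits)))


# Hypothetical set of candidate configurations K1, K2.
CONFIGURATIONS = {
    "correct procedure": lambda top, bottom: top - bottom,
    "smaller-from-larger bug": buggy_subtract,
}


def posterior_over_configurations(items, responses, slip=0.1):
    """Posterior over configurations given (top, bottom) items and observed answers.

    slip is the assumed probability that a response departs from what the
    configuration predicts. The prior over configurations is uniform.
    """
    weights = {}
    for name, procedure in CONFIGURATIONS.items():
        weight = 1.0 / len(CONFIGURATIONS)
        for (top, bottom), response in zip(items, responses):
            weight *= (1 - slip) if procedure(top, bottom) == response else slip
        weights[name] = weight
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


items = [(821, 285), (885, 221), (63, 15), (17, 9)]
responses = [664, 664, 52, 12]
post = posterior_over_configurations(items, responses)
print(post)
```

With these four responses the bug configuration receives nearly all of the posterior weight, because three of the four answers are inconsistent with the correct procedure.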
[Diagram: Toulmin argument with time-ordered data]
C: Sue's level of troubleshooting skill with [...] is K.
W: [theory about strategies and procedures people at various levels of troubleshooting expertise tend to employ when iteratively solving problems in the domain.]
...
D1,t-2: Sue's actions at time t-2, and D2,t-2: Context after time t-2
D1,t-1: Sue's actions at time t-1, and D2,t-1: Context after time t-1
D1,t: Sue's actions at time t, and D2,t: Context after time t
D1,t+1: Sue's actions at time t+1
...
Connectives: data over time, so C; since W.
Assessing inquiry processes: Time dependencies in a troubleshooting task. Past behavior & consequences become part of the setting for the next action.
The Sociocultural Perspective
Stresses how knowledge is conditioned and constrained by the technologies, information resources, representation systems, and social situations ...
Incorporates explanatory concepts that have proved useful in fields such as ethnography and sociocultural psychology to study collaborative work, … mutual understanding in conversation, and other characteristics of interaction that are relevant to the functional success of the participants’ activities.
Greeno, Collins, & Resnick, 1997, p. 7.
AP Studio Art Portfolios
[Diagram: Toulmin argument for the Concentration section]
C: The level of performance for the Concentration section is K.
W0: [Specification of general rubric to the goals and approach the student describes in the narrative]
B: General rubric
D1: Student's learning in the course of carrying out the concentration.
D2: Conditions under which the work was carried out.
D3j: Art piece j in the concentration.
Statements in narrative explaining the concentration, its influences, goals, etc.
Connectives: data, so C; since W0; B, together with the narrative statements, tailors W0.
Claim concerns level of performance represented by unique project, in socially-determined general evaluation scheme.
Data from student are (1) works of art and (2) explanation of project goals, approach, rationale.
Student text helps assure performance conditions meet the requirements of the warrant.
Student text contributes to how raters apply general evaluation rubric to this student’s work.
Conversational Competence
[Diagram: Toulmin argument with time-ordered data from two speakers]
C: Sue's level of conversational competence is K.
W: [theory about how people at various levels of conversational competence will behave in contexts with specified features]
...
D1,t-2: Sue's speech act at time t-2; D3,t-2: I's speech act at time t-2; D2,t-2: Context after time t-2
D1,t-1: Sue's speech act at time t-1; D3,t-1: I's speech act at time t-1; D2,t-1: Context after time t-1
D1,t: Sue's speech act at time t; D3,t: I's speech act at time t; D2,t: Context after time t
D1,t+1: Sue's speech act at time t+1; D3,t+1: I's speech act at time t+1
...
Connectives: data over time, so C; since W.
Challenges:
1) Time dependencies.
2) Interlocutor’s behavior affects context, and is required by the warrant for evidence about certain aspects of competence.
3) How constrained? Naturalistic vs. interviewer.
Conclusion
What changes?
Developments in psychology, technology, and social factors (e.g., accommodations) continually place demands on assessment that outstrip familiar forms.
What doesn’t change?
We want to draw inferences about what students know and can do as seen from some perspective; that perspective tells us what kinds of things we need to see them do, in what kinds of situations, to ground those inferences.
Conclusion
We see elaborations, extensions, and specializations of enduring principles of evidentiary reasoning.
We find continued value in tools such as Toulmin diagrams, Wigmore charts, and Bayesian inference networks to understand yesterday's assessments, manage today's, and design the assessments of tomorrow.