
The Role of Cognition in Educational Assessment Design

Joanna S. Gorin

Arizona State University

Overview
- The need for cognitively-based assessment.
- Defining cognitive models and their properties.
- Tools for cognitively-based assessment design and analysis.
- Empirical research on cognitive approaches to test design and validation.

The Purpose of Educational Assessment
New purposes for testing have introduced issues related to the inappropriateness of current standardized tests.

"Such assessments shall produce individual student interpretive, descriptive, and diagnostic reports…that allow parents, teachers, and principals to understand and address the specific academic needs of students, and include information regarding achievement on academic assessments aligned with State academic achievement standards, and that are provided to parents, teachers, and principals as soon as is practicably possible after the assessment is given, in an understandable and uniform format, and to the extent practicable, in a language that parents can understand."

NCLB, Part A, Subpart 1, Sec. 2221(b)3(C)(xii), 2001

Implications of New Assessment Needs
"[A]ll assessments will be more fruitful when based on an understanding of cognition in the domain and on the precept of reasoning with evidence" (NRC, 2001, p. 178).
- Increase our understanding of the claims that we want to make about students, instruction, programs, and policy.
  - Detailed description of the skills, abilities, and constructs we are measuring.
- Increase our understanding of the evidence provided by student responses to test questions.
  - Detailed description of item difficulty, discrimination, and other statistical and psychological properties.

Cognitively-Based Assessment Design
If we begin with a complete understanding of the skill, domain, competency, ability, etc. that we want to measure, then we can be more principled in our design of the tools we use to measure it.
Further, if we understand our measurement tools and the scores they generate more fully, we can evaluate their quality relative to our measurement goals.

Cognitive Models and Grain Size
What is a complete understanding of the construct? Of the test?
Types of models:
- Model of test specifications (content standards).
- Model of task performance (cognitive processes).
Alignment of the test with standards or test specifications may not be sufficient to yield the desired information: the model and alignment must be made at the appropriate level of inference.

General Model of Cognitively-Based Assessment Design
[Figure: a construct/latent trait decomposes into cognitive processes (A through D), each linked to a corresponding feature (A through D) of an item/test question.]

Potential Cognitive Models
Information processing models:
- Construction-integration theory of text processing.
  - Propositional representation of text.
  - Cyclic construction of a representation and integration of new information.
- Activation theory.
  - Activation of information is influenced by various factors.
  - The most highly activated information will be selected.
Evaluate the "completeness" of these models in terms of the target inferences to be made about students.
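A minimal sketch of the spreading-activation idea behind such models (illustrative only: the propositions, connection weights, and convergence scheme below are invented for this example, not taken from any specific theory's parameterization). Activation is passed repeatedly through a weighted connectivity matrix until the activation vector stabilizes, and the most highly activated node is selected.

```python
import numpy as np

# Hypothetical connectivity among four text propositions (weights invented).
W = np.array([
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.4, 0.1],
    [0.1, 0.4, 1.0, 0.2],
    [0.0, 0.1, 0.2, 1.0],
])

a = np.ones(4)  # start with uniform activation

# Spread activation until the normalized vector stops changing.
for _ in range(100):
    a_new = W @ a
    a_new /= np.linalg.norm(a_new)
    if np.allclose(a_new, a):
        break
    a = a_new

print("Final activations:", np.round(a, 3))
print("Most highly activated proposition:", int(np.argmax(a)))
```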

Steps in Assessment Design
[Figure: Construct-based model, shown as a cycle: generate a cognitive model of the construct; generate items linked to the cognitive model; derive person-level information from item responses.]

Tools for Cognitively-Based Assessment Development
Design frameworks:
- Mislevy's Evidence Centered Design: a multi-level model of assessment.
- Embretson's Cognitive Design System: a process-based approach.
- Bennett and Bejar's Generative Approach: a structural approach.

Cognitively-Based Assessment Design
[Figure: Task-based model, shown as a cycle: generate a cognitive model of the items; link item features to the cognitive model; derive person-level information from item responses.]

Purpose of Item Difficulty Modeling (IDM)
Extend available student and item information beyond a single statistical parameter.
Substantive information can be useful for:
- Verification of construct definition (construct validity).
- Creating new items (automatic/algorithmic item generation).
- Providing diagnostic information (score reporting).
- Understanding group differences (DIF).

Tools for Cognitively-Based Assessment Development
Design frameworks for test construction:
- Evidence Centered Design
- Cognitive Design System
- Generative Approach
Psychometric models for IDM:
- Tree-Based Regression Approach
- Rule Space Methodology
- Attribute Hierarchy Method
- Fusion Model
- Linear Logistic Latent Trait Model (LLTM)

Traditional Item-Construct Definition
Reading comprehension questions measure the ability to read with understanding, insight and discrimination. This type of question explores the ability to analyze a written passage from several perspectives. These include the ability to recognize both explicitly stated elements in the passage and assumptions underlying statements or arguments in the passage, as well as the implications of those statements or arguments.

Model of Test Specifications (Standards and Objectives)

Strand 2: Comprehending Literary Text
Comprehending Literary Text identifies the comprehension strategies that are specific to the study of a variety of literature.

Concept 1: Elements of Literature
Identify, analyze, and apply knowledge of the structures and elements of literature.

PO 1. Identify the plot of a literary selection, heard or read.

PO 2. Describe characters (e.g., traits, roles, similarities) within a literary selection, heard or read.

PO 3. Sequence a series of events in a literary selection, heard or read.

PO 4. Determine whether a literary selection, heard or read, is realistic or fantasy.

PO 5. Participate (e.g., clapping, chanting, choral reading) in the reading of poetry by responding to the rhyme and rhythm.

Concept 2: Historical and Cultural Aspects of Literature

Recognize and apply knowledge of the historical and cultural aspects of American, British, and world literature.

PO 1. Compare events, characters and conflicts in literary selections from a variety of cultures to their experiences.

Cognitive Model Development
- Cognitive theory: generate a list of relevant processing components from theory.
- Correlational studies: establish a statistical relationship between item features and item properties.
- Experimental manipulations: context/format; item design.
- Process tracing methods, used to identify additional processing influences: verbal protocols ("think alouds"); eye-tracking data.

Attribute/Skill List

Encoding process skills:
- EP1: Encoding propositionally dense text.
- EP2: Encoding propositionally sparse text.
- EP3: Encoding high-level vocabulary.
- EP4: Encoding low-level vocabulary.

Decision process skills:
- DP1: Synthesizing large sections of text into a single answer.
- DP2: Confirming the correct answer from direct information in the text.
- DP3: Falsifying incorrect answers from direct information in the text.
- DP4: Confirming the correct answer by inference from the text.
- DP5: Falsifying incorrect answers by inference from the text.
- DP6: Encoding correct answers with high vocabulary.
- DP7: Encoding incorrect answers with high vocabulary.
- DP8: Mapping correct answers to verbatim text.
- DP9: Mapping correct answers to paraphrased text.
- DP10: Mapping correct answers to reordered verbatim text.
- DP11: Mapping correct answers to reordered paraphrased text.
- DP12: Mapping incorrect answers to verbatim text.
- DP13: Mapping incorrect answers to paraphrased text.
- DP14: Mapping incorrect answers to reordered verbatim text.
- DP15: Mapping incorrect answers to reordered paraphrased text.
- DP16: Locating relevant information early in the text.
- DP17: Locating relevant information in the middle of the text.
- DP18: Locating information at the end of the text.
- DP19: Using additional falsification skills for specially formatted items.
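An attribute list like this is typically operationalized for psychometric models (e.g., the Rule Space Methodology or Attribute Hierarchy Method listed earlier) as a Q-matrix: one row per item, one column per skill. A minimal sketch, with item-to-skill assignments invented purely for illustration:

```python
import numpy as np

# Columns: EP1-EP4 encoding skills followed by DP1-DP19 decision skills.
skills = [f"EP{i}" for i in range(1, 5)] + [f"DP{i}" for i in range(1, 20)]

# Hypothetical Q-matrix for three items: Q[i, k] = 1 if item i requires skill k.
Q = np.zeros((3, len(skills)), dtype=int)
Q[0, [0, 5]] = 1      # item 1: EP1 (dense text) + DP2 (confirm from direct text)
Q[1, [2, 7, 11]] = 1  # item 2: EP3 (high vocabulary) + DP4 + DP8
Q[2, [1, 19]] = 1     # item 3: EP2 (sparse text) + DP16 (early information)

for item, row in enumerate(Q, start=1):
    required = [s for s, q in zip(skills, row) if q]
    print(f"Item {item} requires: {', '.join(required)}")
```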

Cognitive Model (Skill/Subskill Model)
[Figure: the attribute/skill list above organized as a skill/subskill model, the finer grain size.]

Cognitive Model (Information Processing Model)
[Figure: encoding (construction) and coherence processes (integration) build a text representation; text mapping and evaluation of truth status lead to a response decision.]

Cognitive Variables
- Modifier propositional density
- Predicate propositional density
- Text content vocabulary level
- Percent content words
- Percent of relevant text
- Falsification
- Confirmation
- Vocabulary level of the distractors
- Vocabulary level of the correct response
- Reasoning of the distractors
- Reasoning of the correct response
- Location of relevant information in text
- Length of passage
- Special item format
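Several of these variables are surface features computable directly from item text. A minimal sketch for three of them (the tokenizer and stopword list are crude simplifications; propositional density and vocabulary level would require a propositional parser and word-frequency norms, which this sketch does not attempt):

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "is", "was", "it", "that"}

def surface_features(passage: str) -> dict:
    words = re.findall(r"[A-Za-z']+", passage.lower())
    sentences = [s for s in re.split(r"[.!?]+", passage) if s.strip()]
    content = [w for w in words if w not in STOPWORDS]
    return {
        "passage_length_words": len(words),
        "mean_sentence_length": len(words) / max(len(sentences), 1),
        "percent_content_words": 100.0 * len(content) / max(len(words), 1),
    }

print(surface_features("The cell divides. Its chromosomes condense and separate."))
```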

Full Item Difficulty Model
[Figure: the information processing model annotated with item features at each stage. Encoding and coherence processes: vocabulary level, sentence length, propositional density, argument structure, text length. Text mapping: vocabulary level, sentence length, semantic overlap, level of question. Response decision: vocabulary level of key and distractors, falsifiability of distractors, confirmation of the key.]

Activation Model of Item Difficulty

Item Features                      Item 1      Item 2      Item 3
Key: Location                      Early       Delayed     Delayed
Key: Correspondence                Verbatim    Paraphrase  Paraphrase
Key: Elaboration                   Strong      Strong      None
Key: Resulting Activation          High        Moderate    Low
Distractor: Location               Delayed     Delayed     Early
Distractor: Correspondence         Paraphrase  Paraphrase  Verbatim
Distractor: Elaboration            None        None        Strong
Distractor: Resulting Activation   Low         Low         High
Expected Difficulty                Easy        Medium      Hard

Regression Model of GRE Items

Variable                                 B       SE(B)   β      t       p
Text Encoding (TE)
  Modifier Propositional Density         6.121   3.698   .298   1.655   .100
  Predicate Propositional Density        4.728   2.656   .190   1.780   .077
  Text Content Vocabulary Level          .643    .511    .092   1.257   .210
  Percent Content Words                  -1.955  1.938   -.217  -1.009  .314
Decision Processing (DP)
  Percent Relevant Text                  -.14    .25     -.04   -.57    .57
  Confirmation                           .08     .08     .07    .96     .34
  Falsification                          .37     .25     .10    1.48    .14
  Vocabulary Level – Correct             .08     .02     .25    3.38    <.01
  Vocabulary Level – Distractors         -.05    .03     -.11   -1.53   .13
  Reasoning – Correct                    .62     .11     .39    5.77    <.01
  Reasoning – Distractors                .14     .10     .09    1.33    .18
  Location of Relevant Information       -.05    .08     -.05   -.64    .52
GRE-V Specific Variables
  Special Item Format – Line Citation    -.08    .10     -.05   -.85    .40
  Special Item Format – Roman Numeral    -.38    .10     -.27   -4.00   <.01
  Length of Passage                      -.18    .08     -.16   -2.24   .03
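Once each item is coded on the model's features, a table like the one above can be produced with ordinary least squares regression of IRT difficulty estimates on the feature codes. A minimal sketch using statsmodels (the feature names and all data below are simulated placeholders, not the GRE data behind the table):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_items = 200

# Hypothetical item codings on three of the model's features.
items = pd.DataFrame({
    "vocab_level_correct": rng.normal(5, 2, n_items),
    "reasoning_correct": rng.integers(1, 4, n_items),
    "passage_length": rng.normal(400, 80, n_items),
})

# Simulated IRT b-parameters (item difficulty) as the outcome.
b = (0.08 * items["vocab_level_correct"]
     + 0.6 * items["reasoning_correct"]
     - 0.002 * items["passage_length"]
     + rng.normal(0, 0.5, n_items))

X = sm.add_constant(items)
fit = sm.OLS(b, X).fit()
print(fit.summary())  # B, SE, t, and p for each feature, as in the table above
```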

Contribution of Processing Factors to Item Difficulty

                                                       Change Statistics
Model                            R     Adj. R²   ΔR²    F       df1   df2   Sig.
TE                               .15   .00       .02    1.18    4     195   .32
TE + DP                          .57   .28       .30    10.20   8     187   <.01
TE + DP + GRE Specific Factors   .62   .34       .07    6.77    3     184   <.01
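The change statistics above come from comparing nested regression models: each block's contribution is the increase in R² when its predictors are added, tested with an F statistic. A minimal sketch of that computation for any two nested statsmodels OLS fits (the commented example reuses `X` and `b` from the previous sketch):

```python
import scipy.stats as st

def r2_change(fit_small, fit_full):
    """R-square change and its F test for two nested OLS fits."""
    dr2 = fit_full.rsquared - fit_small.rsquared
    df1 = fit_full.df_model - fit_small.df_model   # number of added predictors
    df2 = fit_full.df_resid                        # residual df of the full model
    f = (dr2 / df1) / ((1 - fit_full.rsquared) / df2)
    return dr2, f, df1, df2, st.f.sf(f, df1, df2)

# Example (continuing the previous sketch):
# small = sm.OLS(b, X[["const", "passage_length"]]).fit()
# full = sm.OLS(b, X).fit()
# print(r2_change(small, full))  # ΔR², F, df1, df2, p
```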

Implications for Score Meaning
Variables related to both text encoding and decision processes were significant predictors in models of GRE-V item difficulty.
This suggests that GRE-V reading comprehension items measure both processes, providing new evidence on the construct validity of test scores for these items.

Experimental Manipulations
Experimental conditions corresponded to variations in item features based on a hypothesized cognitive model:
- Passage propositional density and syntax modification.
- Passage passive voice and negative wording modification.
- Passage order-of-information change.
- Response alternative-passage overlap change.
Experimental effects were tested with the LLTM, which extends the Rasch model to decompose sources of item difficulty (see the formulation below).
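In standard notation, the LLTM is a Rasch model whose item difficulty is constrained to be a weighted sum of design features. With person ability $\theta_p$, item-by-feature codes $q_{ik}$ (the contrast codes on the next slide), and feature effects $\eta_k$, the probability of a correct response is

$$P(X_{pi} = 1 \mid \theta_p) = \frac{\exp\left(\theta_p - \sum_k q_{ik}\,\eta_k\right)}{1 + \exp\left(\theta_p - \sum_k q_{ik}\,\eta_k\right)},$$

so the item difficulty $b_i = \sum_k q_{ik}\,\eta_k$ is decomposed into the experimental effects $\eta_k$.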

Contrast Coding for LLTM Analysis

          D1   D2   D3   …   D28   C1   C2   C3   C4
Item 1    1    0    0    …   0     0    0    0    0
Item 2    0    1    0    …   0     0    0    0    0
Item 3    0    0    1    …   0     0    0    0    0
…
Item 29   0    0    0    …   1     0    0    0    0
Item 30   1    0    0    …   0     1    0    0    0
Item 31   0    1    0    …   0     1    0    0    0
Item 32   0    0    1    …   0     1    0    0    0
…
Item 37   0    0    0    …   0     0    1    0    0
Item 38   0    0    0    …   0     0    1    0    0
Item 39   0    0    0    …   0     0    1    0    0
…
Item 51   0    0    0    …   1     0    0    0    0
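Given such a design matrix, the condition effects can be sketched as a logistic regression with person indicator columns plus the condition columns. This is a simplification: operational LLTM estimation uses conditional or marginal maximum likelihood. Everything below is simulated, with the true effects set to the mean estimates reported in the Effects of Manipulations table below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_persons, n_conditions = 500, 4

theta = rng.normal(0, 1, n_persons)        # person abilities
eta = np.array([0.22, 0.35, -0.08, 0.03])  # true difficulty shifts per condition

X_rows, y = [], []
for p in range(n_persons):
    # c = -1 is an unmanipulated base item; c = 0..3 are its manipulated versions.
    for c in range(-1, n_conditions):
        shift = 0.0 if c < 0 else eta[c]
        y.append(rng.random() < 1 / (1 + np.exp(-(theta[p] - shift))))
        person = np.zeros(n_persons); person[p] = 1.0
        cond = np.zeros(n_conditions)
        if c >= 0:
            cond[c] = 1.0
        X_rows.append(np.concatenate([person, cond]))

X, y = np.array(X_rows), np.array(y, dtype=int)
# A mild ridge penalty keeps person parameters finite for perfect scorers.
fit = LogisticRegression(C=10.0, fit_intercept=False, max_iter=2000).fit(X, y)
print("Estimated condition effects:", np.round(-fit.coef_[0][-n_conditions:], 2))
```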

[Figure: scatterplot of known and estimated item difficulty parameters for the 29 original items. X-axis: ETS known 3-PL item parameters (-3.00 to 3.00); y-axis: estimated 1-PL parameters (-4.00 to 10.00).]

Effects of Manipulations on Item Difficulty

Parameter Estimates
Parameter   No. of Reps   Min Value   Max Value   Mean   New Std. Error of Estimate   Sig.
C1          5             .17         .24         .22    0.1424                       ns
C2          5             .32         .38         .35    0.1398                       <.05
C3          5             -.13        -.05        -.08   0.1457                       ns
C4          5             -.02        .06         .03    0.1415                       ns

Regression Coefficients and Significance Tests for the Experimental Model

              Unstandardized       Standardized
Parameter     B        SE          Beta          t        Sig.
Condition 1   .080     .044        .067          1.793    .081
Condition 2   .005     .044        .004          .115     .909
Condition 3   -.091    .044        -.077         -2.040   .048
Condition 4   -.104    .044        -.088         -2.345   .024

Implications for Test Design
A significant effect of passive voice and negative wording on item difficulty was found.
Test writers often avoid negative wording on the grounds that it is complicated and can confuse readers.
Although the other effects were not significant for item difficulty, two significant effects on response time were found.

Self-Report Measures
- Verbal protocols: concurrent or retrospective.
- Structured questionnaires: strategy use, background information, interest, confidence.

Digital Eye Tracking
Digital eye-tracking data have been used to examine cognition and individual differences in:
- language processing
- facial processing
- learned attention
- electrical circuit troubleshooting
- problem-solving strategies
- spatial reasoning
- abstract reasoning

[Figure: example items partitioned into three look zones (Look Zone 1, 2, and 3).]

Gaze Trail for a Reading Comprehension Item
Zone (fixation time in seconds): Question (0.797) → RO (0.297) → Passage (4.03) → Question (0.31) → Passage (41.35) → Question (2.59) → Passage (2.65) → RO (6.00) → Passage (3.09) → RO (0.76) → Question (0.37) → RO (0.25).
(RO = response options)

Summary Data for Reading Comprehension Item
[Figure: gaze distribution bar chart, percent of total time by zone: Passage 0.81, Question 0.07, Options 0.12.]
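The distribution above follows directly from the gaze trail two slides back: summing fixation time by zone and dividing by the total reproduces the reported split up to rounding. A minimal sketch:

```python
from collections import defaultdict

# Gaze trail from the earlier slide: (zone, fixation time in seconds).
trail = [
    ("Question", 0.797), ("RO", 0.297), ("Passage", 4.03), ("Question", 0.31),
    ("Passage", 41.35), ("Question", 2.59), ("Passage", 2.65), ("RO", 6.00),
    ("Passage", 3.09), ("RO", 0.76), ("Question", 0.37), ("RO", 0.25),
]

totals = defaultdict(float)
for zone, seconds in trail:
    totals[zone] += seconds

grand_total = sum(totals.values())
for zone, seconds in totals.items():
    print(f"{zone}: {seconds / grand_total:.2f} of total time")
# Passage: 0.82, Question: 0.07, RO: 0.12 (vs. the slide's 0.81/0.07/0.12)
```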

Individual vs. Group Differences in Item Processing
[Figure: point of gaze by time.]

Implications and Future Use
- Verify some of our current models.
- Identify new variables related to processing.
- Describe qualitative differences characterized by strategy differences.
- Examine specific aspects of test items that are problematic for individuals or for subgroups.
- Observe the effects of controlled item manipulations on item processing, not just on item responses alone.

Summary of the Benefits of Cognitively-Based Assessment Design
- Construct validity is more completely understood.
  - Explicitly elaborates the processes, strategies, and knowledge structures involved.
- Enhanced score interpretations.
  - Persons, as well as items, can be described by processes, strategies, and knowledge structures.
- Generation of items with specified sources and levels of item difficulty.
  - Item parameters may be predicted for newly developed items.
  - Items can be generated for specific populations by controlling the cognitive processing requirements.

Greatest Challenges
- Our limited understanding of the cognitive models and of the test items.
- Current item response data provide limited information.
- A "one size fits all" approach to cognitive modeling will not work.
  - Changing your goals for the assessment necessitates a change in the items.
  - Changing the items (or features of the items) implies a change in what can be concluded from the test.