Transcript
Page 1: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Making Computerized Adaptive Testing Diagnostic Tools for Schools

Hua-Hua Chang, University of Illinois at Urbana-Champaign

October 17, 2010

Page 2: Making Computerized Adaptive Testing Diagnostic Tools for Schools

What is Adaptive Testing?

• Originally called tailored testing (Lord, 1970)
  – Examinees are measured most effectively if items are neither too difficult nor too easy.
• Θ: latent trait. Heuristically,
  – if the answer is correct, the next item should be more difficult;
  – if the answer is incorrect, the next item should be easier.
• How does an adaptive test work?
  – An item pool with known item properties (such as difficulty level, discrimination level, ...)
  – An algorithm, computers, and a network
  – The core is the item selection algorithm
  – Mathematicians' help is needed to design the algorithm
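To make the heuristic concrete, here is a minimal sketch of an adaptive loop, assuming a hypothetical Rasch-style item pool and a simple up/down ability update; it illustrates the idea above and is not the speaker's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item pool: each item has a difficulty b (Rasch model).
pool_b = rng.uniform(-3, 3, size=200)
used = np.zeros(pool_b.size, dtype=bool)

def prob_correct(theta, b):
    """Rasch probability of answering an item of difficulty b correctly."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

true_theta = 1.2   # simulated examinee
theta_hat = 0.0    # provisional ability estimate

for n in range(1, 21):
    # Pick the unused item whose difficulty is closest to the current
    # estimate: neither too difficult nor too easy.
    j = int(np.argmin(np.where(used, np.inf, np.abs(pool_b - theta_hat))))
    used[j] = True
    correct = rng.random() < prob_correct(true_theta, pool_b[j])
    # Correct answer -> estimate moves up (next item harder); wrong -> down.
    theta_hat += (1.0 / n) * (1.0 if correct else -1.0)

print(f"final estimate: {theta_hat:+.2f} (true {true_theta:+.2f})")
```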

Page 3: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Sequential Design & the Robbins-Monro Process (1951)

Responses: x_1, x_2, x_3, ...
Design points: b_1, b_2, b_3, ...
Constants: a_1, a_2, a_3, ...
m: a point of interest

Update rule: b_{n+1} = b_n + a_n (x_n - m)

Numerous refinements:
• Engineering (Goodwin, Ramadge, and Caines, 1981; Kumar, 1985)
• Biomedical science (Finney, 1978)
• Education (Lord, 1970)
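A minimal simulation sketch of the process, under illustrative assumptions (a logistic response curve and target m = 0.5); the design points b_n drift toward the examinee's trait level:

```python
import numpy as np

rng = np.random.default_rng(1)

# Robbins-Monro in the tailored-testing setting: find the difficulty b*
# at which the success probability equals m.
theta = 1.0          # examinee's (unknown) trait; b* = theta when m = 0.5
m = 0.5              # target success rate ("point of interest")

def response(b):
    """Simulate a 0/1 response; P(correct) falls as difficulty b grows."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return float(rng.random() < p)

b = 0.0
for n in range(1, 501):
    x = response(b)
    a_n = 2.0 / n            # step sizes: sum a_n = inf, sum a_n^2 < inf
    b = b + a_n * (x - m)    # correct pushes b up, incorrect pulls it down

print(f"b after 500 steps: {b:.2f} (target {theta:.2f})")
```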

Page 4: Making Computerized Adaptive Testing Diagnostic Tools for Schools

The Maximum Information Criterion (MIC)

• Lord's (1980) MIC is the most commonly used method.
• MIC tends to select items with high discrimination.
• Many other methods have been proposed.

Notation:
  θ_0: true latent trait
  \hat{θ}_n: the MLE after n items have been administered
  I_i(θ): item information function
  I(θ) = Σ_{i=1}^{n} I_i(θ): test information
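A hedged sketch of MIC selection under the 2PL model (the pool parameters are made up for illustration); the information formula scales with a², which is why MIC favors highly discriminating items:

```python
import numpy as np

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

rng = np.random.default_rng(2)
a = rng.uniform(0.5, 2.0, size=100)   # discrimination parameters
b = rng.uniform(-2.0, 2.0, size=100)  # difficulty parameters
used = np.zeros(100, dtype=bool)

theta_hat = 0.4                       # interim ability estimate
info = item_information(theta_hat, a, b)
info[used] = -np.inf
j = int(np.argmax(info))              # MIC: administer the max-information item
print(f"selected item {j}: a = {a[j]:.2f}, b = {b[j]:.2f}")
```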

Page 5: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Item difficulty vs. item discrimination

[Figure: item response curves for discrimination values a = 0.5, a = 1, and a = 2]

Page 6: Making Computerized Adaptive Testing Diagnostic Tools for Schools


Page 7: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Item information surface

[Figure: item information surface over a theta region; the volume under the surface is the information volume]

In the 2D case, the item whose information volume is maximal should be selected.

Page 8: Making Computerized Adaptive Testing Diagnostic Tools for Schools

From Theoretical Development to Large Scale Operation

• Issues to be addressed:
  – Should CAT use only the best items? (It is common that only 50% of the items are ever used.)
  – Is CAT more secure than a paper-and-pencil test? How can CAT test security be improved?
  – How to control non-statistical constraints?
  – How to get diagnostic information?
  – How to make CAT affordable for many schools?

Page 9: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Two NSF Grants and Loads of Papers

• Constraint-weighted design
  – Cheng, Chang, & Yi (2006); Cheng & Chang (2008); Cheng, Chang, Douglas, & Guo (2009); etc.
• Establishing a theoretical foundation
  – Chang & Ying (2009)
• Test security
  – Chang & Zhang (2001); Zhang & Chang (2010)
• Cognitive diagnostic CAT
  – McGlohen & Chang (2008)
• Multidimensional CAT
  – Wang & Chang (in press); Wang & Chang (accepted for publication)
• Large-scale K-12 applications in China
  – Liu, Yu, Wang, Ding, & Chang (2010)

Page 10: Making Computerized Adaptive Testing Diagnostic Tools for Schools

CAT & Transformative Research

• The National Science Board (2007) unanimously approved a motion to enhance support of transformative research at the NSF.
  – All proposals received after Jan 5, 2008, will be reviewed against the criterion:
    • revolutionizing entire disciplines;
    • creating entirely new fields; or
    • disrupting accepted theories and perspectives.
• Much CAT research is transformative!

Page 11: Making Computerized Adaptive Testing Diagnostic Tools for Schools

New Developments

• Measuring Patient-Reported Outcomes
  – Conventional measures of disease, such as lab results, do not fully capture information about chronic diseases and how treatments affect patients.
  – CAT can be used to assess patients' subjective experiences, such as symptom severity, social well-being, and perceived level of health.
• K-12 applications
  – Large state testing
  – Teaching/learning, within-school applications
  – Diagnostic purposes
  – Web-based learning

Page 12: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Challenges in NCLB Testing

• Many items are too difficult for students.
  – 70% of math items may be too difficult.
• The influence of this kind of test-taking experience on low-achieving students is not well understood (e.g., Roderick & Engle, 2001; Ryan & Ryan, 2005; Ryan, Ryan, Arbuthnot, & Samuels, 2007).
• Test security of NCLB
  – The number of security violations in paper-and-pencil NCLB testing is on the rise.
  – Documented cases of such incidents have been uncovered in numerous states, including New York, Texas, California, Illinois, and Massachusetts (Jacob & Levitt, 2003; Texas Education Agency, 2007).

Page 13: Making Computerized Adaptive Testing Diagnostic Tools for Schools

CAT Has a Glowing Future in the K-12 Context

• Why not use benchmark testing?
  – Adaptive testing can do better.
• Quellmalz & Pellegrino (2009):
  – More than 27 states currently have operational or pilot versions of online tests, including Oregon, North Carolina, Utah, Idaho, Kansas, Wyoming, and Maryland.
  – The landscape of educational assessment is changing rapidly with the growth of computer-administered tests.

Page 14: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Objectives:

1. MAKING CAT A DIAGNOSTIC TOOL
2. DELIVERING THE TOOL TO SCHOOLS

Page 15: Making Computerized Adaptive Testing Diagnostic Tools for Schools

How to get diagnostic information?

• Post-hoc approach (non-adaptive)
  – Perform CD after students complete the CAT.
• Adaptive approach
  – Select the next item to provide the maximum information about the student's strengths and weaknesses.
  – Needs a model and an item selection algorithm
  – Psychometric theory
  – Simulation study
  – Field test

Page 16: Making Computerized Adaptive Testing Diagnostic Tools for Schools


Cognitive Diagnosis

Provide examinees with more information than just a single score.

• How? By considering the different attributes measured by the test.

• An attribute is a “task, subtask, cognitive process, or skill” assessed by the test, such as addition or reading comprehension.

Page 17: Making Computerized Adaptive Testing Diagnostic Tools for Schools

What should be reported to examinees?

Traditional testing: a single score.
Cognitive diagnosis: a set of scores α = [α_1, α_2, ..., α_K], one for each attribute (K is the total # of attributes).

Page 18: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Why is this beneficial?

Feedback from an exam can be more individualized to a student's specific strengths and weaknesses. Two students with the same total score can have different attribute profiles:

  Julia R.:  score 75,  \hat{α} = [1110000]
  Halle B.:  score 75,  \hat{α} = [0011010]

Page 19: Making Computerized Adaptive Testing Diagnostic Tools for Schools


The Item-Attribute Relationship

Which items measure which attributes is represented by the Q-matrix:

        i1   i2   i3   i4
  A1     0    1    0    1
  A2     1    0    0    1
  A3     1    0    1    0

Page 20: Making Computerized Adaptive Testing Diagnostic Tools for Schools


Cognitive Diagnostic Models

• Many models have been proposed
  – DINA model
  – Fusion model (Stout's group)

P(X_{ij} = 1 | α_i),  where j indexes the item, i the person, and α_i is the person's attribute vector.

Page 21: Making Computerized Adaptive Testing Diagnostic Tools for Schools


The DINA Model

(Macready & Dayton, 1977, 1989; Junker & Sijtsma, 2001)

Deterministic Input; Noisy "And" Gate

  P(X_{ij} = 1 | α_i) = (1 - s_j)^{η_{ij}} g_j^{1 - η_{ij}},   where η_{ij} = ∏_{k=1}^{K} α_{ik}^{q_{jk}}

  s_j = P(X_{ij} = 0 | η_{ij} = 1) -- "slip" parameter
  g_j = P(X_{ij} = 1 | η_{ij} = 0) -- "guess" parameter

(student i, item j)
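A minimal sketch of the DINA response probability; the Q-matrix and slip/guess values echo the demo slides that follow (two attributes, four items), not an operational bank. The mastery indicator η is the conjunctive "and" gate.

```python
import numpy as np

Q = np.array([[1, 1],
              [0, 1],
              [1, 0],
              [0, 0]])                # rows: items j; columns: attributes k
slip = np.full(4, 0.1)                # s_j
guess = np.full(4, 0.2)               # g_j

def p_correct(alpha, j):
    """P(X_j = 1 | alpha): the conjunctive gate eta, then slip/guess."""
    eta = int(np.all(alpha >= Q[j]))  # 1 iff alpha masters every required attribute
    return (1 - slip[j]) if eta else guess[j]

for alpha in ([1, 1], [0, 1], [1, 0], [0, 0]):
    row = [round(p_correct(np.array(alpha), j), 2) for j in range(4)]
    print(alpha, "->", row)           # e.g., [1, 0] -> [0.2, 0.2, 0.9, 0.9]
```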

Page 22: Making Computerized Adaptive Testing Diagnostic Tools for Schools

How to adaptively select items?

• There is no direct analogy to "match theta with the b-parameter."
  – In regular CAT, the b-parameter is matched with θ.
• Now α is a vector, called the latent class.

Notation:
  K: # of attributes
  α_c: a point in the latent space (2^K points in total)
  \hat{α}_i: the estimated α_i
  P(X = x | α): IRF

Page 23: Making Computerized Adaptive Testing Diagnostic Tools for Schools

• The KL information approach (Xu, Chang, & Douglas, 2004)
• Assume we test

    H_0: α = \hat{α}_i   vs.   H_1: α = α_c

• The likelihood ratio test is the most powerful test.
• Intuitively, the j-th item selected should make

    log [ P(X_j = x_j | \hat{α}) / P(X_j = x_j | α_c) ]

  large.

Page 24: Making Computerized Adaptive Testing Diagnostic Tools for Schools

• Taking the expected value, assuming \hat{α} is true:

    KL_j(\hat{α} || α_c) = Σ_{x=0}^{1} log [ P(X_j = x | \hat{α}) / P(X_j = x | α_c) ] · P(X_j = x | \hat{α})

• Select item j to make the following as large as possible:

    KL_j(\hat{α}) = Σ_{c=1}^{2^K} KL_j(\hat{α} || α_c)

Page 25: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Demo

Consider two attributes and four candidate items.

Item bank:

  Item   Slip   Guess   Q1   Q2
  1      0.1    0.2     1    1
  2      0.1    0.2     0    1
  3      0.1    0.2     1    0
  4      0.1    0.2     0    0

4 possible patterns: α_1 = [1,1], α_2 = [0,1], α_3 = [1,0], α_4 = [0,0]
Interim estimate for an examinee: \hat{α} = [1,0]

P(X_j = 1 | α_c), for item j (row) and pattern α_c (column):

  Item   P1    P2    P3    P4
  1      0.9   0.2   0.2   0.2
  2      0.9   0.9   0.2   0.2
  3      0.9   0.2   0.9   0.2
  4      0.9   0.9   0.9   0.9

Applying (j: item, c: attribute pattern)

  KL_j(\hat{α} || α_c) = Σ_{x=0}^{1} log [ P(X_j = x | \hat{α}) / P(X_j = x | α_c) ] · P(X_j = x | \hat{α}),
  I_j(\hat{α}) = Σ_{c=1}^{2^K} KL_j(\hat{α} || α_c):

  Item   KL1     KL2     KL3     KL4     Total
  1      1.363   0.000   0.000   0.000   1.363
  2      1.363   1.363   0.000   0.000   2.726
  3      0.000   1.146   0.000   1.146   2.292
  4      0.000   0.000   0.000   0.000   0.000
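A sketch that computes the KL index for this demo bank under the DINA form above; running it reproduces the "Total" column (1.363, 2.726, 2.292, 0.000).

```python
import numpy as np

Q = np.array([[1, 1], [0, 1], [1, 0], [0, 0]])
slip, guess = np.full(4, 0.1), np.full(4, 0.2)
patterns = np.array([[1, 1], [0, 1], [1, 0], [0, 0]])

def p1(alpha, j):
    """P(X_j = 1 | alpha) under DINA."""
    return (1 - slip[j]) if np.all(alpha >= Q[j]) else guess[j]

def kl_index(alpha_hat, j):
    """I_j(alpha_hat): sum of KL(alpha_hat || alpha_c) over all patterns."""
    p_hat = p1(alpha_hat, j)
    total = 0.0
    for alpha_c in patterns:
        p_c = p1(alpha_c, j)
        for p, q in ((p_hat, p_c), (1 - p_hat, 1 - p_c)):
            total += p * np.log(p / q)
    return total

alpha_hat = np.array([1, 0])                     # interim estimate in the demo
for j in range(4):
    print(f"item {j + 1}: I_j = {kl_index(alpha_hat, j):.3f}")
# The adaptive rule administers the item with the largest index (item 2 here).
```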

Page 26: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Demo

Change the slipping/guessing parameters of the items:

  Item   Slip   Guess   Q1   Q2
  1      0.2    0.3     1    1
  2      0.2    0.3     0    1
  3      0.2    0.3     1    0
  4      0.2    0.3     0    0

P(X_j = 1 | α_c):

  Item   P1    P2    P3    P4
  1      0.8   0.3   0.3   0.3
  2      0.8   0.8   0.3   0.3
  3      0.8   0.3   0.8   0.3
  4      0.8   0.8   0.8   0.8

  Item   KL1     KL2     KL3     KL4     Total
  1      0.583   0.000   0.000   0.000   0.583
  2      0.583   0.583   0.000   0.000   1.165
  3      0.000   0.534   0.000   0.534   1.068
  4      0.000   0.000   0.000   0.000   0.000

• The magnitude of the non-zero values depends on the item slipping and guessing parameters.

Page 27: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Demo

Keep those item parameters, but change the interim estimate to \hat{α} = [0,1]:

  Item   Slip   Guess   Q1   Q2
  1      0.2    0.3     1    1
  2      0.2    0.3     0    1
  3      0.2    0.3     1    0
  4      0.2    0.3     0    0

P(X_j = 1 | α_c):

  Item   P1    P2    P3    P4
  1      0.8   0.3   0.3   0.3
  2      0.8   0.8   0.3   0.3
  3      0.8   0.3   0.8   0.3
  4      0.8   0.8   0.8   0.8

  Item   KL1     KL2     KL3     KL4     Total
  1      0.583   0.000   0.000   0.000   0.583
  2      0.000   0.000   0.534   0.534   1.068
  3      0.583   0.000   0.583   0.000   1.165
  4      0.000   0.000   0.000   0.000   0.000

The positions of the zero KL cells changed for items 2 and 3.

Page 28: Making Computerized Adaptive Testing Diagnostic Tools for Schools

• To explain the last table in the previous slide:
  – A "0" means the item provides no information to discriminate the interim estimate from another possible attribute pattern.
  – The magnitude of the non-zero values depends on the item slipping and guessing parameters.
  – Which cell is zero depends on the q-vector and the examinee's interim estimate. If the q-vector of a particular item (e.g., item 4 in this demo) contains all zeros, all cells will be zero.

Page 29: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Estimation

Response data and students' latent classes (N students, n items, K attributes):

  x_{11}, x_{12}, ..., x_{1n}     (α_{11}, α_{12}, ..., α_{1K})
  x_{21}, x_{22}, ..., x_{2n}     (α_{21}, α_{22}, ..., α_{2K})
  ...
  x_{N1}, x_{N2}, ..., x_{Nn}     (α_{N1}, α_{N2}, ..., α_{NK})

Item parameters: (s_1, g_1), (s_2, g_2), ..., (s_n, g_n)

Page 30: Making Computerized Adaptive Testing Diagnostic Tools for Schools

New Tests vs. Existing Tests

• Existing exams
  – Analyze the responses from an existing large-scale assessment within a cognitive diagnosis framework.
  – Examine the results across various methods of constructing a Q-matrix.
• New exams
  – Identify attributes and the content validity structure
  – Write items according to cognitive specifications
  – Pre-test
  – Validate the Q-matrix

Page 31: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Application 1: existing dataset

• A simple random sample of 2,000 examinees who took the
  – Grade 3 TAAS from Spring 2002
  – Grade 11 TEKS from Spring 2003
• The Math & Reading portions of each test were analyzed using the Fusion Model.
• Item selection methods
  – Kullback-Leibler (KL)
  – Shannon Entropy (SHE), etc.
• Reference, e.g.,
  – McGlohen & Chang (2008)
  – Download from http://www.psych.illinois.edu/people/showprofile.php?id=539, or Google "Hua-Hua Chang"

Page 32: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Texas 3rd-grade reading assessment: 6 attributes (Application 1)

The student will determine the meaning of words in a variety of written texts.

The student will identify supporting ideas in a variety of written texts.

The student will summarize a variety of written texts.

The student will perceive relationships and recognize outcomes in a variety of written texts.

The student will analyze information in a variety of written texts in order to make inferences and generalizations.

The student will recognize points of view, propaganda, and/or statements of fact and opinion in a variety of written texts.

Page 33: Making Computerized Adaptive Testing Diagnostic Tools for Schools

HOW TO HELP SCHOOLS TO OWN AND OPERATE CD-CAT?

Why CD-CAT?


Page 34: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Building CAT-Driven Assessment and Diagnosis to Improve Student Learning

Chang & Ryan (IES Proposal)

• Develop the technical foundations for a CAT system to meet NCLB accountability and to inform teaching and learning.
• In alignment with Race to the Top (RTTT) priorities, the proposed CAT will include individualized diagnostic information to provide teachers, schools, and states with more precise information about student achievement levels, along with valuable formative feedback to inform instructional planning.

Page 35: Making Computerized Adaptive Testing Diagnostic Tools for Schools

HOW TO ADDRESS ISSUES SUCH AS SCHOOLS HAVING NO MONEY TO BUY AND OPERATE CD-CAT?

Addressing the issues reviewers raised

Page 36: Making Computerized Adaptive Testing Diagnostic Tools for Schools

New Technologies --- Schools can use existing PCs or Macs

• Client/Server architecture (CS)
  – CAT software has to be installed on each client computer (a large workload)
  – Only applicable to a Local Area Network (LAN)
• Browser/Server architecture (BS)
  – The database is still on the server
  – Nearly all tasks concerning development, maintenance, and upgrades are carried out on the server
  – Based on the Wide Area Network (WAN)
• Advantages of BS: low maintenance, no network programming
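To illustrate the browser/server division of labor, here is a hedged sketch: the item bank and selection logic stay on the server, and any browser over the WAN just posts the examinee's state. Flask, the route name, and the toy 2PL bank are illustrative assumptions, not the system from the talk.

```python
from flask import Flask, request, jsonify
import numpy as np

app = Flask(__name__)

# Toy server-side item bank (2PL parameters); it never leaves the server.
rng = np.random.default_rng(3)
BANK = [{"id": i, "a": float(a), "b": float(b)}
        for i, (a, b) in enumerate(zip(rng.uniform(0.5, 2.0, 50),
                                       rng.uniform(-2.0, 2.0, 50)))]

@app.post("/next-item")
def next_item():
    state = request.get_json()            # {"theta_hat": ..., "used": [...]}
    theta, used = state["theta_hat"], set(state["used"])

    def info(item):                       # 2PL Fisher information
        p = 1.0 / (1.0 + np.exp(-item["a"] * (theta - item["b"])))
        return item["a"] ** 2 * p * (1.0 - p)

    best = max((it for it in BANK if it["id"] not in used), key=info)
    return jsonify({"item_id": best["id"]})

# Run with: flask --app this_file run
```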

Page 37: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Hardware and Network Design

[Diagram: hardware and network design, centered on the item bank server]

Page 38: Making Computerized Adaptive Testing Diagnostic Tools for Schools

APPLICATION 2: THE CHINA PROJECT

Develop a CD-CAT system to demonstrate its applicability for improving teaching and learning.

Page 39: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Application 2:Level II English Proficiency Test

• Pretest and calibration of the item bank
  – Pretest
    • 38,662 students from 78 schools in 12 counties participated.
  – Analyzing the pretest data
    1. Estimate the parameters of the DINA model.
    2. Estimate the parameters of the 3PL model.
    3. Calibrate the attributes of the items again.
    4. If the model fits well, stop; otherwise revise the Q-matrix and go to 3.
  – Assembling the item bank with item parameters and specifications.

Page 40: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Distribution of the students in the pretest

[Map] Red: field test sampling area. Yellow and red: current implementation.

Page 41: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Linking Design

[Diagram: 12 test booklets (Test 1 - Test 12) distributed across four groups (Group 1 - Group 4), plus an anchor test; each booklet shares a block of anchor items]

The locations of the anchor items in each booklet are the same as they appear in the anchor test. E.g., one block has 10 anchor items.

Page 42: Making Computerized Adaptive Testing Diagnostic Tools for Schools


Page 43: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Item Writing

• About 40 excellent teachers in Beijing
• Process
  1. Psychometric training
  2. Identify attributes
  3. Write items
  4. Construct the Q-matrix
  5. Pre-test and check model fit
  6. If the fit is not OK, revise the Q-matrix and go to 5
  7. Stop

Page 44: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Item Selection Strategy

• The Shannon Entropy (SHE) procedure was applied to select the next items.
  – SHE (Tatsuoka, 2002; Xu, Chang, & Douglas, 2004; McGlohen & Chang, 2008)
• Dual information (McGlohen & Chang, 2004, 2008; Cheng & Chang, 2007)
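As a companion to the KL sketch earlier, a hedged sketch of the Shannon entropy criterion: choose the item that minimizes the expected entropy of the posterior over the 2^K latent classes. The demo bank and the uniform prior are illustrative assumptions.

```python
import numpy as np

Q = np.array([[1, 1], [0, 1], [1, 0], [0, 0]])
slip, guess = np.full(4, 0.1), np.full(4, 0.2)
patterns = np.array([[1, 1], [0, 1], [1, 0], [0, 0]])

def p1(alpha, j):
    """P(X_j = 1 | alpha) under DINA."""
    return (1 - slip[j]) if np.all(alpha >= Q[j]) else guess[j]

def entropy(pi):
    pi = pi[pi > 0]
    return float(-np.sum(pi * np.log(pi)))

def expected_entropy(pi, j):
    """Expected entropy of the posterior after observing item j."""
    p = np.array([p1(a, j) for a in patterns])   # P(X_j = 1 | alpha_c)
    out = 0.0
    for x in (0, 1):
        like = p if x == 1 else 1 - p
        marg = float(np.sum(pi * like))          # P(X_j = x)
        out += marg * entropy(pi * like / marg)  # weight * H(posterior)
    return out

pi = np.full(4, 0.25)                            # posterior over latent classes
scores = [expected_entropy(pi, j) for j in range(4)]
print(np.round(scores, 3), "-> administer item", int(np.argmin(scores)) + 1)
```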

Page 45: Making Computerized Adaptive Testing Diagnostic Tools for Schools

• Parameter estimation
  – The knowledge state of an examinee is estimated sequentially.
  – The maximum a posteriori (MAP) estimation method was used in the system.
  – The ability θ is estimated at the end of the test.

After m items, with response vector u_i^{(m)}:

  \hat{α}_i^{(m)} = arg max_{c = 0, 1, ..., 2^K - 1} P(α_c | u_i^{(m)})

  P(α_c | u_i^{(m)}) = g(α_c) L(u_i^{(m)}; α_c) / Σ_{c'=0}^{2^K - 1} g(α_{c'}) L(u_i^{(m)}; α_{c'})

  L(u_i^{(m)}; α_c) = ∏_{j=1}^{m} P_j(α_c)^{u_{ij}} (1 - P_j(α_c))^{1 - u_{ij}}

where g(·) is the prior over the latent classes.
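A minimal sketch of the sequential posterior/MAP update above, reusing the demo DINA bank; the uniform prior g over the 2^K latent classes is an illustrative assumption.

```python
import numpy as np

Q = np.array([[1, 1], [0, 1], [1, 0], [0, 0]])
slip, guess = np.full(4, 0.1), np.full(4, 0.2)
patterns = np.array([[1, 1], [0, 1], [1, 0], [0, 0]])

def p1(alpha, j):
    """P(X_j = 1 | alpha) under DINA."""
    return (1 - slip[j]) if np.all(alpha >= Q[j]) else guess[j]

def map_estimate(items, responses):
    """Posterior P(alpha_c | u^(m)) and the MAP pattern after m items."""
    post = np.full(len(patterns), 1.0 / len(patterns))   # prior g(alpha_c)
    for j, u in zip(items, responses):
        like = np.array([p1(a, j) if u == 1 else 1 - p1(a, j)
                         for a in patterns])
        post *= like
        post /= post.sum()                               # renormalize
    return post, patterns[int(np.argmax(post))]

# Item 1 (requires both attributes) answered wrong, item 3 (attribute 1) right:
post, alpha_hat = map_estimate(items=[0, 2], responses=[0, 1])
print(np.round(post, 3), "MAP estimate:", alpha_hat)     # favors [1, 0]
```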

Page 46: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Monte Carlo simulation Studies

• Item selection rule: content constraints (same test structure as the pretest)
  – Listening Dialog (items 1-10): the next item is selected from the remaining Listening Dialog items in the item bank.
  – Short Talks (items 11-12): two items for a speech passage are selected from the Short Talk items in the item bank.
  – Grammar and Vocabulary (items 17-32): the next item is selected from the remaining Grammar and Vocabulary items in the item bank.
  – Reading Comprehension (items 33-40): the next item is selected from the remaining Reading Comprehension items in the item bank.
• Item selection strategy: items were selected according to the Shannon entropy procedure.

Page 47: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Classification Accuracy & Evaluation Criteria

• Evaluation criteria (M examinees, K attributes)
  – Rate of pattern match (RPM):

      RPM = (# of examinees whose whole pattern matches) / M

  – Rate of marginal match (RMM):

      RMM = Σ_{i,j} h_{ij} / (M · K),  where h_{ij} = 1 if attribute j of examinee i is recovered correctly, 0 otherwise

  – Average test information
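A sketch of the two recovery rates; the small arrays are toy data (M = 4 simulees, K = 3 attributes), not results from the study.

```python
import numpy as np

true_alpha = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 1, 1]])
est_alpha  = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [1, 1, 1]])

rpm = np.mean(np.all(true_alpha == est_alpha, axis=1))  # whole-pattern match
rmm = np.mean(true_alpha == est_alpha)                  # attribute-wise match
print(f"RPM = {rpm:.2f}, RMM = {rmm:.2f}")              # RPM = 0.75, RMM = 0.92
```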

Page 48: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Field Test

• SHE with content constraints
• The adaptive test was web-based, consisting of 36 items and lasting 40 minutes.
• Number of participants: 584
  – 5th and 6th graders from 8 schools in Beijing, China

Page 49: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Validity Study

• Evaluating the consistency of
  – CD-CAT system results with an existing English achievement test (a group of students took both exams)
  – CD-CAT system results with teachers' evaluation outcomes

Page 50: Making Computerized Adaptive Testing Diagnostic Tools for Schools

CD scores vs. scores of an achievement test:
the consistency between performance levels and the # of mastered attributes

  Academic             # of mastered attributes
  Performance Level    0   1   2   3   4    5   6    7    8    Total
  Excellent            0   0   1   1   1    3   4    6    23   39
  Good                 0   0   1   2   8    5   7    7    3    33
  Pass                 1   1   3   5   3    1   0    0    1    15
  Fail                 0   1   2   0   0    0   0    0    0    3
  Total                1   2   7   8   12   9   11   13   27   90

Page 51: Making Computerized Adaptive Testing Diagnostic Tools for Schools

CD-CAT Results vs. Teachers' Assessment

• Comparison of CD scores with teachers' assessments
  – Participants from three classes: 91 sixth-grade students and 3 teachers (one rural school and two urban schools) were recruited to evaluate the diagnostic reports.
  – Measurement: students' diagnostic reports were presented to the three teachers, who were asked to evaluate the accuracy of each report.

Page 52: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Validity Study: CD vs. Teachers

Evaluation of the CD-CAT feedback reports by teachers; cells are counts (percentages):

  Teacher   High consistency   Medium consistency   Low consistency   Total
  A         28 (90.32)          3 (9.68)             0 (0.00)         31 (100)
  B         13 (41.94)         16 (51.61)            2 (6.45)         31 (100)
  C         27 (93.10)          1 (3.45)             1 (3.45)         29 (100)
  Total     68 (74.73)         20 (21.98)            3 (3.30)         91 (100)

Page 53: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Discussions

• Large-scale field tests will take place in Shanghai and Dalian in the near future.
• CD-CAT can be implemented effectively and economically.
• Though the DINA model was used, the results can be generalized to many other IRT and cognitive diagnostic models!
• A method for online calibration of pretest items has been developed. In the future, paper-and-pencil pretesting will not be needed.

Page 54: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Conclusion

• CAT is revolutionizing the way we address challenges in assessment and learning.

• In June 2010 the IES proposal was revised and resubmitted.

• Any good example of LARGE-SCALE CD-CAT?– http://cp.guoshi.com/


Page 55: Making Computerized Adaptive Testing Diagnostic Tools for Schools

Thank you!

