Examining Rounding Rules in Angoff-Type Standard Setting Methods

Adam E. Wyse, Mark D. Reckase

Mark D. Reckase
• Current Projects
• Multidimensional Item Response Theory
– Development of methodology for fine-grained analysis of item response data in high-dimensional spaces. Application of methodology to gain understanding of constructs assessed by tests.
• Test Design and Construction
– Design of content and statistical specifications for tests using the philosophy of item response theory. Use of computerized test assembly procedures to match test specifications.
• Portfolio Assessment
– Design of portfolio assessment systems, including formal objective scoring of portfolios.
• Procedures for Setting Standards
– Development and evaluation of procedures for setting standards on educational and psychological tests. Includes extensive work on setting standards on the National Assessment of Educational Progress.
• Computerized Adaptive Testing
– Developing procedures for selecting and administering test items to individuals using computer technology. In particular, designing systems to match item selection to the specific requirements for test use.

Angoff Method

• Panelists estimate the probability that the minimally competent examinee (MCE) would respond correctly to each item

Modified Angoff Method (1)

• Round to a whole number of score points (Yes/No method)

[Figure panels: polytomous items; dichotomous items]

• Rate the MCE score for each cluster of items
– Round to 1 decimal place
– Round to an integer

• How to aggregate the raters' judgments
– Mean or median (the median limits the effect of outliers)

mean      median
18.1667   18.4
20.8833   21
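To make the mean-versus-median choice concrete, the short sketch below aggregates a set of panelist cut scores both ways. The `panelist_cuts` values are hypothetical and chosen only for illustration; they are not the values reported on the slide.

```python
from statistics import mean, median

# Hypothetical panelist cut scores (illustrative only).
# The last panelist is deliberately an outlier.
panelist_cuts = [18.0, 18.5, 19.0, 18.2, 18.8, 25.0]

group_mean = mean(panelist_cuts)      # pulled upward by the outlier
group_median = median(panelist_cuts)  # stays near the bulk of the ratings

print(f"mean   = {group_mean:.3f}")
print(f"median = {group_median:.3f}")
```

With one extreme panelist, the mean moves toward the outlier while the median stays with the majority, which is the usual argument for the median.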

Theoretical Framework

• Reckase (2006): the panelist is assumed to understand perfectly the relation between item difficulty and the cut θ
[Figure panels: round to integer; round to 0.05]

• Reckase (2006)
[Figure panels: round to 1 decimal place; round to 2 decimal places]

• Bias
– Individual panelists' cut-scores
– Group-level cut-scores: mean or median

• Other evidence for evaluating standard setting
– Correlation between the item ratings provided by panelists and the item p-values
• Cannot detect the panelists' severity
• Errors can be incorporated into the Reckase evaluation approach.

• Assumptions
– Only a single round (without training effect)
– No error is included (an ideal setting)

• Investigate the impact of the Angoff modifications and rounding rules in the ideal situation.

Data and Method
• NAEP data
– 20 raters, last round
– Each panelist's θ cut-score in NAEP was treated as his or her intended cut-score.
• 2PL
• 3PL
• GPCM:

E(X|θ)=1*P1(θ)+2*P2(θ)+3*P3(θ)+4*P4(θ)
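The three item models listed above can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the scaling constant `D` (defaulted to 1.0 here), and the parameter values in the usage lines are assumptions. For an error-free panelist, the Angoff rating of a dichotomous item is the model probability at the intended cut θ, and the rating of a polytomous item is the expected score E(X|θ).

```python
import math

def p_2pl(theta, a, b, D=1.0):
    """2PL probability of a correct response (D is an assumed scaling constant)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def p_3pl(theta, a, b, c, D=1.0):
    """3PL probability: 2PL with a lower asymptote c."""
    return c + (1.0 - c) * p_2pl(theta, a, b, D)

def gpcm_probs(theta, a, steps, D=1.0):
    """GPCM category probabilities for scores 0..len(steps)."""
    cum = [0.0]  # score 0 has a cumulative logit of 0
    for d in steps:
        cum.append(cum[-1] + D * a * (theta - d))
    top = max(cum)  # subtract the maximum for numerical stability
    expo = [math.exp(z - top) for z in cum]
    total = sum(expo)
    return [e / total for e in expo]

def gpcm_expected_score(theta, a, steps, D=1.0):
    """E(X|theta) = sum over k of k * P_k(theta)."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, steps, D)))

# Hypothetical item parameters, for illustration only.
print(p_2pl(0.0, a=1.0, b=0.0))
print(p_3pl(0.0, a=1.0, b=0.0, c=0.2))
print(gpcm_expected_score(0.0, a=1.0, steps=[-1.0, -0.5, 0.5, 1.0]))
```

A five-category item (scores 0 to 4) takes four step parameters, which matches the four terms in the expected-score formula above.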

Simulated conditions

• Rounding
– Integer: 1.2345 → 1
– Nearest 0.05: 1.2345 → 1.25
– Nearest 2 decimal places: 1.2345 → 1.23
• Item pool
– 180, 107, 109, 53 items
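The three rounding rules can be expressed as one helper that rounds to the nearest multiple of a step. This is a sketch: `round_to_step` is a hypothetical name, and half-up rounding is an assumption, since the slides do not say how ties are broken. Decimal arithmetic is used because Python's built-in `round` rounds ties to even and binary floats cannot represent 0.05 exactly.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_step(value, step):
    """Round value to the nearest multiple of step (ties round half up; assumed)."""
    v = Decimal(str(value))
    s = Decimal(str(step))
    multiples = (v / s).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return float(multiples * s)

rating = 1.2345
print(round_to_step(rating, 1))     # integer rule
print(round_to_step(rating, 0.05))  # nearest 0.05
print(round_to_step(rating, 0.01))  # nearest 2 decimal places
```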

Simulated conditions

• Individual items vs. clusters of items
• Cut-scores
– Basic, Proficient, and Advanced
• Aggregating value
– Mean vs. median

Evaluation Criteria
• Bias:
• Average absolute bias:
• Bias for the group's intended cut score
– mean:
– median:
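The formulas for these criteria did not survive in the slide text, so the sketch below rests on assumed definitions that match the labels: bias is the estimated cut score minus the intended cut score on the θ scale, average absolute bias is the mean of the absolute panelist biases, and group bias compares the aggregated (mean or median) estimated and intended cut scores. All function names and data values are hypothetical.

```python
from statistics import mean, median

def panelist_bias(estimated, intended):
    """Per-panelist bias: estimated cut minus intended cut (assumed definition)."""
    return [e - i for e, i in zip(estimated, intended)]

def average_absolute_bias(estimated, intended):
    """Mean of the absolute per-panelist biases (assumed definition)."""
    return mean([abs(b) for b in panelist_bias(estimated, intended)])

def group_bias(estimated, intended, aggregate=mean):
    """Aggregated estimated cut minus aggregated intended cut (assumed definition)."""
    return aggregate(estimated) - aggregate(intended)

# Hypothetical theta cut scores for three panelists (illustrative only).
intended = [0.10, 0.30, 0.50]
estimated = [0.15, 0.25, 0.60]

print(panelist_bias(estimated, intended))
print(average_absolute_bias(estimated, intended))
print(group_bias(estimated, intended, aggregate=mean))
print(group_bias(estimated, intended, aggregate=median))
```

Passing the aggregate as a function keeps the mean and median versions of the group criterion in one place.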

Results – individual panelists


Bias by rounding rule: integer > nearest 0.05 > 2 decimal places

Bias by cut-score location: Advanced > Basic > Proficient

Individual items > cluster level (clusters involve fewer rounding errors)

Item pool: the 53-item pool has greater bias than the other pools

Item pool: 53 items < 180 items for the Proficient cut-score with integer rounding. What matters is the location of the cut-score relative to the distribution of the items.

Results – group of panelists

In some cases the mean is better; in other cases the median is better.

Basic showed negative bias; Proficient and Advanced showed positive bias. At the cluster level, Proficient showed negative bias.

Advanced produced greater bias than the other two levels. The bias did not cancel out across a group of panelists.

Both the mean and the median bias were < 0.01 when rounding to the nearest 0.05 and to 2 decimal places. Again, more test items did not necessarily produce less bias.

The cluster level is better than individual items.

Impact on Percent Above Cut-score (PAC)

PAC was found from the closest value on the NAEP scale in the pilot study. Difference = PAC for the estimated θ − PAC for the intended θ. Rounding to the nearest 0.05 or the nearest 0.01 had minimal impact on PAC.

Basic: 5.610~13.010
Proficient: -3.823~-4.387
Advanced: -1.156~-1.262

Basic: 4.490~14.190
Proficient: -4.387~-5.346
Advanced: -1.156~-1.343

Bias: Advanced > Basic and Proficient
PAC: Advanced < Basic and Proficient
There are more students near the Basic and Proficient cut scores.

Rounding to an integer does not present a viable alternative in the Angoff method.
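The relation between bias and PAC can be illustrated with a small sketch. It assumes a standard normal proficiency distribution, which is a simplification: the study read PAC from NAEP data, not from a normal curve. The function names and θ values are hypothetical.

```python
import math

def percent_above_cut(theta_cut, mu=0.0, sigma=1.0):
    """Percent of a normal(mu, sigma) population above theta_cut (assumed distribution)."""
    z = (theta_cut - mu) / sigma
    return 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))

def pac_difference(theta_estimated, theta_intended, mu=0.0, sigma=1.0):
    """PAC for the estimated cut minus PAC for the intended cut."""
    return (percent_above_cut(theta_estimated, mu, sigma)
            - percent_above_cut(theta_intended, mu, sigma))

# The same bias of +0.1 applied near the middle and in the upper tail.
print(pac_difference(0.1, 0.0))  # cut near the centre of the distribution
print(pac_difference(2.1, 2.0))  # cut far in the upper tail
```

The same shift in θ moves many more examinees across a cut near the centre of the distribution than across a cut in the tail, which is consistent with Advanced showing the largest bias but the smallest PAC change.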

Discussion

• Rounding to an integer could affect the cut scores.
– Using the cluster level can mitigate bias, but biases still remain.
• Using more test items will not necessarily produce less bias.
– What matters is the location of the items in relation to the intended cut-score.

Discussion

• 10 items [-2 ~ +2]
• Cut score θ = 0
– 5 items rounded to score 1
– 5 items rounded to score 0
• Cut total score = 5 → θ = 0
• Bias = 0

Discussion

• 20 items [-1 ~ +3]
• Cut score θ = 0
– 5 items rounded to score 1
– 15 items rounded to score 0
• Cut total score = 5 → θ = -0.438
• Bias = -0.438
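The 10-item and 20-item examples can be reproduced with a short sketch. It assumes a Rasch model and item difficulties equally spaced over the stated range; the slides state neither, so both are assumptions, and all function names are hypothetical. Each Yes/No rating is 1 when the probability at the intended cut is at least 0.5, the ratings are summed to a cut total score, and that total is mapped back to θ through the test characteristic curve by bisection.

```python
import math

def p_rasch(theta, b):
    """Rasch probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def linspace(lo, hi, n):
    """n equally spaced values from lo to hi inclusive (assumed item spacing)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def yes_no_cut_total(theta_cut, difficulties):
    """Sum of ratings after rounding each probability to 0 or 1."""
    return sum(1 for b in difficulties if p_rasch(theta_cut, b) >= 0.5)

def theta_for_total(total, difficulties, lo=-6.0, hi=6.0, tol=1e-8):
    """Invert the test characteristic curve by bisection."""
    def tcc(theta):
        return sum(p_rasch(theta, b) for b in difficulties)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid) < total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

results = {}
intended_cut = 0.0
for n_items, low, high in [(10, -2.0, 2.0), (20, -1.0, 3.0)]:
    items = linspace(low, high, n_items)
    total = yes_no_cut_total(intended_cut, items)
    bias = theta_for_total(total, items) - intended_cut
    results[n_items] = (total, bias)
    print(f"{n_items} items: cut total = {total}, bias = {bias:.3f}")
```

Under these assumptions the symmetric 10-item pool recovers the intended cut, while the 20-item pool, whose items sit mostly above the cut, should land close to the -0.438 shown on the slide. Doubling the number of items makes the estimate worse here because of where the items are located, not how many there are.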

Discussion

• Using the ordered item booklet (OIB) from the Bookmark method to arrange for roughly half of the items to be above the cut-score.
– It is impossible to know the location of the cut-score in advance.
– The intended cut-scores differ across panelists, so some panelists must have bias.
• With multiple cut-scores, at least one of the cut-scores would produce bias.
• Rounding to an integer presents many potential problems.

Discussion

• Challenge: in real situations panelists are not completely consistent in their judgments.
– Feedback is helpful for reducing rater inconsistency in NAEP
• Further development
– Examine the bias at the group level

Thank you for your attention
