Methods of Standard Setting
Natalia Gaponova
Introduction
• All standard-setting methods involve expert judgmental decision-making at some level... (Jaeger, 1979)
• There is no such thing as a true standard, but there is a theoretical cut-score that would be set by a judge if he or she totally understood the process, the test, the content, and the policy and had a true score on the test in mind as the standard. The question is whether the standard-setting method can recover the theoretical cut-score, assuming a judge performed every task consistently and without error (Reckase, 2000)
• Many different terms are used in the measurement literature to refer to performance standards: “passing scores”, “cut scores”, “cutoff scores”, “performance levels”, “achievement levels”, “mastery levels”, “proficiency levels”, “thresholds” and “standards” (Hambleton, 2001)
The importance of standard setting
• The cut-score is crucial for everyone involved in testing:
it must be reasoned and fair,
so methods are needed that allow it to be set with mathematical precision
Interpretation of the mass-testing results
Examinees need
• to compare themselves with other examinees
• to estimate their level of mastery of the material correctly and adequately
Policy-makers
• are interested in the overall level of educational achievement, which should reflect the real situation in the schools and classes of a region
Common solution: setting cut-scores and dividing examinees into groups according to their ability level
Why is it important to establish reasonable and fair cut-scores?
1. Those who conduct testing bear professional and ethical responsibility for the results they provide.
2. The interpretation of the results should be understandable to any audience and should not provoke obvious disagreement.
3. The interpretation of the results should reflect the real situation and be informative for policy-makers.
4. The interpretation of the results should not be ambiguous: examinees in one group should genuinely differ in ability from examinees in another group.
Classification of Standard-Setting Methods
• Test-centered
• Examinee-centered
• Criterion-referenced
• Norm-referenced
The classification scheme most commonly used today is the one proposed by Jaeger (1989), which splits standard-setting methods into two large groups:
Test-centered
• Angoff
• Ebel
• Nedelsky
• Jaeger
• Objective Standard Setting
• Bookmark
• Etc.
Examinee-centered
• Method of Contrasting Groups
• Method of Borderline Group
• Etc.
ANGOFF
Test-centered method
The Angoff method is one of the most preferred and most widely used methods
Angoff: traditional and modified
Procedure of standard setting (traditional Angoff method)
Experts rate the probability that a barely or minimally satisfactory or qualified person would answer each test item correctly
Each judge's probabilities are summed across items, and the average of these sums across judges is the cut-score (in raw-score points)
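The traditional Angoff computation can be sketched in a few lines of Python. This is a minimal illustration, assuming the ratings are stored as one list of item probabilities per judge; the function name `angoff_cut_score` and the ratings themselves are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings: list[list[float]]) -> float:
    """Traditional Angoff cut-score in raw-score points.

    ratings[j][i] is judge j's estimated probability that a minimally
    competent examinee answers item i correctly.
    """
    if not ratings:
        raise ValueError("at least one judge is required")
    n_items = len(ratings[0])
    for judge in ratings:
        if len(judge) != n_items:
            raise ValueError("every judge must rate every item")
        if any(not 0.0 <= p <= 1.0 for p in judge):
            raise ValueError("ratings must be probabilities in [0, 1]")
    # Each judge's implied passing score is the sum of his or her item probabilities.
    judge_scores = [sum(judge) for judge in ratings]
    # The panel's cut-score is the mean of the judges' passing scores.
    return mean(judge_scores)


# Hypothetical ratings: 3 judges x 4 items (illustrative only).
ratings = [
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.8],
    [0.5, 0.9, 0.6, 0.9],
]
print(round(angoff_cut_score(ratings), 2))  # 2.77
```

Averaging each item's probabilities across judges first and then summing over items gives the same result, because both orders add up the same numbers.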
Advantages and disadvantages
+
• Transparency and clarity
• Simplicity
• Flexibility
-
• Questionable objectivity of the decisions about the probability of a correct answer by a minimally competent examinee
• Only one round of rating, so the rated probabilities can fluctuate
EBEL
Test-centered method
Procedure of Standard Setting
• 2 rounds
• Experts independently classify test items by:
I. level of difficulty: easy, medium, hard
II. level of relevance: essential, important, acceptable, questionable
For each judge, all items can then be classified into the 12 cells of a 3×4 grid defined by the three difficulty categories and the four relevance categories, as in the example:
A = number of items in a category; B = % of items in the category that should be answered correctly

Category                  Expert №3            Expert №4            Expert №5
                          A     B     A×B      A     B     A×B      A     B     A×B
Essential     Easy        11    60    660      10    70    700      13    75    975
              Medium      1     25    25       3     25    75       1     0     0
              Hard        0     10    0        1     0     0        0     0     0
Questionable  Easy        0     0     0        0     0     0        0     0     0
              Medium      0     0     0        0     0     0        0     0     0
              Hard        0     0     0        0     0     0        0     0     0
Mean                            25.1                 26.7                 35

Mean for all experts: 28
Cut-score: 12
……
How to calculate a cut-score
• Judges indicate the percentage of items within each of the 12 cells that a student should answer correctly in order to be judged minimally competent
• Each item is assigned to one of the 12 cells based on the expert's ratings
• The percent-passing judgment for a cell is multiplied by the number of items in that cell
• These products are summed over all 12 cells to get an overall passing score for a judge
• These passing scores are averaged over judges to get the composite passing score
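The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each judge's grid is stored as a dictionary keyed by (relevance, difficulty), percentages are on a 0-100 scale and are divided by 100 so that the passing score comes out in raw-score points, and the names `ebel_judge_score`, `ebel_cut_score` and the two judges' data are hypothetical.

```python
from statistics import mean

RELEVANCE = ("essential", "important", "acceptable", "questionable")
DIFFICULTY = ("easy", "medium", "hard")


def ebel_judge_score(cells: dict[tuple[str, str], tuple[int, float]]) -> float:
    """One judge's passing score in raw-score points.

    cells maps (relevance, difficulty) to (A, B), where A is the number of items
    the judge placed in that cell and B is the percentage (0-100) of those items
    a minimally competent examinee should answer correctly. Empty cells may be omitted.
    """
    total = 0.0
    for (relevance, difficulty), (n_items, percent) in cells.items():
        if relevance not in RELEVANCE or difficulty not in DIFFICULTY:
            raise ValueError(f"unknown cell: {relevance}/{difficulty}")
        # A x B / 100 = expected number of correct answers in this cell.
        total += n_items * percent / 100
    return total


def ebel_cut_score(judges: list[dict[tuple[str, str], tuple[int, float]]]) -> float:
    """Composite passing score: the mean of the judges' individual passing scores."""
    return mean(ebel_judge_score(cells) for cells in judges)


# Hypothetical classifications of a 20-item test by two judges.
judge_a = {("essential", "easy"): (10, 80), ("essential", "medium"): (5, 60),
           ("important", "hard"): (5, 30)}
judge_b = {("essential", "easy"): (8, 90), ("important", "medium"): (8, 50),
           ("acceptable", "hard"): (4, 25)}
print(ebel_judge_score(judge_a), ebel_judge_score(judge_b))
print(round(ebel_cut_score([judge_a, judge_b]), 2))  # 12.35
```

Leaving out the division by 100 would give sums on the same scale as the A×B column in the table above.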
Advantages and disadvantages
+
• Can be used with different types of items (not only multiple-choice)
-
• It may be challenging for standard-setting participants to keep the two dimensions of difficulty and relevance distinct, because in some situations those dimensions may be highly correlated
• A validity concern arises from the judgments about item relevance: including items judged to be of questionable relevance appears, on its face, to weaken the validity evidence supporting a defensible interpretation of the total test scores
NEDELSKY
Test-centered
General concept
Nedelsky proposed considering the characteristics and performance of a hypothetical borderline examinee that he referred to as the “F-D student”. Responses (distractors) which the lowest D-student should be able to reject as incorrect, and which therefore should be attractive to [failing students], are called F-responses… Students who possess just enough knowledge to eliminate F-responses and must choose among the remaining responses at random are called F-D students.
Procedure of Standard Setting
• The experts independently identify the F-responses that minimally competent examinees would be able to eliminate as incorrect
• The number of remaining options determines the probability that the candidate will answer the question correctly: 1 remaining option = 100%, 2 = 50%, 3 = 33%, 4 = 25%, and 5 = 20%
An example
• Participants judged that, for a certain five-option item, borderline examinees would be expected to rule out two of the options as incorrect, leaving them to choose from the remaining three options. The Nedelsky rating for this item would be 1/3 = 0.33. Repeating the judgment process for each item would give a number of Nedelsky values equal to the number of items in the test (n). The sum of the n values can be used directly as a raw-score cut score. For example, a 50-item test consisting entirely of items with Nedelsky ratings of 0.33 would yield a recommended passing score of 16.5 (i.e., 50 × 0.33 = 16.5)
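The Nedelsky rating and the resulting raw-score cut can be sketched in Python. This is a minimal illustration, assuming each item is described by its number of options and the number of F-responses the panel expects the borderline examinee to eliminate; the function names are hypothetical.

```python
def nedelsky_rating(n_options: int, n_eliminated: int) -> float:
    """Probability that an F-D student answers the item correctly.

    The borderline examinee rules out n_eliminated distractors (F-responses)
    and picks at random among the remaining options, one of which is the key.
    """
    remaining = n_options - n_eliminated
    if not 1 <= remaining <= n_options:
        raise ValueError("between 0 and n_options - 1 options can be eliminated")
    return 1 / remaining


def nedelsky_cut_score(items: list[tuple[int, int]]) -> float:
    """Raw-score cut: the sum of the item ratings, one (options, eliminated) pair per item."""
    return sum(nedelsky_rating(n_options, n_eliminated) for n_options, n_eliminated in items)


print(round(nedelsky_rating(5, 2), 2))              # 0.33
print(round(nedelsky_cut_score([(5, 2)] * 50), 2))  # 16.67
```

Using the exact value 1/3 instead of the rounded 0.33 gives 16.67 rather than the 16.5 of the worked example; the difference comes only from rounding.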
Advantages and disadvantages
+
• The Nedelsky method has been used for many years to establish thresholds. It has probably remained popular because the procedure is clear to experts: they can quickly decide which responses a minimally competent examinee would be able to eliminate as incorrect.
• It can be used without prior piloting of the test
-
• Can be used only with multiple-choice items
• Raters tend not to assign probabilities of 1.00 (i.e., to judge that a borderline examinee could rule out all incorrect response options). This creates a downward bias in item ratings (e.g., a rating of .50 is assigned to an item instead of 1.00), so the overall passing score tends to be somewhat lower than the participants may have intended to recommend, and somewhat lower than passing scores obtained with other methods
BOOKMARK
Test-centered (based on Item Response Theory)
Essential materials
Standard Setting
Basic steps of the procedure
Round I: Experts are informed about the number of cut-scores to be established. Experts work in small groups, and all the essential materials are introduced to them.
Round II: Overview of the cut-scores established by every expert; the same procedure as in the first step is repeated.
Round III: Presentation of the percentage of students falling into each performance level, given each median cut-score from Round 2. After discussion, experts make individual judgments.
Round 1
• The main goals are to get panelists familiar with the ordered item booklet, set initial bookmarks, and then discuss the placements.
• Panelists are asked to discuss and determine the content that students should master for placement into a given performance level.
• Their independent judgments of cut-scores are expressed by simply placing a bookmark between the items judged to represent a cut-point. One bookmark is placed for each of the required cut-points.
• Items preceding the participant's bookmark reflect content that all students at the given performance level are expected to know and be able to perform successfully with a probability of at least 0.67 or 0.50, depending on the response-probability criterion adopted.
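How items might be ordered in the booklet and how a bookmark translates into a cut on the ability scale can be sketched in Python. The slides do not name the IRT model, so this sketch assumes a Rasch model, where the ability at which an item of difficulty b is answered correctly with probability rp is b + ln(rp / (1 - rp)). Taking the cut as the location of the last item before the bookmark is one convention used in practice, assumed here; the function names and item difficulties are hypothetical.

```python
import math


def rp_location(b: float, rp: float = 0.67) -> float:
    """Ability at which a Rasch item of difficulty b is answered correctly with probability rp.

    Solves rp = 1 / (1 + exp(-(theta - b))) for theta.
    """
    if not 0.0 < rp < 1.0:
        raise ValueError("rp must lie strictly between 0 and 1")
    return b + math.log(rp / (1 - rp))


def ordered_item_booklet(difficulties: dict[str, float], rp: float = 0.67) -> list[tuple[str, float]]:
    """Items sorted from easiest to hardest by their RP location."""
    located = [(item, rp_location(b, rp)) for item, b in difficulties.items()]
    return sorted(located, key=lambda pair: pair[1])


def bookmark_cut(booklet: list[tuple[str, float]], n_before: int) -> float:
    """Cut on the ability scale for a bookmark placed after the first n_before items.

    Assumed convention: the cut is the RP location of the last item before the bookmark.
    """
    if not 1 <= n_before <= len(booklet):
        raise ValueError("bookmark must follow at least one item in the booklet")
    return booklet[n_before - 1][1]


# Hypothetical Rasch difficulties (logits).
difficulties = {"Q1": -1.2, "Q2": 0.4, "Q3": -0.3, "Q4": 1.1, "Q5": 0.0}
booklet = ordered_item_booklet(difficulties)
print([item for item, _ in booklet])        # ['Q1', 'Q3', 'Q5', 'Q2', 'Q4']
print(round(bookmark_cut(booklet, 3), 3))   # 0.708, i.e. Q5 at RP 0.67
```

With rp = 0.50 the RP location equals the item difficulty itself, so the same bookmark would give a cut of 0.0.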
Round 2
• The first activity in Round 2 involves having each member place bookmarks in his/her ordered item booklet where each of the other panelists in their small group made their bookmark placement. For a group of 6 people, each panelist's ordered booklet will have 6 bookmarks for each cut point.
• Discussions are then focused on the items between the first and last bookmarks for each performance level. Upon completion of this discussion, the panelists then independently reset their bookmarks. The median of the Round 2 bookmarks for each cut point is taken as that group's recommendation for that cut-point.
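The group recommendation at the end of Round 2 can be sketched in Python. This assumes each panelist's bookmarks are recorded as page positions in the ordered item booklet, one per cut point; the function name and the six panelists' placements are hypothetical.

```python
from statistics import median


def group_recommendation(bookmarks: dict[str, list[int]]) -> list[float]:
    """Median Round 2 bookmark for each cut point across a small group.

    bookmarks maps a panelist to his or her bookmark positions (pages in the
    ordered item booklet), one per cut point, in the same order for everyone.
    """
    placements = list(bookmarks.values())
    if not placements:
        raise ValueError("at least one panelist is required")
    n_cuts = len(placements[0])
    if any(len(p) != n_cuts for p in placements):
        raise ValueError("every panelist needs one bookmark per cut point")
    # zip(*...) regroups the placements by cut point instead of by panelist.
    # With an even number of panelists the median can fall between two pages.
    return [median(pages) for pages in zip(*placements)]


# Hypothetical Round 2 placements: 6 panelists, 2 cut points.
round2 = {"P1": [12, 30], "P2": [14, 28], "P3": [11, 31],
          "P4": [13, 29], "P5": [15, 33], "P6": [12, 30]}
print(group_recommendation(round2))  # [12.5, 30.0]
```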
Round 3
• The percentage of students falling into each performance level is presented, given each group’s median cut-score from Round 2.
• With this information about how students actually performed, the panelists discuss the bookmarks in the large group and then make their Round 3 independent judgments of where to place the bookmarks.
• The median for the large group is considered to be the final cut-point for a given performance level.
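The impact data shown in Round 3 and the final median can be sketched in Python. This assumes cut-scores are already expressed on the same scale as the student scores and that a score equal to a cut reaches the higher level (an assumed convention); the level names, scores and bookmarks are hypothetical.

```python
from bisect import bisect_right
from statistics import median


def impact_data(scores: list[float], cuts: list[float], levels: list[str]) -> dict[str, float]:
    """Percentage of students in each performance level for the given cut-scores.

    cuts must be ascending; a score equal to or above a cut reaches that level.
    levels names the len(cuts) + 1 levels, lowest first.
    """
    if not scores:
        raise ValueError("no scores supplied")
    if len(levels) != len(cuts) + 1 or cuts != sorted(cuts):
        raise ValueError("need ascending cuts and one more level than cuts")
    counts = [0] * len(levels)
    for score in scores:
        # bisect_right counts how many cuts are <= score, which is the level index.
        counts[bisect_right(cuts, score)] += 1
    return {level: 100 * c / len(scores) for level, c in zip(levels, counts)}


# Hypothetical student scores and Round 2 cuts on the ability scale.
scores = [-1.5, -0.4, 0.1, 0.3, 0.9, 1.4, -0.8, 2.0]
print(impact_data(scores, [0.0, 1.0], ["Basic", "Proficient", "Advanced"]))
# {'Basic': 37.5, 'Proficient': 37.5, 'Advanced': 25.0}

# Final cut-point: the median of the large group's Round 3 bookmarks (hypothetical pages).
round3_bookmarks = [13, 12, 14, 12, 13, 15, 13]
print(median(round3_bookmarks))  # 13
```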
METHOD OF CONTRASTING GROUPS
Examinee-centered
Method of contrasting groups
• The procedure involves testing two groups of examinees, classified beforehand as competent or non-competent
• The distributions of test scores of the examinees in each category are compared
• The cut-score is set at the point where the two distributions intersect
[Figure: score distributions of the Competent and Non-competent groups]
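Locating the intersection point can be sketched in Python. The slides do not say how the distributions are modelled, so this sketch assumes each group's scores are approximately normal, fits a normal density to each group (ignoring group size), and finds the crossing between the two group means by bisection. Two normal curves with unequal spreads can cross twice; only the crossing between the means is used here. The function names and scores are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def contrasting_groups_cut(competent: list[float], non_competent: list[float],
                           tol: float = 1e-6) -> float:
    """Score between the group means where the two fitted normal densities intersect."""
    mu_c, sd_c = mean(competent), stdev(competent)
    mu_n, sd_n = mean(non_competent), stdev(non_competent)
    if mu_c <= mu_n:
        raise ValueError("the competent group should score higher on average")

    def gap(x: float) -> float:
        # Negative where the non-competent density is higher, positive where the competent one is.
        return normal_pdf(x, mu_c, sd_c) - normal_pdf(x, mu_n, sd_n)

    lo, hi = mu_n, mu_c
    if gap(lo) >= 0 or gap(hi) <= 0:
        raise ValueError("the densities do not cross between the group means")
    # Bisection keeps gap(lo) < 0 < gap(hi) until the bracket is narrower than tol.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical test scores of examinees classified in advance.
competent = [62, 70, 68, 75, 80, 72, 66, 78]
non_competent = [45, 52, 58, 50, 60, 48, 55, 54]
print(round(contrasting_groups_cut(competent, non_competent), 1))
```

A frequently used alternative, which avoids the normality assumption, is to pick the score that minimizes the number of misclassified examinees in the two groups.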
Advantages and disadvantages
+
• Can be used with any item type
-
• The objectivity of classifying students as competent or non-competent is open to doubt
THANK YOU FOR YOUR ATTENTION
Your questions?