Describes the classification of standards, standard-setting models and the Hofstee method of scaling, as employed at the Medical University of the Americas, as per USMLE guidelines.
29/11/2009
Standard Setting and Medical Students’ Assessment
Dr. Sanjoy Sanyal
Associate Professor – Neurosciences
Medical University of the Americas
Nevis, St. Kitts-Nevis, WI
List of topics
• Summative assessment
• Standard-setting
• Classification of standards
• Standard-setting models
– Test-centered models
– Examinee-centered models
• Modified Angoff approach
• The Hofstee method
• Evaluation of standards
• Future perspectives
• Conclusion
• References
What is Summative Assessment?
• In the context of a Caribbean medical
school training students to be future
doctors, summative assessment can be
interpreted as any of the following:
– End-point / end-semester assessment
– Certification examination
– Licensing examination
Reasons for Post-training Summative Assessment
1. Trainee motivation: Assessment drives
learning
2. Recognition of achievement
3. Rite of passage: Initiation to the profession
4. Reputation of the discipline
5. Patient safety
6. Quality marker for patients
Characteristics of Good Summative Assessment
VALIDITY* | RELIABILITY** | FEASIBILITY
Content validity | Accurate | Practicable
Construct validity | Consistent | Cost-effective
Predictive validity | Fair | Proportionate
Assessment Methods
Adapted from Roger Neighbour 2006
Factors Playing a Role in Recruitment of Examiners
Qualities
• Credibility
• Can ‘rank order’
• Trainable
• Impartial
• Team players
Incentives
• Status
• Influence
• Stimulation
• ‘Make a difference’
• Financial
Selection of Standard-setting Panelists
1. Experts in related field of examination
2. Familiar with examination methods
3. Good problem solvers
4. Familiar with level of candidates
5. Interested in education (teachers)
Establishing Standards – Policy Decision
Deciding who should pass or fail
should be a matter of policy
decision rather than a statistical
exercise
Why Is Standard Setting Needed?
• To provide an educational tool for deciding the cut-off point on the scoring scale that separates the non-competent from the competent
• To determine the standards of performance that separate the competent from the non-competent candidate
Pertinent Questions Regarding Standard Setting
• What is the main purpose of assessment?
• What is at stake?
– For students
– For patients
– For organization
• Who has an interest in the outcome?
• What message do we wish to convey?
• What may be the effect of high / low pass rate?
Pertinent Questions Regarding Standard Setting
• What are the rules of combination in a multi-component examination?
• Who should set the standards?
– Examiners?
– Clinical practitioners?
– Patients?
• Should the standards be absolute or relative?
• What happens to those who fail under the current standards?
• Is there an appeals procedure?
Qualities of Good Standards for Assessments
• Transparent marking and standard-setting process
• High reliability indices (Cronbach’s α >0.8; Cohen’s κ >0.4)*
• Corrections for test variance and Error of Measurement
• Low examiner variability (recruitment, training, feedback)
• Fair appeals procedure
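As a rough illustration of the reliability index named in this list, Cronbach's α can be computed from a candidates-by-items score matrix. The sketch below assumes dichotomously scored items (1 = correct, 0 = incorrect); the function name and the sample data are hypothetical and are not taken from the slides.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for item_scores[c][i] = score of candidate c on item i."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across candidates.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the candidates' total scores.
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 candidates, 4 items.
sample_scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(sample_scores)
print(round(alpha, 3))
```

Population variances are used for both the items and the totals; using sample variances throughout would give the same ratio and therefore the same α.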
Educational Benefits of Standard Setting
• Faculty development
• Quality control of test materials
Standards – Classification
• Norm-referenced standards vs. Criterion-referenced standards
• Compensatory Standards vs. Conjunctive Standards
Orthogonal Bipolar Standards
Norm-referenced Standards
• Standard is based on performance of an
external large representative sample (‘Norm
group’) equivalent to candidates taking the
test
• May result in reasonable standards provided
the group is representative of candidates’
population, heterogeneous and large
Criterion-referenced Standards
• Links the standard to a set criterion of the
competence level under consideration
• Can be:
– Relative criterion standard
– Absolute criterion standard
Relative Criterion Standard
• A relative standard can be set at the mean performance of candidates,
• Or at a defined number of standard deviations (SD) from the mean
• These standards may vary from year to year
due to shifts in ability of the group
• May result in a fixed annual percentage of
failing students
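A small sketch may help show why a relative standard moves with the cohort. It assumes the cut score is placed one population SD below the cohort mean; that offset, the function name and the two cohorts are illustrative assumptions only.

```python
from statistics import mean, pstdev

def relative_cut_score(scores, sd_units=1.0):
    """Cut score placed sd_units standard deviations below the cohort mean."""
    return mean(scores) - sd_units * pstdev(scores)

# Two hypothetical cohorts with the same spread; cohort_b is 10 points more able.
cohort_a = [50, 60, 70, 80, 90]
cohort_b = [60, 70, 80, 90, 100]

cut_a = relative_cut_score(cohort_a)
cut_b = relative_cut_score(cohort_b)

# Count how many candidates fall below each cohort's own cut score.
fails_a = sum(score < cut_a for score in cohort_a)
fails_b = sum(score < cut_b for score in cohort_b)
print(cut_a, cut_b, fails_a, fails_b)
```

In this example the cut score rises with the abler cohort while the number of failing candidates stays the same, which is the behaviour the last two bullets describe.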
Absolute Criterion Standard
• Absolute criterion standard stays same over
multiple administrations of the test, relative
to the content specifications of the test
• Failure rate may vary due to changes in the
group’s ability, from one test administration
to the other
Compensatory Standards
• The standard is set on the total test score
• Candidates can compensate for poor performance in some parts of the exam with good performance in others
Conjunctive Standards
• Standards are set for individual components of the examination
• Candidates cannot compensate for poor performance in one part
– Each skill component is considered separately
– Allows diagnostic feedback to candidates
– The higher the correlation among test components, the greater the inclination towards a compensatory standard
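The difference between a compensatory and a conjunctive standard can be sketched as two pass/fail rules applied to the same candidate. The component names, cut scores and marks below are hypothetical.

```python
def passes_compensatory(component_scores, total_cut):
    """Pass if the total score reaches the cut; weak parts can be offset by strong ones."""
    return sum(component_scores.values()) >= total_cut

def passes_conjunctive(component_scores, component_cuts):
    """Pass only if every component reaches its own cut score."""
    return all(component_scores[name] >= cut for name, cut in component_cuts.items())

# Hypothetical candidate: strong written paper, weak clinical component.
candidate = {"written": 85, "clinical": 45, "oral": 70}
component_cuts = {"written": 50, "clinical": 50, "oral": 50}
total_cut = 150

compensatory_result = passes_compensatory(candidate, total_cut)
conjunctive_result = passes_conjunctive(candidate, component_cuts)
print(compensatory_result, conjunctive_result)
```

The same candidate passes under the compensatory rule and fails under the conjunctive rule, because the clinical mark is below its own cut score.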
Standard-setting Models
• Test-centered models: Judges review test
items and provide judgments as to ‘just
adequate’ level of performance on these
items
• Examinee-centered models: Judges identify
(and sort) an actual (not hypothetical) group
of examinees
Test-centered Models
• Angoff model
• Ebel’s approach
• Nedelsky approach
• Jaeger’s method
Angoff Model
• A judgemental approach
• Group of expert judges make judgements
about how borderline candidates would
perform on items in the examination
• Details described later…
Ebel’s Approach
• Judges categorise items in a test according
to levels of difficulty and relevance to the
decision to be made
• Then they decide on proportion of items in
each category that a hypothetical group of
examinees could respond to correctly
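One way to turn Ebel-style judgements into a cut score is to weight each category's expected proportion correct by the number of items in it. The sketch below assumes a single consensus grid; the category labels, item counts and proportions are invented for illustration.

```python
def ebel_cut_score(categories):
    """Expected proportion correct over the whole test.

    categories maps (relevance, difficulty) -> (number_of_items, proportion_correct),
    where proportion_correct is the judges' estimate for that category.
    """
    total_items = sum(n for n, _ in categories.values())
    expected_correct = sum(n * p for n, p in categories.values())
    return expected_correct / total_items

# Hypothetical 20-item test sorted by the judges into four categories.
grid = {
    ("essential", "easy"): (10, 0.9),
    ("essential", "hard"): (5, 0.6),
    ("important", "easy"): (3, 0.8),
    ("important", "hard"): (2, 0.3),
}
cut = ebel_cut_score(grid)
print(round(cut, 2))
```

With several judges, a common choice would be to average their proportions per category before applying the same calculation; that step is left out here.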
Nedelsky Approach
• Originally designed for multiple choice
items
• For each item, judges decide on how many
of the distractors (response options) a
minimally competent examinee would
recognise as being incorrect
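The Nedelsky judgement is usually converted to a score by assuming the minimally competent examinee guesses at random among the options not eliminated, so each item contributes 1 / (options remaining). A minimal sketch for a single judge, with hypothetical item data:

```python
def nedelsky_cut_score(items):
    """Expected raw score of a minimally competent examinee.

    items is a list of (n_options, n_distractors_eliminated) pairs, one per MCQ.
    """
    total = 0.0
    for n_options, eliminated in items:
        if not 0 <= eliminated < n_options:
            raise ValueError("eliminated must leave at least one option")
        remaining = n_options - eliminated
        total += 1.0 / remaining  # chance of guessing right among what remains
    return total

# Hypothetical judgements for five items.
judgements = [(4, 3), (4, 2), (4, 1), (4, 0), (5, 3)]
cut = nedelsky_cut_score(judgements)
print(round(cut, 2))
```

The result is an expected number of items correct; dividing by the number of items gives a percentage standard.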
Jaeger’s method
• Emphasises the need to sample all
populations that have a legitimate interest in
outcomes of competency testing
• Focuses on passing examinees rather than on borderline or minimally competent ones
Examinee-centered Models
• Borderline-group method
• Contrasts-by-group approach
• Hofstee method
Borderline-group method
• Judges identify an actual (not hypothetical)
borderline group
• The median score for this group is used as
the passing score
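The borderline-group calculation is short enough to show directly. The sketch assumes each examinee has a test score and a separate judge flag marking them as borderline; the data are hypothetical.

```python
from statistics import median

def borderline_group_cut(scores, judged_borderline):
    """Median test score of the examinees the judges flagged as borderline."""
    borderline_scores = [s for s, flag in zip(scores, judged_borderline) if flag]
    return median(borderline_scores)

# Hypothetical cohort: four of seven examinees were judged borderline.
scores = [42, 55, 58, 61, 63, 77, 90]
judged_borderline = [False, True, True, True, True, False, False]

cut = borderline_group_cut(scores, judged_borderline)
print(cut)
```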
Contrasts-by-Group approach
• Panellists sort examinees into 2 groups: competent and not-competent
– This judgement is based on prior characteristics of examinees rather than the current test scores
– Test scores are not known to panellists during the sorting process
• After sorting is completed, score distributions for competent / not-competent groups are plotted
• Point of intersection of the two distributions is considered as the passing score
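The intersection point can be located numerically once each group's scores are summarised. The sketch below makes a simplifying assumption that is not in the slides: each group is approximated by a normal distribution fitted to its scores, and the crossing point between the two group means is found by bisection. Function and variable names are hypothetical.

```python
from statistics import NormalDist

def contrasting_groups_cut(not_competent, competent, tol=1e-6):
    """Score at which the two fitted normal curves cross, between the group means."""
    low = NormalDist.from_samples(not_competent)
    high = NormalDist.from_samples(competent)

    def diff(x):
        # Positive where the not-competent curve is higher, negative otherwise.
        return low.pdf(x) - high.pdf(x)

    a, b = low.mean, high.mean
    if not (a < b and diff(a) > 0 > diff(b)):
        raise ValueError("distributions do not cross between the group means")
    # Bisection: keep the interval [a, b] bracketing the sign change.
    while b - a > tol:
        mid = (a + b) / 2
        if diff(mid) > 0:
            a = mid
        else:
            b = mid
    return (a + b) / 2

# Hypothetical scores for the two groups sorted by the panellists.
not_competent_scores = [40, 45, 50, 55, 60]
competent_scores = [60, 65, 70, 75, 80]
cut = contrasting_groups_cut(not_competent_scores, competent_scores)
print(round(cut, 2))
```

With real data the distributions are often skewed, so plotting the empirical distributions, as the slide describes, remains the more direct approach.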
Hofstee Method
• A standard setting approach that
incorporates advantages of both relative
and absolute standard setting procedures
• Details described later…
Two Common Standard-Setting Procedures
• Modified Angoff procedure
– A Test-centered model
– Judgmental approach
– Suitable for MCQ examinations
• The Hofstee method
– An Examinee-centered model
– Compromise relative/absolute method
– Suitable for overall pass/fail decisions
– Approved by USMLE
Modified Angoff Procedure
• Judges discuss characteristics of a borderline candidate ‘only just good enough to pass’
• They make judgements about borderline candidate’s likelihood to respond correctly to each test item
• For each test item, judges estimate % of borderline candidates that is likely to answer the item correctly
• Pass / fail standard is the average of % for all items
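The arithmetic of the last two steps can be sketched as follows: average the judges' estimates for each item, then average across items. The ratings matrix is hypothetical, and the sketch takes the judges' final ratings as given.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """Pass mark (in %) from ratings[j][i]: judge j's estimated percentage of
    borderline candidates answering item i correctly."""
    n_items = len(ratings[0])
    # Mean estimate per item across judges.
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    # The standard is the mean of the item-level estimates.
    return mean(item_means)

# Hypothetical final ratings: 3 judges, 4 items.
ratings = [
    [60, 40, 80, 50],
    [70, 50, 70, 50],
    [65, 45, 75, 50],
]
cut = angoff_cut_score(ratings)
print(cut)
```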
The Hofstee Method
• This takes advantage of both relative and absolute standard-setting procedures and arrives at a compromise between the two
• Reference group of judges agree on the following:
– Lowest acceptable fail rate (A)
– Highest acceptable fail rate (B)
– Lowest permissible passing grade (C)
– The required passing score (D)
The Hofstee Method
Adapted from Roger Neighbour 2006
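In the usual formulation of the Hofstee method, the four judge values define a straight line from (lowest cut score, highest fail rate) to (highest cut score, lowest fail rate), and the standard is the point where that line meets the observed fail-rate curve. The sketch below follows that formulation and assumes that D acts as the upper bound on the passing score, which the slide does not state explicitly. Parameter names, the grid search and the cohort are hypothetical.

```python
def hofstee_cut_score(scores, min_fail, max_fail, min_cut, max_cut, step=1.0):
    """Cut score where the Hofstee line is closest to the observed fail-rate curve.

    min_fail, max_fail: lowest / highest acceptable fail rate (A, B), as fractions.
    min_cut, max_cut: lowest permissible and, by assumption, highest passing score (C, D).
    """
    n = len(scores)
    best_cut, best_gap = None, None
    n_steps = int(round((max_cut - min_cut) / step))
    for k in range(n_steps + 1):
        cut = min_cut + k * step
        # Fraction of the cohort that would fail at this cut score.
        observed_fail = sum(s < cut for s in scores) / n
        # Fail rate on the line from (min_cut, max_fail) to (max_cut, min_fail).
        line_fail = max_fail + (min_fail - max_fail) * (cut - min_cut) / (max_cut - min_cut)
        gap = abs(observed_fail - line_fail)
        if best_gap is None or gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut

# Hypothetical cohort of 20 candidates and hypothetical judge values.
cohort = [35, 42, 48, 51, 53, 55, 57, 58, 60, 62,
          64, 65, 67, 70, 72, 75, 78, 81, 85, 92]
cut = hofstee_cut_score(cohort, min_fail=0.05, max_fail=0.30, min_cut=50, max_cut=60)
print(cut)
```

The grid search is a simplification; it returns the grid point nearest the intersection and does not handle the case where the line and the curve never meet, for which conventions vary.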
Evaluation of Standards
• Standard setting process should be evaluated
• Evaluation includes data on 1st and 2nd ratings of panellists for each test item rated
• This should demonstrate increased consensus among raters (Cohen’s κ inter-rater reliability)
• A questionnaire should be administered to panellists at end of standard setting process
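Cohen's κ compares the agreement observed between two raters with the agreement expected by chance. The sketch below assumes the panellists' judgements have been reduced to categories (here pass / fail for a borderline candidate on each item), since κ applies to categorical ratings; the two rounds of data are invented to illustrate rising consensus.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgements on the same items."""
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical judgements on 10 items ("P" = pass, "F" = fail).
rater_a = ["P", "P", "F", "F", "P", "F", "P", "P", "F", "P"]
rater_b_round1 = ["P", "P", "F", "P", "P", "F", "P", "F", "F", "P"]
rater_b_round2 = ["P", "P", "F", "F", "P", "F", "P", "F", "F", "P"]

kappa_round1 = cohen_kappa(rater_a, rater_b_round1)
kappa_round2 = cohen_kappa(rater_a, rater_b_round2)
print(round(kappa_round1, 3), round(kappa_round2, 3))
```

A higher κ in the second round than in the first is the kind of evidence of increased consensus that the evaluation step looks for.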
Future Perspectives
• Much work is still needed to establish
effective standard setting procedures
• Length of procedures should be considered
• Ways to shorten the process are needed
• Fully compensatory models should be
considered, in which test items are
averaged to produce a test standard
Future Perspectives
• Obtained standards should be checked
against other information available on the
test-taker to ensure construct validity
• Effective methods of training panellists to
recognise borderline characteristics are
essential if Angoff approach is widely used
Conclusion
• The more standard setting procedures are applied to a variety of tests:
– The more the practice of high quality testing will be enhanced, and
– The higher will be the confidence in the testing of professional competencies
References
• Neighbour, Roger. Summative assessment and standard setting. www.jafm.org/edu/20060128/sem4_060129.pdf
• Friedman Ben-David. Standard setting in student assessment – An extended summary of AMEE Medical Education Guide No 18. Medical Teacher (2000) 22, 2, pp 120-130. www.medev.ac.uk/resources/features/AMEE_summaries/Guide18summaryMar04.pdf