Setting Cut-Scores for Written
Tests: A how-to guide
(and other methods for hiring)
Jim Kuthy, Ph.D. &
Heather Patchell, M.A.
Visit BCGi Online
• While you are waiting for the webinar to begin:
– Don’t forget to check out our other training opportunities through the BCGi website.
– Join our community by signing up (it’s free) and we will notify you of our upcoming free training events as well as other information of value to the HR community.
www.BCGinstitute.org
HRCI Credit
• BCG is an HRCI Preferred Provider
• CE Credits are available for attending
this webinar
• Only those who remain with us for at
least 80% of the webinar will be eligible
to receive the HRCI training completion
form for CE submission
BCGi is Sponsored by
Biddle Consulting Group, Inc.
• Forty years of experience in Equal
Employment Opportunity consulting
• Has represented hundreds of employers in
litigation-related settings
• Has performed job analyses and validation
studies for hundreds of employers
• Has created valid selection tests that have
been used by thousands of employers
www.biddle.com
Contact Information
Jim Kuthy, Ph.D., Principal Consultant
(800) 999-0438 x 239
Heather Patchell, M.A., Consultant II
(800) 999-0438 x 155
The Presenters…
• Jim holds Master’s and doctorate degrees in Industrial & Organizational Psychology
• Heather holds a Master’s degree in Psychology
• More than twenty-five combined years of experience in the employment selection field
• They have designed and/or validated work-sample tests for many employers, including conducting validation studies that have successfully passed review by federal agencies
• Jim has taught Psychology and Business-related courses at the University of Akron and California State University, Sacramento, and Heather is the Director of the BCGi Institute for Workforce Development and Managing Editor of the EEO Insight journal
What is a “Good” Cutoff Score?
• 60%? 65%? 70%? 75%? Something else?
• Well… it depends
– How difficult is the test?
– How reliable is the test?
– How job-related is the cutoff score?
– How are the test scores used?
– Can you defend it if challenged?
Professional Standards for Validation
• Society for Industrial & Organizational
Psychology (SIOP) Principles (2003)
– Selection procedures should be job
related
– Job relatedness is demonstrated when
the inferences made based on scores
are accurate
www.siop.org/_Principles/principles.pdf
Test Usage
• Pass/Fail Cutoffs:
– “Normal Expectations of Acceptable Proficiency in the
Workplace” (Guidelines, 5H)
– Modified Angoff (U.S. v. South Carolina, USSC)
• Banding:
– Substantially Equally Qualified Applicants
– Statistically Driven (uses the Std. Error of Difference)
• Ranking:
– Is there adequate score dispersion?
– Does the test have high reliability? (e.g., >.85)
– Is the KSAPC performance differentiating?
– Highest potential for adverse impact
Uniform Guidelines on Employee
Selection Procedures (UGESP)
• Section 5H: Where cutoff scores are used, they
should normally be set so as to be reasonable and
consistent with normal expectations of acceptable
proficiency within the work force. Where applicants
are ranked on the basis of properly validated selection
procedures and those applicants scoring below a
higher cutoff score than appropriate in light of such
expectations have little or no chance of being selected
for employment, the higher cutoff score may be
appropriate, but the degree of adverse impact should
be considered.
www.uniformguidelines.com
Content Validity Mechanics
• Content Validity (UGESP Sections 14C &
15C)
– Jobs require that people have certain
knowledge, skills, abilities, and personal
characteristics (KSAPCs)
– Identify KSAPCs that are required for the job
– Create a test that measures a person’s
possession of those KSAPCs
Cutoffs & the Uniform Guidelines on
Employee Selection Procedures (UGESP)
• Content Validated testing: UGESP Section 15C(7): If
the selection procedure is used with a cutoff score, the
user should describe the way in which normal
expectations of proficiency within the work force were
determined and the way in which the cutoff score was
determined (essential). In addition, if the selection
procedure is to be used for ranking, the user should
specify the evidence showing that a higher score on
the selection procedure is likely to result in better job
performance.
Content Validated Testing
• For “content valid” tests you can determine normal expectations of proficiency by identifying the percentage of minimally qualified job applicants who should answer a question correctly
– A minimally qualified job applicant is one who possesses the necessary baseline level of the knowledge, skill, ability, or personal characteristic measured by a test to successfully perform on the job from the first day (before training)
– This is called the Angoff approach; other approaches to determining cutoff scores may also be acceptable
The U.S. Courts generally accept the
Angoff method for determining cutoff
scores for content valid knowledge and
ability tests
• Example of how it works:
– Job Experts review each item on a written test and provide their “best estimate” of the percentage of minimally qualified applicants they believe would answer the item correctly (i.e., each item is assigned a percentage value).
– These ratings are averaged, and a valid cutoff for the test can be developed (a sketch follows).
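To make the arithmetic concrete, here is a minimal sketch in Python; the ten item ratings are hypothetical, and a real panel would rate every item on the actual test:

```python
# Minimal sketch of the (unmodified) Angoff calculation.
# Each value is the job experts' averaged estimate of the proportion of
# minimally qualified applicants expected to answer that item correctly.
# These ratings are hypothetical, for illustration only.
item_ratings = [0.70, 0.85, 0.60, 0.75, 0.90, 0.65, 0.80, 0.70, 0.55, 0.75]

# The unadjusted Angoff cutoff is simply the mean of the item estimates.
angoff_cutoff = sum(item_ratings) / len(item_ratings)
print(f"Raw Angoff cutoff: {angoff_cutoff:.1%}")          # 72.5%

# On this hypothetical 10-item test, that corresponds to a raw score of:
print(f"Raw passing score: {angoff_cutoff * len(item_ratings):.2f} items")
```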
Cutoff Score Adjustments
• You should consider reducing your pass/fail cutoff score using the conditional standard error of measurement
• Question: If this is the cutoff score my job experts recommend, why should I consider using any other cutoff score?
• Answer: Because no tests are perfect. They include "measurement error" and the U.S. Supreme Court has supported adjusting minimum passing (cutoff) scores to account for this.
– U.S. v. South Carolina, 434 US 1026 (1978)
Modified (Adjusted) Angoff
Cutoff Scores
• The modified Angoff method adds a slight variation:
• After the test has been administered, the cutoff level set using the method above is lowered by 1, 2, or 3 Conditional Standard Errors of Measurement (CSEMs) to adjust for the unreliability of the test
Traditional Standard Error of
Measurement (SEM)
• The traditional SEM represents the
standard deviation of an applicant’s true
score (the score that represents the
applicant’s actual ability level) around
his or her obtained (or actual) score.
• The traditional SEM considers the entire
range of test scores when calculated.
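In formula terms, the traditional SEM is the score standard deviation scaled by the test’s unreliability: SEM = SD × √(1 − reliability). A minimal sketch, with hypothetical score statistics:

```python
import math

# Hypothetical statistics from a test administration.
score_sd = 8.0        # standard deviation of observed test scores
reliability = 0.90    # internal-consistency estimate, e.g., KR-20 or alpha

# Traditional SEM, computed across the full score distribution:
sem = score_sd * math.sqrt(1 - reliability)
print(f"SEM = {sem:.2f} points")   # about 2.53 points
```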
The Traditional SEM Considers the Entire Range of Test Scores when Calculated
[Figure: distribution of test scores (lower to higher) by number of test takers]
Because the traditional SEM considers the entire range of scores, its accuracy and relevance are limited when evaluating the reliability and consistency of test scores within a certain range of the score distribution.
The Traditional SEM Considers the Entire
Range of Test Scores when Calculated
• Most test-score distributions have scores bunched in the middle and spread out at the high and low ends of the distribution
• Those who score in the lowest range of the distribution lower the overall test reliability, which affects the size of the SEM
• High scores can also lower the overall reliability of a test, since the “true variance” associated with the test-score range is reduced
[Figure: distribution of test scores (lower to higher) by number of test takers, highlighting the score range around the critical score]
The Conditional SEM (CSEM) attempts to avoid the limitations of the SEM by considering only the score range of interest when calculating its value. By considering only the scores around the critical score value, the Conditional SEM is the most accurate estimate of the reliability dynamics of the test around the critical score.
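One common way to estimate a CSEM for number-correct test scores is Lord’s binomial-error formula, CSEM(x) = √(x(n − x)/(n − 1)); other estimators exist, so treat this as an illustrative sketch with hypothetical values:

```python
import math

def lord_csem(raw_score: int, n_items: int) -> float:
    """Conditional SEM at a given raw score, using Lord's
    binomial-error formula: sqrt(x * (n - x) / (n - 1))."""
    return math.sqrt(raw_score * (n_items - raw_score) / (n_items - 1))

# Hypothetical 50-item test with a raw cutoff of 36 items (72%).
print(f"CSEM at the cutoff: {lord_csem(36, 50):.2f} items")  # ~3.21
# The CSEM shrinks toward the extremes of the score range:
print(f"CSEM near the top:  {lord_csem(49, 50):.2f} items")  # 1.00
```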
Putting Standards into Practice
• Modified Angoff method for developing
cut scores
1. Gather job experts
2. Review test items
3. Establish baseline % scores
4. Account for outliers; remove if appropriate
5. Administer test and compute its reliability
6. Adjust downward 1 to 3 CSEMs (a numeric sketch follows this list)
7. Consider adjusting for upward bias
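A numeric sketch of steps 3, 5, and 6 above, using hypothetical figures; how many CSEMs to subtract (one, two, or three) and how to round are judgment calls for the validation team:

```python
import math

# Step 3 (hypothetical): the experts' averaged Angoff estimate.
n_items = 50
angoff_mean = 0.72
raw_cutoff = angoff_mean * n_items          # 36 items correct

# Steps 5-6: after administration, estimate the conditional SEM at the
# cutoff (here via Lord's binomial-error formula) and subtract 2 CSEMs.
csem = math.sqrt(raw_cutoff * (n_items - raw_cutoff) / (n_items - 1))
adjusted = raw_cutoff - 2 * csem

print(f"Unadjusted cutoff: {raw_cutoff:.0f} of {n_items} items")
print(f"Adjusted cutoff:   {adjusted:.2f}, i.e., "
      f"{math.floor(adjusted)} of {n_items} after rounding down")
```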
Cutoffs & the Uniform Guidelines on
Employee Selection Procedures (UGESP)
• Criterion Validated testing: UGESP
Section 15B(10): If the selection
procedure is used with a cutoff score, the
user should describe the way in which
normal expectations of proficiency
within the work force were determined
and the way in which the cutoff score was
determined (essential).
Criterion Validity Mechanics
• Criterion Validity (UGESP Sections 14B &
15B)
– Determined by comparing candidates’ scores on a test with measurements of their job performance
– These values are correlated, allowing for a
determination of how well the test measures
what is required for job performance.
– Can offer the ability to use statistical
projections and other empirical data to set
cut scores
[Scatterplot: scores on a “Test” (x-axis) plotted against scores on some “Criteria,” e.g., job performance, production rate, etc. (y-axis); example points include Test Score = 22 / Performance = 31 and Test Score = 85 / Performance = 55]
We then compute a line that reflects the best fit of the relationship between test scores and the criteria.
Where do we set the cutoff for this type of
testing?
Where do we set the cutoff for
Criterion Validated testing?
• The potentially most defensible place to set the cutoff score is at the normal expectation of proficiency
– You might have supervisors rate qualified vs. unqualified performers and use that information to determine this
• However… some argue you can set the cutoff score anywhere along the line that reflects the relationship between test scores and performance measures (a sketch follows)
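As a sketch of the mechanics (the paired data and the “minimally proficient” criterion level below are entirely hypothetical), you can fit a least-squares line and invert it to find the test score that corresponds to that performance level (Python 3.10+):

```python
import statistics

# Hypothetical paired data: test scores and a job-performance criterion.
test_scores = [22, 38, 45, 51, 60, 68, 74, 85]
performance = [31, 35, 40, 42, 47, 49, 52, 55]

# Least-squares best-fit line: performance = intercept + slope * score.
slope, intercept = statistics.linear_regression(test_scores, performance)
r = statistics.correlation(test_scores, performance)
print(f"r = {r:.2f}; performance = {intercept:.1f} + {slope:.2f} * score")

# If supervisors judge a performance rating of 40 to be the normal
# expectation of proficiency, invert the line for the test cutoff.
min_proficient = 40
cutoff = (min_proficient - intercept) / slope
print(f"Corresponding test cutoff = {cutoff:.1f}")
```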
Alternatives to Setting Cutoff Scores
• Ranking assumes one applicant is reliably more qualified than the other
• Banding considers the unreliability of the test battery and “ties” applicants
• Pass/fail cutoffs treat all applicants as either “qualified” or “not qualified”
Alternatives to Setting Cutoff Scores
• Banding
– Used to group applicants into “substantially
equally qualified score bands”
– Ignores small score differences
– Sometimes has less adverse impact than
rank ordering
– Has been successfully defended in litigation
– Requires appropriate statistical computations (a sketch follows)
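One common statistical approach sets the band width from the standard error of the difference (SED) between two scores, SED = SEM × √2, treating scores within some multiple of the SED of the top score as substantially equally qualified. A minimal sketch with hypothetical numbers:

```python
import math

# Hypothetical test statistics.
score_sd = 8.0
reliability = 0.90
sem = score_sd * math.sqrt(1 - reliability)

# Standard error of the difference between two applicants' scores.
sed = sem * math.sqrt(2)

# A 1.96-SED band (roughly 95% confidence): applicants whose scores
# fall within the band are treated as "tied" with the top scorer.
top_score = 92
band_floor = top_score - 1.96 * sed
print(f"SED = {sed:.2f}; band = {band_floor:.1f} to {top_score}")
```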
Alternatives to Setting Cutoff Scores
• Ranking
– Simply ordering scores from highest to
lowest and choosing candidates in that
order.
– Some particularly stringent requirements are involved, as set forth in Guidelines Section 14C(9).
– Must link selection procedure to
“performance differentiating” KSAPCs
– Should look at things like distribution of
scores and the reliability of selection
procedure
Alternatives to Setting Cutoff Scores
• Ranking – Criterion Validity
– Regularly endorsed by the courts when
appropriately used
– Strict minimum threshold for correlation
coefficients from criterion validation studies
(r > .30) when ranking candidates
Rank Ordering; Banding
• Top-down, rank ordering of test takers by their scores often results in higher levels of adverse impact
• Banding is one method that might help to minimize this concern
– Group candidates into “substantially equally qualified” score bands
– Successfully supported in court
o Officers for Justice v. Civil Service Commission (CA9, 979 F.2d 721, cert. denied, 61 U.S.L.W. 3367, 113 S. Ct. 1645, March 29, 1993). See also Henle, C. A., Case review of the legal status of banding. Human Performance, 17(4), 415-432.
UGESP Section 5G
• The evidence of both the validity and utility of a
selection procedure should support the method the
user chooses for operational use of the procedure, if
that method of use has a greater adverse impact than
another method of use.
– Evidence which may be sufficient to support the use of a
selection procedure on a pass/fail (screening) basis may be
insufficient to support the use of the same procedure on a
ranking basis under these guidelines. Thus, if a user decides
to use a selection procedure on a ranking basis, and that
method of use has a greater adverse impact than use on an
appropriate pass/fail basis… the user should have sufficient
evidence of validity and utility to support the use on a ranking
basis.
Factor                   | Ranking                  | Banding               | Pass/Fail Cutoffs
-------------------------|--------------------------|-----------------------|---------------------
Validation Requirements  | Higher                   | Moderate              | Lower
Adverse Impact           | Typically Higher         | Moderate              | Typically Lower
Defensibility            | Potentially Lower        | Higher                | Higher
Litigation "Red Flag"    | Higher                   | Moderate              | Lower
Utility                  | Higher                   | Moderate              | Lower
Cost                     | Lower                    | Moderate              | Higher
Applicant Flow           | Restrictive/Controllable | Moderate/Controllable | High
Development Time         | Lower                    | Moderate              | Higher
Reliability Requirements | Higher                   | Moderate              | Low, but still > .70
# Items Required         | Higher                   | Moderate              | Lower
Potential Issues
Questions?