Setting Cut-Scores for Written
Tests: A how-to guide
(and other methods for hiring)
Jim Kuthy, Ph.D. &
Heather Patchell, M.A.
Visit BCGi Online
• While you are waiting for the webinar to begin:
– Don’t forget to check out our other training opportunities through the BCGi website.
– Join our community by signing up (it’s free) and we will notify you of our upcoming free training events as well as other information of value to the HR community.
www.BCGinstitute.org
HRCI Credit
• BCG is an HRCI Preferred Provider
• CE Credits are available for attending
this webinar
• Only those who remain with us for at
least 80% of the webinar will be eligible
to receive the HRCI training completion
form for CE submission
BCGi is Sponsored by
Biddle Consulting Group, Inc.
• Forty years of experience in Equal
Employment Opportunity consulting
• Has represented hundreds of employers in
litigation-related settings
• Has performed job analyses and validation
studies for hundreds of employers
• Has created valid selection tests that have
been used by thousands of employers
www.biddle.com
Contact Information
Jim Kuthy, Ph.D., Principal Consultant
(800) 999-0438 x 239
Heather Patchell, M.A., Consultant II
(800) 999-0438 x 155
The Presenters…
• Jim holds Master’s and doctorate degrees in Industrial & Organizational Psychology
• Heather holds a Master’s degree in Psychology
• More than twenty-five combined years of experience in the employment selection field
• They have designed and/or validated work-sample tests for many employers, including conducting validation studies that have successfully passed review by federal agencies
• Jim has taught Psychology and Business-related courses at the University of Akron and California State University, Sacramento, and Heather is the Director of the BCGi Institute for Workforce Development and Managing Editor of the EEO Insight journal
What is a “Good” Cutoff Score?
• 60%? 65%? 70%? 75%? Something else?
• Well… it depends
– How difficult is the test?
– How reliable is the test?
– How job-related is the cutoff score?
– How are the test scores used?
– Can you defend it if challenged?
Professional Standards for Validation
• Society for Industrial & Organizational
Psychology (SIOP) Principles (2003)
– Selection procedures should be job
related
– Job relatedness is demonstrated when
the inferences made based on scores
are accurate
www.siop.org/_Principles/principles.pdf
Test Usage
• Pass/Fail Cutoffs:
– “Normal Expectations of Acceptable Proficiency in the
Workplace” (Guidelines, 5H)
– Modified Angoff (U.S. v. South Carolina, USSC)
• Banding:
– Substantially Equally Qualified Applicants
– Statistically Driven (uses the Std. Error of Difference)
• Ranking:
– Is there adequate score dispersion?
– Does the test have high reliability? (e.g., >.85)
– Is the KSAPC performance differentiating?
– Highest potential for adverse impact
Uniform Guidelines on Employee
Selection Procedures (UGESP)
• Section 5H: Where cutoff scores are used, they
should normally be set so as to be reasonable and
consistent with normal expectations of acceptable
proficiency within the work force. Where applicants
are ranked on the basis of properly validated selection
procedures and those applicants scoring below a
higher cutoff score than appropriate in light of such
expectations have little or no chance of being selected
for employment, the higher cutoff score may be
appropriate, but the degree of adverse impact should
be considered.
www.uniformguidelines.com
Content Validity Mechanics
• Content Validity (UGESP Sections 14C &
15C)
– Jobs require that people have certain
knowledge, skills, abilities, and personal
characteristics (KSAPCs)
– Identify KSAPCs that are required for the job
– Create a test that measures a person’s
possession of those KSAPCs
Cutoffs & the Uniform Guidelines on
Employee Selection Procedures (UGESP)
• Content Validated testing: UGESP Section 15C(7): If
the selection procedure is used with a cutoff score, the
user should describe the way in which normal
expectations of proficiency within the work force were
determined and the way in which the cutoff score was
determined (essential). In addition, if the selection
procedure is to be used for ranking, the user should
specify the evidence showing that a higher score on
the selection procedure is likely to result in better job
performance.
Content Validated Testing
• For “content valid” tests you can determine normal expectations of proficiency by identifying the percentage of minimally qualified job applicants who should answer a question correctly
– A minimally qualified job applicant is one who possesses the necessary baseline level of the knowledge, skill, ability, or personal characteristic measured by a test to successfully perform on the job from the first day (before training)
– This is called the Angoff approach; other approaches to determining cutoff scores may also be acceptable
The U.S. Courts generally accept the
Angoff method for determining cutoff
scores for content valid knowledge and
ability tests
• Example of how it works:
– Job Experts review each item on a written test and provide their “best estimate” of the percentage of minimally qualified applicants they believe would answer the item correctly (i.e., each item is assigned a percentage value).
– These ratings are averaged, and a valid cutoff for the test can be developed (a sketch follows).
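To make the arithmetic concrete, here is a minimal sketch in Python; the ten item ratings are hypothetical, and a real panel would rate every item on the actual test:

```python
# Minimal sketch of the (unmodified) Angoff calculation.
# Each value is the job experts' averaged estimate of the proportion of
# minimally qualified applicants expected to answer that item correctly.
# These ratings are hypothetical, for illustration only.
item_ratings = [0.70, 0.85, 0.60, 0.75, 0.90, 0.65, 0.80, 0.70, 0.55, 0.75]

# The unadjusted Angoff cutoff is simply the mean of the item estimates.
angoff_cutoff = sum(item_ratings) / len(item_ratings)
print(f"Raw Angoff cutoff: {angoff_cutoff:.1%}")          # 72.5%

# On this hypothetical 10-item test, that corresponds to a raw score of:
print(f"Raw passing score: {angoff_cutoff * len(item_ratings):.2f} items")
```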
Cutoff Score Adjustments
• You should consider reducing your pass/fail cutoff score using the conditional standard error of measurement
• Question: If this is the cutoff score my job experts recommend, why should I consider using any other cutoff score?
• Answer: Because no tests are perfect. They include "measurement error" and the U.S. Supreme Court has supported adjusting minimum passing (cutoff) scores to account for this.
– U.S. v. South Carolina, 434 US 1026 (1978)
Modified (Adjusted) Angoff
Cutoff Scores
• The modified Angoff method adds a slight variation:
• After the test has been administered, the cutoff level set using the method above is lowered by 1, 2, or 3 Conditional Standard Errors of Measurement (CSEMs) to adjust for the unreliability of the test
Traditional Standard Error of
Measurement (SEM)
• The traditional SEM represents the
standard deviation of an applicant’s true
score (the score that represents the
applicant’s actual ability level) around
his or her obtained (or actual) score.
• The traditional SEM considers the entire
range of test scores when calculated.
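In formula terms, the traditional SEM is the score standard deviation scaled by the test’s unreliability: SEM = SD × √(1 − reliability). A minimal sketch, with hypothetical score statistics:

```python
import math

# Hypothetical statistics from a test administration.
score_sd = 8.0        # standard deviation of observed test scores
reliability = 0.90    # internal-consistency estimate, e.g., KR-20 or alpha

# Traditional SEM, computed across the full score distribution:
sem = score_sd * math.sqrt(1 - reliability)
print(f"SEM = {sem:.2f} points")   # about 2.53 points
```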
The Traditional SEM Considers the Entire Range of Test Scores when Calculated
[Figure: distribution of test scores (lower to higher) by number of test takers]
Because the traditional SEM considers the entire range of scores, its accuracy and relevance are limited when evaluating the reliability and consistency of test scores within a certain range of the score distribution.
The Traditional SEM Considers the Entire
Range of Test Scores when Calculated
• Most test-score distributions have scores bunched in the middle and spread out at the high and low ends of the distribution
• Those who score in the lowest range of the distribution lower the overall test reliability, which affects the size of the SEM
• High scores can also lower the overall reliability of a test, since the “true variance” associated with the test-score range is reduced
[Figure: distribution of test scores (lower to higher) by number of test takers, highlighting the score range around the critical score]
The Conditional SEM (CSEM) attempts to avoid the limitations of the SEM by considering only the score range of interest when calculating its value. By considering only the scores around the critical score value, the Conditional SEM is the most accurate estimate of the reliability dynamics of the test around the critical score.
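One common way to estimate a CSEM for number-correct test scores is Lord’s binomial-error formula, CSEM(x) = √(x(n − x)/(n − 1)); other estimators exist, so treat this as an illustrative sketch with hypothetical values:

```python
import math

def lord_csem(raw_score: int, n_items: int) -> float:
    """Conditional SEM at a given raw score, using Lord's
    binomial-error formula: sqrt(x * (n - x) / (n - 1))."""
    return math.sqrt(raw_score * (n_items - raw_score) / (n_items - 1))

# Hypothetical 50-item test with a raw cutoff of 36 items (72%).
print(f"CSEM at the cutoff: {lord_csem(36, 50):.2f} items")  # ~3.21
# The CSEM shrinks toward the extremes of the score range:
print(f"CSEM near the top:  {lord_csem(49, 50):.2f} items")  # 1.00
```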
Putting Standards into Practice
• Modified Angoff method for developing
cut scores
1. Gather job experts
2. Review test items
3. Establish baseline % scores
4. Account for outliers; remove if appropriate
5. Administer test and compute its reliability
6. Adjust downward 1 to 3 CSEMs (a numeric sketch follows this list)
7. Consider adjusting for upward bias
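A numeric sketch of steps 3, 5, and 6 above, using hypothetical figures; how many CSEMs to subtract (one, two, or three) and how to round are judgment calls for the validation team:

```python
import math

# Step 3 (hypothetical): the experts' averaged Angoff estimate.
n_items = 50
angoff_mean = 0.72
raw_cutoff = angoff_mean * n_items          # 36 items correct

# Steps 5-6: after administration, estimate the conditional SEM at the
# cutoff (here via Lord's binomial-error formula) and subtract 2 CSEMs.
csem = math.sqrt(raw_cutoff * (n_items - raw_cutoff) / (n_items - 1))
adjusted = raw_cutoff - 2 * csem

print(f"Unadjusted cutoff: {raw_cutoff:.0f} of {n_items} items")
print(f"Adjusted cutoff:   {adjusted:.2f}, i.e., "
      f"{math.floor(adjusted)} of {n_items} after rounding down")
```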
Cutoffs & the Uniform Guidelines on
Employee Selection Procedures (UGESP)
• Criterion Validated testing: UGESP
Section 15B(10): If the selection
procedure is used with a cutoff score, the
user should describe the way in which
normal expectations of proficiency
within the work force were determined
and the way in which the cutoff score was
determined (essential).
Criterion Validity Mechanics
• Criterion Validity (UGESP Sections 14B &
15B)
– Determined by comparing candidates’ scores on a test with measurements of their job performance
– These values are correlated, allowing for a
determination of how well the test measures
what is required for job performance.
– Can offer the ability to use statistical
projections and other empirical data to set
cut scores
[Scatterplot: scores on a “Test” (x-axis) plotted against scores on some “Criteria,” e.g., job performance, production rate, etc. (y-axis); example points include Test Score = 22 / Performance = 31 and Test Score = 85 / Performance = 55]
We then compute a line that reflects the best fit of the relationship between test scores and the criteria.
Where do we set the cutoff for this type of
testing?
Where do we set the cutoff for
Criterion Validated testing?
• The potentially most defensible place to set the cutoff score is at the normal expectation of proficiency
– You might have supervisors rate qualified vs. unqualified performers and use that information to determine this
• However… some argue you can set the cutoff score anywhere along the line that reflects the relationship between test scores and performance measures (a sketch follows)
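As a sketch of the mechanics (the paired data and the “minimally proficient” criterion level below are entirely hypothetical), you can fit a least-squares line and invert it to find the test score that corresponds to that performance level (Python 3.10+):

```python
import statistics

# Hypothetical paired data: test scores and a job-performance criterion.
test_scores = [22, 38, 45, 51, 60, 68, 74, 85]
performance = [31, 35, 40, 42, 47, 49, 52, 55]

# Least-squares best-fit line: performance = intercept + slope * score.
slope, intercept = statistics.linear_regression(test_scores, performance)
r = statistics.correlation(test_scores, performance)
print(f"r = {r:.2f}; performance = {intercept:.1f} + {slope:.2f} * score")

# If supervisors judge a performance rating of 40 to be the normal
# expectation of proficiency, invert the line for the test cutoff.
min_proficient = 40
cutoff = (min_proficient - intercept) / slope
print(f"Corresponding test cutoff = {cutoff:.1f}")
```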
Alternatives to Setting Cutoff Scores
• Ranking assumes one applicant is reliably more qualified than the other
• Banding considers the unreliability of the test battery and “ties” applicants
• Pass/fail cutoffs treat all applicants as either “qualified” or “not qualified”
Alternatives to Setting Cutoff Scores
• Banding
– Used to group applicants into “substantially
equally qualified score bands”
– Ignores small score differences
– Sometimes has less adverse impact than
rank ordering
– Has been successfully defended in litigation
– Requires appropriate statistical computations (a sketch follows)
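One common statistical approach sets the band width from the standard error of the difference (SED) between two scores, SED = SEM × √2, treating scores within some multiple of the SED of the top score as substantially equally qualified. A minimal sketch with hypothetical numbers:

```python
import math

# Hypothetical test statistics.
score_sd = 8.0
reliability = 0.90
sem = score_sd * math.sqrt(1 - reliability)

# Standard error of the difference between two applicants' scores.
sed = sem * math.sqrt(2)

# A 1.96-SED band (roughly 95% confidence): applicants whose scores
# fall within the band are treated as "tied" with the top scorer.
top_score = 92
band_floor = top_score - 1.96 * sed
print(f"SED = {sed:.2f}; band = {band_floor:.1f} to {top_score}")
```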
Alternatives to Setting Cutoff Scores
• Ranking
– Simply ordering scores from highest to
lowest and choosing candidates in that
order.
– Some particularly stringent requirements are involved, as set forth in Guidelines Section 14C(9).
– Must link selection procedure to
“performance differentiating” KSAPCs
– Should look at things like distribution of
scores and the reliability of selection
procedure
Alternatives to Setting Cutoff Scores
• Ranking – Criterion Validity
– Regularly endorsed by the courts when
appropriately used
– Strict minimum threshold for correlation
coefficients from criterion validation studies
(r > .30) when ranking candidates
Rank Ordering; Banding
• Top-down, rank ordering of test takers by their scores often results in higher levels of adverse impact
• Banding is one method that might help to minimize this concern
– Group candidates into “substantially equally qualified” score bands
– Successfully supported in court
o Officers for Justice v. Civil Service Commission (CA9, 979 F.2d 721, cert. denied, 61 U.S.L.W. 3367, 113 S. Ct. 1645, March 29, 1993). See also Henle, C. A., Case review of the legal status of banding. Human Performance, 17(4), 415-432.
UGESP Section 5G
• The evidence of both the validity and utility of a
selection procedure should support the method the
user chooses for operational use of the procedure, if
that method of use has a greater adverse impact than
another method of use.
– Evidence which may be sufficient to support the use of a
selection procedure on a pass/fail (screening) basis may be
insufficient to support the use of the same procedure on a
ranking basis under these guidelines. Thus, if a user decides
to use a selection procedure on a ranking basis, and that
method of use has a greater adverse impact than use on an
appropriate pass/fail basis… the user should have sufficient
evidence of validity and utility to support the use on a ranking
basis.
Factor                   | Ranking                  | Banding               | Pass/Fail Cutoffs
-------------------------|--------------------------|-----------------------|---------------------
Validation Requirements  | Higher                   | Moderate              | Lower
Adverse Impact           | Typically Higher         | Moderate              | Typically Lower
Defensibility            | Potentially Lower        | Higher                | Higher
Litigation "Red Flag"    | Higher                   | Moderate              | Lower
Utility                  | Higher                   | Moderate              | Lower
Cost                     | Lower                    | Moderate              | Higher
Applicant Flow           | Restrictive/Controllable | Moderate/Controllable | High
Development Time         | Lower                    | Moderate              | Higher
Reliability Requirements | Higher                   | Moderate              | Low, but still > .70
# Items Required         | Higher                   | Moderate              | Lower
Potential Issues
Questions?