Validity


• Does the test measure what it says it does?
• Is the test useful?

• Can a test be reliable, but not valid?
• Can a test be valid, but not reliable?

Types of validity

• Face validity
  – Important only insofar as it doesn’t interfere with an examinee’s willingness to cooperate.
• Content validity
  – How well does the test cover the areas of content that it should?
  – How adequately does it sample the universe of behavior it was designed to assess?

Content validity (cont.)

• Panel of “experts”
  – Is the item/content essential?
  – Lawshe (1975): >50% of experts see the skill as essential
• Important for:
  – Achievement/classroom tests
  – Training program exams
  – Professional exams
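
The Lawshe (1975) criterion above is usually expressed as a content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating an item essential and N is the panel size. The sketch below is a minimal illustration of that formula; the panel size and item ratings are hypothetical, not data from the lecture.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2).

    Positive when more than half of the panel rates the item essential,
    zero at exactly half, negative below half.
    """
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts; values are the number of
# experts who rated each item "essential".
PANEL_SIZE = 10
essential_counts = {"item_a": 9, "item_b": 5, "item_c": 3}

cvr = {item: content_validity_ratio(n, PANEL_SIZE)
       for item, n in essential_counts.items()}

for item, value in cvr.items():
    print(f"{item}: CVR = {value:+.2f}")
```

Under these assumed ratings, only item_a clears the "more than half" threshold; item_b sits exactly at zero and item_c is negative.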

Criterion-Related Validity

• How well does a test score relate to another score/variable of interest?
  – Correlate the test with a criterion
    • Criterion: the standard against which the test is evaluated
• Concurrent
• Predictive
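
"Correlate the test with a criterion" typically means computing a Pearson correlation between test scores and criterion scores (often called a validity coefficient). The sketch below computes it by hand; the scores are made-up numbers for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary to compute a correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: five examinees' test scores and criterion scores.
test_scores = [10, 12, 14, 16, 18]
criterion_scores = [2.0, 2.5, 2.4, 3.1, 3.5]

validity_coefficient = pearson_r(test_scores, criterion_scores)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```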

Criterion-Related Validity (cont.)

• Criterion should be
  – Reliable
    • Reliability limits validity; a measure can’t be valid if it is not reliable.
  – Relevant
  – Valid
  – Uncontaminated
    • Contamination: the criterion measure has been based in part on the predictor measure
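
The point that reliability limits validity can be made concrete with a standard result from classical test theory: the correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below applies that bound; the reliability values are hypothetical.

```python
import math


def max_validity(reliability_test: float, reliability_criterion: float) -> float:
    """Upper bound on the test-criterion correlation under classical
    test theory: sqrt(r_xx * r_yy)."""
    for value in (reliability_test, reliability_criterion):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(reliability_test * reliability_criterion)


# Hypothetical reliabilities for a test and its criterion measure.
ceiling = max_validity(0.81, 0.64)
print(f"validity coefficient cannot exceed {ceiling:.2f}")

# A criterion with zero reliability leaves no room for validity at all.
print(max_validity(0.90, 0.0))
```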


• Concurrent validity
  – Criterion immediately available
  – Present standing on a criterion
    • Diagnosis, score on another test
  – Used to predict the performance of new test takers or of people for whom the criterion isn’t available.


• Predictive validity
  – Test given, criterion measured later
  – Ex.: ACT & college GPA; employment test & job performance
• Incremental validity
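
Incremental validity asks how much a new predictor improves prediction beyond a predictor already in use. One common way to quantify it is the gain in R² when the new predictor is added. The sketch below uses the standard two-predictor formula for R² computed from correlations; all correlation values are hypothetical.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 for predicting a criterion from two predictors, given the
    predictor-criterion correlations (r_y1, r_y2) and the correlation
    between the two predictors (r_12)."""
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def incremental_validity(r_y1: float, r_y2: float, r_12: float) -> float:
    """Gain in R^2 from adding predictor 2 to predictor 1 alone."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2


# Hypothetical correlations: an existing test correlates .50 with the
# criterion, a new test correlates .40, and the two tests correlate .30.
gain = incremental_validity(r_y1=0.50, r_y2=0.40, r_12=0.30)
print(f"R^2 gain from adding the new test: {gain:.3f}")
```

A new test that relates to the criterion only through its overlap with the existing test adds nothing: with r_y1 = .50, r_12 = .60 and r_y2 = .30, the gain is zero.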

Base Rate & Decision Theory

• Base rate: proportion of the population who possess a certain trait, characteristic, or attribute
  – % of EIU undergrads who graduate
  – % of African Americans with sickle cell anemia

• Base rate affects usefulness of tests
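
One way to see why base rate affects usefulness is to hold a test's accuracy fixed and vary the base rate. The sketch below uses Bayes' rule to compute the proportion of people flagged by the test who actually have the attribute. The terms sensitivity and specificity, and the 90% figures, are assumptions introduced for the illustration and do not come from the lecture.

```python
def positive_predictive_value(base_rate: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Proportion of positive test results that are correct.

    sensitivity: P(test positive | attribute present)
    specificity: P(test negative | attribute absent)
    """
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("test never returns a positive result")
    return true_positives / flagged


# Same hypothetical test (90% sensitivity, 90% specificity) applied to
# populations with different base rates.
for base_rate in (0.50, 0.10, 0.01):
    ppv = positive_predictive_value(base_rate, 0.90, 0.90)
    print(f"base rate {base_rate:.2f}: {ppv:.1%} of positives are correct")
```

With a very low base rate, most positive results are false positives even though the test itself has not changed.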

Decision Theory

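• 4 outcomes
  – False rejections/negatives: rejected by the test, but would have succeeded on the criterion
  – Valid acceptances/positives: accepted by the test and successful on the criterion
  – Valid rejections/negatives: rejected by the test and unsuccessful on the criterion
  – False acceptances/positives: accepted by the test, but unsuccessful on the criterion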

Cut scores & Hit rates

[Figure: quadrant diagram of the four outcomes. Top row: false rejections/negatives, valid acceptances/positives. Bottom row: valid rejections/negatives, false acceptances/positives.]

Cut scores & Hit rates (cont.)

• Trade-off between the number of false rejections and the number of false acceptances: moving the cut score to reduce one increases the other

• Which is more acceptable: to limit the number accepted who shouldn’t be, or to minimize the number rejected who could be successful?
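
The trade-off can be illustrated by counting the four outcomes at two different cut scores. In the sketch below, the hit rate is taken to be the proportion of correct decisions (valid acceptances plus valid rejections); the scores, success outcomes, and cut scores are all hypothetical.

```python
from collections import Counter


def decision_outcomes(test_scores, succeeded, cut_score):
    """Count the four decision outcomes for a given cut score.

    Examinees scoring at or above the cut score are accepted.
    """
    counts = Counter()
    for score, success in zip(test_scores, succeeded):
        accepted = score >= cut_score
        if accepted and success:
            counts["valid_acceptance"] += 1
        elif accepted and not success:
            counts["false_acceptance"] += 1
        elif not accepted and success:
            counts["false_rejection"] += 1
        else:
            counts["valid_rejection"] += 1
    return counts


def hit_rate(counts):
    """Proportion of decisions that were correct."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no decisions to evaluate")
    return (counts["valid_acceptance"] + counts["valid_rejection"]) / total


# Hypothetical examinees: test score and whether each later succeeded.
scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
succeeded = [False, False, True, False, False, True, True, False, True, True]

lenient = decision_outcomes(scores, succeeded, cut_score=50)
strict = decision_outcomes(scores, succeeded, cut_score=80)

for label, counts in (("cut = 50", lenient), ("cut = 80", strict)):
    print(label, dict(counts), f"hit rate = {hit_rate(counts):.2f}")
```

In this made-up data set both cut scores give the same hit rate, but the low cut produces only false acceptances and the high cut only false rejections, which is why the choice depends on which error is more costly.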

Construct Validity

• Construct:
  – Scientific idea hypothesized to explain behavior
  – Postulated attribute of people, assumed to be reflected in test scores
  – Ex.: intelligence, self-esteem, motivation
• Construct validity: Does the test measure the construct?
  – Gives theoretical meaning to scores
  – Subsumes all other types of validity

Construct Validity (cont.)

• Convergent evidence/validity
• Divergent/discriminant evidence
• Factor analysis
  – Data reduction/simplification of complex correlational matrices … to reveal the major dimensions that underlie a set of items
  – A factor is considered to be the construct that best represents the relationships among variables

Factor Analysis (cont.)

• Methods of factor analysis
  – Exploratory
    1. Correlation matrix
    2. Factor matrix with loadings
    3. Label factors
    • Used to develop or eliminate items or scales from composite scores

Factor Analysis (cont.)

• Confirmatory factor analysis
  – Goodness of fit
  – After test has been developed

Validity & Bias

• Bias: a factor inherent within a test that systematically prevents accurate, impartial measurement
  – Bias implies systematic, not random, variation

• Can you make equally valid predictions for different groups?

Bias in Predictions

• Questions of regression
  – Slope
  – Intercept
  – Error of estimate
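
The three regression questions can be checked by fitting a separate prediction line for each group and comparing the results. The sketch below fits an ordinary least-squares line and reports slope, intercept, and standard error of estimate; the group names and scores are hypothetical and chosen so that the two groups differ in slope.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares line predicting y from x.

    Returns (slope, intercept, standard_error_of_estimate), where the
    standard error of estimate is sqrt(SS_residual / (n - 2)).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor scores must vary")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((b - (intercept + slope * a)) ** 2
                      for a, b in zip(x, y))
    return slope, intercept, math.sqrt(ss_residual / (n - 2))


# Hypothetical predictor (test) and criterion scores for two groups.
predictor = [80, 90, 100, 110, 120]
criterion_by_group = {
    "group_a": [91, 93, 102, 103, 111],
    "group_b": [81, 88, 102, 108, 121],
}

results = {group: fit_line(predictor, criterion)
           for group, criterion in criterion_by_group.items()}

for group, (slope, intercept, see) in results.items():
    print(f"{group}: slope={slope:.2f}, intercept={intercept:.2f}, "
          f"error of estimate={see:.2f}")
```

If the slopes differ, the same change in test score implies a different change in the predicted criterion for each group, so a single common regression line would mispredict for at least one of them.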

Slope Bias

[Figure: Bias & the DAS. Word Reading Scores (y-axis) plotted against General Conceptual Ability Scores (x-axis), with linear trend lines for Whites and Asian Americans.]

Intercept Bias

[Figure: Bias & the DAS. Basic Number Skills (y-axis) plotted against General Conceptual Ability Scores (x-axis) for two data series (Series1, Series2).]

Rating error

• Leniency error
• Severity error
• Central tendency error
• Halo effect

Test Fairness

• Is the test used in an impartial, just, and equitable manner?

• Good tests discriminate among individuals
  – Are group differences due to inadequate tests?
  – Is the test being used fairly?
