Crash Course in Psychometric Theory
David B. Flora
SP Area Brownbag
February 8, 2010

Research in social and personality psychology is about abstract concepts of theoretical importance, called “constructs.”

Examples include “prejudice,” “self-esteem,” “introversion,” “forgiveness,” and on and on…

The success of a research study depends on how well the constructs of interest are measured.

The field of “Test Theory” or “Psychometrics” is concerned with the theory and accompanying research methods for the measurement of psychological constructs.

Psychometric theory evolved from the tradition of intelligence, or “mental ability,” testing.

Spearman (1904) invented factor analysis to aid in the measurement of intelligence.

The psychophysics tradition is also foundational to psychometric theory, as per Thurstone’s (1928) law of comparative judgment for scaling of social stimuli.

A test question is a stimulus; the answer to the question is a behavioural response to the stimulus.

Classical True Score Model

$x_i = t_i + e_i$

$x_i$ is the observed value for person $i$ from an operationalization of a construct (e.g., a test score).

$t_i$ is that person’s true score on the construct.

$e_i$ is measurement error.

The variable $t$ is a latent variable:

An unobservable variable that is measured by the observable variable $x$.

Lord & Novick’s (1968) preferred definition of the true score (paraphrased):

For a given person, there is a “propensity” distribution of possible outcomes of a measurement that reflects the operation of processes such as momentary fluctuations in memory and attention or in the strength of an attitude. The person’s true score is the mean of this propensity distribution.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores.
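A minimal simulation can make this definition concrete. The sketch below assumes one hypothetical person with true score t = 50 and independent, normally distributed momentary errors with SD 5 (both values invented for illustration). Each replication is one draw from the person's propensity distribution, x = t + e, and the mean of many draws approaches t.

```python
import numpy as np


def propensity_mean(true_score, error_sd, n_replications, seed=0):
    """Mean of n simulated replications of x = t + e for a single person.

    Assumption (illustration only): errors are independent normal draws
    with mean 0, so the mean of the propensity distribution equals t.
    """
    rng = np.random.default_rng(seed)
    errors = rng.normal(loc=0.0, scale=error_sd, size=n_replications)
    observed = true_score + errors  # one observed score per replication
    return float(observed.mean())


# Hypothetical person: true score 50, error SD 5
for n in (5, 100, 100_000):
    print(n, round(propensity_mean(50.0, 5.0, n), 2))
```

With only a handful of replications the mean can sit noticeably away from 50; the true score is defined by the mean of the whole distribution, not by any single administration.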

Validity
$x_i = t_i + e_i$ or $t_i = x_i - e_i$

Validity denotes the scientific utility of the scores, $x$, obtained with a measuring instrument (i.e., a test).

But there is more to it than just the size of $e_i$.

Validity is mostly concerned with whether $x$ measures the $t$ that we want it to…

Note that validity is a property of the scores obtained from a test, not of the test itself.

Nunnally & Bernstein (1994), Psychometric Theory (3rd ed.), p. 84:

“Validation always requires empirical investigations, with the nature of the measure and form of validity dictating the needed form of [empirical] evidence.”

“Validation usually is a matter of degree rather than an all-or-none property, and validation is an unending process.”

“Strictly speaking, one validates the use to which a measuring instrument is put rather than the instrument itself. Tests are often valid for one purpose but not another.”

You may have heard of:

- Internal validity
- External validity
- Face validity
- Content validity
- Construct validity
- Criterion validity
- Predictive validity
- Postdictive validity
- Concurrent validity
- Factorial validity
- Convergent validity
- Discriminant validity
- Incremental validity
- Ecological validity

Standards
The Standards for Educational and Psychological Testing (1966; 1974; 1985; 1999) are developed jointly by AERA, APA, and NCME.

The Standards view validity as a unitary concept.

Rather than there being separate types of validity, there are three main types of validity evidence:
1. Content-related evidence
2. Construct-related evidence
3. Criterion-related evidence

Content-related validity evidence
Content validity refers to the extent to which a set of items (or stimuli) adequately reflects a content domain.

E.g., selection of vocabulary words for a Grade 6 vocabulary test from the domain of all words taught to 6th graders.

Evidence is based on theoretical judgment.

Same as face validity?
- self-report judgment of overall health

Construct-related validity evidence
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests.

Mainly concerned with associations between test scores and other variables that are dictated by theory.

Multi-trait multi-method correlation matrix (Campbell & Fiske, 1959):
- Is the test strongly correlated with other measures of the same construct? (convergent validity)
- Is the test less strongly correlated with measures of different constructs than with measures of the same construct? (discriminant validity)
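As a simplified sketch of these two questions, the code below takes an invented 2-trait by 2-method correlation matrix (traits A and B, methods 1 and 2; every number is hypothetical) and compares the smallest convergent correlation (same trait, different method) with the largest correlation between different traits. A full Campbell and Fiske analysis involves more comparisons than this single check.

```python
import numpy as np

# Hypothetical MTMM correlation matrix (values invented for illustration).
# Each label is trait letter + method number: "A1" = trait A via method 1.
labels = ["A1", "B1", "A2", "B2"]
R = np.array([
    [1.00, 0.30, 0.65, 0.15],
    [0.30, 1.00, 0.20, 0.60],
    [0.65, 0.20, 1.00, 0.25],
    [0.15, 0.60, 0.25, 1.00],
])


def mtmm_check(R, labels):
    """Return (smallest convergent r, largest different-trait r)."""
    convergent, different_trait = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            same_trait = labels[i][0] == labels[j][0]
            same_method = labels[i][1] == labels[j][1]
            if same_trait and not same_method:
                convergent.append(R[i, j])       # same trait, different method
            elif not same_trait:
                different_trait.append(R[i, j])  # should be weaker
    return min(convergent), max(different_trait)


conv_min, disc_max = mtmm_check(R, labels)
print(f"smallest convergent r = {conv_min:.2f}")
print(f"largest different-trait r = {disc_max:.2f}")
print("convergent > discriminant:", conv_min > disc_max)
```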

Floyd & Widaman (1995), p. 287:

“Construct validity is supported if the factor structure of the [instrument] is consistent with the constructs the instrument purports to measure.”

“If the factor analysis fails to detect underlying constructs [i.e., factors] that explain sufficient variance in the [items] or if the constructs detected are inconsistent with expectations, the construct validity of the scale is compromised.”

Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286-299.

Criterion-related validity evidence

Evidence is based on empirical association with some important “gold standard” criterion.

Encompasses predictive and concurrent validity.

Difficult to distinguish from construct validity
- The theoretical reason for an association is critical for construct validity, but less important for criterion validity.

E.g., relationship between a stress measure and physical health?

Do we really need your new scale?

Does it have incremental validity?

“Incremental validity is defined as the degree to which a measure explains or predicts a phenomenon of interest, relative to other measures. Incremental validity can be evaluated on several dimensions, such as sensitivity to change, diagnostic efficacy, content validity, treatment design and outcome, and convergent validity.”

Haynes, S. N., & Lench, H. (2003). Incremental validity of new clinical assessment measures. Psychological Assessment, 15, 456-466.
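Haynes and Lench list several dimensions; the sketch below covers only one common operationalization, the gain in R² when a new scale is added to a regression that already contains an established measure (a hierarchical regression). The data are simulated, and the variable names and effect sizes are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an OLS regression of y on X, with an intercept added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(1)
n = 500
old_scale = rng.normal(size=n)                            # established measure
new_scale = 0.6 * old_scale + 0.8 * rng.normal(size=n)    # overlaps with old one
outcome = 0.5 * old_scale + 0.3 * new_scale + rng.normal(size=n)

r2_old = r_squared(old_scale, outcome)                                # step 1
r2_both = r_squared(np.column_stack([old_scale, new_scale]), outcome)  # step 2
print(f"R^2 (old only) = {r2_old:.3f}")
print(f"R^2 (old + new) = {r2_both:.3f}")
print(f"incremental R^2 = {r2_both - r2_old:.3f}")
```

A new scale that mostly duplicates the established one would show a large R² at step 2 but a near-zero increment over step 1.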

Reliability

Reliability is necessary, but not sufficient, for construct validity.

Lack of reliability (i.e., measurement error) introduces bias in analyses and reduces statistical power.
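One concrete form of this bias is attenuation: measurement error shrinks observed correlations toward zero. The sketch below simulates two constructs whose true scores correlate .50, measures them with reliabilities of .70 and .80 (values chosen for illustration), and applies Spearman's classical correction for attenuation, r_xy / √(r_xx · r_yy), which assumes the reliabilities are known and the errors are uncorrelated.

```python
import numpy as np


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation (estimated true-score r)."""
    return r_xy / np.sqrt(rel_x * rel_y)


rng = np.random.default_rng(2)
n = 200_000
rho_true = 0.5            # correlation between true scores (assumed)
rel_x, rel_y = 0.7, 0.8   # reliabilities of the two measures (assumed)

# True scores with unit variance and correlation rho_true
t_x = rng.normal(size=n)
t_y = rho_true * t_x + np.sqrt(1 - rho_true**2) * rng.normal(size=n)

# Error variance chosen so that Var(t) / Var(x) equals the target reliability
x = t_x + rng.normal(scale=np.sqrt((1 - rel_x) / rel_x), size=n)
y = t_y + rng.normal(scale=np.sqrt((1 - rel_y) / rel_y), size=n)

r_obs = np.corrcoef(x, y)[0, 1]
r_corrected = disattenuate(r_obs, rel_x, rel_y)
print(f"observed r  = {r_obs:.3f} (population: {rho_true * np.sqrt(rel_x * rel_y):.3f})")
print(f"corrected r = {r_corrected:.3f}")
```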

What exactly is reliability?
$x_i = t_i + e_i$

$\text{Reliability} = \operatorname{Var}(t_i) / \operatorname{Var}(x_i)$
Reliability is the proportion of true score variance to total observed variance.
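Because simulated data let us see t directly, the definition can be checked numerically. The sketch below assumes Var(t) = 1 and Var(e) = 0.25 (arbitrary choices), so the population reliability is 1 / 1.25 = 0.80. It also shows that, when errors are uncorrelated with true scores, reliability equals the squared correlation between observed and true scores.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
t = rng.normal(loc=0.0, scale=1.0, size=n)   # true scores, Var(t) = 1
e = rng.normal(loc=0.0, scale=0.5, size=n)   # errors, Var(e) = 0.25
x = t + e                                    # observed scores

reliability = t.var() / x.var()
r_xt = np.corrcoef(x, t)[0, 1]
print(f"Var(t) / Var(x) = {reliability:.3f}")
print(f"corr(x, t)^2    = {r_xt**2:.3f}")
print("population value = 0.800")
```

In real data t is never observed, which is why the estimation methods that follow are needed.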

Since we can’t directly observe $\operatorname{Var}(t_i)$, we must turn to other methods for estimating reliability…

- Parallel-forms reliability
- Split-half reliability
- Internal consistency reliability (coefficient alpha)
- Test-retest reliability
- Inter-rater reliability

Each is an estimate of the proportion of total variability that is true score variability.
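As one example of such an estimate, the sketch below computes a split-half (odd/even) reliability and steps it up to full length with the Spearman-Brown formula, 2r / (1 + r). The ten simulated items are assumed to be parallel (equal true score and error variances), the situation in which the split-half estimate is on target. With these settings the population reliability of the 10-item sum is 100 / 140 ≈ 0.714.

```python
import numpy as np


def spearman_brown(r_half, factor=2.0):
    """Projected reliability when test length is multiplied by `factor`."""
    return factor * r_half / (1 + (factor - 1) * r_half)


rng = np.random.default_rng(4)
n_people, k = 50_000, 10
f = rng.normal(size=(n_people, 1))                      # true score, Var = 1
items = f + rng.normal(scale=2.0, size=(n_people, k))   # parallel items, error Var = 4

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, 7, 9
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8, 10
r_half = np.corrcoef(odd_half, even_half)[0, 1]
split_half = spearman_brown(r_half)

print(f"half-test correlation = {r_half:.3f}")
print(f"Spearman-Brown split-half estimate = {split_half:.3f}")
print(f"population reliability = {100 / 140:.3f}")
```

The raw half-test correlation describes a 5-item test, so it understates the reliability of the full 10-item sum until it is stepped up.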

Coefficient alpha (α)

Original formula actually given by Guttman (1945), not Cronbach (1951)!

Based on the average of all inter-item correlations, $\bar{r}$, adjusted for the number of items, $k$:

$\alpha = \dfrac{k\bar{r}}{1 + (k - 1)\bar{r}}$

The expected correlation of one test with an alternate form containing the same number of items.
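The standardized form of α, k·r̄ / (1 + (k − 1)·r̄), can be computed directly from k and the mean inter-item correlation r̄. The sketch below (an illustration, not production code) also computes α from a covariance matrix with the general formula α = k/(k − 1) · (1 − Σσ²_j / σ²_X). Applied to a correlation matrix whose off-diagonal entries all equal r̄, the two formulas agree.

```python
import numpy as np


def standardized_alpha(k, mean_r):
    """Standardized alpha from k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def alpha_from_cov(S):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    S = np.asarray(S, dtype=float)
    k = S.shape[0]
    return k / (k - 1) * (1 - np.trace(S) / S.sum())


k, r_bar = 10, 0.3
R = np.full((k, k), r_bar)   # every inter-item correlation equals r_bar
np.fill_diagonal(R, 1.0)     # standardized items

print(f"standardized formula: {standardized_alpha(k, r_bar):.4f}")
print(f"from matrix:          {alpha_from_cov(R):.4f}")
```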

Coefficient alpha (α)

The more items, the larger α.

A high α does NOT imply unidimensionality (i.e., that the items all measure a single factor).

α is a lower-bound estimate of true reliability (assuming uncorrelated measurement errors)…
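The first two points can be illustrated with population correlation matrices, so no sampling is involved. The sketch below (values chosen for illustration) first shows standardized α rising with k at a fixed average correlation of .20. It then builds a 12-item set made of two uncorrelated 6-item clusters (two factors, each item loading .80 on one factor) that still yields α ≈ .83.

```python
import numpy as np


def standardized_alpha(k, mean_r):
    """Standardized alpha from k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def alpha_from_cov(S):
    """Coefficient alpha from a covariance (or correlation) matrix."""
    S = np.asarray(S, dtype=float)
    k = S.shape[0]
    return k / (k - 1) * (1 - np.trace(S) / S.sum())


# 1. More items -> larger alpha, holding the mean correlation at .20
for k in (5, 10, 20):
    print(k, round(standardized_alpha(k, 0.20), 3))

# 2. Two uncorrelated factors, six items each, every loading .80
loadings = np.zeros((12, 2))
loadings[:6, 0] = 0.8
loadings[6:, 1] = 0.8
R_two_factor = loadings @ loadings.T   # common-factor part of the correlations
np.fill_diagonal(R_two_factor, 1.0)    # standardized items
print("alpha for two-factor items:", round(alpha_from_cov(R_two_factor), 3))
```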

How does factor analysis fit in?
“Common factor model” for a “congeneric” set of items measuring a single construct:

$x_{ij} = \lambda_j f_i + u_{ij}$

$x_{ij}$ is person $i$’s score on the $j$th item of a multi-item test.

$f_i$ is the common factor score on the factor, or latent variable, for person $i$.

$\lambda_j$ is the factor loading of test item $j$.

$u_{ij}$ is the score on unique factor $j$ for person $i$.

It represents a mixture of systematic influence and random error influence on item $j$:

$u_{ij} = s_{ij} + e_{ij}$

If we define $t_{ij} = \lambda_j f_i$ and assume that the systematic unique influence is negligible, so that $u_{ij} \approx 0 + e_{ij}$…

…then the common factor model gives the Classical True Score model for scores on item $j$:

$x_{ij} = \lambda_j f_i + u_{ij}$

$x_{ij} = t_{ij} + e_{ij}$

Coefficient α will underestimate reliability to the extent that the factor loadings, $\lambda_j$, vary across items.

More accurate reliability estimates can be calculated using the factor loadings.

- Perspective shifts from internal consistency to the latent variable relationship
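One widely used loading-based estimate (named here as an example, not taken from the slides) is McDonald's omega, ω = (Σλ_j)² / [(Σλ_j)² + Σψ_j], where ψ_j is the unique variance of item j. The sketch below assumes standardized items with known population loadings (invented values) and builds the implied covariance matrix λλ′ + Ψ. With unequal loadings α falls below ω; with equal loadings (the tau-equivalent case) the two coincide.

```python
import numpy as np


def implied_cov(loadings):
    """Covariance matrix implied by a one-factor model with standardized items."""
    lam = np.asarray(loadings, dtype=float)
    psi = 1.0 - lam**2                  # unique variances
    return np.outer(lam, lam) + np.diag(psi)


def alpha_from_cov(S):
    """Coefficient alpha from a covariance matrix."""
    k = S.shape[0]
    return k / (k - 1) * (1 - np.trace(S) / S.sum())


def omega(loadings):
    """McDonald's omega for a one-factor model with standardized items."""
    lam = np.asarray(loadings, dtype=float)
    common = lam.sum() ** 2
    unique = np.sum(1.0 - lam**2)
    return common / (common + unique)


examples = {
    "congeneric (unequal)": [0.9, 0.8, 0.7, 0.4, 0.3, 0.2],
    "tau-equivalent (equal)": [0.55] * 6,
}
for name, lam in examples.items():
    S = implied_cov(lam)
    print(f"{name}: alpha = {alpha_from_cov(S):.3f}, omega = {omega(lam):.3f}")
```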

Tangential things you should know…
Principal components analysis (PCA) is NOT factor analysis. When you run a PCA, you are NOT estimating the common factor model.

Situations where PCA is appropriate are quite rare in social and personality psychology.
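A small population example shows one practical consequence. Assume six standardized items that each load .50 on a single common factor (a hypothetical setup). PCA of the implied correlation matrix treats the diagonal 1s as variance to be explained, so the first component's "loadings" come out near .61, overstating the true factor loadings of .50.

```python
import numpy as np

k, lam = 6, 0.5
loadings = np.full(k, lam)
R = np.outer(loadings, loadings)   # common-factor part: lambda lambda'
np.fill_diagonal(R, 1.0)           # plus unique variances on the diagonal

eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
pc1 = np.abs(eigvecs[:, -1]) * np.sqrt(eigvals[-1])   # first-component loadings

print("true factor loadings:", loadings)
print("PCA component loadings:", np.round(pc1, 3))
```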

The Pearson product-moment correlation is often NOT adequate for describing the relationships among item-level categorical variables!

When factor analyzing items, we should usually use something other than product-moment correlations.

One approach is to analyze polychoric correlations.
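The sketch below illustrates the problem. It assumes two binary items generated by splitting two continuous latent responses (correlation .60) at their medians. The Pearson (phi) correlation of the 0/1 items drops to about .41, close to the known value (2/π)·arcsin(.60) for a median split of bivariate normal data. A polychoric (here, tetrachoric) correlation is designed to recover the latent .60 instead; the sketch does not compute it.

```python
import numpy as np

rng = np.random.default_rng(5)
n, rho = 200_000, 0.6
cov = [[1.0, rho], [rho, 1.0]]
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)  # latent responses
items = (z > 0.0).astype(float)   # binary items: endorse if latent response > 0

r_latent = np.corrcoef(z[:, 0], z[:, 1])[0, 1]
r_phi = np.corrcoef(items[:, 0], items[:, 1])[0, 1]
print(f"latent correlation      = {r_latent:.3f}")
print(f"Pearson on binary items = {r_phi:.3f}")
print(f"theory (median split)   = {2 / np.pi * np.arcsin(rho):.3f}")
```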

Modern Psychometric Theory

Another approach that properly models item-level variables as categorical is Item Response Theory (IRT).

IRT represents a collection of models for relating individual items within a test or scale to the latent variable(s) they measure.

IRT leads to test scores with smaller measurement error than traditional item sums or means.

IRT
The properties of each item are summarized with an item characteristic curve (ICC).

The slope of the curve indicates item discrimination, i.e., the strength of the relationship between the item and the latent construct.

The horizontal location of the curve indicates item difficulty or severity.
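A common IRT model for binary items is the two-parameter logistic (2PL) model, P(θ) = 1 / (1 + exp[−a(θ − b)]), where a is the discrimination (slope) and b the difficulty (location). The sketch below is one illustrative implementation with invented item parameters. At θ = b the probability is exactly .50, and the curve is steepest there, with slope a/4.

```python
import numpy as np


def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(positive response | theta)."""
    theta = np.asarray(theta, dtype=float)
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


# Hypothetical items: (discrimination a, difficulty b)
items = {
    "flat, easy": (0.8, -1.0),
    "steep, hard": (2.0, 1.0),
}
theta_grid = np.linspace(-3, 3, 7)
for name, (a, b) in items.items():
    print(f"{name:12s}", np.round(icc_2pl(theta_grid, a, b), 2))
```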

[Figure: Item characteristic curves (ICCs) for four binary items with equal discrimination but varying “difficulty.”]
X-axis, “theta,” represents the latent trait or construct. Y-axis represents the probability of a positive item response.

[Figure: Item characteristic curves (ICCs) for four binary items, numbered 1 to 4, with varying discrimination and varying difficulty.]

Items 1 and 2 have stronger discrimination than items 3 and 4. Item 1 has the lowest difficulty, item 4 the highest.

A “test information function”
Shows precision of measurement as a function of latent trait level.

IRT scores

Scale scores constructed using IRT

- take into account item discrimination, whereas simple sum (or mean) scores assume all items measure the construct equally well

- have a proper interval scale of measurement, whereas simple sum scores are typically ordinal, strictly speaking

- have measurement error that varies across the range of the construct, whereas simple sum scores assume a single reliability value for the whole range
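To illustrate the first point, the sketch below scores two response patterns with the same sum score (2 of 4) using expected a posteriori (EAP) estimation under a 2PL model, one common IRT scoring method. The item parameters and the standard-normal prior are assumptions for the example, and the evenly spaced grid is a simple quadrature approximation. Endorsing the two highly discriminating items yields a higher θ estimate than endorsing the two weakly discriminating ones.

```python
import numpy as np


def icc_2pl(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def eap_score(responses, a, b, n_points=121):
    """EAP theta under a N(0, 1) prior, approximated on an evenly spaced grid."""
    grid = np.linspace(-4, 4, n_points)
    prior = np.exp(-0.5 * grid**2)               # unnormalized N(0, 1) density
    # P has shape (n_points, n_items)
    P = icc_2pl(grid[:, None], np.asarray(a)[None, :], np.asarray(b)[None, :])
    u = np.asarray(responses)[None, :]
    likelihood = np.prod(P**u * (1 - P) ** (1 - u), axis=1)
    posterior = prior * likelihood
    return float(np.sum(grid * posterior) / np.sum(posterior))


a_params = [2.0, 2.0, 0.5, 0.5]   # hypothetical discriminations
b_params = [0.0, 0.0, 0.0, 0.0]   # hypothetical difficulties
pattern_a = [1, 1, 0, 0]          # endorses the two highly discriminating items
pattern_b = [0, 0, 1, 1]          # endorses the two weakly discriminating items

print("sum scores:", sum(pattern_a), sum(pattern_b))
print("EAP scores:", round(eap_score(pattern_a, a_params, b_params), 3),
      round(eap_score(pattern_b, a_params, b_params), 3))
```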

The big picture

IRT was often presented as an alternative approach to test theory at odds with classical test theory (CTT).

Current perspective is that CTT and IRT complement and enhance each other.
- For example, the mathematical link between IRT and factor analysis is now well understood.
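One well-known form of that link, for binary items: if an item's continuous latent response has standardized loading λ and threshold τ in a factor analysis, the corresponding normal-ogive IRT parameters are a = λ / √(1 − λ²) and b = τ / λ. The sketch below assumes that parameterization (unit-variance latent response, normal-ogive rather than logistic metric; the λ and τ values are hypothetical) and checks that both forms give the same response probability at an arbitrary θ.

```python
import math
from statistics import NormalDist


def fa_to_irt(loading, threshold):
    """Convert a standardized loading/threshold to normal-ogive IRT (a, b)."""
    a = loading / math.sqrt(1.0 - loading**2)
    b = threshold / loading
    return a, b


Phi = NormalDist().cdf   # standard normal CDF

lam, tau = 0.7, 0.5      # hypothetical item factor analysis estimates
a, b = fa_to_irt(lam, tau)

theta = 1.2
p_factor = Phi((lam * theta - tau) / math.sqrt(1.0 - lam**2))  # factor form
p_irt = Phi(a * (theta - b))                                   # IRT form
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"P (factor form) = {p_factor:.6f}, P (IRT form) = {p_irt:.6f}")
```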

A well-validated test will still produce scores with measurement error.

Ideas from CTT, IRT, and structural equation modeling can be implemented to produce powerful results that account for measurement error, thus modeling relationships among the constructs themselves rather than the operational variables.