1 Class 3 Classical Methods of Scale Construction October 13, 2005 Anita L. Stewart Institute for Health & Aging University of California, San Francisco


Class 3

Classical Methods of Scale Construction

October 13, 2005

Anita L. Stewart

Institute for Health & Aging

University of California, San Francisco


Readings and Homework

Homework as stated in syllabus is for the following week

Readings are relevant to the current week


Overview of class

Types of measurement scales
Rationale for multi-item measures
Scale construction methods
Error concepts


Types of Measurement Scales

Categorical (nominal)
– Classification
– Numbers are labels for categories

Continuous (along a continuum)
– Ordinal
– Interval
– Ratio


Classification vs. Continuous Scores

CES-D continuous score
– 20 items summed using Likert scaling methods
– Range of sum is 0-60; used as a continuous score in correlational studies

CES-D classification score
– Those scoring 16 or higher are “classified” as having likely depression
» Referred for further screening
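The two uses of the CES-D score can be sketched in a few lines of Python. This is a minimal illustration, not the official scoring program: the 0-3 item coding is inferred from the 0-60 range over 20 items, the function name is hypothetical, and the items are assumed to be already coded so that higher means more symptoms.

```python
CESD_CUTOFF = 16  # classification threshold quoted on the slide


def cesd_scores(item_responses):
    """Return (continuous_score, likely_depression) for one respondent.

    Assumes 20 responses coded 0-3, with any positively worded items
    already reversed so that a higher code means more symptoms.
    """
    if len(item_responses) != 20:
        raise ValueError("expected 20 item responses")
    if any(r not in (0, 1, 2, 3) for r in item_responses):
        raise ValueError("each response must be coded 0-3")
    total = sum(item_responses)          # continuous score, 0-60
    return total, total >= CESD_CUTOFF   # classification score


print(cesd_scores([1] * 16 + [0] * 4))  # (16, True)
print(cesd_scores([0] * 20))            # (0, False)
```

The same sum serves both purposes: it is kept as a number for correlational work and compared with the cutoff for screening.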


Categorical (Nominal) Scales/Measures

Primary language
1 Spanish
2 English
3 Other

Can you walk without help?
1 Yes
2 No

Numbers have no inherent meaning


Ordinal Scales: Numbers Reflect Increasing Level

Change in health:
1 Better
2 No change
3 Worse

Income:
1 < $10,000
2 $10,000 - <$20,000
3 $20,000 - <$30,000
4 $30,000 or more

Numbers have no inherent meaning other than “more” or “less.”


Another Example of Ordinal Scale

How much pain did you have this past week?

1 None

2 Very mild

3 Mild

4 Moderate

5 Severe

6 Very severe


Feature of Ordinal Scales

Distances between numbers are unknown and probably vary
– some are closer together in meaning than others

When ordinal responses indicate extent of agreement (agree, disagree)
– referred to as a Likert scale

The term “Likert scale” has since come to have other meanings in health measurement


Interval Scales

Numbers have equal intervals
A unit change is constant across the scale
Example - temperature
– can add and subtract scores
– a 2 unit change is the same at lower temperatures as higher temperatures


Ratio Scale

Has a meaningful zero point
Change scores have specific meaning, and scores can be multiplied
– e.g., one score can be 2 or 3 times another
Examples
– Weight in pounds
– Income in dollars
– Number of visits


Types of Measurement Scales and Their Properties

Property of Numbers

Type of scale   Rank order   Equal interval   Absolute zero
Nominal         No           No               No
Ordinal         Yes          No               No
Interval        Yes          Yes              No
Ratio           Yes          Yes              Yes
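The difference between "equal interval" and "absolute zero" can be checked with temperature, the interval-scale example used in this class. The sketch below is illustrative only; the three temperatures are arbitrary values chosen for the demonstration.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (both are interval scales)."""
    return celsius * 9 / 5 + 32


low, mid, high = 10, 20, 30  # arbitrary Celsius readings

# Equal intervals survive a change of units: two 10-degree C steps
# are both 18-degree F steps.
step_1_f = c_to_f(mid) - c_to_f(low)
step_2_f = c_to_f(high) - c_to_f(mid)

# Ratios do not survive: 20 C looks like "twice" 10 C only because
# of where the Celsius zero happens to sit.
ratio_c = mid / low
ratio_f = c_to_f(mid) / c_to_f(low)

print(step_1_f, step_2_f)  # 18.0 18.0
print(ratio_c, ratio_f)    # 2.0 1.36
```

Because the zero point is arbitrary, only sums and differences are meaningful on an interval scale; a ratio scale such as weight in pounds also supports "twice as much" statements.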


Overview of class

Types of measurement scales
Rationale for multi-item measures
Scale construction methods
Error concepts


Single- and Multi-Item Measures

Advantages of single items
– Response choices are interpretable

Disadvantages
– Numbers are not easily interpretable
– Limited variability
» Easy to get skewed distributions
– Reliability is usually low
– Difficult to assess a complex concept with one item


Interpretability of “Numbers” in Single Item Ordinal Scale

How much pain did you have this past week?

1 - none

2 – very mild

3 - mild

4 - moderate

5 - severe

6 – very severe


Interpretability of “Numbers” in Single Item Ordinal Scale

How much pain did you have this past week?

1 - none

2 – very mild

3 - mild

4 - moderate

5 - severe

6 – very severe

Is “very severe” twice as painful as “mild”?


Estimated Distance Between Levels in Ordinal Scale (N=2,928) (0-100 scale)

How much pain did you have this past week?

Response          0-100 transform   Mean: pain scale
1 - none          0                 3.30
2 - very mild     20                12.19
3 - mild          40                21.89
4 - moderate      60                38.76
5 - severe        80                59.43
6 - very severe   100               75.38


Distance Between Levels in an Ordinal Scale (N=2,928)

How much pain did you have this past week?

Response          Mean: pain scale   Distance from previous level
1 - none          3.30
2 - very mild     12.19              9
3 - mild          21.89              10
4 - moderate      38.76              17
5 - severe        59.43              20
6 - very severe   75.38              16
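The comparison between the equal-spacing assumption and the observed spacing can be reproduced with a short Python sketch. The means are the pain-scale values shown on these slides; the helper name is hypothetical, and the recomputed gaps are unrounded, so they can differ slightly from the whole-number distances shown on the slide.

```python
def to_0_100(level, n_levels):
    """Linearly rescale a response code 1..n_levels onto 0-100."""
    return (level - 1) * 100 / (n_levels - 1)


labels = ["none", "very mild", "mild", "moderate", "severe", "very severe"]
mean_pain = [3.30, 12.19, 21.89, 38.76, 59.43, 75.38]  # means from the slide

# What the 0-100 transform assumes: every step is worth 20 points.
equal_spacing = [to_0_100(level, len(labels)) for level in range(1, 7)]

# What the data suggest: the gap between adjacent levels varies.
gaps = [round(b - a, 2) for a, b in zip(mean_pain, mean_pain[1:])]

for label, assumed, observed in zip(labels, equal_spacing, mean_pain):
    print(f"{label:12s} assumed={assumed:5.1f}  observed mean={observed:5.2f}")
print("gaps between adjacent levels:", gaps)
```

The gaps are smaller at the low end of the response scale than in the middle, which is the sense in which distances between ordinal levels are unequal.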


Distance Between Levels: “In general, how would you rate your health?”

Mean: current health scale

Response        0-100 transform   Screening (N=~11,000)   Baseline (N=3,054)
1 - poor        0                 10.8                    10.8
2 - fair        25                30.0                    30.6
3 - good        50                57.6                    55.9
4 - very good   75                75.5                    75.4
5 - excellent   100               87.9                    86.9


Distance Between Levels: “In general, how would you rate your health?”

Mean: current health scale

Response        Screening (N=~11,000)   Baseline (N=3,054)   Distance from previous level
1 - poor        10.8                    10.8
2 - fair        30.0                    30.6                 20
3 - good        57.6                    55.9                 26
4 - very good   75.5                    75.4                 18
5 - excellent   87.9                    86.9                 11


Multi-Item Measures or Scales

Multi-item measures are created by combining two or more items into an overall measure or scale score


Advantages of Multi-item measures

More scale values (enhances sensitivity)
Improves score distribution (more normal)
Reduces number of variables needed to measure one concept
Improves reliability (reduces random error)
Can estimate a score if some items are missing
Enriches the concept being measured (more valid)


Overview of class

Types of measurement scales
Rationale for multi-item measures
Scale construction methods
Error concepts


Types of Scale Construction

Summated ratings scales
– Likert scaling
Utility weighting or preference-based measures (econometric scales)
Guttman scaling
Thurstone scales
Many others


Example of a 2-item Summated Ratings Scale

How much of the time .... tired?

1 - All of the time

2 - Most of the time

3 - Some of the time

4 - A little of the time

5 - None of the time

How much of the time…. full of energy?

1 - All of the time

2 - Most of the time

3 - Some of the time

4 - A little of the time

5 - None of the time


Step 1: Reverse One Item So They Are All in the Same Direction

How much of the time .... tired?

1 - All of the time

2 - Most of the time

3 - Some of the time

4 - A little of the time

5 - None of the time

How much of the time…. full of energy?

1=5 All of the time

2=4 Most of the time

3=3 Some of the time

4=2 A little of the time

5=1 None of the time

Reverse “energy” item so high score = more energy


Step 2: Sum the Two Items

How much of the time .... tired?

1 - All of the time
2 - Most of the time
3 - Some of the time
4 - A little of the time
5 - None of the time

How much of the time…. full of energy?

5 All of the time
4 Most of the time
3 Some of the time
2 A little of the time
1 None of the time

Highest = 10 (tired none of the time, full of energy all of the time)
Lowest = 2 (tired all of the time, full of energy none of the time)


Step 2: Average the Two Items

How much of the time .... tired?

1 - All of the time
2 - Most of the time
3 - Some of the time
4 - A little of the time
5 - None of the time

How much of the time…. full of energy?

5 All of the time
4 Most of the time
3 Some of the time
2 A little of the time
1 None of the time

Highest = 5.0 (tired none of the time, full of energy all of the time)
Lowest = 1.0 (tired all of the time, full of energy none of the time)


Summed or Averaged: Increase Number of Levels from 5 to 9

Summed   Averaged
2        1.0
3        1.5
4        2.0
5        2.5
6        3.0
7        3.5
8        4.0
9        4.5
10       5.0
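The two scoring steps for the 2-item example (reverse one item, then sum or average) can be sketched in Python as follows. The function names are hypothetical; the response codes are the 1-5 codes shown on the slides.

```python
def reverse_score(response, n_choices=5):
    """Flip a 1..n_choices response so 1 becomes n_choices, and so on."""
    return n_choices + 1 - response


def energy_scale_scores(tired, full_of_energy):
    """Return (summed, averaged) scores where higher = more energy.

    "tired" already runs from 1 (all of the time) to 5 (none of the
    time), so only "full of energy" needs to be reversed.
    """
    items = [tired, reverse_score(full_of_energy)]
    return sum(items), sum(items) / len(items)


print(energy_scale_scores(tired=5, full_of_energy=1))  # (10, 5.0)
print(energy_scale_scores(tired=1, full_of_energy=5))  # (2, 1.0)

# Enumerate every response pair to count the distinct scale levels.
summed_levels = sorted({energy_scale_scores(t, e)[0]
                        for t in range(1, 6) for e in range(1, 6)})
print(summed_levels)       # [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(len(summed_levels))  # 9
```

Summing and averaging carry the same information; averaging simply keeps the score on the original 1-5 metric.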


Summated Scales: Scaling Analyses

To create a summated scale, one needs to first test whether a set of items that appear to measure the same concept can be combined
– Need to test the hypothesis that the items do indeed belong together to form a single concept

Five criteria need to be met to combine items into a summated scale


Five Criteria to Meet to Qualify as a Summated Scale

Item convergence
Item discrimination
No unhypothesized dimensions
Items contribute similar proportion of information to score
Items have equal variances


First Criterion: Item Convergence

Each item correlates substantially with the total score of all items
– with the item taken out or “corrected for overlap”
Typical criterion is >= .30
– for well-developed scales, often set at >= .40
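An item-scale correlation "corrected for overlap" correlates each item with the sum of the other items only. The Python sketch below shows one way to compute it; the function names are hypothetical and the five respondents are made-up data for illustration, not results from any study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows):
    """rows holds one list of item scores per respondent.

    For each item, correlate it with the sum of the remaining items,
    so the item does not inflate its own correlation.
    """
    n_items = len(rows[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical responses: 5 respondents x 3 items (illustration only).
rows = [[1, 2, 3], [2, 2, 1], [3, 4, 2], [4, 4, 5], [5, 5, 1]]

for number, r in enumerate(corrected_item_total(rows), start=1):
    verdict = "meets >= .30" if r >= 0.30 else "below .30"
    print(f"item {number}: r = {r:.2f} ({verdict})")
```

In this made-up data the third item falls below the .30 criterion, which is the kind of item that would be examined further.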


Example: Analyzing Convergent Validity for Adaptive Coping Scale

Item-scale correlations

Adaptive coping (alpha = .70)

Item                                    Item-scale correlation
5  Get emotional support from others    .49
11 See it in a different light          .62
18 Accept the reality of it             .25
20 Find comfort in religion             .58
13 Get comfort from someone             .45
21 Learn to live with it                .21
23 Pray or meditate                     .39

Moody-Ayers SY et al. Prevalence and correlates of perceived societal racism in older African American adults with type 2 diabetes mellitus. J Amer Geriatr Soc, 2005, in press.


Example: Analyzing Convergent Validity for Adaptive Coping Scale

Item-scale correlations

Adaptive coping (alpha = .70)

Item                                    Item-scale correlation
5  Get emotional support from others    .49
11 See it in a different light          .62
18 Accept the reality of it             .25   <.30
20 Find comfort in religion             .58
13 Get comfort from someone             .45
21 Learn to live with it                .21   <.30
23 Pray or meditate                     .39


Example: Analyzing Convergent Validity for Adaptive Coping Scale

Item-scale correlations

Adaptive coping (alpha = .76)

5  Get emotional support from others    .45
11 See it in a different light          .59
20 Find comfort in religion             .73
13 Get comfort from someone             .45
23 Pray or meditate                     .51

Acceptance (alpha = .67)

21 Learn to live with it                .50
18 Accept the reality of it             .50


SAS/SPSS Make Item Convergence Analysis Easy

Reliability programs provide this
– Item-scale correlations corrected for overlap
– Internal consistency reliability (coefficient alpha)
– Reliability with each item removed
» To see effect of removing a bad item
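For readers without SAS or SPSS, coefficient alpha and "alpha with each item removed" can be sketched directly in Python. This is an illustration using the usual variance form of coefficient alpha, not the output of either package; the function names are hypothetical and the data are made up.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_removed(rows):
    """Alpha recomputed with each item dropped in turn (needs 3+ items)."""
    k = len(rows[0])
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in rows])
            for j in range(k)]


# Hypothetical responses: 5 respondents x 3 items (illustration only).
rows = [[1, 2, 3], [2, 2, 1], [3, 4, 2], [4, 4, 5], [5, 5, 1]]

print(f"alpha, all items: {cronbach_alpha(rows):.2f}")
for number, alpha in enumerate(alpha_if_item_removed(rows), start=1):
    print(f"alpha without item {number}: {alpha:.2f}")
```

When alpha rises after an item is dropped, as it does for the third item in this made-up data, that item is a candidate "bad item".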


Second Criterion: Item Discrimination

Each item correlates significantly higher with the construct it is hypothesized to measure than with other constructs
– Item discrimination
Statistical significance is determined by standard error of the correlation
– Determined by sample size


Multitrait Scaling - An Approach to Constructing Multi-item Scales

Confirms whether hypothesized item groupings can be summed into a scale score

Examines extent to which all five criteria are met

Examines resulting scales


Example: Two Subscales Being Developed

Depression and Anxiety subscales of MOS Psychological Distress measure


Example of Multitrait Scaling Matrix: Hypothesized Scales

Item                  ANXIETY   DEPRESSION

ANXIETY
Nervous person        .80       .65
Tense, high strung    .83       .70
Anxious, worried      .78       .78
Restless, fidgety     .76       .68

DEPRESSION
Low spirits           .75       .89
Downhearted           .74       .88
Depressed             .76       .90
Moody                 .77       .82


Example of Multitrait Scaling Matrix: Item Convergence

Item                  ANXIETY   DEPRESSION

ANXIETY
Nervous person        .80*      .65
Tense, high strung    .83*      .70
Anxious, worried      .78*      .78
Restless, fidgety     .76*      .68

DEPRESSION
Low spirits           .75       .89*
Downhearted           .74       .88*
Depressed             .76       .90*
Moody                 .77       .82*



Example of Multitrait Scaling Matrix: Item Discrimination

Item                  ANXIETY   DEPRESSION

ANXIETY
Nervous person        .80*      .65
Tense, high strung    .83*      .70
Anxious, worried      .78*      .78
Restless, fidgety     .76*      .68

DEPRESSION
Low spirits           .75       .89*
Downhearted           .74       .88*
Depressed             .76       .90*
Moody                 .77       .82*
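The item discrimination check on this matrix can be sketched in Python. Two things here are assumptions rather than content from the slides: the sample size (the slides do not give one for this example, so N_RESPONDENTS is hypothetical) and the decision rule, a rough rule of thumb that treats a difference of at least two standard errors as significant, with the standard error of a correlation approximated as 1/sqrt(N).

```python
from math import sqrt

# (item, hypothesized scale, r with ANXIETY, r with DEPRESSION), from the slide
MATRIX = [
    ("Nervous person", "ANXIETY", 0.80, 0.65),
    ("Tense, high strung", "ANXIETY", 0.83, 0.70),
    ("Anxious, worried", "ANXIETY", 0.78, 0.78),
    ("Restless, fidgety", "ANXIETY", 0.76, 0.68),
    ("Low spirits", "DEPRESSION", 0.75, 0.89),
    ("Downhearted", "DEPRESSION", 0.74, 0.88),
    ("Depressed", "DEPRESSION", 0.76, 0.90),
    ("Moody", "DEPRESSION", 0.77, 0.82),
]

N_RESPONDENTS = 1000  # hypothetical: not given on the slides


def discrimination_failures(matrix, n):
    """Items whose own-scale correlation is not at least 2 SE above the other."""
    standard_error = 1 / sqrt(n)  # rough approximation (assumption)
    failures = []
    for item, scale, r_anxiety, r_depression in matrix:
        if scale == "ANXIETY":
            own, other = r_anxiety, r_depression
        else:
            own, other = r_depression, r_anxiety
        if own - other < 2 * standard_error:
            failures.append(item)
    return failures


print(discrimination_failures(MATRIX, N_RESPONDENTS))
# ['Anxious, worried', 'Moody']
```

"Anxious, worried" fails at any sample size because it correlates equally with both scales; whether "Moody" fails depends on the assumed N, which is why the standard error matters.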


Preference Based or Utility Measures

Utilities are numeric measurements that reflect the desirability people associate with a health state or condition
– Value of that health state
– Preference for that health state (rather than another)


Methods for Assigning Values?

Four steps:
– Identify the population of judges who will assign “preferences”
– Sample and describe health states to be assigned utilities
– Select a preference measurement method
– Collect preference judgments, analyze the data, and assign weights to the health states


Preference Based or Utility Measures (cont.)

Advantages
– Combine complex health states into a single number
Score reflects the value or preference for the overall health state
Need two absolute reference points
– 0 represents death
– 1 represents perfect health
Methods for obtaining value weights
– Time tradeoff, standard gamble, rating scales
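As a rough illustration of how two of these methods turn a judgment into a number between 0 and 1, the Python sketch below uses the conventional textbook formulations of time tradeoff and standard gamble. These formulas are not given in the slides, and the function names and example values are hypothetical.

```python
def time_tradeoff_utility(years_full_health, years_in_state):
    """Time tradeoff: the judge is indifferent between living
    years_in_state in the health state and a shorter
    years_full_health in perfect health. Utility is the ratio.
    """
    if years_in_state <= 0 or not 0 <= years_full_health <= years_in_state:
        raise ValueError("need 0 <= years_full_health <= years_in_state")
    return years_full_health / years_in_state


def standard_gamble_utility(p_perfect_health):
    """Standard gamble: the judge is indifferent between the health
    state for certain and a gamble giving perfect health with
    probability p (death otherwise). Utility is p.
    """
    if not 0 <= p_perfect_health <= 1:
        raise ValueError("probability must be between 0 and 1")
    return p_perfect_health


# Hypothetical judgments for one health state.
print(time_tradeoff_utility(years_full_health=7, years_in_state=10))  # 0.7
print(standard_gamble_utility(0.8))                                   # 0.8
```

Both results sit on the scale anchored by the two reference points above: 0 for death and 1 for perfect health.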


Readings on Utility Measurement

A huge literature
Some readings available on request


Overview

Types of measurement scales
Rationale for multi-item measures
Scale construction methods
Error concepts


Concepts of Error

How to depict error
Distinction between random error and systematic error


Components of an Individual’s Observed Item Score

(NOTE: Simplistic view)

Observed item score = true score + error


Components of Variability in Item Scores of a Group of Individuals

Observed score variance = true score variance + error variance

Observed score variance is the total variance (the variation across all observed item scores)


Combining Items into Multi-Item Scales

When items are combined into a scale score, error cancels out to some extent
– Error variance is reduced as more items are combined
– As you reduce random error, the proportion of “true score” variance increases
– Multi-item scale is thus more reliable than any single item
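The claim that random error cancels as items are combined can be illustrated with a small simulation. The Python sketch below is a toy model under stated assumptions, not an analysis from this class: every item equals the person's true score plus independent random error, true scores and errors both have a standard deviation of 1, and the function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulated_reliability(n_items, n_people=2000, seed=0):
    """Share of scale-score variance that is true-score variance.

    Each item = true score + independent random error; the scale
    score is the average of n_items such items.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, 1) for _ in range(n_people)]
    scale_scores = [
        sum(true + rng.gauss(0, 1) for _ in range(n_items)) / n_items
        for true in true_scores
    ]
    r = pearson(true_scores, scale_scores)
    return r * r


for n_items in (1, 2, 4, 8):
    print(n_items, "items:", round(simulated_reliability(n_items), 2))
```

Under these assumptions the expected value is k / (k + 1) for k items, so the printed figures should climb from about .50 for a single item toward about .89 for eight items.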


Sources of Error

Subjects
Observers or interviewers
Measure or instrument


Measuring Weight in Pounds of Children: Weight without shoes

An individual's observed score is a linear combination of many sources of variation


Measuring Weight in Pounds of Children: Weight without shoes

Observed weight = True weight + Amount of water past 30 min + Weight of clothes + Scale is miscalibrated + Person weighing children is not very precise


Measuring Weight in Pounds of Children: Weight without shoes

Observed weight (83 lbs) = True weight (80 lbs) + Amount of water past 30 min (+.25 lb) + Weight of clothes (+.75 lb) + Scale is miscalibrated (+1 lb) + Person weighing children is not very precise (+1 lb)

83 = 80 + .25 + .75 + 1 + 1


Sources of Error

Weight of clothes
– Subject source of error
Person weighing child is not precise
– Observer source of error
Scale is miscalibrated
– Instrument source of error


Measuring Depressive Symptoms in Asian and Latino Men

Observed depression score = “True” depression + Low awareness of negative affect + Unwillingness to tell interviewer + Depression measure not culturally sensitive


Measuring Depressive Symptoms in Asian and Latino Men

Observed depression score (13) = “True” depression (16) + Hard to choose one number on the 1-6 response choices (+2) + Unwillingness to tell interviewer (-3) + Measure not culturally sensitive (-2)

13 = 16 + 2 - 3 - 2


Return to Components of an Individual’s Observed Item Score

Observed item score = true score + error


Components of an Individual’s Observed Item Score

Observed item score = true score + error

Error = random error + systematic error


Sources of Error in Measuring Weight

Weight of clothes
– Subject source of random error
Scale is miscalibrated
– Instrument source of systematic error
Person weighing child is not precise
– Observer source of random error
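The practical difference between the two kinds of error is that random error averages out over repeated measurements while systematic error does not. The Python sketch below illustrates this with the weight example; the 80 lb true weight and +1 lb miscalibration come from the slides, while the size of the random error (a standard deviation of 1 lb, standing in for clothes and observer imprecision) and the function name are assumptions.

```python
import random
from statistics import mean

TRUE_WEIGHT = 80.0   # lbs, from the slides
SCALE_BIAS = 1.0     # lbs, miscalibrated scale: systematic error
RANDOM_ERROR_SD = 1.0  # lbs, hypothetical spread of the random error


def weigh_once(rng):
    """One observed weight = true weight + systematic + random error."""
    random_error = rng.gauss(0, RANDOM_ERROR_SD)
    return TRUE_WEIGHT + SCALE_BIAS + random_error


rng = random.Random(0)
readings = [weigh_once(rng) for _ in range(10_000)]

print(f"single reading:   {readings[0]:.2f}")
print(f"mean of readings: {mean(readings):.2f}")
print(f"remaining bias:   {mean(readings) - TRUE_WEIGHT:.2f}")
```

The mean of many readings settles near 81 lbs rather than 80 lbs: repetition removes the random component, but the miscalibration stays in every reading until the scale itself is fixed.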


Sources of Error in Measuring Depression

Hard to choose one number on 1-6 response scale
– Subject source of random error
Unwillingness to tell interviewer
– Subject source of systematic error (underreporting true depression)
Instrument is not culturally sensitive (missing some components)
– Instrument source of systematic error


Next Week – Week 4

Variability
Reliability
Interpretability


Homework for Week 4

Complete rows 1-12 on the matrix for each measure you want to review
– Handout
– On the web site for this class
Download matrix and fill it in