Reliability REVIEW Inferential Infer sample findings to entire population Chi Square (2 nominal variables) t-test (1 nominal variable for 2 groups, 1 continuous)

Reliability REVIEW

• Inferential
  • Infer sample findings to entire population
  • Chi Square (2 nominal variables)
  • t-test (1 nominal variable for 2 groups, 1 continuous)
  • ANOVA (1 nominal variable for 3+ groups, 1 continuous)

• Variance

• Standard deviation

• Correlation
  • Are two variables related?
  • What happens to Y when X changes?
  • Linear relationship between two variables
  • Quantifies the RELIABILITY & VALIDITY of a test or measurement

Variance: $S^2 = \frac{\sum (X - M)^2}{n - 1}$

Standard deviation: $S = \sqrt{S^2}$
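As a quick numerical check of the variance and standard deviation reviewed here, the following is a minimal Python sketch using the sample (n − 1) form of the variance. The score list is made up for illustration, and the function names are not from the slides.

```python
import math

def sample_variance(scores):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)

def sample_sd(scores):
    """Standard deviation: square root of the variance."""
    return math.sqrt(sample_variance(scores))

# Hypothetical test scores (illustration only, not from the slides).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(f"variance = {sample_variance(scores):.3f}")
print(f"standard deviation = {sample_sd(scores):.3f}")
```

The standard deviation is in the same units as the scores, which is why it, rather than the variance, appears in the standard error formulas.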

• Reliability (0-1; .80+ goal)
  • All scores: observed = true + error
  • $r_{xx'} = S_t^2 / S_o^2$: proportion of observed score variance that is true score variance

• Interclass reliability coefficients (correlates 2 trials)

  • Test/retest: time, fatigue, practice effect
  • Equivalent
  • Split-halves: reduces test length by 50%

• Index of Reliability: $\sqrt{r_{xx'}}$
  • Tells you what?
  • Related to C of D (coefficient of determination) how?
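The relationship between the reliability coefficient and the index of reliability can be sketched in a few lines of Python. The variances below are hypothetical values chosen to give a reliability of .84; the function names are illustrative. The index of reliability is the square root of the reliability coefficient and is usually read as the theoretical correlation between observed and true scores, so squaring it returns the proportion of observed variance that is true variance, in the same way that squaring r gives the coefficient of determination.

```python
import math

def reliability(true_variance, observed_variance):
    """Reliability: proportion of observed score variance that is true score variance."""
    return true_variance / observed_variance

def index_of_reliability(r_xx):
    """Index of reliability: square root of the reliability coefficient."""
    return math.sqrt(r_xx)

# Hypothetical variances (illustration only): true = 84, observed = 100.
r_xx = reliability(84.0, 100.0)
index = index_of_reliability(r_xx)

print(f"reliability = {r_xx:.2f}")
print(f"index of reliability = {index:.3f}")
print(f"index squared = {index ** 2:.2f}")
```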

Standard Error of Measurement

RELIABILITY MEASURE

Reflects the degree to which a person's observed score fluctuates as a result of measurement errors

$SEM = S\sqrt{1 - r_{xx'}}$

S = standard deviation of the test

$r_{xx'}$ = reliability of the test

EXAMPLE: Test standard deviation = 100, r = .84

$SEM = 100\sqrt{1 - .84} = 100\sqrt{.16} = 100(.4) = 40$

SEM is the standard deviation of the measurement errors around an observed score

EXAMPLE: Test score = 500, SEM = 40

68% of all scores should fall between 460-540 (500 ± 40)

95% of all scores range between: ? 420-580 (500 ± 2(40))
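The two worked examples (SEM from a standard deviation of 100 and r = .84, then the 68% and 95% bands around a score of 500) can be reproduced with a short Python sketch. The function names are illustrative, and the 95% band uses 2 SEM, as the slide does, rather than the more exact 1.96.

```python
import math

def sem(sd, r_xx):
    """Standard error of measurement: SD of the test times sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - r_xx)

def score_band(observed, sem_value, n_sem=1):
    """Range of observed score plus or minus n_sem standard errors."""
    return (observed - n_sem * sem_value, observed + n_sem * sem_value)

# Values from the slide examples: SD = 100, r = .84, observed score = 500.
sem_value = round(sem(100, 0.84))           # rounded to whole score points
band_68 = score_band(500, sem_value)        # plus or minus 1 SEM
band_95 = score_band(500, sem_value, 2)     # plus or minus 2 SEM (slide's approximation)

print("SEM:", sem_value)
print("68% band:", band_68)
print("95% band:", band_95)
```

A more reliable test shrinks both bands: with the same standard deviation, raising r from .84 to .96 cuts the SEM from 40 to 20.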

Factors Affecting Test Reliability

1) Fatigue: ↓
2) Practice: ↑
3) Subject variability: homogeneous ↓, heterogeneous ↑
4) Time between testing: more time = ↓
5) Circumstances surrounding the testing periods: change = ↓
6) Test difficulty: too hard/easy = ↓
7) Precision of measurement: precise = ↑
8) Environmental conditions: change = ↓

SO WHAT? A test must first be reliable to be valid

Validity Types THIS SLIDE IS HUGE!!!!

• Content-Related Validity (a.k.a., face validity)
  • Should represent knowledge to be learned
  • Criterion for content validity rests w/ interpreter
  • Use “experts” to establish

• Criterion-Related Validity
  • Test has a statistical relationship w/ trait measured
  • Alternative measures validated w/ criterion measure
  • Concurrent: criterion/alternate measured same time
  • Predictive: criterion measured in future

• Construct-Related Validity
  • Validates theoretical measures that are unobservable

Standard Error of Estimate (reflects accuracy of estimating a score on the criterion measure)

VALIDITY MEASURE

Also called: Standard Error, Standard Error of Prediction

$SEE = S\sqrt{1 - r_{xy}^2}$

Standard Errors

SE of Measurement: $SEM = S\sqrt{1 - r_{xx'}}$

SE of Estimate: $SEE = S\sqrt{1 - r_{xy}^2}$
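To see how the two standard errors differ in practice, here is a minimal Python sketch of the standard error of estimate. The numbers are hypothetical (criterion standard deviation of 10, validity coefficient of .80), and the function name is illustrative. Note that S here is the standard deviation of the criterion measure, and that the validity coefficient is squared, unlike the reliability coefficient in the SEM formula.

```python
import math

def see(sd_criterion, r_xy):
    """Standard error of estimate: SD of the criterion times sqrt(1 - r_xy squared)."""
    return sd_criterion * math.sqrt(1.0 - r_xy ** 2)

# Hypothetical values (illustration only): criterion SD = 10, validity r = .80.
estimate_error = see(10, 0.80)
print(f"SEE = {estimate_error:.2f}")

# With no relationship (r = 0) the SEE equals the criterion SD;
# with a perfect relationship (r = 1 or r = -1) it drops to 0.
print(f"SEE at r = 0: {see(10, 0.0):.2f}")
print(f"SEE at r = 1: {see(10, 1.0):.2f}")
```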

Methods of Obtaining a Criterion Measure

• Actual participation
  • Play the game over multiple trials

• Perform the criterion
  • Known valid criterion (e.g., treadmill performance)

• Expert judges

• Tournament participation
  • Round robin (to identify best player/team)

• Known valid test (may be too long/time consuming)

Interpreting the “r” you obtain THIS IS HUGE!!!!

Table 6-8. Correlation Matrix for Development of a Golf Skill Test (From Green et al., 1987)

                      Playing golf  Long putt  Chip shot  Pitch shot  Middle distance shot  Drive
Playing golf          1.00
Long putt             .59           1.00
Chip shot             .58           .47        1.00
Pitch shot            .54           .37        .35        1.00
Middle distance shot  .66           .55        .61        .40         1.00
Drive                 -.65          -.62       -.48       -.52        -.79                  1.00

What are these?

Concurrent validity coefficients

Interpret these correlations
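One common way to interpret a validity coefficient is to square it, which gives the proportion of variance the test shares with the criterion (the coefficient of determination). The Python sketch below does this for the Table 6-8 values, assuming the coefficients the slide points to are those in the first column (each skill test correlated with playing golf). The function and variable names are illustrative.

```python
# Correlations of each skill test with the criterion (playing golf), from Table 6-8.
validity = {
    "Long putt": 0.59,
    "Chip shot": 0.58,
    "Pitch shot": 0.54,
    "Middle distance shot": 0.66,
    "Drive": -0.65,
}

def shared_variance(r):
    """Coefficient of determination: r squared, the proportion of shared variance."""
    return r ** 2

# Rank by strength of relationship; the sign gives direction only, so use |r|.
ranked = sorted(validity, key=lambda name: abs(validity[name]), reverse=True)

for name in ranked:
    r = validity[name]
    print(f"{name:22s} r = {r:+.2f}   r^2 = {shared_variance(r):.2f}")
```

Even the strongest single test in the table shares less than half of its variance with playing golf, which is one argument for combining several tests into a battery.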

                   Actual golf score  Putting T1  Putting T2  Driving T1  Driving T2  Observer 1  Observer 2
Actual golf score  1.00
Putting T1         .78                1.00
Putting T2         .74                .83         1.00
Driving T1         .58                .21         .25         1.00
Driving T2         .68                .25         .30         .70         1.00
Observer 1         .48                .34         .40         .43         .38         1.00
Observer 2         .39                .30         .41         .47         .35         .50         1.00

What are these?

Concurrent validity coefficients (each measure correlated with the criterion, actual golf score)



What are these?

Reliability coefficients (Trial 1 vs. Trial 2 of the same test: putting .83, driving .70)



What is this?

Objectivity coefficient (Observer 1 vs. Observer 2: .50)
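The three kinds of coefficient in this matrix differ only in which pair of measures is correlated. The Python sketch below stores the lower triangle of the matrix and pulls out each kind; the helper name `r` and the grouping into dictionaries are illustrative, not part of the slides.

```python
labels = [
    "Actual golf score", "Putting T1", "Putting T2",
    "Driving T1", "Driving T2", "Observer 1", "Observer 2",
]

# Lower triangle of the correlation matrix, row by row, as given on the slide.
lower = [
    [1.00],
    [0.78, 1.00],
    [0.74, 0.83, 1.00],
    [0.58, 0.21, 0.25, 1.00],
    [0.68, 0.25, 0.30, 0.70, 1.00],
    [0.48, 0.34, 0.40, 0.43, 0.38, 1.00],
    [0.39, 0.30, 0.41, 0.47, 0.35, 0.50, 1.00],
]

def r(a, b):
    """Look up the correlation between two measures by name (order does not matter)."""
    i, j = labels.index(a), labels.index(b)
    if i < j:
        i, j = j, i
    return lower[i][j]

# Validity: each measure vs. the criterion (actual golf score).
validity = {name: r(name, "Actual golf score") for name in labels[1:]}

# Reliability: the same test given on two trials.
reliability = {
    "Putting": r("Putting T1", "Putting T2"),
    "Driving": r("Driving T1", "Driving T2"),
}

# Objectivity: agreement between two observers.
objectivity = r("Observer 1", "Observer 2")

print("validity:", validity)
print("reliability:", reliability)
print("objectivity:", objectivity)
```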

Concurrent Validity

This square represents variance in performance in a skill (e.g., golf)

The different colors and patterns represent different parts of a skills test battery to measure the criterion (e.g., golf)

The orange color represents ERROR or unexplained variance in the criterion (e.g., golf)

[Figure: skills test batteries A, B, C, D]

Consider the concurrent validity of the above 4 possible skills test batteries

Which test battery would you be LEAST likely to use? Why?

D – it has the MOST error and requires 4 tests to be administered

Which test battery would you be MOST likely to use? Why?

C – it has the LEAST error but it requires 3 tests to be administered

Which test battery would you use if you are limited in time?

A or B – requires 1 or 2 tests to be administered but you lose some validity
