Reliability REVIEW • Inferential • Infer sample findings to entire population • Chi Square (2 nominal variables) • t-test (1 nominal variable for 2 groups, 1 continuous) • ANOVA (1 nominal variable for 3 + groups, 1 continuous)


Page 1

Reliability REVIEW

• Inferential
  • Infer sample findings to entire population

• Chi Square (2 nominal variables)

• t-test (1 nominal variable for 2 groups, 1 continuous)

• ANOVA (1 nominal variable for 3+ groups, 1 continuous)

Page 2

• Variance

• Standard deviation

• Correlation
  • Are two variables related?
  • What happens to Y when X changes?
  • Linear relationship between two variables
  • Quantifies the RELIABILITY & VALIDITY of a test or measurement

Variance: $S^2 = \frac{\sum (X - M)^2}{n - 1}$

Standard deviation: $S = \sqrt{S^2}$
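The variance, standard deviation, and correlation reviewed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the two lists of scores are hypothetical, and the correlation shown is the ordinary Pearson product-moment coefficient.

```python
import math


def sample_variance(scores):
    """S^2 = sum((X - M)^2) / (n - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


def sample_sd(scores):
    """S = sqrt(S^2)."""
    return math.sqrt(sample_variance(scores))


def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship between X and Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same five people on two trials.
trial_1 = [2, 4, 4, 6, 9]
trial_2 = [1, 3, 5, 6, 10]

print(sample_variance(trial_1))                # 7.0
print(round(sample_sd(trial_1), 3))            # 2.646
print(round(pearson_r(trial_1, trial_2), 3))   # 0.975
```

A high r between two trials of the same test, as in this made-up data, is the kind of evidence used for reliability; a high r between a test and a criterion is the kind used for validity.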

Page 3

• Reliability (0-1; .80+ goal)
  • All scores: observed = true + error
  • $r_{xx'} = S_t^2 / S_o^2$
  • Proportion of observed score variance that is true score variance

• Interclass reliability coefficients (correlate 2 trials)
  • Test/retest: time, fatigue, practice effect
  • Equivalent
  • Split-halves: reduces test length by 50%

• Index of Reliability: $\sqrt{r_{xx'}}$
  • Tells you what?
  • Related to C of D how?
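As a rough numerical sketch of the ratio above, the snippet below computes a reliability coefficient from variance components and then takes its square root to get the index of reliability. The variance values are hypothetical, the function names are illustrative, and adding the true and error variances to get the observed variance assumes that error is uncorrelated with the true score.

```python
import math


def reliability(true_variance, observed_variance):
    """r_xx' = S_t^2 / S_o^2: share of observed variance that is true variance."""
    return true_variance / observed_variance


def index_of_reliability(r_xx):
    """Square root of the reliability coefficient."""
    return math.sqrt(r_xx)


# Hypothetical variance components (observed = true + error).
true_var = 84.0
error_var = 16.0
observed_var = true_var + error_var

r_xx = reliability(true_var, observed_var)
print(r_xx)                                   # 0.84
print(round(index_of_reliability(r_xx), 3))   # 0.917
```

With these made-up numbers the test clears the .80 goal, and the remaining 16% of observed variance is attributed to error.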

Page 4

Standard Error of Measurement
RELIABILITY MEASURE

Reflects the degree to which a person's observed score fluctuates as a result of measurement errors

$SEM = S\sqrt{1 - r_{xx'}}$

S = standard deviation of the test

$r_{xx'}$ = reliability of the test

Page 5

EXAMPLE: Test standard deviation = 100, r = .84

$SEM = 100\sqrt{1 - .84} = 100\sqrt{.16} = 100(.4) = 40$
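The worked example can be checked with a short Python sketch. The function name is illustrative only; the inputs are the standard deviation and reliability from the example.

```python
import math


def standard_error_of_measurement(test_sd, reliability):
    """SEM = S * sqrt(1 - r_xx')."""
    return test_sd * math.sqrt(1.0 - reliability)


sem = standard_error_of_measurement(100, 0.84)
print(round(sem, 2))  # 40.0

# A perfectly reliable test (r = 1.0) has no measurement error.
print(standard_error_of_measurement(100, 1.0))  # 0.0
```

The second call shows the limiting case: as reliability approaches 1, the SEM shrinks toward 0, and as reliability approaches 0, the SEM approaches the test's full standard deviation.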

Page 6

SEM is the standard deviation of the measurement errors around an observed score

EXAMPLE: Test score = 500, SEM = 40

68% of all scores should fall between 460-540 (500 ± 40)

95% of all scores range between? 420-580 (500 ± 2 × 40)
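The two bands in this example come from adding and subtracting one or two SEMs around the observed score. A minimal sketch, with a hypothetical helper name, using the slide's rounding of the 95% band to 2 SEMs (the normal-curve multiplier is closer to 1.96):

```python
def score_band(observed_score, sem, n_sems=1):
    """Return (low, high) = observed score -/+ n_sems * SEM."""
    margin = n_sems * sem
    return observed_score - margin, observed_score + margin


print(score_band(500, 40, n_sems=1))  # (460, 540)  about 68%
print(score_band(500, 40, n_sems=2))  # (420, 580)  about 95%
```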

Page 7

Factors Affecting Test Reliability

1) Fatigue ↓
2) Practice ↑
3) Subject variability: homogeneous ↓, heterogeneous ↑
4) Time between testing: more time = ↓
5) Circumstances surrounding the testing periods: change = ↓
6) Test difficulty: too hard/easy = ↓
7) Precision of measurement: precise = ↑
8) Environmental conditions: change = ↓

SO WHAT? A test must first be reliable to be valid

Page 8

Validity Types
THIS SLIDE IS HUGE!!!!

• Content-Related Validity (a.k.a. face validity)
  • Should represent knowledge to be learned
  • Criterion for content validity rests w/ interpreter
  • Use “experts” to establish

• Criterion-Related Validity
  • Test has a statistical relationship w/ trait measured
  • Alternative measures validated w/ criterion measure
  • Concurrent: criterion/alternate measured same time
  • Predictive: criterion measured in future

• Construct-Related Validity
  • Validates theoretical measures that are unobservable

Page 9

Standard Error of Estimate
(reflects accuracy of estimating a score on the criterion measure)

VALIDITY MEASURE

Also called: Standard Error; Standard Error of Prediction

$SEE = S\sqrt{1 - r_{xy}^2}$
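A small Python sketch of the standard error of estimate follows. The slide gives no worked numbers, so the standard deviation of 10 and validity coefficient of .60 are hypothetical, as is the function name; S is taken here to be the standard deviation of the criterion scores, which is the usual reading of the formula.

```python
import math


def standard_error_of_estimate(criterion_sd, validity_r):
    """SEE = S * sqrt(1 - r_xy^2)."""
    return criterion_sd * math.sqrt(1.0 - validity_r ** 2)


# Hypothetical values: criterion SD = 10, validity coefficient r_xy = .60.
see = standard_error_of_estimate(10, 0.60)
print(round(see, 2))  # 8.0
```

Because the coefficient is squared, the sign of the correlation does not matter: r = -.60 gives the same SEE as r = .60.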

Page 10

Standard Errors

SE of Measurement: $SEM = S\sqrt{1 - r_{xx'}}$

SE of Estimate: $SEE = S\sqrt{1 - r_{xy}^2}$

Page 11

Methods of Obtaining a Criterion Measure

• Actual participation
  • Play the game over multiple trials

• Perform the criterion
  • Known valid criterion (e.g., treadmill performance)

• Expert judges

• Tournament participation
  • Round robin (to identify best player/team)

• Known valid test (may be too long/time consuming)

Page 12

Interpreting the “r” you obtain
THIS IS HUGE!!!!

Page 13

Table 6-8. Correlation Matrix for Development of a Golf Skill Test
(From Green et al., 1987)

                      Playing golf  Long putt  Chip shot  Pitch shot  Middle distance shot  Drive
Playing golf          1.00
Long putt             .59           1.00
Chip shot             .58           .47        1.00
Pitch shot            .54           .37        .35        1.00
Middle distance shot  .66           .55        .61        .40         1.00
Drive                 -.65          -.62       -.48       -.52        -.79                  1.00

What are these?

Concurrent validity coefficients
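One way to interpret the validity coefficients in Table 6-8 is to square each one, which gives the coefficient of determination (the proportion of variance the test shares with the criterion). The sketch below does this for the Playing golf column; the variable and function names are illustrative, and the values are copied from the table.

```python
# Correlation of each skill test with the criterion (Playing golf), Table 6-8.
validity_r = {
    "Long putt": 0.59,
    "Chip shot": 0.58,
    "Pitch shot": 0.54,
    "Middle distance shot": 0.66,
    "Drive": -0.65,
}


def coefficient_of_determination(r):
    """r^2: proportion of variance shared by the two measures."""
    return r ** 2


shared_variance = {
    test: round(coefficient_of_determination(r), 4)
    for test, r in validity_r.items()
}

# The strongest single predictor is the one with the largest |r|.
strongest = max(validity_r, key=lambda test: abs(validity_r[test]))

for test, r2 in shared_variance.items():
    print(f"{test}: r = {validity_r[test]:+.2f}, r^2 = {r2:.2f}")
print("Strongest single predictor:", strongest)
```

Squaring removes the sign, so the Drive test (r = -.65) shares about as much variance with the criterion as the Middle distance shot test (r = .66).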

Page 14

Interpret these correlations

                   Actual golf score  Putting T1  Putting T2  Driving T1  Driving T2  Observer 1  Observer 2
Actual golf score  1.00
Putting T1         .78                1.00
Putting T2         .74                .83         1.00
Driving T1         .58                .21         .25         1.00
Driving T2         .68                .25         .30         .70         1.00
Observer 1         .48                .34         .40         .43         .38         1.00
Observer 2         .39                .30         .41         .47         .35         .50         1.00

Criterion: Actual golf score

What are these?

Concurrent validity coefficients (the correlations of each measure with the criterion, in the Actual golf score column)

Page 15

Interpret these correlations

                   Actual golf score  Putting T1  Putting T2  Driving T1  Driving T2  Observer 1  Observer 2
Actual golf score  1.00
Putting T1         .78                1.00
Putting T2         .74                .83         1.00
Driving T1         .58                .21         .25         1.00
Driving T2         .68                .25         .30         .70         1.00
Observer 1         .48                .34         .40         .43         .38         1.00
Observer 2         .39                .30         .41         .47         .35         .50         1.00

What are these?

Reliability coefficients (Putting T1 with Putting T2 = .83; Driving T1 with Driving T2 = .70)

Page 16

Interpret these correlations

                   Actual golf score  Putting T1  Putting T2  Driving T1  Driving T2  Observer 1  Observer 2
Actual golf score  1.00
Putting T1         .78                1.00
Putting T2         .74                .83         1.00
Driving T1         .58                .21         .25         1.00
Driving T2         .68                .25         .30         .70         1.00
Observer 1         .48                .34         .40         .43         .38         1.00
Observer 2         .39                .30         .41         .47         .35         .50         1.00

What is this?

Objectivity coefficient (Observer 1 with Observer 2 = .50)
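The three readings of this matrix (validity, reliability, objectivity) differ only in which pair of measures is being correlated. The sketch below stores the lower triangle of the matrix and looks up each kind of coefficient; the names `LABELS`, `LOWER`, and `r` are illustrative, and the grouping into validity, reliability, and objectivity follows the answers given on the slides.

```python
LABELS = [
    "Actual golf score", "Putting T1", "Putting T2",
    "Driving T1", "Driving T2", "Observer 1", "Observer 2",
]

# Lower triangle of the correlation matrix, one row per label.
LOWER = [
    [1.00],
    [0.78, 1.00],
    [0.74, 0.83, 1.00],
    [0.58, 0.21, 0.25, 1.00],
    [0.68, 0.25, 0.30, 0.70, 1.00],
    [0.48, 0.34, 0.40, 0.43, 0.38, 1.00],
    [0.39, 0.30, 0.41, 0.47, 0.35, 0.50, 1.00],
]


def r(a, b):
    """Look up the correlation between two measures (order does not matter)."""
    i, j = LABELS.index(a), LABELS.index(b)
    if i < j:
        i, j = j, i
    return LOWER[i][j]


# Validity: each measure correlated with the criterion.
validity = {name: r(name, "Actual golf score") for name in LABELS[1:]}

# Reliability: the same test correlated across two trials.
reliability = {
    "Putting": r("Putting T1", "Putting T2"),
    "Driving": r("Driving T1", "Driving T2"),
}

# Objectivity: two observers correlated with each other.
objectivity = r("Observer 1", "Observer 2")

print(validity)
print(reliability)   # {'Putting': 0.83, 'Driving': 0.7}
print(objectivity)   # 0.5
```

Read this way, only the putting test meets the .80+ reliability goal stated earlier, and agreement between the two observers (.50) is well below it.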

Page 17

Concurrent Validity

This square represents variance in performance in a skill (e.g., golf)

Page 18

Concurrent Validity

The different colors and patterns represent different parts of a skills test battery to measure the criterion (e.g., golf)

Page 19

Concurrent Validity

The orange color represents ERROR or unexplained variance in the criterion (e.g., golf)

Page 20

Concurrent Validity

[Figure: four possible skills test batteries, labeled A, B, C, and D]

Consider the concurrent validity of the above 4 possible skills test batteries

Page 21

Concurrent Validity

[Figure: four possible skills test batteries, labeled A, B, C, and D]

Which test battery would you be LEAST likely to use? Why?

D: it has the MOST error and requires 4 tests to be administered

Page 22

Concurrent Validity

[Figure: four possible skills test batteries, labeled A, B, C, and D]

Which test battery would you be MOST likely to use? Why?

C: it has the LEAST error, but it requires 3 tests to be administered

Page 23

Concurrent Validity

[Figure: four possible skills test batteries, labeled A, B, C, and D]

Which test battery would you use if you are limited in time?

A or B: requires 1 or 2 tests to be administered, but you lose some validity