drusilla-goodman
Reliability
REVIEW
• Inferential
  • Infer sample findings to the entire population
• Chi-square (2 nominal variables)
• t-test (1 nominal variable with 2 groups, 1 continuous variable)
• ANOVA (1 nominal variable with 3+ groups, 1 continuous variable)
• Variance
• Standard deviation
• Correlation
  • Are two variables related?
  • What happens to Y when X changes?
  • Linear relationship between two variables
  • Quantifies the RELIABILITY & VALIDITY of a test or measurement
Variance: $S^2 = \frac{\sum (X - M)^2}{n - 1}$
Standard deviation: $S = \sqrt{S^2}$
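The variance and standard deviation can be computed directly from raw scores. A minimal Python sketch, assuming the sample form with n − 1 in the denominator; the scores are invented for illustration.

```python
import math

def variance(scores):
    """Sample variance: sum of squared deviations from the mean M, divided by n - 1."""
    n = len(scores)
    m = sum(scores) / n                      # M, the mean
    ss = sum((x - m) ** 2 for x in scores)   # sum of squared deviations
    return ss / (n - 1)

def std_dev(scores):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(scores))

# Hypothetical scores, for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(scores), std_dev(scores))
```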
• Reliability (0-1; .80+ goal)
  • All scores: observed = true + error
  • $r_{xx} = S_t^2 / S_o^2$: the proportion of observed-score variance that is true-score variance
• Interclass reliability coefficients (correlate 2 trials)
  • Test/retest: time, fatigue, practice effect
  • Equivalent forms
  • Split-halves: reduces test length by 50%
• Index of Reliability: $\sqrt{r_{xx'}}$
  • Tells you what?
  • Related to C of D (coefficient of determination) how?
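A short Python sketch of an interclass (test/retest) coefficient and the index of reliability. The trial scores are hypothetical; computing the coefficient as a Pearson r follows the "correlate 2 trials" bullet, and reading the index as √r_xx' (the estimated correlation between observed and true scores) follows the usual classical-test-theory convention rather than anything stated on the slide.

```python
import math

def pearson_r(x, y):
    """Interclass (Pearson) correlation between two lists of paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical test/retest data: the same 5 people measured on two days
trial_1 = [10, 12, 15, 18, 20]
trial_2 = [11, 13, 14, 19, 21]

r_xx = pearson_r(trial_1, trial_2)  # reliability coefficient
meets_goal = r_xx >= 0.80           # the .80+ goal from the slide
index = math.sqrt(r_xx)             # index of reliability
print(f"r_xx = {r_xx:.2f}, meets .80 goal: {meets_goal}")
print(f"index of reliability = {index:.2f}; squared = {index ** 2:.2f}")
```

Squaring the index returns r_xx', the proportion of observed-score variance that is true-score variance; that is the sense in which it parallels the coefficient of determination (r²).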
Standard Error of Measurement
RELIABILITY MEASURE
Reflects the degree to which a person's observed score fluctuates as a result of measurement errors
$SEM = S\sqrt{1 - r_{xx'}}$
S = standard deviation of the test
r_xx' = reliability of the test
EXAMPLE: Test standard deviation = 100, r = .84
$SEM = 100\sqrt{1 - .84} = 100\sqrt{.16} = 100(.4) = 40$
SEM is the standard deviation of the measurement errors around an observed score
EXAMPLE: Test score = 500, SEM = 40
About 68% of this person's scores on repeated testing should fall between 460 and 540 (500 ± 1 SEM)
About 95% should fall between: ? 420 and 580 (500 ± 2 SEM, rounding the exact 1.96 to 2)
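A minimal sketch of the SEM arithmetic and the score bands in the example. `sem` and `score_band` are hypothetical helper names, and ±2 SEM stands in for the exact 1.96 for the 95% band.

```python
import math

def sem(sd, r_xx):
    """Standard error of measurement: S * sqrt(1 - r_xx')."""
    return sd * math.sqrt(1 - r_xx)

def score_band(score, sem_value, n_sems):
    """Range of +/- n_sems standard errors of measurement around an observed score."""
    return score - n_sems * sem_value, score + n_sems * sem_value

s = sem(100, 0.84)               # test SD = 100, reliability = .84
band_68 = score_band(500, s, 1)  # about 68% band
band_95 = score_band(500, s, 2)  # about 95% band (2 approximates 1.96)
print(round(s), [round(v) for v in band_68], [round(v) for v in band_95])
# 40 [460, 540] [420, 580]
```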
Factors Affecting Test Reliability
1) Fatigue: ↓
2) Practice: ↑
3) Subject variability: homogeneous ↓, heterogeneous ↑
4) Time between testing: more time = ↓
5) Circumstances surrounding the testing periods: change = ↓
6) Test difficulty: too hard/easy = ↓
7) Precision of measurement: precise = ↑
8) Environmental conditions: change = ↓
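Factor 3 follows from r_xx = S²_t / S²_o with S²_o = S²_t + S²_e (observed = true + error). A minimal sketch, assuming both groups have the same error variance; the variance values are hypothetical.

```python
def reliability(true_var, error_var):
    """r_xx = true-score variance / observed-score variance, where observed = true + error."""
    return true_var / (true_var + error_var)

error_var = 25.0  # assumed: same measurement error in both groups
homogeneous = reliability(true_var=25.0, error_var=error_var)     # little spread among people
heterogeneous = reliability(true_var=100.0, error_var=error_var)  # wide spread among people
print(homogeneous, heterogeneous)  # 0.5 0.8
```

The same function shows factor 7: holding true variance fixed, a more precise measure (smaller `error_var`) raises r_xx.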
SO WHAT? A test must first be reliable to be valid
Validity Types THIS SLIDE IS HUGE!!!!
• Content-Related Validity (a.k.a. face validity)
  • Should represent the knowledge to be learned
  • Criterion for content validity rests with the interpreter
  • Use “experts” to establish
• Criterion-Related Validity
  • Test has a statistical relationship with the trait measured
  • Alternative measures are validated against a criterion measure
  • Concurrent: criterion and alternate measure taken at the same time
  • Predictive: criterion measured in the future
• Construct-Related Validity
  • Validates theoretical measures that are unobservable
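Standard Error of Estimate (reflects accuracy of estimating a score on the criterion measure)
VALIDITY MEASURE
Also called the Standard Error of Prediction
$SEE = S\sqrt{1 - r_{xy}^2}$, where S = standard deviation of the criterion measure and r_xy = validity coefficient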
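A minimal sketch of the SEE formula. The criterion SD of 100 and the coefficients are hypothetical; the second line reuses .84 to show how squaring the validity coefficient makes SEE larger than an SEM computed from the same number.

```python
import math

def see(sd_criterion, r_xy):
    """Standard error of estimate: S * sqrt(1 - r_xy^2), S = SD of the criterion."""
    return sd_criterion * math.sqrt(1 - r_xy ** 2)

print(round(see(100, 0.80), 1))  # 60.0
print(round(see(100, 0.84), 1))  # 54.3 (an SEM with reliability .84 would be 40)
```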
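Standard Errors
SE of Measurement: $SEM = S\sqrt{1 - r_{xx'}}$
SE of Estimate: $SEE = S\sqrt{1 - r_{xy}^2}$
SEM uses the reliability coefficient as is; SEE squares the validity coefficient.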
Methods of Obtaining a Criterion Measure
• Actual participation
  • Play the game over multiple trials
• Perform the criterion
  • Known valid criterion (e.g., treadmill performance)
• Expert judges
• Tournament participation
  • Round robin (to identify the best player/team)
• Known valid test (may be too long/time-consuming)
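In a round robin every player meets every other player once, and the final standings can serve as the criterion that a skills test is correlated with. A minimal sketch; the player names and match results are hypothetical, and ranking by win count with alphabetical tie-breaks is an assumption, not a rule from the slide.

```python
from itertools import combinations

def round_robin_pairings(players):
    """Every player meets every other player exactly once."""
    return list(combinations(players, 2))

def standings(players, winners):
    """Rank players by number of wins; `winners` maps each pairing to its winner."""
    wins = {p: 0 for p in players}
    for _, winner in winners.items():
        wins[winner] += 1
    # Most wins first; alphabetical order breaks ties
    return sorted(players, key=lambda p: (-wins[p], p)), wins

players = ["Ana", "Ben", "Cai", "Dee"]
pairs = round_robin_pairings(players)  # 6 matches for 4 players
results = {  # hypothetical winner of each match
    ("Ana", "Ben"): "Ana",
    ("Ana", "Cai"): "Cai",
    ("Ana", "Dee"): "Ana",
    ("Ben", "Cai"): "Cai",
    ("Ben", "Dee"): "Ben",
    ("Cai", "Dee"): "Cai",
}
order, wins = standings(players, results)
print(order, wins)
```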
Interpreting the “r” you obtain
THIS IS HUGE!!!!
Table 6-8. Correlation Matrix for Development of a Golf Skill Test
(From Green et al., 1987)
                            (1)    (2)    (3)    (4)    (5)    (6)
(1) Playing golf           1.00
(2) Long putt               .59   1.00
(3) Chip shot               .58    .47   1.00
(4) Pitch shot              .54    .37    .35   1.00
(5) Middle distance shot    .66    .55    .61    .40   1.00
(6) Drive                  -.65   -.62   -.48   -.52   -.79   1.00
What are these (the Playing golf column)?
Concurrent validity coefficients
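Squaring each validity coefficient gives the coefficient of determination, the share of variance a test shares with playing golf. A small Python sketch using the Table 6-8 values; the helper name is hypothetical. The sign of Drive does not change its strength.

```python
# Concurrent validity coefficients from Table 6-8 (each test vs. playing golf)
validity = {
    "Long putt": 0.59,
    "Chip shot": 0.58,
    "Pitch shot": 0.54,
    "Middle distance shot": 0.66,
    "Drive": -0.65,
}

def coefficients_of_determination(coeffs):
    """Square each correlation: proportion of shared variance (sign is dropped)."""
    return {name: r * r for name, r in coeffs.items()}

r2 = coefficients_of_determination(validity)
for test, value in r2.items():
    print(f"{test:22s} r = {validity[test]:+.2f}  r^2 = {value:.2f}")
```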
Interpret these correlations
T1, T2 = Trial 1, Trial 2
                         (1)    (2)    (3)    (4)    (5)    (6)    (7)
(1) Actual golf score   1.00
(2) Putting T1           .78   1.00
(3) Putting T2           .74    .83   1.00
(4) Driving T1           .58    .21    .25   1.00
(5) Driving T2           .68    .25    .30    .70   1.00
(6) Observer 1           .48    .34    .40    .43    .38   1.00
(7) Observer 2           .39    .30    .41    .47    .35    .50   1.00
What are these (the Actual golf score column)?
Concurrent validity coefficients (actual golf score is the criterion)
What are these (Putting T1 vs. T2 = .83; Driving T1 vs. T2 = .70)?
Reliability coefficients
What is this (Observer 1 vs. Observer 2 = .50)?
Objectivity coefficient
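The three answers can be read off the golf matrix mechanically. A sketch that stores the lower triangle and pulls out each type of coefficient; the helper `r` and the grouping of labels into trial and observer pairs are hand-coded assumptions based on the row labels.

```python
# Lower triangle of the golf correlation matrix, row by row
labels = ["Actual golf score", "Putting T1", "Putting T2",
          "Driving T1", "Driving T2", "Observer 1", "Observer 2"]
lower = [
    [1.00],
    [0.78, 1.00],
    [0.74, 0.83, 1.00],
    [0.58, 0.21, 0.25, 1.00],
    [0.68, 0.25, 0.30, 0.70, 1.00],
    [0.48, 0.34, 0.40, 0.43, 0.38, 1.00],
    [0.39, 0.30, 0.41, 0.47, 0.35, 0.50, 1.00],
]

def r(a, b):
    """Correlation between two labeled measures (order does not matter)."""
    i, j = labels.index(a), labels.index(b)
    if i < j:
        i, j = j, i
    return lower[i][j]

criterion = "Actual golf score"
validity = {m: r(m, criterion) for m in labels if m != criterion}  # concurrent validity
reliability = {"Putting": r("Putting T1", "Putting T2"),          # trial 1 vs. trial 2
               "Driving": r("Driving T1", "Driving T2")}
objectivity = r("Observer 1", "Observer 2")                       # rater vs. rater

print("Concurrent validity:", validity)
print("Reliability:", reliability)
print("Objectivity:", objectivity)
```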
Concurrent Validity
This square represents variance in performance in a skill (e.g., golf)
The different colors and patterns represent different parts of a skills-test battery used to measure the criterion (e.g., golf)
The orange color represents ERROR, or unexplained variance in the criterion (e.g., golf)
[Figure: four possible skills-test batteries, labeled A, B, C, and D]
Consider the concurrent validity of these four possible skills-test batteries
Which test battery would you be LEAST likely to use? Why?
D – it has the MOST error and requires 4 tests to be administered
Which test battery would you be MOST likely to use? Why?
C – it has the LEAST error, but it requires 3 tests to be administered
Which test battery would you use if you are limited in time?
A or B – requires only 1 or 2 tests to be administered, but you lose some validity
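One standard way to put a number on the colored areas is the squared multiple correlation R², the share of criterion variance a battery explains; 1 − R² is the orange error area. This sketch swaps in multiple regression, which the slides do not name, and uses Table 6-8 correlations for illustration only (the A to D diagrams have no numbers). It solves R_xx b = r_xy by Gaussian elimination, then R² = b · r_xy.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x

def battery_r_squared(r_xx, r_xy):
    """Squared multiple correlation: R^2 = b . r_xy, where R_xx b = r_xy."""
    b = solve(r_xx, r_xy)
    return sum(bi * ri for bi, ri in zip(b, r_xy))

# Table 6-8, criterion = playing golf
one_test = battery_r_squared([[1.0]], [0.66])  # middle distance shot alone
two_tests = battery_r_squared([[1.0, 0.55], [0.55, 1.0]],  # long putt + middle distance shot
                              [0.59, 0.66])
print(f"1 test:  R^2 = {one_test:.2f}, error = {1 - one_test:.2f}")
print(f"2 tests: R^2 = {two_tests:.2f}, error = {1 - two_tests:.2f}")
```

Adding long putt raises R² by well under its own r² of about .35, because the two tests overlap (r = .55); that overlap is the idea behind the intersecting areas in the diagrams.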