Comparing r and b
How to Choose, Moving From One to the Other, and Sampling Distributions
Questions
• How is the raw score slope related to the correlation coefficient?
• Describe a concrete situation where two groups have the same correlation between two variables but different slopes.
• Describe a concrete situation where you would prefer r to b.
• Describe the sampling distribution of r. Include bias, sampling variance, skew, sample size, power.
• Draw a picture and describe the sampling distribution of the regression line.
Slope Estimates
• r is b when X and Y are z scores.
• A significance test of the difference between groups means two different things for r and for b.
• Males: r = .30, SX = 50, SY = 1, b = .006.
• Females: r = .60, SX = 100, SY = 1, b = .006.
• What if r = .60 for both groups? b = ?
$$b_{Y.X} = r_{XY}\,\frac{S_Y}{S_X}$$
With correlation, there is only a standardized slope. With regression, there is a slope, an intercept, and a standard error of prediction.
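The group example above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the usual relation b = r × SY / SX; the function name `raw_slope` and the dictionary layout are illustrative choices, not part of the original slides.

```python
def raw_slope(r, s_x, s_y):
    """Raw-score slope of Y on X from the correlation and the two SDs."""
    return r * s_y / s_x


# Values taken from the slide.
groups = {
    "Males": {"r": 0.30, "s_x": 50, "s_y": 1},
    "Females": {"r": 0.60, "s_x": 100, "s_y": 1},
}

# Different correlations, same slope.
slopes = {name: raw_slope(g["r"], g["s_x"], g["s_y"]) for name, g in groups.items()}

# Same correlation (.60) in both groups: the slopes now differ because SX differs.
slopes_same_r = {name: raw_slope(0.60, g["s_x"], g["s_y"]) for name, g in groups.items()}

for name in groups:
    print(f"{name}: b = {slopes[name]:.3f}, b with r = .60: {slopes_same_r[name]:.3f}")
```

With r = .60 in both groups, the slope depends only on the ratio SY / SX, so the group with the smaller SX gets the steeper line.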
Choice Between r and b
• Always report correlation matrix with M & SD so people can choose.
• Correlation to show strength of association between variables or across settings.
• Regression for prediction problems.
• If units have meaning, may want regression (consider slope and intercept), e.g., SAT = 0; change in GPA, graduation rate, etc.
Sampling Distribution of r
Sampling distribution depends on N and ρ.
[Figure: Sampling Distribution of r, 1000 Samples of N = 50, rho = .30. Histogram of frequency against r, with markers at rho = .30 and at r = .27.]
M=.295, slight bias.
Slight negative skew.
Big power problem.
With N = 50, the critical value of r is .27, so about half of the observed rs will not be significant. Power is about .5. Correlations and samples of this size are common in psychology.
$$\sigma_r^2 = \frac{(1-\rho^2)^2}{N-1}$$
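The bias, sampling variance, and power described above can be reproduced with a small simulation. This is a sketch under stated assumptions: X and Y are bivariate normal, the critical value .27 is taken from the slide rather than computed, and the standard deviation of r is compared with the common large-sample approximation (1 − rho²) / sqrt(N − 1). The helper names, the seed, and the use of the standard library only are choices made for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_r(rng, n, rho):
    """Draw n bivariate-normal pairs with population correlation rho; return r."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]
    return pearson_r(xs, ys)


RHO, N, N_SAMPLES = 0.30, 50, 1000
R_CRIT = 0.27  # critical value quoted on the slide (assumed, not recomputed)

rng = random.Random(2024)  # arbitrary seed so the run is repeatable
rs = [sample_r(rng, N, RHO) for _ in range(N_SAMPLES)]

mean_r = sum(rs) / N_SAMPLES
sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in rs) / (N_SAMPLES - 1))
formula_sd = (1.0 - RHO ** 2) / math.sqrt(N - 1)  # large-sample approximation
power = sum(abs(r) > R_CRIT for r in rs) / N_SAMPLES  # share of significant rs

print(f"mean r = {mean_r:.3f} (rho = {RHO})")
print(f"SD of r = {sd_r:.3f}, formula = {formula_sd:.3f}")
print(f"proportion significant = {power:.2f}")
```

Changing `RHO` to 0.80 gives a much narrower distribution and nearly all samples significant, matching the "no power problem" case.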
Sampling Distribution of r (2)
[Figure: Sampling Distribution of r, 1000 Samples of N = 50, rho = .80. Histogram of frequency against r, with a marker at rho = .80.]
M = .795, slight bias.
A little more negative skew.
No power problem.
Uncommon situation unless you are estimating reliability. Correlation and regression demand large samples for significant results unless the effects of the IV are very large. Large effects are not common in most areas of psychology (social science generally).
[Figure: Empirical Sampling Distributions, rho = .0 to rho = .9. Ten histogram panels, one for each of rho = 0, .1, .2, .3, .4, .5, .6, .7, .8, .9, with r from -1.0 to 1.0 on the horizontal axis.]
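A set of empirical sampling distributions like this one can be summarized numerically. The sketch below assumes bivariate-normal data and N = 50 with 1000 samples per value of rho; the slides do not state the sample size used for these panels, so both numbers are assumptions, as are the helper names and the seed.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_r(rng, n, rho):
    """Draw n bivariate-normal pairs with population correlation rho; return r."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]
    return pearson_r(xs, ys)


def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


N, N_SAMPLES = 50, 1000  # assumed; not stated for these panels
rng = random.Random(7)   # arbitrary seed so the run is repeatable

summary = {}  # rho -> (mean of r, SD of r, skewness of r)
for rho in [i / 10 for i in range(10)]:
    rs = [sample_r(rng, N, rho) for _ in range(N_SAMPLES)]
    mean_r = sum(rs) / N_SAMPLES
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in rs) / (N_SAMPLES - 1))
    summary[rho] = (mean_r, sd_r, skewness(rs))
    print(f"rho = {rho:.1f}: mean = {mean_r:.3f}, SD = {sd_r:.3f}, skew = {summary[rho][2]:.2f}")
```

As rho moves from 0 toward .9, the spread should shrink and the skew should become more negative, because r cannot exceed 1.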
Sampling Distribution of Regression Line
[Figure: Sampling Distribution of a Regression Line, Y = -11 + 1.6X, N = 100. Sampled regression lines plotted with X from 0 to 20 and Y from -20 to 20.]
Population values: rho = .8, Sigma(Y) = 2, Sigma(X) = 1, Mu(Y) = 5, Mu(X) = 10.
Note fan shape. You will see this in the line’s confidence interval.
The means of X and Y are typically pretty well estimated. The line always goes through the point (Xbar, Ybar).
A small difference in the slope has little impact on the line close to the mean of X, but more and more impact the farther we get from the mean of X. Note the relation to leverage.
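The fan shape can be checked by simulating many regression lines and measuring how much the predicted Y varies at different values of X. This sketch uses the population values from the slide (Mu(X) = 10, Sigma(X) = 1, Mu(Y) = 5, Sigma(Y) = 2, rho = .8, N = 100) and assumes bivariate-normal data; the number of simulated lines (500), the X values probed, and the helper names are choices made for this example.

```python
import math
import random

MU_X, SIGMA_X = 10.0, 1.0
MU_Y, SIGMA_Y = 5.0, 2.0
RHO, N = 0.8, 100
N_LINES = 500  # number of simulated samples (assumed for illustration)


def fit_line(xs, ys):
    """Least-squares intercept and slope of Y on X."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx  # the fitted line passes through (Xbar, Ybar)
    return a, b


def sample_line(rng):
    """Draw one sample of N pairs and return its fitted (intercept, slope)."""
    xs, ys = [], []
    for _ in range(N):
        zx = rng.gauss(0.0, 1.0)
        zy = RHO * zx + math.sqrt(1.0 - RHO ** 2) * rng.gauss(0.0, 1.0)
        xs.append(MU_X + SIGMA_X * zx)
        ys.append(MU_Y + SIGMA_Y * zy)
    return fit_line(xs, ys)


def sd(values):
    """Sample standard deviation."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


rng = random.Random(11)  # arbitrary seed so the run is repeatable
lines = [sample_line(rng) for _ in range(N_LINES)]

mean_a = sum(a for a, _ in lines) / N_LINES
mean_b = sum(b for _, b in lines) / N_LINES

# Spread of the predicted Y across simulated lines at several X values.
spread = {x0: sd([a + b * x0 for a, b in lines]) for x0 in (6.0, 8.0, 10.0, 12.0, 14.0)}

print(f"mean intercept = {mean_a:.2f}, mean slope = {mean_b:.2f}")
for x0, s in spread.items():
    print(f"X = {x0:4.1f}: SD of predicted Y = {s:.3f}")
```

The spread should be smallest near X = 10, the mean of X, and grow in both directions, which is the fan shape seen in the line's confidence interval.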
Review
• How is the raw score slope related to the correlation coefficient?
• Describe a concrete situation where two groups have the same correlation between two variables but different slopes.
• Describe a concrete situation where you would prefer r to b.
• Describe the sampling distribution of r. Include bias, sampling variance, skew, sample size, power.
• Draw a picture and describe the sampling distribution of the regression line.