Anthony Greene 1
Correlation
The Association Between Variables
When to Use t-test or ANOVA
When the independent variable is Categorical
[Figures: bar chart of Control vs. Treat A; bar chart of Control, Treat A, Treat B, and Treat C, grouped by Male and Female]
When to Use Correlation & Regression
When the independent variable is Ratio or Interval
[Figure: scatterplot, "Tour De France Gatorade Consumption". x-axis: Average Daytime Temperature (60 to 120); y-axis: Liters of Gatorade (0 to 16)]
ScatterPlots
Times and costs for five word-processing jobs
Four data points
Direct Relationship: Positive Slope
Age and price data for a sample of 11 used cars
Scatter diagram for the age and price data of used cars
Inverse Relationship: Negative Slope
Various degrees of linear correlation (Slide 1 of 3)
Various degrees of linear correlation (Slide 3 of 3)
Various degrees of linear correlation (Slide 2 of 3)
Examples of positive and negative relationships
The Simple Idea
• If the corresponding x and y z-scores are always in agreement, r will be high.
• If they are sometimes in agreement, r will be moderate.
• If they are generally different, r will be near zero.
• If they are in agreement but in opposite directions, r will be negative.

$$ r = \frac{\sum z_x z_y}{n} $$
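The idea above, r as the average product of paired z-scores, can be sketched in a few lines of Python. This is a minimal illustration rather than the slides' own procedure: the data are invented, the function names are hypothetical, and the z-scores are assumed to use the population standard deviation (dividing by n), which is what makes the sum of products divided by n equal to r.

```python
import math

def z_scores(values):
    """Standardize values using the population standard deviation (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_from_z_scores(x, y):
    """Correlation as the average product of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y_agree = [2, 4, 6, 8, 10]     # z-scores always match those of x
y_mostly = [2, 1, 4, 3, 5]     # z-scores usually match in sign
y_opposite = [10, 8, 6, 4, 2]  # same sizes, opposite signs

for label, y in [("agree", y_agree), ("mostly", y_mostly), ("opposite", y_opposite)]:
    print(label, round(r_from_z_scores(x, y), 2))
```

With these made-up values, the three cases come out at roughly 1.0, 0.8, and -1.0, matching the high, moderate, and negative cases in the list.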
The Basic Idea

$$ r = \frac{\sum z_x z_y}{n} $$

  Zx     Zy   |   Zx     Zy   |   Zx     Zy
 +1.2   +1.6  |  +1.2   -2.3  |  +1.2   -1.6
 -1.1   -0.7  |  -1.1   -0.2  |  -1.1   +0.7
 +0.8   +1.1  |  +0.8   +1.6  |  +0.8   -1.1
 +3.2   +2.8  |  +3.2   -0.7  |  +3.2   -2.8
 -2.7   -2.3  |  -2.7   +1.1  |  -2.7   +2.3
 +0.1   -0.2  |  +0.1   +2.8  |  +0.1   +0.2
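Notation Used in Regression and Correlation
We define SS_x, S_p, and SS_y by

$$ SS_x = \sum (x - M_x)^2 $$
$$ S_p = \sum (x - M_x)(y - M_y) $$
$$ SS_y = \sum (y - M_y)^2 $$

Or the computational formula

$$ SS_x = \sum x^2 - \frac{(\sum x)^2}{n} $$
$$ S_p = \sum xy - \frac{(\sum x)(\sum y)}{n} $$
$$ SS_y = \sum y^2 - \frac{(\sum y)^2}{n} $$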
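As a quick check that the definitional and computational forms give the same three quantities, here is a small Python sketch. The data and the function names are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def sums_of_squares_definitional(x, y):
    """SS_x, S_p, SS_y from deviations about the means M_x and M_y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    ss_x = sum((a - mx) ** 2 for a in x)
    s_p = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_y = sum((b - my) ** 2 for b in y)
    return ss_x, s_p, ss_y

def sums_of_squares_computational(x, y):
    """SS_x, S_p, SS_y from raw sums, without computing deviations."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    s_p = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    return ss_x, s_p, ss_y

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

print(sums_of_squares_definitional(x, y))   # (10.0, 8.0, 10.0)
print(sums_of_squares_computational(x, y))  # (10.0, 8.0, 10.0)
```

The computational form needs only the running totals of x, y, x², y², and xy, which is why it is convenient for hand calculation.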
Obtaining the three sums of squares for the used car data using the computational formulas
Linear Correlation Coefficient
The linear correlation coefficient, r, of n data points is defined by

$$ r = \frac{\sum z_x z_y}{n} $$

or by the computational formula

$$ r = \frac{S_p}{\sqrt{SS_x \, SS_y}} $$
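The computational formula for r can be sketched directly from raw sums. The snippet below is an illustration under stated assumptions: the data are invented, `pearson_r` is a hypothetical helper name, and the example is deliberately an inverse relationship so the sign of S_p carries through to r.

```python
import math

def pearson_r(x, y):
    """r = S_p / sqrt(SS_x * SS_y), using the computational sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    ss_y = sum(b * b for b in y) - sum_y ** 2 / n
    s_p = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    return s_p / math.sqrt(ss_x * ss_y)

# Hypothetical data, invented for illustration only.
x = [2, 4, 6, 8]
y = [9, 7, 4, 4]

# By hand: SS_x = 20, SS_y = 18, S_p = -18, so r = -18 / sqrt(360).
print(round(pearson_r(x, y), 4))
```

Because SS_x and SS_y are never negative, the sign of r is always the sign of S_p.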
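Linear Correlation Coefficient

$$ r = \frac{\sum z_x z_y}{n} $$

$$ r = \frac{\sum (x - M_x)(y - M_y)}{n \sqrt{\dfrac{\sum (x - M_x)^2}{n}} \sqrt{\dfrac{\sum (y - M_y)^2}{n}}} $$

$$ r = \frac{\sum (x - M_x)(y - M_y)}{\sqrt{\sum (x - M_x)^2 \sum (y - M_y)^2}} $$

$$ r = \frac{S_p}{\sqrt{SS_x \, SS_y}} $$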
Coefficient of Determination
The coefficient of determination, r², is the proportion of variation in the observed values of the response variable that is explained by the regression:

$$ r^2 = \frac{\text{explained variance}}{\text{total variance}} = \text{proportion of explained variance} $$

The coefficient of determination always lies between 0 and 1 and is a descriptive measure of the utility of the regression equation for making predictions. Values of r² near 0 indicate that the regression equation is not useful for making predictions, whereas values near 1 indicate that the regression equation is extremely useful for making predictions.
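To make "explained over total" concrete, the sketch below fits a least-squares line and compares the variation of the predicted values with the variation of the observed values. It assumes the standard least-squares slope S_p / SS_x and intercept M_y - slope × M_x, which the slides do not spell out; the data and function name are hypothetical.

```python
def r_squared(x, y):
    """Explained variation divided by total variation for a least-squares line."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    ss_x = sum((a - mx) ** 2 for a in x)
    s_p = sum((a - mx) * (b - my) for a, b in zip(x, y))

    # Assumed standard least-squares fit (not given in the source).
    slope = s_p / ss_x
    intercept = my - slope * mx
    predicted = [intercept + slope * a for a in x]

    explained = sum((p - my) ** 2 for p in predicted)  # variation captured by the line
    total = sum((b - my) ** 2 for b in y)              # SS_y, all variation in y
    return explained / total

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

print(round(r_squared(x, y), 2))  # 0.64, the square of r = 0.8 for these data
```

For these invented values, 64% of the variation in y is accounted for by the fitted line.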
t-Distribution for a Correlation Test
For samples of size n, the variable

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

has the t-distribution with df = n - 2 if the null hypothesis ρ = 0 is true. (ρ, or rho, is pronounced "row".)
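The conversion from r to t is a one-line calculation. The following sketch uses a hypothetical function name and made-up values of r and n, purely to show the arithmetic.

```python
import math

def t_from_r(r, n):
    """Return (t, df) for a sample correlation r based on n data points."""
    df = n - 2
    t = r / math.sqrt((1 - r ** 2) / df)
    return t, df

# Hypothetical values, invented for illustration only.
t, df = t_from_r(r=0.8, n=5)
print(round(t, 3), df)  # 2.309 3
```

Note how strongly the result depends on n: the same r of 0.8 yields a much larger t when it comes from a larger sample.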
The t-test for correlation (Slide 1 of 3)
With df = n - 2, use Table B.6.
The t-test for correlation (Slide 2 of 3)
The t-test for correlation (Slide 3 of 3)
Step 4 Compute the test statistic r. Table B.6 allows a direct lookup. Alternatively, r can be converted to a statistic that has a t distribution,

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

and Table B.2 will yield an identical conclusion.
Step 5 If the value of the test statistic falls in the rejection region, reject the null hypothesis.
Step 6 State the conclusion in words.
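The claim that a critical-r lookup and a critical-t lookup give the same decision can be checked numerically. The sketch below is illustrative only: the values of r and n are invented, the function names are hypothetical, and the critical t of 2.262 (two-tailed, α = 0.05, df = 9) is taken from a standard t table as a stand-in for the source's Table B.2, which is not reproduced here. The critical r is obtained by solving the t formula for r.

```python
import math

def t_statistic(r, n):
    """Convert a sample correlation to its t statistic with df = n - 2."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def critical_r(t_crit, df):
    """Critical r implied by a critical t (the t formula solved for r)."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Hypothetical sample result, invented for illustration only.
r, n = 0.70, 11
df = n - 2
t_crit = 2.262  # assumed: two-tailed critical t for df = 9, alpha = 0.05

t = t_statistic(r, n)
r_crit = critical_r(t_crit, df)

reject_by_t = abs(t) > t_crit  # decision from a t table
reject_by_r = abs(r) > r_crit  # decision from a critical-r table

print(round(t, 3), round(r_crit, 3), reject_by_t, reject_by_r)
```

Both comparisons lead to rejecting the null hypothesis for these made-up numbers, and they must always agree because the critical r is just the critical t re-expressed on the r scale.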
Criterion for deciding whether or not to reject the null hypothesis
Correlation Matrix
       a      b      c      d
a   1.00
b   0.84   1.00
c   0.68   0.58   1.00
d   0.12   0.19   0.08   1.00
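A lower-triangular matrix like the one above can be produced by computing r for every pair of variables. The sketch below uses invented data for four variables named a to d and hypothetical helper names; it does not reproduce the values in the slide.

```python
import math

def pearson_r(x, y):
    """r = S_p / sqrt(SS_x * SS_y) from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_p = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return s_p / math.sqrt(ss_x * ss_y)

def correlation_matrix(variables):
    """Nested dict of r for every ordered pair of named variables."""
    names = list(variables)
    return {row: {col: pearson_r(variables[row], variables[col]) for col in names}
            for row in names}

# Hypothetical data, invented for illustration only.
data = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 1, 4, 3, 6, 5],
    "c": [5, 3, 6, 2, 7, 4],
    "d": [3, 6, 1, 5, 2, 4],
}

matrix = correlation_matrix(data)
names = list(data)

# Print only the lower triangle, since r(x, y) equals r(y, x).
print("   " + "".join(f"{name:>7}" for name in names))
for i, row in enumerate(names):
    cells = "".join(f"{matrix[row][col]:7.2f}" for col in names[: i + 1])
    print(f"{row:<3}{cells}")
```

The diagonal is always 1.00 because every variable correlates perfectly with itself, and the upper triangle is omitted because it mirrors the lower one.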
Computer printouts for correlations