Assignment #8
Chapter 14: 26 Chapter 15: 18, 27 Due next Friday Nov. 27th by 2pm in your TA’s homework box
Assignment #9
Chapter 16: 20 Chapter 17: 33 Not Due! Just for practice. Answers will be posted on Friday Dec. 4th
Reading
For Today: Chapter 17 For Thursday: Chapter 17
Lab Report
• Posted on web-site
• Dates
 – Rough draft due to TA's homework box Monday Nov. 16th
 – Rough draft returned in your registered lab section this week
 – Final draft due at start of your registered lab section next week
 → MUST HAND IN ROUGH DRAFT WITH FINAL DRAFT (penalty: −10 points)
• 10% of course grade
 – Rough draft: 5%
 – Final draft: 5%
 – If you're happy with your rough draft mark, you can tell your TA to use it for the final draft → YOU MUST TELL YOUR TA
• Read the "Writing a Lab Report" section of your lab notebook for guidance!
Chapter 16 Review
Correlation: r
• r is called the “correlation coefficient”
• Describes the relationship between two numerical variables
• Parameter: ρ (rho);  Estimate: r
• −1 ≤ ρ ≤ 1  and  −1 ≤ r ≤ 1
Estimating the correlation coefficient
r = Σ(Xi − X̄)(Yi − Ȳ) / √[ Σ(Xi − X̄)² · Σ(Yi − Ȳ)² ]

The numerator is the "sum of products"; each term under the square root is a "sum of squares".
Standard error of r
SEr = √[ (1 − r²) / (n − 2) ]

If ρ = 0, r is normally distributed with mean 0. Therefore, we test the null hypothesis of no correlation using:

t = r / SEr,  with df = n − 2
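These formulas are easy to check numerically. A minimal Python sketch (the data points are made up for illustration) computes r, its standard error, and the t statistic:

```python
import math

# Illustrative data (made-up values)
x = [0, 1, 2, 3]
y = [1, 2, 2, 4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of products and sums of squares
sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

r = sp / math.sqrt(ss_x * ss_y)           # correlation coefficient
se_r = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
t = r / se_r                              # test statistic, df = n - 2
```

The t value would then be compared against the t distribution with n − 2 degrees of freedom.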
Hypotheses
H0: X and Y are not correlated (ρ = 0). HA: X and Y are correlated (ρ ≠ 0).
Correlation assumes...
• Random sample
• X is normally distributed with equal variance for all values of Y
• Y is normally distributed with equal variance for all values of X
Bivariate Normal Distribution
• The relationship between X and Y is linear
• The cloud of points in a scatter plot of X and Y has a circular or elliptical shape
• The frequency distributions of X and Y separately are normal
Most Frequent departures from bivariate normal distribution
Chapter 16 Continued: Correlation between numerical variables
Spearman's rank correlation
• An alternative to correlation that does not make so many assumptions
Example: Spearman's rs
VERSIONS:
1. Boy climbs up rope, climbs down again
2. Boy climbs up rope, seems to vanish, re-appears at top, climbs down again
3. Boy climbs up rope, seems to vanish at top
4. Boy climbs up rope, vanishes at top, reappears somewhere the audience was not looking
5. Boy climbs up rope, vanishes at top, reappears in a place which has been in full view
Example: Spearman's rs
Hypotheses H0: The difficulty of the described trick is not correlated with the time elapsed since it was observed. HA: The difficulty of the described trick is correlated with the time elapsed since it was observed.
Years elapsed   Rank (years)   Impressiveness score   Rank (impressiveness)
2 1 1 2
5 3.5 1 2
5 3.5 1 2
4 2 2 5
17 5.5 2 5
17 5.5 2 5
31 13 3 7
20 7 4 12.5
22 8 4 12.5
25 9 4 12.5
28 10.5 4 12.5
29 12 4 12.5
34 14.5 4 12.5
43 17 4 12.5
44 18 4 12.5
46 19 4 12.5
34 14.5 4 12.5
28 10.5 5 19.5
39 16 5 19.5
50 20.5 5 19.5
50 20.5 5 19.5
Finding rs

Σ(Ri − R̄)(Si − S̄) = ΣRiSi − (ΣRi)(ΣSi)/n = 566

Σ(Ri − R̄)² = ΣRi² − (ΣRi)²/n = 767.5

Σ(Si − S̄)² = ΣSi² − (ΣSi)²/n = 678.5

rS = 566 / √[ (767.5)(678.5) ] = 0.784
rS(0.05, 21) = 0.434;  rS(0.01, 21) = 0.550
Since rS = 0.784 is greater than 0.550, P < 0.01.
We reject the null hypothesis: there is a positive correlation between the impressiveness score and the number of years elapsed.
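The rank computation above can be reproduced with a short Python sketch. It assigns tie-averaged ranks to the raw (years elapsed, impressiveness) pairs from the table, then computes the ordinary correlation of the two rank lists:

```python
import math

def tied_ranks(values):
    """Rank values from 1..n, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sp = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    ssx = sum((a - xb) ** 2 for a in x)
    ssy = sum((b - yb) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)

# (years elapsed, impressiveness score) pairs from the table
years = [2, 5, 5, 4, 17, 17, 31, 20, 22, 25, 28, 29, 34, 43, 44, 46, 34, 28, 39, 50, 50]
score = [1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5]

# Spearman's rs = Pearson correlation of the ranks
r_s = pearson(tied_ranks(years), tied_ranks(score))  # ≈ 0.784
```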
Spearman’s rank correlation for n >100
SE[rS] = √[ (1 − rS²) / (n − 2) ]

t = rS / SE[rS],  df = n − 2
Attenuation: the estimated correlation will be lower if X or Y is estimated with error.

[Figure: three scatter plots — the real correlation; Y estimated with measurement error; X and Y estimated with measurement error.]
Correlation depends on range
Chapter 17: Regression
Regression
• Predicts Y from X
• Linear regression assumes that the relationship between X and Y can be described by a line
Correlation vs. regression
Regression assumes...
• Random sample
• Y is normally distributed with equal variance for all values of X
The least squares regression line is the line for which the sum of all the squared deviations in Y is smallest.
The parameters of linear regression
Y = α + β X
Intercept Slope
Positive β
Negative β
β = 0
Higher α
Lower α
Estimating a regression line
Y = a + b X
Nomenclature
Residual: Yi − Ŷi

Predicted value: Ŷi

Data point: (Xi, Yi)
Finding the "least squares" regression line
Minimize:  SSresidual = Σ(Yi − Ŷi)²
Best estimate of the slope
b = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
(= "Sum of products" over "Sum of squares of X")
Remember the shortcuts:
Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − (ΣXi)(ΣYi)/n

Σ(Xi − X̄)² = ΣXi² − (ΣXi)²/n
Finding a
Ȳ = a + bX̄, so:

a = Ȳ − bX̄
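Putting the slope and intercept formulas together, here is a minimal sketch of the least-squares estimates, using the shortcut formulas above (the data are made up so the points fall exactly on Y = 1 + 2X):

```python
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # made-up data: exactly y = 1 + 2x
n = len(x)

# Shortcut formulas: sum of products and sum of squares of X
sum_products = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

b = sum_products / ss_x          # slope
a = sum(y) / n - b * sum(x) / n  # intercept: a = Ybar - b * Xbar
```

Because the points lie exactly on a line, the fit recovers b = 2 and a = 1.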
Example: Predicting age based on radioactivity in teeth
Many above ground nuclear bomb tests in the ‘50s and ‘60s may have left a radioactive signal in developing teeth. Is it possible to predict a person’s age based on dental 14C?
Data from 1965 to present from Spalding et al. 2005. Forensics: age written in teeth by nuclear tests. Nature 437: 333–334.
Teeth data:

Δ14C   Date of Birth
622    1963.5
262    1971.7
471    1963.7
112    1990.5
285    1975
439    1970.2
363    1972.6
391    1971.8
89     1985.5
109    1983.5
91     1990.5
127    1987.5
99     1990.5
110    1984.5
123    1983.5
105    1989.5
Teeth data summary:

ΣX = 3798,  ΣY = 31674,  ΣX² = 1340776,  ΣXY = 7495223,  ΣY² = 62704042
n = 16
X̄ = 237.375,  Ȳ = 1979.63
Let X be the Δ14C, and Y be the year of birth.
Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − (ΣXi)(ΣYi)/n = 7495223 − (3798)(31674)/16 = −23393

Σ(Xi − X̄)² = ΣXi² − (ΣXi)²/n = 1340776 − (3798)²/16 = 439226

b = −23393 / 439226 = −0.053
Calculating a
a = Ȳ − bX̄ = 1979.63 − (−0.053)(237.375) = 1992.2
Predicted Values

The predicted value of Y from a regression line (Ŷ, "Y hat") estimates the mean value of Y for all individuals having a given value of X.

Ŷ = 1992.2 − 0.053X
Predicting Y from X

If a cadaver has a tooth with Δ14C content equal to 200, what does the regression line predict its year of birth to be?

Ŷ = 1992.2 − 0.053X = 1992.2 − 0.053(200) = 1981.6
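The whole teeth calculation can be replayed from the summary sums given earlier. The tiny differences from the slide's values come only from the slide rounding b to three decimals before computing a:

```python
# Summary statistics from the teeth data slide
n = 16
sum_x, sum_y = 3798, 31674
sum_x2, sum_xy = 1340776, 7495223

sum_products = sum_xy - sum_x * sum_y / n  # ≈ -23393
ss_x = sum_x2 - sum_x ** 2 / n             # ≈ 439226

b = sum_products / ss_x                    # slope ≈ -0.053
a = sum_y / n - b * sum_x / n              # intercept ≈ 1992.2

y_hat = a + b * 200                        # predicted year of birth ≈ 1981.6
```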
Testing hypotheses about regression
H0: β = 0 HA: β ≠ 0
b has a t distribution
Confidence interval for a slope:
b ± tα[2],df SEb

Hypothesis tests can use t:

t = (b − β0) / SEb
Standard error of a slope
SEb = √[ MSresidual / Σ(Xi − X̄)² ]

MSresidual = SSresidual / dfresidual
Sums of squares for regression
SSTotal = ΣYi² − (ΣYi)²/n

SSregression = b · Σ(Xi − X̄)(Yi − Ȳ)

SSresidual + SSregression = SSTotal

with n − 2 degrees of freedom for the residual.
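The decomposition can be verified numerically: the residual sum of squares computed directly from the fitted line matches SSTotal − SSregression. A sketch with made-up data:

```python
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 2.0, 4.0]  # made-up data
n = len(x)

sum_products = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

slope = sum_products / ss_x
intercept = sum(y) / n - slope * sum(x) / n

ss_total = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
ss_regression = slope * sum_products

# Residual sum of squares computed directly from the fitted line
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# ss_residual + ss_regression equals ss_total (up to floating-point error)
```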
Radioactive teeth: Sums of squares
SSTotal = ΣYi² − (ΣYi)²/n = 62704042 − (31674)²/16 = 1339.75

SSregression = b · Σ(Xi − X̄)(Yi − Ȳ) = (−0.053)(−23393) = 1239.8
Teeth: Sums of squares
SSresidual = SSTotal − SSregression = 1339.75 − 1239.8 = 99.9
dfresidual = 16 − 2 = 14
Calculating residual mean squares
MSresidual = SSresidual / dfresidual = 99.9 / 14 = 7.1
Standard error of a slope
SEb = √[ MSresidual / Σ(Xi − X̄)² ] = √(7.1 / 439226) = 0.004
b has a t distribution
Confidence interval for a slope:

b ± tα[2],df SEb

Hypothesis tests can use t:

t = (b − β0) / SEb
Example: 95% confidence interval for the slope in the teeth example

b ± tα[2],df SEb = b ± t0.05[2],14 SEb = −0.053 ± 2.14(0.004) = −0.053 ± 0.0086
Confidence bands: confidence intervals for predictions of mean Y for a given X.

Prediction intervals: confidence intervals for predictions of individual Y for a given X.
Hypothesis tests on slopes

H0: β = 0
HA: β ≠ 0

t = (b − β0) / SEb = (−0.053 − 0) / 0.004 = −13.25

t0.0001[2],14 = ±5.36

Since |t| = 13.25 > 5.36, we reject H0, P < 0.0001.
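A short sketch of these inference steps, using the slide's rounded intermediate values (MSresidual = 7.1, Σ(Xi − X̄)² = 439226, b = −0.053, and t0.05[2],14 ≈ 2.14). Note the t value differs slightly from the slide's 13.25 because SEb is not rounded to 0.004 before dividing:

```python
import math

b = -0.053
ms_residual = 7.1
ss_x = 439226

se_b = math.sqrt(ms_residual / ss_x)  # standard error of the slope ≈ 0.004
t = (b - 0) / se_b                    # test statistic under H0: beta = 0

t_crit = 2.14                         # t_0.05[2],14 (two-tailed, df = 14)
ci = (b - t_crit * se_b, b + t_crit * se_b)  # 95% CI for the slope
```

Since |t| far exceeds the critical value 5.36 for P = 0.0001, H0 is rejected.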
r² gives the proportion of the variance in Y explained by the regression line.

r² is the "coefficient of determination": it is the square of the correlation coefficient r.