Video Conference 1
AS 2013/2012 Chapter 10: Correlation and Regression
15 December 2013, 10 am – 11 am
Puan Hasmawati Binti [email protected]
04-6532285
Chapter 10 Overview
Introduction
10-1 Scatter Plots and Correlation
10-2 Regression
10-3 Coefficient of Determination and Standard Error of the Estimate
10-4 Multiple Regression (Optional)
Objectives
JIM 212
After going through this lesson, you should be able to:
• Draw a scatter plot for a set of ordered pairs
• Compute the correlation coefficient, r
• Test the hypothesis H0: ρ = 0 (test the significance of the correlation coefficient)
Objectives
1. Draw a scatter plot for a set of ordered pairs.
2. Compute the correlation coefficient.
3. Test the hypothesis H0: ρ = 0.
4. Compute the equation of the regression line.
5. Compute the standard error of the estimate.
6. Find a prediction interval.
7. Be familiar with the concept of multiple regression: determining whether a relationship between two or more numerical or quantitative variables exists.
Terminology
1. Correlation
2. Independent variable
3. Dependent variable
4. Relationship
5. Simple relationship
6. Multiple relationship
7. Positive relationship
8. Negative relationship
9. Linear relationship
10. Correlation coefficient
11. Prediction
Introduction
In addition to hypothesis testing and confidence intervals, inferential statistics involves determining whether a relationship between two or more numerical or quantitative variables exists.
• Correlation is a statistical method used to determine whether a linear relationship between variables exists.
Introduction (cont…)
• The purpose of this chapter is to answer these questions statistically:
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?
Introduction (cont…)
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
To answer these two questions, statisticians use the correlation coefficient, a numerical measure to determine whether two or more variables are related and to determine the strength of the relationship between or among the variables.
Introduction (cont…)
3. What type of relationship exists?
There are two types of relationships: simple and multiple.
In a simple relationship, there are two variables: an independent variable (predictor variable) and a dependent variable (response variable).
In a multiple relationship, there are two or more independent variables that are used to predict one dependent variable.
4. What kind of predictions can be made from the relationship?
Predictions are made daily in all areas. Examples include weather forecasting, stock market analyses, sales predictions, crop predictions, gasoline price predictions, and sports predictions. Some predictions are more accurate than others, depending on the strength of the relationship: the stronger the relationship between the variables, the more accurate the prediction.
Introduction (cont…)
Correlation & Regression
• Both are statistical methods.
• Correlation: to determine whether a relationship between variables exists.
• Regression: to describe the nature of the relationship between variables (+ or −, linear or nonlinear).
The purpose of this chapter is to answer these questions statistically:
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
   → correlation coefficient
3. What type of relationship exists?
   → simple & multiple
4. What kind of predictions can be made from the relationship?
   → all areas and daily
Scatter Plots
• Graph of ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
• Independent variable?
• Dependent variable?
Q1(i) Forest Fires and Acres Burned
a) Page 549 Ex. 10 – 1 No. 14
[Figure: scatter plot, number of fires vs. number of acres burned]
Correlation
Correlation is a statistical method used to determine whether a linear relationship between variables exists.
Correlation (cont…)
• The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.
• There are several types of correlation coefficients. The one explained in this section is called the Pearson product moment correlation coefficient (PPMC).
• The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is ρ (Greek letter rho).
Correlation (cont…)
• The range of the correlation coefficient is from −1 to 1.
• If there is a strong positive linear relationship between the variables, the value of r will be close to 1.
• If there is a strong negative linear relationship between the variables, the value of r will be close to −1.
Correlation Coefficient
• Numerical measure to determine whether two or more variables are linearly related, and
• to determine the strength of the relationship between or among the variables.
Correlation Coefficient (cont…)
Measures the strength (strong, weak) and direction (+, −) of a linear relationship between two variables.
• r: sample correlation coefficient
• ρ: population correlation coefficient
• Range: −1 ≤ r ≤ 1 (likewise −1 ≤ ρ ≤ 1)
**Look at page 540, Figure 10-6.
Formula for Correlation Coefficient
One of the formulas for r:
$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} $$
where n is the number of data pairs.
1(i) b) Page 549 Ex. 10 – 1 No. 14
Σx = 494, Σy = 260, Σx² = 31,692, Σy² = 10,596, Σxy = 17,285, n = 8
$$ r = \frac{8(17{,}285) - (494)(260)}{\sqrt{[8(31{,}692) - 494^2][8(10{,}596) - 260^2]}} = \frac{9840}{\sqrt{(9500)(17{,}168)}} = 0.771 $$
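The computational formula for r can be checked in a few lines of Python. This is a minimal sketch that works only from the summary sums listed in the example (Σx, Σy, Σx², Σy², Σxy and n), not from the raw data pairs; the function name `pearson_r_from_sums` is our own and not part of any library.

```python
import math

def pearson_r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Pearson product moment correlation coefficient from summary sums.

    sx = Σx, sy = Σy, sxx = Σx², syy = Σy², sxy = Σxy, n = number of data pairs.
    """
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

# Forest fires and acres burned, Q1(i) b): sums taken from the slide
r = pearson_r_from_sums(n=8, sx=494, sy=260, sxx=31_692, syy=10_596, sxy=17_285)
print(round(r, 3))  # 0.771
```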
The Significance of the Correlation Coefficient
Use a hypothesis-testing procedure to decide whether r is significant. There are three ways:
1. Traditional method
2. P-value method
3. Using Table I in Appendix C
Hypothesis Testing
• In hypothesis testing, one of the following is true:
H0: ρ = 0. This null hypothesis means that there is no correlation between the x and y variables in the population.
H1: ρ ≠ 0. This alternative hypothesis means that there is a significant correlation between the variables in the population.
1(i) (c, d, e) Page 549 Ex. 10 – 1 No. 14 cont...
H0: ρ = 0, H1: ρ ≠ 0
$$ t = r\sqrt{\frac{n-2}{1-r^2}} = 0.771\sqrt{\frac{8-2}{1-0.771^2}} = 2.966 $$
c.v. = ±2.447 (d.f. = 8 − 2 = 6, α = 0.05)
Decision: Reject the null hypothesis, since the test value falls in the critical region. There is a significant linear relationship between the number of forest fires and the number of acres burned.
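The traditional method for this example can be sketched the same way. The sketch assumes the rounded r = 0.771 and the critical value ±2.447 given on the slide; `t_for_r` is a hypothetical helper name.

```python
import math

def t_for_r(r, n):
    """Test value t = r * sqrt((n - 2) / (1 - r^2)), with d.f. = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r, n = 0.771, 8           # r rounded to three decimals, as on the slide
critical_value = 2.447    # two-tailed c.v. for d.f. = 6, alpha = 0.05 (from the slide)

t = t_for_r(r, n)
reject_h0 = abs(t) > critical_value   # two-tailed test: compare |t| with the c.v.
print(f"t = {t:.3f}, reject H0: {reject_h0}")  # t = 2.966, reject H0: True
```

Using the unrounded r (about 0.7705) gives t ≈ 2.96, which leads to the same decision.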
Now try using the other two procedures.
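For the Table I method, a critical value of r can be obtained from the t critical value by solving t = r√((n − 2)/(1 − r²)) for r, which gives r = t/√(t² + d.f.). The sketch below assumes Table I is consistent with that t test and uses the slide's t value 2.447 in place of a table lookup; `critical_r` is a name introduced here, not from the textbook.

```python
import math

def critical_r(t_critical, df):
    """Critical value of r found by solving t = r*sqrt(df / (1 - r^2)) for r."""
    return t_critical / math.sqrt(t_critical ** 2 + df)

r = 0.771
r_cv = critical_r(2.447, df=8 - 2)   # d.f. = 6, alpha = 0.05
print(f"critical r = {r_cv:.3f}, reject H0: {abs(r) > r_cv}")  # critical r = 0.707, reject H0: True
```

Since 0.771 exceeds the critical value of about 0.707, H0 is rejected, matching the traditional method.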
10.2 Regression
If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data's line of best fit.
Regression Line
Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.
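$$ y' = a + bx $$
where
$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}, \qquad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} $$
a = the y' intercept and b = the slope of the line.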
Q1(ii) Forest Fires and Acres Burned, Page 559 Ex. 10 – 2 No. 14
Σx = 494, Σy = 260, Σxy = 17,285, Σx² = 31,692, Σy² = 10,596, n = 8
$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{260(31{,}692) - 494(17{,}285)}{8(31{,}692) - 494^2} = \frac{-298{,}870}{9500} = -31.46 $$
$$ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{8(17{,}285) - 494(260)}{8(31{,}692) - 494^2} = \frac{9840}{9500} = 1.036 $$
$$ y' = -31.46 + 1.036x $$
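The intercept and slope formulas translate directly into code. This is a minimal sketch that again works only from the summary sums; `regression_coefficients` is an illustrative name, not a library function. Note that the computed intercept is negative.

```python
def regression_coefficients(n, sx, sy, sxx, sxy):
    """Intercept a and slope b of y' = a + bx, computed from summary sums."""
    denominator = n * sxx - sx ** 2          # n(Σx²) - (Σx)², shared by a and b
    a = (sy * sxx - sx * sxy) / denominator
    b = (n * sxy - sx * sy) / denominator
    return a, b

a, b = regression_coefficients(n=8, sx=494, sy=260, sxx=31_692, sxy=17_285)
print(f"y' = {a:.2f} + {b:.3f}x")  # y' = -31.46 + 1.036x
```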
(Q1(ii)) Page 559 Ex. 10 – 2 No. 14 cont...
[Figure: scatter plot of number of fires vs. number of acres burned, with the regression line y' = −31.46 + 1.036x]
(Q1(ii)) Page 559 Ex. 10 – 2 No. 14 cont...
y' = −31.46 + 1.036x
y' when x = 60:
y' = −31.46 + 1.036(60) = 30.7 acres
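Once a and b are known, a prediction is a single evaluation of y' = a + bx. This sketch plugs in the rounded coefficients a = −31.46 and b = 1.036 from the worked example; `predict` is a hypothetical helper. As the regression section notes, this step only makes sense because r was found to be significant.

```python
def predict(a, b, x):
    """Predicted value y' = a + bx on the regression line."""
    return a + b * x

# Rounded coefficients from the worked example: y' = -31.46 + 1.036x
y_pred = predict(-31.46, 1.036, 60)
print(round(y_pred, 1))  # 30.7
```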
Q1(iii) Page 574 Ex. 10 – 3 No. 16 (Forest Fires and Acres Burned)
Regression line: y' = −31.46 + 1.036x
Σy² = 10,596, Σy = 260, Σxy = 17,285, n = 8
$$ s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n-2}} = \sqrt{\frac{10{,}596 - (-31.46)(260) - 1.036(17{,}285)}{8-2}} = 12.03 $$
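The computational formula for s_est can be sketched as follows, assuming the rounded a = −31.46 and b = 1.036 used in the hand calculation; `standard_error_of_estimate` is an illustrative name.

```python
import math

def standard_error_of_estimate(n, sy, syy, sxy, a, b):
    """s_est = sqrt((Σy² - aΣy - bΣxy) / (n - 2))."""
    return math.sqrt((syy - a * sy - b * sxy) / (n - 2))

# Rounded a and b from the slide, so the result matches the hand calculation
s_est = standard_error_of_estimate(n=8, sy=260, syy=10_596, sxy=17_285, a=-31.46, b=1.036)
print(round(s_est, 2))  # 12.03
```

This shortcut formula is sensitive to rounding: with the unrounded a = −298,870/9500 and b = 9840/9500 it gives about 12.06 rather than 12.03.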
Q1(iv) Page 574 Ex. 10 – 3 No. 20 (Forest Fires and Acres Burned)
$$ y' \pm t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} $$
Σx = 494, Σx² = 31,692, n = 8
When x = 60, y' = 30.7
X̄ = 494/8 = 61.75, s_est = 12.03, t_{α/2} = 2.447
(Q1(iv)) Page 574 Ex. 10 – 3 No. 20 cont... (Forest Fires and Acres Burned)
$$ 30.7 \pm 2.447(12.03)\sqrt{1 + \frac{1}{8} + \frac{8(60 - 61.75)^2}{8(31{,}692) - 494^2}} = 30.7 \pm 31.259 $$
$$ -0.559 < y < 61.959 $$
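The prediction interval can be computed the same way. This sketch assumes the values used on the slide (y' = 30.7, s_est = 12.03, t_{α/2} = 2.447 for d.f. = 6); `prediction_interval` is a hypothetical helper name.

```python
import math

def prediction_interval(x, y_pred, n, sx, sxx, s_est, t_crit):
    """Return (lower, upper) for y' ± t * s_est * sqrt(1 + 1/n + n(x - X̄)² / (nΣx² - (Σx)²))."""
    x_bar = sx / n
    spread = math.sqrt(1 + 1 / n + n * (x - x_bar) ** 2 / (n * sxx - sx ** 2))
    margin = t_crit * s_est * spread
    return y_pred - margin, y_pred + margin

low, high = prediction_interval(x=60, y_pred=30.7, n=8, sx=494, sxx=31_692, s_est=12.03, t_crit=2.447)
print(f"{low:.3f} < y < {high:.3f}")  # -0.559 < y < 61.959
```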
Q2(i) State Debt and Per Capita Tax
a) Page 549 Ex. 10 – 1 No. 16
[Figure: scatter plot of y against x]
2(i) b) Page 549 Ex. 10 – 1 No. 16
Σx = 6545, Σy = 8416, Σxy = 11,247,109, Σx² = 9,635,035, Σy² = 14,351,678, n = 5
$$ r = \frac{5(11{,}247{,}109) - (6545)(8416)}{\sqrt{[5(9{,}635{,}035) - 6545^2][5(14{,}351{,}678) - 8416^2]}} = 0.518 $$
2(i) (c, d, e) Page 549 Ex. 10 – 1 No. 16 cont...
H0: ρ = 0, H1: ρ ≠ 0
r = 0.518; d.f. = 5 − 2 = 3, α = 0.05, c.v. = ±0.878
Since −0.878 < 0.518 < 0.878, r does not fall in the critical region.
Decision: Do not reject H0. There is no significant linear relationship between per capita debt and tax.
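For Q2 the same correlation formula, followed by a comparison with the Table I critical value 0.878 quoted on the slide, reproduces the decision. A minimal sketch; the helper name `pearson_r_from_sums` is our own.

```python
import math

def pearson_r_from_sums(n, sx, sy, sxx, syy, sxy):
    """Pearson r from Σx, Σy, Σx², Σy², Σxy and the number of pairs n."""
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

r = pearson_r_from_sums(n=5, sx=6545, sy=8416, sxx=9_635_035, syy=14_351_678, sxy=11_247_109)
critical_value = 0.878   # Table I, d.f. = 3, alpha = 0.05 (from the slide)
print(f"r = {r:.3f}, reject H0: {abs(r) > critical_value}")  # r = 0.518, reject H0: False
```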
Q2(ii) State Debt and Per Capita Tax, Page 549 Ex. 10 – 2 No. 16
From the hypothesis test, the null hypothesis is not rejected (r = 0.518 is not significant).
Therefore, there is no significant linear relationship between state debt and per capita tax, and no regression should be done.
No regression line, no prediction? When r is not significant, ......?........ is the best predictor of y.
Standard Error of the Estimate
The standard error of estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y' values. The formula for the standard error of estimate is:
$$ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n-2}} $$
or
$$ s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n-2}} $$
Q2(iii) Page 574 Ex. 10 – 3 No. 18 (State Debt and Per Capita Tax)
Since r is not significant, the standard error should not be calculated.
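Prediction Interval
$$ y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} $$
with d.f. = n − 2.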
Q2(iv) Page 574 Ex. 10 – 3 No. 22 (State Debt and Per Capita Tax)
Since r is not significant, the prediction interval should not be calculated.
Multiple Regression
In multiple regression, there are several independent variables and one dependent variable, and the equation is
$$ y' = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k $$
where x₁, x₂, …, xₖ are the independent variables.
Assumptions for Multiple Regression
1. Normality assumption: for any specific value of the independent variable, the values of the y variable are normally distributed.
2. Equal-variance assumption: the variances (or standard deviations) for the y variables are the same for each value of the independent variable.
3. Linearity assumption: there is a linear relationship between the dependent variable and the independent variables.
4. Nonmulticollinearity assumption: the independent variables are not correlated.
5. Independence assumption: the values for the y variables are independent.
Q3. Special Occasion Cakes, Page 581 Ex. 10 – 4 No. 8
$$ y' = -26.279 + 14.855x_1 + 3.1035x_2 + 0.73079x_3 $$
where
x₁ = number of layers desired
x₂ = number of servings needed
x₃ = amount of filling mix used
y = price of a cake
$$ y' = -26.279 + 14.855(3) + 3.1035(48) + 0.73079(40) = \$196.49 $$
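A multiple regression prediction is the intercept plus a weighted sum of the independent variables. The sketch below assumes the intercept is −26.279: the minus sign does not survive cleanly in the extracted slide, but it is the sign that reproduces the slide's answer of $196.49. `multiple_regression_predict` is a hypothetical helper.

```python
def multiple_regression_predict(a, coefficients, xs):
    """Return y' = a + b1*x1 + b2*x2 + ... + bk*xk."""
    if len(coefficients) != len(xs):
        raise ValueError("need exactly one coefficient per independent variable")
    return a + sum(b * x for b, x in zip(coefficients, xs))

# Q3 cake price: x1 = layers, x2 = servings, x3 = filling mix (values from the slide)
a = -26.279                        # intercept sign inferred from the $196.49 answer
b = [14.855, 3.1035, 0.73079]
price = multiple_regression_predict(a, b, [3, 48, 40])
print(f"${price:.2f}")  # $196.49
```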
Thank You