Chapter 5 : Introductory Linear Regression

Chapter 5 : Introductory Linear Regression - UniMAP Portalportal.unimap.edu.my/portal/page/portal30/Lecturer Notes/IMK... · INTRODUCTION TO LINEAR REGRESSION ... (1) Regression Analysis


INTRODUCTION TO LINEAR REGRESSION

• Regression is a statistical procedure for establishing the relationship between two or more variables.

• This is done by fitting a linear equation to the observed data.

• The regression line is used by the researcher to see the trend and to predict values for the data.

• There are two types of linear regression:

– Simple (2 variables)

– Multiple (more than 2 variables)


Many problems in science and engineering involve exploring the relationship between two or more variables.

Two statistical techniques:

(1) Regression Analysis

(2) Computing the Correlation Coefficient (r).

Linear regression is the study of the linear relationship between two or more variables.

This is done by fitting a linear equation to the observed data.

The linear equation is then used to predict values for the data.


In simple linear regression only two variables are involved:

i. X is the independent variable.

ii. Y is the dependent variable.

The correlation coefficient (r) tells us how strongly the two variables are linearly related.


Example 5.1:

1) A nutritionist studying weight loss programs might want to find out if reducing carbohydrate intake can help a person reduce weight.

a) X is the carbohydrate intake (independent variable).

b) Y is the weight (dependent variable).

2) An entrepreneur might want to know whether increasing the cost of packaging his new product will have an effect on the sales volume.

a) X is the packaging cost (independent variable).

b) Y is the sales volume (dependent variable).


SCATTER DIAGRAM

• A scatter plot is a graph of ordered pairs (x, y).

• The purpose of a scatter plot is to describe visually the nature of the relationship between the independent variable, X, and the dependent variable, Y.

• The independent variable, x, is plotted on the horizontal axis and the dependent variable, y, is plotted on the vertical axis.

[Figure: SCATTER DIAGRAM, Positive Linear Relationship. Axes: x (horizontal), E(y) (vertical). The regression line has intercept β0 and a positive slope β1.]

[Figure: SCATTER DIAGRAM, Negative Linear Relationship. Axes: x (horizontal), E(y) (vertical). The regression line has intercept β0 and a negative slope β1.]

[Figure: SCATTER DIAGRAM, No Relationship. Axes: x (horizontal), E(y) (vertical). The regression line has intercept β0 and slope β1 equal to 0.]

GRAPHICAL METHOD FOR DETERMINING REGRESSION

• A linear regression line can be developed from a freehand plot of the data.

Example 10.2:

The given table contains values for 2 variables, X and Y. Plot the given data and draw a freehand estimate of the regression line.

X   -3   -2   -1    0    1    2    3
Y    1    2    3    5    8   11   12


5.1 SIMPLE LINEAR REGRESSION MODEL

A linear regression model is a model that expresses the linear relationship between two variables.

The simple linear regression model is written as:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

where

$\hat{\beta}_0$ = intercept of the line with the Y-axis

$\hat{\beta}_1$ = slope of the line


5.2 INFERENCES ABOUT ESTIMATED PARAMETERS

LEAST SQUARES METHOD

The least squares method is the method most commonly used for estimating the regression coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$.

The straight line fitted to the data set is the line:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$

where $\hat{Y}$ is the estimated value of Y for a given value of X.


i) y-Intercept for the Estimated Regression Equation, $\hat{\beta}_0$:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$ respectively.


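ii) Slope for the Estimated Regression Equation, $\hat{\beta}_1$:

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$$

where

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$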
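The least squares formulas for the intercept and slope can be sketched in Python as follows. This is a minimal illustration rather than part of the original notes: the function names `sums_of_squares` and `fit_line` are hypothetical, and the data are the X and Y values from the freehand-plot example earlier in the chapter.

```python
def sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


def fit_line(x, y):
    """Return (b0, b1): the least squares intercept and slope."""
    s_xx, _, s_xy = sums_of_squares(x, y)
    b1 = s_xy / s_xx                               # slope = Sxy / Sxx
    b0 = sum(y) / len(y) - b1 * sum(x) / len(x)    # intercept = y-bar - b1 * x-bar
    return b0, b1


# Data from the freehand-plot example (X, Y table).
x = [-3, -2, -1, 0, 1, 2, 3]
y = [1, 2, 3, 5, 8, 11, 12]

b0, b1 = fit_line(x, y)
print(f"Y-hat = {b0:.4f} + {b1:.4f} X")
```

For this small data set the fitted line works out to an intercept of 6 and a slope of 2, which can be compared with a line drawn by eye through the plotted points.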


EXAMPLE 5.2: STUDENTS' SCORES IN MATHEMATICS

The data below represent the scores obtained by ten students in the subjects Mathematics 1 and Mathematics 2.

Math 1, x   65   63   76   46   68   72   68   57   36   96
Math 2, y   68   66   86   48   65   66   71   57   42   87

a) Develop an estimated linear regression model with “Math 1” as the independent variable and “Math 2” as the dependent variable.

b) Predict the score a student would obtain in “Math 2” if he scored 60 marks in “Math 1”.


Solution

          x     y      x²      y²      xy
         65    68    4225    4624    4420
         63    66    3969    4356    4158
         76    86    5776    7396    6536
         46    48    2116    2304    2208
         68    65    4624    4225    4420
         72    66    5184    4356    4752
         68    71    4624    5041    4828
         57    57    3249    3249    3249
         36    42    1296    1764    1512
         96    87    9216    7569    8352
TOTAL   647   656   44279   44884   44435


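$$n = 10, \qquad \sum_{i=1}^{10} x_i^2 = 44279$$

$$\sum_{i=1}^{10} x_i = 647, \qquad \bar{x} = 64.7, \qquad \sum_{i=1}^{10} y_i^2 = 44884$$

$$\sum_{i=1}^{10} y_i = 656, \qquad \bar{y} = 65.6, \qquad \sum_{i=1}^{10} x_i y_i = 44435$$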


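a) The estimated model is

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X, \qquad \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 44435 - \frac{(647)(656)}{10} = 1991.8$$

$$S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 44279 - \frac{(647)^2}{10} = 2418.1$$

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{1991.8}{2418.1} = 0.8237$$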


$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 65.6 - 0.8237(64.7) = 12.3063$$

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = 12.3063 + 0.8237X$$


b) When $X = 60$,

$$\hat{Y} = 12.3063 + 0.8237(60) = 61.7283 \approx 61.73$$
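The hand calculation for the Math 1 and Math 2 scores can be checked with a short Python sketch. The variable and function names here (`math1`, `math2`, `predict`) are illustrative choices, not taken from the notes. Because the code keeps the slope unrounded, the last decimal place of intermediate values may differ slightly from the hand-rounded figures.

```python
# Scores from Example 5.2.
math1 = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]  # x, independent variable
math2 = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]  # y, dependent variable
n = len(math1)

# Column totals of the working table.
sum_x, sum_y = sum(math1), sum(math2)
sum_x2 = sum(x * x for x in math1)
sum_xy = sum(x * y for x, y in zip(math1, math2))

# Sxy, Sxx, then slope and intercept.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n


def predict(x_new):
    """Predicted Math 2 score for a given Math 1 score."""
    return intercept + slope * x_new


print(f"Sxy = {s_xy:.1f}, Sxx = {s_xx:.1f}")
print(f"Y-hat = {intercept:.4f} + {slope:.4f} X")
print(f"Predicted Math 2 score at X = 60: {predict(60):.2f}")
```

The rounded slope, intercept and prediction agree with the values obtained by hand (0.8237, 12.3063 and 61.73).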


5.3 ADEQUACY OF THE MODEL: COEFFICIENT OF DETERMINATION (R²)

• The coefficient of determination measures the proportion of the variation in the dependent variable (Y) that is explained by the regression line and the independent variable (X).

• The symbol for the coefficient of determination is r² or R².

Example:

If r = 0.90, then r² = 0.81. This means that 81% of the variation in the dependent variable (Y) is accounted for by the variation in the independent variable (X).


• The rest of the variation, 0.19 or 19%, is unexplained; this proportion is called the coefficient of non-determination.

• The formula for the coefficient of non-determination is 1 − r².


The coefficient of determination is:

$$r^2 = \frac{\left(S_{xy}\right)^2}{S_{xx}\, S_{yy}}$$
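As a rough illustration of this formula, the sketch below computes r² and the coefficient of non-determination from the sums of squares quoted in Examples 5.2 and 5.4 of these notes. The function name `coefficient_of_determination` is a hypothetical helper, not something defined in the notes.

```python
def coefficient_of_determination(s_xy, s_xx, s_yy):
    """r^2 = Sxy^2 / (Sxx * Syy)."""
    return s_xy ** 2 / (s_xx * s_yy)


# Sums of squares for the Math 1 / Math 2 data (Examples 5.2 and 5.4).
r_squared = coefficient_of_determination(s_xy=1991.8, s_xx=2418.1, s_yy=1850.4)
non_determination = 1 - r_squared  # unexplained share of the variation

print(f"explained by the regression: {r_squared:.1%}")
print(f"unexplained:                 {non_determination:.1%}")
```

For these data roughly 89% of the variation in the Math 2 scores is explained by the Math 1 scores, leaving about 11% unexplained.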


5.4 PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT (r)

Correlation measures the strength of the linear relationship between two variables.

It is also known as Pearson’s product moment coefficient of correlation.

The symbol for the sample coefficient of correlation is r.

Formula:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}$$


Properties of r:

$$-1 \le r \le 1$$

Values of r close to 1 imply a strong positive linear relationship between x and y.

Values of r close to -1 imply a strong negative linear relationship between x and y.

Values of r close to 0 imply little or no linear relationship between x and y.


EXAMPLE 5.4: REFER TO THE PREVIOUS EXAMPLE 5.2.

Calculate the value of r and interpret its meaning.

SOLUTION:

$$S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 44884 - \frac{(656)^2}{10} = 1850.4$$

$$r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}} = \frac{1991.8}{\sqrt{2418.1 \times 1850.4}} = 0.9416$$

Thus, there is a strong positive linear relationship between the scores obtained in Math 1 (x) and Math 2 (y).
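The correlation coefficient for the Math 1 and Math 2 scores can also be computed directly from the raw data. The sketch below is one possible way to do this with the standard library only; the function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Scores from Example 5.2.
math1 = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
math2 = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]

r = pearson_r(math1, math2)
print(f"r = {r:.4f}")

# r does not depend on which variable is called x and which is called y.
print(f"r with the roles swapped = {pearson_r(math2, math1):.4f}")
```

Unlike the regression slope, r is symmetric in the two variables, so swapping the roles of Math 1 and Math 2 leaves its value unchanged.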


5.5 TEST FOR LINEARITY OF REGRESSION

• To test for the existence of a linear relationship between two variables x and y, we carry out a hypothesis test.

• Tests commonly used: t-Test and F-Test


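t-Test

1. Determine the hypotheses.

$H_0: \beta_1 = 0$ (no linear relationship)

$H_1: \beta_1 \neq 0$ (a linear relationship exists)

2. Compute the critical value at level of significance $\alpha$:

$$t_{\alpha/2,\; n-2}$$

3. Compute the test statistic.

$$t = \frac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}}, \qquad Var(\hat{\beta}_1) = \frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n-2} \cdot \frac{1}{S_{xx}}$$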


4. Determine the Rejection Rule.

Reject H0 if:

$$t < -t_{\alpha/2,\; n-2} \quad \text{or} \quad t > t_{\alpha/2,\; n-2}$$

5. Conclusion.

If H0 is rejected, there is a significant linear relationship between variables X and Y.


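EXAMPLE 5.5: REFER TO THE PREVIOUS EXAMPLE 5.2.

Test to determine whether the students' scores in Math 1 and Math 2 are related. Use α = 0.05.

SOLUTION:

1) $H_0: \beta_1 = 0$ (no linear relationship)

   $H_1: \beta_1 \neq 0$ (a linear relationship exists)

2) $\alpha = 0.05$, so the critical value is $t_{0.05/2,\; 8} = t_{0.025,\; 8} = 2.306$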


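3) Test statistic:

$$Var(\hat{\beta}_1) = \frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n-2} \cdot \frac{1}{S_{xx}} = \frac{1850.4 - (0.8237)(1991.8)}{8} \cdot \frac{1}{2418.1} = 0.0108$$

$$t_{test} = \frac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}} = \frac{0.8237}{\sqrt{0.0108}} = 7.926$$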


4) Rejection Rule:

$$t_{test} = 7.926 > t_{0.025,\; 8} = 2.306$$

5) Conclusion:

Thus, we reject H0. The score in Math 1 (x) has a linear relationship with the score in Math 2 (y).
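The five steps of the t-test can be sketched in Python for the Math 1 and Math 2 data. This is an illustration under stated assumptions: the function name `slope_t_statistic` is hypothetical, and the critical value 2.306 is simply the table value quoted in the example, since the Python standard library has no t-distribution. The code keeps full precision, so its t statistic (about 7.91) differs slightly from the hand value 7.926, which uses the variance rounded to 0.0108.

```python
import math


def slope_t_statistic(n, s_xx, s_yy, s_xy):
    """Return (t, Var(b1)) for testing H0: beta1 = 0 in simple linear regression."""
    b1 = s_xy / s_xx
    var_b1 = (s_yy - b1 * s_xy) / (n - 2) / s_xx
    return b1 / math.sqrt(var_b1), var_b1


# Sums of squares from Examples 5.2 and 5.4 (n = 10 students).
t_stat, var_b1 = slope_t_statistic(n=10, s_xx=2418.1, s_yy=1850.4, s_xy=1991.8)

t_crit = 2.306  # t_{0.025, 8}, read from a t table as in Example 5.5
reject_h0 = abs(t_stat) > t_crit  # two-sided rejection rule

print(f"Var(b1) = {var_b1:.4f}, t = {t_stat:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Either way the test statistic is far beyond the critical value, so the conclusion of the example is unchanged.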


F Test

1. Determine the hypothesis

2. Determine the rejection region

3. Compute the test statistics

4. Conclusion


1. Determine the hypothesis

$H_0: \beta_1 = 0$ (NO RELATIONSHIP)

$H_1: \beta_1 \neq 0$ (THERE IS A RELATIONSHIP)

2. Compute the critical value at level of significance $\alpha$:

$$F_{\alpha,\; 1,\; n-2}$$

3. Compute the test statistic

$$F_{test} = \frac{MSR}{MSE}$$


4. Determine the rejection region

We reject H0 if $f_{test} > F_{\alpha,\; 1,\; n-2}$ or if p-value < α.

5. Conclusion

If we reject H0, there is a significant relationship between variables X and Y.


ANOVA Test

1) State the hypothesis

2) Select the distribution to use: F-distribution

3) Calculate the value of the test statistic: F

4) Determine rejection and non-rejection regions

5) Make a decision: reject H0 / fail to reject H0

General form of ANOVA table:

Source of Variation   Degrees of Freedom (df)   Sum of Squares                   Mean Squares        Value of the Test Statistic
Regression            1                         $SSR = \hat{\beta}_1 S_{xy}$     MSR = SSR/1         F = MSR/MSE
Error                 n-2                       SSE = SST - SSR                  MSE = SSE/(n-2)
Total                 n-1                       $SST = S_{yy}$
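The entries of the ANOVA table can be computed with a few lines of Python. The sketch below applies them to the Math 1 and Math 2 sums of squares purely as an illustration (the notes themselves do not carry out an F-test on that data), and the function name `anova_table` is hypothetical. The critical value is taken as the square of the t-table value 2.306, using the fact that with 1 numerator degree of freedom $F_{\alpha,1,n-2} = t^2_{\alpha/2,n-2}$.

```python
def anova_table(n, s_xx, s_yy, s_xy):
    """Return the ANOVA quantities for simple linear regression as a dict."""
    b1 = s_xy / s_xx
    ssr = b1 * s_xy          # regression sum of squares, df = 1
    sst = s_yy               # total sum of squares, df = n - 1
    sse = sst - ssr          # error sum of squares, df = n - 2
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "F": msr / mse}


# Sums of squares for the Math 1 / Math 2 data (n = 10), used here for illustration.
table = anova_table(n=10, s_xx=2418.1, s_yy=1850.4, s_xy=1991.8)

# With 1 numerator df, the F critical value equals the squared t critical value.
f_crit = 2.306 ** 2  # F_{0.05, 1, 8}, derived from t_{0.025, 8} = 2.306
reject_h0 = table["F"] > f_crit

for name, value in table.items():
    print(f"{name:>3} = {value:10.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

In simple linear regression the F statistic is the square of the t statistic for the slope, so the F-test and the two-sided t-test always lead to the same decision.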


Example

The manufacturer of Cardio Glide exercise equipment wants to study the relationship between the number of months since the glide was purchased and the length of time the equipment was used last week.

1) Determine the regression equation.

2) At α = 0.01, test whether there is a linear relationship between the variables.


Solution (1):

Regression equation:

$$\hat{Y} = 9.939 - 0.637X$$


Solution (2):

1) Hypothesis:

$H_0: \beta_1 = 0$

$H_1: \beta_1 \neq 0$

2) F-distribution table: $F_{0.01,\; 1,\; 8} = 11.2586$

3) Test statistic:

F = MSR/MSE = 17.303

or, using the p-value approach: significance value = 0.003

4) Rejection region:

Since F statistic > F table (17.303 > 11.2586), we reject H0; equivalently, since the p-value is less than α (0.003 < 0.01), we reject H0.

5) Thus, there is a linear relationship between the number of months and the length of time the equipment was used.


EXERCISE 5.1:

The owner of a small factory that produces working gloves is concerned about the high cost of air conditioning in the summer. Keeping the temperature in the factory higher may lower productivity. During the summer, he conducted an experiment with temperature settings from 68 to 81 degrees Fahrenheit and measured each day’s productivity, which produced the following table:

Temperature                                72   71   78   75   81   77   68   76
Number of pairs of gloves (in hundreds)    37   37   32   36   33   35   39   34

(a) Find the regression model.

(b) Predict the number of pairs of gloves produced if x = 74.

(c) Compute the Pearson correlation coefficient. What can you say about the relationship between the two variables?

(d) Can you conclude that the temperature is linearly related to the number of pairs of gloves produced? Use α = 0.05.


EXERCISE 5.2 :

An agricultural scientist planted alfalfa on several plots of land,

identical except for the soil pH. Following are the dry matter

yields (in pounds per acre) for each plot.

pH Yield

4.6 1056

4.8 1833

5.2 1629

5.4 1852

5.6 1783

5.8 2647

6.0 2131


a) Compute the estimated regression line for predicting yield from pH.

b) If the pH is increased by 0.1, by how much would you predict the yield to increase or decrease?

c) For what pH would you predict a yield of 1500 pounds per acre?

d) Calculate the correlation coefficient and interpret the result.