31
Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Embed Size (px)

Citation preview

Page 1: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Applied Quantitative Analysis and Practices

LECTURE#23

By

Dr. Osman Sadiq Paracha

Page 2: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Previous Lecture Summary Simple Linear Regression Correlation Vs Regression Introduction to Simple Linear Regression Simple Linear Regression Model Least Square Method Interpretation of Model Measures of variation

Page 3: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Simple Linear Regression Model

Only one independent variable, X Relationship between X and Y is

described by a linear function Changes in Y are assumed to be related

to changes in X

Page 4: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

ii10i εXββY Linear component

Simple Linear Regression Model

Population Y intercept

Population SlopeCoefficient

Random Error term

Dependent Variable

Independent Variable

Random Error component

Page 5: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

(continued)

Random Error for this Xi value

Y

X

Observed Value of Y for Xi

Predicted Value of Y for Xi

ii10i εXββY

Xi

Slope = β1

Intercept = β0

εi

Simple Linear Regression Model

Page 6: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

i10i XbbY

The simple linear regression equation provides an estimate of the population regression line

Simple Linear Regression Equation (Prediction Line)

Estimate of the regression

intercept

Estimate of the regression slope

Estimated (or predicted) Y value for observation i

Value of X for observation i

Page 7: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

The Least Squares Method

b0 and b1 are obtained by finding the values of

that minimize the sum of the squared

differences between Y and :

2i10i

2ii ))Xb(b(Ymin)Y(Ymin

Y

Page 8: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

b0 is the estimated average value of Y

when the value of X is zero

b1 is the estimated change in the

average value of Y as a result of a one-unit increase in X

Interpretation of the Slope and the Intercept

Page 9: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

0

50

100

150

200

250

300

350

400

450

0 500 1000 1500 2000 2500 3000

Square Feet

Ho

use

Pri

ce (

$100

0s)

Simple Linear Regression Example: Graphical Representation

House price model: Scatter Plot and Prediction Line

feet) (square 0.10977 98.24833 price house

Slope = 0.10977

Intercept = 98.248

Page 10: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

0

50

100

150

200

250

300

350

400

450

0 500 1000 1500 2000 2500 3000

Square Feet

Ho

use

Pri

ce (

$100

0s)

Simple Linear Regression Example: Making Predictions

When using a regression model for prediction, only predict within the relevant range of data

Relevant range for interpolation

Do not try to extrapolate

beyond the range of observed X’s

Page 11: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Measures of Variation

Total variation is made up of two parts:

SSE SSR SST Total Sum of

SquaresRegression Sum

of SquaresError Sum of

Squares

2i )YY(SST 2

ii )YY(SSE 2i )YY(SSR

where:

= Mean value of the dependent variable

Yi = Observed value of the dependent variable

= Predicted value of Y for the given Xi valueiY

Y

Page 12: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

SST = total sum of squares (Total Variation)

Measures the variation of the Yi values around their mean Y

SSR = regression sum of squares (Explained Variation) Variation attributable to the relationship between X

and Y SSE = error sum of squares (Unexplained Variation)

Variation in Y attributable to factors other than X

(continued)

Measures of Variation

Page 13: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

(continued)

Xi

Y

X

Yi

SST = (Yi - Y)2

SSE = (Yi - Yi )2

SSR = (Yi - Y)2

_

_

_

Y

Y

Y_Y

Measures of Variation

Page 14: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable

The coefficient of determination is also called r-squared and is denoted as r2

Coefficient of Determination, r2

1r0 2 note:

squares of sum total

squares of sum regression2 SST

SSRr

Page 15: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

r2 = 1

Examples of Approximate r2 Values

Y

X

Y

X

r2 = 1

r2 = 1

Perfect linear relationship between X and Y:

100% of the variation in Y is explained by variation in X

Page 16: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Examples of Approximate r2 Values

Y

X

Y

X

0 < r2 < 1

Weaker linear relationships between X and Y:

Some but not all of the variation in Y is explained by variation in X

Page 17: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Examples of Approximate r2 Values

r2 = 0

No linear relationship between X and Y:

The value of Y does not depend on X. (None of the variation in Y is explained by variation in X)

Y

Xr2 = 0

Page 18: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Simple Linear Regression Example: Coefficient of Determination

Regression Statistics

Multiple R 0.76211

R Square 0.58082

Adjusted R Square 0.52842

Standard Error 41.33032

Observations 10

ANOVA  df SS MS F Significance F

Regression 1 18934.9348 18934.9348 11.0848 0.01039

Residual 8 13665.5652 1708.1957

Total 9 32600.5000      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

58.08% of the variation in house prices is explained by

variation in square feet

0.5808232600.5000

18934.9348

SST

SSRr 2

Page 19: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Standard Error of Estimate

The standard deviation of the variation of observations around the regression line is estimated by

2

)ˆ(

21

2

n

YY

n

SSES

n

iii

YX

WhereSSE = error sum of squares n = sample size

Page 20: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Simple Linear Regression Example:Standard Error of Estimate

Regression Statistics

Multiple R 0.76211

R Square 0.58082

Adjusted R Square 0.52842

Standard Error 41.33032

Observations 10

ANOVA  df SS MS F Significance F

Regression 1 18934.9348 18934.9348 11.0848 0.01039

Residual 8 13665.5652 1708.1957

Total 9 32600.5000      

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

41.33032SYX

Page 21: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Comparing Standard Errors

YY

X XYX

S smallYX

S large

SYX is a measure of the variation of observed Y values from the regression line

The magnitude of SYX should always be judged relative to the size of the Y values in the sample data

i.e., SYX = $41.33K is moderately small relative to house prices in the $200K - $400K range

Page 22: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Assumptions of RegressionL.I.N.E

Linearity The relationship between X and Y is linear

Independence of Errors Error values are statistically independent

Normality of Error Error values are normally distributed for any given

value of X Equal Variance (also called homoscedasticity)

The probability distribution of the errors has constant variance

Page 23: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Residual Analysis

The residual for observation i, ei, is the difference between its observed and predicted value

Check the assumptions of regression by examining the residuals

Examine for linearity assumption Evaluate independence assumption Evaluate normal distribution assumption Examine for constant variance for all levels of X

(homoscedasticity)

Graphical Analysis of Residuals Can plot residuals vs. X

iii YYe

Page 24: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Residual Analysis for Linearity

Not Linear Linear

x

resi

dua

ls

x

Y

x

Y

x

resi

dua

ls

Page 25: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Residual Analysis for Independence

Not IndependentIndependent

X

Xresi

dua

ls

resi

dua

ls

X

resi

dua

ls

Page 26: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Checking for Normality

Examine the Stem-and-Leaf Display of the Residuals

Examine the Boxplot of the Residuals Examine the Histogram of the Residuals Construct a Normal Probability Plot of the

Residuals

Page 27: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Residual Analysis for Normality

Percent

Residual

When using a normal probability plot, normal errors will approximately display in a straight line

-3 -2 -1 0 1 2 3

0

100

Page 28: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Residual Analysis for Equal Variance

Non-constant variance Constant variance

x x

Y

x x

Y

resi

dua

ls

resi

dua

ls

Page 29: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

House Price Model Residual Plot

-60

-40

-20

0

20

40

60

80

0 1000 2000 3000

Square Feet

Re

sid

ua

ls

Simple Linear Regression Example:

RESIDUAL OUTPUT

Predicted House Price Residuals

1 251.92316 -6.923162

2 273.87671 38.12329

3 284.85348 -5.853484

4 304.06284 3.937162

5 218.99284 -19.99284

6 268.38832 -49.38832

7 356.20251 48.79749

8 367.17929 -43.17929

9 254.6674 64.33264

10 284.85348 -29.85348

Does not appear to violate any regression assumptions

Page 30: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Inferences About the Slope

The standard error of the regression slope coefficient (b1) is estimated by

2i

YXYXb

)X(X

S

SSX

SS

1

where:

= Estimate of the standard error of the slope

= Standard error of the estimate

1bS

2n

SSESYX

Page 31: Applied Quantitative Analysis and Practices LECTURE#23 By Dr. Osman Sadiq Paracha

Lecture Summary Measures of Variation (Graphical

Representation) Co-efficient of Determination (r2) Example of Co-efficient of Determination Standard Error Estimate Assumptions of Regression L.I.N.E

Linearity Independence of Errors Normality of Errors Equal Variance

Inference about slope