
• We use sample data to
  • estimate a population mean (μ) or a difference in means (μ1 - μ2)
  • estimate a population proportion (p) or a difference in proportions (p1 - p2)
  • test hypotheses about μ or (μ1 - μ2)
  • test hypotheses about p or (p1 - p2).

• Now we want to use sample data to investigate the relationships among a group of variables and to create a mathematical model that can be used to predict the value of one of those variables in the future.

• The process of finding a mathematical model (an equation) that best fits the data is known as regression analysis.

Introduction to Regression Analysis


• The variable to be predicted (or modeled), y, is called the dependent variable.

• The variables used to predict (or model) y are called independent variables and are denoted by the symbols x1, x2, x3, etc.

• General form of probabilistic model in regression:

$$y = \mu_{y|x_1,x_2,\ldots,x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where

y = dependent variable

$\mu_{y|x_1,x_2,\ldots,x_k}$ = mean or expected value of y, deterministic component

$\varepsilon$ = unexplainable, or random error component

• Estimation/prediction equation:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$


Form of the Simple Linear Regression Model

$$y = \mu_{y|x} + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$$

$\mu_{y|x} = \beta_0 + \beta_1 x$ is the mean value of the dependent variable y when the value of the independent variable is x

β0 is the y-intercept, the mean of y when x is 0 (meaningful only when values of x near 0 have been observed)

β1 is the slope, the change in the mean of y per unit change in x (over the range of sample x-values)

ε is an error term that describes the effect on y of all factors other than x


The Simple Linear Regression Model Illustrated


Regression Terms

• β0 and β1 are called regression parameters

• β0 is the y-intercept and β1 is the slope

• We do not know the true values of these parameters

• So, we must use sample data to estimate them

• b0 is the estimate of β0 and b1 is the estimate of β1


The Least Squares Point Estimates

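Estimation/prediction equation:

$$\hat{y} = b_0 + b_1 x$$

Slope:

$$b_1 = \frac{SS_{xy}}{SS_{xx}}$$

y-intercept:

$$b_0 = \bar{y} - b_1 \bar{x}$$

where

$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\,\bar{x}\,\bar{y}$$

$$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n\,\bar{x}^2$$

$$\bar{x} = \frac{\sum x_i}{n}, \qquad \bar{y} = \frac{\sum y_i}{n}, \qquad n = \text{sample size}$$

MS EXCEL: =SLOPE(y range, x range) =INTERCEPT(y range, x range)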
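
The least squares formulas for the slope and y-intercept can be sketched in a few lines of Python. This is only an illustration: the function name `least_squares` and the five data points are made up for the example and do not come from the slides.

```python
def least_squares(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Computational (shortcut) forms of SSxy and SSxx
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    b1 = ss_xy / ss_xx          # slope
    b0 = y_bar - b1 * x_bar     # y-intercept
    return b0, b1


# Hypothetical sample data, chosen so the arithmetic is easy to check by hand
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

For this sample, SSxy = 6 and SSxx = 10, so the fitted line is ŷ = 2.20 + 0.60x.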


An Estimator of $\sigma^2$

$$s^2 = \frac{SSE}{n-2}$$

where

n = sample size

s = standard deviation of error = standard error of estimate

$$SSE = \sum (y_i - \hat{y}_i)^2 = SS_{yy} - b_1 SS_{xy} = \sum y_i^2 - n\,\bar{y}^2 - b_1 SS_{xy}$$
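
As a rough check that the two forms of SSE agree, the sketch below computes SSE both directly from the residuals and from the shortcut SSyy - b1·SSxy, then takes s as the square root of SSE/(n-2). The function name and the data are illustrative assumptions, not part of the slides.

```python
import math


def error_sums(x, y):
    """Return (SSE from residuals, SSE from the shortcut formula, s)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    ss_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # Direct definition: sum of squared residuals
    sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Shortcut: SSE = SSyy - b1 * SSxy
    sse_shortcut = ss_yy - b1 * ss_xy
    s = math.sqrt(sse_shortcut / (n - 2))   # standard error of estimate
    return sse_direct, sse_shortcut, s


# Hypothetical sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sse_direct, sse_shortcut, s = error_sums(x, y)
print(round(sse_direct, 4), round(sse_shortcut, 4), round(s, 4))
```

Both routes give SSE = 2.4 for this sample, so s² = 2.4/3 = 0.8.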


A 100(1-α)% confidence interval for the simple linear regression slope β1

$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$

where

$$s_{b_1} = \frac{s}{\sqrt{SS_{xx}}}$$

and $t_{\alpha/2}$ is based on (n-2) degrees of freedom
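
A minimal sketch of the slope interval follows. It assumes the critical value t(α/2) is looked up in a t table and passed in by hand, since the Python standard library has no t distribution. The function name and the summary values (b1 = 0.6, s² = 0.8, SSxx = 10, n = 5) are hypothetical.

```python
import math


def slope_confidence_interval(b1, s, ss_xx, t_crit):
    """Return (lower, upper) for b1 +/- t_crit * s_b1, with s_b1 = s / sqrt(SSxx)."""
    s_b1 = s / math.sqrt(ss_xx)   # standard error of the slope
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin


# Hypothetical summary values for a sample of n = 5, so n - 2 = 3 degrees of freedom.
# t_crit = 3.182 is the tabled t value for a 95% interval with 3 degrees of freedom.
lower, upper = slope_confidence_interval(b1=0.6, s=math.sqrt(0.8), ss_xx=10.0, t_crit=3.182)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

With so few observations the interval is wide, running from about -0.30 to 1.50, and it contains 0.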


Testing the Significance of the Slope

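One-Tailed Test

$H_0\colon \beta_1 = 0$

$H_a\colon \beta_1 < 0$ (or $\beta_1 > 0$)

Rejection region: $t < -t_{\alpha}$ (or $t > t_{\alpha}$), where $t_{\alpha}$ is based on (n-2) degrees of freedom

Two-Tailed Test

$H_0\colon \beta_1 = 0$

$H_a\colon \beta_1 \neq 0$

Rejection region: $|t| > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on (n-2) degrees of freedom

Test Statistic:

$$t = \frac{b_1}{s_{b_1}}$$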
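
The two-tailed version of this test can be sketched as below. The critical value is again assumed to come from a t table, and the function name and input values are hypothetical.

```python
def slope_t_test(b1, s_b1, t_crit):
    """Two-tailed test of H0: beta1 = 0. Returns (t statistic, reject H0?)."""
    t = b1 / s_b1
    return t, abs(t) > t_crit


# Hypothetical values: b1 = 0.6 and s_b1 = 0.2828 from a sample of n = 5.
# t_crit = 3.182 is the tabled t value for alpha = 0.05 (two-tailed) with 3 degrees of freedom.
t_stat, reject = slope_t_test(b1=0.6, s_b1=0.2828, t_crit=3.182)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

Here |t| is about 2.12, which is below 3.182, so this small hypothetical sample does not give enough evidence to reject H0 at the 5% level.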


The 100(1-α)% confidence interval for the mean value of y for x = x_p

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$$

where $t_{\alpha/2}$ is based on (n-2) degrees of freedom


The 100(1-α)% prediction interval for an individual y for x = x_p

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$$

where $t_{\alpha/2}$ is based on (n-2) degrees of freedom
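
The interval for the mean of y and the interval for an individual y differ only by the extra 1 under the square root. The sketch below computes both half-widths side by side so the difference is visible. The function name, the summary values, and the tabled critical value are assumptions made for the illustration.

```python
import math


def interval_half_widths(x_p, n, x_bar, ss_xx, s, t_crit):
    """Return (half-width of CI for the mean of y, half-width of PI for an individual y)."""
    distance = (x_p - x_bar) ** 2 / ss_xx
    ci_half = t_crit * s * math.sqrt(1 / n + distance)
    pi_half = t_crit * s * math.sqrt(1 + 1 / n + distance)
    return ci_half, pi_half


# Hypothetical fit: y-hat = 2.2 + 0.6x from n = 5 points with x-bar = 3, SSxx = 10, s^2 = 0.8.
# t_crit = 3.182 is the tabled t value for 95% with 3 degrees of freedom.
x_p = 4
y_hat = 2.2 + 0.6 * x_p
ci_half, pi_half = interval_half_widths(x_p, n=5, x_bar=3.0, ss_xx=10.0,
                                        s=math.sqrt(0.8), t_crit=3.182)
print(f"mean of y at x = {x_p}:   {y_hat:.2f} +/- {ci_half:.3f}")
print(f"individual y at x = {x_p}: {y_hat:.2f} +/- {pi_half:.3f}")
```

The prediction interval is always the wider of the two, and both widen as x_p moves away from the sample mean of x.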


Simple Coefficient of Determination

$$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$

About 100(r2)% of the sample variation in y can be explained by using x to predict y in the simple linear regression model.

[Figure: plot of y against x marking, at a point x_i, the observed value y_i, the predicted value ŷ_i, and the total, explained, and unexplained variation.]
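
The ratio of explained to total variation can be computed directly from a fitted line, as in the sketch below. The function name and data are hypothetical and only meant to make the definition concrete.

```python
def coefficient_of_determination(x, y):
    """Return r^2 = explained variation / total variation for a simple linear fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)   # sum of (y-hat_i - y-bar)^2
    total = sum((yi - y_bar) ** 2 for yi in y)           # sum of (y_i - y-bar)^2
    return explained / total


# Hypothetical sample data
r_squared = coefficient_of_determination([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r^2 = {r_squared:.2f}")
```

For this sample the explained variation is 3.6 out of a total of 6, so about 60% of the sample variation in y is explained by x.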


The coefficient of correlation

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$$

where

$$SS_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\,\bar{y}^2$$

• r is used for the sample and ρ (rho) for the population
• -1 ≤ r ≤ 1
• r > 0 means that y increases as x increases
• r < 0 means that y decreases as x increases
• r ≈ 0 means little or no linear relationship between y and x
• The closer r is to 1 or -1, the stronger the linear relationship
• High correlation does not imply causality; it indicates only that a linear trend may exist between x and y
• $r = +\sqrt{r^2}$ when b1 > 0, and $r = -\sqrt{r^2}$ when b1 < 0
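
A short sketch of the correlation formula is given below, with one increasing and one decreasing hypothetical data set to show how the sign of r follows the direction of the trend. The function name and both data sets are assumptions for illustration.

```python
import math


def correlation(x, y):
    """Return r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical data: y tends to rise with x in the first set and fall in the second
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2, 4, 5, 4, 5])
r_down = correlation(x, [5, 4, 5, 4, 2])
print(f"r = {r_up:.4f}, r^2 = {r_up ** 2:.2f}")
print(f"r = {r_down:.4f}, r^2 = {r_down ** 2:.2f}")
```

Both sets give the same r² of 0.60, but r is about +0.77 for the rising trend and about -0.77 for the falling one.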


Exercise

• What is the range of values that the coefficient of determination can assume? ___

• If the value of r is -0.96, what does this indicate about the dependent variable as the independent variable increases? __

• If the correlation between sales and advertising is +0.6, what percent of the variation in sales can be attributed to advertising? __

• What does the coefficient of determination equal if r = 0.89?


Exercise

• In the regression equation, what does the letter "b" represent?

• What is the null hypothesis to test the significance of the slope in a regression equation?

• The regression equation is Ŷ = 29.29 - 0.96X, the sample size is 8, and the standard error of the slope is 0.22. What is the test statistic to test the significance of the slope?


Exercise

• Page 488 no. 26
• Page 494 no. 31
• Page 500 no. 38
• Page 502 no. 46
• Page 506 no. 56