Three Variable Regression Model
• Yi = B1 + B2X2i + B3X3i (nonstochastic form, PRF)
• Yi = B1 + B2X2i + B3X3i + ui (stochastic form)
• B2, B3 called partial regression or partial slope coefficients
• B2 measures the change in the mean value of Y per unit change in X2, holding the value of X3 constant
• Yi = b1+b2X2i+b3X3i+ei SRF
Assumptions
• Linear relationship
• Xs are non-stochastic variables.
• No linear relationship exists between two or more independent variables (no multicollinearity). Ex: X2i = 3 + 2X3i
• The error has zero expected value, constant variance, and is normally distributed
• RSS = ∑ei2 = ∑(Yi – Ŷi)2 = ∑(Yi – b1 – b2X2i – b3X3i)2
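As a quick sketch of how the estimates that minimize this RSS are computed in practice (using NumPy and synthetic data, since the slides give no dataset here; the "true" coefficients below are made up for illustration):

```python
import numpy as np

# Synthetic data for illustration only (not from the slides)
rng = np.random.default_rng(0)
n = 50
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 5, n)
u = rng.normal(0, 1, n)
Y = 2.0 + 1.5 * X2 - 0.8 * X3 + u          # B1 = 2, B2 = 1.5, B3 = -0.8

# Design matrix: a column of ones for b1, then X2 and X3
X = np.column_stack([np.ones(n), X2, X3])

# Least squares: b minimizes RSS = sum((Yi - b1 - b2*X2i - b3*X3i)^2)
b, rss, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
b1, b2, b3 = b
print("b1, b2, b3 =", b1, b2, b3)
```

With enough observations and well-behaved errors, the estimates land close to the coefficients used to generate the data.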
Least squares estimators
• Like 2-variable case, we can derive formulae for var(b1), var(b2) & var(b3) and hence their S.E.s
• We can also estimate σ2 as σ̂2 = ∑ei2/(n – 3)
• Goodness of fit: R2 = ESS/TSS
R2 = [b2∑yix2i + b3∑yix3i]/∑yi2 (lowercase letters denote deviations from sample means)
• 0 ≤ R2 ≤ 1
Testing of hypothesis, t-test
• Say, Ŷi = -1336.09 + 12.7413X2i+85.7640X3i
(175.2725) (0.9123) (8.8019)
p=0.000 0.000 0.000
R2 = 0.89, n =32
• H0: B1=0, b1/se(b1)~ t(n-3)
• H0: B2=0, b2/se(b2)~ t(n-3)
• H0: B3=β, (b3 - β) /se(b3)~ t(n-3)
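The t statistics for this example can be reproduced directly from the reported coefficients and standard errors (a sketch; the 5% two-sided critical value for t(29) is an approximate table value):

```python
# t tests for the example above. H0: B = 0; under H0, b/se(b) ~ t(n - 3).
n = 32
t_crit = 2.045  # approx. 5% two-sided critical value for t(29), from a t table

coefs = {"B1": (-1336.09, 175.2725),
         "B2": (12.7413, 0.9123),
         "B3": (85.7640, 8.8019)}

t_stats = {name: b / se for name, (b, se) in coefs.items()}
for name, t in t_stats.items():
    verdict = "reject H0" if abs(t) > t_crit else "cannot reject H0"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

All three |t| values far exceed 2.045, consistent with the reported p-values of 0.000.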
Testing Joint Hypothesis, F Test
H0 : B2 = B3 = 0
Or, H0 : R2 = 0
• X2 & X3 explain zero percent of the variation of Y
H1: At least one B ≠ 0
• A test of either hypothesis is called a test of overall significance of the estimated multiple regression
• We know, TSS = ESS + RSS
• The test statistic is F = [ESS/(k – 1)]/[RSS/(n – k)], which follows the F(k – 1, n – k) distribution under H0
F test
• If the computed F value exceeds the critical F value, we reject the null hypothesis that the impact of the explanatory variables is simultaneously equal to zero
• Otherwise, we cannot reject the null hypothesis
• It may happen that not all the explanatory variables individually have much impact on the dependent variable (i.e., some of the t values may be statistically insignificant), yet all of them collectively influence the dependent variable (H0 is rejected in the F test)
• This happens only when we have the problem of multicollinearity
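A sketch of the F computation using this example's R2 = 0.89, n = 32, k = 3 (the critical value is an approximate table value):

```python
# F test for H0: B2 = B3 = 0, written in terms of R^2
# (equivalent to [ESS/(k-1)]/[RSS/(n-k)], since R^2 = ESS/TSS):
# F = [R^2/(k-1)] / [(1-R^2)/(n-k)]
r2, n, k = 0.89, 32, 3
F = (r2 / (k - 1)) / ((1 - r2) / (n - k))

f_crit = 3.33  # approx. 5% critical value for F(2, 29), from an F table
print(round(F, 1), "reject H0" if F > f_crit else "cannot reject H0")
```

The computed F (about 117) dwarfs the critical value, so the regressors are jointly significant.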
Specification error
• In this example we have seen that both explanatory variables are, individually and collectively, significantly different from zero
• If we omit either of these explanatory variables from our model, we commit a specification error
• What would be b1, b2 & R2 in 2-variable model?
Specification error
• Ŷi = -1336.09 + 12.7413X2i + 85.7640X3i
(175.2725) (0.9123) (8.8019)
p = 0.000 0.000 0.000
R2 = 0.89, n = 32
• Ŷi = -191.66 + 10.48X2i
(264.43) (1.79)
R2 = 0.53
• Ŷi = 807.95 + 54.57X3i
(231.95) (23.57)
R2 = 0.15
R2 versus Adjusted R2
• The larger the number of explanatory variables in the model, the higher the R2 will be
• However, R2 does not take into account dof
• Therefore, comparing the R2 values of two models with the same dependent variable but different numbers of explanatory variables is essentially like comparing apples and bananas
• We need a measure of fit that is adjusted for the no. of explanatory variables in the model
R2 versus Adjusted R2
• Such a measure is called Adj R2
• If k > 1, Adj R2 ≤ R2; as the number of explanatory variables in the model increases, Adj R2 becomes increasingly smaller than R2
• It enables us to compare two models that have the same dependent variable but different numbers of independent variables
• In our example, it can be shown that Adj R2=0.88 < 0.89 (R2)
Adj R2 = 1 – (1 – R2)(n – 1)/(n – k)
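Plugging this example's numbers into the adjusted R2 formula reproduces the 0.88 reported on the earlier slide:

```python
# Adj R2 = 1 - (1 - R2)(n - 1)/(n - k), with this example's values
r2, n, k = 0.89, 32, 3
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
print(round(adj_r2, 2))   # 0.88, as reported on the slide
```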
When to add an additional variable?
• We are often faced with the problem of deciding among several competing explanatory variables
• Common practice is to add variables as long as Adj R2 increases even though its numerical value may be smaller than R2
Computer output & Reporting
The Chicken Consumption Example
• Explain US Consumption of Chicken
• Time Series Observations - 1950-1984
Variable Definitions
• CHCONS - Chicken consumption in the US
• LDY - Log of disposable income in the US
• PC/PB - Price of Chicken relative to the Price of ‘Best Red Meat’
Data Time plots
Actual plots of the data over time follow
• Note the trends and cycles
• What are the relationships between the variables?
• Are movements in CHCONS related to movements in LDY and PC/PB?
[Figure: Time plot - CHCONS Actual Data (CHCONS vs. Year, 1950 – 1984)]
[Figure: Timeplot - LDY Actual Data (LDY vs. Year)]
[Figure: Timeplot - PC/PB Actual Data (PC/PB vs. Year, 1950 – 1983)]
Chicken Consumption vs. Income
• There may be a relationship between CHCONS and LDY
• A simple plot of the two variables seems to reveal this
• Note the positive relationship
[Figure: Scatter Plot - CHCONS vs. LYD]
Chicken Consumption vs. Relative Price of Chicken
• There may also be a relationship between CHCONS and PC/PB
• A plot of these two variables shows the relationship
• Note the negative relationship
[Figure: Scatter Plot - CHCONS vs. PC/PB]
CHCONS = f(LDY)
• Simple linear regression captures the relationship between CHCONS and LDY, assuming no other relationships
• This regression explains much of the change in CHCONS, but not everything
• The plotted regression line shows the hypothesized relationship and the actual data
CHCONS = f(LDY)

          LDY      Const.
Coeff     15.86    -92.17
SE(b)      0.53      4.34

R2 = 0.9641, SE(y) = 2.03, F = 879.05, df = 33
SSReg = 3639.12 (also called SSE), SSResid = 136.61 (also called SSR)
[Figure: Regression Line - CHCONS = f(LYD), fitted line with CHCONS = f(LYD) actual data]
CHCONS = f(PC/PB)
• Another simple regression examines the relationship between CHCONS and PC/PB
• While the line explains some of the variation of CHCONS, there is more unexplained error
CHCONS = f(PC/PB)
          PC/PB    Const.
Coeff     -28.83    50.77
SE(b)       2.93     1.75

R2 = 0.746, SE(y) = 5.39, F = 97.14, df = 33
SSReg = 2818.32 (also called ESS), SSResid = 957.42 (also called RSS)
[Figure: Regression Line - CHCONS = f(PC/PB), fitted line with CHCONS = f(PC/PB) actual data]
CHCONS = f(LDY, PC/PB)

          LDY     PC/PB    Const.
Coeff     12.79   -8.08    -63.19
SE(b)      0.54    1.12      4.84

R2 = 0.986, SE(y) = 1.27, F = 1149.89, df = 32
SSReg = 3723.92 (SSE), SSResid = 51.82 (SSR)
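As a consistency check on this output (a sketch using only the reported sums of squares), R2 can be recovered from SSReg and SSResid:

```python
# R^2 = SSReg / TSS = SSReg / (SSReg + SSResid)
ss_reg, ss_resid = 3723.92, 51.82
r2 = ss_reg / (ss_reg + ss_resid)
print(round(r2, 3))   # 0.986, matching the reported value
```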
[Figure: Actual vs. Predicted - CHCONS over time: actual data and CHCONS = f(LDY, PC/PB) predictions]
• Table 7.8 Gujarati: US Defense budget outlays 1962 – 1981
Yt= Defense budget outlays for year t ($ Bn)
X2t=GNP for year t ($ Bn)
X3t=US military sales/assistance ($ Bn)
X4t=Aerospace industry sales ($ Bn)
X5t = Military conflicts involving troops (dummy variable)
  = 0, if troops < 100000
  = 1, if troops > 100000
• Table 8.10, Gujarati: data used by a telephone cable manufacturer to predict sales to a major consumer for the period 1968 – 1983
Y = annual sales in MPF (million paired feet)
X2 = GNP (billion $)
X3 = housing starts (1000s of units)
X4 = unemployment rate (%)
X5 = prime rate lagged 6 months
X6 = customer line gains (%) (introduced later)
• Table 7.10, Gujarati
Consider the following demand function for money in the US for 1980 – 1998:

Mt = b1 · Yt^b2 · rt^b3 · e^ut

where M = real money demand, Y = real GDP, r = interest rate (LTRATE: long-term interest rate, 30-year Treasury bond; TBRATE: 3-month Treasury bill rate)