Topic 4
Ordinary Least Squares
• Suppose that $X$ is a non-random variable.
• $Y$ is a random variable that is affected by $X$ in a linear fashion and by the random variable $\varepsilon$ with $E(\varepsilon) = 0$. That is,

$E(Y) = \beta_1 + \beta_2 X$

Or, $Y = \beta_1 + \beta_2 X + \varepsilon$
[Figures (slide sequence): scatter plots with $Y$ on the vertical axis and $X$ on the horizontal axis, showing the observed points, the actual line $Y = \beta_1 + \beta_2 X$, and the fitted line $Y = b_1 + b_2 X$. Three labelled points A, B and C illustrate that BC is an error of estimation and AC is an effect of the random factor.]
• The Ordinary Least Squares (OLS) estimates are obtained by minimising the sum of the squares of each of these errors.
• The OLS estimates are obtained from the values of $X$ and the actual $Y$ values ($Y_A$) as follows:

Error of estimation: $e \equiv Y_A - Y_E$, where $Y_E$ is the estimated value of $Y$.

$\sum e^2 = \sum [Y_A - Y_E]^2 = \sum [Y_A - (b_1 + b_2 X)]^2$

Setting the partial derivatives with respect to $b_1$ and $b_2$ to zero:

$\partial \sum e^2 / \partial b_1 = \sum 2[Y_A - (b_1 + b_2 X)](-1) = 0$

$\partial \sum e^2 / \partial b_2 = \sum 2[Y_A - (b_1 + b_2 X)](-X) = 0$

The first condition gives

$\sum [Y_A - (b_1 + b_2 X)] = 0$
$-N\bar{Y} + N b_1 + b_2 N\bar{X} = 0$
$b_1 = \bar{Y} - b_2 \bar{X}$ ….. (1)

The second condition gives

$\sum [Y_A - (b_1 + b_2 X)]\,X = 0$
$b_1 \sum X + b_2 \sum X^2 = \sum XY$ ….. (2)

Substituting (1) into (2) and solving yields the estimates.
• These estimates are given below (with the subscript A on $Y$ dropped).
$\hat{\beta}_1 = \dfrac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N \sum X^2 - (\sum X)^2}$

$\hat{\beta}_2 = \dfrac{N \sum XY - (\sum X)(\sum Y)}{N \sum X^2 - (\sum X)^2}$

• Alternatively,

$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$

$\hat{\beta}_2 = \dfrac{\text{Covariance}(X, Y)}{\text{Variance}(X)}$
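As a quick sanity check, here is a minimal sketch of these closed-form estimates in Python. The function name `ols_two_variable` and the toy data are ours, not from the slides:

```python
def ols_two_variable(x, y):
    # Closed-form OLS estimates for Y = b1 + b2*X.
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx ** 2            # N*sum(X^2) - (sum X)^2
    b2 = (n * sxy - sx * sy) / denom     # slope estimate
    b1 = sy / n - b2 * sx / n            # intercept: Ymean - b2*Xmean
    return b1, b2

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
print(ols_two_variable(x, y))            # roughly (0.14, 1.96)
```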
Two Important Results

(a) $\sum e_i = \sum (Y_i - Y_i^E) = 0$, and

(b) $\sum X_{2i} e_i = \sum X_{2i}(Y_i - Y_i^E) = 0$,

where $Y_i^E$ is the estimated value of $Y_i$, and $X_{2i}$ is the same as $X_i$ from before.

Proof of (a):

$\sum (Y_i - Y_i^E) = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i})$
$= \sum Y_i - n\hat{\beta}_1 - n\hat{\beta}_2 \bar{X}$
$= n(\bar{Y} - \hat{\beta}_1 - \hat{\beta}_2 \bar{X})$
$= 0$ [since $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$]

See the lecture notes for a proof of part (b).
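Both results are easy to verify numerically. A short sketch, continuing the illustrative example above (all names are ours):

```python
# Illustrative check of results (a) and (b): the OLS residuals
# sum to zero and are orthogonal to the regressor X.
b1, b2 = ols_two_variable(x, y)
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
print(sum(residuals))                                  # (a): ~ 0
print(sum(xi * ei for xi, ei in zip(x, residuals)))    # (b): ~ 0
```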
Total sum of squares (TSS) $\equiv \sum (Y_i - \bar{Y})^2$

Residual sum of squares (RSS) $\equiv \sum (Y_i - Y_i^E)^2$

Explained sum of squares (ESS) $\equiv \sum (Y_i^E - \bar{Y})^2$
To prove that TSS = RSS + ESS:

$\text{TSS} \equiv \sum (Y_i - \bar{Y})^2$
$= \sum \{(Y_i - Y_i^E) + (Y_i^E - \bar{Y})\}^2$
$= \sum (Y_i - Y_i^E)^2 + \sum (Y_i^E - \bar{Y})^2 + 2 \sum (Y_i - Y_i^E)(Y_i^E - \bar{Y})$
$= \text{RSS} + \text{ESS} + 2 \sum (Y_i - Y_i^E)(Y_i^E - \bar{Y})$

Now,

$\sum (Y_i - Y_i^E)(Y_i^E - \bar{Y}) = \sum (Y_i - Y_i^E)\,Y_i^E - \bar{Y} \sum (Y_i - Y_i^E)$
$= \sum (Y_i - Y_i^E)\,Y_i^E$ [by (a) above]

and

$\sum (Y_i - Y_i^E)\,Y_i^E = \sum (Y_i - Y_i^E)(\hat{\beta}_1 + \hat{\beta}_2 X_i)$
$= \hat{\beta}_1 \sum (Y_i - Y_i^E) + \hat{\beta}_2 \sum X_i (Y_i - Y_i^E)$
$= 0$ [by (a) and (b) above]

Hence TSS = RSS + ESS.
$R^2 \equiv \text{ESS}/\text{TSS}$

Since TSS = RSS + ESS and both RSS and ESS are non-negative, it follows that $0 \le R^2 \le 1$.
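Continuing the toy example, a sketch verifying the decomposition and computing $R^2$ (names ours):

```python
# Illustrative check that TSS = RSS + ESS, and computation of R^2.
y_mean = sum(y) / len(y)
y_hat = [b1 + b2 * xi for xi in x]                    # fitted values Y_i^E
tss = sum((yi - y_mean) ** 2 for yi in y)
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ess = sum((yh - y_mean) ** 2 for yh in y_hat)
print(tss, rss + ess)       # the two agree up to rounding error
print(ess / tss)            # R^2, which lies between 0 and 1
```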
Topic 5
Properties of Estimators
In the discussion that follows, $\hat{\theta}$ is an estimator of the parameter of interest, $\theta$.

Bias of $\hat{\theta} \equiv E(\hat{\theta}) - \theta$

$\hat{\theta}$ is unbiased if Bias of $\hat{\theta}$ = 0.
$\hat{\theta}$ is negatively biased if Bias of $\hat{\theta}$ < 0.
$\hat{\theta}$ is positively biased if Bias of $\hat{\theta}$ > 0.
The Mean Squared Error (MSE) of estimation for $\hat{\theta}$ is given as

$\text{MSE}(\hat{\theta}) \equiv E[(\hat{\theta} - \theta)^2]$
$\equiv E[\{\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\}^2]$
$\equiv E[\{\hat{\theta} - E(\hat{\theta})\}^2] + E[\{E(\hat{\theta}) - \theta\}^2] + 2E[\{\hat{\theta} - E(\hat{\theta})\}\{E(\hat{\theta}) - \theta\}]$
$\equiv \text{Var}(\hat{\theta}) + \{E(\hat{\theta}) - \theta\}^2 + 2E[\{\hat{\theta} - E(\hat{\theta})\}\{E(\hat{\theta}) - \theta\}]$

Now,

$E[\{\hat{\theta} - E(\hat{\theta})\}\{E(\hat{\theta}) - \theta\}]$
$\equiv \{E(\hat{\theta}) - E(\hat{\theta})\}\{E(\hat{\theta}) - \theta\}$
$\equiv 0 \cdot \{E(\hat{\theta}) - \theta\} = 0$

(the second factor is a constant, so the expectation passes through to the first factor, which has mean zero).

Therefore,

$\text{MSE}(\hat{\theta}) \equiv \text{Var}(\hat{\theta}) + \{E(\hat{\theta}) - \theta\}^2$, that is,
$\text{MSE}(\hat{\theta}) \equiv \text{Var}(\hat{\theta}) + (\text{bias})^2$.

If $\hat{\theta}$ is unbiased, that is, if $E(\hat{\theta}) - \theta = 0$, then we have

$\text{MSE}(\hat{\theta}) \equiv \text{Var}(\hat{\theta})$
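The identity MSE = Var + (bias)² can be illustrated by simulation. Below is a sketch using a deliberately biased estimator of a normal mean; the +0.1 shift and all names are our assumptions, not from the slides:

```python
import random

# Deliberately biased estimator of a mean: sample mean plus 0.1.
theta, n, reps = 5.0, 30, 5000
estimates = []
for _ in range(reps):
    sample = [random.gauss(theta, 1.0) for _ in range(n)]
    estimates.append(sum(sample) / n + 0.1)      # built-in bias of +0.1
mean_est = sum(estimates) / reps
var_est = sum((e - mean_est) ** 2 for e in estimates) / reps
mse = sum((e - theta) ** 2 for e in estimates) / reps
print(mse, var_est + (mean_est - theta) ** 2)    # the two agree closely
```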
An unbiased estimator $\hat{\theta}$ of a parameter $\theta$ is efficient if and only if it has the smallest variance of all unbiased estimators. That is, for any other unbiased estimator $p$ of $\theta$,

$\text{Var}(\hat{\theta}) \le \text{Var}(p)$
An estimator $\hat{\theta}$ is said to be consistent if it converges in probability to $\theta$. That is,

$\lim_{n \to \infty} \text{Prob}(|\hat{\theta} - \theta| > \delta) = 0$ for every $\delta > 0$.

When the above condition holds, $\theta$ is said to be the probability limit of $\hat{\theta}$; that is, $\text{plim}\,\hat{\theta} = \theta$.

Sufficient conditions for consistency: if the mean of $\hat{\theta}$ converges to $\theta$ and $\text{Var}(\hat{\theta})$ converges to zero (as $n$ approaches $\infty$), then $\hat{\theta}$ is consistent.

That is, $\hat{\theta}_n$ is consistent if it can be shown that

$\lim_{n \to \infty} E(\hat{\theta}_n) = \theta$ and $\lim_{n \to \infty} \text{Var}(\hat{\theta}_n) = 0$.
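These sufficient conditions are easy to see by simulation for the sample mean of a normal population (a sketch; the setup is ours): its expectation equals $\theta$ for every $n$, and its variance shrinks toward zero as $n$ grows.

```python
import random

# Sample mean of n draws from N(theta, 1): its mean stays at theta and
# its variance shrinks roughly like 1/n, so the estimator is consistent.
theta, reps = 5.0, 2000
for n in (10, 100, 1000):
    means = [sum(random.gauss(theta, 1.0) for _ in range(n)) / n
             for _ in range(reps)]
    avg = sum(means) / reps
    var = sum((m - avg) ** 2 for m in means) / reps
    print(n, round(avg, 3), round(var, 5))       # avg ~ theta, var -> 0
```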
The Regression Model with TWO Variables
The Model: $Y = \beta_1 + \beta_2 X + \varepsilon$
Y is the DEPENDENT variable
X is the INDEPENDENT variable
$Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

The OLS estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ are sample statistics used to estimate $\beta_1$ and $\beta_2$ respectively.

Here $X_{1i} \equiv 1$ for all $i$, and $X_2$ is nothing but $X$.
Assumptions about $X_2$:

(1a) $X_2$ is non-random (chosen by the investigator).

(1b) Random sampling is performed from a population of fixed values of $X_2$.

(1c) $\lim_{n \to \infty} (1/n) \sum x_{2i}^2 = Q > 0$, where $x_{2i} \equiv X_{2i} - \bar{X}_2$.

(1d) $\lim_{n \to \infty} (1/n) \sum X_{2i} = P > 0$.
Assumptions about the disturbance term $\varepsilon$:

2a. $E(\varepsilon_i) = 0$.

2b. $\text{Var}(\varepsilon_i) = \sigma^2$ for all $i$ (homoskedasticity).

2c. $\text{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \ne j$ (the $\varepsilon$ values are uncorrelated across observations).

2d. The $\varepsilon_i$ all have a normal distribution.
Result: $\hat{\beta}_2$ is linear in the dependent variable $Y_i$.

$\hat{\beta}_2 = \dfrac{\text{Covariance}(X, Y)}{\text{Variance}(X)} = \dfrac{\sum (Y_i - \bar{Y})(X_i - \bar{X})}{\sum (X_i - \bar{X})^2}$

Proof:

$\hat{\beta}_2 = \dfrac{\sum Y_i (X_i - \bar{X})}{\sum (X_i - \bar{X})^2}$

(the term in $\bar{Y}$ drops out because $\sum (X_i - \bar{X}) = 0$)

$= \sum C_i Y_i + K$,

where the $C_i$ and $K$ are constants.

Therefore, $\hat{\beta}_2$ is a linear function of $Y_i$.

Since $Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, $\hat{\beta}_2$ is a linear function of $\varepsilon_i$ and hence is normally distributed.

Similarly, $\hat{\beta}_1$ is a linear function of $Y_i$ (and hence $\varepsilon_i$) and is normally distributed.
Both $\hat{\beta}_1$ and $\hat{\beta}_2$ are unbiased estimators of $\beta_1$ and $\beta_2$ respectively. That is,

$E(\hat{\beta}_1) = \beta_1$ and $E(\hat{\beta}_2) = \beta_2$
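A small simulation can illustrate the unbiasedness of $\hat{\beta}_2$: generate many samples from $Y = \beta_1 + \beta_2 X + \varepsilon$ over fixed $X$ values and average the slope estimates. This is a sketch reusing the illustrative `ols_two_variable` function from earlier; all parameter values are our assumptions:

```python
import random

# Simulate Y = beta1 + beta2*X + eps repeatedly over fixed X values
# (assumption 1a) and average the OLS slope estimates.
beta1, beta2, sigma = 2.0, 0.5, 1.0
xs = [float(i) for i in range(1, 21)]
reps = 2000
b2_total = 0.0
for _ in range(reps):
    ys = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in xs]
    _, b2_rep = ols_two_variable(xs, ys)
    b2_total += b2_rep
print(b2_total / reps)      # close to beta2 = 0.5
```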
Each of $\hat{\beta}_1$ and $\hat{\beta}_2$ is an efficient estimator of $\beta_1$ and $\beta_2$ respectively.

Thus, each of $\hat{\beta}_1$ and $\hat{\beta}_2$ is a

Best (efficient)
Linear (in the dependent variable $Y_i$)
Unbiased
Estimator

of $\beta_1$ and $\beta_2$ respectively.

Each of $\hat{\beta}_1$ and $\hat{\beta}_2$ is a consistent estimator of $\beta_1$ and $\beta_2$ respectively.
Also,

$\text{Var}(\hat{\beta}_1) = \left( \dfrac{1}{n} + \dfrac{\bar{X}_2^2}{\sum x_{2i}^2} \right) \sigma^2$

$\text{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}$

$\text{Cov}(\hat{\beta}_1, \hat{\beta}_2) = \dfrac{-\bar{X}_2 \, \sigma^2}{\sum x_{2i}^2}$
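These formulas translate directly into code. A sketch (the helper name `ols_variances` is ours), taking $\sigma^2$ as known for now:

```python
# Variance of the OLS intercept and slope, and their covariance,
# for a given error variance sigma2 (treated as known here).
def ols_variances(x, sigma2):
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)   # sum of squared deviations
    var_b1 = (1.0 / n + x_mean ** 2 / sxx) * sigma2
    var_b2 = sigma2 / sxx
    cov_b1_b2 = -x_mean * sigma2 / sxx
    return var_b1, var_b2, cov_b1_b2
```

Note that the same sum of squared deviations $\sum x_{2i}^2$ appears in all three quantities, so a small spread in $X$ inflates every one of them.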
$\lim_{n \to \infty} \text{Var}(\hat{\beta}_2) = \lim_{n \to \infty} \dfrac{\sigma^2}{\sum x_{2i}^2} = \lim_{n \to \infty} \dfrac{\sigma^2 / n}{\sum x_{2i}^2 / n} = 0 / Q = 0$ [using assumption (1c)]

Because $\hat{\beta}_2$ is an unbiased estimator of $\beta_2$ and $\lim_{n \to \infty} \text{Var}(\hat{\beta}_2) = 0$,

$\hat{\beta}_2$ is a consistent estimator of $\beta_2$.
The variance $\sigma^2$ of the random term $\varepsilon$ is not known.

To perform statistical analysis, we estimate $\sigma^2$ by

$\hat{\sigma}^2 \equiv \text{RSS}/(n - 2)$

This is because $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$.
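In code, this estimate plugs straight into the variance formulas above (a sketch continuing the toy example; `ols_variances` is the illustrative helper defined earlier):

```python
# Estimate sigma^2 by RSS/(n-2), then plug it into the variance formulas.
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
sigma2_hat = sum(e * e for e in residuals) / (len(x) - 2)
print(sigma2_hat)
print(ols_variances(x, sigma2_hat))   # estimated Var(b1), Var(b2), Cov
```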