Lecture 7 Preview: Estimating the Variance of an Estimate’s Probability Distribution Review: Ordinary Least Squares (OLS) Estimation Procedure Importance

Lecture 7 Preview: Estimating the Variance of an Estimate’s Probability Distribution

Review: Ordinary Least Squares (OLS) Estimation Procedure

    Importance of the Coefficient Estimate’s Probability Distribution
        Mean (Center) of the Coefficient Estimate’s Probability Distribution
        Variance (Spread) of the Coefficient Estimate’s Probability Distribution
    General Properties of the Ordinary Least Squares (OLS) Estimation Procedure

Estimating the Variance of the Coefficient Estimate’s Probability Distribution

    Step 1: Estimate the Variance of the Error Term’s Probability Distribution
        First Attempt: Variance of the Error Term’s Numerical Values
        Second Attempt: Variance of the Residual’s Numerical Values
        Third Attempt: “Adjusted” Variance of the Residual’s Numerical Values
    Step 2: Use the Estimated Variance of the Error Term’s Probability Distribution to Estimate the Variance of the Coefficient Estimate’s Probability Distribution

Degrees of Freedom

Summary: The Ordinary Least Squares (OLS) Estimation Procedure

    Three Important Parts
        Value of the Coefficient
        Variance of the Error Term’s Probability Distribution
        Variance of the Coefficient Estimate’s Probability Distribution
    Regression Printouts

The Problem: But there is a problem here, isn’t there? We need to know the variance of the error term’s probability distribution to calculate the variance of the coefficient estimate’s probability distribution. Unfortunately, the variance of the error term’s probability distribution is unobservable. In reality, we can never know the variance of the error term’s probability distribution.

How can Clint proceed?

Importance of the Probability Distribution’s Mean (Center) and Variance (Spread)

Mean: When the mean of the coefficient estimate’s probability distribution, Mean[bx], equals the actual value of the coefficient, βx, the estimation procedure is unbiased.

Mean[bx] = βx: The estimation procedure is unbiased. It does not systematically underestimate or overestimate the actual coefficient value.

Variance: When the estimation procedure for the coefficient value is unbiased, the variance of the estimate’s probability distribution, Var[bx], determines the reliability of the estimate.

Var[bx]: Determines the reliability of the estimate. As Var[bx] decreases, the reliability of bx increases. As the variance decreases, the probability distribution becomes more tightly cropped around the actual value, making it more likely for the coefficient estimate to be close to the actual coefficient value.

General Properties of the Ordinary Least Squares (OLS) Estimation Procedure

When the standard ordinary least squares premises are met, the following equations describe the coefficient estimate’s probability distribution:

Mean[bx] = βx

Var[bx] = Var[e] / Σ (xt − x̄)²    (sum over the observations t; x̄ denotes the mean of the x’s)

[Figure: Probability Distributions of Coefficient Estimates. Two panels, each centered at Mean[bx] = βx; left panel: variance large; right panel: variance small.]

Clint’s Strategy: Estimating the Variance of the Coefficient Estimate’s Probability Distribution

Strategy: Two Steps

Step 1: Estimate the variance of the error term’s probability distribution from the available information, the information from the first quiz: EstVar[e]

Step 2: Apply the relationship between the variances of the coefficient estimate’s and error term’s probability distributions to estimate the variance of the coefficient estimate’s probability distribution:

Var[bx] = Var[e] / Σ (xt − x̄)²    →    EstVar[bx] = EstVar[e] / Σ (xt − x̄)²

When Clint was faced with a similar problem before, what did he do?

Econometrician’s Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.

What information does Clint have?

Information from Professor Lord’s first quiz.

First Quiz

Student    x     y
1          5    66
2         15    87
3         25    90

Step 1: Estimating the Variance of the Error Term’s Probability Distribution

Three Attempts to Estimate the Variance of the Error Term’s Probability Distribution

First attempt: Variance of the error term’s numerical values from the first quiz.
Second attempt: Variance of the residual’s numerical values from the first quiz.
Third attempt: “Adjusted” variance of the residual’s numerical values from the first quiz.

Preview: While the first two attempts fail for different reasons, they provide the motivation for the third attempt, which succeeds.

We shall use simulations to assess these attempts by exploiting the relative frequency interpretation of probability:

Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the numerical values from the experiments mirrors the random variable’s probability distribution; the two distributions are identical:

Distribution of the Numerical Values  →  (after many, many repetitions)  →  Probability Distribution

Applying this to the variance:

Variance of the Numerical Values  →  (after many, many repetitions)  →  Variance of the Probability Distribution

Estimating Var[e], Var[Clint’s 3 Error Terms] – 1st Try

Strategy: Use the variance of the three error terms from Professor Lord’s first quiz to estimate the variance of the error term’s probability distribution.

yt = βConst + βx xt + et,  so  et = yt − (βConst + βx xt)

Actual values: βConst = 50 and βx = 2, so βConst + βx xt = 50 + 2xt and et = yt − (50 + 2xt).

First Quiz

Student   xt   yt   50 + 2xt             et = yt − (50 + 2xt)   et²
1          5   66   50 + 2×5 = 60        66 − 60 = 6            6² = 36
2         15   87   50 + 2×15 = 80       87 − 80 = 7            7² = 49
3         25   90   50 + 2×25 = 100      90 − 100 = −10         (−10)² = 100
                                                                SSE = 185

Error term represents random influences: Mean[e] = 0.

Calculate the variance of the three error terms that were observed on the first quiz:

Compute the deviations from the mean. Since Mean[e] = 0, each deviation is the error term itself.
Square the deviations.
Calculate the average.

Var[e1, e2, and e3 1st Quiz] = (e1² + e2² + e3²)/3 = SSE/3 = 185/3 = 61.67
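The first-attempt arithmetic can be retraced in a few lines of Python. This is only a sketch of the calculation on the slide: the variable names are illustrative, and the actual values βConst = 50 and βx = 2 are the ones the lecture assumes (Clint would not know them).

```python
# First-quiz data from the lecture.
x = [5, 15, 25]
y = [66, 87, 90]

# Actual constant and coefficient assumed on the slide (unobservable in practice).
beta_const, beta_x = 50, 2

# Error term for each student: e_t = y_t - (beta_const + beta_x * x_t).
errors = [yt - (beta_const + beta_x * xt) for xt, yt in zip(x, y)]

# Mean[e] = 0, so each deviation from the mean is the error term itself.
sse = sum(e ** 2 for e in errors)

# Variance of the three error terms: the average of the squared deviations.
var_errors = sse / len(errors)

print(errors)                # [6, 7, -10]
print(sse)                   # 185
print(round(var_errors, 2))  # 61.67
```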

Question: As a consequence of random influences, can we expect the variance of the numerical values from one repetition, the first quiz, to equal the actual variance of the error term’s probability distribution? No.

What can we hope for then? We can hope that this procedure is unbiased; we can hope that the procedure does not systematically underestimate or overestimate the actual variance.

Lab 7.1: http://www3.amherst.edu/~fwesthoff/MITLinks/MIT-Lab-07-01.html

Questions addressed by the lab:

Does the error term represent a random influence?
Does the simulation represent the variance of the error term’s probability distribution accurately?
Is the estimation procedure for the variance of the error term’s probability distribution unbiased?

Is the estimation procedure for the variance of the error term’s probability distribution unbiased?

Question: What is the best we can hope for?
Answer: We can hope that this procedure is unbiased; we can hope that the procedure does not systematically underestimate or overestimate the actual variance.

Question: How can we determine whether or not the estimation procedure for the variance of the error term’s probability distribution is unbiased?
Answer: Exploit the relative frequency interpretation of probability: compare the actual variance of the error term’s probability distribution and the mean (average) of the variance estimates after many, many repetitions.

Lab 7.1

Actual Var[e]    Repetitions    Mean (Average) of the Estimates for the Variance of the Error Term’s Probability Distribution
500              >1,000,000     500
200              >1,000,000     200
50               >1,000,000     50

Observations:

Can we expect the estimate to equal the actual value? No. In fact, we can be all but certain that the estimate will not equal the actual value. Sometimes the estimate is less than the actual value and sometimes it is greater.

We cannot predict the value of the estimate for the variance of the error term’s probability distribution beforehand, even when we know the actual value of the variance. The estimate is a random variable.
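A rough stand-in for the Lab 7.1 experiment can be written with Python’s standard library. This is a sketch under stated assumptions, not the lab’s own code: the error terms are drawn from a normal distribution (the lecture only requires Mean[e] = 0), the function name is hypothetical, and far fewer repetitions are used than in the lab.

```python
import random

def mean_of_attempt1_estimates(actual_var, repetitions=100_000, seed=0):
    """Average, over many simulated quizzes, of the variance of three error terms."""
    rng = random.Random(seed)
    sd = actual_var ** 0.5
    total = 0.0
    for _ in range(repetitions):
        # One repetition: three independent error terms with mean 0.
        # Normality is an assumption made only for this sketch.
        errors = [rng.gauss(0.0, sd) for _ in range(3)]
        # First-attempt estimate: average of the squared errors (SSE / 3).
        total += sum(e ** 2 for e in errors) / 3
    return total / repetitions

# Actual variances taken from the lab table above.
results = {v: mean_of_attempt1_estimates(v) for v in (500, 200, 50)}
for actual_var, mean_estimate in results.items():
    print(actual_var, round(mean_estimate, 1))
```

Any single estimate is a random variable and rarely equals the actual variance, but the averages printed here should land close to 500, 200, and 50, which is what unbiasedness means under the relative frequency interpretation.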


But we used the actual constant and coefficient, βConst and βx, to calculate the errors.

Bad news: It does not help Clint. Clint does not know the values of βConst or βx.

Good news: This procedure is unbiased. Despite the bad news, keep the good news in mind.

Sum of Squared Errors (SSE) Versus Sum of Squared Residuals (SSR)

Sum of Squared Errors (SSE)

Based on the values of the error terms:

yt = βConst + βx xt + et
et = yt − (βConst + βx xt)

We need the actual constant and coefficient, βConst and βx, to calculate the sum of squared errors. But βConst and βx are unobservable; that is the whole problem. Clint cannot calculate the sum of squared errors.

Sum of Squared Residuals (SSR)

Based on the values of the residuals:

Estyt = bConst + bx xt
Rest = yt − Estyt = yt − (bConst + bx xt)

Use the OLS procedure to calculate the estimates of the constant and coefficient, bConst and bx. Clint can calculate the sum of squared residuals.

Strategy: We just showed in our simulations that the procedure based on the sum of squared errors is an unbiased estimation procedure for the variance of the error term’s probability distribution. Clint cannot calculate the sum of squared errors, however. Perhaps Clint can use the sum of squared residuals instead.

Econometrician’s Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.

We can think of an observation’s residual as an estimate of its error term.

Estimating Var[e], Var[Clint’s 3 Residuals] – 2nd Try

Clint uses the estimated constant and coefficient to calculate the “estimated” error terms, the residuals:

Rest = yt − Estyt = yt − (bConst + bx xt)

With the OLS estimates for the first quiz, Esty = 63 + 1.2x:

First Quiz

Student   xt   yt   Estyt   Rest = yt − Estyt   Rest²
1          5   66   69      66 − 69 = −3        (−3)² = 9
2         15   87   81      87 − 81 = 6         6² = 36
3         25   90   93      90 − 93 = −3        (−3)² = 9
                                                SSR = 54

Mean[Res] = Mean[Res1, Res2, and Res3 1st Quiz] = (Res1 + Res2 + Res3)/3 = (−3 + 6 − 3)/3 = 0

In fact, we can prove that the mean of the residuals must equal 0.

Var[Res1, Res2, and Res3 1st Quiz] = (Res1² + Res2² + Res3²)/3 = SSR/3 = 54/3 = 18

Good news: Clint has the information to perform these calculations.

Question: Is the procedure unbiased? No. (Lab 7.2)

Bad news: This procedure is biased. It systematically underestimates the variance.
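The second-attempt calculation can be reproduced from the quiz data alone. The sketch below uses the standard simple-regression OLS formulas, which the lecture refers to only as “the OLS procedure”; the variable names are illustrative.

```python
# First-quiz data from the lecture.
x = [5, 15, 25]
y = [66, 87, 90]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# OLS estimates for a single explanatory variable:
#   b_x     = sum((y_t - mean_y) * (x_t - mean_x)) / sum((x_t - mean_x)^2)
#   b_const = mean_y - b_x * mean_x
b_x = (sum((yt - mean_y) * (xt - mean_x) for xt, yt in zip(x, y))
       / sum((xt - mean_x) ** 2 for xt in x))
b_const = mean_y - b_x * mean_x

# Residuals are the "estimated errors": Res_t = y_t - (b_const + b_x * x_t).
residuals = [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]
mean_residual = sum(residuals) / n

# Second attempt: divide the sum of squared residuals by the sample size.
ssr = sum(r ** 2 for r in residuals)
var_residuals = ssr / n

print(round(b_const, 6), round(b_x, 6))
print([round(r, 6) for r in residuals], round(mean_residual, 6))
print(round(ssr, 6), round(var_residuals, 6))
```

Unlike the first attempt, nothing here requires the actual constant or coefficient, which is why Clint can carry it out.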


Why Is Our Second Attempt Biased?

Error: et = yt − (βConst + βx xt)
Residual: Rest = yt − (bConst + bx xt)

How do SSE and SSR differ? βConst and βx versus bConst and bx.

SSE = [y1 − (βConst + βx x1)]² + [y2 − (βConst + βx x2)]² + [y3 − (βConst + βx x3)]²    (sum using β’s)
SSR = [y1 − (bConst + bx x1)]² + [y2 − (bConst + bx x2)]² + [y3 − (bConst + bx x3)]²    (sum using b’s)

Question: How were bConst and bx chosen?
Answer: To minimize the sum of squared residuals.

We can be all but certain that bConst ≠ βConst and bx ≠ βx. Consequently:

SSR < SSE

Dividing both sums by the sample size, 3:

Var[Res1, Res2, and Res3 1st Quiz] = SSR/3 < SSE/3 = Var[e1, e2, and e3 1st Quiz]

The estimation procedure based on the SSE’s is unbiased: when the actual constant and coefficient are used, the procedure is unbiased.

The estimation procedure based on the SSR’s is biased downward: it systematically underestimates the variance.

Lab 7.3
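The comparison of SSE and SSR can be checked by simulation. The following is a sketch rather than the Lab 7.3 code: it assumes normally distributed error terms, uses an actual variance of 500 as in the lab table, keeps the first-quiz x values fixed, and introduces hypothetical helper names.

```python
import random

X = [5, 15, 25]               # first-quiz x values
BETA_CONST, BETA_X = 50, 2    # actual values assumed in the lecture

def ols_fit(x, y):
    """Return (b_const, b_x) from the simple-regression OLS formulas."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b_x = (sum((yt - mean_y) * (xt - mean_x) for xt, yt in zip(x, y))
           / sum((xt - mean_x) ** 2 for xt in x))
    return mean_y - b_x * mean_x, b_x

def compare_sse_and_ssr(actual_var, repetitions=50_000, seed=0):
    """Return (mean of SSE/3, mean of SSR/3, count of repetitions with SSR > SSE)."""
    rng = random.Random(seed)
    sd = actual_var ** 0.5
    total_sse = total_ssr = 0.0
    violations = 0
    for _ in range(repetitions):
        # Normal errors are an assumption of this sketch.
        errors = [rng.gauss(0.0, sd) for _ in X]
        y = [BETA_CONST + BETA_X * xt + e for xt, e in zip(X, errors)]
        b_const, b_x = ols_fit(X, y)
        sse = sum(e ** 2 for e in errors)
        ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(X, y))
        total_sse += sse
        total_ssr += ssr
        if ssr > sse + 1e-6:   # small tolerance for floating-point rounding
            violations += 1
    return total_sse / (3 * repetitions), total_ssr / (3 * repetitions), violations

mean_sse_var, mean_ssr_var, violations = compare_sse_and_ssr(500)
print(round(mean_sse_var, 1), round(mean_ssr_var, 1), violations)
```

Because bConst and bx are chosen to minimize the sum of squared residuals, no repetition should produce an SSR above the SSE, and the average of SSR/3 should fall well below the actual variance while the average of SSE/3 stays close to it.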



Estimating Var[e], AdjVar[Clint’s 3 Residuals] – 3rd Try

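Strategy: Divide the sum of squared residuals by the degrees of freedom rather than by the sample size.

Number of Degrees of Freedom = Sample Size − Number of Estimated Parameters = 3 − 2 = 1

From before: SSR = 54.

AdjVar[Res1, Res2, and Res3 1st Quiz] = SSR / Degrees of Freedom = 54/1 = 54

Good news: Clint can perform this calculation.

Question: Is the procedure unbiased? Yes. (Lab 7.4)

Good news: The procedure is unbiased.

NB: We shall postpone our discussion of degrees of freedom for a few minutes.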
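The third attempt can be examined the same way. This sketch first recomputes the adjusted variance for the actual first quiz and then averages the adjusted estimates over simulated quizzes; the normal error terms, the reduced number of repetitions, and the function names are assumptions of the sketch, not details of Lab 7.4.

```python
import random

X = [5, 15, 25]               # first-quiz x values
BETA_CONST, BETA_X = 50, 2    # actual values assumed in the lecture
N_ESTIMATED_PARAMETERS = 2    # bConst and bx
DEGREES_OF_FREEDOM = len(X) - N_ESTIMATED_PARAMETERS   # 3 - 2 = 1

def sum_squared_residuals(x, y):
    """Fit the OLS line and return the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b_x = (sum((yt - mean_y) * (xt - mean_x) for xt, yt in zip(x, y))
           / sum((xt - mean_x) ** 2 for xt in x))
    b_const = mean_y - b_x * mean_x
    return sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(x, y))

def mean_adjusted_variance(actual_var, repetitions=50_000, seed=0):
    """Average of SSR / degrees of freedom over many simulated quizzes."""
    rng = random.Random(seed)
    sd = actual_var ** 0.5
    total = 0.0
    for _ in range(repetitions):
        # Normal errors are an assumption of this sketch.
        y = [BETA_CONST + BETA_X * xt + rng.gauss(0.0, sd) for xt in X]
        total += sum_squared_residuals(X, y) / DEGREES_OF_FREEDOM
    return total / repetitions

# Adjusted variance for the actual first quiz: SSR / 1.
quiz_adj_var = sum_squared_residuals(X, [66, 87, 90]) / DEGREES_OF_FREEDOM
print(round(quiz_adj_var, 6))

# Actual variances taken from the lab tables in the lecture.
results = {v: mean_adjusted_variance(v) for v in (500, 200, 50)}
for actual_var, mean_estimate in results.items():
    print(actual_var, round(mean_estimate, 1))
```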


Clint’s Strategy To Estimate the Variance of the Coefficient Estimate’s Probability Distribution

Step 1: Estimate the variance of the error term’s probability distribution from the available information, the information from the first quiz:

EstVar[e] = AdjVar[Res1, Res2, and Res3] = SSR / Degrees of Freedom = 54/1 = 54

What can we say about the estimation procedure for the variance of the error term’s probability distribution? It is unbiased.

Step 2: Apply the relationship between the variances of the coefficient estimate’s and error term’s probability distributions to estimate the variance of the coefficient estimate’s probability distribution:

Var[bx] = Var[e] / Σ (xt − x̄)²    →    EstVar[bx] = EstVar[e] / Σ (xt − x̄)²

x’s: x1 = 5, x2 = 15, x3 = 25, so x̄ = (5 + 15 + 25)/3 = 15

Σ (xt − x̄)² = (−10)² + 0² + 10² = 100 + 0 + 100 = 200

EstVar[bx] = 54/200 = .27

The square root of the estimated variance is called the standard error:

SE[bx] = √EstVar[bx] = √.27 = .5196

What can we hope to be able to say about the estimation procedure for the variance of the coefficient estimate’s probability distribution? We can hope that this procedure is unbiased also; that is, we can hope that the procedure does not systematically underestimate or overestimate the actual variance of the coefficient estimate’s probability distribution.
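Clint’s two steps reduce to a short calculation. The sketch below starts from the sum of squared residuals reported in the lecture (SSR = 54); the variable names are illustrative.

```python
import math

x = [5, 15, 25]          # first-quiz x values
ssr = 54                 # sum of squared residuals from the first quiz
n_estimated_parameters = 2

# Step 1: estimate Var[e] by dividing SSR by the degrees of freedom.
degrees_of_freedom = len(x) - n_estimated_parameters
est_var_e = ssr / degrees_of_freedom

# Step 2: EstVar[bx] = EstVar[e] / sum((x_t - mean_x)^2).
mean_x = sum(x) / len(x)
sum_sq_dev_x = sum((xt - mean_x) ** 2 for xt in x)
est_var_bx = est_var_e / sum_sq_dev_x

# The standard error is the square root of the estimated variance.
se_bx = math.sqrt(est_var_bx)

print(est_var_e, sum_sq_dev_x)
print(round(est_var_bx, 2), round(se_bx, 4))
```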

Is the estimation procedure for the variance of the coefficient estimate’s probability distribution unbiased?

Lab 7.5

Actual Var[e]    Variance of the Coefficient Estimate’s Probability Distribution: Var[bx]    Mean (Average) of the Estimates for the Variance of the Coefficient Estimate’s Probability Distribution
500              500/200 = 2.5                                                                2.5
200              200/200 = 1.0                                                                1.0
50               50/200 = .25                                                                 .25
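A simulation in the spirit of Lab 7.5 compares the actual Var[bx] = Var[e]/200 with the average of the estimates EstVar[bx], and also with the spread of the simulated bx values themselves. As before, this is a sketch: normal error terms, a modest number of repetitions, and the function name are assumptions.

```python
import random

X = [5, 15, 25]               # first-quiz x values
BETA_CONST, BETA_X = 50, 2    # actual values assumed in the lecture
MEAN_X = sum(X) / len(X)
SUM_SQ_DEV_X = sum((xt - MEAN_X) ** 2 for xt in X)   # 200
DEGREES_OF_FREEDOM = len(X) - 2

def simulate(actual_var, repetitions=50_000, seed=0):
    """Return (actual Var[bx], mean of EstVar[bx], variance of the simulated bx values)."""
    rng = random.Random(seed)
    sd = actual_var ** 0.5
    est_vars, b_xs = [], []
    for _ in range(repetitions):
        # Normal errors are an assumption of this sketch.
        y = [BETA_CONST + BETA_X * xt + rng.gauss(0.0, sd) for xt in X]
        mean_y = sum(y) / len(y)
        b_x = sum((yt - mean_y) * (xt - MEAN_X) for xt, yt in zip(X, y)) / SUM_SQ_DEV_X
        b_const = mean_y - b_x * MEAN_X
        ssr = sum((yt - (b_const + b_x * xt)) ** 2 for xt, yt in zip(X, y))
        # EstVar[bx] = (SSR / degrees of freedom) / sum of squared x deviations.
        est_vars.append((ssr / DEGREES_OF_FREEDOM) / SUM_SQ_DEV_X)
        b_xs.append(b_x)
    mean_b = sum(b_xs) / repetitions
    var_b = sum((b - mean_b) ** 2 for b in b_xs) / repetitions
    return actual_var / SUM_SQ_DEV_X, sum(est_vars) / repetitions, var_b

results = {v: simulate(v) for v in (500, 200, 50)}
for v, (actual, mean_est, var_b) in results.items():
    print(v, actual, round(mean_est, 3), round(var_b, 3))
```

If the procedure is unbiased, the second and third numbers on each printed line should both sit near the first.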


Degrees of Freedom

Recall Attempts 2 and 3 to estimate the variance of the error term’s probability distribution.

Strategy: Use the variance of the residuals (“estimated errors”) to estimate the variance of the error term’s probability distribution.

Error terms: et = yt − (βConst + βx xt)
Residuals: Rest = yt − (bConst + bx xt)

Think of the residuals as the estimated errors.

Attempt 2: We divided by the sample size:

Since Mean[Res] = 0:

Var[Res1, Res2, and Res3] = (Res1² + Res2² + Res3²)/3 = SSR / Sample Size

Since the residuals are the “estimated errors,” it seems natural to divide the sum of squared residuals by the sample size, 3 in Clint’s case. But this procedure proved to be biased; it systematically underestimates the actual variance.

Attempt 3: We divided by the degrees of freedom rather than the sample size:

AdjVar[Res1, Res2, and Res3] = SSR / Degrees of Freedom

Degrees of Freedom = Sample Size − Number of Estimated Parameters = 3 − 2 = 1

The modified procedure proved to be unbiased. Dividing by the degrees of freedom rather than the sample size solves the bias problem.

Question: Why does dividing by the sample size fail, but dividing by the degrees of freedom succeed? Why does dividing by 1 rather than 3 work?

How Do We Calculate an Average?

Monthly Precipitation in Amherst, Massachusetts during the 20th Century

Year   Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec
1901   2.09   0.56   5.66   5.80   5.12   0.75   3.77   5.75   3.67   4.17   1.30   8.51
1902   2.13   3.32   5.47   2.92   2.42   4.54   4.66   4.65   5.83   5.59   1.27   4.27
1903   3.28   4.27   6.40   2.30   0.48   7.79   4.64   4.92   1.66   2.72   2.04   3.95
...
2000   3.00   3.40   3.82   4.14   4.26   7.99   6.88   5.40   5.36   2.29   2.83   4.24

Mean (Average) for June = (.75 + 4.54 + 7.79 + … + 7.99)/100 = 377.76/100 = 3.78

Each of the 100 Junes in the twentieth century provides one piece of information in calculating the average. Consequently, to calculate an average we divide the sum by the number of pieces of information. Hence, to calculate the average of the squared deviations, the variance, we must divide by the number of pieces of information.

Key Principle: To calculate an average we divide the sum by the number of pieces of information.

Mean (Average) = Sum / Number of Pieces of Information

Claim: The degrees of freedom equal the number of pieces of information that are available to estimate the variance of the error term’s probability distribution.

Question: Why does subtracting 2 from the sample size make sense?

Suppose that the sample size were 2. With only two observations we have only two points. The best fitting line passes directly through each of the two points. Consequently, the two residuals, the “estimated errors,” must always equal 0 when the sample size is 2, regardless of what the variance of the error term’s probability distribution actually equals:

Res1 = 0 and Res2 = 0

Do the first two residuals provide information about the variance of the error term’s probability distribution? No. The first two observations provide no information about the variance.

Which observation provides the first piece of information about the variance of the error term’s probability distribution? The 3rd. The third observation provides the first piece of information about the variance.

Key principle: To calculate the average, divide by the number of pieces of information.

Consequently, when the sample size is 3 we should divide by 1 to calculate the “average” of the squared deviations because we really only have 1 piece of information. In general, we should divide by the degrees of freedom:

Degrees of Freedom = Sample Size − Number of Estimated Parameters
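The two-observation argument is easy to verify numerically. The sketch below fits the OLS line to just two points and shows that both residuals vanish whatever the y values are; using the first two quiz observations, and an arbitrary second pair of y values, is an illustrative choice rather than something taken from the lecture.

```python
def ols_residuals(x, y):
    """Fit the OLS line to (x, y) and return the list of residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b_x = (sum((yt - mean_y) * (xt - mean_x) for xt, yt in zip(x, y))
           / sum((xt - mean_x) ** 2 for xt in x))
    b_const = mean_y - b_x * mean_x
    return [yt - (b_const + b_x * xt) for xt, yt in zip(x, y)]

# First two observations from the first quiz.
residuals_quiz = ols_residuals([5, 15], [66, 87])

# Arbitrary, very different y values (hypothetical): the line still fits exactly.
residuals_other = ols_residuals([5, 15], [10, 95])

print([round(r, 9) for r in residuals_quiz])
print([round(r, 9) for r in residuals_other])
```

With two points the residuals carry no trace of how large the error terms were, which is why those two observations contribute no information about Var[e].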

OLS Estimation Procedure and the Regression Printout

The ordinary least squares (OLS) estimation procedure actually includes three procedures:

A Procedure to Estimate the Value of the Parameters
A Procedure to Estimate the Variance of the Error Term’s Probability Distribution
A Procedure to Estimate the Variance of the Coefficient Estimate’s Probability Distribution

Good News: When the standard ordinary least squares (OLS) premises are satisfied:

Each of the three procedures is unbiased.
The procedure to estimate the value of the parameters is the best linear unbiased estimation procedure.

EViews workfile: http://www3.amherst.edu/~fwesthoff/MITLinks/MIT-Quiz1.wf1

Regression printout:

Dependent Variable: y
Explanatory Variable(s):   Estimate    SE          t-Statistic   Prob
x                          1.200000    0.519615    2.309401      0.2601
Const                      63.00000    8.874120    7.099296      0.0891

Number of Observations     3
Sum Squared Residuals      54.00000
SE of Regression           7.348469
Estimated Equation: Esty = 63 + 1.2x
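Most entries in the regression printout can be cross-checked from quantities derived in the lecture. The sketch below relies on three assumptions that the lecture does not state: the t-Statistic column is the estimate divided by its standard error, the Prob column is a two-tailed tail probability from a Student t distribution with 1 degree of freedom (which is the standard Cauchy distribution), and the constant’s standard error follows the usual simple-regression formula. The function name is hypothetical.

```python
import math

x = [5, 15, 25]               # first-quiz x values
b_const, b_x = 63.0, 1.2      # estimates reported in the printout
ssr = 54.0                    # sum of squared residuals from the printout
n = len(x)
degrees_of_freedom = n - 2

est_var_e = ssr / degrees_of_freedom
se_regression = math.sqrt(est_var_e)

mean_x = sum(x) / n
sum_sq_dev_x = sum((xt - mean_x) ** 2 for xt in x)

se_bx = math.sqrt(est_var_e / sum_sq_dev_x)
# Usual formula for the constant's standard error (an assumption; not derived in the lecture).
se_bconst = math.sqrt(est_var_e * (1 / n + mean_x ** 2 / sum_sq_dev_x))

# Assumed definition of the t-Statistic column: estimate / standard error.
t_x = b_x / se_bx
t_const = b_const / se_bconst

def two_tailed_prob_df1(t):
    """Two-tailed tail probability for a Student t with 1 degree of freedom (Cauchy)."""
    return 1 - 2 * math.atan(abs(t)) / math.pi

print(round(se_regression, 6), round(se_bx, 6), round(se_bconst, 6))
print(round(t_x, 6), round(t_const, 6))
print(round(two_tailed_prob_df1(t_x), 4), round(two_tailed_prob_df1(t_const), 4))
```

The closed-form probability works only because the degrees of freedom equal 1 here; with more observations a t-distribution routine from a statistics library would be needed.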
