7/31/2019 Econometrics I 2
Applied Econometrics
William Greene
Department of Economics, Stern School of Business
Applied Econometrics
2. Regression and Projection
Statistical Relationship
Objective: characterize the stochastic relationship between a variable and a set of 'related' variables.
Context: an inverse demand equation, P = α + βQ + γY, where Y = income. Q and P are two obviously related random variables. We are interested in studying the relationship between P and Q.
By 'relationship' we mean (usually) covariation. (Cause and effect is problematic.)
Implications
Regression is the conditional mean.
There is always a conditional mean.
It may not equal the structure implied by a theory.
What is the implication for least squares estimation?
LS always estimates regressions.
LS does not necessarily estimate structures.
Structures may not be estimable; they may not be identified.
Conditional Moments
The conditional mean function is the regression function.
P = E[P|Q] + (P - E[P|Q]) = E[P|Q] + ε, where E[ε|Q] = 0 = E[ε]. Proof: the law of iterated expectations.
The variance of the conditional random variable is the conditional variance, or the scedastic function.
A trivial relationship may be written as P = h(Q) + ε, where the random variable ε = P - h(Q) has zero mean by construction. It looks like a regression model of sorts.
An extension: can we carry Y as a parameter in the bivariate distribution? Examine E[P|Q,Y].
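The decomposition above is easy to check numerically. A minimal sketch, assuming a hypothetical linear h(Q) and simulated data (none of the numbers come from the slides): the deviation from E[P|Q] has mean zero by construction, and conditioning reduces variation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: P = h(Q) + eps with h(Q) = 5 - 0.5*Q
n = 50_000
Q = rng.uniform(1, 9, size=n)
eps = rng.normal(0, 1, size=n)
P = 5.0 - 0.5 * Q + eps

# The deviation P - E[P|Q] has (sample) mean near zero by construction
resid = P - (5.0 - 0.5 * Q)
print(abs(resid.mean()) < 0.05)

# Conditioning reduces variation: Var[P - E[P|Q]] < Var[P]
print(np.var(resid) < np.var(P))
```

Both statements print True: knowing Q removes the part of the variance of P that comes from movement in h(Q).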
Sample Data (Experiment)
50 Observations on P and Q, Showing Variation of P Around E[P]
Variation Around E[P|Q] (Conditioning Reduces Variation)
Means of P for Given Group Means of Q
Another Conditioning Variable
E[P|Q,Y=1]
E[P|Q,Y=2]
Conditional Mean Functions
No requirement that they be "linear" (we will discuss what we mean by linear).
No restrictions on conditional variances.
Projections and Regressions
We explore the difference between the linear projection and the conditional mean function.
y = α + βx + ε, where ε ⊥ x and E(ε) = 0.
Cov(x,y) = Cov(x,α) + Cov(x,βx) + Cov(x,ε) = 0 + βVar(x) + 0
So, β = Cov(x,y) / Var(x).
E(y) = α + βE(x) + E(ε) = α + βE(x) + 0, so α = E[y] - βE[x].
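These two moment formulas can be verified on simulated data. A sketch with an assumed, deliberately nonlinear joint distribution; by construction the projection error has zero sample mean and zero sample covariance with x:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed joint distribution; any bivariate distribution would do
x = rng.gamma(2.0, 1.0, size=100_000)
y = np.exp(-0.3 * x) + rng.normal(0, 0.1, size=x.size)

# Projection coefficients from the moment formulas above
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # beta = Cov(x,y)/Var(x)
a = y.mean() - b * x.mean()                         # alpha = E[y] - beta*E[x]

# The projection error has zero mean and zero (sample) covariance with x
e = y - (a + b * x)
print(abs(e.mean()) < 1e-8, abs(np.cov(x, e, ddof=1)[0, 1]) < 1e-6)
```

Note that this holds even though E[y|x] here is exponential, not linear: the projection exists regardless.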
Regression and Projection
Does this mean E[y|x] = α + βx? No. This is the linear projection of y on x.
It is true in every bivariate distribution, whether or not E[y|x] is linear in x.
y can always be written y = α + βx + ε, where ε ⊥ x, β = Cov(x,y) / Var(x), etc.
The conditional mean function is H(x) such that y = H(x) + v, where E[v|H(x)] = 0.
Linear Least Squares Projection
----------------------------------------------------------------------Ordinary least squares regression ............LHS=Y Mean = 1.21632
Standard deviation = .37592
Number of observs. = 100 Model size Parameters = 2
Degrees of freedom = 98Residuals Sum of squares = 9.95949
Standard error of e = .31879Fit R-squared = .28812
Adjusted R-squared = .28086--------+-------------------------------------------------------------
Variable| Coefficient Standard Error t-ratio P[|T|>t] Mean of X--------+-------------------------------------------------------------Constant| .83368*** .06861 12.150 .0000
X| .24591*** .03905 6.298 .0000 1.55603--------+-------------------------------------------------------------
True Conditional Mean Function
[Figure: the true conditional mean function E[y|x], with Expected Y on the vertical axis plotted against X from 0 to 3.]
True Data Generating Mechanism
Application: Doctor Visits
German individual health care data: N = 27,236.
Model for number of visits to the doctor:
True conditional mean: E[V|Income] = exp(1.412 - .0745*Income)
Linear regression: g*(Income) = 3.917 - .208*Income
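The gap between the two functions is easy to reproduce. A sketch that takes the slide's conditional mean function as given but assumes a Poisson count model and a uniform income distribution purely for illustration, so the projection coefficients will not match the slide's numbers:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed: uniform income on [0, 60] and a Poisson count model;
# the conditional mean function is the one reported on the slide
income = rng.uniform(0, 60, size=200_000)
true_mean = np.exp(1.412 - 0.0745 * income)    # E[V|Income]
visits = rng.poisson(true_mean)

# Best linear predictor g*(Income) = a + b*Income
b = np.cov(income, visits, ddof=1)[0, 1] / np.var(income, ddof=1)
a = visits.mean() - b * income.mean()

# The line slopes down and turns negative within the income range,
# even though visit counts are never negative
print(b < 0, a + b * 60 < 0)
```

The exponential conditional mean stays positive everywhere; its best linear approximation cannot.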
Conditional Mean and Projection
Notice the problem with the linear approach: negative predictions.
[Figure: most of the data lie in the low-income region; the area where the linear projection turns negative is outside the range of the data.]
Classical Linear Regression Model
The model is y = f(x1, x2, …, xK, β1, β2, …, βK) + ε
= a multiple regression model (as opposed to multivariate). Emphasis on the 'multiple' aspect of multiple regression.
Important examples:
Marginal cost in a multiple-output setting.
Separate age and education effects in an earnings equation.
Form of the model: E[y|x] = a linear function of x. (Regressand vs. regressors.)
'Dependent' and 'independent' variables. Independent of what? Think in terms of autonomous variation. Can y just change? What causes the change?
Be very careful on the issue of causality: cause vs. association, and how causality is modeled in econometrics.
Linearity of the Model
f(x1, x2, …, xK, β1, β2, …, βK) = x1β1 + x2β2 + … + xKβK
Notation: x1β1 + x2β2 + … + xKβK = x′β.
A boldface letter indicates a column vector. x denotes a variable, a function of a variable, or a function of a set of variables.
There are K variables on the right-hand side of the conditional mean function. The first variable is usually a constant term. (Wisdom: models should have a constant term unless the theory says they should not.)
E[y|x] = β1*1 + β2*x2 + … + βK*xK.
(β1*1 = the intercept term.)
7/31/2019 Econometrics I 2
25/40
Linearity
Simple linear model: E[y|x] = x′β
Quadratic model: E[y|x] = α + β1x + β2x²
Loglinear model: E[ln y|ln x] = α + Σk βk ln xk
Semilog: E[y|x] = α + Σk βk ln xk
Translog: E[ln y|ln x] = α + Σk βk ln xk + (1/2) Σk Σl γkl ln xk ln xl
All are linear. An infinite number of variations.
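Each of these specifications is fit by the same estimator; only the regressor matrix changes. A sketch with a made-up quadratic data-generating process (the coefficients, sample size, and noise level are all arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up data-generating process (all coefficients are assumptions)
x = rng.uniform(1.0, 5.0, size=500)
y = 2.0 + 0.5 * x - 0.1 * x**2 + rng.normal(0, 0.2, size=x.size)

# "Linear" means linear in the parameters: each model below is fit by
# the same least squares routine, only the regressor matrix changes
designs = {
    "simple":    np.column_stack([np.ones_like(x), x]),
    "quadratic": np.column_stack([np.ones_like(x), x, x**2]),
    "semilog":   np.column_stack([np.ones_like(x), np.log(x)]),
}
for name, X in designs.items():
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(name, beta.round(3))
```

The quadratic design matches the data-generating process, so its coefficient estimates land near (2, .5, -.1); the other two are projections onto misspecified regressor sets.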
Linearity
Linearity means linear in the parameters, not in the variables: E[y|x] = β1 f1() + β2 f2() + … + βK fK().
fk() may be any function of data. Examples:
Logs and levels in economics
Time trends, and time trends in loglinear models (rates of growth)
Dummy variables
Quadratics, power functions, log-quadratics, trig functions, interactions, and so on.
Uniqueness of the Conditional Mean
The conditional mean relationship must hold for any set of n observations, i = 1, …, n. Assume that n ≥ K (justified later).
E[y1|x] = β′x1
E[y2|x] = β′x2
…
E[yn|x] = β′xn
All n observations at once: E[y|X] = Xβ = Eβ.
Uniqueness of E[y|X]
Now, suppose there is a γ that produces the same expected value,
E[y|X] = Xγ = Eβ.
Let δ = γ - β. Then Xδ = Xγ - Xβ = Eβ - Eβ = 0.
Is this possible? X is an n × K matrix (n rows, K columns). What does Xδ = 0 mean? We assume this is not possible. This is the full rank assumption; it is an identifiability assumption. Ultimately, it will imply that we can estimate β. (We have yet to develop this.) This requires n ≥ K.
Linear Dependence
Example from your text:
x = [i, Nonlabor income, Labor income, Total income]
More formal statement of the uniqueness condition: no linear dependencies. No variable xk may be written as a linear function of the other variables in the model. An identification condition. Theory does not rule it out, but it makes estimation impossible. E.g.,
y = β1 + β2N + β3S + β4T + ε, where T = N + S,
y = β1 + (β2 + a)N + (β3 + a)S + (β4 - a)T + ε for any a,
  = β1 + β2N + β3S + β4T + ε.
What do we estimate?
Note, the model does not rule out nonlinear dependence. Having x and x² in the same equation is no problem.
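The dependency T = N + S shows up directly as a rank deficiency in the data matrix, while the nonlinear case does not. A sketch with simulated incomes (the distributions are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated incomes (distributions are arbitrary)
n = 200
N = rng.normal(10, 2, size=n)   # nonlabor income
S = rng.normal(30, 5, size=n)   # labor income
T = N + S                       # total income: an exact linear dependency
X = np.column_stack([np.ones(n), N, S, T])

# Full column rank fails: rank is 3, not K = 4, so X'X is singular
# and the least squares coefficients are not unique
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X.T @ X))

# Nonlinear dependence is fine: x and x^2 together give full rank
X2 = np.column_stack([np.ones(n), N, N**2])
print(np.linalg.matrix_rank(X2))
```

Any solver asked to invert X′X here must either fail or silently pick one of the infinitely many coefficient vectors that fit equally well.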
An Enduring Art Mystery
Why do larger paintings command higher prices?
The Persistence of Memory. Salvador Dali, 1931.
The Persistence of Econometrics. Greene, 2011.
Graphics show relative sizes of the two works.
An Unidentified (But Valid) Theory of Art Appreciation
Enhanced Monet Area Effect Model: Height and Width Effects
log(Price) = α + β1 log Area + β2 log Aspect Ratio + β3 log Height + β4 Signature + ε
(Aspect Ratio = Height/Width)
Notation
Define column vectors of n observations on y and the K variables:

    y = (y1, y2, …, yn)′,  β = (β1, β2, …, βK)′,  ε = (ε1, ε2, …, εn)′

    X = | x11  x12  …  x1K |
        | x21  x22  …  x2K |
        |  ⋮    ⋮        ⋮  |
        | xn1  xn2  …  xnK |

    y = Xβ + ε

The assumption means that the rank of the matrix X is K.
No linear dependencies => FULL COLUMN RANK of the matrix X.
Observed y will equal E[y|x] + random variation:
y = E[y|x] + ε (disturbance)
Is there any information about ε in x? That is, does movement in x provide useful information about movement in ε? If so, then we have not fully specified the conditional mean, and this function we are calling E[y|x] is not the conditional mean (regression).
There may be information about ε in other variables. But not in x. If E[ε|x] ≠ 0, then it follows that Cov[ε, x] ≠ 0. This violates the (as yet still not fully defined) independence assumption.
Zero Conditional Mean of ε
E[ε | all data in X] = 0
E[ε|X] = 0 is stronger than E[εi|xi] = 0.
The second says that knowledge of xi provides no information about the mean of εi. The first says that no xj provides information about the expected value of εi: not the i-th observation and not any other observation either.
No information is the same as no correlation. Proof: Cov[X, ε] = Cov[X, E[ε|X]] = 0.
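The implication runs one way: a zero conditional mean forces zero correlation, but zero correlation does not force a zero conditional mean. A sketch with simulated errors (the distributions are assumptions, chosen so that x² is uncorrelated with a standard normal x):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0, 1, size=400_000)

# Case 1: E[e|x] = 0 (a regression error); correlation with x is zero too
e1 = rng.normal(0, 1, size=x.size)
print(abs(np.cov(x, e1)[0, 1]) < 0.05)

# Case 2: E[e] = 0 and Cov[x, e] = 0, yet E[e|x] = x^2 - 1 is not zero:
# zero correlation is weaker than a zero conditional mean
e2 = x**2 - 1.0
print(abs(e2.mean()) < 0.05, abs(np.cov(x, e2)[0, 1]) < 0.05)
```

Case 2 is exactly the situation where the linear projection exists but is not the regression.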
The Difference Between E[ε|X] = 0 and E[ε] = 0
Normality of ε
An assumption of limited usefulness, used to facilitate finite-sample derivations of certain test statistics.
Temporary.
Linear Model: y = Xβ + ε, n observations, K columns in X, including a column of ones.
Standard assumptions about X.
Standard assumptions about ε|X: E[ε|X] = 0, so E[ε] = 0 and Cov[ε, x] = 0.
Regression? Only if E[y|X] = Xβ.
Approximation: if not, then Xβ is a linear projection, not a Taylor series.
Representing the Relationship
Conditional mean function: E[y|x] = g(x).
Linear approximation to the conditional mean function: the linear Taylor series, evaluated at x⁰:

    g(x) ≈ g(x⁰) + Σk=1..K [∂g(x)/∂xk | x = x⁰](xk - xk⁰)
         = β0⁰ + Σk=1..K βk⁰(xk - xk⁰)

The linear projection (linear regression?):

    g*(x) = γ0 + Σk=1..K γk(xk - E[xk]),
    where γ0 = E[y] and γ = {Var[x]}⁻¹{Cov[x, y]}.
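The projection-slope formula can be checked against least squares directly. A sketch with a made-up two-regressor distribution and a nonlinear conditional mean (all specifics are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed two-regressor distribution with a nonlinear conditional mean
n = 100_000
x = rng.normal(size=(n, 2))
y = np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 3 + rng.normal(0, 0.1, size=n)

# Projection slopes: gamma = {Var[x]}^-1 {Cov[x, y]}
S_xx = np.cov(x, rowvar=False)                              # K x K variance matrix
s_xy = np.array([np.cov(x[:, k], y)[0, 1] for k in range(2)])
gamma = np.linalg.solve(S_xx, s_xy)
gamma0 = y.mean()   # intercept when regressors are centered at E[x]

# Least squares on demeaned data gives the same slopes
Xc = x - x.mean(axis=0)
ls, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
print(np.allclose(gamma, ls, atol=1e-6))
```

The two sets of slopes agree because least squares on demeaned data solves exactly the moment equations {Var[x]}γ = {Cov[x, y]}; the Taylor-series approximation, by contrast, depends on the chosen expansion point x⁰.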