7/31/2019 Econometrics I 2
Applied Econometrics
William Greene
Department of Economics, Stern School of Business
Applied Econometrics
2. Regression and Projection
Statistical Relationship
Objective: characterize the stochastic relationship between a variable and a set of 'related' variables.
Context: an inverse demand equation, P = α + βQ + γY, where Y = income. Q and P are two obviously related random variables. We are interested in studying the relationship between P and Q.
By 'relationship' we mean (usually) covariation. (Cause and effect is problematic.)
Implications
Regression is the conditional mean.
There is always a conditional mean.
It may not equal the structure implied by a theory.
What is the implication for least squares estimation?
LS always estimates regressions.
LS does not necessarily estimate structures.
Structures may not be estimable; they may not be identified.
Conditional Moments
The conditional mean function is the regression function.
P = E[P|Q] + (P - E[P|Q]) = E[P|Q] + ε, where E[ε|Q] = 0 = E[ε]. Proof: the law of iterated expectations.
The variance of the conditional random variable is the conditional variance, or the scedastic function.
A trivial relationship may be written as P = h(Q) + ε, where the random variable ε = P - h(Q) has zero mean by construction. It looks like a regression model of sorts.
An extension: can we carry Y as a parameter in the bivariate distribution? Examine E[P|Q,Y].
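The decomposition above is easy to check numerically. A minimal sketch, assuming a hypothetical linear h(Q) and simulated data (none of the numbers come from the slides): the deviation from E[P|Q] has mean zero by construction, and conditioning reduces variation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: P = h(Q) + eps with h(Q) = 5 - 0.5*Q
n = 50_000
Q = rng.uniform(1, 9, size=n)
eps = rng.normal(0, 1, size=n)
P = 5.0 - 0.5 * Q + eps

# The deviation P - E[P|Q] has (sample) mean near zero by construction
resid = P - (5.0 - 0.5 * Q)
print(abs(resid.mean()) < 0.05)

# Conditioning reduces variation: Var[P - E[P|Q]] < Var[P]
print(np.var(resid) < np.var(P))
```

Both statements print True: knowing Q removes the part of the variance of P that comes from movement in h(Q).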
Sample Data (Experiment)
50 Observations on P and Q, Showing Variation of P Around E[P]
Variation Around E[P|Q] (Conditioning Reduces Variation)
Means of P for Given Group Means of Q
Another Conditioning Variable
E[P|Q,Y=1]
E[P|Q,Y=2]
Conditional Mean Functions
No requirement that they be "linear" (we will discuss what we mean by linear).
No restrictions on conditional variances.
Projections and Regressions
We explore the difference between the linear projection and the conditional mean function.
y = α + βx + ε, where ε ⊥ x and E(ε) = 0.
Cov(x,y) = Cov(x,α) + Cov(x,βx) + Cov(x,ε) = 0 + βVar(x) + 0
So, β = Cov(x,y) / Var(x).
E(y) = α + βE(x) + E(ε) = α + βE(x) + 0, so α = E[y] - βE[x].
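These two moment formulas can be verified on simulated data. A sketch with an assumed, deliberately nonlinear joint distribution; by construction the projection error has zero sample mean and zero sample covariance with x:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed joint distribution; any bivariate distribution would do
x = rng.gamma(2.0, 1.0, size=100_000)
y = np.exp(-0.3 * x) + rng.normal(0, 0.1, size=x.size)

# Projection coefficients from the moment formulas above
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # beta = Cov(x,y)/Var(x)
a = y.mean() - b * x.mean()                         # alpha = E[y] - beta*E[x]

# The projection error has zero mean and zero (sample) covariance with x
e = y - (a + b * x)
print(abs(e.mean()) < 1e-8, abs(np.cov(x, e, ddof=1)[0, 1]) < 1e-6)
```

Note that this holds even though E[y|x] here is exponential, not linear: the projection exists regardless.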
Regression and Projection
Does this mean E[y|x] = α + βx? No. This is the linear projection of y on x.
It is true in every bivariate distribution, whether or not E[y|x] is linear in x.
y can always be written y = α + βx + ε, where ε ⊥ x, β = Cov(x,y) / Var(x), etc.
The conditional mean function is H(x) such that y = H(x) + v, where E[v|H(x)] = 0.
Linear Least Squares Projection
----------------------------------------------------------------------Ordinary least squares regression ............LHS=Y Mean = 1.21632
Standard deviation = .37592
Number of observs. = 100 Model size Parameters = 2
Degrees of freedom = 98Residuals Sum of squares = 9.95949
Standard error of e = .31879Fit R-squared = .28812
Adjusted R-squared = .28086--------+-------------------------------------------------------------
Variable| Coefficient Standard Error t-ratio P[|T|>t] Mean of X--------+-------------------------------------------------------------Constant| .83368*** .06861 12.150 .0000
X| .24591*** .03905 6.298 .0000 1.55603--------+-------------------------------------------------------------
True Conditional Mean Function
[Figure: the true conditional mean function E[y|x], with Expected Y on the vertical axis plotted against X from 0 to 3.]
True Data Generating Mechanism
Application: Doctor Visits
German individual health care data: N = 27,236.
Model for number of visits to the doctor:
True conditional mean: E[V|Income] = exp(1.412 - .0745*Income)
Linear regression: g*(Income) = 3.917 - .208*Income
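The gap between the two functions is easy to reproduce. A sketch that takes the slide's conditional mean function as given but assumes a Poisson count model and a uniform income distribution purely for illustration, so the projection coefficients will not match the slide's numbers:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed: uniform income on [0, 60] and a Poisson count model;
# the conditional mean function is the one reported on the slide
income = rng.uniform(0, 60, size=200_000)
true_mean = np.exp(1.412 - 0.0745 * income)    # E[V|Income]
visits = rng.poisson(true_mean)

# Best linear predictor g*(Income) = a + b*Income
b = np.cov(income, visits, ddof=1)[0, 1] / np.var(income, ddof=1)
a = visits.mean() - b * income.mean()

# The line slopes down and turns negative within the income range,
# even though visit counts are never negative
print(b < 0, a + b * 60 < 0)
```

The exponential conditional mean stays positive everywhere; its best linear approximation cannot.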
Conditional Mean and Projection
Notice the problem with the linear approach: negative predictions.
[Figure: most of the data lie in the low-income region; the area where the linear projection turns negative is outside the range of the data.]
Classical Linear Regression Model
The model is y = f(x1, x2, …, xK, β1, β2, …, βK) + ε
= a multiple regression model (as opposed to multivariate). Emphasis on the 'multiple' aspect of multiple regression.
Important examples:
Marginal cost in a multiple-output setting.
Separate age and education effects in an earnings equation.
Form of the model: E[y|x] = a linear function of x. (Regressand vs. regressors.)
'Dependent' and 'independent' variables. Independent of what? Think in terms of autonomous variation. Can y just change? What causes the change?
Be very careful on the issue of causality: cause vs. association, and how causality is modeled in econometrics.
Linearity of the Model
f(x1, x2, …, xK, β1, β2, …, βK) = x1β1 + x2β2 + … + xKβK
Notation: x1β1 + x2β2 + … + xKβK = x′β.
A boldface letter indicates a column vector. x denotes a variable, a function of a variable, or a function of a set of variables.
There are K variables on the right-hand side of the conditional mean function. The first variable is usually a constant term. (Wisdom: models should have a constant term unless the theory says they should not.)
E[y|x] = β1*1 + β2*x2 + … + βK*xK.
(β1*1 = the intercept term.)
7/31/2019 Econometrics I 2
25/40
Linearity
Simple linear model: E[y|x] = x′β
Quadratic model: E[y|x] = α + β1x + β2x²
Loglinear model: E[ln y|ln x] = α + Σk βk ln xk
Semilog: E[y|x] = α + Σk βk ln xk
Translog: E[ln y|ln x] = α + Σk βk ln xk + (1/2) Σk Σl γkl ln xk ln xl
All are linear. An infinite number of variations.
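Each of these specifications is fit by the same estimator; only the regressor matrix changes. A sketch with a made-up quadratic data-generating process (the coefficients, sample size, and noise level are all arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up data-generating process (all coefficients are assumptions)
x = rng.uniform(1.0, 5.0, size=500)
y = 2.0 + 0.5 * x - 0.1 * x**2 + rng.normal(0, 0.2, size=x.size)

# "Linear" means linear in the parameters: each model below is fit by
# the same least squares routine, only the regressor matrix changes
designs = {
    "simple":    np.column_stack([np.ones_like(x), x]),
    "quadratic": np.column_stack([np.ones_like(x), x, x**2]),
    "semilog":   np.column_stack([np.ones_like(x), np.log(x)]),
}
for name, X in designs.items():
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(name, beta.round(3))
```

The quadratic design matches the data-generating process, so its coefficient estimates land near (2, .5, -.1); the other two are projections onto misspecified regressor sets.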
Linearity
Linearity means linear in the parameters, not in the variables: E[y|x] = β1 f1() + β2 f2() + … + βK fK().
fk() may be any function of data. Examples:
Logs and levels in economics
Time trends, and time trends in loglinear models (rates of growth)
Dummy variables
Quadratics, power functions, log-quadratics, trig functions, interactions, and so on.
Uniqueness of the Conditional Mean
The conditional mean relationship must hold for any set of n observations, i = 1, …, n. Assume that n ≥ K (justified later).
E[y1|x] = β′x1
E[y2|x] = β′x2
…
E[yn|x] = β′xn
All n observations at once: E[y|X] = Xβ = Eβ.
Uniqueness of E[y|X]
Now, suppose there is a γ that produces the same expected value,
E[y|X] = Xγ = Eβ.
Let δ = γ - β. Then Xδ = Xγ - Xβ = Eβ - Eβ = 0.
Is this possible? X is an n × K matrix (n rows, K columns). What does Xδ = 0 mean? We assume this is not possible. This is the full rank assumption; it is an identifiability assumption. Ultimately, it will imply that we can estimate β. (We have yet to develop this.) This requires n ≥ K.
Linear Dependence
Example from your text:
x = [i, Nonlabor income, Labor income, Total income]
More formal statement of the uniqueness condition: no linear dependencies. No variable xk may be written as a linear function of the other variables in the model. An identification condition. Theory does not rule it out, but it makes estimation impossible. E.g.,
y = β1 + β2N + β3S + β4T + ε, where T = N + S,
y = β1 + (β2 + a)N + (β3 + a)S + (β4 - a)T + ε for any a,
  = β1 + β2N + β3S + β4T + ε.
What do we estimate?
Note, the model does not rule out nonlinear dependence. Having x and x² in the same equation is no problem.
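The dependency T = N + S shows up directly as a rank deficiency in the data matrix, while the nonlinear case does not. A sketch with simulated incomes (the distributions are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated incomes (distributions are arbitrary)
n = 200
N = rng.normal(10, 2, size=n)   # nonlabor income
S = rng.normal(30, 5, size=n)   # labor income
T = N + S                       # total income: an exact linear dependency
X = np.column_stack([np.ones(n), N, S, T])

# Full column rank fails: rank is 3, not K = 4, so X'X is singular
# and the least squares coefficients are not unique
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X.T @ X))

# Nonlinear dependence is fine: x and x^2 together give full rank
X2 = np.column_stack([np.ones(n), N, N**2])
print(np.linalg.matrix_rank(X2))
```

Any solver asked to invert X′X here must either fail or silently pick one of the infinitely many coefficient vectors that fit equally well.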
An Enduring Art Mystery
Why do larger paintings command higher prices?
The Persistence of Memory. Salvador Dali, 1931.
The Persistence of Econometrics. Greene, 2011.
Graphics show relative sizes of the two works.
An Unidentified (But Valid) Theory of Art Appreciation
Enhanced Monet Area Effect Model: Height and Width Effects
log(Price) = α + β1 log Area + β2 log Aspect Ratio + β3 log Height + β4 Signature + ε
(Aspect Ratio = Height/Width)
Notation
Define column vectors of n observations on y and the K variables:

    y = (y1, y2, …, yn)′,  β = (β1, β2, …, βK)′,  ε = (ε1, ε2, …, εn)′

    X = | x11  x12  …  x1K |
        | x21  x22  …  x2K |
        |  ⋮    ⋮        ⋮  |
        | xn1  xn2  …  xnK |

    y = Xβ + ε

The assumption means that the rank of the matrix X is K.
No linear dependencies => FULL COLUMN RANK of the matrix X.
Observed y will equal E[y|x] + random variation:
y = E[y|x] + ε (disturbance)
Is there any information about ε in x? That is, does movement in x provide useful information about movement in ε? If so, then we have not fully specified the conditional mean, and this function we are calling E[y|x] is not the conditional mean (regression).
There may be information about ε in other variables. But not in x. If E[ε|x] ≠ 0, then it follows that Cov[ε, x] ≠ 0. This violates the (as yet still not fully defined) independence assumption.
Zero Conditional Mean of ε
E[ε | all data in X] = 0
E[ε|X] = 0 is stronger than E[εi|xi] = 0.
The second says that knowledge of xi provides no information about the mean of εi. The first says that no xj provides information about the expected value of εi: not the i-th observation and not any other observation either.
No information is the same as no correlation. Proof: Cov[X, ε] = Cov[X, E[ε|X]] = 0.
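The implication runs one way: a zero conditional mean forces zero correlation, but zero correlation does not force a zero conditional mean. A sketch with simulated errors (the distributions are assumptions, chosen so that x² is uncorrelated with a standard normal x):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0, 1, size=400_000)

# Case 1: E[e|x] = 0 (a regression error); correlation with x is zero too
e1 = rng.normal(0, 1, size=x.size)
print(abs(np.cov(x, e1)[0, 1]) < 0.05)

# Case 2: E[e] = 0 and Cov[x, e] = 0, yet E[e|x] = x^2 - 1 is not zero:
# zero correlation is weaker than a zero conditional mean
e2 = x**2 - 1.0
print(abs(e2.mean()) < 0.05, abs(np.cov(x, e2)[0, 1]) < 0.05)
```

Case 2 is exactly the situation where the linear projection exists but is not the regression.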
The Difference Between E[ε|X] = 0 and E[ε] = 0
Normality of ε
An assumption of limited usefulness, used to facilitate finite-sample derivations of certain test statistics.
Temporary.
Linear Model: y = Xβ + ε, n observations, K columns in X, including a column of ones.
Standard assumptions about X.
Standard assumptions about ε|X: E[ε|X] = 0, so E[ε] = 0 and Cov[ε, x] = 0.
Regression? Only if E[y|X] = Xβ.
Approximation: if not, then Xβ is a linear projection, not a Taylor series.
Representing the Relationship
Conditional mean function: E[y|x] = g(x).
Linear approximation to the conditional mean function: the linear Taylor series, evaluated at x⁰:

    g(x) ≈ g(x⁰) + Σk=1..K [∂g(x)/∂xk | x = x⁰](xk - xk⁰)
         = β0⁰ + Σk=1..K βk⁰(xk - xk⁰)

The linear projection (linear regression?):

    g*(x) = γ0 + Σk=1..K γk(xk - E[xk]),
    where γ0 = E[y] and γ = {Var[x]}⁻¹{Cov[x, y]}.
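The projection-slope formula can be checked against least squares directly. A sketch with a made-up two-regressor distribution and a nonlinear conditional mean (all specifics are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed two-regressor distribution with a nonlinear conditional mean
n = 100_000
x = rng.normal(size=(n, 2))
y = np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 3 + rng.normal(0, 0.1, size=n)

# Projection slopes: gamma = {Var[x]}^-1 {Cov[x, y]}
S_xx = np.cov(x, rowvar=False)                              # K x K variance matrix
s_xy = np.array([np.cov(x[:, k], y)[0, 1] for k in range(2)])
gamma = np.linalg.solve(S_xx, s_xy)
gamma0 = y.mean()   # intercept when regressors are centered at E[x]

# Least squares on demeaned data gives the same slopes
Xc = x - x.mean(axis=0)
ls, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
print(np.allclose(gamma, ls, atol=1e-6))
```

The two sets of slopes agree because least squares on demeaned data solves exactly the moment equations {Var[x]}γ = {Cov[x, y]}; the Taylor-series approximation, by contrast, depends on the chosen expansion point x⁰.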