Econometrics I 2


  • 7/31/2019 Econometrics I 2

    1/40

    Applied Econometrics

    William Greene
    Department of Economics
    Stern School of Business


    Applied Econometrics

    2. Regression and Projection


    Statistical Relationship

    Objective: Characterize the stochastic relationship between a variable and a set of 'related' variables.

    Context: An inverse demand equation, P = α + βQ + γY, where Y = income. Q and P are two obviously related random variables. We are interested in studying the relationship between P and Q. By 'relationship' we mean (usually) covariation. (Cause and effect is problematic.)


    Implications

    Regression is the conditional mean. There is always a conditional mean.

    It may not equal the structure implied by a theory. What is the implication for least squares estimation? LS always estimates regressions. LS does not necessarily estimate structures. Structures may not be estimable; they may not be identified.


    Conditional Moments

    The conditional mean function is the regression function:

    P = E[P|Q] + (P − E[P|Q]) = E[P|Q] + ε, where E[ε|Q] = 0 = E[ε]. Proof: the law of iterated expectations.

    The variance of the conditional random variable is the conditional variance, or the scedastic function.

    A trivial relationship may be written as P = h(Q) + ε, where the random variable ε = P − h(Q) has zero mean by construction. Looks like a regression model of sorts.

    An extension: Can we carry Y as a parameter in the bivariate distribution? Examine E[P|Q,Y].
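The decomposition above is easy to check by simulation: construct ε = P − E[P|Q] and verify that its mean is zero, both overall and within each value of Q. The data-generating process below is hypothetical, chosen only for illustration.

```python
import numpy as np

# Check the decomposition P = E[P|Q] + e, with E[e|Q] = 0 = E[e].
# The demand-style process here is a hypothetical example.
rng = np.random.default_rng(42)
n = 100_000
Q = rng.integers(1, 6, size=n)             # discrete Q, so E[P|Q] is a group mean
P = 10.0 - 1.5 * Q + rng.normal(0, 2.0, n)

# Estimate E[P|Q=q] by the group average, then form e = P - E[P|Q]
cond_mean = {q: P[Q == q].mean() for q in np.unique(Q)}
e = P - np.array([cond_mean[q] for q in Q])

print(abs(e.mean()) < 1e-10)                                   # True: E[e] = 0
print(all(abs(e[Q == q].mean()) < 1e-10 for q in cond_mean))   # True: E[e|Q] = 0
```

Subtracting the group mean forces the within-group mean of ε to zero by construction; the law of iterated expectations then makes the overall mean zero as well.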


    Sample Data (Experiment)


    50 Observations on P and Q, Showing Variation of P Around E[P]


    Variation Around E[P|Q] (Conditioning Reduces Variation)
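The point of the figure is the variance decomposition Var[P] = Var[E[P|Q]] + E[Var[P|Q]], so the average spread around E[P|Q] cannot exceed the spread around E[P]. A minimal simulation, with hypothetical parameters:

```python
import numpy as np

# Conditioning reduces variation: E[Var[P|Q]] <= Var[P], since
# Var[P] = Var[E[P|Q]] + E[Var[P|Q]]. Hypothetical (P, Q) process.
rng = np.random.default_rng(5)
n = 200_000
Q = rng.integers(1, 6, size=n)
P = 10.0 - 1.5 * Q + rng.normal(0, 1.0, n)

vals, counts = np.unique(Q, return_counts=True)
within = np.average([P[Q == v].var() for v in vals], weights=counts)  # E[Var[P|Q]]

print(P.var())   # total variation: about 1 + 1.5^2 * Var[Q] = 5.5
print(within)    # average conditional variation: about 1.0
```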


    Means of P for Given Group Means of Q


    Another Conditioning Variable

    E[P|Q,Y=1]

    E[P|Q,Y=2]


    Conditional Mean Functions

    No requirement that they be "linear" (we will discuss what we mean by linear). No restrictions on conditional variances.


    Projections and Regressions

    We explore the difference between the linear projection and the conditional mean function. Write

    y = α + βx + ε, where ε ⊥ x (Cov(x, ε) = 0) and E(ε) = 0. Then

    Cov(x, y) = Cov(x, α) + Cov(x, βx) + Cov(x, ε) = 0 + βVar(x) + 0.

    So, β = Cov(x, y) / Var(x). Also,

    E(y) = α + βE(x) + E(ε) = α + βE(x) + 0, so α = E[y] − βE[x].
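These moment formulas can be verified numerically: the slope Cov(x, y)/Var(x) and intercept E[y] − βE[x] reproduce the least squares line exactly, even when the true conditional mean is nonlinear. The simulated data below are hypothetical.

```python
import numpy as np

# Linear projection from moments: b = Cov(x,y)/Var(x), a = E[y] - b*E[x].
# The true E[y|x] is deliberately nonlinear; the projection exists anyway.
rng = np.random.default_rng(0)
x = rng.exponential(2.0, size=10_000)
y = np.exp(0.5 - 0.1 * x) + rng.normal(0, 0.05, x.size)

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # matching ddof, so factors cancel
a = y.mean() - b * x.mean()

# The same numbers from an ordinary least squares fit of y on (1, x)
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)
print(np.allclose([a, b], coef))  # True: projection coefficients = OLS coefficients
```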


    Regression and Projection

    Does this mean E[y|x] = α + βx? No. This is the linear projection of y on x. It holds in every bivariate distribution, whether or not E[y|x] is linear in x: y can always be written y = α + βx + ε where ε ⊥ x, β = Cov(x, y)/Var(x), etc.

    The conditional mean function is the H(x) such that y = H(x) + v, where E[v|H(x)] = 0.


    Linear Least Squares Projection

    ----------------------------------------------------------------------
    Ordinary least squares regression ............
    LHS=Y        Mean                 =    1.21632
                 Standard deviation   =     .37592
                 Number of observs.   =        100
    Model size   Parameters           =          2
                 Degrees of freedom   =         98
    Residuals    Sum of squares       =    9.95949
                 Standard error of e  =     .31879
    Fit          R-squared            =     .28812
                 Adjusted R-squared   =     .28086
    --------+-------------------------------------------------------------
    Variable| Coefficient  Standard Error  t-ratio  P[|T|>t]   Mean of X
    --------+-------------------------------------------------------------
    Constant|   .83368***      .06861       12.150    .0000
           X|   .24591***      .03905        6.298    .0000      1.55603
    --------+-------------------------------------------------------------


    True Conditional Mean Function

    [Figure: the true conditional mean function E[y|x], expected y plotted against x.]


    True Data Generating Mechanism


    Application: Doctor Visits

    German individual health care data: N = 27,236. Model for the number of visits to the doctor:

    True conditional mean: E[V|Income] = exp(1.412 − .0745·Income)

    Linear regression (projection): g*(Income) = 3.917 − .208·Income
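Tabulating the two functions makes the contrast concrete. The income grid and its units below are assumptions chosen for illustration; the formulas are the ones quoted above.

```python
import numpy as np

# Compare the exponential conditional mean with its linear projection.
# The income grid is hypothetical; the formulas come from the slide.
income = np.linspace(0, 60, 7)
true_mean = np.exp(1.412 - 0.0745 * income)   # E[V | Income], positive everywhere
projection = 3.917 - 0.208 * income           # g*(Income), the linear projection

for inc, m, p in zip(income, true_mean, projection):
    print(f"Income = {inc:5.1f}   E[V|Income] = {m:6.3f}   g*(Income) = {p:7.3f}")
# g*(Income) < 0 once Income > 3.917/0.208 (about 18.8), although the
# conditional mean is strictly positive: the negative-prediction problem.
```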


    Conditional Mean and Projection

    Notice the problem with the linear approach: negative predictions.

    [Figure annotations: "Most of the data are in here"; "This area is outside the range of the data".]


    Classical Linear Regression Model

    The model is y = f(x1, x2, …, xK, β1, β2, …, βK) + ε,

    a multiple regression model (as opposed to multivariate). Emphasis on the "multiple" aspect of multiple regression. Important examples:

    Marginal cost in a multiple-output setting. Separate age and education effects in an earnings equation.

    Form of the model: E[y|x] = a linear function of x. (Regressand vs. regressors.)

    "Dependent" and "independent" variables. Independent of what? Think in terms of autonomous variation. Can y just change? What causes the change?

    Be very careful on the issue of causality. Cause vs. association. Modeling causality in econometrics is a subtle issue.


    Linearity of the Model

    f(x1, x2, …, xK, β1, β2, …, βK) = x1β1 + x2β2 + … + xKβK

    Notation: x1β1 + x2β2 + … + xKβK = x′β.

    A boldface letter indicates a column vector. x may denote a variable, a function of a variable, or a function of a set of variables. There are K variables on the right-hand side of the conditional mean function. The first variable is usually a constant term. (Wisdom: models should have a constant term unless the theory says they should not.)

    E[y|x] = β1·1 + β2·x2 + … + βK·xK. (β1·1 is the intercept term.)


    Linearity

    Simple linear model: E[y|x] = α + βx
    Quadratic model: E[y|x] = α + β1x + β2x²
    Loglinear model: E[ln y | ln x] = α + Σk βk ln xk
    Semilog: E[y|x] = α + Σk βk ln xk
    Translog: E[ln y | ln x] = α + Σk βk ln xk + (1/2) Σk Σl γkl ln xk ln xl

    All are linear (in the parameters). An infinite number of variations.


    Linearity

    Linearity means linear in the parameters, not in the variables:

    E[y|x] = β1 f1(…) + β2 f2(…) + … + βK fK(…).

    fk(…) may be any function of data. Examples: logs and levels in economics; time trends, and time trends in loglinear models (rates of growth); dummy variables; quadratics, power functions, log-quadratics, trig functions, interactions, and so on.
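Because all of these models are linear in their parameters, the same least squares machinery fits every one of them; only the columns of the design matrix change. A sketch with simulated data (the data-generating process is hypothetical, and the semilog specification happens to match it):

```python
import numpy as np

# "Linear" means linear in the parameters: each model below is fit by
# the same least squares routine, with a different design matrix.
rng = np.random.default_rng(1)
x = rng.uniform(0.5, 5.0, size=200)
y = 2.0 + 0.8 * np.log(x) + rng.normal(0, 0.1, x.size)

designs = {
    "linear":    np.column_stack([np.ones_like(x), x]),
    "quadratic": np.column_stack([np.ones_like(x), x, x**2]),
    "semilog":   np.column_stack([np.ones_like(x), np.log(x)]),
}
for name, X in designs.items():
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(name, np.round(beta, 3))  # semilog estimates land near (2.0, 0.8)
```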


    Uniqueness of the Conditional Mean

    The conditional mean relationship must hold for any set of n observations, i = 1, …, n. Assume that n ≥ K (justified later):

    E[y1|x] = x1′β
    E[y2|x] = x2′β
    …
    E[yn|x] = xn′β

    All n observations at once: E[y|X] = Xβ = Eβ.


    Uniqueness of E[y|X]

    Now, suppose there is a γ that produces the same expected value:

    E[y|X] = Xγ = Eγ.

    Let δ = γ − β. Then Xδ = Xγ − Xβ = Eγ − Eβ = 0.

    Is this possible? X is an n × K matrix (n rows, K columns). What does Xδ = 0 mean? We assume this is not possible. This is the full rank assumption; it is an identifiability assumption. Ultimately, it will imply that we can estimate β. (We have yet to develop this.) This requires n ≥ K.
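The full rank condition is easy to check numerically: if rank(X) = K, the only δ with Xδ = 0 is δ = 0. The matrices below are toy examples built for the check.

```python
import numpy as np

# Full column rank means X d = 0 only for d = 0, so beta is identified.
rng = np.random.default_rng(2)
n, K = 10, 3
X_good = rng.normal(size=(n, K))
X_bad = X_good.copy()
X_bad[:, 2] = X_good[:, 0] + X_good[:, 1]   # third column = sum of the first two

print(np.linalg.matrix_rank(X_good))  # 3: full column rank
print(np.linalg.matrix_rank(X_bad))   # 2: X d = 0 for d = (1, 1, -1)
```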


    Linear Dependence

    Example, from your text: x = [i, Nonlabor income, Labor income, Total income].

    More formal statement of the uniqueness condition: no linear dependencies. No variable xk may be written as a linear function of the other variables in the model. This is an identification condition. Theory does not rule it out, but it makes estimation impossible. E.g.,

    y = β1 + β2N + β3S + β4T + ε, where T = N + S. Then

    y = β1 + (β2 + a)N + (β3 + a)S + (β4 − a)T + ε for any a
      = β1 + β2N + β3S + β4T + ε.

    What do we estimate? Note that the model does not rule out nonlinear dependence: having x and x² in the same equation is no problem.
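This observational equivalence can be demonstrated directly: when T = N + S, shifting the coefficients by any a leaves the fitted values unchanged, so no single coefficient vector is identified. Toy data below.

```python
import numpy as np

# With T = N + S, the coefficient vectors (b1, b2+a, b3+a, b4-a) all
# produce identical fitted values for every a.
rng = np.random.default_rng(3)
N, S = rng.normal(size=50), rng.normal(size=50)
T = N + S
X = np.column_stack([np.ones(50), N, S, T])

b = np.array([1.0, 2.0, 3.0, 4.0])
for a in (0.0, 5.0, -1.7):
    shifted = b + a * np.array([0.0, 1.0, 1.0, -1.0])
    print(np.allclose(X @ b, X @ shifted))  # True for every a
```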


    An Enduring Art Mystery

    Why do larger paintings command higher prices?

    The Persistence of Memory. Salvador Dali, 1931.
    The Persistence of Econometrics. Greene, 2011.

    Graphics show the relative sizes of the two works.


    An Unidentified (But Valid) Theory of Art Appreciation

    Enhanced Monet Area Effect Model: Height and Width Effects

    log(Price) = α + β1 log Area + β2 log AspectRatio + β3 log Height + β4 Signature + ε

    (Aspect Ratio = Height/Width)


    Notation

    Define column vectors of the n observations on y and the K variables:

        | y1 |   | x11  x12  …  x1K | | β1 |   | ε1 |
        | y2 | = | x21  x22  …  x2K | | β2 | + | ε2 |
        | …  |   |  …    …   …   …  | | …  |   | …  |
        | yn |   | xn1  xn2  …  xnK | | βK |   | εn |

    i.e., y = Xβ + ε. The assumption of no linear dependencies means that the rank of the matrix X is K.

    No linear dependencies => FULL COLUMN RANK of the matrix X.


    Observed y will equal E[y|x] + random variation:

    y = E[y|x] + ε  (ε is the disturbance)

    Is there any information about ε in x? That is, does movement in x provide useful information about movement in ε? If so, then we have not fully specified the conditional mean, and this function we are calling E[y|x] is not the conditional mean (regression).

    There may be information about ε in other variables, but not in x. If E[ε|x] ≠ 0, then it follows that Cov[ε, x] ≠ 0. This violates the (as yet still not fully defined) "independence" assumption.


    Zero Conditional Mean: E[ε | all data in X] = 0

    E[ε|X] = 0 is stronger than E[εi|xi] = 0. The second says that knowledge of xi provides no information about the mean of εi. The first says that no xj provides information about the expected value of εi: not the ith observation, and not any other observation either.

    "No information" implies no correlation. Proof: Cov[X, ε] = Cov[X, E[ε|X]] = 0.


    The Difference Between E[ε|X] = 0 and E[ε] = 0


    Normality of ε

    An assumption of limited usefulness.

    Used to facilitate finite-sample derivations of certain test statistics.

    Temporary.


    Linear Model

    y = Xβ + ε, n observations, K columns in X, including a column of ones.

    Standard assumptions about X. Standard assumptions about ε|X: E[ε|X] = 0, so E[ε] = 0 and Cov[ε, x] = 0.

    Regression? Yes, if E[y|X] = Xβ.

    Approximation: if not, then this is a linear projection, not a Taylor series.


    Representing the Relationship

    Conditional mean function: E[y|x] = g(x).

    Linear approximation to the conditional mean function: the linear Taylor series, evaluated at a point x0:

    g(x) ≈ g(x0) + Σk [∂g(x)/∂xk at x = x0](xk − xk0)
         = β0 + Σk βk(xk − xk0)

    The linear projection (linear regression?):

    g*(x) = γ0 + Σk γk(xk − E[xk]),

    where γ0 = E[y] and γ = {Var[x]}⁻¹ {Cov[x, y]}.
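The projection formula checks out numerically: γ = {Var[x]}⁻¹{Cov[x, y]} reproduces least squares on demeaned data, whatever the true g(x). The simulated process below is hypothetical, with a deliberately nonlinear conditional mean.

```python
import numpy as np

# Multivariate linear projection: g*(x) = g0 + (x - E[x])'g, with
# g = Var[x]^{-1} Cov[x, y] and g0 = E[y]. The true g(x) is nonlinear here.
rng = np.random.default_rng(4)
n = 50_000
x = rng.normal(size=(n, 2))
y = np.sin(x[:, 0]) + x[:, 1] ** 3 + rng.normal(0, 0.1, n)

Vx = np.cov(x, rowvar=False)                                  # Var[x], 2 x 2
Cxy = np.array([np.cov(x[:, k], y)[0, 1] for k in range(2)])  # Cov[x, y]
g = np.linalg.solve(Vx, Cxy)
g0 = y.mean()

# The same slopes from least squares on demeaned data
g_ls, *_ = np.linalg.lstsq(x - x.mean(axis=0), y - g0, rcond=None)
print(np.allclose(g, g_ls))  # True: moment formula = least squares
```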
