Folien Econometrics I Teil3


  • 7/25/2019 Folien Econometrics I Teil3

    1/63

    Homoskedasticity

    How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity:

    Var(u|X) = σ². (23)

    This means that the variance of the error term u is the same, regardless of the predictor variable X.

    If assumption (23) is violated, e.g. if Var(u|X) = σ²·h(X), then we say the error term is heteroskedastic.

    Sylvia Fruhwirth-Schnatter Econometrics I WS 2012/13 1-55


    Homoskedasticity

    Assumption (23) certainly holds if u and X are assumed to be independent. However, (23) is a weaker assumption.

    Assumption (23) implies that σ² is also the unconditional variance of u, referred to as the error variance:

    Var(u) = E(u²) − (E(u))² = σ².

    Its square root σ is the standard deviation of the error.

    It follows that Var(Y|X) = σ².


    Variance of the OLS estimator

    How large is the variation of the OLS estimator around the true parameter?

    The difference β̂₁ − β₁ is 0 on average.

    Measure the variation of the OLS estimator around the true parameter through the expected squared difference, i.e. the variance:

    Var(β̂₁) = E((β̂₁ − β₁)²). (24)

    Similarly for β₀: Var(β̂₀) = E((β̂₀ − β₀)²).


    Variance of the OLS estimator

    The variance of the slope estimator β̂₁ follows from (22):

    Var(β̂₁) = 1/(N²(s²ₓ)²) · Σᵢ₌₁ᴺ (xᵢ − x̄)²·Var(uᵢ) = σ²/(N²(s²ₓ)²) · Σᵢ₌₁ᴺ (xᵢ − x̄)² = σ²/(N·s²ₓ). (25)

    The smaller the number of observations N, the larger the variance of the slope estimator (and vice versa).

    Increasing N by a factor of 4 reduces the variance by a factor of 1/4.
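    The course materials use MATLAB; as an illustrative cross-check of formula (25) — not part of the slides — the following NumPy sketch simulates repeated samples from a simple regression with made-up parameter values and compares the Monte Carlo variance of the slope estimates with σ²/(N·s²ₓ):

```python
import numpy as np

# Monte Carlo check of (25): Var(beta1_hat) = sigma^2 / (N * s2x).
# All numbers (beta0, beta1, N, sigma2, the design) are illustrative choices.
rng = np.random.default_rng(0)
N, sigma2 = 50, 0.1
beta0, beta1 = 0.2, 1.8
x = rng.uniform(0.5, 1.5, N)       # fixed design, constant across replications
s2x = x.var()                      # s2x = (1/N) * sum (x_i - xbar)^2
slopes = []
for _ in range(20000):
    u = rng.normal(0.0, np.sqrt(sigma2), N)
    y = beta0 + beta1 * x + u
    slopes.append(np.polyfit(x, y, 1)[0])   # OLS slope estimate
theory = sigma2 / (N * s2x)
print(np.var(slopes), theory)      # the two values nearly agree
```

    Rerunning the sketch with N four times as large shows the Monte Carlo variance dropping by roughly a factor of 1/4, as stated above.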


    Variance of the OLS estimator

    Dependence on the error variance σ²:

    The larger the error variance σ², the larger the variance of the slope estimator.

    Dependence on the design, i.e. the predictor variable X:

    The smaller the variation in X, measured by s²ₓ, the larger the variance of the slope estimator.


    Variance of the OLS estimator

    The variance is in general different for the two parameters of the simple regression model. Var(β̂₀) is given by (without proof):

    Var(β̂₀) = σ²/(N·s²ₓ) · (1/N)·Σᵢ₌₁ᴺ xᵢ². (26)

    The standard deviations sd(β̂₀) and sd(β̂₁) of the OLS estimators are defined as:

    sd(β̂₀) = √Var(β̂₀),  sd(β̂₁) = √Var(β̂₁).


    Milestone II

    The Multiple Regression Model

    Step 1: Model Definition
    Step 2: OLS Estimation
    Step 3: Econometric Inference
    Step 4: OLS Residuals
    Step 5: Testing Hypotheses


    Milestone II

    Step 6: Model Evaluation and Model Comparison
    Step 7: Residual Diagnostics


    Cross-sectional data

    We are interested in a dependent (left-hand side, explained, response) variable Y, which is supposed to depend on K explanatory (right-hand side, independent, control, predictor) variables X₁, ..., X_K.

    Examples: wage is a response; education, gender, and experience are predictor variables.

    We observe these variables for N subjects drawn randomly from a population (e.g. for various supermarkets, for various individuals):

    (yᵢ, x₁,ᵢ, ..., x_K,ᵢ), i = 1, ..., N


    II.1 Model formulation

    The multiple regression model describes the relation between the response variable Y and the predictor variables X₁, ..., X_K as:

    Y = β₀ + β₁X₁ + ... + β_K X_K + u, (27)

    where β₀, β₁, ..., β_K are unknown parameters.

    Key assumption:

    E(u|X₁, ..., X_K) = E(u) = 0. (28)


    Model formulation

    Assumption (28) implies:

    E(Y|X₁, ..., X_K) = β₀ + β₁X₁ + ... + β_K X_K. (29)

    E(Y|X₁, ..., X_K) is a linear function in the parameters β₀, β₁, ..., β_K (important for "easy" OLS estimation), and in the predictor variables X₁, ..., X_K (important for the correct interpretation of the parameters).


    Understanding the parameters

    The parameter βₖ is the expected absolute change of the response variable Y, if the predictor variable Xₖ is increased by 1, and all other predictor variables remain the same (ceteris paribus):

    ΔE(Y|Xₖ) = E(Y|Xₖ = x + ΔXₖ) − E(Y|Xₖ = x)
    = β₀ + β₁X₁ + ... + βₖ(x + ΔXₖ) + ... + β_K X_K − (β₀ + β₁X₁ + ... + βₖx + ... + β_K X_K) = βₖ·ΔXₖ.


    Understanding the parameters

    The sign shows the direction of the expected change:

    If βₖ > 0, then the changes of Xₖ and Y go in the same direction.

    If βₖ < 0, then the changes of Xₖ and Y go in opposite directions.


    The multiple log-linear model

    The multiple log-linear model reads:

    Y = e^β₀ · X₁^β₁ ··· X_K^β_K · e^u. (30)

    The log transformation yields a model that is linear in the parameters β₀, β₁, ..., β_K,

    log Y = β₀ + β₁·log X₁ + ... + β_K·log X_K + u, (31)

    but is nonlinear in the predictor variables X₁, ..., X_K. This is important for the correct interpretation of the parameters.


    The multiple log-linear model

    The coefficient βₖ is the elasticity of the response variable Y with respect to the variable Xₖ, i.e. the expected relative change of Y, if the predictor variable Xₖ is increased by 1% and all other predictor variables remain the same (ceteris paribus).

    If Xₖ is increased by p%, then (ceteris paribus) the expected relative change of Y is equal to βₖ·p%. On average, Y increases by βₖ·p% if βₖ > 0, and decreases by |βₖ|·p% if βₖ < 0. If Xₖ is decreased by p%, then on average Y decreases by βₖ·p% if βₖ > 0, and increases by |βₖ|·p% if βₖ < 0.


    EVIEWS Exercise II.1.2

    Show in EViews how to define a multiple regression model and discuss the meaning of the estimated parameters:

    Case Study Chicken, work file chicken;
    Case Study Marketing, work file marketing;
    Case Study profit, work file profit.


    II.2 OLS-Estimation

    Let (yᵢ, x₁,ᵢ, ..., x_K,ᵢ), i = 1, ..., N denote a random sample of size N from the population. Hence, for each i:

    yᵢ = β₀ + β₁x₁,ᵢ + ... + βₖxₖ,ᵢ + ... + β_K x_K,ᵢ + uᵢ. (32)

    The population parameters β₀, β₁, ..., β_K are estimated from a sample. The parameter estimates (coefficients) are typically denoted by β̂₀, β̂₁, ..., β̂_K. We will use the following vector notation:

    β = (β₀, ..., β_K)′,  β̂ = (β̂₀, β̂₁, ..., β̂_K)′. (33)


    II.2 OLS-Estimation

    The commonly used method to estimate the parameters in a multiple regression model is, again, OLS estimation:

    For each observation yᵢ, the prediction ŷᵢ(β) of yᵢ depends on β = (β₀, ..., β_K).

    For each yᵢ, define the regression residual (prediction error) uᵢ(β) as:

    uᵢ(β) = yᵢ − ŷᵢ(β) = yᵢ − (β₀ + β₁x₁,ᵢ + ... + β_K x_K,ᵢ). (34)


    OLS-Estimation for the Multiple Regression Model

    For each parameter value, an overall measure of fit is obtained by aggregating these prediction errors.

    The sum of squared residuals (SSR):

    SSR = Σᵢ₌₁ᴺ uᵢ(β)² = Σᵢ₌₁ᴺ (yᵢ − β₀ − β₁x₁,ᵢ − ... − β_K x_K,ᵢ)². (35)

    The OLS estimator β̂ = (β̂₀, β̂₁, ..., β̂_K) is the parameter value that minimizes the sum of squared residuals.


    How to compute the OLS Estimator?

    For a multiple regression model, the estimation problem is solved by software packages like EViews.

    Some mathematical details:

    Take the first partial derivative of (35) with respect to each parameter βₖ, k = 0, ..., K.

    This yields a system of K + 1 linear equations in β₀, ..., β_K, which has a unique solution under certain conditions on the matrix X, having N rows and K + 1 columns and containing in each row i the predictor values (1 x₁,ᵢ ... x_K,ᵢ).


    Matrix notation of the multiple regression model

    Matrix notation for the observed data:

    X = ⎡ 1  x₁,₁    ⋯  x_K,₁   ⎤        y = ⎡ y₁    ⎤
        ⎢ 1  x₁,₂    ⋯  x_K,₂   ⎥            ⎢ y₂    ⎥
        ⎢ ⋮   ⋮      ⋱   ⋮      ⎥  ,         ⎢ ⋮     ⎥  .
        ⎢ 1  x₁,N₋₁  ⋯  x_K,N₋₁ ⎥            ⎢ y_N₋₁ ⎥
        ⎣ 1  x₁,N    ⋯  x_K,N   ⎦            ⎣ y_N   ⎦

    X is an N × (K + 1) matrix, y is an N × 1 vector.

    X′X is a square matrix with K + 1 rows and columns. (X′X)⁻¹ is the inverse of X′X.


    Matrix notation of the multiple regression model

    In matrix notation, the N equations given in (32) for i = 1, ..., N may be written as:

    y = Xβ + u,

    where

    u = (u₁, u₂, ..., u_N)′,  β = (β₀, ..., β_K)′.


    The OLS Estimator

    The OLS estimator β̂ has an explicit form, depending on X and the vector y containing all observed values y₁, ..., y_N.

    The OLS estimator is given by:

    β̂ = (X′X)⁻¹X′y. (36)

    The matrix X′X has to be invertible in order to obtain a unique estimator β̂.
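    A minimal NumPy sketch of (36), using simulated illustrative data (the true β and the design below are made up, not from the slides). Solving the normal equations with a linear solver rather than forming (X′X)⁻¹ explicitly is the numerically safer route:

```python
import numpy as np

# OLS via the normal equations: beta_hat = (X'X)^{-1} X'y  (formula (36)).
rng = np.random.default_rng(1)
N = 100
X = np.column_stack([np.ones(N),                 # intercept column
                     rng.uniform(0.0, 1.0, N),
                     rng.normal(0.0, 1.0, N)])
beta = np.array([1.0, -2.0, 0.5])                # illustrative true parameters
y = X @ beta + rng.normal(0.0, 0.3, N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # solves (X'X) b = X'y
print(beta_hat)                                  # close to [1.0, -2.0, 0.5]
```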


    The OLS Estimator

    Necessary conditions for X′X being invertible:

    We have to observe sample variation for each predictor Xₖ, i.e. the sample variance of xₖ,₁, ..., xₖ,N is positive for all k = 1, ..., K.

    Furthermore, no exact linear relation between any predictors Xₖ and Xₗ may be present, i.e. the empirical correlation coefficient of all pairwise data sets (xₖ,ᵢ, xₗ,ᵢ), i = 1, ..., N is different from 1 and −1.

    EViews produces an error if X′X is not invertible.


    Perfect Multicollinearity

    A sufficient assumption about the predictors X₁, ..., X_K in a multiple regression model is the following:

    The predictors X₁, ..., X_K are not linearly dependent, i.e. no predictor Xⱼ may be expressed as a linear function of the remaining predictors X₁, ..., Xⱼ₋₁, Xⱼ₊₁, ..., X_K.

    If this assumption is violated, then the OLS estimator does not exist, as the matrix X′X is not invertible.

    There are infinitely many parameter values having the same minimal sum of squared residuals, defined in (35). The parameters in the regression model are not identified.


    Case Study Yields

    Demonstration in EViews, work file yieldus:

    yᵢ = β₁ + β₂x₂,ᵢ + β₃x₃,ᵢ + β₄x₄,ᵢ + uᵢ,

    yᵢ ... yield with maturity 3 months
    x₂,ᵢ ... yield with maturity 1 month
    x₃,ᵢ ... yield with maturity 60 months
    x₄,ᵢ ... spread between these yields: x₄,ᵢ = x₃,ᵢ − x₂,ᵢ

    x₄,ᵢ is a linear combination of x₂,ᵢ and x₃,ᵢ.
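    The rank deficiency can be reproduced directly; this sketch uses simulated stand-in numbers for the yields (the real data live in the EViews work file yieldus):

```python
import numpy as np

# With x4 = x3 - x2, the columns of X are linearly dependent,
# so X'X is singular and (X'X)^{-1} in (36) does not exist.
rng = np.random.default_rng(2)
N = 60
x2 = rng.normal(3.0, 0.5, N)    # stand-in for the 1-month yield
x3 = rng.normal(4.0, 0.5, N)    # stand-in for the 60-month yield
x4 = x3 - x2                    # spread: an exact linear combination
X = np.column_stack([np.ones(N), x2, x3, x4])

print(np.linalg.matrix_rank(X.T @ X))   # 3, not 4: X'X is not invertible
```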


    Case Study Yields

    Let β* = (β₁*, β₂*, β₃*, β₄*) be a certain parameter value.

    Any parameter β = (β₁*, β₂, β₃, β₄), where β₄ may be chosen arbitrarily and

    β₃ = β₃* + β₄* − β₄,
    β₂ = β₂* − β₄* + β₄,

    will lead to the same sum of squared residuals as β*. The OLS estimator is not unique.


    II.3 Understanding Econometric Inference

    Econometric inference: learning from the data about the unknown parameters in the regression model.

    Use the OLS estimator to learn about the regression parameter β.

    Is this estimator equal to the true value?
    How large is the difference between the OLS estimator and the true parameter?
    Is there a better estimator than the OLS estimator?


    Unbiasedness

    Under assumption (28), the OLS estimator (if it exists) is unbiased, i.e. the estimated values are on average equal to the true values:

    E(β̂ⱼ) = βⱼ, j = 0, ..., K.

    In matrix notation:

    E(β̂) = β,  E(β̂ − β) = 0. (37)


    Unbiasedness of the OLS estimator

    If the data are generated by the model y = Xβ + u, then the OLS estimator may be expressed as:

    β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + u) = β + (X′X)⁻¹X′u.

    Therefore the estimation error may be expressed as:

    β̂ − β = (X′X)⁻¹X′u. (38)

    Result (37) follows immediately:

    E(β̂ − β) = (X′X)⁻¹X′E(u) = 0.
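    Unbiasedness can also be illustrated numerically by averaging β̂ over many simulated samples; the design and parameter values in this sketch are made up for illustration:

```python
import numpy as np

# Monte Carlo illustration of (37): E(beta_hat) = beta.
rng = np.random.default_rng(3)
N = 40
beta = np.array([0.5, 1.5])                                  # illustrative truth
X = np.column_stack([np.ones(N), rng.uniform(0.0, 1.0, N)])  # fixed design
R = 5000
total = np.zeros(2)
for _ in range(R):
    y = X @ beta + rng.normal(0.0, 0.2, N)
    total += np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate for this sample
print(total / R)                                 # approximately [0.5, 1.5]
```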


    Covariance Matrix of the OLS Estimator

    Due to unbiasedness, the expected value E(β̂ⱼ) of the OLS estimator is equal to βⱼ for j = 0, ..., K.

    Hence, the variance Var(β̂ⱼ) measures the variation of the OLS estimator β̂ⱼ around the true value βⱼ:

    Var(β̂ⱼ) = E((β̂ⱼ − E(β̂ⱼ))²) = E((β̂ⱼ − βⱼ)²).

    Are the deviations of the estimator from the true value correlated for different coefficients of the OLS estimator?


    Covariance Matrix of the OLS Estimator

    MATLAB Code: regestall.m

    Design 1: xᵢ ~ 0.5 + Uniform[0, 1] (left hand side) versus Design 2: xᵢ ~ 1 + Uniform[0, 1] (N = 50, σ² = 0.1) (right hand side).

    [Two scatter plots of the OLS estimates β̂₂ (price) against β̂₁ (constant), one per design.]


    Covariance Matrix of the OLS Estimator

    The covariance Cov(β̂ⱼ, β̂ₖ) of different coefficients of the OLS estimator measures whether deviations between the estimator and the true value are correlated:

    Cov(β̂ⱼ, β̂ₖ) = E((β̂ⱼ − βⱼ)(β̂ₖ − βₖ)).

    This information is summarized for all possible pairs of coefficients in the covariance matrix of the OLS estimator. Note that

    Cov(β̂) = E((β̂ − β)(β̂ − β)′).


    Covariance Matrix of the OLS Estimator

    The covariance matrix of a random vector is a square matrix, containing in the diagonal the variances of the various elements of the random vector and the covariances in the off-diagonal elements:

    Cov(β̂) = ⎡ Var(β̂₀)        Cov(β̂₀, β̂₁)   ⋯  Cov(β̂₀, β̂_K)   ⎤
              ⎢ Cov(β̂₀, β̂₁)   Var(β̂₁)        ⋯  Cov(β̂₁, β̂_K)   ⎥
              ⎢ ⋮                ⋮              ⋱   ⋮               ⎥
              ⎣ Cov(β̂₀, β̂_K)  ⋯  Cov(β̂_K₋₁, β̂_K)  Var(β̂_K)    ⎦ .


    Homoskedasticity

    To derive Cov(β̂), we make an additional assumption, namely homoskedasticity:

    Var(u|X₁, ..., X_K) = σ². (39)

    This means that the variance of the error term u is the same, regardless of the predictor variables X₁, ..., X_K.

    It follows that

    Var(Y|X₁, ..., X_K) = σ².


    Error Covariance Matrix

    Because the observations are a random sample from the population, any two observations yᵢ and yₗ are uncorrelated. Hence also the errors uᵢ and uₗ are uncorrelated.

    Together with (39) we obtain the following covariance matrix of the error vector u:

    Cov(u) = σ²I,

    with I being the identity matrix.


    Covariance Matrix of the OLS Estimator

    Under assumptions (28) and (39), the covariance matrix of the OLS estimator is given by:

    Cov(β̂) = σ²(X′X)⁻¹. (40)


    Covariance Matrix of the OLS Estimator

    Proof. Using (38), we obtain:

    β̂ − β = Au,  A = (X′X)⁻¹X′.

    The following holds:

    E((β̂ − β)(β̂ − β)′) = E(Auu′A′) = A·E(uu′)·A′ = A·Cov(u)·A′.

    Therefore:

    Cov(β̂) = σ²AA′ = σ²(X′X)⁻¹X′X(X′X)⁻¹ = σ²(X′X)⁻¹.


    Covariance Matrix of the OLS Estimator

    The diagonal elements of the matrix σ²(X′X)⁻¹ define the variance Var(β̂ⱼ) of the OLS estimator for each component.

    The standard deviation sd(β̂ⱼ) of each OLS estimator is defined as:

    sd(β̂ⱼ) = √Var(β̂ⱼ) = σ·√[(X′X)⁻¹]ⱼ₊₁,ⱼ₊₁. (41)

    It measures the estimation error in the same units as βⱼ.

    Evidently, the larger the variance of the error, the larger the standard deviation. What other factors influence the standard deviation?


    Multicollinearity

    In practical regression analysis, high (but not perfect) multicollinearity is very often present.

    How well may Xⱼ be explained by the other regressors?

    Consider Xⱼ as the left-hand variable in the following regression model, where all the remaining predictors remain on the right-hand side:

    Xⱼ = α₀ + α₁X₁ + ... + αⱼ₋₁Xⱼ₋₁ + αⱼ₊₁Xⱼ₊₁ + ... + α_K X_K + u.

    Use OLS estimation to estimate the parameters and let x̂ⱼ,ᵢ be the values predicted from this (OLS) regression.


    Multicollinearity

    Define Rⱼ as the correlation between the observed values xⱼ,ᵢ and the predicted values x̂ⱼ,ᵢ in this regression.

    If Rⱼ² is close to 0, then Xⱼ cannot be predicted from the other regressors; Xⱼ contains additional, independent information.

    The closer Rⱼ² is to 1, the better Xⱼ is predicted from the other regressors, and multicollinearity is present. Xⱼ does not contain much "independent" information.


    The variance of the OLS estimator

    Using Rⱼ, the variance Var(β̂ⱼ) of the OLS estimator of the coefficient βⱼ corresponding to Xⱼ may be expressed in the following way for j = 1, ..., K:

    Var(β̂ⱼ) = σ² / (N·s²ₓⱼ·(1 − Rⱼ²)).

    Hence, the variance Var(β̂ⱼ) of the estimate β̂ⱼ is large if the regressor Xⱼ is highly redundant given the other regressors (Rⱼ² close to 1, multicollinearity).
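    The variance-inflation form above is an exact identity with the corresponding diagonal element of σ²(X′X)⁻¹, which the following sketch verifies on simulated, deliberately collinear regressors (all numbers are illustrative):

```python
import numpy as np

# Check: sigma^2 / (N * s2_xj * (1 - R2_j)) equals [sigma^2 (X'X)^{-1}]_{jj}.
rng = np.random.default_rng(5)
N, sigma2 = 200, 1.0
x1 = rng.normal(0.0, 1.0, N)
x2 = 0.9 * x1 + 0.3 * rng.normal(0.0, 1.0, N)   # highly collinear with x1
X = np.column_stack([np.ones(N), x1, x2])

# R2 of the auxiliary regression of x1 on a constant and x2
Z = np.column_stack([np.ones(N), x2])
fit = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x1)
R2_1 = 1.0 - ((x1 - fit) ** 2).sum() / ((x1 - x1.mean()) ** 2).sum()

var_formula = sigma2 / (N * x1.var() * (1.0 - R2_1))   # x1.var() is the 1/N variance
var_matrix = (sigma2 * np.linalg.inv(X.T @ X))[1, 1]
print(var_formula, var_matrix)   # identical up to rounding error
```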


    The variance of the OLS estimator

    All other factors are the same as for the simple regression model, i.e. the variance Var(β̂ⱼ) of the estimate β̂ⱼ is large if:

    the variance σ² of the error term u is large;
    the sampling variation in the regressor Xⱼ, i.e. the variance s²ₓⱼ, is small;
    the sample size N is small.


    II.4 OLS Residuals

    Consider the estimated regression model under OLS estimation:

    yᵢ = β̂₀ + β̂₁x₁,ᵢ + ... + β̂_K x_K,ᵢ + ûᵢ = ŷᵢ + ûᵢ,

    where ŷᵢ = β̂₀ + β̂₁x₁,ᵢ + ... + β̂_K x_K,ᵢ is called the fitted value and ûᵢ is called the OLS residual. OLS residuals are useful:

    to estimate the variance σ² of the error term;
    to quantify the quality of the fitted regression model;
    for residual diagnostics.


    OLS residuals as proxies for the error

    Compare the underlying regression model

    Y = β₀ + β₁X₁ + ... + β_K X_K + u, (42)

    with the estimated model for i = 1, ..., N:

    yᵢ = β̂₀ + β̂₁x₁,ᵢ + ... + β̂_K x_K,ᵢ + ûᵢ.

    The OLS residuals û₁, ..., û_N may be considered as a sample of the unobservable error u.

    Use the OLS residuals û₁, ..., û_N to estimate σ² = Var(u).


    Algebraic properties of the OLS estimator

    The OLS residuals û₁, ..., û_N obey K + 1 linear equations and have the following algebraic properties:

    The sum (average) of the OLS residuals ûᵢ is equal to zero:

    (1/N)·Σᵢ₌₁ᴺ ûᵢ = 0. (43)

    The sample covariance between xₖ,ᵢ and ûᵢ is zero:

    (1/N)·Σᵢ₌₁ᴺ xₖ,ᵢûᵢ = 0, k = 1, ..., K. (44)


    Estimating σ²

    A naive estimator of σ² would be the sample variance of the OLS residuals û₁, ..., û_N:

    σ̃² = (1/N)·Σᵢ₌₁ᴺ ûᵢ² − ((1/N)·Σᵢ₌₁ᴺ ûᵢ)² = (1/N)·Σᵢ₌₁ᴺ ûᵢ² = SSR/N,

    where we used (43), and SSR = Σᵢ₌₁ᴺ ûᵢ² is the sum of squared OLS residuals.

    However, due to the linear dependence between the OLS residuals, û₁, ..., û_N is not an independent sample. Hence, σ̃² is a biased estimator of σ².


    Estimating σ²

    Due to the linear dependence between the OLS residuals, only df = N − K − 1 residuals can be chosen independently. df is also called the degrees of freedom.

    An unbiased estimator of the error variance σ² in a homoskedastic multiple regression model is given by:

    σ̂² = SSR/df, (45)

    where df = N − K − 1, N is the number of observations, and K is the number of predictors X₁, ..., X_K.
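    A simulation sketch (illustrative setup, not from the slides) of why the degrees-of-freedom correction in (45) matters: SSR/N systematically underestimates σ², while SSR/(N − K − 1) is unbiased:

```python
import numpy as np

# Compare the naive estimator SSR/N with the unbiased SSR/(N - K - 1).
rng = np.random.default_rng(6)
N, K, sigma2 = 30, 2, 1.0
X = np.column_stack([np.ones(N), rng.normal(0.0, 1.0, (N, K))])
beta = np.array([1.0, 0.5, -0.5])        # illustrative true parameters
df = N - K - 1
naive, unbiased = [], []
for _ in range(20000):
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), N)
    resid = y - X @ np.linalg.solve(X.T @ X, X.T @ y)   # OLS residuals
    ssr = (resid ** 2).sum()
    naive.append(ssr / N)
    unbiased.append(ssr / df)
print(np.mean(naive), np.mean(unbiased))   # about 0.9 versus about 1.0
```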


    The standard errors of the OLS estimator

    The standard deviation sd(β̂ⱼ) of the OLS estimator given in (41) depends on σ = √σ².

    To evaluate the estimation error for a given data set in practical regression analysis, σ² is substituted by the estimator (45). This yields the so-called standard error se(β̂ⱼ) of the OLS estimator:

    se(β̂ⱼ) = √( σ̂²·[(X′X)⁻¹]ⱼ₊₁,ⱼ₊₁ ). (46)


    EVIEWS Exercise II.4.1

    EViews (and other packages) report for each predictor the OLS estimator together with the standard errors:

    Case Study profit, work file profit;
    Case Study Chicken, work file chicken;
    Case Study Marketing, work file marketing.

    Note: the standard errors computed by EViews (and other packages) are valid only under the assumptions made above, in particular homoskedasticity.


    Quantifying the model fit

    How well does the multiple regression model (42) explain the variation in Y? Compare it with the following simple model without any predictors:

    Y = β₀ + u. (47)

    The OLS estimator of β₀ minimizes the following sum of squared residuals:

    Σᵢ₌₁ᴺ (yᵢ − β₀)²

    and is given by β̂₀ = ȳ.


    Coefficient of Determination

    The minimal sum is equal to the total variation

    SST = Σᵢ₌₁ᴺ (yᵢ − ȳ)².

    Is it possible to reduce the sum of squared residuals SST of the simple model (47) by including the predictor variables X₁, ..., X_K as in (42)?


    Coefficient of Determination

    The minimal sum of squared residuals SSR of the multiple regression model (42) is never larger than the minimal sum of squared residuals SST of the simple model (47):

    SSR ≤ SST. (48)

    The coefficient of determination R² of the multiple regression model (42) is defined as:

    R² = (SST − SSR)/SST = 1 − SSR/SST. (49)
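    Computing (49) directly from the OLS residuals; the data in this sketch are simulated with illustrative values:

```python
import numpy as np

# R2 = 1 - SSR/SST from a fitted simple regression (formula (49)).
rng = np.random.default_rng(7)
N = 100
x = rng.uniform(0.0, 1.0, N)
X = np.column_stack([np.ones(N), x])
y = 0.2 + 1.8 * x + rng.normal(0.0, 0.2, N)    # illustrative true model

resid = y - X @ np.linalg.solve(X.T @ X, X.T @ y)   # OLS residuals
ssr = (resid ** 2).sum()
sst = ((y - y.mean()) ** 2).sum()
r2 = 1.0 - ssr / sst
print(round(r2, 3))     # close to 1: x explains most of the variation in y
```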


    Coefficient of Determination

    Proof of (48). The following variance decomposition holds:

    SST = Σᵢ₌₁ᴺ (yᵢ − ŷᵢ + ŷᵢ − ȳ)² = Σᵢ₌₁ᴺ ûᵢ² + 2·Σᵢ₌₁ᴺ ûᵢ(ŷᵢ − ȳ) + Σᵢ₌₁ᴺ (ŷᵢ − ȳ)².

    Using the algebraic properties (43) and (44) of the OLS residuals, we obtain:

    Σᵢ₌₁ᴺ ûᵢ(ŷᵢ − ȳ) = β̂₀·Σᵢ₌₁ᴺ ûᵢ + β̂₁·Σᵢ₌₁ᴺ ûᵢx₁,ᵢ + ... + β̂_K·Σᵢ₌₁ᴺ ûᵢx_K,ᵢ − ȳ·Σᵢ₌₁ᴺ ûᵢ = 0.

    Therefore:

    SST = SSR + Σᵢ₌₁ᴺ (ŷᵢ − ȳ)² ≥ SSR.


    Coefficient of Determination

    The coefficient of determination R² is a measure of goodness-of-fit:

    If SSR ≈ SST, then there is little gained by including the predictors; R² is close to 0. The multiple regression model explains the variation in Y hardly better than the simple model (47).

    If SSR ≪ SST, then much of the variation in Y is explained by including the predictors; R² is close to 1.


    Coefficient of Determination

    [Two scatter plots comparing the fit with price as predictor against the fit with no predictor. Left panel: SSR = 9.5299, SST = 120.0481, R² = 0.92062. Right panel: SSR = 8.3649, SST = 8.6639, R² = 0.034512.]

    MATLAB Code: reg-est-r2.m


    The Gauss–Markov Theorem

    The Gauss–Markov Theorem. Under assumptions (28) and (39), the OLS estimator is BLUE, i.e. the Best Linear Unbiased Estimator.

    Here "best" means that any other linear unbiased estimator has larger standard errors than the OLS estimator.


    II.5 Testing Hypotheses

    Multiple regression model:

    Y = β₀ + β₁X₁ + ... + βⱼXⱼ + ... + β_K X_K + u. (50)

    Does the predictor variable Xⱼ exercise an influence on the expected mean E(Y) of the response variable Y, if we control for all other variables X₁, ..., Xⱼ₋₁, Xⱼ₊₁, ..., X_K? Formally, is

    βⱼ = 0?


    Understanding the testing problem

    Simulate data from a multiple regression model with β₀ = 0.2, β₁ = −1.8, and β₂ = 0:

    Y = 0.2 − 1.8·X₁ + 0·X₂ + u,  u ~ Normal(0, σ²).

    Run OLS estimation for a model where β₂ is unknown:

    Y = β₀ + β₁X₁ + β₂X₂ + u,  u ~ Normal(0, σ²),

    to obtain (β̂₀, β̂₁, β̂₂). Is β̂₂ different from 0?

    MATLAB Code: regtest.m


    Understanding the testing problem

    [Scatter plot of the estimates for the redundant variable's coefficient against β̂₂ (price); N = 50, σ² = 0.1, Design 3.]

    The OLS estimator β̂₂ of β₂ = 0 differs from 0 for a single data set, but is 0 on average.


    Understanding the testing problem

    OLS estimation for the true model in comparison to estimating a model with a redundant predictor variable: including the redundant predictor X₂ increases the estimation error for the other parameters β₀ and β₁.

    [Two scatter plots of β̂₂ (price) against β̂₁ (constant); N = 50, σ² = 0.1, Design 1 (left) versus Design 3 (right).]


    Testing of hypotheses

    What may we learn from the data about hypotheses concerning the unknown parameters in the regression model, especially about the hypothesis that βⱼ = 0?

    May we reject the hypothesis βⱼ = 0, given data?

    Testing whether βⱼ = 0 is not only of importance for the substantive scientist, but also from an econometric point of view, to increase the efficiency of estimation of the non-zero parameters.

    It is possible to answer these questions if we make additional assumptions about the error term u in a multiple regression model.
