An Overview of Regression Analysis


  • 8/13/2019 An Overview of Regression Analysis


    Chapter 1

    An Overview of Regression Analysis

    Copyright 2011 Pearson Addison-Wesley. All rights reserved.

    Slides by Niels-Hugo Blunch

    Washington and Lee University


    What is Econometrics?

    "Econometrics is too mathematical; it's the reason my best friend isn't majoring in economics."

    "There are two things you are better off not watching in the making: sausages and econometric estimates."

    "Econometrics may be defined as the quantitative analysis of actual economic phenomena."

    Blind people trying to describe an elephant just based on what they happen to be touching


    What is Econometrics? (cont.)

    Econometrics literally means "economic measurement"

    It is the quantitative measurement and analysis of actual economic and business phenomena, and so involves:

    economic theory

    statistics

    math

    observation/data collection

    Econometrics attempts to bridge the gap between economic theory and the real world


    What is Econometrics? (cont.)

    Three major uses of econometrics:

    Describing economic reality (what is the quantitative relationship between consumption and income?)

    Testing hypotheses about economic theory (are potatoes a normal good or an inferior good?)

    Forecasting future economic activity (what will the U.S. unemployment rate be in 2013?)

    Econometrics vs. Statistics

    Econometrics is based on statistics, but it deals with economic issues, has many unique techniques, and is designed for economic data


    Example

    Consider this general and purely theoretical relationship (from microeconomic theory):

    $Q = f(P, P_s, Y_d)$ (1.1)

    where $Q$ is quantity demanded, $P$ is price, $P_s$ is the price of a substitute, and $Y_d$ is disposable income

    Econometrics allows this general and purely theoretical relationship to become explicit:

    $Q = 27.7 - 0.11P + 0.03P_s + 0.23Y_d$ (1.2)
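    As a quick check on what (1.2) says, the Python sketch below simply evaluates the estimated equation. The function name `quantity_demanded` and the input values are hypothetical, used only for illustration; the coefficients are the ones in (1.2).

```python
def quantity_demanded(p: float, ps: float, yd: float) -> float:
    """Evaluate the estimated demand equation (1.2)."""
    return 27.7 - 0.11 * p + 0.03 * ps + 0.23 * yd

# Hypothetical inputs: own price P, substitute price Ps, disposable income Yd.
base = quantity_demanded(p=10.0, ps=12.0, yd=50.0)
higher_price = quantity_demanded(p=11.0, ps=12.0, yd=50.0)

# Raise P by one unit while holding Ps and Yd fixed.
print(round(higher_price - base, 2))  # -0.11
```

    The difference is just the coefficient on $P$: a one-unit price increase, with $P_s$ and $Y_d$ unchanged, changes $Q$ by $-0.11$.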


    What is Econometrics? (cont.)

    Econometric models/approaches:

    Linear regression model

    Nonlinear regression model

    Time-series model

    Panel data model

    Discrete choice model

    Simultaneous equations model

    Event counts model

    Duration-time model

    This course will focus on the linear regression model


    What is Regression Analysis?

    Economic theory can give us the direction of a change, e.g., the change in the demand for DVDs following a price decrease (or price increase)

    But what if we want to know not just "how?" but also "how much?"

    Then we need:

    A sample of data

    A way to estimate such a relationship

    One of the most frequently used ways is regression analysis


    What is Regression Analysis? (cont.)

    Formally, regression analysis is a statistical technique that attempts to "explain" movements in one variable, the dependent variable, as a function of movements in a set of other variables, the independent (or explanatory) variables, through the quantification of a single equation


    Example

    Return to the example from before:

    $Q = f(P, P_s, Y_d)$ (1.1)

    Here, $Q$ is the dependent variable and $P$, $P_s$, $Y_d$ are the independent (explanatory) variables

    Don't be deceived by the words "dependent" and "independent": a statistically significant regression result does not necessarily imply causality (a cause-and-effect relationship)

    We also need:

    Economic theory

    Common sense


    Linear Model

    The simplest example is:

    $Y_i = \beta_0 + \beta_1 X_i \quad (i = 1, 2, \ldots, N)$ (1.3)

    The $\beta$s are called coefficients:

    $\beta_0$ is the constant or intercept term

    $\beta_1$ is the slope coefficient: the amount that $Y$ will change when $X$ increases by one unit; for a linear model, $\beta_1$ is constant over the entire function
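    A minimal sketch of the last point, assuming hypothetical values $\beta_0 = 2.0$ and $\beta_1 = 0.5$: the change in $Y$ from a one-unit increase in $X$ is the same whether $X$ starts at 0, 10, or 100.

```python
def linear_model(x: float, beta0: float, beta1: float) -> float:
    """Deterministic part of (1.3): Y = beta0 + beta1 * X."""
    return beta0 + beta1 * x

# Hypothetical coefficients chosen for illustration only.
beta0, beta1 = 2.0, 0.5

# The change in Y from a one-unit increase in X, evaluated at several starting X.
for x in [0.0, 10.0, 100.0]:
    change = linear_model(x + 1, beta0, beta1) - linear_model(x, beta0, beta1)
    print(x, change)  # the change is 0.5 at every starting X
```

    For a nonlinear function the same loop would print a different change at each starting value.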


    Figure 1.1 Graphical Representation of the Coefficients of the Regression Line


    Linear Model (cont.)

    Application of linear regression techniques requires that the equation be linear, such as (1.3)

    By contrast, the equation

    $Y = \beta_0 + \beta_1 X^2$ (1.4)

    is not linear in $X$

    What to do? First define

    $Z = X^2$ (1.5)

    Substituting into (1.4) yields:

    $Y = \beta_0 + \beta_1 Z$ (1.6)

    This redefined equation is now linear (in the coefficients $\beta_0$ and $\beta_1$ and in the variables $Y$ and $Z$)
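    The transformation can be tried numerically. The sketch below generates hypothetical data exactly from $Y = 2 + 3X^2$ (no error term), builds $Z = X^2$, and fits a straight line of $Y$ on $Z$. It uses the ordinary least squares (OLS) formulas for one explanatory variable, which this chapter has not introduced yet, so take them as given here; `ols_simple` is a hypothetical helper name.

```python
def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """OLS intercept and slope for a single explanatory variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data generated exactly from Y = 2 + 3 * X**2 (no error term).
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [2.0 + 3.0 * xi ** 2 for xi in x]

# Step (1.5): define Z = X**2, so (1.6) is linear in Z.
z = [xi ** 2 for xi in x]
b0, b1 = ols_simple(z, y)
print(round(b0, 6), round(b1, 6))  # 2.0 3.0
```

    Because (1.6) is linear in $Z$, a straight-line fit recovers $\beta_0 = 2$ and $\beta_1 = 3$ (up to floating-point rounding); fitting $Y$ on $X$ directly would not.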


    Linear Regression Model

    Is (1.3) a complete description of the origins of variation in $Y$?

    No; there are at least four sources of variation in $Y$ other than the variation in the included $X$s:

    Other potentially important explanatory variables may be missing (e.g., $X_2$ and $X_3$)

    Measurement error

    Incorrect functional form

    Purely random and totally unpredictable occurrences

    Inclusion of a stochastic error term ($\varepsilon$) partially takes care of these other sources of variation in $Y$ that are NOT captured by $X$, so that (1.3) becomes:

    $Y = \beta_0 + \beta_1 X + \varepsilon$ (1.7)
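    To see how $\varepsilon$ makes the observed $Y$ scatter around $\beta_0 + \beta_1 X$, here is a small simulation. The coefficients are hypothetical, and the assumption that the errors are normal with mean 0 and standard deviation 2 is ours, not the slides'.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

beta0, beta1 = 10.0, 0.8   # hypothetical "true" coefficients
x = [float(v) for v in range(1, 11)]

# Stochastic error term, assumed here to be normal with mean 0 and sd 2.
eps = [random.gauss(0.0, 2.0) for _ in x]

# Observed Y = deterministic part + error, as in (1.7).
deterministic = [beta0 + beta1 * xi for xi in x]
y = [d + e for d, e in zip(deterministic, eps)]

for xi, d, yi in zip(x, deterministic, y):
    print(f"X={xi:4.1f}  beta0+beta1*X={d:5.2f}  observed Y={yi:6.2f}")
```

    Each printed row shows an observed $Y$ that differs from the deterministic part by exactly that row's draw of $\varepsilon$.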


    Example: Aggregate Consumption Function

    Aggregate consumption as a function of aggregate income may be lower (or higher) than it would otherwise have been due to:

    consumer uncertainty, which is hard (impossible?) to measure, i.e., is an omitted variable

    Observed consumption may be different from actual consumption due to measurement error

    The "true" consumption function may be nonlinear but a linear one is estimated (see Figure 1.2 for a graphical illustration)

    Human behavior always contains some element(s) of pure chance; unpredictable, i.e., random, events may increase or decrease consumption at any given time

    Whenever one or more of these factors are at play, the observed $Y$ will differ from the $Y$ predicted from the deterministic part, $\beta_0 + \beta_1 X$


    Figure 1.2 Errors Caused by Using a Linear Functional Form to Model a Nonlinear Relationship


    General Regression Model

    General regression model: $Y = E(Y \mid X) + \varepsilon$

    deterministic component: $E(Y \mid X)$

    stochastic/random component: $\varepsilon$

    What is $E(Y \mid X)$? The deterministic component can be thought of as the expected value of $Y$ given $X$, namely $E(Y \mid X)$, i.e., the mean (or average) value of the $Y$s associated with a particular value of $X$

    In mathematics, $E(Y \mid X)$ is the conditional expectation (that is, the expectation of $Y$ conditional on $X$)

    In the linear regression model, $E(Y \mid X) = \beta_0 + \beta_1 X$
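    The reading of $E(Y \mid X)$ as the average of the $Y$s at a particular $X$ can be illustrated by simulation. Assuming hypothetical coefficients and mean-zero normal errors (our assumptions), fix $X = 5$, draw many observations, and compare their average with $\beta_0 + \beta_1 X$.

```python
import random
import statistics

random.seed(1)

beta0, beta1 = 10.0, 0.8    # hypothetical coefficients
x_value = 5.0               # one particular value of X

# Many observations sharing the same X, each with its own error term
# (assumed normal with mean zero and sd 2).
ys = [beta0 + beta1 * x_value + random.gauss(0.0, 2.0) for _ in range(100_000)]

print(statistics.fmean(ys))     # close to 14.0
print(beta0 + beta1 * x_value)  # 14.0
```

    The sample average of the simulated $Y$s approximates the conditional expectation, which in the linear model is $\beta_0 + \beta_1 X$.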


    Extending the Notation

    Include an index for the observations

    Single explanatory variable case:

    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \quad (i = 1, 2, \ldots, N)$ (1.10)

    So there are really $N$ equations, one for each observation

    $\beta_0$ and $\beta_1$ are the coefficients (they are population parameters!)

    The values of $Y$, $X$, and $\varepsilon$ differ across observations


    Three Types of Data in Econometrics

    Subscript $i$ for data on individuals (so-called cross-sectional data)

    Subscript $t$ for time-series data (e.g., a series of years, months, or days; daily exchange rates, for example)

    Subscript $it$ when we have both (for example, panel data)
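    One way to picture the three data types is to store a panel keyed by $(i, t)$ and slice it: fixing $t$ gives a cross section, fixing $i$ gives a time series. The unit labels, years, and values below are hypothetical, as are the function names.

```python
# Hypothetical panel: keys are (i, t) = (unit, year); values are one variable.
panel = {
    ("A", 2010): 1.2, ("A", 2011): 1.5,
    ("B", 2010): 0.7, ("B", 2011): 0.9,
}

def cross_section(data, t):
    """Fix t and vary i: one observation per individual (subscript i)."""
    return {i: v for (i, tt), v in data.items() if tt == t}

def time_series(data, i):
    """Fix i and vary t: one observation per period (subscript t)."""
    return {t: v for (ii, t), v in data.items() if ii == i}

print(cross_section(panel, 2010))  # {'A': 1.2, 'B': 0.7}
print(time_series(panel, "A"))     # {2010: 1.2, 2011: 1.5}
```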


    Multivariate Linear Regression Model

    The general case: the multivariate linear regression model

    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i \quad (i = 1, 2, \ldots, N)$ (1.11)

    Each of the slope coefficients gives the impact of a one-unit increase in the corresponding $X$ on $Y$, holding the other included explanatory variables constant (i.e., ceteris paribus)
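    Ceteris paribus can be checked on the deterministic part of (1.11). The coefficients below ($\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = -0.5$, $\beta_3 = 0.25$) and the starting values of the $X$s are hypothetical.

```python
def multivariate(x1, x2, x3, b=(1.0, 2.0, -0.5, 0.25)):
    """Deterministic part of (1.11); the coefficient values are hypothetical."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

start = multivariate(3.0, 4.0, 8.0)

# Raise X1 by one unit, holding X2 and X3 constant: Y changes by beta1.
print(multivariate(4.0, 4.0, 8.0) - start)   # 2.0

# Raise X1 and X2 together: the change mixes beta1 and beta2 (2.0 - 0.5).
print(multivariate(4.0, 5.0, 8.0) - start)   # 1.5
```

    Only when the other included variables are held fixed does the change in $Y$ isolate a single slope coefficient.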


    Example: Wage Regression

    Let wages (WAGE) depend on:

    years of work experience (EXP)

    years of education (EDU)

    gender of the worker (GEND: 1 if male, 0 if female)

    Substituting into equation (1.11) yields:

    $WAGE_i = \beta_0 + \beta_1 EXP_i + \beta_2 EDU_i + \beta_3 GEND_i + \varepsilon_i$ (1.12)
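    The gender variable in (1.12) is a 0/1 indicator, so each worker's row of regressors can be built as below. The function names and coefficient values are hypothetical placeholders, not estimates from any data set; the point is that, for two workers with the same EXP and EDU, the predicted wage gap equals $\beta_3$.

```python
def wage_row(exp_years: float, edu_years: float, gender: str) -> list[float]:
    """Regressors for one worker in (1.12): constant, EXP, EDU, GEND."""
    gend = 1.0 if gender == "male" else 0.0   # GEND: 1 if male, 0 if female
    return [1.0, exp_years, edu_years, gend]

def predicted_wage(row: list[float], beta: list[float]) -> float:
    """Deterministic part of (1.12): beta0 + beta1*EXP + beta2*EDU + beta3*GEND."""
    return sum(b * r for b, r in zip(beta, row))

beta = [5.0, 0.3, 1.2, 2.0]   # hypothetical coefficients, not estimates

male = wage_row(10, 16, "male")
female = wage_row(10, 16, "female")

# Same EXP and EDU, different GEND: the predicted gap equals beta3.
gap = predicted_wage(male, beta) - predicted_wage(female, beta)
print(round(gap, 6))  # 2.0
```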


    The Estimated Regression Equation

    The regression equation considered so far is the "true" but unknown regression equation

    Instead of "true," we might think of this as the population regression equation

    The population regression equation has to be estimated, so we need estimators of the regression coefficients

    The sample/estimated regression equation is:

    $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$

    The signs on top of the estimators are called "hats," so that we have $\hat{\beta}_0$ ("beta-zero-hat") and $\hat{\beta}_1$ ("beta-one-hat") (they are sample statistics, or estimators of the regression coefficients!)

    For each sample we get a different set of estimated regression coefficients
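    The last point, that each sample gives different estimates, shows up immediately in a simulation. The sketch below assumes a hypothetical population with $\beta_0 = 10$, $\beta_1 = 0.8$, and normal errors, draws two samples of 50 observations, and estimates the coefficients in each with the OLS formulas for one explanatory variable (OLS is our assumed estimator here; the slides have not named one yet). The helper names are hypothetical.

```python
import random

def ols_simple(x, y):
    """OLS intercept and slope for one explanatory variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1_hat = sxy / sxx
    return y_bar - b1_hat * x_bar, b1_hat

def draw_sample(n, beta0=10.0, beta1=0.8, seed=None):
    """Draw one hypothetical sample from Y = beta0 + beta1*X + eps, eps ~ N(0, 2^2)."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 20.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, 2.0) for xi in x]
    return x, y

for seed in (1, 2):
    x, y = draw_sample(50, seed=seed)
    b0_hat, b1_hat = ols_simple(x, y)
    print(f"sample {seed}: beta0_hat = {b0_hat:.3f}, beta1_hat = {b1_hat:.3f}")
```

    Both samples come from the same population equation, yet the two sets of hats differ; that sampling variation is why the estimates are statistics rather than the parameters themselves.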


    The Estimated Regression Equation (cont.)

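    $\hat{Y}_i$ ("Y-hat") is the estimated value (or an estimate) of $Y_i$

    The residual, $e_i$, is given as

    $e_i = Y_i - \hat{Y}_i$ (1.17)

    Note that $e_i$ is different from the error term, $\varepsilon_i$:

    $\varepsilon_i = Y_i - E(Y_i \mid X_i)$ (1.18)

    $e_i$ is an estimate of $\varepsilon_i$; the smaller the residuals, the better the fit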
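    The distinction between $e_i$ and $\varepsilon_i$ can be made concrete only in a simulation, because with real data the true coefficients, and hence the errors, are unknown. The sketch below assumes hypothetical true values and normal errors, fits the line by OLS (again our assumption, since the slides do not name an estimator), and computes both (1.17) and (1.18).

```python
import random

def ols_simple(x, y):
    """OLS intercept and slope for one explanatory variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1_hat = sxy / sxx
    return y_bar - b1_hat * x_bar, b1_hat

rng = random.Random(3)
beta0, beta1 = 10.0, 0.8   # hypothetical "true" values, known only because we simulate

x = [rng.uniform(0.0, 20.0) for _ in range(30)]
eps = [rng.gauss(0.0, 2.0) for _ in x]                 # unobservable in real data
y = [beta0 + beta1 * xi + e for xi, e in zip(x, eps)]

b0_hat, b1_hat = ols_simple(x, y)
y_hat = [b0_hat + b1_hat * xi for xi in x]

residuals = [yi - yh for yi, yh in zip(y, y_hat)]               # e_i, (1.17)
errors = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]    # eps_i, (1.18)

for e_i, eps_i in list(zip(residuals, errors))[:3]:
    print(f"residual e_i = {e_i:6.3f}   error eps_i = {eps_i:6.3f}")
print(f"sum of residuals: {sum(residuals):.2e}")
```

    The residuals are close to, but not equal to, the errors, and they sum to (numerically) zero, a property of OLS with an intercept that the errors themselves need not share.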


    Figure 1.3 True and Estimated Regression Lines


    Example: Using Regression to Explain Housing Prices

    Houses are not homogeneous products with generally known market prices

    So, how do we appraise a house against a given asking price? Yes, it's true: many real estate companies actually use regression analysis to do this!

    Consider a specific case: suppose the asking price was $230,000

    Is this fair, too much, or too little?


    Example: Using Regression to Explain Housing Prices (cont.)

    The price depends on the size of the house (larger size, higher price)

    So, collect cross-sectional data on prices (in thousands of $) and sizes (in square feet) for, say, 43 houses

    Then say this yields the following estimated regression line:

    $\widehat{PRICE}_i = 40.0 + 0.138\,SIZE_i$ (1.23)


    Figure 1.5 A Cross-Sectional Model of Housing Prices


    Example: Using Regression to Explain Housing Prices (cont.)

    Note that the interpretation of the intercept term is problematic in this case (we'll get back to this later, in Section 7.1.2)

    The literal interpretation of the intercept here is the price of a house with a size of zero square feet


    Example: Using Regression to Explain Housing Prices (cont.)

    How do we use the estimated regression line / estimated regression coefficients to answer the question? Just plug the particular size of the house that you are interested in (here, 1,600 square feet) into (1.23)

    Alternatively, read off the estimated price using Figure 1.5

    Either way, we get an estimated price of $260.8 thousand (= 40.0 + 0.138 × 1,600)

    So, in terms of our original question, it's a good deal: go ahead and purchase!

    Note that we simplified a lot in this example by assuming that only size matters for housing prices (we ignore the year when the house was built, the distance to work, the number of bedrooms, etc.)
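    The plug-in calculation can be written as a one-line function of (1.23), with prices in thousands of dollars; `predicted_price` is a hypothetical name, and the comparison uses the $230,000 asking price from the example.

```python
def predicted_price(size_sq_ft: float) -> float:
    """Estimated price in thousands of dollars from (1.23)."""
    return 40.0 + 0.138 * size_sq_ft

asking_price = 230.0   # $230,000 expressed in thousands of dollars
estimate = predicted_price(1600)

print(round(estimate, 1))       # 260.8
print(estimate > asking_price)  # True: the estimate exceeds the asking price
```

    Note that `predicted_price(0)` returns 40.0, which is exactly the problematic literal reading of the intercept: the price of a zero-square-foot house.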


    Another Example: Weight and Height

    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

    $Y_i$ = the weight (in pounds) of the $i$th customer

    $X_i$ = the height (in inches above 5 feet) of the $i$th customer

    $\varepsilon_i$ = the stochastic error term for the $i$th customer

    The estimated regression function is:

    $\hat{Y}_i = 103.40 + 6.38 X_i$

    i.e., Estimated Weight = 103.40 + 6.38 × Height (inches above 5 feet)
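    Because $X_i$ is measured in inches above 5 feet, a height has to be converted before it is plugged into the estimated function. A minimal sketch, with hypothetical function names:

```python
def inches_above_five_feet(feet: int, inches: float) -> float:
    """Convert a height such as 5 ft 10 in into the X used in the model."""
    return (feet * 12 + inches) - 60

def estimated_weight(feet: int, inches: float) -> float:
    """Estimated weight in pounds: 103.40 + 6.38 * (inches above 5 feet)."""
    return 103.40 + 6.38 * inches_above_five_feet(feet, inches)

print(round(estimated_weight(5, 10), 2))  # 167.2
```

    A customer exactly 5 feet tall has $X_i = 0$, so the estimate is the intercept, 103.40 pounds.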



    Key Terms from Chapter 1

    Econometrics

    Regression analysis

    Dependent variable

    Independent variable(s)

    Explanatory variable(s)

    Causality

    Linear regression model

    Coefficient

    Intercept

    Slope

    Error term

    Conditional expectation

    Multivariate linear regression model

    Cross-sectional data

    Time series data

    Panel data

    Residual