ECON 620 - Lecture 1 - Introduction to Econometrics


  • Lecture 1: Introduction to Econometrics University of San Francisco Department of Economics Prof. Jesse K. Anttila-Hughes January 21st, 2014

  • NYTimes, Jan 29th, 2013

  • Today we'll cover: Introduction to econometrics; review of probability

    Reading: Wooldridge Appendixes B and C

    Office Hours: Tuesdays 1-4pm or by appt

    Problem set #1: on Blackboard, due next Monday

    Computer assignment #1: get Stata; download the data files from Blackboard; bring your laptop, with Stata installed and the data files

  • Goals for this class Emphasis on:

    Formal econometrics: basic statistics and probability; fundamentals of multivariate OLS regression

    Regression execution and interpretation; hypothesis testing and standard errors

    Simple connections between research questions and econometrics, including some common basic problems and their solutions (or lack thereof)

    Formal foundation for Econometrics II and III

    Tacit knowledge: basics of performing econometric analysis in Stata; basic good habits for handling and managing data

    By the end of class you should have:

    A simple, intuitive understanding of what econometrics is and what it can and can't do

    A beginner's proficiency with Stata

    Enough econometrics to start appreciating how much there is to learn

  • Econometrics involves a lot of tacit learning:

  • Administraterrata Class:

    6:30-9:15 pm on Tuesdays; 14 lectures total, one midterm

    Blackboard is the primary resource: syllabus, lecture notes, homework, readings, etc.

    Grades: 12 problem sets (drop lowest 2): 35%; class participation: 10%; midterm exam: 25%; final exam: 30%

    Book: Wooldridge 5th Edition

  • What we're aiming for

  • Hsiang et al. 2013

  • The Nature of Econometrics and Economic Data

  • Econometrics What is econometrics?

    The statistics used by economists

    Why do we do econometrics? Estimating relationships between economic variables; testing economic theories and hypotheses; forecasting economic variables; evaluating or implementing policies; etc.

    In general, econometrics starts with an economic model (i.e., something you'd learn in a theory class) and then generates testable predictions from that model. The step of explicitly defining a model is often skipped.

  • Economic model of crime (Becker (1968)) Derives an equation for criminal activity based on utility maximization:

    y = f(x1, x2, x3, x4, x5, x6, x7)

    y: hours spent in criminal activities

    x1: returns to criminal activities

    x2: wage for legal employment

    x3: other income

    x4: probability of getting caught

    x5: probability of conviction if caught

    x6: expected sentence

    x7: age

    The functional form of the relationship is not specified. The equation could have been postulated without economic modeling.

    What kind of theory models?

  • Model of job training and worker productivity What is the effect of additional training on worker productivity? Formal economic theory is not really needed to derive the equation (but may be used):

    wage = f(educ, exper, training)

    wage: hourly wage

    educ: years of formal education

    exper: years of workforce experience

    training: weeks spent in job training

    Other factors may be relevant, but these are the most important (?)

  • Econometric model of criminal activity The functional form has to be specified. Variables may have to be approximated by other quantities.

    crime = β0 + β1*wage + β2*othinc + β3*freqarr + β4*freqconv + β5*avgsen + β6*age + u

    crime: measure of criminal activity

    wage: wage for legal employment

    othinc: other income

    freqarr: frequency of prior arrests

    freqconv: frequency of conviction

    avgsen: average sentence length after conviction

    age: age

    u: unobserved determinants of criminal activity, e.g. moral character, wage in criminal activity, family background

    What kind of econometric models?

  • Econometric model of job training and worker productivity

    wage = β0 + β1*educ + β2*exper + β3*training + u

    wage: hourly wage

    educ: years of formal education

    exper: years of workforce experience

    training: weeks spent in job training

    u: unobserved determinants of the wage, e.g. innate ability, quality of education, family background

    Most of econometrics deals with the specification of the error u. Econometric models may be used for hypothesis testing.

    For example, the parameter β3 represents the effect of training on wage. How large is this effect? Is it different from zero?
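A minimal sketch of estimating the job-training wage model by OLS. The data are simulated, and the coefficient values, sample size, and variable ranges are all assumptions chosen for illustration, not figures from the lecture or from Wooldridge. The helper `ols` is a hypothetical name; it solves the normal equations directly so that the example needs only the Python standard library.

```python
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination
    # with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


random.seed(0)
n = 2000
TRUE_BETA = [2.0, 0.6, 0.1, 0.3]  # assumed: intercept, educ, exper, training

rows, wage = [], []
for _ in range(n):
    educ = random.randint(8, 18)      # years of formal education
    exper = random.randint(0, 30)     # years of workforce experience
    training = random.randint(0, 10)  # weeks spent in job training
    u = random.gauss(0, 1)            # unobserved determinants of the wage
    x = [1.0, educ, exper, training]
    rows.append(x)
    wage.append(sum(b * xi for b, xi in zip(TRUE_BETA, x)) + u)

beta_hat = ols(rows, wage)
print([round(b, 2) for b in beta_hat])
```

Because the simulated error is drawn independently of the regressors, the estimated training coefficient should land close to the assumed 0.3. With real data that independence is exactly what has to be argued for.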

  • Types of data Econometric analysis requires data

    And there are many, many different things that count as data

    In general, we distinguish between the four major kinds of economic data in terms of how they combine units of observation and time:

    Cross-sectional data: multiple units of obs., single time

    Time series data: single unit of obs., multiple times

    Pooled cross sections: multiple units of obs., multiple times, but different obs. each time

    Panel/longitudinal data: multiple units of observation with multiple time observations for each

    Econometric methods depend on the nature of the data used. Use of inappropriate methods may lead to misleading results.

    This will be a big theme in later classes
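The four data types differ only in how units and time periods are arranged, which a small sketch can make concrete. The records below are made up, and `classify` is a hypothetical helper written for this illustration. It treats only balanced panels as panels, so an unbalanced panel would be labeled as pooled cross sections.

```python
# Each record is (unit_id, year, value); all numbers are invented.
cross_section = [("A", 2010, 5.1), ("B", 2010, 4.7), ("C", 2010, 6.0)]
time_series = [("A", 2008, 4.9), ("A", 2009, 5.0), ("A", 2010, 5.1)]
pooled = [("A", 2010, 5.1), ("B", 2010, 4.7), ("C", 2012, 6.3), ("D", 2012, 5.5)]
panel = [("A", 2010, 5.1), ("B", 2010, 4.7), ("A", 2012, 5.4), ("B", 2012, 4.9)]


def classify(records):
    """Label a data set by how its units and time periods are arranged."""
    units = {u for u, _, _ in records}
    years = {t for _, t, _ in records}
    if len(years) == 1:
        return "cross-sectional"
    if len(units) == 1:
        return "time series"
    # Several units and several periods: a (balanced) panel observes every
    # unit in every period; pooled cross sections draw new units each time.
    seen = {(u, t) for u, t, _ in records}
    if all((u, t) in seen for u in units for t in years):
        return "panel"
    return "pooled cross sections"


for name, data in [("cross_section", cross_section), ("time_series", time_series),
                   ("pooled", pooled), ("panel", panel)]:
    print(name, "->", classify(data))
```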

  • Cross-sectional data

    Sample of individuals, households, firms, cities, states, countries, or other units of interest at a given point of time/in a given period

    Cross-sectional observations must be more or less independent

    For example, pure random sampling from a population

    Sometimes pure random sampling is violated, e.g. units refuse to respond in surveys, or sampling is characterized by clustering


  • Cross-sectional data set on wages and other characteristics

    Table columns: observation number, hourly wage, indicator variables (1=yes, 0=no)

    Cross-sectional data: one observation per unit of obs

  • Cross-sectional data on growth rates and country characteristics

    Adult secondary education rates

    Government consumption as percentage of GDP

    Growth rate of real per capita GDP

    Cross sectional data: One observation per unit of obs

  • Time series data Observations of a variable or several variables over time

    For example: stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, automobile sales, etc.

    Time series observations are typically serially correlated

    Ordering of observations conveys important information

    Data frequency: daily, weekly, monthly, quarterly, annually, etc.

    Typical features of time series: trends and seasonality

    Typical applications: applied macroeconomics and finance
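Serial correlation can be checked directly by correlating a series with its own one-period lag. The sketch below simulates a persistent series and compares it with independent draws. The persistence value 0.8 and the series length are assumptions for illustration, and `lag1_autocorr` is a hypothetical helper name.

```python
import random


def lag1_autocorr(series):
    """Sample correlation between the series and itself shifted by one period."""
    mean = sum(series) / len(series)
    dev = [v - mean for v in series]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


random.seed(1)
n = 5000
RHO = 0.8  # assumed persistence, chosen for illustration

# Serially correlated series: each value carries over part of the previous one.
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(RHO * ar1[-1] + random.gauss(0, 1))

# Independent draws, as under pure random sampling in a cross section.
iid = [random.gauss(0, 1) for _ in range(n)]

print(round(lag1_autocorr(ar1), 2), round(lag1_autocorr(iid), 2))
```

The first number should sit near the assumed 0.8 and the second near zero. Shuffling the persistent series would destroy that correlation, which is why the ordering of observations carries information.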


  • Time series data on minimum wages and related variables

    Unemployment rate

    Average coverage rate

    Average minimum wage for given year

    Gross national product


  • Pooled cross sections Two or more cross sections are combined in one data set

    Cross sections are drawn independently of each other

    Pooled cross sections are often used to evaluate policy changes

    Example: evaluate the effect of a change in property taxes on house prices

    Random sample of house prices for the year 1993

    A new random sample of house prices for the year 1995

    Compare before/after (1993: before reform, 1995: after reform)


  • Pooled cross sections on housing prices

    Number of bathrooms

    Size of house in square feet

    Property tax

    Before reform

    After reform


  • Panel or longitudinal data The same cross-sectional units are followed over time

    Panel data have both a cross-sectional and a time series dimension

    Hence, panel data can be used to account for time-invariant unobservables

    Panel data can be used to model lagged responses

    Example:

    City crime statistics; each city is observed in two years

    Time-invariant unobserved city characteristics may be modeled

    Effect of police on crime rates may exhibit time lag
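One way to see how a two-year panel accounts for time-invariant unobservables is to difference each city's data across the two years. The sketch below uses a tiny noise-free data set in which every number is invented, including the assumed police effect of -2. First differencing is one standard technique for this; the slides do not say which method the course uses.

```python
def slope(x, y):
    """Slope from a simple regression of y on x (covariance over variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


BETA = -2.0       # assumed true effect of police on crime
YEAR_SHIFT = 1.0  # assumed common change in crime between the two years

city_effect = [10.0, 20.0, 30.0, 40.0]  # time-invariant, unobserved
police_86 = [1.0, 2.0, 3.0, 4.0]        # high-crime cities have more police
police_90 = [2.0, 4.0, 3.0, 6.0]

crime_86 = [a + BETA * p for a, p in zip(city_effect, police_86)]
crime_90 = [a + YEAR_SHIFT + BETA * p for a, p in zip(city_effect, police_90)]

# A single cross section mixes the police effect with the city effect.
cross_section_slope = slope(police_86, crime_86)

# Differencing within each city cancels the city effect.
d_police = [b - a for a, b in zip(police_86, police_90)]
d_crime = [b - a for a, b in zip(crime_86, crime_90)]
first_difference_slope = slope(d_police, d_crime)

print(cross_section_slope, first_difference_slope)
```

In this constructed example the 1986 cross section gives a slope of 8, which suggests that police raise crime, while the differenced data recover the assumed effect of -2.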


  • Two-year panel data on city crime statistics

    Each city has two time series observations

    Number of police in 1986

    Number of police in 1990


  • Causal Inference vs. Association One of the major aspects of econometrics that distinguishes it from statistics is a very strong emphasis on understanding causal inference

    Causal inference: evaluating whether a change in one variable (x) will lead to a change in another variable (y), assuming nothing else changes (ceteris paribus)

    Why do we care about causality?

    A lot of the time, as econometricians, we're explicitly trying to evaluate a policy to enact

    Thus we want to know what will happen if we change one variable

    More generally, the statistical tools we have can tell us a lot about how two variables covary

    But correlation doesn't imply causation, and to get to causal inference we generally need to know about how the problem works in real life

    A particular concern is when our two variables x and y are endogenous, or jointly determined

    That is, x and y influence each other, or there's a third variable Z that affects both
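The third-variable case can be sketched with simulated data in which Z drives both x and y, while x has no effect on y at all. Every number here is an assumption for illustration. The helper `residuals` is a hypothetical name; it holds Z fixed by stripping out its linear influence, which is one simple way to mimic "nothing else changes".

```python
import random


def slope(x, y):
    """Slope from a simple regression of y on x (covariance over variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def residuals(x, y):
    """Part of y left over after a simple regression of y on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


random.seed(2)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]        # Z drives x
y = [2.0 * zi + random.gauss(0, 1) for zi in z]  # Z drives y; x plays no role

raw_slope = slope(x, y)
# Hold Z fixed by removing its linear influence from both variables first.
partial_slope = slope(residuals(z, x), residuals(z, y))

print(round(raw_slope, 2), round(partial_slope, 2))
```

The raw slope comes out near 1 even though changing x would do nothing to y. Once Z is held fixed the slope is near zero. In practice Z is often unobserved, which is what makes the problem hard.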

  • Example: Medicine (Medieval)

    Four Humors Theory (not falsifiable)

    Four Humors Empirics (sub-optimal outcomes)

  • Medicine (Victorian)

    Miasma Theory (not falsifiable)

    Miasma Empirics (sub-optimal outcomes)

  • Medicine (Early Modern, 1850s)

    Theory: Cholera is a waterborne disease (falsifiable)

    Disease Theory Empirics (decent outcomes)

  • Randomization Ironically, given its late arrival to the scientific method, medicine first developed what we may now consider to be the ultimate tool in causal inference in the sciences: the randomized control trial (or RCT)

  • Endogeneity and random assignment

    Why is random assignment important? Consider

    health_outcome = A*(took_drug) + ε

    where ε is our error term, took_drug is a binary variable indicating that a patient took a drug, and health_outcome is some measurement of health

    Why do we care that took_drug is randomly assigned?

    If took_drug is randomly assigned, then is it correlated with anything? If not, who cares?
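A simulation sketch of why random assignment matters in the drug example. The true effect A = 1, the sample size, and the self-selection rule (patients in worse underlying health take the drug) are all assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def difference_in_means(took_drug, health_outcome):
    """Average outcome of patients who took the drug minus those who did not."""
    treated = [h for t, h in zip(took_drug, health_outcome) if t == 1]
    control = [h for t, h in zip(took_drug, health_outcome) if t == 0]
    return mean(treated) - mean(control)


random.seed(3)
n = 20000
A = 1.0  # assumed true effect of the drug
error = [random.gauss(0, 1) for _ in range(n)]  # everything else affecting health

# Self-selection: patients in worse underlying health choose the drug,
# so took_drug is correlated with the error term.
selected = [1 if e < 0 else 0 for e in error]
# Random assignment: a coin flip, unrelated to the error term.
randomized = [1 if random.random() < 0.5 else 0 for _ in range(n)]

outcome_selected = [A * t + e for t, e in zip(selected, error)]
outcome_randomized = [A * t + e for t, e in zip(randomized, error)]

naive_estimate = difference_in_means(selected, outcome_selected)
rct_estimate = difference_in_means(randomized, outcome_randomized)

print(round(naive_estimate, 2), round(rct_estimate, 2))
```

Under self-selection the comparison makes a helpful drug look harmful, because the treated group started out sicker. Under the coin flip, took_drug is uncorrelated with the error term and the difference in means lands close to A.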

  • Why is endogeneity an issue? Randomization allows a researcher to eliminate the possibility that they are arguing for a causal, exogenous interpretation of an endogenous system

    Endogenous means originating from inside the system, in this case taken to mean co-influential. Examples: education and earnings; prices of substitute or complementary goods; development and the environment

    Exogenous means originating outside the system

    Interpreting an endogenous relationship as exogenous means risking interpreting a system with reverse causality as strictly causal

  • Endogeneity by example: Classroom size and educational achievement

    How is the relationship endogenous?

  • Class size: ways of attacking endogeneity

    Randomly assign students to large or small classes: Tennessee STAR

    Find a natural experiment that produces something akin to randomization in class size: Maimonides' Rule in Israel

    But these don't always work the way we think: discontinuous class size cutoffs in Chile

  • Why all this concern about endogeneity? Endogeneity is particularly troublesome in the social sciences because humans are self-aware

    Humans might sort on / select into treatment

    Our understanding of how to deal with endogeneity is relatively new

    Natural scientists don't normally have intelligent, reactive data points

    Medicine, which one might argue is halfway between the natural and social sciences, needed to be concerned with endogeneity early on

    Endogeneity is still a concern in the natural sciences, though: correlation does not imply causation