Panel Data

Meo School Of Research East west north or south education is for all.

Please remember me and my teachers and family in your prayers. Superior university Lahore Pakistan

The term "panel data" refers to the pooling of observations on a cross-section of households, countries, firms, etc. over several time periods (Baltagi).

Panel data, also known as longitudinal data, have both time series and cross-sectional

dimensions.

They arise when we measure the same collection of people or objects over a period of

time.

Econometrically, the setup is

$$y_{it} = \alpha + \beta' x_{it} + u_{it}$$

where $y_{it}$ is the dependent variable, $\alpha$ is the intercept term, $\beta$ is a $k \times 1$ vector of parameters to be estimated on the explanatory variables, $x_{it}$; $t = 1, \dots, T$; $i = 1, \dots, N$.

The simplest way to deal with this data would be to estimate a single, pooled regression

on all the observations together. But pooling the data assumes that there is no

heterogeneity – i.e. the same relationship holds for all the data.
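As a minimal sketch of the pooled regression just described, the snippet below simulates a small balanced panel and fits one regression on all N*T stacked observations with NumPy. The panel size, the parameter values and the variable names are assumptions chosen for illustration only; they do not come from any of the cited texts.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 6                 # hypothetical number of individuals and periods
alpha, beta = 1.0, 2.0       # hypothetical true intercept and slope

x = rng.normal(size=(N, T))              # one regressor, x[i, t]
u = rng.normal(scale=0.5, size=(N, T))   # disturbance term
y = alpha + beta * x + u                 # same relationship for every i and t

# Pooling: stack all N*T observations and estimate a single regression.
X = np.column_stack([np.ones(N * T), x.ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
alpha_hat, beta_hat = coef
print(alpha_hat, beta_hat)
```

Because the simulated data really do share one intercept and one slope, pooling is harmless here. The concern raised above is about data where that assumption fails.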

A panel data set, while having both a cross-sectional and a time series dimension, differs in some

important respects from an independently pooled cross section. To collect panel data—

sometimes called longitudinal data—we follow (or attempt to follow) the same individuals,

families, firms, cities, states, or whatever, across time.

Hsiao (2003) and Klevmarken (1989) list several benefits from using panel data:

Controlling for individual heterogeneity: panel data suggest that individuals, firms, states or countries are heterogeneous. Time-series and cross-section studies that do not control for this heterogeneity run the risk of obtaining biased results; see, e.g., Moulton (1986, 1987).

It is often of interest to examine how variables, or the relationships between them, change

dynamically (over time).

By structuring the model in an appropriate way, we can remove the impact of certain forms of

omitted variables bias in regression results.

Panel data give more informative data, more variability, less collinearity among the variables, more degrees of freedom and more efficiency.

Time-series studies are plagued with multicollinearity; for example, in a study of the demand for

cigarettes, there is high collinearity between price and income in the aggregate time series

for the USA. This is less likely with a panel across American states since the cross-section

dimension adds a lot of variability, adding more informative data on price and income. In fact,

the variation in the data can be decomposed into variation between states of different sizes and

characteristics, and variation within states.

Panel data are better able to study the dynamics of adjustment.

Cross-sectional distributions that look relatively stable hide a multitude of changes. Spells of

unemployment, job turnover, residential and income mobility are better studied with panels.

Panel data are also well suited to study the duration of economic states like unemployment and poverty, and if these panels are long enough, they can shed light on the speed of adjustment to economic policy changes.

Panel data are better able to identify and measure effects that are simply not detectable in pure cross-section or pure time-series data (Baltagi; Gujarati).

Characteristics of panel data

Panel data provide information on individual behavior, both across individuals and over time –

they have both cross-sectional and time-series dimensions.

Panel data include N individuals observed at T regular time periods.

Panel data can be balanced, when all individuals are observed in all time periods, or unbalanced, when some individuals are not observed in all time periods.

We assume correlation (clustering) over time for a given individual, with independence

over individuals.

Example: the income of the same individual is correlated over time, but it is independent across individuals.
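The balanced/unbalanced distinction listed above can be checked mechanically. The helper below is a hypothetical illustration (the function name and the toy data are assumptions, not part of any library): a panel is balanced when every individual appears in every period that occurs in the data.

```python
def is_balanced(observations):
    """Return True if every individual is observed in every period.

    observations: iterable of (individual, period) pairs.
    """
    pairs = set(observations)
    individuals = {i for i, _ in pairs}
    periods = {t for _, t in pairs}
    # A balanced panel contains exactly N * T distinct (i, t) pairs.
    return len(pairs) == len(individuals) * len(periods)


# Toy data: three individuals observed in two years.
balanced_panel = [(i, t) for i in ("A", "B", "C") for t in (2001, 2002)]
# Drop one observation so that individual "C" is missing in 2002.
unbalanced_panel = balanced_panel[:-1]

print(is_balanced(balanced_panel), is_balanced(unbalanced_panel))
```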

Panel data types

Short panel: many individuals and few time periods (we use this case in class)

Long panel: many time periods and few individuals

Both: many time periods and many individuals

Variation for the dependent variable and regressors

Overall variation: variation over time and individuals.

Between variation: variation between individuals.

Within variation: variation within individuals (over time).
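The three kinds of variation can be made concrete with sums of squared deviations. The sketch below assumes a balanced panel stored as an N x T array, with simulated values and variable names that are purely illustrative; in that setting the overall sum of squares splits exactly into a between part and a within part.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 5                                   # hypothetical panel dimensions
# Simulated variable: each individual i has its own level (0, 1, 2, 3).
x = rng.normal(size=(N, T)) + np.arange(N)[:, None]

grand_mean = x.mean()                         # mean over all i and t
unit_means = x.mean(axis=1, keepdims=True)    # mean over t for each i

overall = ((x - grand_mean) ** 2).sum()                # over time and individuals
between = T * ((unit_means - grand_mean) ** 2).sum()   # between individuals
within = ((x - unit_means) ** 2).sum()                 # within individuals, over time

print(overall, between + within)
```

Statistical packages often report these quantities as standard deviations rather than sums of squares, so the numbers here are meant to show the decomposition, not to reproduce any particular software output.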

Panel data models

Panel data models describe individual behavior both across time and across individuals.

There are three types of models: the pooled model, the fixed effects model, and the

random effects model.

Pooled model

The pooled model specifies constant coefficients, the usual assumption for cross-sectional analysis.

This is the most restrictive panel data model and is not used much in the literature.

Individual-specific effects model

We assume that there is unobserved heterogeneity across individuals, captured by an individual-specific effect.

Example: unobserved ability of an individual that affects wages.



Fixed effects model

The fixed effects model for some variable $y_{it}$ may be written

$$y_{it} = \alpha + \beta' x_{it} + \mu_i + v_{it}$$

where $v_{it}$ is the remaining disturbance, which varies over both time and entities. We can think of $\mu_i$ as summarizing all of the variables that affect $y_{it}$ cross-sectionally but do not vary over time, for example the sector that a firm operates in, a person's gender, or the country where a bank has its headquarters. Thus we would capture the heterogeneity that is encapsulated in $\mu_i$ by a method that allows for different intercepts for each cross-sectional unit.

This model could be estimated using dummy variables, which would be termed the least squares dummy variable (LSDV) approach (Chris Brooks, 'Introductory Econometrics for Finance', 2013).
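The LSDV idea can be sketched in a few lines of NumPy. The example below is an illustration on simulated data, with made-up parameter values and variable names. It regresses y on x plus one dummy per entity, dropping the common intercept to avoid perfect collinearity with the dummies, and compares the slope with the one obtained from the within transformation (subtracting each entity's time mean), which is a commonly used equivalent way of computing the same estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, beta = 30, 5, 1.5                     # hypothetical sizes and true slope
mu = rng.normal(size=(N, 1))                # entity-specific intercept shifts
x = rng.normal(size=(N, T)) + mu            # regressor correlated with mu
y = beta * x + mu + rng.normal(scale=0.3, size=(N, T))

# LSDV: one dummy column per entity, rows ordered entity by entity.
dummies = np.kron(np.eye(N), np.ones((T, 1)))        # shape (N*T, N)
X_lsdv = np.column_stack([x.ravel(), dummies])
coef_lsdv, *_ = np.linalg.lstsq(X_lsdv, y.ravel(), rcond=None)
beta_lsdv = coef_lsdv[0]                             # slope on x

# Within transformation: remove each entity's time mean, then regress.
x_w = (x - x.mean(axis=1, keepdims=True)).ravel()
y_w = (y - y.mean(axis=1, keepdims=True)).ravel()
beta_within = (x_w @ y_w) / (x_w @ x_w)

print(beta_lsdv, beta_within)
```

The two slopes agree up to floating-point error. With many entities the within version avoids building a large dummy matrix.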

The Random Effects Model

An alternative to the fixed effects model described above is the random effects model,

which is sometimes also known as the error components model.

As with fixed effects, the random effects approach proposes different intercept terms for

each entity and again these intercepts are constant over time, with the relationships

between the explanatory and explained variables assumed to be the same both cross-

sectionally and temporally.

However, the difference is that under the random effects model, the intercepts for each cross-sectional unit are assumed to arise from a common intercept $\alpha$ (which is the same for all cross-sectional units and over time), plus a random variable $\epsilon_i$ that varies cross-sectionally but is constant over time.

$\epsilon_i$ measures the random deviation of each entity's intercept term from the "global" intercept term $\alpha$. We can write the random effects panel model as

$$y_{it} = \alpha + \beta' x_{it} + \omega_{it}, \qquad \omega_{it} = \epsilon_i + v_{it}$$

Unlike the fixed effects model, there are no dummy variables to capture the heterogeneity (variation) in the cross-sectional dimension.

Instead, this occurs via the $\epsilon_i$ terms.
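A short worked derivation shows why the composite error is correlated over time for a given entity. It rests on assumptions that are standard for this model but are not spelled out in these notes: $\epsilon_i$ has mean zero and variance $\sigma_\epsilon^2$, $v_{it}$ has mean zero and variance $\sigma_v^2$, and the two components are independent of each other and across entities.

```latex
\begin{align*}
\operatorname{Var}(\omega_{it}) &= \sigma_\epsilon^2 + \sigma_v^2 \\
\operatorname{Cov}(\omega_{it}, \omega_{is}) &= \sigma_\epsilon^2
  \quad \text{for } t \neq s \\
\operatorname{Cov}(\omega_{it}, \omega_{js}) &= 0
  \quad \text{for } i \neq j
\end{align*}
```

Under these assumptions the errors of the same entity share the component $\epsilon_i$ in every period, which matches the earlier description of correlation over time for a given individual with independence across individuals.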

Fixed or Random Effects?

It is often said that the random effects model is more appropriate when the entities in the

sample can be thought of as having been randomly selected from the population, but a

fixed effect model is more plausible when the entities in the sample effectively constitute

the entire population.

However, the random effects approach has a major drawback, which arises from the fact that it is valid only when the composite error term $\omega_{it}$ is uncorrelated with all of the explanatory variables.

This can also be viewed as a consideration of whether any unobserved omitted variables

(that were allowed for by having different intercepts for each entity) are uncorrelated

with the included explanatory variables. If they are uncorrelated, a random effects

approach can be used; otherwise the fixed effects model is preferable.

A test for whether this assumption is valid for the random effects estimator is based on a

slightly more complex version of the Hausman test.

If the assumption does not hold, the parameter estimates will be biased and inconsistent.

To see how this arises, suppose that we have only one explanatory variable, $x_{2it}$, that varies positively with $y_{it}$, and also with the error term, $\omega_{it}$. The estimator will ascribe all of any increase in y to x when in reality some of it arises from the error term, resulting in biased coefficients.
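The bias described above can be seen in a small simulation. This is a sketch with assumed values (true slope 2, unit variances, 500 entities, 4 periods), not a result from any of the cited texts. The entity-specific component enters both the regressor and the error, so an estimator that ignores it overstates the slope, while demeaning within each entity removes the component.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 500, 4, 2.0                    # hypothetical sizes and true slope
eps = rng.normal(size=(N, 1))               # entity-specific error component
x = rng.normal(size=(N, T)) + eps           # regressor correlated with eps
y = beta * x + eps + rng.normal(size=(N, T))


def slope(xv, yv):
    """OLS slope of yv on xv, with an intercept."""
    xv = xv - xv.mean()
    yv = yv - yv.mean()
    return (xv @ yv) / (xv @ xv)


# Pooled estimate: eps stays in the error and is correlated with x.
beta_pooled = slope(x.ravel(), y.ravel())

# Fixed effects (within) estimate: entity means are removed first.
beta_fe = slope((x - x.mean(axis=1, keepdims=True)).ravel(),
                (y - y.mean(axis=1, keepdims=True)).ravel())

print(beta_pooled, beta_fe)
```

In this design the pooled slope settles near 2.5 rather than 2, because part of the effect of the entity component is attributed to x, while the within estimate stays close to the true value.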

The main question is whether the individual-specific effects are correlated with the regressors. If

they are correlated, we have the fixed effects model. If they are not correlated, we have the

random effects model.
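The Hausman test mentioned earlier formalizes this choice by comparing the two sets of estimates. In its usual textbook form, given here as a reference sketch rather than as the exact variant the cited texts use, the statistic is

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})'
    \left[ \widehat{\operatorname{Var}}(\hat\beta_{FE})
         - \widehat{\operatorname{Var}}(\hat\beta_{RE}) \right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
```

Under the null hypothesis that the individual-specific effects are uncorrelated with the regressors, $H$ is asymptotically chi-squared with degrees of freedom equal to the number of coefficients compared. A large value of $H$ is evidence against the random effects model and in favour of fixed effects.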

Fixed effects model versus random effects model

| Fixed effects model (FEM) | Random effects model (ECM, error components model) |
|---|---|
| If the individual, or cross-section specific, error component εi and the X regressors are correlated, FEM may be appropriate. | If they are not correlated, we have the random effects model. |
| If T (the number of time series observations) is large and N (the number of cross-sectional units) is small, FEM may be preferable. | When N is large and T is small, ECM is appropriate. |
| If the individual error component εi and one or more regressors are correlated, the estimators obtained from FEM are unbiased. | If the individual error component εi and one or more regressors are correlated, the ECM estimators are biased. |

Best of luck. I will update this file soon and elaborate further on this topic, because I am still in the learning process.



Thanks for being with me

Take care

www.saeedmeo.blogspot.com

[email protected]

1/24/2016
