
Time Series Analysis

Andrea Beccarini

Center for Quantitative Economics

Winter 2013/2014

Andrea Beccarini (CQE) Time Series Analysis Winter 2013/2014 1 / 143

Introduction: Objectives

Time series are ubiquitous in economics, and very important in macroeconomics and financial economics

GDP, inflation rates, unemployment, interest rates, stock prices

You will learn . . .

the formal mathematical treatment of time series and stochastic processes
what the most important standard models in economics are
how to fit models to real-world time series

Introduction: Prerequisites

Descriptive Statistics

Probability Theory

Statistical Inference

Introduction: Class and material

Class

Class teacher: Sarah Meyer

Time: Tu., 12:00-14:00

Location: CAWM 3

Start: 22 October 2013

Material

Course page on Blackboard

Slides and class material are (or will be) downloadable

Introduction: Literature

Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3rd ed., Teubner, Wiesbaden. (Available online in the RUB-Netz.)

Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton.

Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York.

Schlittgen, Rainer and Streitberg, Bernd (1997), Zeitreihenanalyse, 7th ed., Oldenbourg, Munich.

Basics: Definition

Definition: Time series

A sequence of observations ordered by time is called a time series

Time series can be univariate or multivariate

Time can be discrete or continuous

The states can be discrete or continuous

Basics: Definition

Typical notations:

x1, x2, . . . , xT
or x(1), x(2), . . . , x(T)
or xt, t = 1, . . . , T
or (xt)t≥0

This course is about . . .

univariate time series
in discrete time
with continuous states

Basics: Examples

Quarterly GDP Germany, 1991 I to 2012 II

[Figure: time series plot of quarterly German GDP in current billion Euro, rising from about 350 in the early 1990s to about 650 in 2012.]

Basics: Examples

DAX index and log(DAX), 31.12.1964 to 6.4.2009

[Figure: two time series plots, the DAX index (levels, roughly 2000 to beyond 6000) and the logarithm of the DAX (roughly 6.0 to 9.0).]

Basics: Definition

Definition: Stochastic process

A sequence (Xt)t∈T of random variables, all defined on the same probability space (Ω, A, P), is called a stochastic process with discrete time parameter (usually T = N or T = Z)

Short version: A stochastic process is a sequence of random variables

A stochastic process depends on both chance and time

Basics: Definition

Distinguish four cases: both time and chance can be fixed or variable

             t fixed                     t variable
ω fixed      Xt(ω) is a real number      Xt(ω) is a sequence of real numbers
                                         (path, realization, trajectory)
ω variable   Xt(ω) is a random variable  Xt(ω) is a stochastic process

process.R

Basics: Examples

Example 1: White noise

εt ∼ NID(0, σ²)

Example 2: Random walk

Xt = Xt−1 + εt with X0 = 0 and εt ∼ NID(0, σ²)

Example 3: A random constant

Xt = Z with Z ∼ N(0, σ²)
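The three example processes are easy to simulate. The course's own examples are R scripts; the following is a small illustrative Python sketch (variable names such as `random_walk` are mine, not from the slides).

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
sigma = 1.0

# Example 1: white noise, eps_t ~ NID(0, sigma^2)
eps = rng.normal(0.0, sigma, size=T)

# Example 2: random walk, X_t = X_{t-1} + eps_t with X_0 = 0
random_walk = np.concatenate(([0.0], np.cumsum(eps)))

# Example 3: a random constant, X_t = Z for all t with Z ~ N(0, sigma^2)
Z = rng.normal(0.0, sigma)
random_const = np.full(T, Z)
```

Plotting a few paths of each process makes the difference in time-dependence visible: white noise fluctuates around zero, the random walk wanders, and the random constant is a horizontal line.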

Basics: Moment functions

Definition: Moment functions

The following functions of time are called moment functions:

µ(t) = E(Xt)            (expectation function)
σ²(t) = Var(Xt)         (variance function)
γ(s, t) = Cov(Xs, Xt)   (covariance function)

Correlation function (autocorrelation function)

ρ(s, t) = γ(s, t) / (√σ²(s) · √σ²(t))

moments.R [1]

Basics: Estimation of moment functions

Usually, the moment functions are unknown and have to be estimated

Problem: Only a single path (realization) can be observed

X_1^(1)  X_1^(2)  . . .  X_1^(n)
X_2^(1)  X_2^(2)  . . .  X_2^(n)
  ...      ...    . . .    ...
X_T^(1)  X_T^(2)  . . .  X_T^(n)

Can we still estimate the expectation function µ(t) and the autocovariance function γ(s, t)? Under which conditions?

Basics: Estimation of moment functions

X_1^(1)  X_1^(2)  . . .  X_1^(n)
X_2^(1)  X_2^(2)  . . .  X_2^(n)
  ...      ...    . . .    ...
X_T^(1)  X_T^(2)  . . .  X_T^(n)

Usually, the expectation function µ(t) should be estimated by averaging over realizations,

µ̂(t) = (1/n) ∑_{i=1}^{n} X_t^(i)

Basics: Estimation of moment functions

X_1^(1)  X_1^(2)  . . .  X_1^(n)
X_2^(1)  X_2^(2)  . . .  X_2^(n)
  ...      ...    . . .    ...
X_T^(1)  X_T^(2)  . . .  X_T^(n)

Under certain conditions, µ(t) can be estimated by averaging over time,

µ̂ = (1/T) ∑_{t=1}^{T} X_t^(1)

Basics: Estimation of moment functions

X_1^(1)  X_1^(2)  . . .  X_1^(n)
X_2^(1)  X_2^(2)  . . .  X_2^(n)
  ...      ...    . . .    ...
X_T^(1)  X_T^(2)  . . .  X_T^(n)

Usually, the autocovariance γ(t, t + h) should be estimated by averaging over realizations,

γ̂(t, t + h) = (1/n) ∑_{i=1}^{n} (X_t^(i) − µ̂(t))(X_{t+h}^(i) − µ̂(t + h))

Basics: Estimation of moment functions

X_1^(1)  X_1^(2)  . . .  X_1^(n)
X_2^(1)  X_2^(2)  . . .  X_2^(n)
  ...      ...    . . .    ...
X_T^(1)  X_T^(2)  . . .  X_T^(n)

Under certain conditions, γ(t, t + h) can be estimated by averaging over time,

γ̂(t, t + h) = (1/T) ∑_{t=1}^{T−h} (X_t^(1) − µ̂)(X_{t+h}^(1) − µ̂)

Basics: Definition

Moment functions cannot be estimated without additional assumptions since only one path is observed

There are restrictions which allow us to estimate the moment functions

Restriction of time heterogeneity: the distribution of (Xt(ω))t∈T must not be completely different for each t ∈ T

Restriction of memory: if the values of the process are coupled too closely over time, the individual observations do not supply any (or only insufficient) information about the distribution

Basics: Restriction of time heterogeneity: Stationarity

Definition: Strong stationarity

Let (Xt)t∈T be a stochastic process and let t1, . . . , tn ∈ T be an arbitrary number n ∈ N of arbitrary time points.

(Xt)t∈T is called strongly stationary if for arbitrary h ∈ Z

P(X_{t1} ≤ x1, . . . , X_{tn} ≤ xn) = P(X_{t1+h} ≤ x1, . . . , X_{tn+h} ≤ xn)

Implication: all univariate marginal distributions are identical

Basics: Restriction of time heterogeneity: Stationarity

Definition: Weak stationarity

(Xt)t∈T is called weakly stationary if

1. the expectation exists and is constant: E(Xt) = µ < ∞ for all t ∈ T
2. the variance exists and is constant: Var(Xt) = σ² < ∞ for all t ∈ T
3. for all t, s, r ∈ Z (in the admissible range)

   γ(t, s) = γ(t + r, s + r)

Simplified notation for covariance and correlation functions

γ(h) = γ(t, t + h)
ρ(h) = ρ(t, t + h)

Basics: Restriction of time heterogeneity: Stationarity

Strong stationarity implies weak stationarity (but only if the first two moments exist)

A stochastic process is called Gaussian if the joint distribution of X_{t1}, . . . , X_{tn} is multivariate normal

For Gaussian processes, weak and strong stationarity coincide

Intuition: an observed time series can be regarded as a realization of a stationary process if a gliding window of "appropriate width" always displays "qualitatively the same" picture

stationary.R

Examples [2]

Basics: Restriction of memory: Ergodicity

Definition: Ergodicity (I)

Let (Xt)t∈T be a weakly stationary stochastic process with expectation µ and autocovariance γ(h); define

µ̂_T = (1/T) ∑_{t=1}^{T} Xt

(Xt)t∈T is called (expectation) ergodic if

lim_{T→∞} E[(µ̂_T − µ)²] = 0

Basics: Restriction of memory: Ergodicity

Definition: Ergodicity (II)

Let (Xt)t∈T be a weakly stationary stochastic process with expectation µ and autocovariance γ(h); define

γ̂(h) = (1/T) ∑_{t=1}^{T−h} (Xt − µ̂)(X_{t+h} − µ̂)

(Xt)t∈T is called (covariance) ergodic if for all h ∈ Z

lim_{T→∞} E[(γ̂(h) − γ(h))²] = 0

Basics: Restriction of memory: Ergodicity

Ergodicity is consistency (in quadratic mean) of the estimators µ̂ of µ and γ̂(h) of γ(h) for dependent observations

The process (Xt)t∈T is expectation ergodic if (γ(h))h∈Z is absolutely summable, i.e.

∑_{h=−∞}^{∞} |γ(h)| < ∞

The dependence between far-away observations must be sufficiently small

Basics: Restriction of memory: Ergodicity

Ergodicity condition (for autocovariance): a stationary Gaussian process (Xt)t∈T with absolutely summable autocovariance function γ(h) is (autocovariance) ergodic

Under ergodicity, the law of large numbers holds even if the observations are dependent

If the dependence γ(h) does not diminish fast enough, the estimators are no longer consistent

Examples [3]

Basics: Estimation of moment functions

Summary of estimators (electricity.R)

µ̂ = X̄_T = (1/T) ∑_{t=1}^{T} Xt

γ̂(h) = (1/T) ∑_{t=1}^{T−h} (Xt − µ̂)(X_{t+h} − µ̂)

ρ̂(h) = γ̂(h) / γ̂(0)

Sometimes, γ̂(h) is defined with factor 1/(T − h)
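The estimators above (with the 1/T factor) fit in a few lines; the course implementation is the R script electricity.R, while the following is an illustrative Python sketch with hypothetical helper names `acov` and `acorr`.

```python
import numpy as np

def acov(x, h):
    """gamma_hat(h): autocovariance estimator with factor 1/T, as on the slide."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    mu = x.mean()                 # mu_hat, the average over time
    return np.sum((x[:T - h] - mu) * (x[h:] - mu)) / T

def acorr(x, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return acov(x, h) / acov(x, 0)

rng = np.random.default_rng(1)
x = rng.normal(size=2000)         # white noise for a quick sanity check
```

For white noise, `acorr(x, h)` should be close to zero for every h ≥ 1, and `acov(x, 0)` coincides with the (1/T-normalized) sample variance.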

Basics: Estimation of moment functions

A closer look at the expectation estimator µ̂

The estimator µ̂ is unbiased, i.e. E(µ̂) = µ [4]

The variance of µ̂ is [5]

Var(µ̂) = γ(0)/T + (2/T) ∑_{h=1}^{T−1} (1 − h/T) γ(h)

Under ergodicity, for T → ∞

T · Var(µ̂) → γ(0) + 2 ∑_{h=1}^{∞} γ(h) = ∑_{h=−∞}^{∞} γ(h)

Basics: Estimation of moment functions

For Gaussian processes, µ̂ is normally distributed,

µ̂ ∼ N(µ, Var(µ̂))

and asymptotically

√T (µ̂ − µ) → Z ∼ N(0, γ(0) + 2 ∑_{h=1}^{∞} γ(h))

For non-Gaussian processes, µ̂ is (often) asymptotically normal,

√T (µ̂ − µ) → Z ∼ N(0, γ(0) + 2 ∑_{h=1}^{∞} γ(h))

Basics: Estimation of moment functions

A closer look at the autocovariance estimators γ̂(h)

For Gaussian processes with absolutely summable covariance function,

(√T (γ̂(0) − γ(0)), . . . , √T (γ̂(K) − γ(K)))′

is multivariate normal with expectation vector (0, . . . , 0)′ and

T · Cov(γ̂(h1), γ̂(h2)) = ∑_{r=−∞}^{∞} (γ(r) γ(r + h1 + h2) + γ(r − h2) γ(r + h1))

Basics: Estimation of moment functions

A closer look at the autocorrelation estimators ρ̂(h)

For Gaussian processes with absolutely summable covariance function, the random vector

(√T (ρ̂(0) − ρ(0)), . . . , √T (ρ̂(K) − ρ(K)))′

is multivariate normal with expectation vector (0, . . . , 0)′ and a complicated covariance matrix

Be careful: for small to medium sample sizes, the autocovariance and autocorrelation estimators are biased!

autocorr.R

Basics: Estimation of moment functions

An important special case for autocorrelation estimators:

Let (εt) be a white-noise process with Var(εt) = σ² < ∞; then

E(ρ̂(h)) = −T⁻¹ + O(T⁻²)

Cov(ρ̂(h1), ρ̂(h2)) = T⁻¹ + O(T⁻²)  for h1 = h2
                     O(T⁻²)         otherwise

For white-noise processes and long time series, the empirical autocorrelations are approximately independent normal random variables with expectation −T⁻¹ and variance T⁻¹
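This approximation can be checked by Monte Carlo (an illustrative Python sketch, not from the slides): for white noise of length T = 100, the empirical first-order autocorrelation should average about −1/T with variance about 1/T.

```python
import numpy as np

rng = np.random.default_rng(2)
T, reps = 100, 4000

def rho1(x):
    """Empirical first-order autocorrelation with the 1/T convention."""
    mu = x.mean()
    g0 = np.sum((x - mu) ** 2) / len(x)
    g1 = np.sum((x[:-1] - mu) * (x[1:] - mu)) / len(x)
    return g1 / g0

# rho_hat(1) across many independent white-noise paths
r = np.array([rho1(rng.normal(size=T)) for _ in range(reps)])
```

The sample mean of `r` should be near −0.01 and its sample variance near 0.01, matching −T⁻¹ and T⁻¹.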

Mathematical digression (I): Complex numbers

Some quadratic equations do not have real solutions, e.g.

x² + 1 = 0

Still it is possible (and sensible) to define solutions to such equations

The definition in common notation is

i = √−1

where i is the number which, when squared, equals −1

The number i is called imaginary (i.e. not real)

Mathematical digression (I): Complex numbers

Other imaginary numbers follow from this definition, e.g.

√−16 = √16 · √−1 = 4i
√−5 = √5 · √−1 = √5 i

Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. 5 − 8i or, in general, a + bi

Such numbers are called complex, and the set of complex numbers is denoted by C

The pair a + bi and a − bi is called complex conjugate

Mathematical digression (I): Complex numbers

Geometric interpretation:

[Figure: a complex number a+bi as a point in the plane, with real part a on the real axis, imaginary part b on the imaginary axis, absolute value r, and angle θ.]

Mathematical digression (I): Complex numbers

Polar coordinates and Cartesian coordinates

z = a + bi
  = r · (cos θ + i sin θ)
  = r e^{iθ}

a = r cos θ
b = r sin θ

r = √(a² + b²)
θ = arctan(|b/a|)
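These conversions map directly onto Python's built-in complex type (an illustrative aside, not part of the slides; note that `cmath.phase` uses atan2 and therefore handles all quadrants, whereas arctan(|b/a|) alone only gives the first-quadrant angle).

```python
import cmath
import math

z = 3 + 4j                          # Cartesian form a + bi
r, theta = abs(z), cmath.phase(z)   # polar form: absolute value r, angle theta

# a = r cos(theta), b = r sin(theta), and z = r e^{i theta}
z_back = r * cmath.exp(1j * theta)
```

Converting to polar form and back recovers the original number, and the real and imaginary parts satisfy a = r cos θ, b = r sin θ.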

Mathematical digression (I): Complex numbers

Rules of calculus:

Addition
(a + bi) + (c + di) = (a + c) + (b + d)i

Multiplication (Cartesian coordinates)
(a + bi) · (c + di) = (ac − bd) + (ad + bc)i

Multiplication (polar coordinates)
r1 e^{iθ1} · r2 e^{iθ2} = r1 r2 e^{i(θ1 + θ2)}

Mathematical digression (I): Complex numbers

Addition:

[Figure: addition in the complex plane, built up in three steps; the vectors a+bi and c+di are added componentwise to give (a+c)+(b+d)i.]

Mathematical digression (I): Complex numbers

Multiplication:

[Figure: multiplication in the complex plane, built up in two steps; the absolute values multiply, r = r1 · r2, and the angles add, θ = θ1 + θ2.]

Mathematical digression (I): Complex numbers

The quadratic equation

x² + px + q = 0

has the solutions

x = −p/2 ± √(p²/4 − q)

If p²/4 − q < 0, the solutions are complex (and conjugate)

Mathematical digression (I): Complex numbers

Example: The solutions of

x² − 2x + 5 = 0

are

x = −(−2)/2 + √((−2)²/4 − 5) = 1 + 2i

and

x = −(−2)/2 − √((−2)²/4 − 5) = 1 − 2i
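The same pair of roots can be checked numerically (an illustrative Python sketch; in R one would use polyroot):

```python
import numpy as np

# coefficients of x^2 - 2x + 5, highest power first
roots = np.roots([1, -2, 5])

# the two solutions are the complex conjugate pair 1 + 2i and 1 - 2i
pair = sorted(roots, key=lambda z: z.imag)
```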

Mathematical digression (II): Linear difference equations

First-order difference equation with initial value x0:

xt = c + φ1 x_{t−1}

p-th order difference equation with p initial values:

xt = c + φ1 x_{t−1} + . . . + φp x_{t−p}

A sequence (xt)t=0,1,... that satisfies the difference equation is called a solution of the difference equation

Examples (diffequation.R)

Mathematical digression (II): Linear difference equations

We only consider the homogeneous case, i.e. c = 0

The general solution of the first-order difference equation

xt = φ1 x_{t−1}

is

xt = A · φ1^t

with arbitrary constant A, since xt = A φ1^t = φ1 · A φ1^{t−1} = φ1 x_{t−1}

The constant is determined by the initial condition, A = x0

The sequence xt = A φ1^t is convergent if and only if |φ1| < 1

Mathematical digression (II): Linear difference equations

Solution of the p-th order difference equation

xt = φ1 x_{t−1} + . . . + φp x_{t−p}

Let xt = A z^{−t}; then

A z^{−t} = φ1 A z^{−(t−1)} + . . . + φp A z^{−(t−p)}
z^{−t} = φ1 z^{−(t−1)} + . . . + φp z^{−(t−p)}

and thus

1 − φ1 z − . . . − φp z^p = 0

Characteristic polynomial, characteristic equation

Mathematical digression (II): Linear difference equations

There are p (possibly complex, possibly nondistinct) solutions of the characteristic equation

Denote the solutions (called roots) by z1, . . . , zp

If all roots are real and distinct, then

xt = A1 z1^{−t} + . . . + Ap zp^{−t}

is a solution of the homogeneous difference equation

If there are complex roots, the solution is oscillating

The constants A1, . . . , Ap can be determined with p initial conditions (x0, x−1, . . . , x_{−(p−1)})

Mathematical digression (II): Linear difference equations

Stability condition: the linear difference equation

xt = φ1 x_{t−1} + . . . + φp x_{t−p}

is stable (i.e. convergent) if and only if all roots of the characteristic polynomial

1 − φ1 z − . . . − φp z^p = 0

lie outside the unit circle, i.e. |zi| > 1 for all i = 1, . . . , p

In R, the stability condition can be checked easily using the commands polyroot (base package) or ArmaRoots (fArma package)
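A Python analogue of the polyroot check (illustrative; `is_stable` is a hypothetical helper name, not part of the course material):

```python
import numpy as np

def is_stable(phi):
    """True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    # coefficients in ascending powers of z: 1, -phi_1, ..., -phi_p
    coefs = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    roots = np.polynomial.polynomial.polyroots(coefs)
    return bool(np.all(np.abs(roots) > 1.0))
```

For example, xt = 0.5 x_{t−1} is stable (root z = 2), while xt = 1.2 x_{t−1} is not (root z ≈ 0.83 inside the unit circle).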

ARMA models: Definition

Definition: ARMA process

Let (εt)t∈T be a white noise process; the stochastic process

Xt = φ1 X_{t−1} + . . . + φp X_{t−p} + εt + θ1 ε_{t−1} + . . . + θq ε_{t−q}

with φp, θq ≠ 0 is called an ARMA(p, q) process

AutoRegressive Moving Average process

ARMA processes are important since every stationary process can be approximated by an ARMA process

ARMA models: Lag operator and lag polynomial

The lag operator is a convenient notational tool

The lag operator L shifts the time index of a stochastic process

L (Xt)t∈T = (X_{t−1})t∈T
L Xt = X_{t−1}

Rules

L² Xt = L (L Xt) = X_{t−2}
Lⁿ Xt = X_{t−n}
L⁻¹ Xt = X_{t+1}
L⁰ Xt = Xt

ARMA models: Lag operator and lag polynomial

Lag polynomial

A(L) = a0 + a1 L + a2 L² + . . . + ap L^p

Example: Let A(L) = 1 − 0.5L and B(L) = 1 + 4L²; then

C(L) = A(L) B(L)
     = (1 − 0.5L)(1 + 4L²)
     = 1 − 0.5L + 4L² − 2L³

Lag polynomials can be treated in the same way as ordinary polynomials
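Because lag polynomials multiply like ordinary polynomials, the product above can be verified by convolving the coefficient sequences (an illustrative Python sketch):

```python
import numpy as np

# coefficients in ascending powers of L
A = np.array([1.0, -0.5])        # A(L) = 1 - 0.5 L
B = np.array([1.0, 0.0, 4.0])    # B(L) = 1 + 4 L^2

# polynomial multiplication is convolution of coefficient sequences
C = np.convolve(A, B)            # C(L) = A(L) B(L)
```

The result is the coefficient vector of 1 − 0.5L + 4L² − 2L³.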

ARMA models: Lag operator and lag polynomial

Define the lag polynomials

Φ(L) = 1 − φ1 L − . . . − φp L^p
Θ(L) = 1 + θ1 L + . . . + θq L^q

The ARMA(p, q) process can be written compactly as

Φ(L) Xt = Θ(L) εt

Important special cases

MA(q) process: Xt = εt + θ1 ε_{t−1} + . . . + θq ε_{t−q}
AR(1) process: Xt = φ1 X_{t−1} + εt
AR(p) process: Xt = φ1 X_{t−1} + . . . + φp X_{t−p} + εt

ARMA models: MA(q) process

The MA(q) process is

Xt = Θ(L) εt
Xt = εt + θ1 ε_{t−1} + . . . + θq ε_{t−q}

with εt ∼ NID(0, σε²)

Expectation function

E(Xt) = E(εt + θ1 ε_{t−1} + . . . + θq ε_{t−q})
      = E(εt) + θ1 E(ε_{t−1}) + . . . + θq E(ε_{t−q})
      = 0

ARMA models: MA(q) process

Autocovariance function

γ(s, t) = E[(εs + θ1 ε_{s−1} + . . . + θq ε_{s−q})(εt + θ1 ε_{t−1} + . . . + θq ε_{t−q})]
        = E[εs εt + θ1 εs ε_{t−1} + θ2 εs ε_{t−2} + . . . + θq εs ε_{t−q}
           + θ1 ε_{s−1} εt + θ1² ε_{s−1} ε_{t−1} + θ1 θ2 ε_{s−1} ε_{t−2} + . . . + θ1 θq ε_{s−1} ε_{t−q}
           + . . .
           + θq ε_{s−q} εt + θ1 θq ε_{s−q} ε_{t−1} + θ2 θq ε_{s−q} ε_{t−2} + . . . + θq² ε_{s−q} ε_{t−q}]

The expectations of the cross products are

E(εs εt) = 0    for s ≠ t
           σε²  for s = t

ARMA models: MA(q) process

Define θ0 = 1; then

γ(t, t) = σε² ∑_{i=0}^{q} θi²
γ(t − 1, t) = σε² ∑_{i=0}^{q−1} θi θ_{i+1}
γ(t − 2, t) = σε² ∑_{i=0}^{q−2} θi θ_{i+2}
...
γ(t − q, t) = σε² θ0 θq = σε² θq
γ(s, t) = 0 for s < t − q

Hence, MA(q) processes are always stationary

Simulation of MA(q) processes (maqsim.R)
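The autocovariances above collapse to γ(t − h, t) = σε² ∑_{i=0}^{q−h} θi θ_{i+h}, which a short function can evaluate (an illustrative Python sketch; `ma_acov` is a hypothetical name, and the course's simulation script is maqsim.R in R):

```python
import numpy as np

def ma_acov(theta, sigma2, h):
    """gamma(h) of an MA(q) process; theta = (theta_1, ..., theta_q), theta_0 = 1."""
    th = np.concatenate(([1.0], np.asarray(theta, dtype=float)))  # prepend theta_0 = 1
    q = len(th) - 1
    h = abs(h)
    if h > q:
        return 0.0                     # gamma(h) vanishes beyond lag q
    return sigma2 * float(np.sum(th[:q - h + 1] * th[h:]))
```

For an MA(1) with θ1 = 0.5 and σε² = 2 this gives γ(0) = 2(1 + 0.25) = 2.5, γ(1) = 2 · 0.5 = 1, and γ(h) = 0 for h ≥ 2; the autocovariances depend only on the lag h, never on t, which is exactly the stationarity statement above.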

ARMA models: AR(1) process

The AR(1) process is

Φ(L) Xt = εt
(1 − φ1 L) Xt = εt
Xt = φ1 X_{t−1} + εt

with εt ∼ NID(0, σε²)

Expectation and variance function [6]

Stability condition: AR(1) processes are stable if |φ1| < 1

ARMA models: AR(1) process

Stationarity: stable AR(1) processes are weakly stationary if [7]

E(X0) = 0
Var(X0) = σε² / (1 − φ1²)

Nonstationary stable processes converge towards stationarity [8]

It is common parlance to call stable processes stationary

Covariance function of the stationary AR(1) process [9]

ARMA models: AR(p) process

The AR(p) process is

Φ(L) Xt = εt
Xt = φ1 X_{t−1} + . . . + φp X_{t−p} + εt

with εt ∼ NID(0, σε²)

Assumption: εt is independent of X_{t−1}, X_{t−2}, . . . (innovations)

Expectation function [10]

The covariance function is complicated (ar2autocov.R)

ARMA models: AR(p) process

AR(p) processes are stable if all roots of the characteristic equation

Φ(z) = 0

are larger than 1 in absolute value, |zi| > 1 for i = 1, . . . , p

An AR(p) process is weakly stationary if the joint distribution of the p initial values (X0, X−1, . . . , X_{−(p−1)}) is "appropriate"

Stable AR(p) processes converge towards stationarity; they are often called stationary

Simulation of AR(p) processes (arpsim.R)

ARMA models: Invertibility

AR and MA processes can be inverted (into each other)

Example: Consider the stable AR(1) process with |φ1| < 1

Xt = φ1 X_{t−1} + εt
   = φ1 (φ1 X_{t−2} + ε_{t−1}) + εt
   = φ1² X_{t−2} + φ1 ε_{t−1} + εt
   ...
   = φ1ⁿ X_{t−n} + φ1^{n−1} ε_{t−(n−1)} + . . . + φ1² ε_{t−2} + φ1 ε_{t−1} + εt

ARMA models: Invertibility

Since |φ1| < 1,

Xt = ∑_{i=0}^{∞} φ1^i ε_{t−i}
   = εt + θ1 ε_{t−1} + θ2 ε_{t−2} + . . .

with θi = φ1^i

A stable AR(1) process can be written as an MA(∞) process (the same is true for stable AR(p) processes)
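Numerically, the MA(∞) coefficients θi = φ1^i are exactly the impulse response of the AR(1) recursion, which a quick check confirms (an illustrative Python sketch, not from the slides):

```python
import numpy as np

phi1, n = 0.8, 50

# impulse response of X_t = phi1 * X_{t-1} + eps_t to a unit shock eps_0 = 1
x = np.zeros(n)
x[0] = 1.0
for t in range(1, n):
    x[t] = phi1 * x[t - 1]

# MA(infinity) coefficients theta_i = phi1**i
theta = phi1 ** np.arange(n)
```

Because |φ1| < 1, the coefficients decay geometrically, which is why truncating the MA(∞) sum after a moderate number of terms loses almost nothing.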

ARMA models: Invertibility

Using lag polynomials, this can be written as

(1 − φ1 L) Xt = εt
Xt = (1 − φ1 L)⁻¹ εt
Xt = ∑_{i=0}^{∞} (φ1 L)^i εt

General compact and elegant notation

Φ(L) Xt = εt
Xt = (Φ(L))⁻¹ εt
   = Θ(L) εt

ARMA models: Invertibility

MA(q) can be written as AR(∞) if all roots of Θ(z) = 0 are larger than 1 in absolute value (invertibility condition)

Example: MA(1) with |θ1| < 1; from

Xt = εt + θ1 ε_{t−1}
θ1 X_{t−1} = θ1 ε_{t−1} + θ1² ε_{t−2}

we find Xt = θ1 X_{t−1} + εt − θ1² ε_{t−2}

Repeated substitution of the ε_{t−i} terms yields

Xt = ∑_{i=1}^{∞} φi X_{t−i} + εt    with φi = (−1)^{i+1} θ1^i

ARMA models: Invertibility

Summary

ARMA(p, q) processes are stable if all roots of

Φ(z) = 0

are larger than 1 in absolute value

ARMA(p, q) processes are invertible if all roots of

Θ(z) = 0

are larger than 1 in absolute value

ARMA models: Invertibility

Sometimes (e.g. for proofs), it is useful to write an ARMA(p, q) process either as AR(∞) or as MA(∞)

ARMA(p, q) can be written as AR(∞) or MA(∞)

Φ(L)Xt = Θ(L)εt

Xt = (Φ(L))−1 Θ(L)εt

(Θ(L))−1 Φ(L)Xt = εt

ARMA models: Deterministic components

Until now we only considered processes with zero expectation

Many processes have both a zero-expectation stochastic component (Yt) and a non-zero deterministic component (Dt)

Examples:

linear trend Dt = a + bt
exponential trend Dt = a b^t
seasonal patterns

Let (Xt)t∈Z be a stochastic process with deterministic component Dt and define Yt = Xt − Dt

ARMA models: Deterministic components

Then E(Yt) = 0 and

Cov(Yt, Ys) = E[(Yt − E(Yt))(Ys − E(Ys))]
            = E[(Xt − Dt − E(Xt − Dt))(Xs − Ds − E(Xs − Ds))]
            = E[(Xt − E(Xt))(Xs − E(Xs))]
            = Cov(Xt, Xs)

The covariance function does not depend on the deterministic component

To derive the covariance function of a stochastic process, simply drop the deterministic component

ARMA models: Deterministic components

Special case: Dt = µt = µ

ARMA(p, q) process with constant (non-zero) expectation

Xt − µ = φ1 (X_{t−1} − µ) + . . . + φp (X_{t−p} − µ) + εt + θ1 ε_{t−1} + . . . + θq ε_{t−q}

The process can also be written as

Xt = c + φ1 X_{t−1} + . . . + φp X_{t−p} + εt + θ1 ε_{t−1} + . . . + θq ε_{t−q}

where c = µ (1 − φ1 − . . . − φp)

ARMA models: Deterministic components

Wold's representation theorem: every stationary stochastic process (Xt)t∈T can be represented as

Xt = ∑_{h=0}^{∞} ψh ε_{t−h} + Dt

with ψ0 = 1, ∑_{h=0}^{∞} ψh² < ∞, and εt white noise with variance σ² > 0

Stationary stochastic processes can be written as a sum of a deterministic process and an MA(∞) process

Often, low-order ARMA(p, q) processes can approximate MA(∞) processes well

ARMA models: Linear processes and filters

Definition: Linear process

Let (εt)t∈Z be a white noise process; a stochastic process (Xt)t∈Z is called linear if it can be written as

Xt = ∑_{h=−∞}^{∞} ψh ε_{t−h} = Ψ(L) εt

where the coefficients are absolutely summable, i.e. ∑_{h=−∞}^{∞} |ψh| < ∞.

The lag polynomial Ψ(L) is called a (linear) filter

ARMA models: Linear processes and filters

Some special filters

Change from previous period (difference filter)

Ψ(L) = 1 − L

Change from last year (for quarterly or monthly data)

Ψ(L) = 1 − L⁴
Ψ(L) = 1 − L¹²

Elimination of seasonal influences (quarterly data)

Ψ(L) = (1 + L + L² + L³)/4
Ψ(L) = 0.125 L² + 0.25 L + 0.25 + 0.25 L⁻¹ + 0.125 L⁻²

ARMA models: Linear processes and filters

Hodrick-Prescott filter (an important tool in empirical macroeconomics)

Decompose a time series (Xt) into a long-term growth component (Gt) and a short-term cyclical component (Ct),

Xt = Gt + Ct

Trade-off between goodness-of-fit and smoothness of Gt

Minimize the criterion function

∑_{t=1}^{T} (Xt − Gt)² + λ ∑_{t=2}^{T−1} [(G_{t+1} − Gt) − (Gt − G_{t−1})]²

with respect to Gt for a given smoothness parameter λ

ARMA models: Linear processes and filters

The FOCs of the minimization problem are

(G1, . . . , GT)′ = A (X1, . . . , XT)′

where A = (I + λ K′K)⁻¹ with the (T − 2) × T second-difference matrix

K = | 1 −2  1  0  0 . . . 0  0  0 |
    | 0  1 −2  1  0 . . . 0  0  0 |
    | 0  0  1 −2  1 . . . 0  0  0 |
    | .  .  .  .  .  . .  .  .  . |
    | 0  0  0  0  0 . . . 1 −2  1 |
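The matrix formula translates directly into code. A minimal Python sketch (`hp_trend` is a hypothetical name; the course's own implementation is R code):

```python
import numpy as np

def hp_trend(x, lam):
    """Growth component G = (I + lam * K'K)^{-1} X, with K the (T-2) x T
    second-difference matrix from the slide."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    K = np.zeros((T - 2, T))
    for i in range(T - 2):
        K[i, i:i + 3] = [1.0, -2.0, 1.0]   # one row of second differences
    A = np.linalg.inv(np.eye(T) + lam * (K.T @ K))
    return A @ x
```

Because K annihilates linear functions of t, a purely linear series is returned unchanged for any λ; the cyclical component is then Ct = Xt − Gt.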

ARMA models: Linear processes and filters

The HP filter is a linear filter

Typical values for the smoothing parameter λ:

λ = 10      annual data
λ = 1600    quarterly data
λ = 14400   monthly data

Implementation in R (code by Olaf Posch)

Empirical examples (hpfilter.R)

Estimation of ARMA models: The estimation problem

Problem: the parameters φ1, . . . , φp, θ1, . . . , θq, σε² of an ARMA(p, q) process are usually unknown

They have to be estimated from an observed time series X1, . . . , XT

Standard estimation methods:

least squares (OLS)
maximum likelihood (ML)

Assumption: the lag orders p and q are known

Estimation of ARMA models: Least squares estimation of AR(p) models

The AR(p) model with non-zero constant expectation

Xt = c + φ1 X_{t−1} + . . . + φp X_{t−p} + εt

can be written in matrix notation:

| X_{p+1} |   | 1  X_p      X_{p−1}  . . .  X_1     |   | c  |   | ε_{p+1} |
| X_{p+2} | = | 1  X_{p+1}  X_p      . . .  X_2     | · | φ1 | + | ε_{p+2} |
|   ...   |   | .  .        .        . . .  .       |   | ...|   |   ...   |
| X_T     |   | 1  X_{T−1}  X_{T−2}  . . .  X_{T−p} |   | φp |   | ε_T     |

Compact notation: y = Xβ + u

Estimation of ARMA models: Least squares estimation of AR(p) models

The standard least squares estimator is

β̂ = (X′X)⁻¹ X′y

The matrix of exogenous variables X is stochastic
−→ the usual results for OLS regression do not hold

But: there is no contemporaneous correlation between the error term and the exogenous variables

Hence, the OLS estimators are consistent and asymptotically efficient
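The lag-matrix regression can be sketched in Python (`ar_ols` is a hypothetical helper name, not from the course); on a long simulated stationary AR(1), OLS recovers c and φ1 closely:

```python
import numpy as np

def ar_ols(x, p):
    """OLS estimate of (c, phi_1, ..., phi_p) from the lag regression y = X beta + u."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    y = x[p:]
    # columns: constant, X_{t-1}, ..., X_{t-p}
    X = np.column_stack([np.ones(T - p)] +
                        [x[p - j:T - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# simulate a stationary AR(1): X_t = 1 + 0.6 X_{t-1} + eps_t
rng = np.random.default_rng(3)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = 1.0 + 0.6 * x[t - 1] + rng.normal()

beta = ar_ols(x, 1)
```

With T = 5000 observations, the estimates should land within a few hundredths of the true values.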

Estimation of ARMA models: Least squares estimation of ARMA models

Solve the ARMA equation

Xt = c + φ1 X_{t−1} + . . . + φp X_{t−p} + εt + θ1 ε_{t−1} + . . . + θq ε_{t−q}

for εt,

εt = Xt − c − φ1 X_{t−1} − . . . − φp X_{t−p} − θ1 ε_{t−1} − . . . − θq ε_{t−q}

Define the residuals as functions of the unknown parameters

ε̂t(d, f1, . . . , fp, g1, . . . , gq) = Xt − d − f1 X_{t−1} − . . . − fp X_{t−p} − g1 ε̂_{t−1} − . . . − gq ε̂_{t−q}

Estimation of ARMA models: Least squares estimation of ARMA models

Define the sum of squared residuals

S(d, f1, . . . , fp, g1, . . . , gq) = ∑_{t=1}^{T} (ε̂t(d, f1, . . . , fp, g1, . . . , gq))²

The least squares estimators are

(ĉ, φ̂1, . . . , φ̂p, θ̂1, . . . , θ̂q) = arg min S(d, f1, . . . , fp, g1, . . . , gq)

Since the residuals are defined recursively, one needs starting values ε0, . . . , ε−q+1 and X0, . . . , X−p+1 to calculate ε̂1

Easiest way: set all starting values to zero ("conditional estimation")
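The recursion with zero starting values can be sketched in Python for the ARMA(1,1) special case (function names are made up for illustration):

```python
def css_residuals(x, d, f1, g1):
    """Residuals of an ARMA(1,1): e_t = x_t - d - f1*x_{t-1} - g1*e_{t-1},
    with starting values x_0 = e_0 = 0 ('conditional estimation')."""
    e_prev, x_prev = 0.0, 0.0
    res = []
    for xt in x:
        e = xt - d - f1 * x_prev - g1 * e_prev
        res.append(e)
        e_prev, x_prev = e, xt
    return res

def css(x, d, f1, g1):
    """Sum of squared residuals S(d, f1, g1), to be minimized numerically."""
    return sum(e * e for e in css_residuals(x, d, f1, g1))
```

A numerical optimizer would then minimize css over (d, f1, g1).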

Estimation of ARMA models: Least squares estimation of ARMA models

The first order conditions form a nonlinear equation system which cannot be solved easily

Minimization by standard numerical methods (implemented in all usual statistical packages)

Either solve the nonlinear system of first order conditions or minimize S directly

Simple special case: ARMA(1, 1)

arma11.R

Estimation of ARMA models: Maximum likelihood estimation

Additional assumption: The innovations εt are normally distributed

Implication: ARMA processes are Gaussian

The joint distribution of X1, . . . , XT is multivariate normal:

X = (X1, . . . , XT)′ ∼ N(µ, Σ)

Estimation of ARMA models: Maximum likelihood estimation

Expectation vector

      ⎡ X1 ⎤   ⎡ c/(1 − φ1 − . . . − φp) ⎤
µ = E ⎢ ⋮  ⎥ = ⎢           ⋮             ⎥
      ⎣ XT ⎦   ⎣ c/(1 − φ1 − . . . − φp) ⎦

Covariance matrix

        ⎡ X1 ⎤   ⎡ γ(0)      γ(1)      . . .  γ(T − 1) ⎤
Σ = Cov ⎢ ⋮  ⎥ = ⎢ γ(1)      γ(0)      . . .  γ(T − 2) ⎥
        ⎣ XT ⎦   ⎢  ⋮         ⋮        ⋱       ⋮      ⎥
                 ⎣ γ(T − 1)  γ(T − 2)  . . .  γ(0)     ⎦

Estimation of ARMA models: Maximum likelihood estimation

The expectation vector and the covariance matrix contain all unknown parameters ψ = (φ1, . . . , φp, θ1, . . . , θq, c, σ²)′

The likelihood function is

L(ψ; X) = (2π)^{−T/2} (det Σ)^{−1/2} exp(−(1/2)(X − µ)′Σ⁻¹(X − µ))

and the loglikelihood function is

ln L(ψ; X) = −(T/2) ln(2π) − (1/2) ln(det Σ) − (1/2)(X − µ)′Σ⁻¹(X − µ)

The ML estimators are ψ̂ = arg max ln L(ψ; X)

Estimation of ARMA models: Maximum likelihood estimation

The loglikelihood function has to be maximized by numerical methods

Standard properties of ML estimators:

1. consistency
2. asymptotic efficiency
3. asymptotically jointly normally distributed
4. the covariance matrix of the estimators can be consistently estimated

Example: ML estimation of an ARMA(3, 3) model for the interest rate spread (arma33.R)

Estimation of ARMA models: Hypothesis tests

Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable

General null and alternative hypotheses

H0 : g(ψ) = 0

H1 : not H0

where g(ψ) is an m-dimensional function of the parameters

Example: If H0 : φ1 = 0 then m = 1 and g(ψ) = φ1

Estimation of ARMA models: Hypothesis tests

Likelihood ratio test statistic:

LR = 2 (ln L(θ̂ML) − ln L(θ̂R))

where θ̂ML and θ̂R are the unrestricted and restricted estimators

Under the null hypothesis,

LR →d U ∼ χ²m

and H0 is rejected at significance level α if LR > χ²m;1−α

Disadvantage: two models must be estimated
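The mechanics of the test are simple once both models are fitted; a Python sketch with made-up loglikelihood values (3.841 is the 95% quantile of χ²₁):

```python
def lr_test(loglik_unrestricted, loglik_restricted, chi2_crit):
    """LR = 2 (ln L_U - ln L_R); reject H0 if LR exceeds the critical value."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    return lr, lr > chi2_crit

# Hypothetical fitted loglikelihoods of an unrestricted and a restricted model
lr, reject = lr_test(-830.2, -835.0, 3.841)  # m = 1 restriction, alpha = 0.05
```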

Estimation of ARMA models: Hypothesis tests

For the Wald test we only consider g(ψ) = ψ − ψ0, i.e.

H0 : ψ = ψ0
H1 : not H0

Test statistic:

W = (ψ̂ − ψ0)′ [Cov(ψ̂)]⁻¹ (ψ̂ − ψ0)

If the null hypothesis is true then W →d U ∼ χ²m

The asymptotic covariance matrix can be estimated consistently as Cov(ψ̂) = H⁻¹, where H is the Hessian matrix returned by the maximization procedure

Estimation of ARMA models: Hypothesis tests

Test example 1:

H0 : φ1 = 0
H1 : φ1 ≠ 0

Test example 2:

H0 : ψ = ψ0
H1 : not H0

Illustration (arma33.R)

Estimation of ARMA models: Model selection

Usually, the lag orders p and q of an ARMA model are unknown

Trade-off: Goodness-of-fit against parsimony

Akaike's information criterion for the model with non-zero expectation:

AIC = ln σ̂² + 2 (p + q + 1)/T

where ln σ̂² measures goodness-of-fit and 2 (p + q + 1)/T is the penalty for model complexity

Choose the model with the smallest AIC

Estimation of ARMA models: Model selection

Bayesian information criterion (BIC, Schwarz information criterion):

BIC = ln σ̂² + (p + q + 1) · ln T / T

Hannan-Quinn information criterion:

HQ = ln σ̂² + 2 (p + q + 1) · ln(ln T) / T

Both BIC and HQ are consistent while the AIC tends to overfit

Illustration (arma33.R)
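All three criteria are cheap to compute from a fitted model's residual variance; a Python sketch with hypothetical fit results (not output of arma33.R):

```python
import math

def info_criteria(sigma2_hat, p, q, T):
    """AIC, BIC and Hannan-Quinn for an ARMA(p, q) with non-zero expectation."""
    k = p + q + 1  # number of estimated mean-equation parameters
    aic = math.log(sigma2_hat) + 2 * k / T
    bic = math.log(sigma2_hat) + k * math.log(T) / T
    hq = math.log(sigma2_hat) + 2 * k * math.log(math.log(T)) / T
    return aic, bic, hq

# Hypothetical fits: the richer model buys only a tiny reduction in sigma^2
aic1, bic1, hq1 = info_criteria(1.00, 1, 1, 200)  # ARMA(1,1)
aic2, bic2, hq2 = info_criteria(0.99, 3, 3, 200)  # ARMA(3,3)
```

Here all three criteria prefer the parsimonious ARMA(1,1); for T = 200 the per-parameter penalties order as AIC < HQ < BIC, since 2 < 2 ln ln T < ln T.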

Estimation of ARMA models: Model selection

Another illustration: the true model is ARMA(2, 1) with Xt = 0.5Xt−1 + 0.3Xt−2 + εt + 0.7εt−1; 1000 samples of size n = 500 were generated; the table shows the model orders p and q as selected by AIC and BIC

     # orders selected by AIC            # orders selected by BIC
           q                                   q
p     0    1    2    3    4    5         0    1    2    3    4    5
0     0    0    0    0    0    0         0    0    0    0    0    0
1     0   18   64   23   14    6         0  310  167    4    0    0
2     0  171   21   16    5    7         0  503    3    1    0    0
3     0    7   35   58   80   45         1    0    2    1    0    0
4     9    2   12  139   37   44         6    1    0    0    0    0
5    11    6   12   56   46   56         1    0    0    0    0    0

Integrated processes: Difference operator

Define the difference operator

∆ = 1 − L,

then

∆Xt = Xt − Xt−1

Second order differences:

∆² = ∆(∆) = (1 − L)² = 1 − 2L + L²

Higher orders ∆ⁿ are defined in the same way; note that ∆ⁿ ≠ 1 − Lⁿ
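A quick Python check of the operator algebra: ∆ applied twice agrees with 1 − 2L + L², and second differences of t² are constant.

```python
def diff(x, n=1):
    """Apply the difference operator Delta = 1 - L to a series, n times."""
    for _ in range(n):
        x = [b - a for a, b in zip(x, x[1:])]
    return x

x = [1.0, 4.0, 9.0, 16.0, 25.0]  # X_t = t^2
d1 = diff(x)      # first differences
d2 = diff(x, 2)   # second differences: constant for a quadratic
# Delta^2 X_t = X_t - 2 X_{t-1} + X_{t-2}
check = [x[t] - 2 * x[t - 1] + x[t - 2] for t in range(2, len(x))]
```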

Integrated processes: Definition

Definition: Integrated process

A stochastic process is called integrated of order 1 if

∆Xt = µ + Ψ(L)εt

where εt is white noise, Ψ(1) ≠ 0, and ∑_{j=0}^{∞} j |ψj| < ∞

Common notation: Xt ∼ I(1)

I(1) processes are also called difference stationary or unit root processes

Stochastic and deterministic trends

Trend stationary processes are not I(1) (since Ψ(1) = 0)

Integrated processes: Definition

Stationary processes are sometimes called I(0)

Higher orders of integration are possible, e.g. Xt ∼ I(2) with ∆²Xt ∼ I(0)

In general, Xt ∼ I(d) means that ∆ᵈXt ∼ I(0)

Most economic time series are either I (0) or I (1)

Some economic time series may be I (2)

Integrated processes: Definition

Example 1: The random walk with drift, Xt = b + Xt−1 + εt, is I(1) because

∆Xt = Xt − Xt−1 = b + εt = b + Ψ(L)εt

where ψ0 = 1 and ψj = 0 for j ≠ 0
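A Python sketch of Example 1 (drift and sample size are illustrative): differencing a simulated random walk with drift leaves b + εt, so the sample mean of ∆Xt is close to b.

```python
import random

random.seed(42)

# Random walk with drift: X_t = b + X_{t-1} + eps_t
b, T = 0.2, 20000
x = [0.0]
for _ in range(T):
    x.append(b + x[-1] + random.gauss(0, 1))

dx = [x[t] - x[t - 1] for t in range(1, len(x))]  # Delta X_t = b + eps_t
mean_dx = sum(dx) / len(dx)  # should be close to the drift b = 0.2
```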

Integrated processes: Definition

Example 2: The trend stationary process, Xt = a + bt + εt, is not I(1) because

∆Xt = b + εt − εt−1 = b + Ψ(L)εt

with ψ0 = 1, ψ1 = −1 and ψj = 0 for all other j, so that Ψ(1) = 0

Integrated processes: Definition

Example 3: The "AR(2) process"

Xt = b + (1 + φ)Xt−1 − φXt−2 + εt

(1 − φL)(1 − L)Xt = b + εt

is I(1) if |φ| < 1 because ∆Xt = Ψ(L)(b + εt) with

Ψ(L) = (1 − φL)⁻¹ = 1 + φL + φ²L² + φ³L³ + φ⁴L⁴ + . . .

and thus Ψ(1) = ∑_{i=0}^{∞} φⁱ = 1/(1 − φ) ≠ 0. The roots of the characteristic equation are z = 1 and z = 1/φ

Integrated processes: Definition

Example 4: The process

Xt = 0.5Xt−1 − 0.4Xt−2 + εt

is a stationary (stable) zero expectation AR(2) process; the process

Yt = a + bt + Xt

is trend stationary and I(0) since

∆Yt = b + ∆Xt

with ∆Xt = Ψ(L)εt = (1 − L)(1 − 0.5L + 0.4L²)⁻¹ εt and therefore Ψ(1) = 0 (i0andi1.R)

Integrated processes: Definition

Definition: ARIMA process

Let (εt)t∈Z be a white noise process; the stochastic process (Xt)t∈Z is called an integrated autoregressive moving-average process of orders p, d and q, or ARIMA(p, d, q), if ∆ᵈXt is an ARMA(p, q) process:

Φ(L)∆ᵈXt = Θ(L)εt

For d > 0 the process is nonstationary (I(d)) even if all roots of Φ(z) = 0 are outside the unit circle

Simulation of an ARIMA(p, d, q) process (arimapdqsim.R)

Integrated processes: Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?

Reason 1: Long-term forecasts and forecasting errors

Deterministic trend: The forecasting error variance is bounded

Stochastic trend: The forecasting error variance is unbounded

Illustrations

i0andi1.R

Integrated processes: Deterministic versus stochastic trends

Why is it important to distinguish deterministic and stochastic trends?

Reason 2: Spurious regression

OLS regressions will show spurious relationships between time series with (deterministic or stochastic) trends

Detrending works if the series have deterministic trends, but it does not help if the series are integrated

Illustrations

spurious1.R

Integrated processes: Integrated processes and parameter estimation

OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes

The asymptotic normality is lost if the processes are integrated

We only look at the very special case

Xt = φ1Xt−1 + εt

with εt ∼ NID(0, 1) and X0 = 0

The AR(1) process is stationary if |φ1| < 1 and has a unit root if |φ1| = 1

Integrated processes: Integrated processes and parameter estimation

The usual OLS estimator of φ1 is

φ̂1 = ∑_{t=1}^{T} XtXt−1 / ∑_{t=1}^{T} X²t−1

What does the distribution of φ̂1 look like?

Influence of φ1 and T

Consistency?

Asymptotic normality?

Illustration (phihat.R)
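A small Monte Carlo sketch in Python (not the course's phihat.R; sizes are illustrative) for the unit root case φ1 = 1: the estimates concentrate just below 1, hinting at superconsistency and at the nonnormal limit distribution.

```python
import random

random.seed(0)

def phi_hat(x):
    """OLS estimator: sum X_t X_{t-1} / sum X_{t-1}^2 (no constant)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def simulate_ar1(phi, T):
    x = [0.0]
    for _ in range(T):
        x.append(phi * x[-1] + random.gauss(0, 1))
    return x

# Unit root case phi_1 = 1: estimates pile up just below 1
estimates = [phi_hat(simulate_ar1(1.0, 500)) for _ in range(500)]
mean_est = sum(estimates) / len(estimates)
```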

Integrated processes: Integrated processes and parameter estimation

Consistency and asymptotic normality for I(0) processes (|φ1| < 1):

plim φ̂1 = φ1

√T (φ̂1 − φ1) →d Z ∼ N(0, 1 − φ1²)

Consistency and asymptotic normality for I(1) processes (φ1 = 1):

plim φ̂1 = 1

T (φ̂1 − 1) →d V

where V is a nondegenerate, nonnormal random variable

Root-T-consistency versus superconsistency

Integrated processes: Unit root tests

Importance of distinguishing between trend stationarity and difference stationarity

Test of the hypothesis that a process has a unit root (i.e. is I(1))

Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test

Basic tool: linear regression

Xt = deterministics + φXt−1 + εt

∆Xt = deterministics + βXt−1 + εt,   where β := φ − 1

Integrated processes: Unit root tests

Null and alternative hypothesis

H0 : φ = 1 (unit root)

H1 : |φ| < 1 (no unit root)

or, equivalently,

H0 : β = 0 (unit root)

H1 : β < 0 (no unit root)

Unit root tests are one-sided; explosive processes are ruled out

Rejecting the null hypothesis is evidence in favour of stationarity

If the null hypothesis is not rejected, there could be a unit root

Integrated processes: DF test and ADF test

Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests

Possible regressions:

Xt = φXt−1 + εt            or   ∆Xt = βXt−1 + εt
Xt = a + φXt−1 + εt        or   ∆Xt = a + βXt−1 + εt
Xt = a + bt + φXt−1 + εt   or   ∆Xt = a + bt + βXt−1 + εt

Assumption for Dickey-Fuller test: no autocorrelation in εt

If there is autocorrelation in εt , use the augmented DF test

Integrated processes: DF test and ADF test

Dickey-Fuller regression, case 1: no constant, no trend

∆Xt = βXt−1 + εt

Null and alternative hypotheses

H0 : β = 0

H1 : β < 0

Null hypothesis: stochastic trend without drift

Alternative hypothesis: stationary process around zero

Integrated processes: DF test and ADF test

Dickey-Fuller regression, case 2: constant, no trend

∆Xt = a + βXt−1 + εt

Null and alternative hypotheses

H0 : β = 0   or   H0 : β = 0, a = 0

H1 : β < 0   or   H1 : β < 0, a ≠ 0

Null hypothesis: stochastic trend without drift

Alternative hypothesis: stationary process around a constant

Integrated processes: DF test and ADF test

Dickey-Fuller regression, case 3: constant and trend

∆Xt = a + bt + βXt−1 + εt

Null and alternative hypotheses

H0 : β = 0   or   H0 : β = 0, b = 0

H1 : β < 0   or   H1 : β < 0, b ≠ 0

Null hypothesis: stochastic trend with drift

Alternative hypothesis: trend stationary process

Integrated processes: DF test and ADF test

Dickey-Fuller test statistics for single hypotheses:

"ρ-test": T · β̂
"τ-test": β̂/σ̂β̂

The τ-test statistic is computed in the same way as the usual t-test statistic

Reject the null hypothesis if the test statistics are too small

The critical values are not the quantiles of the t-distribution

There are tables with the correct critical values (e.g. Hamilton, table B.6)
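A Python sketch of the ρ-statistic for case 1 (no constant, no trend), computed on a simulated pure random walk so that H0 is true; the sample size is illustrative:

```python
import random

random.seed(7)

# Under H0 the series is a pure random walk
T = 500
x = [0.0]
for _ in range(T):
    x.append(x[-1] + random.gauss(0, 1))

# Case-1 DF regression: Delta X_t = beta * X_{t-1} + eps_t (no deterministics)
dx = [x[t] - x[t - 1] for t in range(1, len(x))]
beta_hat = (sum(d * xl for d, xl in zip(dx, x[:-1]))
            / sum(xl ** 2 for xl in x[:-1]))

rho_stat = T * beta_hat  # compare to Dickey-Fuller tables, not to N(0,1)
```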

Integrated processes: DF test and ADF test

The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics

Reject the null hypothesis if the test statistic is too large

The critical values are not the quantiles of the F-distribution

There are tables with the correct critical values (e.g. Hamilton, table B.7)

Illustrations (dftest.R)

Integrated processes: DF test and ADF test

If there is autocorrelation in εt the DF test does not work (dftest.R)

Augmented Dickey-Fuller test (ADF test) regressions:

∆Xt = γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
∆Xt = a + γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt
∆Xt = a + bt + γ1∆Xt−1 + . . . + γp∆Xt−p + βXt−1 + εt

The added lagged differences capture the autocorrelation

The number of lags p must be large enough to make εt white noise

The critical values remain the same as in the no-correlation case

Integrated processes: DF test and ADF test

Further interesting topics (but we skip these)

Phillips-Perron test

Structural breaks and unit roots

KPSS test of stationarity

H0 : Xt ∼ I (0)

H1 : Xt ∼ I (1)

Integrated processes: Regression with integrated processes

Spurious regression: If Xt and Yt are independent but both I(1), then the regression

Yt = α + βXt + ut

will result in an estimated coefficient β̂ that is significantly different from 0 with probability 1 as T → ∞

BUT: The regression

Yt = α + βXt + ut

may be sensible even though Xt and Yt are I(1)

Cointegration

Integrated processes: Regression with integrated processes

Definition: Cointegration

Two stochastic processes (Xt)t∈T and (Yt)t∈T are cointegrated if both processes are I(1) and there is a constant β such that the process (Yt − βXt) is I(0)

If β is known, cointegration can be tested using a standard unit root test on the process (Yt − βXt)

If β is unknown, it can be estimated from the linear regression

Yt = α + βXt + ut

and cointegration is tested using a modified unit root test on the residual process (ût)t=1,...,T
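The two-step idea can be sketched in Python (an Engle-Granger style illustration with made-up data where Yt = 2Xt + stationary noise): estimate β by OLS, then the residuals would be handed to the unit root test.

```python
import random

random.seed(11)

# X_t is a random walk (I(1)); Y_t = 2 X_t + u_t with u_t I(0)
T = 2000
x = [0.0]
for _ in range(T):
    x.append(x[-1] + random.gauss(0, 1))
x = x[1:]
y = [2.0 * xt + random.gauss(0, 1) for xt in x]

# Step 1: OLS of Y_t on X_t; beta_hat is superconsistent under cointegration
mx, my = sum(x) / T, sum(y) / T
beta_hat = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
resid = [b - my - beta_hat * (a - mx) for a, b in zip(x, y)]
resid_var = sum(r * r for r in resid) / T  # stationary residuals, variance near 1

# Step 2 (not shown): modified unit root test on resid
```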

GARCH models: Conditional expectation

Let (X, Y) be a bivariate random variable with a joint density function; then

E(X|Y = y) = ∫_{−∞}^{∞} x f_{X|Y=y}(x) dx

is the conditional expectation of X given Y = y

E (X |Y ) denotes a random variable with realization E (X |Y = y)if the random variable Y realizes as y

Both E (X |Y ) and E (X |Y = y) are called conditional expectation

GARCH models: Conditional variance

Let (X, Y) be a bivariate random variable with a joint density function; then

Var(X|Y = y) = ∫_{−∞}^{∞} (x − E(X|Y = y))² f_{X|Y=y}(x) dx

is the conditional variance of X given Y = y

Var(X |Y ) denotes a random variable with realization Var(X |Y = y)if the random variable Y realizes as y

Both Var(X |Y = y) and Var(X |Y ) are called conditional variance

GARCH models: Rules for conditional expectations

1. Law of iterated expectations: E(E(X|Y)) = E(X)

2. If X and Y are independent, then E(X|Y) = E(X)

3. The condition can be treated like a constant: E(XY|Y) = Y · E(X|Y)

4. The conditional expectation is a linear operator: for a1, . . . , an ∈ R,

E(∑_{i=1}^{n} aiXi | Y) = ∑_{i=1}^{n} ai E(Xi|Y)

GARCH models: Basics

Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, . . .

Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant

Example: stationary AR(1) process, Xt = αXt−1 + εt with |α| < 1; then

Var(Xt) = σ²X = σ²ε / (1 − α²),

and the conditional variance is

Var(Xt|Xt−1) = σ²ε

GARCH models: Basics

In the following, we will focus on stock returns

Empirical fact: squared (or absolute) returns are positively autocorrelated

Implication: Returns are not independent over time

The dependence is nonlinear

How can we model this kind of dependence?

GARCH models: ARCH(1)-process

Definition: ARCH(1)-process

The stochastic process (Xt)t∈Z is called an ARCH(1)-process if

E(Xt|Xt−1) = 0

Var(Xt|Xt−1) = σ²t = α0 + α1X²t−1

for all t ∈ Z, with α0, α1 > 0

Often, an additional assumption is

Xt | (Xt−1 = xt−1) ∼ N(0, α0 + α1x²t−1)

GARCH models: ARCH(1)-process

The unconditional distribution of Xt is non-normal

Leptokurtosis: the tails are heavier than those of the normal distribution

Example of an ARCH(1)-process:

Xt = εtσt

where (εt)t∈Z is white noise with σ²ε = 1 and

σt = √(α0 + α1X²t−1)

GARCH models: ARCH(1)-process

One can show that

E(Xt|Xt−1) = 0
E(Xt) = 0
Var(Xt|Xt−1) = α0 + α1X²t−1
Var(Xt) = α0/(1 − α1)
Cov(Xt, Xt−i) = 0 for i > 0

Stationarity condition: 0 < α1 < 1

The unconditional kurtosis is 3(1 − α1²)/(1 − 3α1²) if εt ∼ N(0, 1); if α1 ≥ √(1/3) ≈ 0.57735, the kurtosis does not exist

GARCH models: ARCH(1)-process

Squared returns follow

X²t = α0 + α1X²t−1 + vt

with vt = σ²t (ε²t − 1)

Thus, the squared returns of an ARCH(1) process follow an AR(1)

The process (vt)t∈Z is white noise:

E(vt) = 0
Var(vt) = E(v²t) = const.
Cov(vt, vt−i) = 0 (i = 1, 2, . . .)

GARCH models: ARCH(1)-process

Simulation of an ARCH(1)-process for t = 1, . . . , 2500

Parameters: α0 = 0.05, α1 = 0.95, start value X0 = 0

Conditional distribution: εt ∼ N(0, 1)

archsim.R

Check whether the simulated time series shows the typical stylized facts of return distributions
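A Python variant of the simulation (with α1 = 0.5 instead of the 0.95 above, so that the fourth moment exists and sample moments settle down):

```python
import math
import random

random.seed(3)

# ARCH(1): X_t = eps_t * sigma_t with sigma_t^2 = a0 + a1 * X_{t-1}^2
a0, a1, T = 0.05, 0.5, 20000
x = [0.0]
for _ in range(T):
    sigma = math.sqrt(a0 + a1 * x[-1] ** 2)
    x.append(random.gauss(0, 1) * sigma)
x = x[1:]

# Sample variance should be near the theoretical a0 / (1 - a1) = 0.1
var_x = sum(xi ** 2 for xi in x) / T
```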

GARCH models: Estimation of an ARCH(1)-process

Of course, we do not know the true values of the model parameters α0 and α1

How can we estimate the unknown parameters α0 and α1?

Observations: X1, . . . , XT

Because of

X²t = α0 + α1X²t−1 + vt

a possible estimation method is OLS

GARCH models: Estimation of an ARCH(1)-process

OLS estimator of α1:

α̂1 = ∑_{t=2}^{T} (X²t − m)(X²t−1 − m₋) / ∑_{t=2}^{T} (X²t−1 − m₋)² ≈ ρ̂(X²t, X²t−1),

where m and m₋ denote the sample means of X²t and X²t−1 over t = 2, . . . , T

Careful: These estimators are only consistent if the kurtosis exists (i.e. if α1 < √(1/3))

Test of ARCH effects:

H0 : α1 = 0
H1 : α1 > 0

GARCH models: Estimation of an ARCH(1)-process

For T large, under H0,

√T α̂1 ∼ N(0, 1) (approximately)

Reject H0 if √T α̂1 > Φ⁻¹(1 − α)

Second version of this test: consider the R² of the regression

X²t = α0 + α1X²t−1 + vt;

then, under H0,

T α̂1² ≈ TR² ∼ χ²1 (approximately)

Reject H0 if TR² > F⁻¹_{χ²1}(1 − α)

GARCH models: ARCH(p)-process

Definition: ARCH(p)-process

The stochastic process (Xt)t∈Z is called an ARCH(p)-process if

E(Xt|Xt−1, . . . , Xt−p) = 0

Var(Xt|Xt−1, . . . , Xt−p) = σ²t = α0 + α1X²t−1 + . . . + αpX²t−p

for t ∈ Z, where αi ≥ 0 for i = 0, 1, . . . , p − 1 and αp > 0

Often, an additional assumption is that

Xt|(Xt−1 = xt−1, . . . , Xt−p = xt−p) ∼ N(0, σ²t)

GARCH models: ARCH(p)-process

Example of an ARCH(p)-process:

Xt = εtσt

where (εt)t∈Z is white noise with σ²ε = 1 and

σt = √(α0 + α1X²t−1 + . . . + αpX²t−p)

An ARCH(p) process is weakly stationary if all roots of 1 − α1z − α2z² − . . . − αpzᵖ = 0 are outside the unit circle

Then, for all t ∈ Z, E(Xt) = 0 and

Var(Xt) = α0 / (1 − ∑_{i=1}^{p} αi)

GARCH models: ARCH(p)-process

If (Xt)t∈Z is a stationary ARCH(p) process, then (X²t)t∈Z is a stationary AR(p) process

X²t = α0 + α1X²t−1 + . . . + αpX²t−p + vt

As to the error term,

E(vt) = 0
Var(vt) = const.
Cov(vt, vt−i) = 0 for i = 1, 2, . . .

Simulating an ARCH(p) process is easy

GARCH models: Estimation of ARCH(p) models

OLS estimation of

X²t = α0 + α1X²t−1 + . . . + αpX²t−p + vt

Test of ARCH effects:

H0 : α1 = α2 = . . . = αp = 0 vs H1 : not H0

Let R² denote the coefficient of determination of the regression

Under H0, the test statistic TR² ∼ χ²p (approximately); thus reject H0 if TR² > F⁻¹_{χ²p}(1 − α)

GARCH models: Maximum likelihood estimation

Basic idea of the maximum likelihood estimation method: choose the parameters such that the joint density of the observations

f_{X1,...,XT}(x1, . . . , xT)

is maximized

Let X1, . . . , XT denote a random sample from X

The density fX(x; θ) depends on R unknown parameters θ = (θ1, . . . , θR)

GARCH models: Maximum likelihood estimation

ML estimation of θ: maximize the (log)likelihood function

L(θ) = f_{X1,...,XT}(x1, . . . , xT; θ) = ∏_{t=1}^{T} fX(xt; θ)

ln L(θ) = ∑_{t=1}^{T} ln fX(xt; θ)

ML estimate:

θ̂ = arg max ln L(θ)

GARCH models: Maximum likelihood estimation

Since observations are independent in random samples,

f_{X1,...,XT}(x1, . . . , xT) = ∏_{t=1}^{T} f_{Xt}(xt)

or

ln f_{X1,...,XT}(x1, . . . , xT) = ∑_{t=1}^{T} ln f_{Xt}(xt) = ∑_{t=1}^{T} ln fX(xt)

But: ARCH returns are not independent!

GARCH models: Maximum likelihood estimation

Factorization with dependent observations:

f_{X1,...,XT}(x1, . . . , xT) = ∏_{t=1}^{T} f_{Xt|Xt−1,...,X1}(xt|xt−1, . . . , x1)

or

ln f_{X1,...,XT}(x1, . . . , xT) = ∑_{t=1}^{T} ln f_{Xt|Xt−1,...,X1}(xt|xt−1, . . . , x1)

Hence, for an ARCH(1)-process,

f_{X1,...,XT}(x1, . . . , xT) = f_{X1}(x1) ∏_{t=2}^{T} (1/√(2πσ²t)) exp(−(1/2)(xt/σt)²)

GARCH models: Maximum likelihood estimation

The marginal density of X1 is complicated but becomes negligible for large T and will therefore be dropped from now on

Log-likelihood function (without the initial marginal density):

ln L(α0, α1|x1, . . . , xT) = −(T/2) ln 2π − (1/2) ∑_{t=2}^{T} ln σ²t − (1/2) ∑_{t=2}^{T} (xt/σt)²

where σ²t = α0 + α1x²t−1

ML estimation of α0 and α1 by numerical maximization of ln L(α0, α1) with respect to α0 and α1
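A Python sketch of this conditional log-likelihood; the toy data and the crude grid search (standing in for a proper numerical optimizer) are purely illustrative:

```python
import math

def arch1_loglik(a0, a1, x):
    """Conditional Gaussian log-likelihood of an ARCH(1), with the
    marginal density of x_1 dropped (conditional estimation)."""
    T = len(x)
    ll = -T / 2 * math.log(2 * math.pi)
    for t in range(1, T):
        s2 = a0 + a1 * x[t - 1] ** 2  # sigma_t^2 = a0 + a1 * x_{t-1}^2
        ll -= 0.5 * (math.log(s2) + x[t] ** 2 / s2)
    return ll

# Toy data and a coarse parameter grid
x = [0.1, -0.4, 0.9, -0.2, 0.5, -1.1, 0.3, 0.2, -0.6, 0.4]
grid = [(a0, a1) for a0 in (0.05, 0.1, 0.2, 0.4) for a1 in (0.1, 0.3, 0.5, 0.8)]
a0_hat, a1_hat = max(grid, key=lambda p: arch1_loglik(p[0], p[1], x))
```

A real implementation would hand arch1_loglik to a numerical maximizer instead of a grid.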

GARCH models: GARCH(p,q)-process

Definition: GARCH(p,q)-process

The stochastic process (Xt)t∈Z is called a GARCH(p, q)-process if

E(Xt|Xt−1, Xt−2, . . .) = 0

Var(Xt|Xt−1, Xt−2, . . .) = σ²t = α0 + α1X²t−1 + . . . + αpX²t−p + β1σ²t−1 + . . . + βqσ²t−q

for t ∈ Z with αi, βi ≥ 0

Often, an additional assumption is that

(Xt|Xt−1 = xt−1, Xt−2 = xt−2, . . .) ∼ N(0, σ²t)

GARCH models: GARCH(p,q)-process

Conditional variance of a GARCH(1, 1):

Var(Xt|Xt−1, Xt−2, . . .) = σ²t = α0 + α1X²t−1 + β1σ²t−1
                          = α0/(1 − β1) + α1 ∑_{i=1}^{∞} β1^{i−1} X²t−i

Unconditional variance:

Var(Xt) = α0 / (1 − ∑_{i=1}^{p} αi − ∑_{j=1}^{q} βj)

GARCH models: GARCH(p,q)-process

Necessary condition for weak stationarity:

∑_{i=1}^{p} αi + ∑_{j=1}^{q} βj < 1

(Xt)t∈Z has no autocorrelation

GARCH processes can be written as ARMA(max(p, q), q) processes in the squared returns

Example: GARCH(1, 1)-process with Xt = εtσt and σ²t = α0 + α1X²t−1 + β1σ²t−1

GARCH models: Estimation of GARCH(p,q)-processes

Estimation of the ARMA(max(p, q), q) process in the squared returns

Alternative (and better) method: maximum likelihood

For a GARCH(1, 1)-process,

f_{X1,...,XT}(x1, . . . , xT) = f_{X1}(x1) ∏_{t=2}^{T} (1/√(2πσ²t)) exp(−(1/2)(xt/σt)²)

GARCH models: Estimation of GARCH(p,q)-processes

Again, the density of X1 can be neglected

Log-likelihood function:

ln L(α0, α1, β1|x1, . . . , xT) = −(T/2) ln 2π − (1/2) ∑_{t=2}^{T} ln σ²t − (1/2) ∑_{t=2}^{T} (xt/σt)²

with σ²t = α0 + α1x²t−1 + β1σ²t−1 and σ²1 = 0

ML estimation of α0, α1 and β1 by numerical maximization

GARCH models: Estimation of GARCH(p,q)-processes

Conditional h-step forecast of the volatility σ²t+h in a GARCH(1, 1) model:

E(σ²t+h|Xt, Xt−1, . . .) = (α1 + β1)ʰ (σ²t − α0/(1 − α1 − β1)) + α0/(1 − α1 − β1)

If the process is stationary,

lim_{h→∞} E(σ²t+h|Xt, Xt−1, . . .) = α0/(1 − α1 − β1)

Simulation of GARCH processes is easy; the estimation can be computer intensive
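The forecast recursion in Python (hypothetical parameter values): forecasts decay geometrically from the current σ²t toward the unconditional variance.

```python
def garch11_vol_forecast(sigma2_t, a0, a1, b1, h):
    """h-step volatility forecast in a stationary GARCH(1,1):
    (a1 + b1)^h * (sigma2_t - sbar) + sbar, with sbar = a0/(1 - a1 - b1)."""
    sbar = a0 / (1 - a1 - b1)
    return (a1 + b1) ** h * (sigma2_t - sbar) + sbar

# Illustrative parameters: unconditional variance sbar = 0.1/(1 - 0.9) = 1.0
f1 = garch11_vol_forecast(0.5, 0.1, 0.1, 0.8, 1)      # one step ahead
f100 = garch11_vol_forecast(0.5, 0.1, 0.1, 0.8, 100)  # essentially sbar
```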

GARCH models: Residuals of an estimated GARCH(1,1) model

Careful: residuals are slightly different from what you know from OLS regressions

Estimates: α̂0, α̂1, β̂1, µ̂

From σ²t = α0 + α1X²t−1 + β1σ²t−1 and Xt = µ + σtεt we calculate the standardized residuals

ε̂t = (Xt − µ̂)/σ̂t = (Xt − µ̂) / √(α̂0 + α̂1X²t−1 + β̂1σ̂²t−1)

Histogram of the standardized residuals

GARCH models: AR(p)-ARCH(q)-models

Definition: (Xt)t∈Z is called an AR(p)-ARCH(q)-process if (shown here for p = q = 1)

Xt = µ + φ1Xt−1 + εt

σ²t = α0 + α1ε²t−1

where εt ∼ N(0, σ²t)

mean equation / variance equation

Maximum likelihood estimation

GARCH models: Extensions of the GARCH model

There are a number of possible extensions to the GARCH model:

Empirical fact: Negative shocks have a larger impact on volatility than positive shocks (leverage effect)

News impact curve

Nonnormal innovations, e.g. εt ∼ tν
