
Approximation of Heavy-tailed distributions via infinite dimensional phase–type distributions

Leonardo Rojas-Nandayapa

The University of Queensland

ANZAPW, March 2015. Barossa Valley, SA, Australia.

1 / 36

Summary of Mogens’ talk...

2 / 36

Phase–type
Distributions of first passage times associated with a Markov Jump Process with a finite number of states.

Facts

1. Dense class in the nonnegative distributions.
2. Mathematically tractable.
3. However, it is a subclass of light–tailed distributions.

3 / 36

Key problem
For most applications, a phase–type approximation will suffice. However, for certain important problems this fails.

A partial solution
Consider Markov Jump processes with an infinite number of states. This class contains heavy–tailed distributions but also many other degenerate cases.

4 / 36

A more refined approach: Infinite mixtures of PH
Consider X a random variable with a phase–type distribution, and let S be a nonnegative discrete random variable. Then the distribution of the random variable

Y := S · X

is an infinite mixture of phase–type distributions. Such a class is mathematically tractable and obviously dense in the nonnegative distributions.

5 / 36

Let F, G and H be the cdfs of Y, X and S respectively. Then

F(y) = ∫₀^∞ G(y/s) dH(s),  y ≥ 0.

The right-hand side is called the Mellin–Stieltjes convolution of G and H.

If G is phase–type, then F is absolutely continuous with density

f(y) = ∫₀^∞ g(y/s) (1/s) dH(s),  y ≥ 0.

6 / 36
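When S is discrete, the Mellin–Stieltjes formula reduces to a plain sum. The sketch below (plain Python) evaluates F(y) = Σ_k G(y/s_k) P(S = s_k) for X ∼ exp(λ); the geometric choice of support and weights for S is purely illustrative, not from the slides.

```python
import math

def mixture_cdf(y, lam, support, probs):
    """F(y) = sum_k G(y/s_k) P(S = s_k), with G the cdf of X ~ exp(lam).

    The Mellin-Stieltjes convolution specialised to a discrete S.
    """
    if y < 0:
        return 0.0
    return sum(p * (1.0 - math.exp(-lam * y / s))
               for s, p in zip(support, probs))

# Illustrative discrete S: support {2, 4, 8, ...} with geometric weights
support = [2.0 ** k for k in range(1, 30)]
weights = [2.0 ** -k for k in range(1, 30)]
total = sum(weights)
probs = [w / total for w in weights]  # renormalise the truncated weights
```

Since the support of S here is unbounded, the resulting Y = S · X is heavy-tailed by the theorem on slide 14, even though each mixture component is exponential.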

Some interesting questions related to this class of distributions:

7 / 36

1. What are the possible tail behaviours in this class?
2. How to approximate a “theoretic” heavy–tailed distribution?
3. Given a sample y1, …, yn, how to estimate the parameters of a distribution in this class?

8 / 36

Heavy Tails

9 / 36

Recall that...

Heavy–tailed distributions
A nonnegative random variable X has a heavy–tailed distribution iff

E[e^{θX}] = ∞,  ∀θ > 0.

Equivalently,

lim sup_{x→∞} P(X > x) / e^{−θx} = ∞,  ∀θ > 0.

Otherwise we say X is light–tailed.

10 / 36

Characterization via Convolutions

Let X1, X2 be nonnegative iid rvs with unbounded support.

1. It holds that

   lim inf_{x→∞} P(X1 + X2 > x) / P(X1 > x) ≥ 2.

2. Moreover, X1 is heavy–tailed iff

   lim inf_{x→∞} P(X1 + X2 > x) / P(X1 > x) = 2.

3. Furthermore, X1 is subexponential iff

   lim_{x→∞} P(X1 + X2 > x) / P(X1 > x) = 2.

11 / 36
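The subexponential limit in item 3 can be checked numerically for a concrete heavy-tailed law, using P(X1 + X2 > x) = F̄(x) + ∫₀^x F̄(x−u) f(u) du. The sketch below is plain Python; the Pareto-type cdf F(x) = 1 − (1+x)^{−2} is my illustrative choice, not from the slides.

```python
import math

def tail(x):   # survival function of the Pareto-type law F(x) = 1 - (1+x)^-2
    return (1.0 + x) ** -2

def dens(u):   # its density f(u) = 2 (1+u)^-3
    return 2.0 * (1.0 + u) ** -3

def conv_ratio(x, n=20000):
    """P(X1 + X2 > x) / P(X1 > x) for iid X1, X2 with the tail above.

    Uses P(X1+X2 > x) = tail(x) + int_0^x tail(x-u) dens(u) du, with the
    integral evaluated by the composite Simpson rule (n must be even).
    """
    h = x / n
    s = tail(x) * dens(0.0) + tail(0.0) * dens(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * tail(x - i * h) * dens(i * h)
    integral = s * h / 3.0
    return (tail(x) + integral) / tail(x)

r = conv_ratio(200.0)
```

For this regularly varying (hence subexponential) example the ratio at x = 200 is already close to the limiting value 2.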

Heavy–tailed random variables arise as
- exponential transformations of some light–tailed random variables (normal → lognormal, exponential → Pareto);
- weak limits of normalized maxima (also normalized sums);
- reciprocals of certain random variables;
- random sums of light–tailed distributions;
- products of light–tailed distributions.

12 / 36

Problem 1:
What are the possible tail behaviours in this class?

13 / 36

Theorem
Let X be a phase–type random variable and S a nonnegative random variable. Define

Y := S · X.

Then Y is heavy–tailed iff S has unbounded support.

14 / 36

Example
Let X ∼ exp(λ) and let S have any nonnegative discrete distribution supported over a countable unbounded set {s1, s2, …}. Then

lim_{y→∞} P(S · X > y) / e^{−θy} = lim_{y→∞} [ Σ_{k=1}^∞ e^{−λy/s_k} P(S = s_k) ] / e^{−θy} = ∞,  ∀θ > 0

(choose s_k large enough that λ/s_k < θ).

15 / 36
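This example is easy to verify numerically: the ratio P(S · X > y)/e^{−θy} grows without bound in y. The sketch below is plain Python; the powers-of-two support with geometric weights for S is an illustrative choice, not from the slides.

```python
import math

def tail_ratio(y, lam, theta, support, probs):
    """P(S*X > y) / e^{-theta*y} for X ~ exp(lam) and discrete S."""
    tail = sum(p * math.exp(-lam * y / s) for s, p in zip(support, probs))
    return tail * math.exp(theta * y)

# Unbounded support {2, 4, 8, ...} with geometric weights
support = [2.0 ** k for k in range(1, 40)]
weights = [2.0 ** -k for k in range(1, 40)]
total = sum(weights)
probs = [w / total for w in weights]

# The ratio diverges as y grows, for any theta > 0 (here theta = 0.5)
ratios = [tail_ratio(y, 1.0, 0.5, support, probs) for y in (10.0, 20.0, 40.0)]
```

For large y the dominant term is the one with s_k just large enough that λ/s_k < θ, exactly as the parenthetical remark suggests.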

Theorem (generalization)
Let S ∼ H1 and X ∼ H2, and let Y := S · X.

1. If there exist θ > 0 and a nonnegative function ξ(x) such that

   lim sup_{x→∞} e^{θx} ( H̄1(x/ξ(x)) + H̄2(ξ(x)) ) = 0,   (1)

   then Y is light–tailed.
2. If there exists a nonnegative function ξ(x) such that for all θ > 0 it holds that

   lim sup_{x→∞} e^{θx} H̄1(x/ξ(x)) · H̄2(ξ(x)) = ∞,   (2)

   then Y is heavy–tailed.

16 / 36

Example
Let S ∼ Weibull(λ, p) and X ∼ Weibull(β, q). Then Y := S · X is light–tailed iff

1/p + 1/q ≤ 1.

Otherwise it is heavy–tailed.

17 / 36
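The condition is simple enough to encode directly; a minimal helper (my naming, not from the slides) classifying the product by its shape parameters:

```python
def product_is_light_tailed(p, q):
    """Y = S*X with S ~ Weibull(lambda, p) and X ~ Weibull(beta, q) is
    light-tailed iff 1/p + 1/q <= 1 (only the shape parameters matter)."""
    return 1.0 / p + 1.0 / q <= 1.0
```

For instance, two exponentials (p = q = 1) give 1/p + 1/q = 2 > 1, so their product is heavy-tailed, while two Rayleigh-type shapes (p = q = 2) give exactly 1 and the product stays light-tailed.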

What are the possible maximum domains of attraction?

18 / 36

Fréchet domain of attraction

Breiman’s Lemma
If S is an α–regularly varying random variable and there exists δ > 0 such that E[X^{α+δ}] < ∞, then

Y := S · X

has an α–regularly varying distribution. Moreover,

P(Y > x) ∼ E[X^α] P(S > x).

19 / 36
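In the classical statement of Breiman's lemma the constant is E[X^α], which for X ∼ exp(1) equals Γ(α+1). The sketch below verifies the ratio by quadrature; the standard Pareto S with P(S > s) = s^{−α} for s ≥ 1 and the unit exponential X are my illustrative choices, not from the slides. For these choices P(S·X > x) = x^{−α} ∫₀^x u^α e^{−u} du + e^{−x} exactly.

```python
import math

def breiman_ratio(x, alpha, n=20000):
    """P(S*X > x) / P(S > x) for S Pareto(alpha) and X ~ exp(1).

    Uses P(S*X > x) = x^{-alpha} int_0^x u^alpha e^{-u} du + e^{-x},
    so the ratio equals int_0^x u^alpha e^{-u} du + x^alpha e^{-x}.
    The integral is done by the composite Simpson rule (n must be even).
    Breiman's lemma predicts the limit E[X^alpha] = Gamma(alpha + 1).
    """
    def f(u):
        return u ** alpha * math.exp(-u)
    h = x / n
    s = f(0.0) + f(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    integral = s * h / 3.0
    return integral + x ** alpha * math.exp(-x)

r = breiman_ratio(25.0, 3)
```

With α = 3 the ratio at x = 25 is already indistinguishable from Γ(4) = 6.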

Breiman’s Lemma
If L_{1/S} is an α–regularly varying function, then

Y := S · X

has an α–regularly varying distribution. Moreover,

P(Y > x) ∼ E[X^α] P(S > x).

20 / 36

Gumbel domain of attraction and Subexponentiality

Let V(x) = (−1)^{η−1} L^{(η−1)}_{1/S}(x).¹

Theorem
If V(·) is a von Mises function, then F ∈ MDA(Λ). Moreover, if

lim inf_{x→∞} V(tx)V′(x) / ( V′(tx)V(x) ) > 1,  ∀t > 1,

then F is subexponential.

¹ η is the largest dimension among the Jordan blocks associated with the largest eigenvalue of the sub–intensity matrix.

21 / 36

Problem 2:
How to approximate theoretical heavy–tailed distributions?

22 / 36

An approach for approximating theoretic distributions

- Take a sequence of Erlang rvs

  Xn ∼ Erlang(n, n).

- Consider increasing sequences

  Sn := {s_{n,0}, s_{n,1}, s_{n,2}, …},  n ∈ N,

  such that s_{n,0} = 0, s_{n,i} → ∞ as i → ∞, and

  Δn := sup{s_{n,i+1} − s_{n,i} : i ∈ N} → 0  as n → ∞.

- For each n, define the cdfs

  Fn(x) = F(s_{n,i+1}),  x ∈ [s_{n,i}, s_{n,i+1}).

23 / 36

Construct a sequence of random variables

Sn ∼ Fn,  Xn ∼ Erlang(n, n).

Since Xn → 1 in probability, Slutsky’s theorem implies that

Sn · Xn → X ∼ F  in distribution.

Note: the construction can be modified to work for light–tailed distributions as well.

24 / 36
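The construction can be tested without simulation, since the cdf of S_n · X_n is a finite mixture of scaled Erlang cdfs. The sketch below is plain Python; the Pareto-type target F(x) = 1 − (1+x)^{−2}, the uniform grid s_{n,i} = i/n, and the truncation of the support at 50 are my illustrative choices, not from the slides.

```python
import math

def erlang_cdf(x, n):
    """cdf of Erlang(n, n) (shape n, rate n, mean 1):
    P(X <= x) = 1 - sum_{m=0}^{n-1} e^{-nx} (nx)^m / m!."""
    if x <= 0:
        return 0.0
    term, acc = math.exp(-n * x), 0.0
    for m in range(n):
        acc += term
        term *= n * x / (m + 1)
    return 1.0 - acc

def approx_cdf(y, n, F, cutoff=50.0):
    """cdf of S_n * X_n, where S_n puts mass F(s_{i+1}) - F(s_i) at
    s_{i+1} on the grid s_{n,i} = i/n (tail mass beyond cutoff dropped)."""
    total = 0.0
    for i in range(int(cutoff * n)):
        lo, hi = i / n, (i + 1) / n
        total += (F(hi) - F(lo)) * erlang_cdf(y / hi, n)
    return total

F = lambda x: 1.0 - (1.0 + x) ** -2   # heavy-tailed Pareto-type target

err = lambda n: abs(approx_cdf(2.0, n, F) - F(2.0))
```

The pointwise error at y = 2 shrinks as n grows, consistent with the Slutsky argument above.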


25 / 36

Ruin problem

Definition
Let F and G be two distribution functions with non-negative support. The relative error from F to G from 0 to s is defined by

sup_{x∈[0,s]} |F(x) − G(x)| / |F(x)|.

26 / 36

Theorem
Fix u > 0. Let F_I be the integrated tail of F and let G be an approximation of F such that

sup_{x∈[0,u]} |F_I(x) − G_I(x)| / |F̄_I(x)| ≤ ε(u),

where ε : R₊ → R₊ is some non-decreasing function. Further assume that F dominates G stochastically. Then

|ψ_F(u) − ψ̂_G(u)| ≤ ε(u) E(Number of down-crossings; A),

where A := {ruin happens in the last down-crossing}.

27 / 36

Problem 3:
Statistical inference?

28 / 36

Key Idea
View the first passage times as an incomplete data set from the evolution of the Markov Jump Process, and employ the EM algorithm to estimate the parameters.

29 / 36

Density of an infinite mixture of PH

f_A(y) = Σ_{i=1}^∞ π_i(θ) α e^{T y/s_i} t / s_i,  y ≥ 0.

Complete Data Likelihood

ℓ_c(θ, α, T) = Σ_{i=1}^∞ L_i log π_i(θ) + Σ_{i=1}^∞ Σ_{k=1}^p B_{ik} log α_k + Σ_{i=1}^∞ Σ_{k≠ℓ} N^i_{kℓ} log( T_{kℓ}/s_i )
  − Σ_{i=1}^∞ Σ_{k≠ℓ} (T_{kℓ}/s_i) Z^i_k + Σ_{i=1}^∞ Σ_{k=1}^p A^i_k log( t_k/s_i ) − Σ_{i=1}^∞ Σ_{k=1}^p (t_k/s_i) Z^i_k.

30 / 36

EM estimators
Assume (θ, α, T) is the current set of parameters in the EM algorithm. The one-step EM estimators are then given by

θ̂ = argmax_Θ Σ_{i=1}^∞ (α U_i t) · log π_i(Θ),

α̂ = diag(α) U t / N,

T̂_{kℓ} = ( diag(R)^{−1} T ∘ R )_{kℓ},  k ≠ ℓ,

t̂ = α U diag(t) diag(R)^{−1},

where

R = R(θ, α, T) = Σ_{i=1}^∞ Σ_{j=1}^N (π_i(θ)/s_i) · J(y_j/s_i; α, T, t) / f_A(y_j; θ, α, T),

U_i = U_i(θ, α, T) = Σ_{j=1}^N (π_i(θ)/s_i) · exp(T y_j/s_i) / f_A(y_j; θ, α, T),

U = U(θ, α, T) = Σ_{i=1}^∞ U_i.

31 / 36

In the above,

J_i(y) := ∫₀^y e^{T_i(y−u)} t_i α e^{T_i u} du.

Proposition
Let c = max{−T_kk : k = 1, …, p} and set K := c^{−1}T + I. Define

D(s) := D(s; α, T) = (1/c) Σ_{r=0}^s K^r t α K^{s−r}.

Then

J(y; θ, α, T) = Σ_{s=0}^∞ [ (cy)^{s+1} e^{−cy} / (s+1)! ] D(s).

32 / 36
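The Proposition admits a direct numeric check in the simplest phase–type case, p = 1 (a single exponential phase with rate λ), where J(y) = ∫₀^y e^{−λ(y−u)} λ e^{−λu} du = λy e^{−λy} in closed form. The sketch below (plain Python, with c taken as max_k(−T_kk), the usual uniformization rate for a sub-intensity matrix, and an illustrative truncation at smax terms) evaluates the series and matches it against that closed form.

```python
import math

def j_series(y, alpha, T, t, smax=60):
    """Uniformization series for J(y) = int_0^y e^{T(y-u)} t a e^{Tu} du:

        J(y) = sum_s [(cy)^{s+1} e^{-cy} / (s+1)!] D(s),
        D(s) = (1/c) sum_{r=0}^s K^r (t a) K^{s-r},  K = T/c + I,

    truncated after smax terms. alpha is a row vector, t a column vector,
    T the p x p sub-intensity matrix (lists of lists / lists of floats)."""
    p = len(T)
    c = max(-T[k][k] for k in range(p))
    K = [[T[i][j] / c + (1.0 if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    ta = [[t[i] * alpha[j] for j in range(p)] for i in range(p)]  # outer t*a

    def mul(A, B):
        return [[sum(A[i][r] * B[r][j] for r in range(p)) for j in range(p)]
                for i in range(p)]

    # precompute the powers of K up to K^smax
    Kpow = [[[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]]
    for _ in range(smax):
        Kpow.append(mul(Kpow[-1], K))

    J = [[0.0] * p for _ in range(p)]
    w = c * y * math.exp(-c * y)        # (cy)^{s+1} e^{-cy}/(s+1)! at s = 0
    for s in range(smax):
        for r in range(s + 1):
            M = mul(mul(Kpow[r], ta), Kpow[s - r])
            for i in range(p):
                for j in range(p):
                    J[i][j] += w * M[i][j] / c
        w *= c * y / (s + 2)            # advance the Poisson-type weight
    return J

# p = 1 sanity check: alpha = (1), T = (-2), t = (2) gives J(y) = 2y e^{-2y}
J = j_series(0.7, [1.0], [[-2.0]], [2.0])
```

For λ = 2 and y = 0.7 the series reproduces 1.4·e^{−1.4} exactly (here K = 0, so only the s = 0 term survives).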

An alternative model
Let r ∈ (0, 1) and

f_B(y) = r · α e^{T y} t + (1 − r) Σ_{i=1}^∞ π_i(θ) λ_i^q y^{q−1} e^{−λ_i y} / Γ(q),

where λ_i = λ/s_i.

33 / 36

Proposition
The EM estimators take the form

r̂ = r α U t / N,

α̂ = diag(α) U t / (α U t),

T̂_{kℓ} = ( diag(R)^{−1} T ∘ R′ )_{kℓ},

t̂ = α U diag(t) · diag(R)^{−1},

θ̂ = argmax_Θ Σ_{i=1}^∞ w_i log π_i(Θ),

λ̂ = Σ_{i=1}^∞ w_i / Σ_{i=1}^∞ (v_i / s_i).

34 / 36

where

R = R(α, T, λ, θ) = Σ_{j=1}^N J(y_j; α, T, t) / f_B(y_j; r, α, T, θ, λ),

U = U(α, T, λ, θ) = Σ_{j=1}^N exp(T y_j) / f_B(y_j; r, α, T, θ, λ),

w_i = w_i(α, T, λ, θ) = π_i(θ) Σ_{j=1}^N g(y_j; q, λ/s_i) / f_B(y_j; r, α, T, θ, λ),

v_i = v_i(α, T, λ, θ) = π_i(θ) Σ_{j=1}^N y_j · g(y_j; q, λ/s_i) / f_B(y_j; r, α, T, θ, λ).

35 / 36

Advantages

- Fast.
- Recovers the parameters of simulated data.
- Works reasonably well with real data.
- Converges to the MLE estimators most of the time.
- Provides a better approximation of the tails.

Disadvantages

- There are some (minor) numerical issues yet to be resolved.
- MLE estimators are not appropriate for estimating “tail parameters”; EVT methods could be implemented.
- Sensitive with respect to the tail parameters.
- Kullback–Leibler does not work properly, though we now have a first, simpler approach.

36 / 36