A Primer on the Exponential Family of Distributions

David Clark & Charles Thayer

American Re-Insurance

GLM Call Paper - 2004

Agenda

• Brief Introduction to GLM

• Overview of the Exponential Family

• Some Specific Distributions

• Suggestions for Insurance Applications

Context for GLM

• Linear Regression: $E[Y] = X\beta$, with $Y \sim$ Normal

• Generalized Linear Models: $E[Y] = h(X\beta)$, with $Y \sim$ Exponential Family

• Maximum Likelihood: $E[Y] = h(X, \beta)$, with $Y \sim$ Any Distribution

Advantages over Linear Regression

• Instead of linear combination of covariates, we can use a function of a linear combination of covariates

• Response variable stays in original units

• Great flexibility in variance structure

Transforming the Response versus Transforming the Covariates

• Linear Regression: $E[g(y)] = X\beta$

• GLM: $E[y] = g^{-1}(X\beta)$

Note that if $g(y) = \ln(y)$, then Linear Regression cannot handle any points where $y \le 0$.
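The note above can be illustrated with a minimal Python sketch. The claim counts and the intercept-only model are made-up assumptions for illustration, not data from the paper. Transforming the response requires the log of every observation, which fails at zero, while a log-link model only takes the log of the fitted mean. For an intercept-only Poisson model with log link, the maximum-likelihood fit is known in closed form: the fitted mean equals the sample mean.

```python
import math

# Hypothetical claim counts, including a zero observation.
y = [0, 2, 1, 4, 3]

def mean_of_logs(values):
    """Intercept-only regression on ln(y): needs the log of every point."""
    return sum(math.log(v) for v in values) / len(values)

def log_link_intercept(values):
    """Intercept-only GLM with log link, E[y] = exp(beta).
    For a Poisson likelihood the MLE satisfies exp(beta) = sample mean."""
    return math.log(sum(values) / len(values))

try:
    mean_of_logs(y)
    transform_ok = True
except ValueError:
    # math.log(0) raises ValueError, so the transformed regression breaks down.
    transform_ok = False

beta = log_link_intercept(y)
fitted_mean = math.exp(beta)
print("log-transform usable:", transform_ok)
print("log-link fitted mean:", fitted_mean)
```

The fitted mean stays in the original units of the response, which is the second advantage listed earlier.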

Advantages of this Special Case of Maximum Likelihood

• Pre-programmed in many software packages

• Direct calculation of standard errors of key parameters

• Convenient separation of Mean parameter from “nuisance” parameters

Advantages of this Special Case of Maximum Likelihood

• GLM useful when theory immature, but experience gives clues about:

  - How mean response affected by external influences, covariates

  - How variability relates to mean

  - Independence of observations

  - Skewness/symmetry of response distribution

General Form of the Exponential Family

$$f(y_i; \theta_i) = \exp\left[ d(\theta_i)\, e(y_i) + g(\theta_i) + h(y_i) \right]$$

Note that $y_i$ can be transformed with any function $e()$.

“Natural” Form of the Exponential Family

$$f(y_i; \theta_i, \phi) = \exp\left[ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right]$$

Note that $y_i$ is no longer within a function. That is, $e(y_i) = y_i$.
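A standard consequence of the natural form (a textbook property, given here as background rather than taken from the slides) is that the mean is $b'(\theta)$ and the variance is $a(\phi) \cdot b''(\theta)$. The sketch below checks this numerically for two assumed choices of $b$: $b(\theta) = e^{\theta}$ for the Poisson and $b(\theta) = -\ln(-\theta)$ for the Gamma. The finite-difference helpers are illustrative, not part of any library.

```python
import math

def d1(f, x, h=1e-5):
    """Central-difference estimate of the first derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central-difference estimate of the second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def b_poisson(theta):
    return math.exp(theta)        # theta = ln(mu)

def b_gamma(theta):
    return -math.log(-theta)      # theta = -1/mu, so theta < 0

mu = 3.0
theta_p = math.log(mu)
theta_g = -1.0 / mu

mean_p, unit_var_p = d1(b_poisson, theta_p), d2(b_poisson, theta_p)
mean_g, unit_var_g = d1(b_gamma, theta_g), d2(b_gamma, theta_g)

print("Poisson: b' =", mean_p, " b'' =", unit_var_p)   # expect mu and mu
print("Gamma:   b' =", mean_g, " b'' =", unit_var_g)   # expect mu and mu^2
```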

Specific Members of the Exponential Family

• Normal (Gaussian)

• Poisson

• Negative Binomial

• Gamma

• Inverse Gaussian

Some Other Members of the Exponential Family

• Natural Form

  - Binomial

  - Logarithmic

  - Compound Poisson/Gamma (Tweedie)

• General Form [use ln(y) instead of y]

  - Lognormal

  - Single Parameter Pareto

Normal Distribution

Natural Form:

$$f(y) = \exp\left[ \frac{y\mu - \mu^2/2}{\phi} - \frac{1}{2}\left( \frac{y^2}{\phi} + \ln(2\pi\phi) \right) \right]$$

The dispersion parameter, $\phi$, is replaced with $\sigma^2$ in the more familiar form of the Normal Distribution.
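As a quick check of the statement above, the following sketch compares a natural-form Normal density, with dispersion $\phi$, against the familiar density with $\sigma^2 = \phi$. The function names and parameter values are illustrative assumptions.

```python
import math

def normal_natural(y, mu, phi):
    """Natural form: exp[(y*mu - mu^2/2)/phi - (y^2/phi + ln(2*pi*phi))/2]."""
    return math.exp((y * mu - mu ** 2 / 2) / phi
                    - 0.5 * (y ** 2 / phi + math.log(2 * math.pi * phi)))

def normal_familiar(y, mu, sigma):
    """Familiar form with standard deviation sigma."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 1.5, 0.8
phi = sigma ** 2   # the dispersion parameter plays the role of sigma^2

checks = [(y, normal_natural(y, mu, phi), normal_familiar(y, mu, sigma))
          for y in (-2.0, 0.0, 1.5, 4.0)]
for y, a, b in checks:
    print(y, a, b)
```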

Poisson Distribution

Natural Form:

$$\text{Prob}(Y = y) = \exp\left[ \frac{y \ln(\mu) - \mu}{\phi} - \frac{y}{\phi}\ln(\phi) - \ln\left( (y/\phi)! \right) \right]$$

“Over-dispersed” Poisson allows $\phi \ne 1$.

Variance/Mean ratio = $\phi$
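The variance-to-mean ratio can be verified with a short sketch. It assumes the over-dispersed Poisson places probability on multiples of $\phi$, so that $y/\phi$ is an ordinary Poisson count with mean $\mu/\phi$; the parameter values are arbitrary.

```python
import math

def odp_prob(y, mu, phi):
    """Over-dispersed Poisson probability in natural form.
    Assumes y is a non-negative multiple of phi."""
    n = y / phi
    return math.exp((y * math.log(mu) - mu) / phi
                    - n * math.log(phi)
                    - math.lgamma(n + 1.0))   # lgamma(n + 1) = ln(n!)

mu, phi = 10.0, 2.0
support = [phi * n for n in range(200)]       # truncated; the tail is negligible
probs = [odp_prob(y, mu, phi) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
ratio = var / mean

print("total probability:", total)
print("mean:", mean, " variance:", var, " variance/mean:", ratio)
```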

Negative Binomial Distribution

Natural Form:

$$\text{Prob}(Y = y) = \exp\left[ \frac{y \ln\left(\frac{\mu}{\mu+k}\right) - k \ln\left(\frac{\mu+k}{k}\right)}{\phi} + \ln\binom{(y+k)/\phi - 1}{y/\phi} \right]$$

The parameter k must be selected by the user of the model.
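To see how a chosen $k$ feeds into the variance, the sketch below evaluates a Negative Binomial in natural form and sums out its mean and variance. It assumes a parameterization in which $y/\phi$ is an ordinary Negative Binomial count with mean $\mu/\phi$ and size $k/\phi$, which gives $\mathrm{Var}(y) = \phi\mu + (\phi/k)\mu^2$; the parameter values are arbitrary.

```python
import math

def negbin_prob(y, mu, k, phi=1.0):
    """Negative Binomial probability in natural form (assumed parameterization).
    y must be a non-negative multiple of phi."""
    n, r = y / phi, k / phi
    # ln of the generalized binomial coefficient C(n + r - 1, n)
    log_comb = math.lgamma(n + r) - math.lgamma(n + 1.0) - math.lgamma(r)
    return math.exp((y * math.log(mu / (mu + k))
                     - k * math.log((mu + k) / k)) / phi + log_comb)

mu, k, phi = 10.0, 5.0, 2.0
support = [phi * n for n in range(600)]       # truncated; the tail is negligible
probs = [negbin_prob(y, mu, k, phi) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
expected_var = phi * mu + (phi / k) * mu ** 2

print("total:", total, " mean:", mean)
print("variance:", var, " phi*mu + (phi/k)*mu^2:", expected_var)
```

A smaller $k$ makes the quadratic term larger, so the user's choice of $k$ directly controls how much variance remains at large $\mu$.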

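Gamma Distribution

Natural Form:

$$f(y) = \exp\left[ \alpha\left( -\frac{y}{\mu} - \ln(\mu) \right) + (\alpha - 1)\ln(y) + \alpha\ln(\alpha) - \ln\Gamma(\alpha) \right], \quad \alpha = 1/\phi$$

Constant Coefficient of Variation (CV):

$CV = \alpha^{-1/2}$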
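The constant CV can be checked by numerical integration. This sketch assumes the Gamma is parameterized by its mean $\mu$ and a shape parameter $\alpha$ (the reciprocal of the dispersion), and uses a simple midpoint rule; the integration range and step count are illustrative choices.

```python
import math

def gamma_density(y, mu, alpha):
    """Gamma density with mean mu and shape alpha, written as exp[...]."""
    return math.exp(alpha * (-y / mu - math.log(mu))
                    + (alpha - 1.0) * math.log(y)
                    + alpha * math.log(alpha)
                    - math.lgamma(alpha))

def gamma_cv(mu, alpha, upper, steps):
    """Midpoint-rule estimate of sqrt(variance) / mean on (0, upper)."""
    h = upper / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        f = gamma_density(y, mu, alpha) * h
        m0 += f
        m1 += y * f
        m2 += y * y * f
    mean = m1 / m0
    var = m2 / m0 - mean ** 2
    return math.sqrt(var) / mean

alpha = 4.0
cv_small = gamma_cv(mu=100.0, alpha=alpha, upper=2000.0, steps=40000)
cv_large = gamma_cv(mu=1000.0, alpha=alpha, upper=20000.0, steps=40000)
print(cv_small, cv_large, alpha ** -0.5)   # CV does not depend on mu
```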

Inverse Gaussian Distribution

Natural Form:

$$f(y) = \exp\left[ \frac{-\frac{y}{2\mu^2} + \frac{1}{\mu}}{\phi} - \frac{1}{2}\left( \frac{1}{\phi y} + \ln(2\pi\phi y^3) \right) \right]$$
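A small sketch can confirm that an Inverse Gaussian written in natural form agrees with the more common textbook density. It assumes the natural form has $\theta = -1/(2\mu^2)$, $b(\theta) = -1/\mu$, and dispersion $\phi = 1/\lambda$, where $\lambda$ is the usual shape parameter; names and values are illustrative.

```python
import math

def ig_natural(y, mu, phi):
    """Inverse Gaussian density in natural form (assumed parameterization)."""
    return math.exp((-y / (2 * mu ** 2) + 1 / mu) / phi
                    - 0.5 * (1 / (phi * y) + math.log(2 * math.pi * phi * y ** 3)))

def ig_familiar(y, mu, lam):
    """Textbook Inverse Gaussian density with mean mu and shape lam."""
    return (math.sqrt(lam / (2 * math.pi * y ** 3))
            * math.exp(-lam * (y - mu) ** 2 / (2 * mu ** 2 * y)))

mu, phi = 2.0, 0.5
checks = [(y, ig_natural(y, mu, phi), ig_familiar(y, mu, 1 / phi))
          for y in (0.25, 1.0, 2.0, 7.5)]
for y, a, b in checks:
    print(y, a, b)
```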

Table of Variance Functions

Distribution         Variance Function
Normal               $\mathrm{Var}(y) = \phi$
Poisson              $\mathrm{Var}(y) = \phi \cdot \mu$
Negative Binomial    $\mathrm{Var}(y) = \phi \cdot \mu + (\phi/k) \cdot \mu^2$
Gamma                $\mathrm{Var}(y) = \phi \cdot \mu^2$
Inverse Gaussian     $\mathrm{Var}(y) = \phi \cdot \mu^3$

The Unit Variance Function

We define the “Unit Variance” function as

$V(\mu) = \mathrm{Var}(y) / a(\phi)$

That is, $\phi = 1$ in the previous table.
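The relationship between the variance table and the unit variance function can be sketched in a few lines of Python. The sketch assumes $a(\phi) = \phi$ and uses hypothetical function and distribution names; it is a lookup of the five variance functions listed in the table, not an implementation from the paper.

```python
def unit_variance(dist, mu, k=None):
    """Unit variance function V(mu): the variance with the dispersion set to 1."""
    if dist == "normal":
        return 1.0
    if dist == "poisson":
        return mu
    if dist == "negative binomial":
        if k is None:
            raise ValueError("the negative binomial needs a user-selected k")
        return mu + mu ** 2 / k
    if dist == "gamma":
        return mu ** 2
    if dist == "inverse gaussian":
        return mu ** 3
    raise ValueError("unknown distribution: " + dist)

def variance(dist, mu, phi, k=None):
    """Var(y) = a(phi) * V(mu), assuming a(phi) = phi."""
    return phi * unit_variance(dist, mu, k)

for name in ("normal", "poisson", "negative binomial", "gamma", "inverse gaussian"):
    print(name, variance(name, mu=10.0, phi=2.0, k=5.0))
```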

Uniqueness Property

The unit variance function $V(\mu)$ uniquely identifies its parent distribution type within the natural exponential family.

$f(y) \Longleftrightarrow V(\mu)$

Table of Skewness Coefficients

Distribution         Skewness
Normal               $0$
Poisson              $CV$
Negative Binomial    $[1 + \mu/(\mu+k)] \cdot CV$
Gamma                $2 \cdot CV$
Inverse Gaussian     $3 \cdot CV$
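The skewness table can be turned into a small calculator. The sketch below assumes the variance functions from the earlier variance table with $a(\phi) = \phi$, computes the CV from them, and applies the multipliers listed above. Function names are hypothetical. Two rows are compared against well-known closed forms: Poisson skewness $1/\sqrt{\mu}$ and Negative Binomial skewness $(2-q)/\sqrt{k(1-q)}$ with $q = k/(\mu+k)$.

```python
import math

def variance(dist, mu, phi=1.0, k=None):
    """Variance functions, assuming a(phi) = phi."""
    unit = {
        "normal": 1.0,
        "poisson": mu,
        "negative binomial": mu + mu ** 2 / k if k else None,
        "gamma": mu ** 2,
        "inverse gaussian": mu ** 3,
    }[dist]
    if unit is None:
        raise ValueError("the negative binomial needs a positive k")
    return phi * unit

def skewness(dist, mu, phi=1.0, k=None):
    """Skewness as a multiple of the coefficient of variation."""
    cv = math.sqrt(variance(dist, mu, phi, k)) / mu
    multiple = {
        "normal": 0.0,
        "poisson": 1.0,
        "negative binomial": 1.0 + mu / (mu + k) if k else None,
        "gamma": 2.0,
        "inverse gaussian": 3.0,
    }[dist]
    return multiple * cv

skew_poisson = skewness("poisson", mu=4.0)
skew_gamma = skewness("gamma", mu=50.0, phi=0.25)
skew_negbin = skewness("negative binomial", mu=10.0, k=5.0)
print(skew_poisson, skew_gamma, skew_negbin)
```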

Graph of Skewness versus CV

[Figure: Coefficient of Skewness (vertical axis, 0 to 6) plotted against Coefficient of Variation (CV) (horizontal axis, 0.0 to 1.2) for the Negative Binomial, LogNormal, Inverse Gaussian, Gamma, Poisson, and Normal distributions.]

The Big Question:

What should the variance function look like for insurance applications?

What is the Response Variable?

• Number of Claims

• Frequency (# claims per unit of exposure)

• Severity

• Aggregate Loss Dollars

• Loss Ratio (Aggregate Loss / Premium)

• Loss Rate (Aggregate Loss per unit of exposure)

An Example for Considering Variance Structure

Accident Year    On-Level Premium    Trended Ult. Loss    Loss Ratio
1994                      290,662            1,275,543       438.84%
1995                      391,490               47,490        12.13%
1996                   72,742,613           70,544,925        96.98%
1997                  265,124,454          161,625,762        60.96%
1998                  279,159,910          173,569,322        62.18%
1999                  339,612,341          246,497,223        72.58%
2000                  439,322,504          290,588,625        66.14%
2001                  469,582,172          327,742,407        69.79%
2002                  524,216,086          312,057,030        59.53%
2003                  869,036,055          689,968,152        79.39%

How would you calculate the mean and variance in these loss ratios?
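One way to explore the question is to compare a straight average of the loss ratios with a premium-weighted average. The sketch below uses the premium and loss figures from the table above; the choice of premium as the weight is an illustrative assumption, and other weights are possible.

```python
# (on-level premium, trended ultimate loss) by accident year, from the table.
data = {
    1994: (290_662, 1_275_543),
    1995: (391_490, 47_490),
    1996: (72_742_613, 70_544_925),
    1997: (265_124_454, 161_625_762),
    1998: (279_159_910, 173_569_322),
    1999: (339_612_341, 246_497_223),
    2000: (439_322_504, 290_588_625),
    2001: (469_582_172, 327_742_407),
    2002: (524_216_086, 312_057_030),
    2003: (869_036_055, 689_968_152),
}

ratios = {year: loss / prem for year, (prem, loss) in data.items()}
n = len(ratios)

# Straight (unweighted) mean and variance: every year counts equally.
straight_mean = sum(ratios.values()) / n
straight_var = sum((r - straight_mean) ** 2 for r in ratios.values()) / n

# Premium-weighted mean and variance: large years dominate.
total_prem = sum(prem for prem, _ in data.values())
total_loss = sum(loss for _, loss in data.values())
weighted_mean = sum(prem * ratios[yr] for yr, (prem, _) in data.items()) / total_prem
weighted_var = sum(prem * (ratios[yr] - weighted_mean) ** 2
                   for yr, (prem, _) in data.items()) / total_prem

print("straight mean:", straight_mean, " straight variance:", straight_var)
print("weighted mean:", weighted_mean, " weighted variance:", weighted_var)
```

The two small early years (1994 and 1995) pull the straight average well away from the premium-weighted one, which is why the assumed variance structure matters.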

Defining a Variance Structure

We intuitively know that variance changes with loss volume – but how?

This is the same as asking

“$V(\mu) = ?$”

We want CV to decrease with loss size, but not too quickly. GLM provides several approaches:

• Negative Binomial: $\mathrm{Var}(y) = \phi \cdot \mu + (\phi/k) \cdot \mu^2$

• Tweedie: $\mathrm{Var}(y) = \phi \cdot \mu^p$, $1 < p < 2$

• Weighted L-S: $\mathrm{Var}(y) = \phi/w$

The Negative Binomial

The variance function:

$$\mathrm{Var}(y) = \underbrace{\phi \cdot \mu}_{\text{random variance}} + \underbrace{(\phi/k) \cdot \mu^2}_{\text{systematic variance}}$$

The “Tweedie” Distribution

             Tweedie                           Neg. Binomial
Frequency    Poisson                           Poisson
Severity     Gamma (exponential when p=1.5)    Logarithmic

Both the Tweedie and the Negative Binomial can be thought of as intermediate cases between the Poisson and Gamma distributions.

• Negative Binomial: $\lim_{\mu \to \infty} CV = \sqrt{\phi/k}$

• Tweedie: $\lim_{\mu \to \infty} CV = 0$
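The two limits can be seen numerically with a short sketch based on the variance functions $\phi\mu + (\phi/k)\mu^2$ for the Negative Binomial and $\phi\mu^p$ for the Tweedie. All parameter values below are hypothetical, chosen only so that $\sqrt{\phi/k} = 0.2$ for the Negative Binomial.

```python
import math

def cv_negative_binomial(mu, phi, k):
    """CV = sqrt(phi*mu + (phi/k)*mu^2) / mu."""
    return math.sqrt(phi * mu + (phi / k) * mu ** 2) / mu

def cv_tweedie(mu, phi, p):
    """CV = sqrt(phi * mu^p) / mu, with 1 < p < 2."""
    return math.sqrt(phi * mu ** p) / mu

# Hypothetical parameters (not from the paper).
PHI_NB, K = 40.0, 1000.0
PHI_TW, P = 4.0, 1.5

for mu in (1e2, 1e3, 1e4, 1e5):
    print(mu, cv_negative_binomial(mu, PHI_NB, K), cv_tweedie(mu, PHI_TW, P))

nb_limit = math.sqrt(PHI_NB / K)   # floor on the Negative Binomial CV
print("Negative Binomial limit:", nb_limit)
```

The Negative Binomial CV flattens out at a positive floor, while the Tweedie CV keeps shrinking toward zero as the expected loss grows.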

Defining a Variance Structure

Comparison of Negative Binomial and Tweedie CV's

[Figure: Coefficient of Variation (CV), from 0 to 0.7, plotted against the Logarithm of Expected Loss Size (axis from 100 to 100,000) for the Negative Binomial and the Tweedie (p=1.5). The Negative Binomial curve is asymptotic to .200; the Tweedie curve is asymptotic to 0.]

Weighted Least-Squares

Use Normal Distribution but set

$a(\phi) = \phi / w_i$

so that the variance is inversely proportional to some external exposure weight $w_i$.

This is equivalent to weighted least-squares: $\text{L-S} = \sum_i (y_i - \mu_i)^2 \cdot w_i$
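A minimal sketch of the weighted least-squares objective, for the simplest case of a single common mean (an intercept-only model, which is an assumption made here for brevity). In that case the minimizer is the weighted average of the observations. The observations and weights are made-up values.

```python
def weighted_ss(y, mu, w):
    """Weighted least-squares objective: sum of (y_i - mu_i)^2 * w_i."""
    return sum(wi * (yi - mi) ** 2 for yi, mi, wi in zip(y, mu, w))

def weighted_mean(y, w):
    """Closed-form minimizer when every mu_i equals one common value."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)

# Hypothetical loss ratios and exposure weights.
y = [0.62, 0.73, 0.66, 0.70]
w = [100.0, 300.0, 400.0, 200.0]

mu_hat = weighted_mean(y, w)
best = weighted_ss(y, [mu_hat] * len(y), w)

# Moving the fitted value in either direction increases the objective.
nearby = [weighted_ss(y, [mu_hat + d] * len(y), w) for d in (-0.01, 0.01)]
print("fitted mean:", mu_hat, " objective:", best, " nearby:", nearby)
```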

Conclusion

A model fitted to insurance data should reflect the variance structure of the phenomenon being modeled.

GLM provides a flexible tool for doing this.
