
1 Distributions, Moments, and Background

First, we will start with a few definitions and concepts that we will use throughout the rest of this manual. Also, please note that the level of mathematical sophistication in the first few chapters is a little higher than in the rest of the book. The point is to get comfortable with the mathematical knowledge early on. Practice makes perfect, and math is the indispensable toolbox you'll need to start practicing.

1.1 Introduction

We will start by laying down the foundation for the rest of the book. Treat this section as a primer to jog your memory of elementary-level statistics. Much of the material should have been covered in Exam P, so we won't go into all the gory details. Let's get started!

Statistics allows us to model things in real life which occur with some uncertainty. Actuaries use statistics to model things like the frequency and severity of car accidents or a retiree's mortality. Thus, it is important to begin with the building blocks of statistical models and probability.

Definition 1.1. An event is a set of outcomes that occur with some probability.

Remark. We typically use capital letters at the beginning of the alphabet, like A, B, or C, to denote events. We might write Pr(A) to denote the probability that event A occurs.

Definition 1.2. A random variable is a variable which can take on multiple values. The set of values that it can take is called its sample space (often denoted S), and the probabilities corresponding to each value are defined by its distribution.

Example 1.3. We toss a coin which lands 'heads' with probability p and 'tails' with probability 1 − p. Let X be a random variable which equals 1 if the coin toss resulted in 'heads' and 0 otherwise. Define A to be the event that our coin toss resulted in 'tails'. Then,

p = Pr(X = 1) = 1 − Pr(X = 0) = 1 − Pr(A)

The sample space of X is {0, 1}.

Remark. A random variable (RV) is usually denoted by a capital letter at the end of the alphabet, such as X, Y, or Z. Unlike for events, we cannot just write Pr(X), because that would be meaningless. We can only talk about the probability of X being equal to some number, say, x: Pr(X = x). We tend to use the lowercase form of the letter denoting the random variable to represent "realizations" (namely, data) for that random variable.

Definition 1.4. A distribution describes the relationship between a set of values and their associated probabilities. For a discrete distribution, the sample space S is a set of discrete numbers, e.g. a set of integers, whereas for a continuous distribution, S consists of continuous interval(s), such as the positive real line.

Definition 1.5. A probability mass function (PMF) f(x) for a discrete random variable X is defined as:

f(x) := Pr(X = x)

for all x in the sample space of X.

A probability density function (PDF) f(x) for a continuous random variable X expresses a relative probability of x, and

Pr(x_1 ≤ X ≤ x_2) = ∫_{x_1}^{x_2} f(x) dx

A cumulative distribution function (CDF) F(x) applies to both continuous and discrete random variables and is defined as follows for a random variable X:

F(x) := Pr(X ≤ x)

Remark. We will interchangeably use the term PDF to denote both the probability mass function and the probability density function.


The most important thing to remember about probabilities is that the total probability over the entire sample space is 1. Below are some properties that either reiterate or follow from this fact.

1. If X is a continuous random variable, Pr(X = x) = 0 for any x.

2. If X is a discrete random variable with sample space S = {x_1, ..., x_k}, then ∑_{x∈S} f(x) = ∑_{i=1}^{k} f(x_i) = 1.

3. If X is a continuous random variable with sample space S, where S is possibly the real line or any subset of it, then ∫_S f(x) dx = 1.

4. The CDF is a monotonically non-decreasing function that goes from 0 to 1.

Example 1.6. Let X be a discrete random variable with a sample space of {1, 2, 3, 4}. Then, we can use the PDF to compute the probability that X is between 2 and 4, inclusive, by summing up f(x) for the values in this interval.

Pr(2 ≤ X ≤ 4) = ∑_{x=2}^{4} f(x)

Note that if X were, instead, a continuous random variable which takes on values in the interval [1, 4], then we would have computed the same probability as

Pr(2 ≤ X ≤ 4) = ∫_2^4 f(x) dx

Most times, discrete distributions differ from continuous distributions in that we use a summation (because we have a finite number of values in our sample space) as opposed to an integral (because in the continuous case, we have intervals to deal with).

If, for some reason, we had F(x), the CDF, at our disposal, we could compute the same probability (in the discrete case) via

Pr(2 ≤ X ≤ 4) = Pr(X ≤ 4) − Pr(X ≤ 1) = F(4) − F(1)

and (in the continuous case) via

Pr(2 ≤ X ≤ 4) = Pr(X ≤ 4) − Pr(X ≤ 2) = Pr(X ≤ 4) − Pr(X < 2) = F(4) − F(2)

The example illustrates the fact that both the PDF and the CDF can be used to compute probabilities. Moreover, notice that because continuous distributions assign 0 probability to specific points, namely Pr(X = 2) = 0, our calculation using the CDF is slightly different than for the discrete distribution.

Remark. We will interchangeably use Pr(X = x) and P(X = x) to denote the probability that a random variable X takes on the value x. Both are frequently used in texts!

Example 1.3 demonstrates one of the simplest distributions: the Bernoulli distribution, which is used to model 0-1 outcomes. The Exam 4 Tables contain a long list of distributions (along with their PDFs and CDFs!) that will come up on the exam. You should print it out and keep it handy as you go through this book. Our problems and examples will require you to work with formulas that are included in the tables, and the sooner you become familiar with where to find what, the faster you will be on the exam.

When dealing with continuous distributions, there is a nice relationship between PDFs and CDFs, given to us by the Fundamental Theorem of Calculus. Let X be a random variable with a continuous distribution. Then,

f(x) = (d/dx) F(x)   (1.1)

or equivalently,

F(x) = ∫_{−∞}^{x} f(t) dt   (1.2)

These relationships are important because knowing them means you can compute the PDF on the fly if you only remember the CDF, or vice versa.
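For example, here is a quick symbolic check of this relationship (a sketch using Python's sympy; the exponential CDF F(x) = 1 − e^{−x/θ} is our assumed starting point, and since the exponential's support starts at 0, the integral in (1.2) runs from 0 rather than −∞):

```python
import sympy as sp

x, t, theta = sp.symbols('x t theta', positive=True)
F = 1 - sp.exp(-x / theta)      # exponential CDF

# (1.1): differentiate the CDF to get the PDF
f = sp.diff(F, x)
print(f)                        # exp(-x/theta)/theta

# (1.2): integrate the PDF back up to recover the CDF
print(sp.simplify(sp.integrate(f.subs(x, t), (t, 0, x))))   # 1 - exp(-x/theta)
```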

Let us now look at an even more concrete example, which makes use of a distribution that is not in the Exam 4 Tables, yet is extremely easy to remember. We'll define the distribution first.


Definition 1.7. Let X be a discrete uniform random variable over the integers in {a, ..., b} (often denoted as X ~ Unif({a, ..., b})). Then,

f_X(x) = 1/(b − a + 1),   x ∈ {a, ..., b}

F_X(x) = (x − a + 1)/(b − a + 1),   x ∈ {a, ..., b}

Let Y be a continuous uniform random variable over the interval (a, b) (often denoted as Y ~ Unif(a, b)). Then,

f_Y(y) = 1/(b − a),   y ∈ (a, b)

F_Y(y) = { 0 for y < a;   (y − a)/(b − a) for a ≤ y ≤ b;   1 for y > b }

Intuitively, a uniform distribution assigns to each x ∈ S equal probability in the discrete case and equal density in the continuous case. The expression for F(x) can be derived via (1.2). As will often be the case, you simply need to memorize the PDF or the CDF if you remember how to go back and forth between them.

Also note that in Definition 1.7, we used subscripts on the PDFs and CDFs to distinguish between the functions for the 2 different random variables. This is good to do when there is room for ambiguity. Get into the habit of writing your equations clearly to avoid making careless mistakes.

Example 1.8. Let X ~ Unif({1, 2, 3, 4}). Let Y ~ Unif(1, 4). Compute Pr(2 ≤ X ≤ 4) and Pr(2 ≤ Y ≤ 4).

Answer. The formulas we use come straight out of Example 1.6.

Pr(2 ≤ X ≤ 4) = ∑_{x=2}^{4} f_X(x) = ∑_{x=2}^{4} 1/4 = 0.75

Pr(2 ≤ Y ≤ 4) = ∫_2^4 f_Y(y) dy = ∫_2^4 (1/3) dy = 2/3
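If you want to sanity-check these answers numerically, here is a minimal sketch using scipy.stats (note the parameterizations: scipy's randint(low, high) covers the integers low, ..., high − 1, and uniform(loc, scale) covers (loc, loc + scale)):

```python
from scipy.stats import randint, uniform

X = randint(1, 5)            # discrete uniform on {1, 2, 3, 4}
Y = uniform(loc=1, scale=3)  # continuous uniform on (1, 4)

# Pr(2 <= X <= 4) = F_X(4) - F_X(1), since X is integer-valued
print(X.cdf(4) - X.cdf(1))   # 0.75

# Pr(2 <= Y <= 4) = F_Y(4) - F_Y(2)
print(Y.cdf(4) - Y.cdf(2))   # 0.666...
```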

The uniform distribution is an example of a parametric distribution, because its definition contains the parameters a and b. All other distributions presented in the Exam 4 Tables are parametric as well.

Sometimes our data does not come from a parametric distribution. In fact, this is often the case when we go out to collect real data, say a sample of size n, denoted {x_1, ..., x_n}. Yes, actuaries do actually go out in the real world from time to time! In the absence of any other information, we assign an equal probability to each of the n sample points (specifically, 1/n).

Definition 1.9. An empirical model or empirical distribution is a discrete distribution derived from an observed dataset of n data points {x_1, ..., x_n}, where each observed value is assigned a probability of 1/n. Formally, the empirical PDF and CDF are given by

f_n(x) = #{data points equal to x} / n

F_n(x) = #{data points ≤ x} / n   (1.3)
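In code, the empirical PDF and CDF are just counting. Here is a minimal sketch with a made-up sample of n = 5 observations:

```python
import numpy as np

data = np.array([3, 7, 3, 9, 12])   # hypothetical observed data
n = len(data)

def empirical_pdf(x):
    return np.sum(data == x) / n    # f_n(x): fraction of points equal to x

def empirical_cdf(x):
    return np.sum(data <= x) / n    # F_n(x): fraction of points <= x

print(empirical_pdf(3))   # 0.4, since 3 appears twice among the five points
print(empirical_cdf(7))   # 0.6, since three of the five points are <= 7
```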

The PDF or CDF gives all that is needed to define a distribution. However, both functions can be hard to visualize without a graphing calculator, and in some instances, the CDF does not even have a closed form! Oftentimes, we might want information that captures the important details of the distribution. This motivates our definition of some summary statistics, like the mean. Let's begin there; more summary statistics will follow throughout the remainder of this chapter.


Definition 1.10. The mean or expectation (also known as the expected value) of a distribution, or of a random variable X following that distribution, is given by

E(X) = ∑_{x∈S} x f(x)   (for a discrete distribution)

E(X) = ∫_S x f(x) dx   (for a continuous distribution)

The mean is a way to visualize where the center of the distribution lies, and it is one of the most important summary statistics for a distribution.

In addition to computing the expectation of a random variable, we can also compute the expectation of a function of a random variable.

Definition 1.11. The expectation of a function g(X) of a random variable is given by

E(g(X)) = ∑_{x∈S} g(x) f(x)   (for a discrete distribution)

E(g(X)) = ∫_S g(x) f(x) dx   (for a continuous distribution)

Note that in this definition, g(X) can contain variables other than X, but it can only contain one random variable, X.
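As a small illustration (a sketch, with a fair six-sided die as the assumed distribution), Definitions 1.10 and 1.11 translate directly into sums:

```python
import numpy as np

xs = np.arange(1, 7)        # sample space of a fair die
probs = np.full(6, 1 / 6)   # f(x) = 1/6 for each face

print(np.sum(xs * probs))      # E(X) = 3.5, via Definition 1.10
print(np.sum(xs**2 * probs))   # E(X^2) = 15.17, Definition 1.11 with g(x) = x^2
```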

Finally, we will round off our discussion with the topic of percentiles and the median.

Definition 1.12. The 100p-th percentile of a distribution is any value π_p such that F(π_p⁻) ≤ p ≤ F(π_p). The median is defined to be the 50th percentile, π_0.5.

The median is another useful measure of centrality. Compared to the mean, it is more robust (less sensitive) to the presence of outliers.

Remark. If the CDF F(x) is continuous at π_p, then the 100p-th percentile is simply a value π_p that satisfies F(π_p) = p. Any continuous distribution will be continuous at π_p, regardless of p.

If the CDF F(x) is not continuous at π_p, then we may need to use limits, as in our definition. The minus sign in the superscript of F(π_p⁻) signifies the limit of F(x) as x approaches π_p from the left.

Note that while a continuous distribution can have only 1 value for the 100p-th percentile, a discrete distribution may have an interval of values corresponding to the 100p-th percentile.

The following example illustrates all three points above for the case of π_0.5, the median, but the idea extends to all percentiles.

Example 1.13. Examine closely the 3 CDFs in the figure below. To satisfy your curiosity, Distribution 1 is a continuous distribution, namely the standard normal distribution (which we define in Section 1.2). The other two are discrete distributions, which you should recognize because of the discontinuous nature of their CDFs. Distribution 2 is a Unif({1, 2}) and Distribution 3 is a Unif({1, 3, 5}).


[Figure: CDFs F(x) of the three distributions. (a) Distribution 1, a continuous distribution; median = 0. (b) Distribution 2, a discrete distribution; the set of medians is an interval [1, 2). (c) Distribution 3, a discrete distribution; median = 3.]

The graphical procedure for computing π_p is to plot F(x), draw a horizontal line at F(x) = p, and read off the corresponding values of x where the horizontal line intersects the CDF. Using this procedure, we find that the median for Distribution 1 is 0 and that the medians for Distribution 2 are the entire set [1, 2). (Note that the ')' to the right of '2' indicates that the value 2 is not included in this interval, since F(2) = 1 > 0.5.)

For Distribution 3, there is no point of intersection with the horizontal line, so we need to consider the limit. We see that as x approaches 3 from the left, F(x) is equal to 1/3, which, in the fancier notation of the definition, means F(3⁻) = 1/3. Also, we see that F(3) = 2/3. These two values sandwich 0.5, so by the definition, our median is 3.

Now, as an exercise, convince yourself that our graphical procedure for computing percentiles is consistent with the definition.
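You can also verify the Distribution 3 numbers with a couple of lines of code (a sketch; the discrete uniform CDF is just a count over the support):

```python
import numpy as np

support = np.array([1, 3, 5])      # Unif({1, 3, 5}): probability 1/3 each

def F(x):
    return np.mean(support <= x)   # CDF: fraction of support points <= x

print(F(2.999))   # approximates F(3-) = 1/3
print(F(3))       # F(3) = 2/3; since F(3-) <= 0.5 <= F(3), the median is 3
```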

Note that the summary statistics (mean, percentiles, and median) we have presented thus far can also be computed for empirical models using the empirical PDF and CDF.

The Exam 4 Tables contain even fancier information on distributions beyond PDFs, CDFs, and means, including things like moments and VaR. We will cover those next, so that you can have a working understanding of them as we head into Exam 4 territory!

1.2 Moments

Raw moments (sometimes just called moments) and central moments are examples of summary statistics. Means are given in the formula sheet in a general form: raw moments. Let's see a formal definition for raw moments. In each of the definitions presented in this section, assume X to be the random variable of interest.

Definition 1.14. The k-th raw moment (of a random variable X) is defined to be E[X^k], and is sometimes denoted μ′_k.

Note that the mean is equal to E[X¹], or simply μ (no prime symbol is used for the first raw moment).

In order to calculate a k-th raw moment, we need to integrate if the model is continuous and sum if the model is discrete. If the model is mixed (both continuous and discrete), we need to integrate over the continuous portions and sum over the discrete portions.

Applying Definition 1.11, we have the following:

E[X^k] = ∫_{−∞}^{∞} x^k f(x) dx   if X is continuous   (1.4)

or

E[X^k] = ∑_{x∈S} x^k p(x)   if X is discrete   (1.5)

Example 1.15. Suppose X is uniformly distributed on (a,b). Calculate E(X).


Answer. Since X is uniformly distributed on (a, b), we know that f(x) = 1/(b − a). Thus,

E(X) = ∫_a^b x · (1/(b − a)) dx

= (1/(b − a)) ∫_a^b x dx

= (1/(b − a)) · [x²/2]_a^b

= (b + a)/2

Observe that the expected value for the uniform distribution is just the midpoint of the interval.
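Here is the same integral done symbolically (a sketch using sympy):

```python
import sympy as sp

a, b, x = sp.symbols('a b x', positive=True)

# E(X) = integral of x * 1/(b - a) over (a, b)
EX = sp.integrate(x / (b - a), (x, a, b))
print(sp.simplify(EX))   # (a + b)/2, the midpoint of the interval
```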

Definition 1.16. The k-th central moment (of a random variable X) is μ_k = E[(X − μ)^k]. One example of a central moment is the variance, σ² or Var(X) (rarely denoted μ_2), which is given by

Var(X) = σ² = E[(X − μ)²]

The standard deviation is defined by σ = √Var(X).

Intuitively, the variance gives us a measure of how spread out a distribution is. The formula can be interpreted as: on average, how far away does a value x deviate from the mean μ? Unfortunately, the variance is still a little awkward to visualize, because it is not in the same unit of measurement as X. For example, if X is the random variable denoting the number of dollars you win in a bet, then E(X) is easily interpretable as the average number of dollars you would expect to win. Var(X), however, has units of dollars squared. To make that interpretable, we often use the standard deviation σ = √Var(X) to bring the measurement back to our familiar unit of dollars.

As promised, we now deliver some more summary statistics:

Definition 1.17.

coefficient of variation = σ/μ

skewness = γ_1 = E[(X − μ)³]/σ³ = μ_3/σ³

kurtosis = γ_2 = E[(X − μ)⁴]/σ⁴ = μ_4/σ⁴

Before we continue, for completeness, we would like to present the formulas to compute a k-th central moment for both continuous and discrete distributions. These are just extensions of (1.4) and (1.5).

E[(X − μ)^k] = ∫_{−∞}^{∞} (x − μ)^k f(x) dx   if X is continuous   (1.6)

or

E[(X − μ)^k] = ∑_{x∈S} (x − μ)^k p(x)   if X is discrete   (1.7)

Example 1.18. Suppose X is uniformly distributed on (a, b). Calculate the second central moment.


Answer. As in Example 1.15, f(x) = 1/(b − a). Thus,

E[(X − μ)²] = ∫_a^b (x − (b + a)/2)² · (1/(b − a)) dx

= (1/(b − a)) ∫_a^b [x² − 2x(b + a)/2 + (b + a)²/4] dx

= (1/(b − a)) [x³/3 − ((b + a)/2)x² + ((b + a)²/4)x]_a^b

= (b − a)²/12

Next, we show an important relationship between the second central and raw moments.

Variance = Var(X) = E[(X − μ)²] = μ_2 = E[X²] − E[X]²   (1.8)

This relationship is often easier to compute, since there is no shifting within the expectation calculation. If this is unclear, do not worry as yet, since there will be many examples where we use exactly this relationship.

Example 1.19. Redo Example 1.18 using the formula Var(X) = E(X²) − E(X)².

Answer. Recall from Example 1.15 that E(X) = (a + b)/2. Thus, E(X)² = (a² + 2ab + b²)/4. We now calculate E(X²).

E(X²) = ∫_a^b x² f(x) dx

= (1/(b − a)) ∫_a^b x² dx

= (a² + ab + b²)/3

Finally,

Var(X) = E(X²) − E(X)²

= (a² + ab + b²)/3 − (a² + 2ab + b²)/4

= (b − a)²/12

Observe that the answers calculated here and in Example 1.18 are exactly the same.
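The shortcut formula (1.8) is also easy to check symbolically (a sketch using sympy):

```python
import sympy as sp

a, b, x = sp.symbols('a b x', positive=True)
f = 1 / (b - a)                           # uniform PDF on (a, b)

EX = sp.integrate(x * f, (x, a, b))       # first raw moment
EX2 = sp.integrate(x**2 * f, (x, a, b))   # second raw moment

var = sp.simplify(EX2 - EX**2)            # Var(X) = E(X^2) - E(X)^2
print(sp.factor(var))                     # (a - b)**2/12, i.e. (b - a)^2/12
```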

We now take a moment's detour to talk about the normal distribution, another common distribution for which no formulas appear in the Exam 4 Tables. We will present that here, but apart from knowing the mean and standard deviation, and how to make use of the standard normal distribution table, you shouldn't need to memorize anything.

Definition 1.20. Let X ~ Normal(μ, σ²). Then,

f(x) = (1/(σ√(2π))) exp{−(x − μ)²/(2σ²)}

E(X) = μ,   Var(X) = σ²

A standard normal distribution is a Normal(0, 1) distribution. A standard normal random variable is often denoted as Z.


[Figure: (a) the PDF of X ~ Normal(2, 9); (b) the PDF of Z, the standard normal.]

Note that a normal distribution is symmetric about its mean. This symmetry implies that the mean and the median are identical for a normal distribution.

Another fact to notice is that the two plots above appear very similar. Ignoring the axes, you might think that they are in fact the same distribution. Had we plotted both on the same set of axes, you would notice that X has a flatter distribution (a larger variance means values are more spread out) and that its peak is to the right of the peak for Z. We'll formalize this notion below.

Remark. Let X ~ Normal(μ, σ²), for some parameters μ and σ². Then, we can standardize X in such a way that we get a standard normal distribution. To do so, we define Z as follows:

Z = (X − μ)/σ   (1.9)

Then, Z ~ Normal(0, 1).

You'll notice that we did not give the CDF of a normal distribution. In fact, it is impossible to write the CDF in closed form. (Try integrating f(x) to convince yourself this is the case.) However, we can transform every normal distribution into a standard normal distribution using the above. The CDF for a standard normal distribution (sometimes denoted Φ(z)) is presented in your Exam 4 Tables, so that is what we use as a substitute for the CDF of X.
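Here is what the standardization in (1.9) looks like numerically (a sketch using scipy.stats; norm.cdf plays the role of the Φ table):

```python
from scipy.stats import norm

mu, sigma = 2, 3   # X ~ Normal(2, 9), so sigma = 3
x = 5

p_direct = norm(loc=mu, scale=sigma).cdf(x)   # Pr(X <= 5) computed directly
z = (x - mu) / sigma                          # standardize: z = (x - mu)/sigma
p_standardized = norm.cdf(z)                  # Phi(z) for the standard normal

print(p_direct, p_standardized)               # both 0.8413..., as (1.9) promises
```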

Now, back to moments. Using (1.6) and (1.7), we can now compute any central moment, including the skewness and kurtosis. Skewness is used to measure symmetry and kurtosis is used to measure flatness.

Distributions that are left-skewed have a negative skewness value, while distributions that are right-skewed have a positive skewness value. The vast majority of distributions we will be working with have positive skewness. This implies that small losses are more likely to occur than large losses; in other words, it is more likely to have a low-impact collision than a catastrophic four-car pile-up. It should be obvious that from an insurer's point of view, it is much better to more often pay lower-valued claims than higher-valued claims. Luckily, reality agrees. This is why the majority of our statistical models will have positive skewness. The following graphs of two PDFs show both positive and negative skewness.


[Figure: two PDFs illustrating skewness. (a) Positive skewness: the right side has an elongated tail. (b) Negative skewness: the left side has an elongated tail.]

To summarize what we have accomplished thus far, we have found several ways of visualizing a distribution. The mean and median give us a measure of centrality. Variance gives us a measure of spread. Percentiles give us information about various points on the CDF. Skewness and kurtosis give us different ways of visualizing the shape of the distribution.

Let us go one step further and look at generating functions, which are presented for some of the distributions in your Exam Tables.

Definition 1.21. For a random variable X, the moment generating function (MGF) of X (denoted M_X(t)) is equal to E(e^{tX}). Furthermore, for discrete random variables, the probability generating function (PGF) (denoted P_X(z)) is equal to E(z^X).

Note that if the MGF of a random variable is given, then the PGF can be calculated, and vice versa, via:

M_X(t) = E(e^{tX}) = E((e^t)^X) = P_X(e^t).   (1.10)

Similarly,

P_X(z) = E(z^X) = E((e^{ln z})^X) = E(e^{(ln z)X}) = M_X(ln z).   (1.11)

Two useful facts to remember about PGFs and MGFs:

Pr(S = k) = (1/k!) · d^k/dz^k P_S(z), evaluated at z = 0

E(X^k) = d^k/dt^k M_X(t), evaluated at t = 0
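The second fact is easy to see in action (a sketch using sympy, taking the exponential MGF M(t) = (1 − θt)^{−1} as the assumed input; this is the gamma MGF with α = 1, which we derive from scratch in Section 1.3):

```python
import sympy as sp

t, theta = sp.symbols('t theta', positive=True)
M = (1 - theta * t) ** -1   # MGF of an exponential(theta) random variable

# E(X^k) is the k-th derivative of M(t) evaluated at t = 0
print(sp.diff(M, t, 1).subs(t, 0))   # theta       = E(X)
print(sp.diff(M, t, 2).subs(t, 0))   # 2*theta**2  = E(X^2)
```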

There exists a one-to-one correspondence between the MGF or PGF of a random variable X and the CDF of that random variable (although the proof is outside the scope of this book). This is a very useful property, because one can often easily deduce the distribution from the form of the MGF (or PGF).

Now, to end the section on MGFs and PGFs, let us illustrate the relationship between them through an example.

Example 1.22. Observe that your formula sheet does not include an MGF for the Poisson distribution. The easy way to get it would be to simply use the PGF and plug in e^t for z. For the sake of illustration, let us work the entire example, starting with how to compute the PGF. Observe from the formula sheet that if X ~ Poisson(λ), then

P(X = x) = e^{−λ} λ^x / x!.

Thus, to compute the PGF, we observe:

P_X(z) = E(z^X) = ∑_{x=0}^{∞} z^x e^{−λ} λ^x / x!

= e^{−λ} ∑_{x=0}^{∞} z^x λ^x / x!

= e^{−λ} ∑_{x=0}^{∞} (zλ)^x / x!


A useful fact to know is: e^x = ∑_{n=0}^{∞} x^n / n!. Using this in our last step from above, we see:

e^{−λ} ∑_{x=0}^{∞} (zλ)^x / x! = e^{−λ} e^{zλ} = e^{λ(z−1)}

Again, we see that our answer matches the formula sheet exactly (so use the formula sheet!). Now, to find the MGF of X, we simply use (1.10) and observe the following:

M_X(t) = P_X(e^t) = e^{λ(e^t − 1)}
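For the skeptical, sympy will happily reproduce both steps of this example (a sketch):

```python
import sympy as sp

x = sp.symbols('x', integer=True, nonnegative=True)
z, t, lam = sp.symbols('z t lambda', positive=True)

# PGF: E(z^X) = sum over x >= 0 of z^x * e^(-lambda) * lambda^x / x!
pgf = sp.summation(z**x * sp.exp(-lam) * lam**x / sp.factorial(x), (x, 0, sp.oo))
print(sp.simplify(pgf))   # exp(lambda*(z - 1)), up to rewriting

# MGF via (1.10): substitute z = e^t
print(sp.simplify(pgf.subs(z, sp.exp(t))))   # exp(lambda*(exp(t) - 1))
```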

1.3 Sums of Random Variables

Now, let's get into something that actuaries use all the time: sums of random variables. Let X_i be a random variable used to model losses for the i-th individual (or the i-th risk in a policy). An insurer needs to be able to aggregate the individual risks to model the total risk. For this, we define the random variable S_k = ∑_{i=1}^{k} X_i, the total losses for k individuals.

First, a nice property of expectation is that it is linear. This means that:

E(S_k) = ∑_{i=1}^{k} E(X_i)

If all the X_i are independent, then furthermore:

Var(S_k) = ∑_{i=1}^{k} Var(X_i)

Now, we present a related result in the form of MGFs and PGFs.

Theorem 1.23. Let S_k = ∑_{i=1}^{k} X_i, where the X_i's are independent. Then M_{S_k}(t) = ∏_{j=1}^{k} M_{X_j}(t) and P_{S_k}(z) = ∏_{j=1}^{k} P_{X_j}(z).

Using the tools provided in Theorem 1.23, we can begin to analyze the distribution of various S_k random variables.

Example 1.24. Suppose X_i ~ gamma(α, θ) for i ∈ {1, ..., k}. Compute the MGF for S_k = ∑_{i=1}^{k} X_i. Repeat this exercise for the case where each X_i ~ gamma(α_i, θ), where the α_i parameters may vary for each X_i.

Answer. First, we will compute the MGF for one X_i. Although you should never do this on the test, because the process is too time-consuming, we show a derivation from first principles to illustrate the theory. Then we will use the MGF property defined in Theorem 1.23 to find the distribution of S_k, the sum of k gamma variables. Pulling the PDF from the Exam Tables, we get

f(x) = x^{α−1} e^{−x/θ} / (Γ(α) θ^α)

E(e^{tX}) = ∫_0^∞ e^{tx} x^{α−1} e^{−x/θ} / (Γ(α) θ^α) dx

= [∫_0^∞ e^{tx} x^{α−1} e^{−x/θ} dx] / (Γ(α) θ^α)

= [∫_0^∞ e^{x(t − 1/θ)} x^{α−1} dx] / (Γ(α) θ^α)

= [∫_0^∞ e^{−x(1/θ − t)} x^{α−1} dx] / (Γ(α) θ^α)


Now, we do a substitution of variables. Let

y = x(1/θ − t)  ⟹  dy/dx = 1/θ − t  and  x = y/(1/θ − t).

Also observe that as x goes from 0 → ∞, y goes from 0 → ∞ (we are assuming t < 1/θ). We then continue the computation from above:

[∫_0^∞ e^{−x(1/θ − t)} x^{α−1} dx] / (Γ(α) θ^α) = [∫_0^∞ e^{−y} x^{α−1} (1/(1/θ − t)) dy] / (Γ(α) θ^α)

= [∫_0^∞ e^{−y} (y/(1/θ − t))^{α−1} (1/(1/θ − t)) dy] / (Γ(α) θ^α)

= [∫_0^∞ e^{−y} y^{α−1} dy] / (Γ(α) θ^α (1/θ − t)^α)

Now, recall the general formula for the gamma function: Γ(z) = ∫_0^∞ e^{−t} t^{z−1} dt. You should also know that Γ(n) = (n − 1)! when n is a positive integer. Continuing on, we see:

[∫_0^∞ e^{−y} y^{α−1} dy] / (Γ(α) θ^α (1/θ − t)^α) = Γ(α) / (Γ(α) θ^α (1/θ − t)^α)

= 1 / (θ(1/θ − t))^α

= (1/(1 − θt))^α

= (1 − θt)^{−α}

If you look through the formulas in your Exam Tables, you will see that the MGF presented is M(t) = (1 − θt)^{−α}, which is exactly what we got. Thus, the MGF for S_k = ∑_{i=1}^{k} X_i is

∏_{i=1}^{k} (1 − θt)^{−α} = (1 − θt)^{−kα}

On the test, you should never derive an MGF if it is provided in the tables! MGFs and PGFs are contained in the Exam Tables. If you don't remember the MGF or PGF, simply refer to the Exam Tables.

Now suppose we wanted to find the distribution of S_k where each X_i ~ gamma(α_i, θ) and all the X_i's are independent; thus they all have the same θ but different α_i. Deriving such a formula directly is quite difficult. However, we can apply the properties of moments for S_k to help us. Referencing Theorem 1.23, we see:

M_{S_k}(t) = ∏_{i=1}^{k} M_{X_i}(t) = ∏_{i=1}^{k} 1/(1 − θt)^{α_i} = 1/(1 − θt)^{∑α_i}

Finally, we see that the MGF of S_k is simply the MGF of a gamma distribution with parameters (∑α_i, θ). Since there is a one-to-one correspondence between MGFs and the distribution functions of random variables, we can now conclude that S_k is distributed as a gamma random variable with parameters (∑α_i, θ).
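This result is also easy to see empirically (a sketch; numpy's gamma generator is parameterized by shape and scale, which match (α, θ) here):

```python
import numpy as np

rng = np.random.default_rng(0)
alphas, theta = [2.0, 3.0, 5.0], 100.0   # three independent gamma(alpha_i, theta) risks

# Simulate S = X_1 + X_2 + X_3 many times
S = sum(rng.gamma(shape=a, scale=theta, size=100_000) for a in alphas)

# Compare with gamma(sum(alphas), theta): mean = 10(100) = 1000, var = 10(100^2) = 100,000
print(S.mean(), S.var())   # close to 1000 and 100,000
```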

Next, we give one of the most important ideas of modern statistics: the central limit theorem.


Theorem 1.25 (Central Limit Theorem (CLT)). Let {X_1, ..., X_k} be a sequence of random variables, and let S_k denote their sum, i.e. S_k = X_1 + X_2 + ... + X_k. Under certain nice conditions (which usually can be assumed for the actuarial exam!),

(S_k − E(S_k)) / √Var(S_k)  ⟶  N(0, 1)  as k → ∞

where ⟶ means convergence in distribution.

Let's spend a little time deciphering the notation in this theorem. On the left-hand side, we have a random variable (S_k − E(S_k))/√Var(S_k) which has been standardized (subtract the mean, then divide by the standard deviation). The theorem states that as k increases (as we collect more and more data), this standardized random variable converges to a standard normal distribution. That means we can compute probabilities for the standardized random variable, and in turn, we can compute probabilities for S_k, assuming a large enough sample size k. Note that the theorem does not mention any requirements on how each X_i is distributed (gamma, Poisson, exponential, etc.). Their sums inevitably become something we can model with a normal distribution!

Corollary 1.26. Assume that the X_i are independent and identically distributed (iid), such that E(X_i) = μ and Var(X_i) = σ² for all i. Then the Central Limit Theorem implies that:

1. S_k approximately follows a Normal(kμ, kσ²) distribution.

2. X̄ = S_k/k approximately follows a Normal(μ, σ²/k) distribution.

Now we present an example that makes use of the Central Limit Theorem. This example will also show how we compute normal distribution probabilities. Since it is the first such example in this book, we will go through it in quite some detail.

Example 1.27. Suppose we ask 100 actuarial students whether they believe they will pass Exam 4 on their first sitting. Let S_100 denote the number of people who say yes. In fact, we could write out S_100 = ∑_{i=1}^{100} X_i, where each X_i ~ Binomial(m = 1, q) independently. If the true q is 0.4 (40% of all actuarial students sitting for Exam 4 for the first time believe they will pass it the first time), then what is the probability that our poll resulted in at least 50 people believing they will pass?

Answer. First, we know that the X_i are iid. This is because we are given that each X_i follows the same distribution as every other X_i, by assumption, and furthermore, each person is assumed to be independent of all others. The mean and variance of X_i can be found directly from the Exam Tables: μ = q = 0.4 and σ² = q(1 − q) = 0.4(0.6) = 0.24.

By Corollary 1.26, X̄ approximately follows a Normal(μ, σ²/k) = Normal(0.4, 0.24/100) distribution. (k = 100 is large enough for us to make this approximation, in case you were worried.) Thus, we can standardize as in (1.9) to get Z = (X̄ − 0.4)/√(0.24/100) to do our computations.

P(X̄ ≥ 0.5) = P((X̄ − 0.4)/√(0.24/100) ≥ (0.5 − 0.4)/√(0.24/100))

= P(Z ≥ 2.04)

= 1 − P(Z < 2.04)

= 1 − Φ(2.04)

We have now written our desired probability as a function of Φ(z), the CDF of a standard normal random variable. Looking it up in the standard normal distribution table, we find that Φ(2.04) = 0.9793. This implies that P(X̄ ≥ 0.5) = 1 − 0.9793 = 0.0207, a very small probability!
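Here is the same calculation in code, together with the exact binomial answer for comparison (a sketch using scipy.stats):

```python
from scipy.stats import norm, binom

q, k = 0.4, 100
mu, var = q, q * (1 - q)

# CLT approximation: P(Xbar >= 0.5) = 1 - Phi(2.04...)
z = (0.5 - mu) / (var / k) ** 0.5
print(1 - norm.cdf(z))           # about 0.0207

# Exact: S_100 ~ Binomial(100, 0.4), so P(S_100 >= 50) = 1 - P(S_100 <= 49)
print(1 - binom.cdf(49, k, q))   # about 0.027; close, but not identical
```

The small gap between the exact and approximate answers is precisely the discreteness issue discussed next.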

The above example ignored an extra step in dealing with normal approximations to binomial distributions. Because the binomial distribution is a discrete distribution, modeling it as a normal distribution requires some sort of continuity adjustment. We'll overlook this for now.


1.4 Problems for Section 1

1. Claim sizes are 90, 110, 300, 600 with probabilities 0.5, 0.2, 0.05, 0.25. Compute the skewness and kurtosis.

2. Losses have an exponential distribution with the 75th percentile equal to 1,000. What is θ?

3. Losses have an inverse Weibull distribution with the 25th percentile equal to 5,000 and the 50th percentile equal to 50,000. What is τ?

4. Suppose a company writes claims where losses follow a Pareto distribution with α = 3 and θ = 10,000. Use the Central Limit Theorem to approximate the probability that the sum of 100 claims exceeds $700,000.

5. Suppose a company writes claims where losses follow a gamma distribution with α = 10 and θ = 1,000. Use the Central Limit Theorem to approximate the probability that the sum of 100 claims exceeds $1 million.

6. A company wrote 100 contracts with a sample mean of 2,000 and a standard deviation of 500. Next year, it will write 1,500 contracts. What is the probability that the payout will exceed 102% of expected losses?

1.5 Solutions for Section 1

1. First we compute the mean μ and variance σ².

μ = 0.5(90) + 0.2(110) + 0.05(300) + 0.25(600) = 232

σ² = 0.5(90 − 232)² + 0.2(110 − 232)² + 0.05(300 − 232)² + 0.25(600 − 232)² = 47,146

The standard deviation σ is then √47,146 = 217.1. Next, we calculate the following central moments:

μ_3 = 0.5(90 − 232)³ + 0.2(110 − 232)³ + 0.05(300 − 232)³ + 0.25(600 − 232)³ = 10,679,916

μ_4 = 0.5(90 − 232)⁴ + 0.2(110 − 232)⁴ + 0.05(300 − 232)⁴ + 0.25(600 − 232)⁴ = 4,833,584,152

Skewness = μ_3/σ³ = 10,679,916/217.1³ = 1.04.

Kurtosis = μ_4/σ⁴ = 4,833,584,152/217.1⁴ = 2.17.
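If you would like to double-check this arithmetic, here is a compact sketch in Python:

```python
import numpy as np

xs = np.array([90, 110, 300, 600])
ps = np.array([0.5, 0.2, 0.05, 0.25])

mu = np.sum(xs * ps)                          # 232
sigma = np.sqrt(np.sum((xs - mu)**2 * ps))    # 217.1

print(np.sum((xs - mu)**3 * ps) / sigma**3)   # skewness, about 1.04
print(np.sum((xs - mu)**4 * ps) / sigma**4)   # kurtosis, about 2.17
```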

2. We know that the cumulative distribution function for an exponential(θ) distribution is

F(x) = 1 − e^{−x/θ}

Hence, substituting in the given information, we have

0.75 = F(1000) = 1 − e^{−1000/θ}

0.25 = e^{−1000/θ}

ln 0.25 = −1000/θ

θ = −1000/ln 0.25 ≈ 721.3

3. Recall that for an inverse Weibull(τ, θ) distribution, we have F(x) = e^{−(θ/x)^τ}. Hence, given the two percentiles, we know that

0.25 = e^{−(θ/5000)^τ}  ⟹  ln 0.25 = −(θ/5000)^τ

0.5 = e^{−(θ/50,000)^τ}  ⟹  ln 0.5 = −(θ/50,000)^τ


Dividing the two equations on the right, we get

ln 0.5 / ln 0.25 = (−(θ/50,000)^τ) / (−(θ/5000)^τ)

1/2 = (1/10)^τ

τ = ln 0.5 / ln 0.1 ≈ 0.301

4. Recall that for a Pareto(α, θ) distribution, the moments can be calculated using the following formula (on the equation sheet):

E(X^k) = θ^k k! / ((α − 1)(α − 2)···(α − k))   for integer k < α.

Applying this to a single claim X ~ Pareto(α, θ), we have

E(X) = 10,000/2 = 5000

and

σ² = Var(X) = E(X²) − [E(X)]² = 10,000²(2)/2 − 5000² = 10,000² − 5000²

By the CLT, we know that S, the sum of 100 claims, approximately follows a N(100(5000), 100σ²) distribution. Hence,

P(S > 700,000) = 1 − Φ((700,000 − 5000(100)) / √(100[10,000² − 5000²]))

= 1 − Φ(2.31)

= 0.0104
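A quick numeric check of this answer (a sketch; we plug the Pareto moments in directly rather than using scipy's differently parameterized Pareto):

```python
from scipy.stats import norm

mean, var = 5000.0, 10_000**2 - 5000**2   # single-claim moments computed above
n = 100

z = (700_000 - n * mean) / (n * var) ** 0.5
print(1 - norm.cdf(z))   # about 0.0104
```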

5. Using gamma distribution formulas, each loss has a mean and variance of

μ = 10(1000) = 10,000  and  σ² = 10(1000²) = 10,000,000

For 100 claims, the total loss mean equals 100(10,000) = 1,000,000 and the variance equals 100(10,000,000) = 1,000,000,000. The probability of total claims exceeding $1 million is thus:

1 − Φ((1,000,000 − 1,000,000)/√1,000,000,000) = 1 − Φ(0) = 0.5

Hence, the probability that the sum of 100 claims exceeds $1 million is 0.5.

6. For 1500 contracts, the total sum of contracts has a mean of 1500(2000) = 3,000,000 and a standard deviation of √(1500σ²) = √(1500(500²)) = 19,365, and it has an approximately normal distribution.

102% of the mean is 1.02(3,000,000) = 3,060,000, so we need to compute

P(X > 3,060,000) = 1 − Φ((3,060,000 − 3,000,000)/19,365) = 1 − Φ(3.10) = 1 − 0.999 = 0.001
