Week 1 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 1 Video Lecture Notes

ACTL2002/ACTL5101 Probability and Statistics

© Katja Ignatieva

School of Risk and Actuarial Studies, Australian School of Business, University of New South Wales

[email protected]

Week 1 Video Lecture Notes

Probability: Week 1 Week 2 Week 3 Week 4

Estimation: Week 5 Week 6 Review

Hypothesis testing: Week 7 Week 8 Week 9

Linear regression: Week 10 Week 11 Week 12

Video lectures: Week 2 VL Week 3 VL Week 4 VL Week 5 VL



Links to UNSW TV

Click on the topic to go to the online recording:

Week 1: Probability space, Calculating with probabilities, Counting.

Week 2: Bernoulli distribution, Binomial distribution, Geometric distribution, Negative Binomial distribution.

Week 3: Numerical methods to summarize data, Graphical procedures to summarize data.

Week 4: Sampling with and without replacement, Properties of sample mean and variance.

Week 5: Chi-squared distribution, Student-t distribution, Snedecor's F distribution, Distribution of sample mean and variance.

http://tv.unsw.edu.au/video/week-1-probability-and-statistics

http://tv.unsw.edu.au/video/week-1-probability-space

http://tv.unsw.edu.au/video/week-1-calculating-with-probabilities

http://tv.unsw.edu.au/video/week-1-Counting


http://tv.unsw.edu.au/video/week-2-Bernoulli-distribution

http://tv.unsw.edu.au/video/week-2-binomial-distribution

http://tv.unsw.edu.au/video/week-2-geometric-distribution

http://tv.unsw.edu.au/video/week-2-negative-binomial-distribution


http://tv.unsw.edu.au/video/week-3-numerical-methods-to-summarize-data

http://tv.unsw.edu.au/video/week-3-graphical-procedures-to-summarize-data



http://tv.unsw.edu.au/video/week-4-sampling-with-and-without-replacement

http://tv.unsw.edu.au/video/week-4-properties-of-the-sample-mean-and-variance



https://tv.unsw.edu.au/video/week-5-chi-squared-distribution

https://tv.unsw.edu.au/video/week-5-student-t-distribution

https://tv.unsw.edu.au/video/week-5-snecdors-f-distribution

https://tv.unsw.edu.au/video/week-5-fundamental-sampling-distributions



Introduction in Probability

Probability space
- Sample Space & σ-algebra
- Probability Measure

Calculating with probabilities
- Properties of the Probability Measure
- Conditional Probability
- Independence
- Bayes Theorem

Counting
- Counting Principles
- Computing Probabilities


Probability space

Sample Space & σ-algebra



In a random experiment, the outcomes cannot be predicted with certainty in advance.

The set of all possible outcomes is called the sample space, denoted by Ω. An element ω of Ω is a sample point.

A family F of subsets of the sample space Ω is said to be a σ-algebra (σ-field) if the following conditions hold:

1. E ∈ F implies E^c ∈ F;
2. if E1, E2, . . . are pairwise disjoint sets in F, that is, Ei ∩ Ej = ∅ (null/empty set) for all i ≠ j, then

$$\bigcup_{k=1}^{\infty} E_k \in F.$$

An element E of F is called an event. It is nothing but a subset of Ω.

Note: A sample point is a simple event.

2/33


Probability space


If there are N elements in the sample space, a σ-algebra F of events may consist of up to 2^N elements.

Example:

Consider the experiment of selecting a card at random from a deck of four cards with four different suits, and noting its suit: hearts (H), diamonds (D), spades (S), or clubs (C).

The sample space is: Ω = {H, D, S, C}.

The collection of all possible events is a σ-algebra F.

3/33


Probability space


The following collection of 2^4 = 16 sets is a σ-algebra F:

∅
{H}, {D}, {S}, {C}
{H,D}, {H,S}, {H,C}, {D,S}, {D,C}, {S,C}
{H,D,S}, {H,D,C}, {H,S,C}, {D,S,C}
{H,D,S,C} (= Ω)

Each element in the σ-algebra is called an event.

Note that a standard deck has 52 elements in the sample space, so the σ-algebra of all subsets has 2^52 elements.
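The four-suit example can be checked by brute force. The sketch below is only an illustration: the helper name `power_set` is an assumption of this note, and closure is tested for finite unions only, which is all that matters on a finite sample space.

```python
from itertools import combinations

# Sample space of the four-suit example.
omega = frozenset({"H", "D", "S", "C"})

def power_set(space):
    """Return every subset of `space` as a set of frozensets (hypothetical helper)."""
    items = sorted(space)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}

sigma_algebra = power_set(omega)

# Closure under complement: E in F implies Omega \ E in F.
closed_under_complement = all(omega - e in sigma_algebra for e in sigma_algebra)
# Closure under unions of pairs of members of F.
closed_under_union = all(a | b in sigma_algebra
                         for a in sigma_algebra for b in sigma_algebra)

print(len(sigma_algebra), closed_under_complement, closed_under_union)
```

The same enumeration is infeasible for a 52-card deck, where the collection of all subsets has 2^52 members.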

4/33


Probability space


Sample Spaces

5/33



Probability Measure

A probability measure on Ω is a function Pr from the subsets of Ω to ℝ that satisfies the following axioms:

1. For all events Ei, 0 ≤ Pr(Ei) ≤ 1;
2. Pr(Ω) = 1;
3. For any events E1, E2, . . . where Ei ∩ Ej = ∅ for every i ≠ j:

$$\Pr\left(\bigcup_{k=1}^{\infty} E_k\right) = \Pr(E_1) + \Pr(E_2) + \ldots = \sum_{k=1}^{\infty} \Pr(E_k).$$

A random experiment is therefore described by a probability triple (or probability space): (Ω, F, Pr).

6/33


Calculating with probabilities

Properties of the Probability Measure








Definitions

Union of two events: C = A ∪ B is the event that A or B (or both) occurs, i.e., ω ∈ C iff ω ∈ A or ω ∈ B.

Intersection of two events: C = A ∩ B is the event that both A and B occur, i.e., ω ∈ C iff ω ∈ A and ω ∈ B.

Complement of an event: B = A^c is the event that A does not occur, i.e., ω ∈ A^c iff ω ∉ A.

Disjoint events: events are disjoint if they have no outcomes in common, i.e., A and C are disjoint iff A ∩ C = ∅.

7/33




Useful laws

Commutative laws: A ∪ B = B ∪ A and A ∩ B = B ∩ A.

Associative laws: (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C).

Distributive laws: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).

De Morgan's laws: (A ∩ B)^c = A^c ∪ B^c and (A ∪ B)^c = A^c ∩ B^c.

8/33





Complement: Pr(E^c) = 1 − Pr(E).

Null / Empty Set: Pr(∅) = 0.

Subsets: If E1 and E2 are two events such that E1 ⊆ E2, thenPr (E1) ≤ Pr (E2).

Additive: If E1 and E2 are any two events, then:

Pr (E1 ∪ E2) = Pr (E1) + Pr (E2)− Pr (E1 ∩ E2) .

9/33





10/33




In the case of three events, we have:

$$\begin{aligned}
\Pr(E_1 \cup E_2 \cup E_3) &= \Pr((E_1 \cup E_2) \cup E_3) \\
&\overset{*}{=} \Pr(E_1 \cup E_2) + \Pr(E_3) - \Pr((E_1 \cup E_2) \cap E_3) \\
&\overset{**}{=} \Pr(E_1 \cup E_2) + \Pr(E_3) - \Pr((E_1 \cap E_3) \cup (E_2 \cap E_3)) \\
&\overset{***}{=} \Pr(E_1) + \Pr(E_2) + \Pr(E_3) - \Pr(E_1 \cap E_2) - \Pr(E_1 \cap E_3) \\
&\qquad - \Pr(E_2 \cap E_3) + \Pr(E_1 \cap E_2 \cap E_3).
\end{aligned}$$

* using the additive rule: Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) with A = (E1 ∪ E2) and B = E3.
** using the distributive law: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) with A = E1, B = E2, C = E3.
*** using the additive rule: Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) with A = E1 and B = E2, and again with A = (E1 ∩ E3) and B = (E2 ∩ E3), noting that (E1 ∩ E3) ∩ (E2 ∩ E3) = E1 ∩ E2 ∩ E3.

11/33
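The three-event formula can be sanity-checked on a small example. The sketch below assumes a sample space of 12 equally likely outcomes and three arbitrarily chosen events; none of these come from the lecture, they are purely illustrative.

```python
from fractions import Fraction

# Assumed sample space: 12 equally likely outcomes labelled 1..12.
omega = set(range(1, 13))

def pr(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

e1 = {n for n in omega if n % 2 == 0}   # even outcomes
e2 = {n for n in omega if n % 3 == 0}   # multiples of 3
e3 = {n for n in omega if n <= 4}       # outcomes at most 4

# Left-hand side: probability of the union, computed directly.
lhs = pr(e1 | e2 | e3)
# Right-hand side: the inclusion-exclusion expansion for three events.
rhs = (pr(e1) + pr(e2) + pr(e3)
       - pr(e1 & e2) - pr(e1 & e3) - pr(e2 & e3)
       + pr(e1 & e2 & e3))

print(lhs, rhs)
```

Exact fractions are used so that both sides can be compared with `==` rather than a floating-point tolerance.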




Inequality rules

Using the definition of a probability measure, one can prove the following inequalities:

Boole's Inequality: If E1, E2, . . . , En are any n events, then

$$\Pr\left(\bigcup_{k=1}^{n} E_k\right) \le \sum_{k=1}^{n} \Pr(E_k),$$

which follows from the additive law of probability.

Bonferroni's Inequality: If E1, E2, . . . , En are any n events, then:

$$\Pr(E_1 \cap \ldots \cap E_n) \ge 1 - \sum_{k=1}^{n} \Pr(E_k^c).$$

12/33




Inequality rules

13/33



Conditional Probability









The conditional probability of A, given B, is defined as:

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)},$$

provided Pr(B) > 0; otherwise Pr(A|B) = 0.

The multiplication rule immediately follows:

Pr(A ∩ B) = Pr(A|B) · Pr(B).

The following properties are also immediate:

1. Pr(A|B) ≥ 0;
2. Pr(A|A) = 1;
3. If A1, A2, . . . are mutually disjoint events, then

$$\Pr\left(\bigcup_{k=1}^{\infty} A_k \,\middle|\, B\right) = \sum_{k=1}^{\infty} \Pr(A_k \mid B).$$

14/33





15/33




Law of total probability

Law of total probability: If E1, E2, . . . are mutually disjoint (Ei ∩ Ej = ∅ for i ≠ j) and comprise the entire sample space ($\bigcup_{k=1}^{\infty} E_k = \Omega$), then for any event A ∈ F, we have:

$$\Pr(A) = \sum_{k=1}^{\infty} \Pr(A \mid E_k) \cdot \Pr(E_k).$$

16/33



Independence








Events A and B are said to be independent if:

Pr (A ∩ B) = Pr (A) · Pr (B) .

Equivalently, we have events A and B independent if:

Pr (A |B ) = Pr (A) and Pr (B |A) = Pr (B) .

Note that we say the collection of events E1, E2, . . . , En is independent if:

Pr(E1 ∩ E2 ∩ . . . ∩ En) = Pr(E1) · Pr(E2) · . . . · Pr(En).

For a collection of several events E1, E2, . . . , En, we say that the events are mutually independent if for any sub-collection E_{i1}, E_{i2}, . . . , E_{im}, we have:

Pr(E_{i1} ∩ E_{i2} ∩ . . . ∩ E_{im}) = Pr(E_{i1}) · Pr(E_{i2}) · . . . · Pr(E_{im}).

17/33



Independence

18/33



Independence

Mutually Exclusive Events: A and B are said to be mutually exclusive if they are mutually disjoint, i.e., A ∩ B = ∅ so that Pr(A ∩ B) = 0 and

Pr(A ∪ B) = Pr(A) + Pr(B).

We can generalize this to several events as follows: events E1, E2, . . . , En are said to be mutually exclusive events if no two have an element in common (mutually disjoint), so that

Pr(E1 ∪ E2 ∪ . . . ∪ En) = Pr(E1) + Pr(E2) + . . . + Pr(En).

19/33



Independence

Example

Consider the experiment of selecting n cards at random from a deck of 52 cards with 13 hearts (H), 13 diamonds (D), 13 spades (S), and 13 clubs (C) cards.

- False: For n = 2, the second card is independent of the first card.
- False: For n = 14, the 14th card is mutually independent of the first 13 cards.
- True: The previous two statements, but now when the card is put back in the deck before the next one is selected.
- True: For n = 3, A = "at least 2 H" and B = "at least 2 C" are mutually exclusive events.
- False: For n = 4, A = "at least 2 H" and B = "at least 2 C" are mutually exclusive events.
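The first and third statements of the card example can be illustrated by comparing a conditional probability with the unconditional one. This is a minimal sketch for the single event "the card is a heart"; the function name is hypothetical.

```python
from fractions import Fraction

HEARTS, CARDS = 13, 52

def second_heart_given_first_heart(replace: bool) -> Fraction:
    """Pr(2nd card is a heart | 1st card is a heart) for a 52-card deck."""
    if replace:
        # The first card is put back, so the deck is unchanged.
        return Fraction(HEARTS, CARDS)
    # One heart and one card are missing from the deck.
    return Fraction(HEARTS - 1, CARDS - 1)

# Unconditional Pr(2nd card is a heart); by symmetry it is 13/52 in both schemes.
pr_second_heart = Fraction(HEARTS, CARDS)

independent_without = second_heart_given_first_heart(False) == pr_second_heart
independent_with = second_heart_given_first_heart(True) == pr_second_heart
print(independent_without, independent_with)
```

Without replacement the conditional probability is 12/51, which differs from 13/52, so the draws are dependent; with replacement the two probabilities coincide.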

20/33



Bayes Theorem







Bayes Theorem: Suppose E1, E2, . . . represent a complete partitioning of the sample space Ω, then for any non-empty event A ∈ F (with Pr(A) > 0), we have:

$$\Pr(E_k \mid A) = \frac{\Pr(A \mid E_k) \cdot \Pr(E_k)}{\sum_{j=1}^{\infty} \Pr(A \mid E_j) \cdot \Pr(E_j)},$$

for any k = 1, 2, . . ..

21/33



Bayes Theorem

Example

An insurance company classifies its policyholders according to three risk classes: L (low risk), M (medium risk) and H (high risk). The proportion of H policyholders is 20% and the proportion of L policyholders is 50%. For each of the risk classes, the probability of a claim is 0.01 for L, 0.02 for M, and 0.04 for H.

a. Question: If a claim occurs, what is the probability that it is from an L (low risk) policyholder?

a. Solution: Let C be the event that there is a claim. Then we have:

$$\Pr(L \mid C) = \frac{\Pr(L \cap C)}{\Pr(C)} = \frac{0.005}{0.019} \approx 26\%,$$

where

Pr(L ∩ C) = Pr(C|L) · Pr(L) = 0.01 · 0.5 = 0.005,
Pr(C) = Pr(C|L) · Pr(L) + Pr(C|M) · Pr(M) + Pr(C|H) · Pr(H) = 0.01 · 0.5 + 0.02 · 0.3 + 0.04 · 0.2 = 0.019 (using the law of total probability, with Pr(M) = 1 − 0.5 − 0.2 = 0.3).

22/33



Bayes Theorem

Example (cont.)

b. Question: If a claim occurs, what is the probability that it is from an M (medium risk) policyholder?

c. Question: If a claim occurs, what is the probability that it is from an H (high risk) policyholder?

b. Solution: Similar to a.,

$$\Pr(M \mid C) = \frac{\Pr(M \cap C)}{\Pr(C)} = \frac{\Pr(C \mid M) \cdot \Pr(M)}{\Pr(C)} = \frac{0.3 \cdot 0.02}{0.019} \approx 32\%.$$

c. Solution: Similar to a. and b.,

$$\Pr(H \mid C) = \frac{\Pr(H \cap C)}{\Pr(C)} = \frac{\Pr(C \mid H) \cdot \Pr(H)}{\Pr(C)} = \frac{0.2 \cdot 0.04}{0.019} \approx 42\%.$$
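The three posterior probabilities can be computed in one pass. The sketch below uses exact fractions; the dictionary names `prior` and `claim_prob` are assumptions of this note, and Pr(M) = 0.3 is the complement of the two stated proportions.

```python
from fractions import Fraction

# Prior class proportions (M is the remainder: 1 - 0.5 - 0.2).
prior = {"L": Fraction(5, 10), "M": Fraction(3, 10), "H": Fraction(2, 10)}
# Probability of a claim within each risk class.
claim_prob = {"L": Fraction(1, 100), "M": Fraction(2, 100), "H": Fraction(4, 100)}

# Law of total probability: Pr(C) = sum over classes of Pr(C | class) * Pr(class).
pr_claim = sum(claim_prob[k] * prior[k] for k in prior)

# Bayes theorem: Pr(class | C) = Pr(C | class) * Pr(class) / Pr(C).
posterior = {k: claim_prob[k] * prior[k] / pr_claim for k in prior}

for k, v in posterior.items():
    print(k, v, f"{float(v):.1%}")
```

The posteriors are 5/19, 6/19 and 8/19, which round to the 26%, 32% and 42% quoted in the solutions and sum to one.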

23/33


Counting

Counting Principles







Multiplication Rule: Suppose S1, S2, . . . , Sm are m sets with respective numbers of elements n1, n2, . . . , nm. The number of ways of choosing one element from each set is given by:

n1 · n2 · . . . · nm.

Permutation: The number of ways of arranging n distinct objects is given by:

$$n! \equiv n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1, \qquad n! = n \cdot (n-1)!, \qquad 0! \equiv 1.$$

24/33


Counting

Counting Principles

Combination: The number of ways of choosing r objects from n distinct objects, where r ≤ n, is given by:

$$\binom{n}{r} \equiv \frac{n!}{r! \cdot (n-r)!}.$$

Multinomial: The number of ways that n objects can be grouped into r classes with n_k in the kth class, where k = 1, 2, . . . , r and $\sum_{k=1}^{r} n_k = n$, is given by:

$$\binom{n}{n_1, n_2, \ldots, n_r} \equiv \frac{n!}{n_1! \cdot n_2! \cdots n_r!}.$$

25/33


Counting

Counting Principles

Example

Question: An airline has 6 flights daily from Sydney to Honolulu, and 8 flights daily from Honolulu to Los Angeles. The airline offers no direct flight from Sydney to Los Angeles. If the flights are to be made on separate days, how many different flight arrangements can the airline offer from Sydney to Los Angeles?

Solution: Use the multiplication rule. Let:

S1 = "Flight from Sydney to Honolulu", with n1 = 6;
S2 = "Flight from Honolulu to Los Angeles", with n2 = 8,

then we have n1 · n2 = 6 · 8 = 48 different flight arrangements.

26/33


Counting

Counting Principles

Question: A committee is to consist of 4 academics and 2 practitioners, to be selected from a larger group of 8 academics and 5 practitioners. How many ways can you form a committee if:

a. there are no additional restrictions;
b. two of the chosen academics must be the two female members of the group of 8.

a. Solution: We use combinations & multiplication.

For the academics ("S1") we have n = 8 distinct ones and we need to choose r = 4 academics; possible number of ways: $\binom{n}{r} = \binom{8}{4} = 70$.

For the practitioners ("S2") we have n = 5 distinct ones and we need to choose r = 2 practitioners; possible number of ways: $\binom{n}{r} = \binom{5}{2} = 10$.

Total ways of forming a committee: n1 · n2 = 70 · 10 = 700.

27/33


Counting

Counting Principles

b. Solution: We use combinations & multiplication again.

- For the female academics ("S1") we have n = 2 distinct ones and we need to choose r = 2 female academics; possible number of ways: $\binom{n}{r} = \binom{2}{2} = 1$.

- For the male academics ("S2") we have n = 6 distinct ones and we need to choose r = 2 male academics; possible number of ways: $\binom{n}{r} = \binom{6}{2} = 15$.

- For the practitioners ("S3") we have n = 5 distinct ones and we need to choose r = 2 practitioners; possible number of ways: $\binom{n}{r} = \binom{5}{2} = 10$.

Total ways of forming a committee: n1 · n2 · n3 = 1 · 15 · 10 = 150.

Note: if every committee is equally likely, the probability of having the two female academic members is: 150/700 = 3/14 ≈ 0.214.

28/33
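Both committee counts follow directly from `math.comb` and the multiplication rule. The variable names below are illustrative only.

```python
from fractions import Fraction
from math import comb

# a. 4 of 8 academics and 2 of 5 practitioners, no restrictions.
unrestricted = comb(8, 4) * comb(5, 2)

# b. both female academics, 2 of the 6 male academics, 2 of 5 practitioners.
restricted = comb(2, 2) * comb(6, 2) * comb(5, 2)

# Share of all committees that contain both female academics.
share = Fraction(restricted, unrestricted)

print(unrestricted, restricted, share)
```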


Counting

Counting Principles

Question: How many different letter arrangements can be obtained from the letters of the word mississippi, using all the letters?

Solution: Let
- class one be "letter m", with n1 = 1;
- class two be "letter i", with n2 = 4;
- class three be "letter s", with n3 = 4;
- class four be "letter p", with n4 = 2.

Note: we have n = n1 + n2 + n3 + n4 = 11.

We use multinomial, hence there are

$$\binom{n}{n_1, n_2, n_3, n_4} = \frac{n!}{n_1! \cdot n_2! \cdot n_3! \cdot n_4!} = \frac{11!}{1! \cdot 4! \cdot 4! \cdot 2!} = 34650$$

different letter arrangements.

29/33
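The multinomial count can be reproduced by letting Python tally the letters. The helper `multinomial` is a hypothetical name introduced for this sketch, not a standard-library function.

```python
from collections import Counter
from math import factorial

def multinomial(counts):
    """Number of ways to arrange sum(counts) objects with the given class sizes."""
    counts = list(counts)
    total = factorial(sum(counts))
    for c in counts:
        # Each division is exact because the running quotient is itself
        # a multinomial-type count.
        total //= factorial(c)
    return total

letter_counts = Counter("mississippi")   # m:1, i:4, s:4, p:2
arrangements = multinomial(letter_counts.values())
print(dict(letter_counts), arrangements)
```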


Counting

Counting Principles

Question: An actuarial consulting company has four projects to do. Two of the projects require 3 actuaries, one requires 2 actuaries, and the other requires 4 actuaries. The company currently has 15 actuaries, of which 5 are females. How many different ways are there to assign the actuaries to the projects?

Solution: We have 15 distinct actuaries. Use combinations.

For the first project we need to choose r = 3 actuaries from n = 15 distinct actuaries, hence $\binom{n}{r} = \binom{15}{3} = 455$ ways.

For the second project we need to choose r = 3 actuaries from the n = 12 remaining distinct actuaries, hence $\binom{n}{r} = \binom{12}{3} = 220$ ways.

For the third project we need to choose r = 2 actuaries from the n = 9 remaining distinct actuaries, hence $\binom{n}{r} = \binom{9}{2} = 36$ ways.

For the last project we need to choose r = 4 actuaries from the n = 7 remaining distinct actuaries, hence $\binom{n}{r} = \binom{7}{4} = 35$ ways.

The total number of ways to assign the actuaries to the projects is 455 · 220 · 36 · 35 = 126,126,000.
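The sequential argument can be written as a short loop, and cross-checked against a single multinomial coefficient in which the 3 unassigned actuaries are treated as a fifth class. That cross-check is an addition of this note rather than part of the original solution.

```python
from math import comb, factorial

project_sizes = [3, 3, 2, 4]   # actuaries required by each project
remaining = 15                 # actuaries still unassigned

ways = 1
for size in project_sizes:
    ways *= comb(remaining, size)   # choose the team for this project
    remaining -= size               # those actuaries are no longer available

# Cross-check: multinomial with classes 3, 3, 2, 4 and the leftover group.
cross_check = factorial(15)
for size in project_sizes + [remaining]:
    cross_check //= factorial(size)

print(ways, cross_check, remaining)
```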

30/33


Counting

Computing Probabilities






Counting


Computing Probabilities

If the elements of the sample space Ω all have equal probability and assuming: (a) there are N elements in Ω and (b) the event A can occur in any of n mutually exclusive ways, then:

$$\Pr(A) = \frac{n}{N}.$$

Question: A drawer contains 10 pairs of socks. If 6 socks are taken at random and without replacement, compute the probability that there is at least one matching pair among these 6 socks.

Solution: Let M = "at least one matching pair among 6 socks". We have Pr(M) = 1 − Pr(M^c), where M^c = "no matching pair among 6 socks".

$$\Pr(M^c) = 1 \cdot \frac{18}{19} \cdot \frac{16}{18} \cdot \frac{14}{17} \cdot \frac{12}{16} \cdot \frac{10}{15} = 0.3467.$$

Hence, Pr(M) = 1 − Pr(M^c) = 1 − 0.3467 = 0.6533.

31/33
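The sock probability can be computed two ways: sequentially, as in the solution, and by counting (choose 6 of the 10 pairs, then one sock from each chosen pair). The counting version is an alternative route added here for comparison.

```python
from fractions import Fraction
from math import comb

# Sequential argument: the k-th sock drawn (k = 0..5) must avoid the
# partners of the k socks already drawn, leaving 20 - 2k good socks
# among the 20 - k that remain.
no_pair_sequential = Fraction(1)
for k in range(6):
    no_pair_sequential *= Fraction(20 - 2 * k, 20 - k)

# Counting argument: pick 6 distinct pairs, then left or right sock of each.
no_pair_counting = Fraction(comb(10, 6) * 2**6, comb(20, 6))

at_least_one_pair = 1 - no_pair_sequential
print(no_pair_sequential, float(at_least_one_pair))
```

Both routes give Pr(M^c) = 112/323 ≈ 0.3467, so Pr(M) = 211/323 ≈ 0.6533.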


Counting


Question: A committee of four individuals is to be formed from a group of 5 males and 6 females. What is the probability that the committee formed has both sexes equally represented?

Solution: Let E = "sexes are equally represented in the committee".

Number of combinations: E = {MMFF, MFMF, MFFM, FMMF, FMFM, FFMM}, i.e., n = 4 and r = 2: $\binom{n}{r} = \binom{4}{2} = 6$.

Each combination has equal probability, i.e.,

$$\Pr(MMFF) = \ldots = \Pr(FFMM) = \frac{5}{11} \cdot \frac{4}{10} \cdot \frac{6}{9} \cdot \frac{5}{8} = 0.0758.$$

Combining, we have:

$$\Pr(E) = \binom{4}{2} \cdot \frac{5}{11} \cdot \frac{4}{10} \cdot \frac{6}{9} \cdot \frac{5}{8} = 6 \cdot 0.0758 = \frac{5}{11}.$$
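The same answer follows from counting unordered committees, which avoids enumerating the six orderings. The sketch below compares both routes with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Ordered route: 6 orderings of two M and two F, each with the same probability.
one_ordering = Fraction(5, 11) * Fraction(4, 10) * Fraction(6, 9) * Fraction(5, 8)
pr_ordered = comb(4, 2) * one_ordering

# Unordered route: choose 2 of 5 males and 2 of 6 females out of all
# committees of 4 drawn from 11 people.
pr_unordered = Fraction(comb(5, 2) * comb(6, 2), comb(11, 4))

print(pr_ordered, pr_unordered)
```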

32/33


Counting


Odds and probabilities

Sometimes, we get confused between odds and probabilities. The odds that an event will occur is the ratio of the probability that the event will occur to the probability it will not occur, provided neither probability is zero. That is, if E is the event, the odds that E will occur is:

$$\frac{\Pr(E)}{\Pr(E^c)} = \frac{\Pr(E)}{1 - \Pr(E)}.$$

Often, odds are quoted in terms of positive integers. For example, the odds are a to b that an event will occur. Then they mean that the probability it will occur is:

$$\Pr(E) = \frac{a}{a + b}.$$
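The two conversions are inverses of each other, as the short sketch below illustrates. Both function names are hypothetical, and odds of 3 to 1 are used only as an example.

```python
from fractions import Fraction

def odds_to_probability(a, b):
    """Probability of an event quoted at odds of a to b."""
    return Fraction(a, a + b)

def probability_to_odds(p):
    """Odds Pr(E) / (1 - Pr(E)); undefined when Pr(E) is 0 or 1."""
    p = Fraction(p)
    if p in (0, 1):
        raise ValueError("odds are undefined when the probability is 0 or 1")
    return p / (1 - p)

p = odds_to_probability(3, 1)    # odds of 3 to 1
odds = probability_to_odds(p)    # back to the ratio 3/1
print(p, odds)
```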

33/33

ACTL2002/ACTL5101 Probability and Statistics: Week 1



101/144


Moments and measures of dispersion

Introduction
- Course introduction

Introduction in probability
- Exercise

Mathematical methods
- Random variables and distributions
- Measures of location: probability distributions
- Measures of dispersion
- r th central/non-central moments
- Generating functions

Summary
- Summary


Introduction

Course introduction

Course overview

Week 1: General introduction in probability.
Week 2-4: Distribution functions:
- univariate & multivariate special distributions;
- joint distributions;
- functions of distributions.
Week 5-6: Parameter estimation.
Week 7-9: Hypothesis testing.
Week 10-12: Linear regression.

102/144


Introduction

Course introduction

This week

Describing a distribution using measures of location and dispersion:
- mean;
- variance;
- skewness;
- kurtosis.

Calculating these measures:
- central moments;
- non-central moments;
- generating functions.

103/144


Introduction in probability

Exercise









Exercise

Sample spaces

An insurer offers health insurance. An individual can file either one, two, three, or no claims in a given quarter. For simplicity, it is not possible to file more than three claims.

Questions:
a. State all possible events.
b. Give the sample space.
c. Give the σ-algebra.

Solutions:
a. E ∈ {{0}, {1}, {2}, {3}, {0,1}, {0,2}, {0,3}, {1,2}, {1,3}, {2,3}, {0,1,2}, {0,1,3}, {0,2,3}, {1,2,3}, {0,1,2,3}, ∅}.

b. Ω = {0, 1, 2, 3}.

c. F = {{0}, {1}, {2}, {3}, {0,1}, {0,2}, {0,3}, {1,2}, {1,3}, {2,3}, {0,1,2}, {0,1,3}, {0,2,3}, {1,2,3}, {0,1,2,3}, ∅}.

104/144



Exercise


We have the following probabilities: Pr(C = 0) = 0.8,Pr(C = 1) = 0.1, Pr(C = 2) = 8/90, and Pr(C = 3) = 1/90.

Questions:
a. What is the probability of 2 or 3 claims given that there is an odd number of claims? Same for an even number of claims.

b. Using a., Pr(even) = 8/9, Pr(odd) = 1/9, and the law of total probability, find Pr(C ≥ 2).

c. Are "odd number of claims" and "two or three claims" independent?

d. Using a. and Bayes theorem, find the probability of an odd number of claims given that the number of claims is two or three.

105/144



Exercise

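Solutions:
a. $$\Pr(C \in \{2,3\} \mid C \in \{1,3\}) = \frac{\Pr(C \in \{2,3\} \cap C \in \{1,3\})}{\Pr(C \in \{1,3\})} = \frac{\Pr(C = 3)}{\Pr(C = 1) + \Pr(C = 3)} = \frac{1}{10},$$ and
$$\Pr(C \in \{2,3\} \mid C \in \{0,2\}) = \frac{\Pr(C = 2)}{\Pr(C = 0) + \Pr(C = 2)} = \frac{1}{10}.$$

b. $$\Pr(\{2,3\}) = \Pr(\{2,3\} \mid \text{odd}) \cdot \Pr(\text{odd}) + \Pr(\{2,3\} \mid \text{even}) \cdot \Pr(\text{even}) = \frac{1}{9} \cdot \frac{1}{10} + \frac{8}{9} \cdot \frac{1}{10} = \frac{1}{10}.$$

c. Pr(C ∈ {2,3} | C ∈ {1,3}) = 1/10 = Pr({2,3}), thus independent.

d. $$\Pr(C \in \{1,3\} \mid C \in \{2,3\}) = \frac{\Pr(C \in \{2,3\} \mid C \in \{1,3\}) \cdot \Pr(C \in \{1,3\})}{\Pr(C \in \{2,3\} \mid C \in \{1,3\}) \cdot \Pr(C \in \{1,3\}) + \Pr(C \in \{2,3\} \mid C \in \{0,2\}) \cdot \Pr(C \in \{0,2\})} = \frac{\frac{1}{10} \cdot \frac{1}{9}}{\frac{1}{10} \cdot \frac{1}{9} + \frac{1}{10} \cdot \frac{8}{9}} = \frac{1}{9}.$$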
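All four answers can be reproduced from the probability mass function of the number of claims. This is a sketch with exact fractions; the helpers `pr` and `cond` are hypothetical names.

```python
from fractions import Fraction

# Pr(C = c) for c = 0, 1, 2, 3 claims in a quarter.
pmf = {0: Fraction(8, 10), 1: Fraction(1, 10), 2: Fraction(8, 90), 3: Fraction(1, 90)}

def pr(event):
    """Probability that the number of claims falls in `event`."""
    return sum(pmf[c] for c in event)

def cond(a, b):
    """Conditional probability Pr(a | b), assuming Pr(b) > 0."""
    return pr(a & b) / pr(b)

two_or_three, odd, even = {2, 3}, {1, 3}, {0, 2}

a_odd, a_even = cond(two_or_three, odd), cond(two_or_three, even)
# b. law of total probability over the partition {odd, even}.
total = a_odd * pr(odd) + a_even * pr(even)
# c. independence: conditioning on "odd" leaves the probability unchanged.
independent = a_odd == pr(two_or_three)
# d. Bayes theorem.
bayes = a_odd * pr(odd) / total

print(a_odd, a_even, total, independent, bayes)
```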

106/144



Exercise

Counting principles

The claims history of an individual is as follows: he had 3 quarters with 2 claims, 2 quarters with 1 claim and 15 quarters without a claim.

Questions:
a. What is the probability that the insured had first 15 quarters without a claim and then 5 quarters with at least one claim?
b. What is the probability that the insured had first 15 quarters without a claim and then 2 quarters with one claim and then 3 quarters with two claims?
c. Comment on your results.

Solutions:
a. Use Combination, n = 20, r = 5, number of ways of choosing objects: $\binom{20}{5} = 15{,}504$. Thus, the probability is 1/15,504.

b. Use Multinomial, n = 20, r1 = 15, r2 = 2, r3 = 3, number of ways of choosing objects: $\frac{20!}{15! \cdot 2! \cdot 3!} = 155{,}040$. Thus, the probability is 1/155,040.

107/144


Mathematical methods

Random variables and distributions











A random variable is generally a quantity X whose value depends on the outcome of a random experiment. It is a mapping from the sample space Ω to the set of real numbers ℝ. That is,

X : Ω → ℝ.

The cumulative distribution function, abbreviated c.d.f., of Xis defined by:

FX (x) = Pr (X ≤ x) , for all x .

The survival function of X is defined by:

SX (x) = 1− FX (x) = Pr(X > x).

108/144




Distribution function

Properties of a distribution function:
1. FX(·) is a non-decreasing function, i.e., FX(x1) ≤ FX(x2) whenever x1 ≤ x2;
2. FX(·) is right-continuous, that is, for all x, $\lim_{\varepsilon \to 0^+} F_X(x + \varepsilon) = F_X(x)$;
3. FX(−∞) = 0;
4. FX(+∞) = 1.

Types of Random Variables:
- continuous;
- discrete;
- mixed.

109/144




[Figure: six candidate functions, labelled A to F, plotted as F(x) against x for x from −10 to 10.]

Question: Which one is/are distribution functions?

Solution: B and F. Not non-decreasing: A; not right-continuous: C; F(−∞) ≠ 0: D; F(∞) ≠ 1: E.

110/144




The probability mass function (p.m.f.) is defined by:

$$p_X(x_k) = \Pr(X = x_k) = F_X(x_k) - F_X(x_{k-1}).$$

The probability density function (p.d.f.) is defined by:

$$f_X(x) = \frac{\partial}{\partial x} F_X(x).$$

Note that:
- p.m.f. satisfies $\sum_{k=0}^{\infty} p_X(x_k) = 1$;
- p.m.f. requires the right-continuous property;
- p.d.f. satisfies $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

111/144



Measures of location: probability distributions











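Let X be a r.v. with p.m.f. pX (if discrete) or p.d.f. fX (if continuous).

The expected value of X is:

$$\mu_X = E[X] = \begin{cases} \sum_{\text{all } x} x \cdot p_X(x), & \text{if } X \text{ is discrete;} \\ \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

Exercise: Calculate E[X] when X is a r.v. with p.d.f.:

$$f_X(x) = \begin{cases} 1, & \text{if } 0 \le x \le 1; \\ 0, & \text{else.} \end{cases}$$

Solution:

$$E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx = \int_{0}^{1} x \cdot f_X(x)\,dx = \left[\tfrac{1}{2} x^2\right]_0^1 = \tfrac{1}{2}.$$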

112/144




Mathematical expectation of functions of a r.v.

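Let X be a r.v. with p.m.f. pX (if discrete) or p.d.f. fX (if continuous) and let h(X) be a real-valued function.

$$E[h(X)] = \begin{cases} \sum_{\text{all } x} h(x) \cdot p_X(x), & \text{if } X \text{ is discrete;} \\ \int_{-\infty}^{\infty} h(x) \cdot f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

Exercise: Let X be a r.v. with p.m.f. pX(x) = 1/10 for x = 0, 1, 2, . . . , 9 and zero otherwise, and let h(X) = X^2. Calculate E[h(X)].

Solution:

$$E[h(X)] = \sum_{\text{all } x} h(x) \cdot p_X(x) = \sum_{x=0}^{9} x^2 \cdot \frac{1}{10} = 28.5.$$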
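The discrete formula for E[h(X)] translates directly into a weighted sum. A minimal sketch, with `expectation` as a hypothetical helper name:

```python
from fractions import Fraction

# Uniform p.m.f. on the digits 0..9.
pmf = {x: Fraction(1, 10) for x in range(10)}

def expectation(h, pmf):
    """E[h(X)] for a discrete r.v. with the given p.m.f."""
    return sum(h(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf)              # E[X]
second_moment = expectation(lambda x: x**2, pmf)  # E[X^2]
print(mean, second_moment, float(second_moment))
```

Note that E[X^2] = 28.5 differs from (E[X])^2 = 20.25; in general E[h(X)] is not h(E[X]).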

113/144




Properties of the expected value operator: Let X and Y be random variables, and m, b ∈ ℝ, we have:

- E[mX + b] = mE[X] + b;
- E[X + Y] = E[X] + E[Y];
- E[X · Y] = E[X] · E[Y], if X, Y are independent.

Proof for discrete r.v.: (* using X and Y independent)

$$\begin{aligned}
E[mX + b] &= \sum_x (mx + b)\, p_X(x) = m \sum_x x\, p_X(x) + b = mE[X] + b \\
E[X + Y] &= \sum_{x,y} (x + y)\, p_{X,Y}(x,y) = \sum_x \sum_y \big(x\, p_{X,Y}(x,y) + y\, p_{X,Y}(x,y)\big) \\
&= \sum_{x,y} x\, p_{X,Y}(x,y) + \sum_{x,y} y\, p_{X,Y}(x,y) = E[X] + E[Y] \\
E[X \cdot Y] &= \sum_{x,y} (x \cdot y) \cdot p_{X,Y}(x,y) \overset{*}{=} \sum_x \sum_y (x \cdot p_X(x)) \cdot (y \cdot p_Y(y)) \\
&= \sum_x (x \cdot p_X(x)) \cdot \sum_y (y \cdot p_Y(y)) = E[X] \cdot E[Y].
\end{aligned}$$

114/144




Example

An insurance company offers motor vehicle insurance. The probability that an insured files a claim is 20%. Assume that the insured files no more than one claim.

a. Question: What is the probability mass function?

b. Question: What is the expected number of claims?

c. Question: Assume that there are 150 insured. What is the expected number of claims?

The claims will be paid at the end of the year. The required capital will depend on the investment return. A 10% increase in asset value occurs w.p. 20% and a decrease of 5% occurs w.p. 15%. The claim value is $1,000 for each claim.

d. Question: What is the expected value the insurer needs to cover the claims?

115/144




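Solution

a. $$p_X(x) = \begin{cases} 0.2, & \text{if } x = 1; \\ 0.8, & \text{if } x = 0, \end{cases}$$ and zero otherwise.

b. E[X] = Σ_{all x} pX(x) · x = 0.2 · 1 + 0.8 · 0 = 0.2.

c. E[150 · X] = 150 · E[X] = 150 · 0.2 = 30.

d. Let Y be the random variable for the capital needed now per $1 of claims paid at the end of the year, i.e., the reciprocal of the growth factor of the asset value. We have:

$$p_Y(y) = \begin{cases} 0.2, & \text{if } y = 1/1.1; \\ 0.65, & \text{if } y = 1; \\ 0.15, & \text{if } y = 1/0.95, \end{cases}$$ and zero otherwise.

$$E[1000 \cdot Y] = 1000 \cdot E[Y] = 1000 \cdot \left(0.2 \cdot \frac{1}{1.1} + 0.65 \cdot 1 + 0.15 \cdot \frac{1}{0.95}\right) = 1000 \cdot 0.98971 = 989.71.$$

$$E[(150 \cdot X) \cdot (1000 \cdot Y)] \overset{*}{=} E[150 \cdot X] \cdot E[1000 \cdot Y] = 30 \cdot 989.71 \approx 29{,}691.$$

* using independence between X and Y.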
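The arithmetic in part d. is easy to get wrong by hand, so a numerical evaluation is a useful check. The sketch below evaluates 0.2/1.1 + 0.65 + 0.15/0.95 directly, which comes to about 0.98971; the variable names are assumptions of this note.

```python
# p.m.f. of the number of claims per insured.
claim_pmf = {1: 0.2, 0: 0.8}
# p.m.f. of the capital needed now per $1 paid at year end (reciprocal growth factor).
capital_pmf = {1 / 1.1: 0.2, 1.0: 0.65, 1 / 0.95: 0.15}

def expectation(pmf):
    """E[V] for a discrete r.v. given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())

expected_claims = 150 * expectation(claim_pmf)            # part c.
expected_capital_per_claim = 1000 * expectation(capital_pmf)
# Independence of X and Y lets the expectation of the product factorise.
expected_required = expected_claims * expected_capital_per_claim

print(round(expected_claims, 2),
      round(expected_capital_per_claim, 2),
      round(expected_required, 2))
```

Under these inputs the expected capital per claim is about $989.71 and the expected total is about $29,691.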

116/144



Measures of dispersion











X − µX gives the deviation from the expected value.

Question: Can we use the expected deviation as a measure of dispersion: E[X − µX]?

Solution: No, we have that: E[X − µX] = E[X] − E[µX] = µX − µX = 0.

One measure of dispersion: Mean absolute deviation:

MAD(X) = E[|X − µX|].

Note: the mean absolute deviation about a point a, E[|X − a|], is minimized when a is the median of the distribution.

117/144




Variance of X

MAD is not easy to calculate, and does not have nice properties (it is not related to the moments of the distribution).

Another measure of dispersion: Let X be a random variable, the variance is given by:

$$\sigma^2 = Var(X) = E\left[(X - \mu_X)^2\right] = E\left[X^2\right] - \mu_X^2.$$

The function $E\left[(X - \alpha)^2\right]$ is minimized when α = µX.

The standard deviation of X is given by:

$$\sigma = \sqrt{Var(X)}.$$

118/144




Properties of the variance: Let X and Y be random variables, and m, b, c ∈ ℝ, we have:

- Var(c) = 0;
- Var(mX + b) = m^2 · Var(X);
- Var(X + Y) = Var(X) + Var(Y), if X, Y are independent.

Proof:

$$\begin{aligned}
Var(c) &= E\left[(c - \mu_c)^2\right] = E\left[(c - c)^2\right] = 0 \\
Var(mX + b) &= E\left[(mX + b - \mu_{mX+b})^2\right] = E\left[(mX + b - m\mu_X - b)^2\right] \\
&= m^2 \cdot E\left[(X - \mu_X)^2\right] = m^2 \cdot Var(X) \\
Var(X + Y) &= E\left[((X - \mu_X) + (Y - \mu_Y))^2\right] \\
&= E\left[(X - \mu_X)^2 + (Y - \mu_Y)^2 + 2(X - \mu_X) \cdot (Y - \mu_Y)\right] \\
&\overset{*}{=} Var(X) + Var(Y),
\end{aligned}$$

* using independence between X and Y, so that E[(X − µX) · (Y − µY)] = 0.

119/144




Exercises

The claims of the motor vehicle insurance are themselves stochastic. The distribution of the claim value of an insured is:

$$f_X(x) = \begin{cases} 0, & \text{if } x < 0; \\ \frac{\lambda}{5} \cdot \exp(-\lambda \cdot x), & \text{if } x > 0; \end{cases} \qquad p_X(0) = 0.8.$$

a. Question: Find the expected value of the claim size.

b. Question: Find the variance of the claim size.

c. Question: The price of an insurance contract is the expected value plus half the standard deviation. Find the price of the MVI contract.

d. Question: Same as c., but now for the 150 contracts together.

120/144




Solution

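a. $$E[X] = 0 \cdot 0.8 + \int_0^{\infty} \frac{\lambda}{5} \exp(-\lambda x) \cdot x\,dx = 0 + \frac{\lambda}{5} \cdot \left[\frac{\exp(-\lambda x)}{(-\lambda)^2} \cdot (-\lambda x - 1)\right]_0^{\infty} = \frac{\lambda}{5} \cdot \left(0 - \left(\frac{1}{\lambda^2} \cdot (-1)\right)\right) = \frac{1}{5\lambda}.$$

b. $$\begin{aligned}
E\left[X^2\right] &= 0^2 \cdot 0.8 + \int_0^{\infty} \frac{\lambda}{5} \exp(-\lambda x) \cdot x^2\,dx \\
&= 0 + \frac{\lambda}{5} \cdot \left[\exp(-\lambda x) \cdot \left(\frac{x^2}{-\lambda} - \frac{2x}{(-\lambda)^2} + \frac{2}{(-\lambda)^3}\right)\right]_0^{\infty} \\
&= \frac{\lambda}{5} \cdot \left(0 - \left(\frac{2}{-\lambda^3}\right)\right) = \frac{2}{5\lambda^2}, \\
Var(X) &= E\left[X^2\right] - (E[X])^2 = \frac{2}{5\lambda^2} - \frac{1}{25\lambda^2} = \frac{9}{25\lambda^2}.
\end{aligned}$$

c. $$\text{Price} = E[X] + 0.5 \cdot \sqrt{Var(X)} = \frac{1}{5\lambda} + 0.5 \cdot \frac{3}{5\lambda} = \frac{1}{2\lambda}.$$

d. Assuming the 150 contracts are independent and identically distributed:

$$\text{Price} = E\left[\sum_{i=1}^{150} X_i\right] + 0.5 \cdot \sqrt{Var\left(\sum_{i=1}^{150} X_i\right)} = 150 \cdot E[X] + 0.5 \cdot \sqrt{150 \cdot Var(X)} = \frac{150}{5\lambda} + 0.5 \cdot \frac{3\sqrt{150}}{5\lambda} = \frac{30 + 0.3\sqrt{150}}{\lambda} \approx \frac{33.67}{\lambda}.$$

121/144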
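The moments and prices can be checked numerically for a specific rate. The sketch below assumes λ = 1 (an arbitrary illustrative value) and uses a simple midpoint rule on [0, 60] in place of the exact integrals; the point mass at zero contributes nothing to either moment. The portfolio price assumes independent contracts.

```python
from math import exp, sqrt

def integrate(g, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of g over [0, upper]."""
    h = upper / steps
    return h * sum(g((i + 0.5) * h) for i in range(steps))

lam = 1.0  # assumed rate parameter, for illustration only

def density(x):
    """Continuous part of the claim-size distribution for x > 0."""
    return lam / 5 * exp(-lam * x)

mean = integrate(lambda x: x * density(x))        # expect 1 / (5 lam)
second = integrate(lambda x: x**2 * density(x))   # expect 2 / (5 lam^2)
variance = second - mean**2                       # expect 9 / (25 lam^2)

price_single = mean + 0.5 * sqrt(variance)
# Portfolio of 150 independent contracts: means and variances both add.
price_portfolio = 150 * mean + 0.5 * sqrt(150 * variance)

print(round(mean, 4), round(variance, 4),
      round(price_single, 4), round(price_portfolio, 2))
```

With λ = 1 the portfolio price evaluates to about 33.67, well below 150 times the single-contract price of 0.5, because the standard deviation of the sum grows with √150 rather than with 150.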




Skewness

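Let X be a random variable. The skewness of X is given by:

$$\gamma_X = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^3\right].$$

Coefficient of skewness:

$$\alpha_3 = \frac{E\left[X^3\right]}{\left(E\left[X^2\right]\right)^{3/2}}.$$

Properties of skewness (σ_{X+Y} is the s.d. of X + Y):

- γX = 0, if the distribution of X is symmetric;
- γ_{m·X+b} = sign(m) · γX;
- $\gamma_{X+Y} = \dfrac{\gamma_X \cdot \sigma_X^3 + \gamma_Y \cdot \sigma_Y^3}{\sigma_{X+Y}^3}$, if X, Y are independent.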

122/144




Which statements are true?

Note: E[X] = 0 and σ = 1 for i) and ii), and E[X] = −4 and σ = 3 for iii).

[Figure: probability mass functions Pr(X = x) of three discrete distributions, with probabilities 1/6, 1/3 and 1/2 marked on the vertical axis. Distribution i) has mass at x = −1, 0, 2; distribution ii) at x = −2, 0, 1; distribution iii) at x = −10, −4, −1.]

a. Question: Distribution i) has a positive skewness.

a. Solution: True: skewness is 1.

b. Question: Distribution ii) has a positive skewness.

b. Solution: False: skewness is −1.

c. Question: Distribution iii) has a smaller skewness than distribution ii).

c. Solution: False: both have the SAME skewness.
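The three skewness values can be recomputed from the definition. In the sketch below the support points are read off the plot axes, while the assignment of the probabilities 1/6, 1/3 and 1/2 to those points is an assumption: it is the assignment that reproduces the stated means and standard deviations.

```python
from fractions import Fraction as F

def skewness(pmf):
    """Third standardized moment E[((X - mu) / sigma)^3] of a discrete r.v."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    third = sum((x - mean) ** 3 * p for x, p in pmf.items())
    return float(third) / float(var) ** 1.5

# Assumed probability assignments (see lead-in).
dist_i = {-1: F(1, 3), 0: F(1, 2), 2: F(1, 6)}
dist_ii = {-2: F(1, 6), 0: F(1, 2), 1: F(1, 3)}
dist_iii = {-10: F(1, 6), -4: F(1, 2), -1: F(1, 3)}

for name, d in [("i", dist_i), ("ii", dist_ii), ("iii", dist_iii)]:
    print(name, skewness(d))
```

Distribution iii) is distribution ii) scaled by 3 and shifted by −4, so its skewness is unchanged, in line with γ_{m·X+b} = sign(m) · γX.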

123/144




[Figure: Snedecor's F distribution with v1 = 20, v2 = 40, for x from 0 to 4. Left panel: probability density function (p.d.f.). Right panel: cumulative distribution function (c.d.f.).]

Question: Positive/negative skew (skewed to the right/left)?

Solution: Positive skew (skewed to the right).

124/144




[Figure: Beta distribution with a = 4, b = 2, for x from 0 to 1. Left panel: probability density function (p.d.f.). Right panel: cumulative distribution function (c.d.f.).]

Question: Positive/negative skew (skewed to the right/left)?

Solution: Negative skew (skewed to the left).

125/144




Kurtosis

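Let X be a random variable. The (excess) kurtosis of X is given by:

$$\kappa_X = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^4\right] - 3.$$

It measures the peakedness (positive) or flatness (negative) of the distribution of a random variable.

Kurtosis coefficient:

$$\alpha_4 = \frac{E\left[X^4\right]}{\left(E\left[X^2\right]\right)^2}.$$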

126/144




Which statements are true?

Note: E[X] = 0, σ = 1, and γ = 0 for i) and ii), and E[X] = −4, σ = 3 and γ = 0 for iii).

[Figure: probability mass functions Pr(X = x) of three symmetric discrete distributions. Distribution i) has mass at x = −3, 0, 3 with probabilities 1/18, 8/9, 1/18; distribution ii) at x = −2, 0, 2 with probabilities 1/8, 3/4, 1/8; distribution iii) at x = −10, −4, 2 with probabilities 1/8, 3/4, 1/8.]

a. Question: Distribution ii) has a smaller excess kurtosis than distribution i).

a. Solution: True: excess kurtosis for i) is 6 and for ii) is 1.

c. Question: Distribution iii) has a smaller excess kurtosis than distribution ii).

c. Solution: False: both have the SAME excess kurtosis.
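The excess kurtosis values can be verified from the definition. The sketch assumes the symmetric probability assignments suggested by the plot ticks (1/18 and 8/9 for i), 1/8 and 3/4 for ii) and iii)); these reproduce the stated means and standard deviations.

```python
from fractions import Fraction as F

def excess_kurtosis(pmf):
    """E[((X - mu) / sigma)^4] - 3 for a discrete r.v. given as {x: Pr(X = x)}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    fourth = sum((x - mean) ** 4 * p for x, p in pmf.items())
    # Exact arithmetic: the fourth standardized moment is fourth / var^2.
    return fourth / var**2 - 3

# Assumed probability assignments (see lead-in).
dist_i = {-3: F(1, 18), 0: F(8, 9), 3: F(1, 18)}
dist_ii = {-2: F(1, 8), 0: F(3, 4), 2: F(1, 8)}
dist_iii = {-10: F(1, 8), -4: F(3, 4), 2: F(1, 8)}

for name, d in [("i", dist_i), ("ii", dist_ii), ("iii", dist_iii)]:
    print(name, excess_kurtosis(d))
```

As with skewness, the affine transformation from ii) to iii) leaves the standardized fourth moment unchanged.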

127/144




[Figure: p.d.f. (left panel) and c.d.f. (right panel), for x from −2 to 2, of three distributions: Uniform(−√3, √3), Normal(0, 1) and Laplace(0, 1/√2).]

Question: Positive/none/negative excess kurtosis?

Solution: Positive: green (Laplace); none: red (Normal); negative: blue (Uniform) excess kurtosis.

Note: E[X] = 0, Var(X) = 1 and γX = 0 for all distributions.

128/144




Exercises

An insurance company is pricing its policies using the standard deviation pricing principle.

The regulator requires that insurers have enough reserves in order to reduce the probability of ruin to 0.5%.

Holding capital is a cost.

a. Would the insurance company prefer claims with a distribution
I. with mean $100 and standard deviation of $5;
II. with mean $50 and standard deviation of $10?

b. Would the insurance company prefer claims with a distribution
I. with mean $100 and standard deviation of $5 and a skewness of $5;
II. with mean $100 and standard deviation of $5 and a skewness of −$5?

129/144




Exercises

c. Would the insurance company prefer claims with a distribution
I. with mean $100 and standard deviation of $5 and a skewness of $0 and a kurtosis of $5;
II. with mean $100 and standard deviation of $5 and a skewness of $0 and a kurtosis of $2?

d. Would the insurance company prefer claims with a distribution
I. with mean $100 and standard deviation of $5, a positive skewness and a negative kurtosis;
II. with mean $100 and standard deviation of $5, a negative skewness and a positive kurtosis?

Solution:
a. II.
b. II.
c. II.
d. Cannot say from the question.

130/144



r th central/non-central moments










r th central moments

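Let X be a r.v., the rth central moment is given by:

$$E\left[(X - \mu_X)^r\right] = \begin{cases} \sum_{\text{all } x} (x - \mu_X)^r \cdot p_X(x), & \text{if } X \text{ is discrete;} \\ \int_{-\infty}^{\infty} (x - \mu_X)^r \cdot f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

Relation central & non-central moments:

$$E\left[(X - \mu_X)^r\right] \overset{*}{=} \sum_{k=0}^{r} \binom{r}{k} \cdot E\left[X^k\right] \cdot (-\mu_X)^{r-k}.$$

* using binomial expansion.

131/144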




(Non-central) Moments

Let X be a r.v., the rth (non-central) moment is given by:

$$E\left[X^r\right] = \begin{cases} \sum_{\text{all } x} x^r \cdot p_X(x), & \text{if } X \text{ is discrete;} \\ \int_{-\infty}^{\infty} x^r \cdot f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

Consider the Motor Vehicle insurer from slide 115.

Recall the mean of the claim size from b. on slide 116.

a. Question: Find the second central moment for an insured.

a. Solution: Var(X) = E[X^2] − (E[X])^2 = (0.2 · 1^2 + 0.8 · 0^2) − 0.2^2 = 0.16, or

E[(X − µX)^2] = (1 − 0.2)^2 · 0.2 + (0 − 0.2)^2 · 0.8 = 0.16.





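Exercises

b. Question: Find the skewness using the third central moment.

b. Solution: Start with the third central moment:

\[
E\left[(X - \mu_X)^3\right] = 0.2 \cdot (1 - 0.2)^3 + 0.8 \cdot (0 - 0.2)^3 = 0.1024 - 0.0064 = 0.096.
\]

\[
\gamma_X = E\left[\frac{(X - \mu_X)^3}{\sigma^3}\right] = \frac{E\left[(X - \mu_X)^3\right]}{\sigma^3} = \frac{0.096}{0.16^{3/2}} = 1.5.
\]

c. Question: Find the second and third non-central moments.

c. Solution:

\[
E\left[X^2\right] = 1^2 \cdot 0.2 + 0^2 \cdot 0.8 = 0.2 \quad \text{and} \quad E\left[X^3\right] = 1^3 \cdot 0.2 + 0^3 \cdot 0.8 = 0.2.
\]

d. Question: Find the skewness using only non-central moments.

d. Solution:

\[
\begin{aligned}
E\left[(X - \mu_X)^3\right] &= E\left[X^3 - 3\mu_X X^2 + 3\mu_X^2 X - \mu_X^3\right] \\
&= E\left[X^3\right] - 3\mu_X E\left[X^2\right] + 3\mu_X^2 E[X] - \mu_X^3 \\
&= 0.2 - 0.12 + 0.024 - 0.008 = 0.096,
\end{aligned}
\]

so that, as in b., \( \gamma_X = 0.096 / 0.16^{3/2} = 1.5 \).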
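The binomial relation between central and non-central moments can be sketched in a few lines of Python. This is a minimal illustration, not part of the original exercise: the helper name `central_from_raw` is hypothetical, and exact fractions are used so the variance and third central moment can be compared with the hand calculation.

```python
from fractions import Fraction
from math import comb


def central_from_raw(raw, r):
    """r-th central moment from raw moments, where raw[k] = E[X^k] and raw[0] = 1."""
    mu = raw[1]
    return sum(comb(r, k) * raw[k] * (-mu) ** (r - k) for k in range(r + 1))


# Two-point claim distribution from the exercise: Pr(X = 1) = 0.2, Pr(X = 0) = 0.8,
# so E[X^k] = 0.2 for every k >= 1.
p = Fraction(1, 5)
raw = [Fraction(1)] + [p] * 4

variance = central_from_raw(raw, 2)       # 4/25, i.e. 0.16
third_central = central_from_raw(raw, 3)  # 12/125, i.e. 0.096
skewness = float(third_central) / float(variance) ** 1.5
print(variance, third_central, skewness)
```

The skewness is converted to floating point only at the last step, because the power 3/2 is not a rational operation.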




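Knowledge of the mean, variance, skewness and kurtosis can provide useful knowledge of the distribution.

Theorem: The complete set of all moments is required to characterize an arbitrary distribution, i.e., every distinct distribution function has a unique set of moments (provided the moment series below converges for t in an open interval containing zero).

Proof: Using Taylor series:

\[
\begin{aligned}
1 + E[X] \cdot t + E\left[X^2\right] \cdot \frac{t^2}{2!} + E\left[X^3\right] \cdot \frac{t^3}{3!} + \ldots
&= E\left[1 + X \cdot t + X^2 \cdot \frac{t^2}{2!} + X^3 \cdot \frac{t^3}{3!} + \ldots\right] \\
&\overset{*}{=} E\left[e^{Xt}\right],
\end{aligned}
\]

* using \( \exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots \), with \( x = X \cdot t \).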



Generating functions










Moment generating function (m.g.f.) of a r.v.

The moment generating function of a r.v. X is defined as:

\[
M_X(t) = E\left[e^{X \cdot t}\right] = 1 + E[X] \cdot t + E\left[X^2\right] \cdot \frac{t^2}{2!} + E\left[X^3\right] \cdot \frac{t^3}{3!} + \ldots .
\]

Properties of m.g.f.:

- \( M_X^{(r)}(0) = E\left[X^r\right] \), for r = 0, 1, 2, 3, . . . ;
- \( M_{m \cdot X + b}(t) = M_X(m \cdot t) \cdot e^{b \cdot t} \), for constants m, b;
- \( M_{X+Y}(t) = M_X(t) \cdot M_Y(t) \), if X, Y are independent.
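The last two properties can be checked numerically for small discrete distributions. The sketch below is illustrative only: the distributions `pmf_x` and `pmf_y`, the constants `m`, `b` and the evaluation point `t` are all hypothetical choices, and independence of X and Y is imposed by multiplying the marginal probabilities.

```python
from itertools import product
from math import exp, isclose


def mgf(pmf, t):
    """M(t) = E[exp(t * X)] for a discrete r.v. given as {value: probability}."""
    return sum(prob * exp(t * x) for x, prob in pmf.items())


# Hypothetical example distributions and constants (not from the slides).
pmf_x = {0: 0.8, 1: 0.2}
pmf_y = {0: 0.5, 2: 0.3, 5: 0.2}
m, b, t = 3.0, -1.0, 0.4

# Linear transformation: build the distribution of m*X + b and compare both sides.
pmf_lin = {m * x + b: prob for x, prob in pmf_x.items()}
lhs_lin = mgf(pmf_lin, t)
rhs_lin = mgf(pmf_x, m * t) * exp(b * t)

# Sum of independent X and Y: joint probabilities are products of the marginals.
pmf_sum = {}
for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
    pmf_sum[x + y] = pmf_sum.get(x + y, 0.0) + px * py
lhs_sum = mgf(pmf_sum, t)
rhs_sum = mgf(pmf_x, t) * mgf(pmf_y, t)

print(isclose(lhs_lin, rhs_lin), isclose(lhs_sum, rhs_sum))
```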





Use of moment generating function

Relation between the m.g.f. and non-central moments: we can write the m.g.f. as an infinite series of the moments as follows:

\[
M_X(t) = E\left[e^{X \cdot t}\right] = \sum_{k=0}^{\infty} \mu_k \cdot \frac{t^k}{k!}.
\]

Generating non-central moments using the m.g.f.: we can generate the moments from the m.g.f. using the relationship:

\[
\mu_r = E\left[X^r\right] = M_X^{(r)}(t) \Big|_{t=0}.
\]

Function of random variables: week 4.





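Proof: To prove the above result, consider the continuous case (the proof in the discrete case is similar):

\[
\begin{aligned}
M_X^{(r)}(t) &= \frac{\partial^r}{\partial t^r} E\left[e^{X \cdot t}\right]
= \frac{\partial^r}{\partial t^r} \int_{-\infty}^{\infty} e^{x \cdot t} \cdot f_X(x)\, dx \\
&= \int_{-\infty}^{\infty} \left(\frac{\partial^r}{\partial t^r} e^{x \cdot t}\right) \cdot f_X(x)\, dx \\
&= \int_{-\infty}^{\infty} \left(x^r \cdot e^{x \cdot t}\right) \cdot f_X(x)\, dx \\
&= E\left[X^r \cdot e^{X \cdot t}\right].
\end{aligned}
\]

Set t = 0 and you get the desired result.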
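The result can also be checked numerically with finite differences. The sketch below is an illustration under stated assumptions, not a method from the slides: it uses the two-point claim distribution with Pr(X = 1) = 0.2 and Pr(X = 0) = 0.8, whose m.g.f. is 0.8 + 0.2e^t, and the step size `h` and the helper names are hypothetical choices.

```python
from math import exp


def mgf(t):
    """m.g.f. of the two-point claim distribution: 0.8 + 0.2 * exp(t)."""
    return 0.8 + 0.2 * exp(t)


def first_derivative_at_zero(f, h=1e-4):
    # Central difference approximation of f'(0).
    return (f(h) - f(-h)) / (2 * h)


def second_derivative_at_zero(f, h=1e-4):
    # Central difference approximation of f''(0).
    return (f(h) - 2 * f(0.0) + f(-h)) / h ** 2


mean_estimate = first_derivative_at_zero(mgf)            # approximates E[X] = 0.2
second_moment_estimate = second_derivative_at_zero(mgf)  # approximates E[X^2] = 0.2
print(mean_estimate, second_moment_estimate)
```

Finite differences only approximate the derivatives, and the rounding error grows with the order of the derivative, so exact differentiation is preferable whenever a closed form for the m.g.f. is available.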

Remark: If the m.g.f. exists for t in an open interval containing zero, then it uniquely determines the probability distribution.





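Example: Consider the MVI from slide 115. The m.g.f. is:

\[
M_X(t) = E\left[e^{Xt}\right] = \sum_{\text{all } x} p_X(x) \cdot e^{xt} = 0.2 \cdot e^{1 \cdot t} + 0.8 \cdot e^{0 \cdot t} = 0.8 + 0.2 e^{t}.
\]

The non-central moments are, for every k ≥ 1:

\[
E[X] = \frac{\partial M_X(t)}{\partial t}\bigg|_{t=0} = 0.2 e^{t}\Big|_{t=0} = 0.2,
\]

\[
E\left[X^k\right] = \frac{\partial^k M_X(t)}{\partial t^k}\bigg|_{t=0} = 0.2 e^{t}\Big|_{t=0} = 0.2.
\]

The kurtosis follows from the variance and the fourth central moment:

\[
\mathrm{Var}(X) = E\left[X^2\right] - (E[X])^2 = 0.2 - 0.2^2 = 0.16.
\]

\[
\begin{aligned}
E\left[(X - \mu_X)^4\right] &= E\left[X^4\right] - 4\mu_X E\left[X^3\right] + 6\mu_X^2 E\left[X^2\right] - 4\mu_X^3 E[X] + \mu_X^4 \\
&= 0.2 - 0.16 + 0.048 - 0.0064 + 0.0016 = 0.0832.
\end{aligned}
\]

\[
\kappa_X = E\left[(X - \mu_X)^4\right] / \mathrm{Var}(X)^2 - 3 = 0.0832 / 0.16^2 - 3 = 0.25.
\]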
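A direct calculation from the p.m.f. offers a cross-check of the kurtosis. The following is a minimal sketch with exact rational arithmetic; the helper name `central_moment` is hypothetical and the distribution is the two-point claim distribution of the example.

```python
from fractions import Fraction


def central_moment(pmf, r):
    """r-th central moment of a discrete r.v. given as {value: probability}."""
    mean = sum(x * prob for x, prob in pmf.items())
    return sum((x - mean) ** r * prob for x, prob in pmf.items())


# Two-point claim distribution of the example: Pr(X = 1) = 1/5, Pr(X = 0) = 4/5.
pmf = {1: Fraction(1, 5), 0: Fraction(4, 5)}

variance = central_moment(pmf, 2)        # 4/25, i.e. 0.16
fourth_central = central_moment(pmf, 4)  # 52/625, i.e. 0.0832
excess_kurtosis = fourth_central / variance ** 2 - 3
print(variance, fourth_central, excess_kurtosis)  # 4/25 52/625 1/4
```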





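Exercises

Consider the Motor Vehicle insurer from slide 120.

a. Question: Determine the m.g.f. of the claim size.

b. Question: Use the m.g.f. to determine the skewness.

a. Solution: We have:

\[
\begin{aligned}
M_X(t) = E\left[e^{t \cdot X}\right] &= 0.8 \cdot e^{0 \cdot t} + \int_{0}^{\infty} \frac{\lambda}{5} \cdot e^{-\lambda \cdot x} \cdot e^{t \cdot x}\, dx \\
&= 0.8 + \int_{0}^{\infty} \frac{\lambda}{5} \cdot e^{(t - \lambda) \cdot x}\, dx \\
&= 0.8 + \left[\frac{\lambda}{5 \cdot (t - \lambda)} \cdot e^{(t - \lambda) \cdot x}\right]_{0}^{\infty} \\
&\overset{*}{=} 0.8 - \frac{\lambda}{5 \cdot (t - \lambda)}.
\end{aligned}
\]

* note that the m.g.f. exists if t − λ < 0.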
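The closed form can be compared with a numerical integration. This sketch rests on assumptions read off from the integral above: Pr(X = 0) = 0.8 and, for x > 0, a density of (λ/5)·e^{−λx}. The values of `lam` and `t`, the truncation point `upper` and the midpoint rule are all hypothetical choices made for illustration.

```python
from math import exp


def mgf_closed_form(t, lam):
    """Closed form from the solution: 0.8 - lam / (5 * (t - lam)), valid for t < lam."""
    return 0.8 - lam / (5.0 * (t - lam))


def mgf_numeric(t, lam, upper=40.0, steps=200_000):
    """0.8 plus a midpoint-rule approximation of the integral over (0, upper)."""
    h = upper / steps
    integral = h * sum(
        lam / 5.0 * exp((t - lam) * (i + 0.5) * h) for i in range(steps)
    )
    return 0.8 + integral


lam, t = 2.0, 0.5  # hypothetical values satisfying t < lam
closed = mgf_closed_form(t, lam)
numeric = mgf_numeric(t, lam)
print(closed, numeric)
```

Truncating the integral at `upper` is harmless here because the integrand decays exponentially when t < λ; for t ≥ λ the integral diverges and the comparison is meaningless.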




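b. Solution: Using \( M_X(t) = 0.8 - \frac{\lambda}{5 \cdot (t - \lambda)} \), find the non-central moments using the derivatives:

\[
E[X] = \frac{\partial M_X(t)}{\partial t}\bigg|_{t=0} = -\frac{\lambda}{5} \cdot (-1) \cdot (t - \lambda)^{-2}\bigg|_{t=0} = \frac{1}{5\lambda},
\]

\[
E\left[X^2\right] = \frac{\partial^2 M_X(t)}{\partial t^2}\bigg|_{t=0} = \frac{\lambda}{5} \cdot (-2) \cdot (t - \lambda)^{-3}\bigg|_{t=0} = \frac{2}{5\lambda^2},
\]

\[
E\left[X^3\right] = \frac{\partial^3 M_X(t)}{\partial t^3}\bigg|_{t=0} = -\frac{2\lambda}{5} \cdot (-3) \cdot (t - \lambda)^{-4}\bigg|_{t=0} = \frac{6}{5\lambda^3}.
\]

\[
\mathrm{Var}(X) = E\left[X^2\right] - (E[X])^2 = \frac{2}{5\lambda^2} - \frac{1}{(5\lambda)^2} = \frac{9}{25\lambda^2}.
\]

\[
\begin{aligned}
\gamma_X = \frac{E\left[(X - \mu_X)^3\right]}{\mathrm{Var}(X)^{3/2}}
&= \frac{E\left[X^3\right] - 3\mu_X E\left[X^2\right] + 3\mu_X^2 E[X] - \mu_X^3}{\mathrm{Var}(X)^{3/2}} \\
&= \frac{\frac{6}{5\lambda^3} - 3 \cdot \frac{1 \cdot 2}{25\lambda^3} + 2 \cdot \frac{1 \cdot 1}{125\lambda^3}}{\left(3/(5\lambda)\right)^3} \\
&= \frac{6 \cdot 5^2}{3^3} - \frac{3 \cdot 2 \cdot 5}{3^3} + \frac{2 \cdot 1}{3^3} = \frac{122}{27}.
\end{aligned}
\]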
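The skewness 122/27 does not involve λ, which can be illustrated numerically. The sketch below assumes the moments E[X^k] = k!/(5λ^k) for k ≥ 1, which is what the derivatives above give for k = 1, 2, 3; the function names and the tested values of λ are hypothetical.

```python
from fractions import Fraction
from math import factorial, isclose


def raw_moment(k, lam):
    """E[X^k] = k! / (5 * lam^k) for k >= 1, matching the derivatives of the m.g.f."""
    return Fraction(factorial(k)) / (5 * lam ** k)


def skewness(lam):
    m1, m2, m3 = (raw_moment(k, lam) for k in (1, 2, 3))
    variance = m2 - m1 ** 2
    third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    return float(third_central) / float(variance) ** 1.5


# Hypothetical rate parameters; the skewness should be the same for each of them.
values = [skewness(Fraction(lam)) for lam in (1, 2, 10)]
print(values)
```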





Probability generating function (p.g.f.) of a r.v.

Let Y be an integer-valued random variable with Pr(Y = i) = p_i for i = 0, 1, 2, . . .; the p.g.f. is defined as:

\[
P_Y(t) = E\left[t^Y\right] = \sum_{i=0}^{\infty} p_Y(i) \cdot t^i.
\]

Properties of p.g.f.:

- The relationship between p.g.f. and m.g.f. is as follows:

\[
P_Y(t) = M_Y(\log(t)).
\]

- Probabilities: \( \Pr(Y = r) = P_Y^{(r)}(t)\Big|_{t=0} / r! \)

- Take the k-th derivative:

\[
P_Y^{(k)}(1) = E\left[Y \cdot (Y - 1) \cdot (Y - 2) \cdot \ldots \cdot (Y - k + 1)\right] \equiv \mu_{[k]}.
\]
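For a r.v. with finitely many values the p.g.f. is a polynomial, so the last two properties can be checked by differentiating its coefficient list. The sketch below uses a hypothetical Binomial(3, 1/2) example that is not in the slides; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def derivative(coeffs):
    """Coefficients of the derivative of sum(coeffs[i] * t**i)."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def evaluate(coeffs, t):
    return sum(c * t ** i for i, c in enumerate(coeffs))


def pgf_derivative_at(coeffs, order, t):
    for _ in range(order):
        coeffs = derivative(coeffs)
    return evaluate(coeffs, t)


# Hypothetical example: Y ~ Binomial(3, 1/2). The p.g.f. coefficients are Pr(Y = i).
n, p = 3, Fraction(1, 2)
probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]

# Pr(Y = r) = P^(r)(0) / r!
recovered = [pgf_derivative_at(probs, r, 0) / factorial(r) for r in range(n + 1)]

mean = pgf_derivative_at(probs, 1, 1)              # E[Y] = n * p = 3/2
second_factorial = pgf_derivative_at(probs, 2, 1)  # E[Y(Y - 1)] = n(n - 1)p^2 = 3/2
print(recovered == probs, mean, second_factorial)  # True 3/2 3/2
```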





Cumulant generating function (c.g.f.) of a r.v.

The cumulant generating function \( C_X(t) \) for a random variable X is given by:

\[
C_X(t) = \log(M_X(t)) = \sum_{i=1}^{\infty} \frac{1}{i!} \cdot t^i \cdot \kappa_i \quad \text{or} \quad M_X(t) = e^{C_X(t)},
\]

where \( \kappa_i = C_X^{(i)}(0) \) is the i-th cumulant.

Properties of c.g.f.:

- \( \kappa_i \) = i-th central moment, for i = 2, 3;
- \( C_{m \cdot X + b}(t) = C_X(m \cdot t) + b \cdot t \), for constants m, b;
- \( C_{X+Y}(t) = C_X(t) + C_Y(t) \), if X, Y are independent.

Note that, in general, \( C_X^{(r)}(0) \neq E\left[(X - \mu_X)^r\right] \), for r = 4, 5, 6, . . ..
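The difference between cumulants and central moments from order four onwards can be illustrated with the two-point claim distribution (Pr(X = 1) = 0.2). The sketch below relies on a standard recursion that is not stated in the slides and is used here as an assumption: κ_n = E[X^n] − Σ_{k=1}^{n−1} C(n−1, k−1)·κ_k·E[X^{n−k}]. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def cumulants_from_raw(raw):
    """raw[n] = E[X^n] with raw[0] = 1; returns [None, kappa_1, kappa_2, ...]."""
    kappa = [None]
    for n in range(1, len(raw)):
        correction = sum(
            comb(n - 1, k - 1) * kappa[k] * raw[n - k] for k in range(1, n)
        )
        kappa.append(raw[n] - correction)
    return kappa


# Two-point claim distribution: E[X^k] = 1/5 for every k >= 1.
p = Fraction(1, 5)
raw = [Fraction(1)] + [p] * 4
kappa = cumulants_from_raw(raw)

variance = raw[2] - raw[1] ** 2
# Fourth central moment via the binomial expansion of E[(X - mu)^4].
fourth_central = sum(
    comb(4, k) * raw[k] * (-raw[1]) ** (4 - k) for k in range(5)
)
print(kappa[1:], fourth_central)
```

In this example the second and third cumulants agree with the central moments, while the fourth cumulant equals the fourth central moment minus three times the squared variance.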



Summary


Mathematical Expectation

The mathematical expectation of h(X) is:

\[
E[h(X)] =
\begin{cases}
\sum_{\text{all } x} h(x) \cdot p_X(x), & \text{if } X \text{ is discrete;} \\
\int_{-\infty}^{\infty} h(x) \cdot f_X(x)\, dx, & \text{if } X \text{ is continuous.}
\end{cases}
\]

Mean: \( \mu = E[X] \).

Moments: \( \mu_r = E\left[X^r\right] \) refers to the r-th (non-central) moment.

Central moments: \( E\left[(X - \mu_X)^r\right] \) refers to the r-th central moment.


Dispersion

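Variance: \( \sigma^2 = \mathrm{Var}(X) = E\left[(X - \mu)^2\right] = E\left[X^2\right] - \mu^2 \).

Skewness: \( \gamma_X = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^3\right] \). It measures the lack of symmetry in the p.d.f.

Kurtosis: \( \kappa_X = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^4\right] - 3 \). It measures the peakedness or flatness of the p.d.f.

Moment generating function: \( M_X(t) = E\left[e^{X \cdot t}\right] \). When it exists for t in an open interval containing zero, it uniquely determines the distribution. It is useful for calculating moments:

\[
\mu_r = E\left[X^r\right] = M_X^{(r)}(t) \Big|_{t=0}.
\]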
