
MTH3105 / MTH4105 Semester I, 2015/16

Probability

Worked Examples

Dr. Tony Yee

Department of Mathematics and Information Technology

The Hong Kong Institute of Education

October 3, 2015

AMENDMENTS.

Chapter 1.

Pages 11. dices dice

Chapter 2.

Pages 27. 0.3518 0.3519

Pages 29. (e) P (E) ∩ P (F ) ∩ P (G) P (E ∩ F ∩G)

Chapter 3.

Pages 39, 41, 46. Baye’s Bayes’

Pages 42. 0.43 0.4375

Pages 47. 0.0467 0.0468

Chapter 4.

Pages 51. 0.267 0.2676

Pages 51. $C^6_0 \left(\tfrac{1}{6}\right)^0 \left(\tfrac{5}{6}\right)^3$   $C^6_0 \left(\tfrac{1}{6}\right)^0 \left(\tfrac{5}{6}\right)^6$

Pages 52. (i = 1, 2) (i = 1, 2, 3)

Chapter 5.

Pages 69. 0.7357 0.2643 73.57% 26.43%

Pages 72. 1010− 1000 = 5 1010− 1005 = 5

Chapter 6.

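Pages 82. $\tfrac{8}{15} \times \tfrac{7}{14} + \tfrac{7}{15} \times \tfrac{8}{15} = \tfrac{8}{15}$   $\tfrac{8}{15} \times \tfrac{7}{14} + \tfrac{7}{15} \times \tfrac{8}{14} = \tfrac{8}{15}$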

Chapter 7.

Chapter 8.

Contents

Table of Contents ii

1 Combinatorial Analysis 3

2 Axioms of Probability 25

3 Conditional Probability and Independence 35

4 Discrete Random Variables 49

5 Continuous Random Variables 61

6 Mathematical Expectation 77

7 Joint Distribution of Two Random Variables 87

8 Markov Chains and Applications 97


Table of Contents (Keywords for your reference)

Chapter 1. Combinatorial Analysis

✷ Basic principles of counting ✷ Addition rule ✷ Multiplication rule

✷ Permutation without repetition ✷ Combination without repetition

✷ Permutation with repetition ✷ Combination with repetition

Chapter 2. Axioms of Probability

✷ Sample space and events ✷ Axioms of probability ✷ Inclusion-exclusion principle

✷ Venn diagram ✷ Equally likely outcomes ✷ Simple probability

Chapter 3. Conditional Probability and Independence

✷ Reduced sample space ✷ Conditional probability

✷ Multiplication rule for conditional probability ✷ Conditioning ✷ Total probability

✷ Bayes’ theorem / formula ✷ Dependent and independent events

Chapter 4. Discrete Random Variables

✷ Random variable ✷ Probability density function (pdf) ✷ Bernoulli distribution

✷ Binomial distribution ✷ Geometric distribution ✷ Hypergeometric distribution

✷ Poisson distribution ✷ Negative binomial distribution

✷ Approximating Binomial by Poisson ✷ Approximating Hypergeometric by Binomial

Chapter 5. Continuous Random Variables

✷ Continuous random variable ✷ Cumulative distribution function ✷ Uniform distribution

✷ Normal distribution ✷ Approximating Binomial by Normal

✷ Approximating Poisson by Normal ✷ Continuity correction factor

✷ Exponential distribution ✷ Relationship between Exponential and Poisson

Chapter 6. Mathematical Expectation

✷ Expected value ✷ Favourable / unfavourable game

✷ Mean and variance of a random variable ✷ Expectation rules ✷ Variance rules

Chapter 7. Joint Distribution of Two Random Variables

✷ Joint probability density function (as a table) ✷ Marginal probability density functions

✷ Necessary and sufficient condition for independent random variables

✷ Sum of two independent Binomial random variables

✷ Sum of two independent Poisson random variables

Chapter 8. Markov Chains and Applications

✷ Markov chain ✷ State space ✷ Stationary transition probability

✷ Transition probability matrix ✷ Matrix multiplication ✷ A simple random walk


Chapter 1

Combinatorial Analysis

Example 1.1 (Multiplication rule ⋆)

How many ways are there to place 10 identical balls in 10 boxes of all different colors so that exactly one box is empty?

Solution The key point is “exactly one box is empty”. That is to say, among the 10 given boxes, one box is empty, one box contains two balls and each of the remaining 8 boxes contains exactly one ball. There are 10 ways of choosing the empty box and 9 ways of choosing the box with two balls. In summary, there are

10 × 9 = 90

ways of placing the 10 indistinguishable balls in 10 boxes of all different colors so that exactly one box is empty. ✷

Remark We generalize the given question. How many ways are there to place n identical balls in n boxes of all different colors so that exactly one box is empty, where n is an integer larger than 1? Borrowing the same idea as above, there are

n × (n − 1) = n(n − 1)

ways of placing n indistinguishable balls in n boxes of all different colors so that exactly one box is empty.


Example 1.2

(a) In how many ways can 6 people be lined up to get on a bus?

(b) If 2 specific persons, among 6, insist on following each other, how many ways are possible?

(c) If 2 specific persons, among 6, refuse to follow each other, how many ways are possible?

Solution

(a) Required number of ways = 6! = 6× 5× 4× 3× 2× 1 = 720.

(b) Required number of ways = 5!× 2! = 240.

(c) Required number of ways = 6!− 5!× 2! = 480.

✷

Remarks (1) Remember that “people must all be different”. We are counting permutations without repetition. (2) If 2 specific persons insist on following each other, then treat them as “one”.
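The three counts in this example are small enough to check by listing every line-up. The sketch below is only an illustration: the labels A to F are made up, with A and B standing for the two specific persons.

```python
from itertools import permutations

# Hypothetical labels: A and B play the role of the two specific persons.
people = "ABCDEF"

def adjacent(line):
    """True if A and B stand directly next to each other in the line."""
    return abs(line.index("A") - line.index("B")) == 1

lines = list(permutations(people))
total = len(lines)                                 # part (a)
together = sum(adjacent(line) for line in lines)   # part (b)
apart = total - together                           # part (c)
print(total, together, apart)
```

The enumeration treats the six people as distinct, which is exactly the “permutations without repetition” setting of the remark.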


Example 1.3 (Addition rule ⋆)

Select 3 digits, without repetition, from 0, 1, 2, 3, 4, 5 and 6.

(a) How many three-digit numbers can be formed?

(b) How many of these are odd numbers?

(c) How many are greater than 330?

Solution

(a) The digit in the hundreds position cannot be zero.

Required number of three-digit numbers = 6× 6× 5 = 180.

(b) The digit in the units position is odd and the digit in the hundreds position is not zero.

Required number of three-digit numbers = 5× 5× 3 = 75.

(c) Case 1. The digit in the hundreds position is greater than 3.

Number of three-digit numbers = 3× 6× 5 = 90.

Case 2. The digit in the hundreds position is 3 and the digit in the tens position is greater than 3.

Number of three-digit numbers = 1× 3× 5 = 15.

In the above, the two cases that we considered are mutually exclusive and exhaustive.

Required number of three-digit numbers = 90 + 15 = 105.

✷
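As a cross-check of the three answers, one may list all three-digit numbers with distinct digits taken from 0 to 6 and count directly. This is a brute-force sketch (the variable names are illustrative), not the counting argument itself.

```python
from itertools import permutations

digits = range(7)  # the digits 0, 1, ..., 6

# Ordered selections of 3 distinct digits whose hundreds digit is not zero.
numbers = [100 * h + 10 * t + u
           for h, t, u in permutations(digits, 3) if h != 0]

count_all = len(numbers)                        # part (a)
count_odd = sum(n % 2 == 1 for n in numbers)    # part (b)
count_gt_330 = sum(n > 330 for n in numbers)    # part (c)
print(count_all, count_odd, count_gt_330)
```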


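Example 1.4

12321, 234432 and 11511 are examples of palindromic numbers. How many 5-digit numbers are palindromic?

Solution The first digit can be chosen in 9 ways (it cannot be zero), the second and third digits in 10 ways each, and the fourth and fifth digits are then determined. The number of all possible 5-digit palindromic numbers is given by

9 × 10 × 10 × 1 × 1 = 900.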

✷


Example 1.5

Consider a 3-digit combination lock with digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. How many 3-digit passwords are possible if consecutive repetition is not allowed? For example, 454 is an allowed password while 445 is not.

Solution The first digit can be chosen in 10 ways, and each later digit in 9 ways (anything except the digit just before it). The number of all possible 3-digit passwords is given by

10 × 9 × 9 = 810.

✷

Example 1.6 (Multiplication rule ⋆⋆)

Find the number of ways in which 5 boys and 4 girls can be seated alternately in a row if, in particular, John and Mary have to sit next to each other (note that only one boy is named John and only one girl is named Mary among the 9 people).

Solution The seating plan must be in this form (G = girl, B = boy):

B G B G B G B G B.

Consider five cases according to John's seat: Case 1. John sits in the 1st seat (then Mary must sit in the 2nd seat); Case 2. John sits in the 3rd seat; Case 3. John sits in the 5th seat; Case 4. John sits in the 7th seat; Case 5. John sits in the 9th seat (i.e., the last seat). Case 1 introduces

3! × 4!

different seating arrangements (the other 3 girls and 4 boys fill the remaining seats), while each case from Case 2 to Case 4 introduces

(3! × 4!) × 2

different seating arrangements, because Mary can sit on either side of John. The last case, Case 5, introduces

3! × 4!

different seating arrangements. Hence, the total number of such arrangements is given by

(1 + 3 × 2 + 1) × 3! × 4! = 1,152.

✷

Remark In the above, the five cases that we considered are mutually exclusive and exhaustive.


Example 1.7

A bookshelf contains 3 German books, 4 French books and 5 Chinese books in a row. Each book is different from one another. What is the number of arrangements in which no two Chinese books are next to each other?

Solution Align the German books and French books first. Putting these 3 + 4 = 7 books in a row creates 7 + 1 = 8 spaces (we count the space before the first book, the spaces between books and the space after the last book; each _ below marks a space):

_ 1st book _ 2nd book _ 3rd book _ 4th book _ 5th book _ 6th book _ 7th book _

To guarantee that no two Chinese books are next to each other, we put them into these spaces, at most one book per space. The first Chinese book can be put into any of 8 spaces, the second into any of 7 spaces, and so on; the fifth Chinese book can be put into any of 4 spaces. Now, the non-Chinese books can be permuted in 7! ways. Thus the total number of permutations is

(8 × 7 × 6 × 5 × 4) × 7! = 33,868,800.

There are more than 33 million arrangements of the books. ✷


Example 1.8

Two Germans, three French and four Chinese are to be seated in a row. What is the number of different seating arrangements in which no Chinese sits next to another Chinese but the two Germans sit next to each other?

Solution Treat the two Germans as “a single person”. Then align the Germans and the French first. Putting these 1 + 3 = 4 “people” together in a row creates 5 spaces (we count the space before the first, the spaces between them and the space after the last). To ensure that no two Chinese are seated next to each other, we put them into these spaces. The first Chinese can be seated in any of 5 spaces, the second in any of 4 spaces, the third in any of 3 spaces, the fourth in any of 2 spaces. Now, the non-Chinese can be seated in 4! × 2! different ways. Thus, based on the given rules, the number of different seating arrangements is

(5 × 4 × 3 × 2) × 4! × 2! = 5,760.

The LHS of the above equation can be rewritten as

\[ P^5_4 \times P^4_4 \times P^2_2 = 5{,}760. \]

✷

Remark Compare the similarities and differences between Example 1.7 and Example 1.8. Could you catch the difference between the usage of “German books” and “German people”, respectively, in the two examples?

Example 1.9 (Multiplication rule ⋆⋆⋆)

5 red marbles and 5 white marbles are to be placed in a row. All marbles are identical except for colors. At no point in the row may three or more consecutive marbles have the same color. How many such arrangements are possible?

Solution Let R and W denote red and white marbles respectively.

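Each color is split into blocks of 1 or 2 consecutive marbles, and the blocks of the two colors alternate along the row. In the table below, “R : 2 1 1 1” lists the block sizes of the red marbles, and the last column multiplies the number of orders of the R blocks, the number of orders of the W blocks and the number of ways to interleave them.

Case      R blocks     W blocks     permutation(s)
1 (i)     1 1 1 1 1    1 1 1 1 1    1 × 1 × 2 = 2
1 (ii)    1 1 1 1 1    2 1 1 1      1 × 4 × 1 = 4
2 (i)     2 1 1 1      1 1 1 1 1    4 × 1 × 1 = 4
2 (ii)    2 1 1 1      2 1 1 1      4 × 4 × 2 = 32
2 (iii)   2 1 1 1      2 2 1        4 × 3 × 1 = 12
3 (i)     2 2 1        2 1 1 1      3 × 4 × 1 = 12
3 (ii)    2 2 1        2 2 1        3 × 3 × 2 = 18

The above 7 cases (Case 1 (i); Case 1 (ii); Case 2 (i); Case 2 (ii); Case 2 (iii); Case 3 (i); Case 3 (ii)) are mutually exclusive and exhaustive. The total number of such arrangements will simply be given by the sum of all numbers: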

2 + 4 + 4 + 32 + 12 + 12 + 18 = 84.

✷

Remark We may change the given question: 5 red marbles and 4 white marbles are to be placed in a row. All marbles are identical except for colors. At no point in the row may three or more consecutive marbles have the same color. How many such arrangements are possible? What is the answer to this changed question?

Answer: 45
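Both answers (84 for 5 red and 5 white marbles, 45 for 5 red and 4 white) can be checked by brute force, since there are only a few hundred rows to inspect. The function name `count_rows` below is an illustrative choice; the code simply tries every placement of the red marbles and rejects rows containing three equal colors in a row.

```python
from itertools import combinations

def count_rows(reds, whites):
    """Count rows of R/W marbles with no three consecutive marbles of one color."""
    n = reds + whites
    count = 0
    for red_positions in combinations(range(n), reds):
        row = ["W"] * n
        for i in red_positions:
            row[i] = "R"
        s = "".join(row)
        if "RRR" not in s and "WWW" not in s:
            count += 1
    return count

print(count_rows(5, 5), count_rows(5, 4))
```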

Example 1.10 (Inclusion-exclusion principle ⋆)

Consider an experiment that consists of six horses, numbered 1 through 6, running a race, and suppose that the sample space consists of the 6! possible orders in which the horses finish. Let A be the event that the number 1 horse is among the top three finishers, and let B be the event that the number 2 horse comes in second. How many outcomes are in the event A ∪ B?

Solution Since there are 5! = 120 outcomes in which the position of the number 1 horse is specified, it follows that n(A) = 3 × 120 = 360. Similarly, n(B) = 120, and n(A ∩ B) = 2 × 4! = 48 (the number 1 horse finishes first or third, and the other four horses fill the remaining places). It follows from the inclusion-exclusion principle that

n(A ∪ B) = n(A) + n(B) − n(A ∩ B).

We obtain that n(A ∪ B) = 360 + 120 − 48 = 432.

✷

Example 1.11 (Round table ⋆)

12 people are randomly seated at a round table. In how many seating arrangements will John and Mary sit next to each other?

Solution Let us assume that only one male is named John and only one female is named Mary among the 12 people. Let us also assume that the 12 people are randomly and regularly located in a circle. We “cut” the circle at the location of John into a straight row, so that John occupies the first position of the row. There exist 2 possible cases of seating arrangements in which John and Mary sit next to each other.

Case 1. Mary sits immediately after John (in the second position of the row).

Case 2. Mary sits in the last position of the row (on John's other side at the table).

In each case the remaining 10 people can be seated in 10! ways. The above two cases are mutually exclusive and exhaustive. The total number of seating arrangements is therefore given by

10! + 10! = 10! × 2 = 7,257,600

which is a large number (larger than 7 million). ✷

Example 1.12 (Round table ⋆⋆)

Assume that 2 married couples and one single man (five people in total) are seated randomly at a round table. How many seating arrangements can be made if no wife sits next to her husband?

Solution Denote the married couples and the single man as (A1, B1), (A2, B2) and B3, respectively. There are two possible cases of seating arrangements, shown in the following figure.

[Figure: round-table diagrams for Case 1 and Case 2, each showing the seats of A1 and B1 and two factors of ×2 for the placements of A2 and B2.]

The above two cases are mutually exclusive and exhaustive. The required number of different seating arrangements is given by

2 × 2 + 2 × 2 = 8.

✷
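The answer 8 can be confirmed by enumeration. The sketch below assumes, as the answer itself suggests, that two seatings differing only by a rotation of the table are the same; this is implemented by fixing A1 in seat 0. The helper name `neighbours` is illustrative.

```python
from itertools import permutations

couples = [("A1", "B1"), ("A2", "B2")]
others = ["B1", "A2", "B2", "B3"]

def neighbours(seating, p, q):
    """True if p and q occupy adjacent seats around the circular table."""
    n = len(seating)
    i, j = seating.index(p), seating.index(q)
    return (i - j) % n in (1, n - 1)

count = 0
for rest in permutations(others):
    seating = ("A1",) + rest  # A1 is fixed in seat 0 to remove rotations
    if not any(neighbours(seating, a, b) for a, b in couples):
        count += 1
print(count)
```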



Example 1.13 (Permutation vs. combination ⋆)

A password code consists of six digits. How many different password codes may be formed from

(a) six digits chosen from {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} allowing repetition?

(b) six different digits chosen from {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}?

(c) six different digits chosen from {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, restricted to be in ascending order?

Solution

(a) There are

\[ 10 \times 10 \times 10 \times 10 \times 10 \times 10 = 10^6 = 1{,}000{,}000 \]

different password codes.

(b) There are

\[ 10 \times 9 \times 8 \times 7 \times 6 \times 5 = P^{10}_6 = 151{,}200 \]

different password codes.

(c) Each selection (without order) of six different digits corresponds to only one password code. There are

\[ C^{10}_6 = \frac{151{,}200}{6!} = 210 \]

different password codes. For an illustrative example, for the selected six numbers {4, 6, 9, 2, 1, 5}, the corresponding (one and only one) password code is 124569.

✷

Example 1.14 (Combination rule ⋆)

A shipment of 12 computer monitors contains 3 defective ones. In how many ways can an office purchase 5 of these monitors and receive at least 2 of the defective monitors?

Solution

\[
\begin{aligned}
\text{Required number of ways} &= n(\text{two defective among five}) + n(\text{three defective among five}) \\
&= C^3_2 \times C^9_3 + C^3_3 \times C^9_2 \\
&= 252 + 36 = 288.
\end{aligned}
\]

✷


Example 1.15

From a group of 7 women and 9 men, a committee consisting of 4 women and 5 men is being formed. How many different committees can be formed if two of the women in the group do not really like each other and refuse to serve on the committee together?

Solution It follows from the multiplication rule that there are

\[ C^7_4 \times C^9_5 = 35 \times 126 = 4{,}410 \]

possible committees consisting of 4 women and 5 men in total. However, according to the given question, two of the women in the group refuse to serve on the committee together, so there are

\[ C^2_0 \times C^5_4 + C^2_1 \times C^5_3 = 5 + 20 = 25 \]

groups of 4 women not containing both of the feuding women. Since there are $C^9_5 = 126$ ways to choose the 5 men, it follows that, in this case, there are

25 × 126 = 3,150

possible committees. ✷



Example 1.16

A six-digit password code is palindromic if reversing it gives the same code. For example, both 321123 and 142241 are palindromic password codes, but 134413 is not. It follows that a palindromic password code can involve at most three different digits (i.e., abccba). How many palindromic six-digit password codes can be formed using some of the nine digits 1, 2, 3, 4, 5, 6, 7, 8, 9, if it is further required that one digit of the palindromic password code is used four times?

Solution If one digit is used 4 times in a palindromic code, another one must be used twice. These two digits may be chosen from {1, 2, 3, 4, 5, 6, 7, 8, 9} and there are

\[ C^9_2 = 36 \]

ways of selection. Once the two digits have been chosen, say 1 and 2 for example, only 6 patterns are possible for the first three digits, which are

112   121   211
221   212   122

which correspond to the 6 password codes:

112211   121121   211112
221122   212212   122221

The total number of six-digit password codes satisfying the given conditions is given by

6 × 36 = 216.

✷

Example 1.17 (Combination rule ⋆⋆)

A certain bank assigns each credit card holder a four-digit PIN (personal identity number). Each PIN is composed of 4 digits using any of the following digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. How many different PINs are there if the first digit cannot be zero and no digit may occur more than twice?

Solution All of the 9 × 10 × 10 × 10 = 9,000 possible PINs (first digit nonzero) are allowed except:

(i) the 9 PINs where all four digits are identical: 1111, 2222, 3333, · · · , 9999.

(ii) those where one digit occurs three times and another just once. There are $C^{10}_2 = 45$ ways of choosing any two digits, say 1 and 2. Note that there are 8 different PINs if the two digits are fixed:

1222 2122 2212 2221

2111 1211 1121 1112

But be careful that the PINs of the following 4 patterns should be omitted since the first digit may not be zero (where a can be any digit from 1, 2, 3, 4, 5, 6, 7, 8, 9):

0aaa 0a00 00a0 000a

There are 45 × 8 − 4 × 9 = 324 PINs of this type.

The required number of all possible PINs is given by

9,000 − 9 − 324 = 8,667.

✷
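Since there are only 9,000 candidate PINs, the subtraction argument can be checked directly. A minimal sketch (the name `allowed` is illustrative): run through 1000 to 9999, so that the first digit is automatically nonzero, and keep the PINs in which no digit occurs more than twice.

```python
from collections import Counter

# Four-digit strings with a nonzero first digit are exactly 1000..9999.
allowed = sum(
    1
    for pin in range(1000, 10000)
    if max(Counter(str(pin)).values()) <= 2
)
print(allowed)
```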



Example 1.18 (Combination rule ⋆)

A standard pack of 52 poker cards consists of 4 suits (Spades ♠, Hearts ♥, Diamonds ♦ and Clubs ♣). A well-shuffled pack is dealt to 4 players so that each receives 13 cards. What is the number of ways that each player receives at least 3 Spades? You may have your answer in terms of factorials.

Solution The Spades must be divided 3, 3, 3, 4 among the players, and there are 4 ways to choose the player who receives 4 Spades. The number of ways that each player receives at least 3 Spades is given by

\[
\begin{aligned}
& 4 \times C^{13}_{3} C^{39}_{10} \times C^{10}_{3} C^{29}_{10} \times C^{7}_{3} C^{19}_{10} \times C^{4}_{4} C^{9}_{9} \\
&= 4 \times \frac{13!}{3!\,10!}\,\frac{39!}{10!\,29!} \times \frac{10!}{3!\,7!}\,\frac{29!}{10!\,19!} \times \frac{7!}{3!\,4!}\,\frac{19!}{10!\,9!} \times 1 \\
&= \frac{13!\;39!}{9!\,(3!)^4\,(10!)^3}.
\end{aligned}
\]

We have no idea about the numerical value of the above expression without the help of computer software. But it is clear enough that it must be a (large) integer. In fact I have used the computer software Mathematica to successfully find its value:

5,652,079,478,333,572,557,297,024,000 ≈ 5.652 octillion = 5,652 trillion trillion.

How much is a trillion (dollars)? Check this out: http://www.pagetutor.com/trillion ✷
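Any language with exact integer arithmetic can evaluate this expression. The following Python sketch (variable names are illustrative) computes the product of combinations step by step and compares it with the simplified factorial form; it reports the order of magnitude rather than asserting every digit.

```python
from math import comb, factorial

# Step-by-step dealing: three players get 3 Spades + 10 other cards,
# the last player gets the remaining 4 Spades + 9 other cards.
by_steps = (4
            * comb(13, 3) * comb(39, 10)
            * comb(10, 3) * comb(29, 10)
            * comb(7, 3) * comb(19, 10)
            * comb(4, 4) * comb(9, 9))

# Simplified form 13! 39! / (9! (3!)^4 (10!)^3); the division is exact.
closed_form = (factorial(13) * factorial(39)
               // (factorial(9) * factorial(3) ** 4 * factorial(10) ** 3))

print(by_steps == closed_form)
print(f"{by_steps:.3e}")
```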

Remark We may change the given question: · · · What is the number of ways that each player receives at least 3 Spades and at least 3 Hearts? What is the answer to this changed question?

Answer:

\[
C^{13}_{4} C^{13}_{4} C^{26}_{5} \times C^{9}_{3} C^{9}_{3} C^{21}_{7} \times C^{6}_{3} C^{6}_{3} C^{14}_{7} \times 4
\;+\;
C^{13}_{4} C^{13}_{3} C^{26}_{6} \times C^{9}_{3} C^{10}_{4} C^{20}_{6} \times C^{6}_{3} C^{6}_{3} C^{14}_{7} \times 12.
\]

Example 1.19 (Combination rule ⋆⋆)

Ants love sweets. The figure below shows a network of routes from an ant’s initial position (labeled A) to the place of sweets (labeled C). Assume that the ant can only go either east or north at each junction (such as B).

[Figure: grid of routes from A (south-west corner) to C (north-east corner), with junction B in between and compass arrows for North and East.]

(a) How many routes from A to C can the ant choose?

(b) Once the ant arrives at a junction, the probability that it goes north is 0.4.

(i) Find the probability that the ant will arrive at C.

(ii) Find the probability that the ant will arrive at C without passing through the junction B.

Solution Let N denote “North” and E denote “East”.

(a) Number of routes = n(permutations of 4 E’s and 3 N’s) = $C^7_3 = 35$.

(b) (i) P(Ant will arrive at C) = $C^7_3 \times (0.4)^3 (0.6)^4 = 0.2903$.

(ii) P(Ant will arrive at C without passing B) = $\left(C^7_3 - C^3_1 \times C^4_2\right) \times (0.4)^3 (0.6)^4 = 0.1410$.

✷

Example 1.20 (Permutations with repetition ⋆)

How many different letter arrangements can be made from the letters in the word PROBABILITY?

Solution By permutations with repetition, there are

\[ \frac{11!}{1!\,2!\,2!\,1!\,1!\,1!\,1!\,1!\,1!} = 9{,}979{,}200 \]

different letter arrangements. Here we have 11 letters in total, of which 2 letters (B, I) appear twice each and all remaining letters (A, L, O, P, R, T, Y) appear once each. ✷


Example 1.21

In how many ways can 8 graduate students be assigned to one double and two triple hotel rooms during a conference?

Solution By permutations with repetition, there are

\[ \frac{8!}{2!\,3!\,3!} = 560 \]

different hotel-room arrangements. ✷


Example 1.22

Six identical fair dice are rolled and subsequently arranged in a row. How many different arrangements give three pairs? (“Three pairs” means for example “a pair of 1, a pair of 2 and a pair of 5”.)

Solution “Three pairs” means a choice of 3 numbers out of the 6 numbers from 1 to 6. One can now ask the question “which three pairs”? The answer is given by sampling: $C^6_3 = 20$. Now we can focus on one of the 20 cases, say {1, 2, 5}, and figure out the number of arrangements showing “a pair of 1, a pair of 2 and a pair of 5”. The number of ways that the 6 dice can show the pattern (1, 1, 2, 2, 5, 5) is given by

\[ \frac{6!}{2!\,2!\,2!} = 90. \]

Finally, we multiply this by the number of choices of the 3 numbers to get

\[ C^6_3 \times \frac{6!}{2!\,2!\,2!} = 20 \times 90 = 1{,}800. \]

✷

Remark Six identical fair dice are rolled and subsequently arranged in a row. There are a total of

\[ 6^6 = 46{,}656 \]

different arrangements. Among all these arrangements, there are 1,800 belonging to the class of “three pairs”. However, “three pairs” is only one class of arrangements. In the following we would like to find all mutually exclusive and exhaustive classes of arrangements.

Class              An example     No. of arrangements
Three pairs        2 2 3 3 5 5    $C^6_3 \times (C^6_2 \times C^4_2 \times C^2_2) = 1{,}800$
Two pairs          2 2 3 3 4 5    $(C^6_2 \times C^4_2) \times (C^6_2 \times C^4_2 \times 2!) = 16{,}200$
One pair           2 2 3 4 5 6    $(C^6_1 \times C^5_4) \times (C^6_2 \times 4!) = 10{,}800$
No pair            1 2 3 4 5 6    $6! = 720$
Three of a kind    2 2 2 3 4 5    $(C^6_1 \times C^5_3) \times (C^6_3 \times 3!) = 7{,}200$
Four of a kind     2 2 2 2 4 5    $(C^6_1 \times C^5_2) \times (C^6_4 \times 2!) = 1{,}800$
Five of a kind     2 2 2 2 2 5    $(C^6_1 \times C^5_1) \times (C^6_5 \times 1) = 180$
Six of a kind      2 2 2 2 2 2    $C^6_1 = 6$
Three and Two      2 2 2 3 3 4    $(C^6_1 \times C^5_1 \times C^4_1) \times (C^6_3 \times C^3_2 \times 1) = 7{,}200$
Four and Two       2 2 2 2 4 4    $(C^6_1 \times C^5_1) \times (C^6_4 \times 1) = 450$
Three and Three    2 2 2 3 3 3    $C^6_2 \times C^6_3 = 300$

Adding the numbers of arrangements altogether gives

1,800 + 16,200 + 10,800 + 720 + 7,200 + 1,800 + 180 + 6 + 7,200 + 450 + 300 = 46,656.
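The whole classification can be verified by listing all 46,656 arrangements and grouping them by how often each face appears. In the sketch below, the "signature" of an arrangement is the sorted tuple of face multiplicities, e.g. (2, 2, 2) for three pairs and (3, 2, 1) for "Three and Two"; this encoding is an illustrative choice, not part of the notes.

```python
from collections import Counter
from itertools import product

classes = Counter()
for roll in product(range(1, 7), repeat=6):
    # Multiplicities of the faces shown, largest first, e.g. (2, 2, 1, 1).
    signature = tuple(sorted(Counter(roll).values(), reverse=True))
    classes[signature] += 1

for signature, count in sorted(classes.items()):
    print(signature, count)
```

Each of the 11 rows of the table corresponds to one signature, and the counts add up to 46,656.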

Example 1.23 (Permutations with repetition ⋆⋆)

A Personal Identification Number (PIN) consists of five digits in order, each of which may be any one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Find the number of PINs satisfying each of the following requirements.

(a) All five digits are different.

(b) There are exactly four different digits being used.

(c) There are exactly three different digits being used, two of which occur twice.

(d) Exactly one of the digits occurs three times.

Solution

(a) n(PINs with all five digits different) = $P^{10}_5 = 30{,}240$.

(b) n(PINs with exactly four different digits being used) = $C^{10}_1 \times C^9_3 \times \dfrac{5!}{2!\,1!\,1!\,1!} = 50{,}400$.

(c) n(PINs with exactly three different digits being used, two of which occur twice) = $C^{10}_2 \times C^8_1 \times \dfrac{5!}{2!\,2!\,1!} = 10{,}800$.

(d) n(PINs in which one digit occurs three times) = $C^{10}_3 \times C^3_1 \times \dfrac{5!}{3!\,1!\,1!} + C^{10}_2 \times C^2_1 \times \dfrac{5!}{3!\,2!} = 8{,}100$.

✷
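With only 100,000 possible PINs, all four parts can be checked by enumeration. This sketch (the counters a, b, c, d are named after the parts of the question) classifies each PIN by the sorted multiplicities of its digits.

```python
from collections import Counter
from itertools import product

a = b = c = d = 0
for pin in product(range(10), repeat=5):
    counts = sorted(Counter(pin).values(), reverse=True)
    a += counts == [1, 1, 1, 1, 1]   # all five digits different
    b += counts == [2, 1, 1, 1]      # exactly four different digits
    c += counts == [2, 2, 1]         # three different digits, two occurring twice
    d += counts[0] == 3              # a digit occurring exactly three times
print(a, b, c, d)
```

In part (d) the condition `counts[0] == 3` covers both patterns counted in the solution, namely multiplicities 3, 1, 1 and 3, 2.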


Example 1.24 (Combination with repetition ⋆)

Let n, x1, x2, · · · , xr be positive integers. How many distinct integer-valued solutions of

x1 + x2 + · · · + xr = n (n ≥ r)

are possible?

Solution We rewrite the given equation as

\[ x_1 + x_2 + \cdots + x_r = n = \overbrace{1 + 1 + \cdots + 1}^{\text{summing } n \text{ number of ``1''s}}. \]

On the rightmost side of the above equation there are (n − 1) plus-signs “+”, whereas on the leftmost side there are (r − 1) plus-signs “+”. Each solution corresponds to choosing (r − 1) of the (n − 1) plus-signs on the right to separate the “1”s into r groups. The total number of distinct integer-valued solutions to the given equation is therefore given by

\[ C^{n-1}_{r-1}. \]

✷

Remark You may get stuck on why the answer is given by $C^{n-1}_{r-1}$. Now we would like to give the reasoning and the details through an illustrative example. Let x1, x2, x3 be positive integers. How many distinct integer-valued solutions of

x1 + x2 + x3 = 6

are possible? The answer to this (simple) question is 10. For illustration, of course, we can list all these integer-valued solutions in the table below.

Solution No.   x1  x2  x3   x1 + x2 + x3 = 6
1.             1   1   4    (1) + (1) + (1 + 1 + 1 + 1) = 6
2.             1   2   3    (1) + (1 + 1) + (1 + 1 + 1) = 6
3.             1   3   2    (1) + (1 + 1 + 1) + (1 + 1) = 6
4.             1   4   1    (1) + (1 + 1 + 1 + 1) + (1) = 6
5.             2   1   3    (1 + 1) + (1) + (1 + 1 + 1) = 6
6.             2   2   2    (1 + 1) + (1 + 1) + (1 + 1) = 6
7.             2   3   1    (1 + 1) + (1 + 1 + 1) + (1) = 6
8.             3   1   2    (1 + 1 + 1) + (1) + (1 + 1) = 6
9.             3   2   1    (1 + 1 + 1) + (1 + 1) + (1) = 6
10.            4   1   1    (1 + 1 + 1 + 1) + (1) + (1) = 6

Apart from carefully listing all these solutions (so that after counting we know the answer is 10), we may use a quicker and more elegant method to “count” the total number of solutions (= 10). This can be done by selecting two plus-signs “+” from the available five “+”. Look at the column “x1 + x2 + x3” in the above table for details. Then the total number of solutions to the equation x1 + x2 + x3 = 6 is given by the total number of such selections. Counting the number of selections when in general you select 2 from 5 distinct objects gives $C^5_2$, which is equal to 10.
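The table can also be produced by a short program, which makes it easy to try other values of n and r. The function name `positive_solutions` is an illustrative choice; it counts the solutions by brute force so that the result can be compared with the formula.

```python
from itertools import product
from math import comb

def positive_solutions(n, r):
    """Count positive-integer solutions of x1 + ... + xr = n by enumeration."""
    return sum(1 for xs in product(range(1, n + 1), repeat=r) if sum(xs) == n)

print(positive_solutions(6, 3), comb(6 - 1, 3 - 1))
```

The enumeration is only practical for small n and r, whereas the formula works for any size.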


Example 1.25

Let n be a positive integer. Let x1, x2, · · · , xr be non-negative integers (i.e., either positive or zero). How many distinct integer-valued solutions of

x1 + x2 + · · · + xr = n

are possible?

Solution Denote yi = xi + 1 for all i, so that each yi is a positive integer. Next rewrite the equation x1 + x2 + · · · + xr = n as

(x1 + 1) + (x2 + 1) + · · · + (xr + 1) = n + r,

y1 + y2 + · · · + yr = n + r.

Our focus is on the last equation, of which the RHS can be rewritten as

\[ y_1 + y_2 + \cdots + y_r = n + r = \overbrace{1 + 1 + 1 + \cdots + 1}^{\text{summing } (n+r) \text{ number of ``1''s}}. \]

On the rightmost side of the above equation there are (n + r − 1) plus-signs “+”, whereas on the leftmost side there are (r − 1) plus-signs “+”. The total number of distinct integer-valued solutions to the given equation is given by

\[ C^{n+r-1}_{r-1}. \]

✷

Remark We revisit the last illustrative example. Let x1, x2, x3 be non-negative integers. How many distinct integer-valued solutions of

x1 + x2 + x3 = 6

are possible? The answer to this question is 28. We may list all these integer-valued solutions in the table below.

Solution No. x1 x2 x3 Solution No. x1 x2 x3

1. 0 0 6 15. 2 1 3

2. 0 1 5 16. 2 2 2

3. 0 2 4 17. 2 3 1

4. 0 3 3 18. 2 4 0

5. 0 4 2 19. 3 0 3

6. 0 5 1 20. 3 1 2

7. 0 6 0 21. 3 2 1

8. 1 0 5 22. 3 3 0

9. 1 1 4 23. 4 0 2

10. 1 2 3 24. 4 1 1

11. 1 3 2 25. 4 2 0

12. 1 4 1 26. 5 0 1

13. 1 5 0 27. 5 1 0

14. 2 0 4 28. 6 0 0

Again, the number of solutions can be deduced by selecting two “+” from the available eight “+”. For example,

Solution No.   x1  x2  x3   y1 + y2 + y3 = (x1 + 1) + (x2 + 1) + (x3 + 1) = 9
1.             0   0   6    (1) + (1) + (1 + 1 + 1 + 1 + 1 + 1 + 1) = 9

The total number of solutions to the equation x1 + x2 + x3 = 6 (where x1, x2, x3 are non-negative integers), which is equal to the total number of solutions to the equation y1 + y2 + y3 = 9 (where y1, y2, y3 are positive integers), is given by the total number of such selections. Counting the number of selections when in general you select 2 from 8 distinct objects gives $C^8_2 = 28$.



Example 1.26

You have a box with red sweets, a box with yellow sweets and a box with black sweets. In how many ways can you choose 10 sweets from these 3 boxes provided that you can taste the sweets of all three colors? Assume that each box has a lot of sweets.

Solution The question is equivalent to finding the number of positive integer-valued solutions to

x1 + x2 + x3 = 10

where x1, x2, x3 are all positive integers. By Example 1.24 (page 13), we know that

total number of ways = $C^{10-1}_{3-1} = C^9_2 = 36$.

Note that this question is a distribution problem and thus belongs to the class of combination with repetition problems. ✷


Example 1.27

There are five flavors of ice cream: banana, chocolate, lemon, strawberry and vanilla. We can have three scoops. How many variations will there be?

Solution Denote the five flavors as b, c, l, s, v. Examples of three selected scoops include

{c, c, c} means 3 scoops of chocolate,

{b, l, v} means one each of banana, lemon and vanilla,

{b, v, v} means one of banana, two of vanilla.

Note that in the above notation, for example, {b, l, v} = {l, v, b} = {v, b, l}; they all mean one each of banana, lemon and vanilla. The five-flavor ice cream problem is equivalent to finding the number of non-negative integer-valued solutions to

x1 + x2 + x3 + x4 + x5 = 3

where xi are all non-negative integers. This question is also equivalent to distributing 3 identical ping-pong balls into 5 differently colored containers (some containers can be empty). We are interested in the number of different ways. Each way represents a possible combination (repetition is allowed). For example,

(x1, x2, x3, x4, x5) = (0, 3, 0, 0, 0) is equivalent to {c, c, c},

(x1, x2, x3, x4, x5) = (1, 0, 1, 0, 1) is equivalent to {b, l, v},

(x1, x2, x3, x4, x5) = (1, 0, 0, 0, 2) is equivalent to {b, v, v}.

By Example 1.25 (page 13), we know that

total number of variations = $C^{3+5-1}_{5-1} = C^7_4 = 35$.

✷


Example 1.28

Find the number of integer-valued solutions to

x1 + x2 + x3 + x4 = 100,

where x1 ≥ 30, x2 > 21, x3 ≥ 1 and x4 ≥ 0.

Solution Let y1 = x1 − 29 ≥ 1, y2 = x2 − 21 ≥ 1, y3 = x3 ≥ 1, y4 = x4 + 1 ≥ 1. Next rewrite the given equation x1 + x2 + x3 + x4 = 100 as

(x1 − 29) + (x2 − 21) + (x3) + (x4 + 1) = 100 − 29 − 21 + 1,

y1 + y2 + y3 + y4 = 51.

By Example 1.24, counting the positive integer-valued solutions of the last equation gives

total number of ways = $C^{51-1}_{4-1} = C^{50}_3 = 19{,}600$.

✷




Example 1.29

How many non-negative integer-valued solutions are there to the inequality

x1 + x2 + x3 + x4 + x5 + x6 < 10?

Solution The question is equivalent to finding the number of integer-valued solutions to the equation

x1 + x2 + x3 + x4 + x5 + x6 + x7 = 10,

where x1, x2, x3, x4, x5, x6 ≥ 0 and x7 > 0. Let yi = xi ≥ 0 for 1 ≤ i ≤ 6 and y7 = x7 − 1 ≥ 0. Next rewrite the above equation as

x1 + x2 + x3 + x4 + x5 + x6 + (x7 − 1) = 10 − 1,

y1 + y2 + y3 + y4 + y5 + y6 + y7 = 9.

By Example 1.25, counting the non-negative integer-valued solutions of the last equation gives

total number of ways = $C^{9+7-1}_{7-1} = C^{15}_6 = 5{,}005$.

✷
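The slack-variable argument can be checked numerically. In the sketch below (variable names are illustrative), each of the six unknowns is at most 9 because the sum is below 10, so enumerating all tuples with entries 0 to 9 covers every solution of the inequality.

```python
from itertools import product
from math import comb

# Direct count of non-negative solutions of x1 + ... + x6 < 10.
brute = sum(1 for xs in product(range(10), repeat=6) if sum(xs) < 10)

# Count obtained from the equation y1 + ... + y7 = 9 with all yi >= 0.
formula = comb(9 + 7 - 1, 7 - 1)

print(brute, formula)
```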

Example 1.30 (Combination with repetition ⋆⋆)

In how many ways can we distribute 12 identical folders into 5 distinct drawers such that the last drawer has at most 3 folders in it? (Assume that the drawers can be empty.)

Solution The question is equivalent to finding the number of distinct integer solutions to

x1 + x2 + x3 + x4 + x5 = 12 in which “x5 = 0, 1, 2, 3” and “x1, x2, x3, x4 ≥ 0”.

It follows from Example 1.25 (page 13) that when

x5 = 0: x1 + x2 + x3 + x4 = 12, x1, x2, x3, x4 ≥ 0. The number of solutions is $C^{12+4-1}_{4-1} = C^{15}_3$.

x5 = 1: x1 + x2 + x3 + x4 = 11, x1, x2, x3, x4 ≥ 0. The number of solutions is $C^{11+4-1}_{4-1} = C^{14}_3$.

x5 = 2: x1 + x2 + x3 + x4 = 10, x1, x2, x3, x4 ≥ 0. The number of solutions is $C^{10+4-1}_{4-1} = C^{13}_3$.

x5 = 3: x1 + x2 + x3 + x4 = 9, x1, x2, x3, x4 ≥ 0. The number of solutions is $C^{9+4-1}_{4-1} = C^{12}_3$.

\[
\begin{aligned}
\text{Total number of ways} &= C^{15}_3 + C^{14}_3 + C^{13}_3 + C^{12}_3 \\
&= 455 + 364 + 286 + 220 = 1{,}325.
\end{aligned}
\]

✷

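Example 1.31 (Combination with repetition ⋆⋆)

In how many ways can we distribute eight identical balls into four distinct containers so that the fourth container has an odd number of balls in it? (Assume that the other containers can be empty.)

Solution This question is equivalent to finding the number of distinct integer solutions to

x1 + x2 + x3 + x4 = 8 in which x1, x2, x3 ≥ 0 and x4 = 1, 3, 5 or 7.

It follows from Example 1.25 (page 13) that when

x4 = 1: x1 + x2 + x3 = 7, x1, x2, x3 ≥ 0. The number of solutions is $C^{7+3-1}_{3-1} = C^9_2$.

x4 = 3: x1 + x2 + x3 = 5, x1, x2, x3 ≥ 0. The number of solutions is $C^{5+3-1}_{3-1} = C^7_2$.

x4 = 5: x1 + x2 + x3 = 3, x1, x2, x3 ≥ 0. The number of solutions is $C^{3+3-1}_{3-1} = C^5_2$.

x4 = 7: x1 + x2 + x3 = 1, x1, x2, x3 ≥ 0. The number of solutions is $C^{1+3-1}_{3-1} = C^3_2$.

\[
\begin{aligned}
\text{Total number of ways} &= C^9_2 + C^7_2 + C^5_2 + C^3_2 \\
&= 36 + 21 + 10 + 3 = 70.
\end{aligned}
\]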

✷


Example 1.32 (Sudoku counting ⋆⋆⋆)

Sudoku puzzles usually start with some hint entries. If you start with a completely empty 4 × 4 Sudoku board, how many different ways are there to fill it in?

[Figure: a completed 4 × 4 Sudoku board.]

The completed Sudoku board shown in the figure is counted as one way of filling.

Solution Method 1 (Wrong answer just for illustration)

\[ \text{Total number of distinct Sudoku} = \frac{16!}{4!\,4!\,4!\,4!} = 63{,}063{,}000. \]

Method 1 is incorrect and the answer is wrong. (Why?)

Method 2 (Wrong answer just for illustration)

1st Row −→ 4× 3× 2× 1

2nd Row −→ 2× 1× 2× 1

3rd Row −→ 2× 2× 1× 1

4th Row −→ 1× 1× 1× 1

Total number of distinct Sudoku = 24× 4× 4 = 384.

Method 2 is incorrect and the answer is wrong. (Why?)

Method 3 (Suggested method)

Step 1.

First row: 4! = 24 possibilities.

Step 2.

To fill the first block there are now 2 possibilities (i.e., 3 4 or 4 3).



Step 3.

(a) (b)

To fill the second block there are 2 possible cases (a) and (b) (respectively, 1 2 and 2 1).

Step 4.

(a) (b)

For case (a), there are 4 possibilities for the third row:

2 1 4 3, 2 3 4 1, 4 1 2 3, 4 3 2 1.

For case (b), there are only 2 possibilities for the third row:

2 1 4 3, 4 3 1 2.

Step 5 (final step).

Total number of distinct Sudoku $= 4! \times 2 \times (4 + 2) = 288$.

Method 4 (Alternative method)

4! ×2 × 2 ×3

In the third Sudoku board, the symbol ∗ can be either 1, 2 or 3 (3 possibilities).

Total number of distinct Sudoku = 4!× 2× 2× 3 = 288.

✷
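The count of 288 can be confirmed by exhaustive search. The sketch below is an illustration added here, not one of the methods in the notes; the helper names `is_valid` and `count_4x4_sudoku` are hypothetical. It treats each row as a permutation of 1 to 4 and keeps only the grids whose columns and 2 × 2 blocks also contain four distinct entries.

```python
from itertools import permutations, product


def is_valid(grid):
    """Rows are permutations already; check the four columns and four 2x2 blocks."""
    cols_ok = all(len({grid[r][c] for r in range(4)}) == 4 for c in range(4))
    blocks_ok = all(
        len({grid[r][c] for r in (br, br + 1) for c in (bc, bc + 1)}) == 4
        for br in (0, 2)
        for bc in (0, 2)
    )
    return cols_ok and blocks_ok


def count_4x4_sudoku():
    """Try all 24^4 = 331,776 stacks of four rows and count the valid grids."""
    rows = list(permutations((1, 2, 3, 4)))
    return sum(1 for grid in product(rows, repeat=4) if is_valid(grid))


print(count_4x4_sudoku())
```

The same brute-force idea is hopeless for the 9 × 9 board, which is why counting arguments such as Methods 3 and 4 matter.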

Remark How many distinct 9 × 9 Sudoku puzzles are there? Many more than you can imagine! There are

6,670,903,752,021,072,936,960 distinct Sudoku puzzles.

And how many of them are essentially distinct (inequivalent)? We can create many Sudoku puzzles out of a given one by transposing or flipping it, interchanging “stacks”, interchanging “bands”, interchanging columns in a stack, interchanging rows in a band, relabeling the digits, or rotating it. All these changes define group actions on the set of all Sudoku. In fact,

5,472,730,538 of the above are essentially distinct.

Felgenhauer & Jarvis, Mathematics of Sudoku I (2006); Russell & Jarvis, Mathematics of Sudoku II (2006)


� Example 1.33 (The sock drawer ⋆⋆⋆ )

A drawer contains red socks and black socks. When two socks are drawn at random, the probability that both are red is $\frac{1}{2}$.

(a) How small can the number of socks in the drawer be?

(b) How small if the number of black socks is even?

Solution This example is taken from the book “Fifty Challenging Problems in Probability with Solutions” by Frederick Mosteller.

Let there be $r$ red and $b$ black socks. The probability that the first sock is red is $r/(r+b)$; and if the first sock is red, the probability that the second is red, now that a red has been removed, is $(r-1)/(r+b-1)$. Then we require the probability that both are red to be $1/2$, or

$$\frac{r}{r+b} \times \frac{r-1}{r+b-1} = \frac{1}{2}.$$

One could just start with $b = 1$ and try successive values of $r$, then go to $b = 2$ and try again, and so on. That would get the answers quickly. Or we could play along with a little more mathematics. Notice that

$$\frac{r}{r+b} > \frac{r-1}{r+b-1}, \quad \text{for } b > 0.$$

Therefore we can create the inequalities (how?)

$$\left(\frac{r}{r+b}\right)^2 > \frac{1}{2} > \left(\frac{r-1}{r+b-1}\right)^2.$$

Taking square roots, we have, for $r > 1$,

$$\frac{r}{r+b} > \frac{1}{\sqrt{2}} > \frac{r-1}{r+b-1}.$$

From the first inequality we get

$$r > \frac{1}{\sqrt{2}}\,(r+b)$$

or

$$r > \frac{1}{\sqrt{2}-1}\,b = (\sqrt{2}+1)\,b.$$

From the second we get

$$(\sqrt{2}+1)\,b > r - 1,$$

or, all told,

$$(\sqrt{2}+1)\,b + 1 > r > (\sqrt{2}+1)\,b.$$

(a) For $b = 1$, $r$ must be greater than 2.414 and less than 3.414, and so the candidate is $r = 3$. For $r = 3$, $b = 1$, we get

$$P(\text{2 red socks}) = \frac{3}{4} \times \frac{2}{3} = \frac{1}{2}.$$

And so the smallest number of socks is 4.

(b) Beyond this we investigate even values of $b$.

b    r is between    eligible r    P(2 red socks)
2    5.8, 4.8        5             $\frac{5}{7} \times \frac{4}{6} \ne \frac{1}{2}$
4    10.7, 9.7       10            $\frac{10}{14} \times \frac{9}{13} \ne \frac{1}{2}$
6    15.5, 14.5      15            $\frac{15}{21} \times \frac{14}{20} = \frac{1}{2}$

And so 21 socks is the smallest number when $b$ is even. ✷
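The trial-and-error route mentioned in the solution is easy to automate. The sketch below is an added illustration, assuming at least one black sock; the function name `smallest_drawer` and the search bound `max_total` are hypothetical. It tests the condition $2r(r-1) = (r+b)(r+b-1)$ in exact integer arithmetic for increasing drawer sizes.

```python
def smallest_drawer(require_even_black=False, max_total=200):
    """Return (red, black) for the smallest drawer with P(both red) = 1/2.

    The condition r/(r+b) * (r-1)/(r+b-1) = 1/2 is tested in integer form
    as 2*r*(r-1) == n*(n-1), where n = r + b is the total number of socks.
    """
    for total in range(2, max_total + 1):
        for black in range(1, total):          # assume at least one black sock
            if require_even_black and black % 2 != 0:
                continue
            red = total - black
            if 2 * red * (red - 1) == total * (total - 1):
                return red, black
    return None                                # nothing found within the bound


print(smallest_drawer())                          # part (a)
print(smallest_drawer(require_even_black=True))   # part (b)
```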



� Example 1.34 (Mark Six ⋆⋆⋆ )

In Mark Six, 6 numbers are drawn out of a possible 49 (the “extra number” has been neglected here).

(a) What is the number of all possible outcomes?

(b) How many of these outcomes contain a “consecutive three”, i.e., a run of exactly three consecutive numbers?

Solution

(a) The number of all possible outcomes $= C^{49}_{6} = 13{,}983{,}816$.

Note. Before we can answer part (b) we have to state clearly the meaning of the keyword in this example. The following are some examples of “consecutive-three”:

$\{22, 27, 28, 29, 42, 46\}$, $\{22, 23, 28, 41, 42, 43\}$, $\{22, 38, 39, 41, 42, 43\}$,

whereas the following are some examples of “not-consecutive-three”:

$\{26, 27, 28, 29, 42, 46\}$, $\{25, 26, 27, 28, 29, 48\}$, $\{26, 27, 28, 41, 42, 43\}$.

(b) Let $x_1, x_2, x_3, x_4, x_5, x_6$ be the drawn numbers such that

$$1 \le x_1 < x_2 < x_3 < x_4 < x_5 < x_6 \le 49.$$

We further define

$x_0 := 0$, $x_7 := 50$ and $c_i := x_i - x_{i-1}$ for $i = 1, 2, 3, 4, 5, 6, 7$.

Note that in the above definition of $c_i$, for example, $c_2$ is the difference between the smallest two drawn numbers and $c_6$ is the difference between the largest two drawn numbers; basically, $c_i$ is the difference between two consecutive drawn numbers. Again by the above definitions,

$c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 = 50$, where the $c_i$ are all positive integers.

Now we consider the following two cases. Case 1: with a consecutive three and without a consecutive two; Case 2: with a consecutive three and a consecutive two.

Case 1: with a consecutive three and without a consecutive two.

(i) $x_1, x_2, x_3$ are consecutive. Then $c_2 = 1$, $c_3 = 1$, $c_4, c_5, c_6 > 1$, and this sub-case is denoted by $[\,1 \;\; 1 \;\; {>}1 \;\; {>}1 \;\; {>}1\,]$.

(ii) $x_2, x_3, x_4$ are consecutive. Then $c_3 = 1$, $c_4 = 1$, $c_2, c_5, c_6 > 1$, and this sub-case is denoted by $[\,{>}1 \;\; 1 \;\; 1 \;\; {>}1 \;\; {>}1\,]$.

(iii) $x_3, x_4, x_5$ are consecutive. Then $c_4 = 1$, $c_5 = 1$, $c_2, c_3, c_6 > 1$, and this sub-case is denoted by $[\,{>}1 \;\; {>}1 \;\; 1 \;\; 1 \;\; {>}1\,]$.

(iv) $x_4, x_5, x_6$ are consecutive. Then $c_5 = 1$, $c_6 = 1$, $c_2, c_3, c_4 > 1$, and this sub-case is denoted by $[\,{>}1 \;\; {>}1 \;\; {>}1 \;\; 1 \;\; 1\,]$.


The above four sub-cases are symmetric, so the number of possible solutions is the same for each sub-case. Take (i) as an example in the following ($c_2 = c_3 = 1$). Denoting

$$c'_1 = c_1, \quad c'_4 = c_4 - 1, \quad c'_5 = c_5 - 1, \quad c'_6 = c_6 - 1, \quad c'_7 = c_7$$

simplifies the equation $c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 = 50$ to

$$(c'_1) + (1) + (1) + (c'_4 + 1) + (c'_5 + 1) + (c'_6 + 1) + (c'_7) = 50$$

or

$c'_1 + c'_4 + c'_5 + c'_6 + c'_7 = 45$, where the $c'_i$ are all positive integers.

By Example 1.24 (page 13) we know that the number of possible solutions for this simplified equation is $C^{45-1}_{5-1} = C^{44}_{4}$, and hence the total number of possible solutions for Case 1 is

$$4 \times C^{44}_{4}.$$

Case 2: with a consecutive three and a consecutive two. There are a total of 6 sub-cases, which are symmetric.

Sub-case                                        An example for reference
$[\,1 \;\; 1 \;\; {>}1 \;\; 1 \;\; {>}1\,]$     $\{7, 8, 9, 19, 20, 36\}$
$[\,1 \;\; 1 \;\; {>}1 \;\; {>}1 \;\; 1\,]$     $\{9, 10, 11, 29, 42, 43\}$
$[\,{>}1 \;\; 1 \;\; 1 \;\; {>}1 \;\; 1\,]$     $\{12, 17, 18, 19, 22, 23\}$
$[\,1 \;\; {>}1 \;\; 1 \;\; 1 \;\; {>}1\,]$     $\{14, 15, 18, 19, 20, 42\}$
$[\,1 \;\; {>}1 \;\; {>}1 \;\; 1 \;\; 1\,]$     $\{18, 19, 26, 31, 32, 33\}$
$[\,{>}1 \;\; 1 \;\; {>}1 \;\; 1 \;\; 1\,]$     $\{22, 38, 39, 41, 42, 43\}$

Take the first sub-case as an example in the following ($c_2 = c_3 = c_5 = 1$). Denoting

$$c'_1 = c_1, \quad c'_4 = c_4 - 1, \quad c'_6 = c_6 - 1, \quad c'_7 = c_7$$

simplifies the equation $c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 = 50$ to

$c'_1 + c'_4 + c'_6 + c'_7 = 45$, where the $c'_i$ are all positive integers.

By Example 1.24 (page 13) we know that the number of possible solutions for this simplified equation is $C^{45-1}_{4-1} = C^{44}_{3}$, and hence the total number of possible solutions for Case 2 is

$$6 \times C^{44}_{3}.$$

Combining Case 1 and Case 2, the total number of all combinations is given by

$$4 \times C^{44}_{4} + 6 \times C^{44}_{3} = 543{,}004 + 79{,}464 = 622{,}468.$$

✷
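The case analysis can be tested against a direct enumeration on a smaller lottery. The sketch below is an added illustration, not part of the notes. It assumes that for a lottery with numbers 1 to $n$ the same argument gives $4\,C^{n-5}_{4} + 6\,C^{n-5}_{3}$ (this general form is our extrapolation; the notes only treat $n = 49$), and the names `run_lengths`, `is_consecutive_three`, `brute_force` and `by_formula` are hypothetical.

```python
from itertools import combinations
from math import comb


def run_lengths(numbers):
    """Lengths of the maximal runs of consecutive integers in a sorted tuple."""
    runs, length = [], 1
    for prev, cur in zip(numbers, numbers[1:]):
        if cur == prev + 1:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return runs


def is_consecutive_three(numbers):
    """Exactly one run of length three and no longer run (Case 1 or Case 2)."""
    runs = run_lengths(numbers)
    return max(runs) == 3 and runs.count(3) == 1


def brute_force(n):
    """Count qualifying draws of 6 numbers from 1..n by full enumeration."""
    return sum(1 for draw in combinations(range(1, n + 1), 6)
               if is_consecutive_three(draw))


def by_formula(n):
    """Assumed generalization of the counting argument to numbers 1..n."""
    return 4 * comb(n - 5, 4) + 6 * comb(n - 5, 3)


print(brute_force(14), by_formula(14))   # small lottery: enumeration vs formula
print(by_formula(49))                    # Mark Six
```

Enumerating all 13,983,816 Mark Six draws is also feasible but slow in pure Python, so the check is done on a 14-number lottery instead.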



� Example 1.35 (Hong Kong mahjong ⋆⋆⋆ )

What is the number of different combinations of “13 wans” in a Hong Kong mahjong game?

Solution The question is equivalent to counting how many integer-solutions $(k_1, k_2, k_3, k_4, k_5, k_6, k_7, k_8, k_9)$ satisfy the equation

$$k_1 + k_2 + k_3 + k_4 + k_5 + k_6 + k_7 + k_8 + k_9 = 13,$$

where all $k_i$ are non-negative integers such that $0 \le k_i \le 4$. Examples of some integer-solutions include (compare the third one with the figure above):

$(4, 4, 4, 1, 0, 0, 0, 0, 0)$, $(2, 2, 2, 2, 2, 2, 1, 0, 0)$, $(3, 1, 1, 1, 1, 1, 1, 1, 3)$.

Below we consider the mutually exclusive and exhaustive “classes”. For each class we may count the number of permutations with repetitions:

Class No.   Integer-solution               No. of permutations with repetitions
1.          (4, 4, 4, 1, 0, 0, 0, 0, 0)    $\frac{9!}{3!\,1!\,5!} = 504$
2.          (4, 4, 3, 2, 0, 0, 0, 0, 0)    $\frac{9!}{2!\,1!\,1!\,5!} = 1{,}512$
3.          (4, 4, 3, 1, 1, 0, 0, 0, 0)    $\frac{9!}{2!\,1!\,2!\,4!} = 3{,}780$
4.          (4, 4, 2, 2, 1, 0, 0, 0, 0)    $\frac{9!}{2!\,2!\,1!\,4!} = 3{,}780$
5.          (4, 4, 2, 1, 1, 1, 0, 0, 0)    $\frac{9!}{2!\,1!\,3!\,3!} = 5{,}040$
6.          (4, 4, 1, 1, 1, 1, 1, 0, 0)    $\frac{9!}{2!\,5!\,2!} = 756$
7.          (4, 3, 3, 3, 0, 0, 0, 0, 0)    $\frac{9!}{1!\,3!\,5!} = 504$
8.          (4, 3, 3, 2, 1, 0, 0, 0, 0)    $\frac{9!}{1!\,2!\,1!\,1!\,4!} = 7{,}560$
9.          (4, 3, 3, 1, 1, 1, 0, 0, 0)    $\frac{9!}{1!\,2!\,3!\,3!} = 5{,}040$
10.         (4, 3, 2, 2, 2, 0, 0, 0, 0)    $\frac{9!}{1!\,1!\,3!\,4!} = 2{,}520$
11.         (4, 3, 2, 2, 1, 1, 0, 0, 0)    $\frac{9!}{1!\,1!\,2!\,2!\,3!} = 15{,}120$
12.         (4, 3, 2, 1, 1, 1, 1, 0, 0)    $\frac{9!}{1!\,1!\,1!\,4!\,2!} = 7{,}560$
13.         (4, 3, 1, 1, 1, 1, 1, 1, 0)    $\frac{9!}{1!\,1!\,6!\,1!} = 504$
14.         (4, 2, 2, 2, 2, 1, 0, 0, 0)    $\frac{9!}{1!\,4!\,1!\,3!} = 2{,}520$
15.         (4, 2, 2, 2, 1, 1, 1, 0, 0)    $\frac{9!}{1!\,3!\,3!\,2!} = 5{,}040$
16.         (4, 2, 2, 1, 1, 1, 1, 1, 0)    $\frac{9!}{1!\,2!\,5!\,1!} = 1{,}512$
17.         (4, 2, 1, 1, 1, 1, 1, 1, 1)    $\frac{9!}{1!\,1!\,7!} = 72$
18.         (3, 3, 3, 3, 1, 0, 0, 0, 0)    $\frac{9!}{4!\,1!\,4!} = 630$
19.         (3, 3, 3, 2, 2, 0, 0, 0, 0)    $\frac{9!}{3!\,2!\,4!} = 1{,}260$
20.         (3, 3, 3, 2, 1, 1, 0, 0, 0)    $\frac{9!}{3!\,1!\,2!\,3!} = 5{,}040$
21.         (3, 3, 3, 1, 1, 1, 1, 0, 0)    $\frac{9!}{3!\,4!\,2!} = 1{,}260$
22.         (3, 3, 2, 2, 2, 1, 0, 0, 0)    $\frac{9!}{2!\,3!\,1!\,3!} = 5{,}040$
23.         (3, 3, 2, 2, 1, 1, 1, 0, 0)    $\frac{9!}{2!\,2!\,3!\,2!} = 7{,}560$
24.         (3, 3, 2, 1, 1, 1, 1, 1, 0)    $\frac{9!}{2!\,1!\,5!\,1!} = 1{,}512$
25.         (3, 3, 1, 1, 1, 1, 1, 1, 1)    $\frac{9!}{2!\,7!} = 36$
26.         (3, 2, 2, 2, 2, 2, 0, 0, 0)    $\frac{9!}{1!\,5!\,3!} = 504$
27.         (3, 2, 2, 2, 2, 1, 1, 0, 0)    $\frac{9!}{1!\,4!\,2!\,2!} = 3{,}780$
28.         (3, 2, 2, 2, 1, 1, 1, 1, 0)    $\frac{9!}{1!\,3!\,4!\,1!} = 2{,}520$
29.         (3, 2, 2, 1, 1, 1, 1, 1, 1)    $\frac{9!}{1!\,2!\,6!} = 252$
30.         (2, 2, 2, 2, 2, 2, 1, 0, 0)    $\frac{9!}{6!\,1!\,2!} = 252$
31.         (2, 2, 2, 2, 2, 1, 1, 1, 0)    $\frac{9!}{5!\,3!\,1!} = 504$
32.         (2, 2, 2, 2, 1, 1, 1, 1, 1)    $\frac{9!}{4!\,5!} = 126$
                                           Total = 93,600

The total number of permutations (= 93,600) gives the answer to the original question.✷
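The total of 93,600 can be cross-checked without listing the 32 classes. The sketch below is an added illustration, not the method of the notes; the function name `count_hands` and its parameters are hypothetical. It counts the solutions of $k_1 + \cdots + k_9 = 13$ with $0 \le k_i \le 4$ by a simple dynamic program that adds one rank at a time.

```python
def count_hands(tiles=13, ranks=9, max_copies=4):
    """Count integer solutions of k_1 + ... + k_ranks = tiles, 0 <= k_i <= max_copies."""
    # ways[s] = number of ways to hold s tiles using the ranks processed so far
    ways = [1] + [0] * tiles
    for _ in range(ranks):
        new = [0] * (tiles + 1)
        for held, count in enumerate(ways):
            if count == 0:
                continue
            # take k copies of the next rank without exceeding the hand size
            for k in range(min(max_copies, tiles - held) + 1):
                new[held + k] += count
        ways = new
    return ways[tiles]


print(count_hands())
```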

Remark

$$C^{36}_{13} = 2{,}310{,}789{,}600 \approx 2.31 \text{ billion}$$

is simply a wrong answer (why?). Think carefully about what $C^{36}_{13}$ counts and compare its value with 93,600.



Alternative method

Class No.   4 of a kind   3 of a kind   a pair   a single   No. of combinations
1.          3             0             0        1          $C^9_3 \times C^6_1 = 504$
2.          2             1             1        0          $C^9_2 \times C^7_1 \times C^6_1 = 1{,}512$
3.          2             1             0        2          $C^9_2 \times C^7_1 \times C^6_2 = 3{,}780$
4.          2             0             2        1          $C^9_2 \times C^7_2 \times C^5_1 = 3{,}780$
5.          2             0             1        3          $C^9_2 \times C^7_1 \times C^6_3 = 5{,}040$
6.          2             0             0        5          $C^9_2 \times C^7_5 = 756$
7.          1             3             0        0          $C^9_1 \times C^8_3 = 504$
8.          1             2             1        1          $C^9_1 \times C^8_2 \times C^6_1 \times C^5_1 = 7{,}560$
9.          1             2             0        3          $C^9_1 \times C^8_2 \times C^6_3 = 5{,}040$
10.         1             1             3        0          $C^9_1 \times C^8_1 \times C^7_3 = 2{,}520$
11.         1             1             2        2          $C^9_1 \times C^8_1 \times C^7_2 \times C^5_2 = 15{,}120$
12.         1             1             1        4          $C^9_1 \times C^8_1 \times C^7_1 \times C^6_4 = 7{,}560$
13.         1             1             0        6          $C^9_1 \times C^8_1 \times C^7_6 = 504$
14.         1             0             4        1          $C^9_1 \times C^8_4 \times C^4_1 = 2{,}520$
15.         1             0             3        3          $C^9_1 \times C^8_3 \times C^5_3 = 5{,}040$
16.         1             0             2        5          $C^9_1 \times C^8_2 \times C^6_5 = 1{,}512$
17.         1             0             1        7          $C^9_1 \times C^8_1 \times C^7_7 = 72$
18.         0             4             0        1          $C^9_4 \times C^5_1 = 630$
19.         0             3             2        0          $C^9_3 \times C^6_2 = 1{,}260$
20.         0             3             1        2          $C^9_3 \times C^6_1 \times C^5_2 = 5{,}040$
21.         0             3             0        4          $C^9_3 \times C^6_4 = 1{,}260$
22.         0             2             3        1          $C^9_2 \times C^7_3 \times C^4_1 = 5{,}040$
23.         0             2             2        3          $C^9_2 \times C^7_2 \times C^5_3 = 7{,}560$
24.         0             2             1        5          $C^9_2 \times C^7_1 \times C^6_5 = 1{,}512$
25.         0             2             0        7          $C^9_2 \times C^7_7 = 36$
26.         0             1             5        0          $C^9_1 \times C^8_5 = 504$
27.         0             1             4        2          $C^9_1 \times C^8_4 \times C^4_2 = 3{,}780$
28.         0             1             3        4          $C^9_1 \times C^8_3 \times C^5_4 = 2{,}520$
29.         0             1             2        6          $C^9_1 \times C^8_2 \times C^6_6 = 252$
30.         0             0             6        1          $C^9_6 \times C^3_1 = 252$
31.         0             0             5        3          $C^9_5 \times C^4_3 = 504$
32.         0             0             4        5          $C^9_4 \times C^5_5 = 126$
                                                            Total = 93,600

The total number of combinations (= 93,600) gives the answer to the original question. ✷


Chapter 2

Axioms of Probability

� Example 2.1 (Simple probability ⋆ )

A bookshelf contains 3 German books, 4 French books and 5 Chinese books in a row. Each book is different from the others. What is the probability that no two Chinese books are next to each other?

Solution In Example 1.7 (page 5), the number of permutations in which no two Chinese books are adjacent has been calculated as

$$(8 \times 7 \times 6 \times 5 \times 4) \times 7! = 33{,}868{,}800.$$

The required probability is therefore

$$\frac{33{,}868{,}800}{12!} = \frac{33{,}868{,}800}{479{,}001{,}600} \approx 0.0707.$$

The probability is approximately equal to 7.07%. ✷

Remark The number of favorable arrangements of the books (≈ 33.9 million) sounds like a huge number but, interestingly, the corresponding probability is quite small.


Two Germans, three French people and four Chinese people are to be seated in a row. What is the probability that no Chinese person sits next to another Chinese person and the two Germans sit next to each other?

Solution In Example 1.8 (page 5), the number of such permutations has been calculated as

$$(5 \times 4 \times 3 \times 2) \times 4! \times 2! = 5{,}760.$$

The required probability is therefore

$$\frac{5{,}760}{9!} = \frac{5{,}760}{362{,}880} = \frac{1}{63} \approx 0.0159.$$



Six fair dice are rolled. What is the probability of getting three pairs? (“Three pairs” means, for example, “a pair of 1, a pair of 2 and a pair of 5”.)


Solution In Example 1.22 (page 11), the total number of different arrangements has been calculated as

$$C^6_3 \times \frac{6!}{2!\,2!\,2!} = 1800.$$

The required probability is given by

$$P(\text{Three pairs}) = \frac{1800}{6^6} = \frac{25}{648} \approx 0.03858.$$

✷

Remark An alternative way of computing the probability is

$$P(\text{Three pairs}) = \frac{C^6_3 \times C^6_2 \cdot C^4_2 \cdot C^2_2}{6^6} \approx 0.03858.$$

� Example 2.4 (Round table ⋆ )

12 people are randomly seated at a round table. What is the probability that John and Mary will sit next to each other?

Solution In Example 1.11 (page 7), the total number of seating arrangements has been calculated as

$$10! + 10! = 10! \times 2.$$

The required probability is

$$\frac{10! \times 2}{11!} = \frac{2}{11} \approx 0.1818.$$

✷

Remark Alternatively, we may use a more elegant method. Assume that the position of John is fixed; then there are 11 available seats for Mary, of which only two meet the requirement of the question. We can quickly write the required probability as

$$\frac{2}{11} \approx 0.1818.$$


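2 red balls and 13 green balls are randomly put into five identical boxes, so that each box contains 3 balls. Find the probability that the 2 red balls are put in different boxes.

Solution Denote by R a red ball and by X a ball of any color (including red).

$\{R, X, X\}$ $\{X, X, X\}$ $\{X, X, X\}$ $\{X, X, X\}$ $\{X, X, X\}$

Once one red ball has been placed, the other red ball is equally likely to occupy any of the remaining 14 positions, of which 12 lie in a different box. The required probability is given by

$$\frac{12}{14} = \frac{6}{7} \approx 0.8571.$$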

✷


A fair six-sided die is tossed $n$ times. Let $P(n)$ be the probability that the total number of times of obtaining a “2” in the $n$ tosses is an odd number. Find $P(1)$, $P(2)$ and $P(3)$.

Solution

$$P(1) = P(\text{toss once and one “2”}) = \frac{1}{6} \approx 0.1667,$$

$$P(2) = P(\text{toss twice and one “2”}) = \frac{1}{6} \times \frac{5}{6} + \frac{5}{6} \times \frac{1}{6} = \frac{5}{18} \approx 0.2778,$$

$$P(3) = P(\text{toss three times and one “2”}) + P(\text{toss three times and three “2”s})$$

$$= C^3_1 \left(\frac{1}{6}\right) \left(\frac{5}{6}\right)^2 + \left(\frac{1}{6}\right)^3 = \frac{75}{216} + \frac{1}{216} = \frac{19}{54} \approx 0.3519.$$

✷
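The same pattern extends to any $n$: sum the binomial probabilities over odd counts of “2”. The sketch below is an added illustration using exact fractions; the function name `prob_odd_twos` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_odd_twos(n):
    """P(odd number of 2s in n tosses) as a sum of binomial terms with odd k."""
    p = Fraction(1, 6)                      # probability of a "2" on one toss
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(1, n + 1, 2))


for n in (1, 2, 3):
    print(n, prob_odd_twos(n), float(prob_odd_twos(n)))
```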


Roll six fair dice. What is the probability that the outcome of the rolled dice is “one pair”? (For example, $\{2, 2, 3, 4, 5, 6\}$ is called “one pair”, or generally in symbols $\{a, a, b, c, d, e\}$, where $a, b, c, d, e$ are all unequal.)

Solution Refer to Example 1.22 (page 11) for the class “One pair”. The required probability is given by

$$\frac{\left(C^6_1 \times C^5_4\right) \times \left(C^6_2 \times 4!\right)}{6^6} = \frac{10800}{46656} = \frac{25}{108} \approx 0.2315.$$

✷


We are playing with a selected deck of 16 poker cards, as shown below:

J♥ A♦ J♣ A♠

Q♥ 2♦ Q♣ 2♠

K♥ 3♦ K♣ 3♠

A♥ 4♦ A♣ 4♠

Let H be the event that the drawn card is a heart (♥); D be the event that the drawn card is a diamond (♦); A be the event that the drawn card is an ace (A).

(a) What is the probability P (H ∩ D)?

(b) What is the probability P (H ∩ A)?

(c) What is the probability P (H ∪ D)?

(d) What is the probability P (H ∪ A)?

(e) Are H and D independent events? Why?

(f) Are H and A independent events? Why?

(g) If three cards are drawn from the deck, one at a time without replacement, what is the probability that an ace will appear for the first time at the third draw?

Solution

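(a) $P(H \cap D) = 0$.

(b) $P(H \cap A) = \dfrac{1}{16}$.

(c) $P(H \cup D) = P(H) + P(D) = \dfrac{1}{4} + \dfrac{1}{4} = \dfrac{1}{2}$.

(d) $P(H \cup A) = P(H) + P(A) - P(H \cap A) = \dfrac{1}{4} + \dfrac{1}{4} - \dfrac{1}{16} = \dfrac{7}{16}$.

(e) The events H and D are not independent because they are mutually exclusive with nonzero probabilities: $P(H \cap D) = 0 \ne \frac{1}{16} = P(H) \times P(D)$.

(f) The events H and A are independent because $P(H \cap A) = \frac{1}{16} = P(H) \times P(A)$.

(g) $P(\text{Ace appears for the first time at the third draw}) = \dfrac{12}{16} \times \dfrac{11}{15} \times \dfrac{4}{14} = \dfrac{11}{70}$.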

✷


We are playing with a selected deck of 16 poker cards, as shown below:

♠A ♥3 ♦J ♣A

♠ 2 ♥ 5 ♦ Q ♣ 3

♠ 4 ♥ 7 ♦ K ♣ 5

♠ 6 ♥ 9 ♦ A ♣ 7

If three cards are randomly drawn from the deck, one at a time, what is the probability that

(a) one and only one card is an ace?

(b) at least one card is an ace?

(c) exactly two cards are spades (♠)?

Solution

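(a) $P(\text{one and only one ace}) = \dfrac{3}{16} \times \dfrac{13}{15} \times \dfrac{12}{14} \times 3 = \dfrac{1404}{3360} = \dfrac{117}{280} \approx 0.4179$.

(b) $P(\text{at least one ace}) = 1 - P(\text{no ace}) = 1 - \dfrac{13}{16} \times \dfrac{12}{15} \times \dfrac{11}{14} = \dfrac{1644}{3360} = \dfrac{137}{280} \approx 0.4893$.

(c) $P(\text{exactly two spades}) = \dfrac{4}{16} \times \dfrac{3}{15} \times \dfrac{12}{14} \times 3 = \dfrac{432}{3360} = \dfrac{9}{70} \approx 0.1286$.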

✷

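Remark Alternatively,

(a) $P(\text{one and only one ace}) = \dfrac{C^3_1 \times C^{13}_2}{C^{16}_3} = \dfrac{234}{560} = \dfrac{117}{280}$.

(b) $P(\text{at least one ace}) = 1 - P(\text{no ace}) = 1 - \dfrac{C^3_0 \times C^{13}_3}{C^{16}_3} = 1 - \dfrac{286}{560} = \dfrac{137}{280}$.

(c) $P(\text{exactly two spades}) = \dfrac{C^4_2 \times C^{12}_1}{C^{16}_3} = \dfrac{72}{560} = \dfrac{9}{70}$.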

� Example 2.10 (Simple probability ⋆⋆ )

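The events $E$, $F$ and $G$ are such that $E$ is independent of $F$, $E$ is independent of $G$, and

$$P(E) = \frac{5}{9}, \quad P(F) = \frac{2}{5}, \quad P(G) = \frac{1}{2}, \quad P(E \cap F \cap G) = \frac{1}{4}, \quad P(\overline{E} \cap \overline{F} \cap G) = \frac{1}{6},$$

where $\overline{E}$ means the complement of $E$. Find

(a) $P(\overline{E} \cap \overline{F})$.

(b) $P(\overline{E} \cap \overline{F} \cap \overline{G})$.

(c) $P(F \cap G)$.

(d) $P(F \mid G)$.

(e) $P(E \mid F \cap G)$.

(f) $P(E \cap F \mid E \cap G)$.

(g) Are $E \cap F$ and $E \cap G$ independent? Why?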

Solution

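(a) $P(\overline{E} \cap \overline{F}) = P(\overline{E}) \times P(\overline{F}) = \big(1 - P(E)\big) \times \big(1 - P(F)\big) = \left(1 - \dfrac{5}{9}\right)\left(1 - \dfrac{2}{5}\right) = \dfrac{4}{15}$.

(b) $P(\overline{E} \cap \overline{F} \cap \overline{G}) = P(\overline{E} \cap \overline{F}) - P(\overline{E} \cap \overline{F} \cap G) = \dfrac{4}{15} - \dfrac{1}{6} = \dfrac{1}{10}$.

(c)

$$\begin{aligned}
P(F \cap G) &= P(E \cap F \cap G) + P(\overline{E} \cap F \cap G) \\
&= P(E \cap F \cap G) + \big[P(\overline{E} \cap G) - P(\overline{E} \cap \overline{F} \cap G)\big] \\
&= P(E \cap F \cap G) + \big[\big(1 - P(E)\big) \times P(G) - P(\overline{E} \cap \overline{F} \cap G)\big] \\
&= \frac{1}{4} + \frac{4}{9} \times \frac{1}{2} - \frac{1}{6} = \frac{11}{36}.
\end{aligned}$$

(d) $P(F \mid G) = \dfrac{P(F \cap G)}{P(G)} = \dfrac{11/36}{1/2} = \dfrac{11}{18}$.

(e) $P(E \mid F \cap G) = \dfrac{P(E \cap F \cap G)}{P(F \cap G)} = \dfrac{1/4}{11/36} = \dfrac{9}{11}$.

(f) $P(E \cap F \mid E \cap G) = \dfrac{P(E \cap F \cap G)}{P(E \cap G)} = \dfrac{P(E \cap F \cap G)}{P(E) \times P(G)} = \dfrac{1/4}{\frac{5}{9} \times \frac{1}{2}} = \dfrac{9}{10}$.

(g) $E \cap F$ and $E \cap G$ are not independent because

$$P(E \cap F \mid E \cap G) = \frac{9}{10},$$

whereas

$$P(E \cap F) = P(E) \times P(F) = \frac{5}{9} \times \frac{2}{5} = \frac{2}{9} \ne \frac{9}{10}.$$

✷

Remark If “$E$ and $F$ are independent” and “$E$ and $G$ are independent”, $F$ and $G$ are not necessarily independent. Note that $P(F \cap G) = \dfrac{11}{36} \ne \dfrac{2}{5} \times \dfrac{1}{2} = P(F) \times P(G)$. $F$ and $G$ are dependent in this question.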



� Example 2.11 (Independent vs mutually exclusive ⋆ )

(a) Let $A$ and $B$ be two events of a sample space. Prove that

(i) $P(A \cup B) \le P(A) + P(B)$. (ii) $P(A \cup B) \ge 1 - P(A^c) - P(B^c)$.

(b) Prove that if two events $A$ and $B$ with nonzero probabilities are mutually exclusive, they are not independent.

(c) Assume that $P(A) = a$ and $P(B) = b$. Find the probabilities $P(A^c \cap B)$ and $P(A^c \mid B)$ in terms of $a$ and $b$ for each of the following cases:

(i) $A$ and $B$ are mutually exclusive. (ii) $A$ and $B$ are independent.

(d) Consider an experiment of tossing two fair dice of different colors. Let $A$ be the event that the outcome on the red die is odd, $B$ be the event that the outcome on the green die is odd, and $C$ be the event that the sum of the two outcomes is odd. Prove that

(i) $A$, $B$, $C$ are pairwise independent. (ii) $A$, $B$, $C$ are not independent.

Solution

(a) (i) By the inclusion-exclusion principle,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \le P(A) + P(B),$$

since $P(A \cap B) \ge 0$.

(ii) By the inclusion-exclusion principle,

$$\begin{aligned}
P(A \cup B) &= P(A) + P(B) - P(A \cap B) \\
&= 1 - P(A^c) + P(B) - P(A \cap B) \\
&= 1 - P(A^c) + P(B \setminus A) \\
&\ge 1 - P(A^c) \ge 1 - P(A^c) - P(B^c).
\end{aligned}$$

(b) If $A \cap B = \emptyset$, then $P(A \cap B) = 0$. However, $P(A) \cdot P(B) \ne 0$. Hence they are not independent.

(c) (i) $A \cap B = \emptyset$. Thus,

$$P(A^c \cap B) = P(B) = b, \quad P(A^c \mid B) = 1.$$

(ii) $A^c$ and $B$ are also independent. Thus,

$$P(A^c \cap B) = P(A^c)\,P(B) = (1 - a)\,b, \quad P(A^c \mid B) = P(A^c) = 1 - a.$$

(d) (i) We have $P(A) = 18/36 = 1/2$, $P(B) = 18/36 = 1/2$, $P(C) = 18/36 = 1/2$, $P(A \cap B) = 9/36 = 1/4$, $P(B \cap C) = 9/36 = 1/4$, and $P(C \cap A) = 9/36 = 1/4$. Thus we have

$$P(A \cap B) = P(A)P(B), \quad P(B \cap C) = P(B)P(C), \quad P(C \cap A) = P(C)P(A).$$

This shows that the events $A$, $B$, $C$ are pairwise independent.

(ii) We have $P(A \cap B \cap C) = P(\emptyset) = 0 \ne P(A)P(B)P(C)$.

Thus the events $A$, $B$, $C$ are not independent.

✷



If three married couples are seated at random at a round table, what is the probability that no wife sits next to her husband?

Solution Denote the events: $E$ = first couple sitting together, $F$ = second couple sitting together and $G$ = third couple sitting together. Our target is to look for the probability

$$1 - P(E \cup F \cup G).$$

In order to find $P(E \cup F \cup G)$ we may use “the inclusion-exclusion principle for three sets”:

$$P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(E \cap F) - P(E \cap G) - P(F \cap G) + P(E \cap F \cap G).$$

Note that the events $E$, $F$ and $G$ are simply indistinguishable and will be treated as symmetric so that

$$P(E) = P(F) = P(G) \quad \text{and} \quad P(E \cap F) = P(E \cap G) = P(F \cap G).$$

We may deduce that (Do you know how to find them? Ask me if you cannot.)

$$P(E) = \frac{n(E)}{n(S)} = \frac{4! \times 2}{5!},$$

$$P(E \cap F) = \frac{n(E \cap F)}{n(S)} = \frac{3! \times 2 \times 2}{5!},$$

$$P(E \cap F \cap G) = \frac{n(E \cap F \cap G)}{n(S)} = \frac{2! \times 2 \times 2 \times 2}{5!}.$$

$$\begin{aligned}
\text{The required probability} &= 1 - \big[3 \times P(E) - 3 \times P(E \cap F) + P(E \cap F \cap G)\big] \\
&= 1 - \left[3 \times \frac{4! \times 2}{5!} - 3 \times \frac{3! \times 2^2}{5!} + \frac{2! \times 2^3}{5!}\right] \\
&= 1 - \frac{6}{5} + \frac{3}{5} - \frac{2}{15} \\
&= \frac{4}{15} \approx 0.2667.
\end{aligned}$$

✷
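The value 4/15 can be confirmed by listing all circular seatings. The sketch below is an added illustration, not part of the notes; the function name `prob_no_couple_adjacent` is hypothetical. It fixes one person's seat to remove rotations and enumerates the $5! = 120$ arrangements of the remaining five people.

```python
from fractions import Fraction
from itertools import permutations


def prob_no_couple_adjacent(couples=3):
    """Probability that no couple sits together at a round table (brute force)."""
    people = [(c, spouse) for c in range(couples) for spouse in "HW"]
    first, rest = people[0], people[1:]     # fix one seat to factor out rotations
    good = total = 0
    for order in permutations(rest):
        seats = (first,) + order
        n = len(seats)
        total += 1
        # neighbours around the circle must belong to different couples
        if all(seats[i][0] != seats[(i + 1) % n][0] for i in range(n)):
            good += 1
    return Fraction(good, total)


print(prob_no_couple_adjacent(3))
```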


Assume that 2 married couples and one single man (five people in total) are seated randomly at a round table. What is the probability that no wife sits next to her husband?

Solution In Example 1.12 (page 7), the total number of seating arrangements has been calculated as

$$2 \times 2 + 2 \times 2 = 8.$$

The required probability is

$$\frac{8}{4!} = \frac{8}{24} = \frac{1}{3} \approx 0.3333.$$

✷

Remark The graphical method that we have used in Example 1.12 (page 7) can obviously also be applied in Example 2.12. Can you successfully divide the cases and draw the corresponding figures again to solve the problem? Try it out and ask me if you find any difficulties. In fact there is one geometric method, namely the “Circle-and-Chord” method, which requires you to find the numbers $a$ and $b$ such that

$$\text{the required probability} = \frac{1}{5} \times a + \frac{2}{5} \times b.$$

This is an interesting method. Let me know if you want the details. Ans: $a = \dfrac{4}{6}$, $b = \dfrac{2}{6}$.



� Example 2.14 (Birthday problem ⋆ )

Suppose that there are $n$ people in a room and that the birthdays of these people were randomly chosen from the 365 days of the year. Let $p(n)$ denote the probability that there is at least one person in the room whose birthday is on 1st October.

(a) Find an expression for $p(n)$ in terms of $n$.

(b) Show that if there are at least 253 people in the room, then it is more likely than not that someone will have their birthday on 1st October, i.e., $p(n) > 0.5$ whenever $n \ge 253$.

(c) For what values of $n$ do we have $p(n) > 0.9$?

Solution

(a) $p(n) = 1 - P(\text{All } n \text{ birthdays are not 1st October}) = 1 - \left(\dfrac{364}{365}\right)^n$.

(b) From the above expression for $p(n)$, it follows that if $n$ increases (where $\frac{364}{365} < 1$), then $\left(\frac{364}{365}\right)^n$ decreases and hence $p(n)$ is an increasing function of $n$. Also, $p(253) \approx 0.5005$. Thus,

if $n \ge 253$, then $p(n) > 0.5$.

(c) We first solve the equation $p(n) = q$ for $n$ in terms of $q$ as follows.

$$\begin{aligned}
1 - \left(\frac{364}{365}\right)^n &= q, \\
\left(\frac{364}{365}\right)^n &= 1 - q, \\
n \ln\left(\frac{364}{365}\right) &= \ln(1 - q), \\
n &= \frac{\ln(1 - q)}{\ln(364/365)} \approx -364.5 \ln(1 - q).
\end{aligned}$$

By the above expression,

$p(n) > 0.9$ if $n > -364.5 \ln(1 - 0.9) \approx 839.29$.

So, if $n \ge 840$, then $p(n) > 0.9$.

✷
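The thresholds in (b) and (c) can be checked numerically. The sketch below is an added illustration; the function names `p` and `smallest_n` are hypothetical. It evaluates $p(n)$ directly and uses the logarithmic formula from part (c) to find the smallest $n$ exceeding a target probability $q$.

```python
import math


def p(n):
    """Probability that at least one of n people has a birthday on 1st October."""
    return 1 - (364 / 365) ** n


def smallest_n(q):
    """Smallest integer n with p(n) > q, using n > ln(1 - q) / ln(364/365)."""
    bound = math.log(1 - q) / math.log(364 / 365)
    return math.floor(bound) + 1


print(smallest_n(0.5), p(252), p(253))
print(smallest_n(0.9), p(839), p(840))
```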

� Example 2.15 (Inclusion-exclusion principle ⋆⋆ )

Five balls are randomly chosen, without replacement, from an urn that contains 5 red, 6 white and 7 blue balls. Find the probability that at least one ball of each color is chosen.

Solution Let $R$, $W$ and $B$ denote the events that there are no red, no white and no blue balls chosen, respectively. By the inclusion-exclusion principle,

$$\begin{aligned}
P(R \cup W \cup B) &= P(R) + P(W) + P(B) - P(R \cap W) - P(R \cap B) - P(W \cap B) + P(R \cap W \cap B) \\
&= \frac{C^{13}_5}{C^{18}_5} + \frac{C^{12}_5}{C^{18}_5} + \frac{C^{11}_5}{C^{18}_5} - \frac{C^{7}_5}{C^{18}_5} - \frac{C^{6}_5}{C^{18}_5} - \frac{C^{5}_5}{C^{18}_5} \\
&= \frac{359}{1224} \approx 0.2933.
\end{aligned}$$

Hence, $P(\text{at least one ball of each color is chosen}) \approx 1 - 0.2933 = 0.7067$.

✷
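As a cross-check, the complement can also be counted directly by splitting the five balls into a positive number of each color. The sketch below is an added illustration with hypothetical variable names; it compares the inclusion-exclusion result with this direct count using exact fractions.

```python
from fractions import Fraction
from math import comb

total = comb(18, 5)                      # all ways to choose 5 of the 18 balls

# Inclusion-exclusion: at least one color is missing
missing = (comb(13, 5) + comb(12, 5) + comb(11, 5)
           - comb(7, 5) - comb(6, 5) - comb(5, 5))
p_missing = Fraction(missing, total)

# Direct count: r red, w white, b blue with r, w, b >= 1 and r + w + b = 5
direct = sum(
    comb(5, r) * comb(6, w) * comb(7, b)
    for r in range(1, 4)
    for w in range(1, 4)
    for b in range(1, 4)
    if r + w + b == 5
)
p_all_colors = Fraction(direct, total)

print(p_missing, p_all_colors, float(p_all_colors))
```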


� Example 2.16 (Derangements ⋆ )

Suppose each person in a group of 3 friends brings a gift to a party. The 3 gifts will be distributed so that each person receives one gift. Find the probability that no person will receive his/her own gift.

Solution There are a total of $3! = 6$ permutations for distributing the gifts. However, there are only 2 derangements:

Person A    Person B    Person C
Gift B      Gift C      Gift A
Gift C      Gift A      Gift B

The required probability is

$$\frac{2}{3!} = \frac{1}{3} \approx 0.3333.$$

✷

� Example 2.17 (Derangements ⋆⋆ )

In a special remedial class, there are 4 students, namely A, B, C and D. The students have taken a short test. The class lecturer wants to let the students grade each other’s tests. Find the probability that no student receives his/her own test for grading.

Solution There are a total of $4! = 24$ possible permutations for handling the grading. There are only 9 derangements:

Student A    Student B    Student C    Student D    Outcome
Test B       Test A       Test D       Test C       BADC
Test B       Test C       Test D       Test A       BCDA
Test B       Test D       Test A       Test C       BDAC
Test C       Test A       Test D       Test B       CADB
Test C       Test D       Test B       Test A       CDBA
Test C       Test D       Test A       Test B       CDAB
Test D       Test A       Test B       Test C       DABC
Test D       Test C       Test B       Test A       DCBA
Test D       Test C       Test A       Test B       DCAB

The required probability is

$$\frac{9}{4!} = \frac{3}{8} = 0.375.$$

✷

Remark Let $D(n)$ be the number of derangements, where $n$ is any positive integer. It is natural to write $D(1) = 0$ and $D(2) = 1$. Furthermore, by Example 2.16 and Example 2.17 we know that

$$D(3) = 2 \quad \text{and} \quad D(4) = 9.$$

We would like to know if there is a general explicit formula for $D(n)$. In fact, by mathematical induction, we can deduce the recursive relation

$$D(n) - n\,D(n-1) = (-1)^n,$$

where $n = 2, 3, 4, \cdots$. Based on this recursive relation we should be able to deduce any number of derangements recursively. Note (without proof) that as $n \to \infty$, the probability $\dfrac{D(n)}{n!}$ approaches $e^{-1} \approx 0.3679$.
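The recursive relation is straightforward to apply in code. The sketch below is an added illustration; the function names `derangements` and `derangements_brute_force` are hypothetical. It computes $D(n)$ from $D(n) = n\,D(n-1) + (-1)^n$ and compares small cases with a direct enumeration of permutations.

```python
from itertools import permutations
from math import exp, factorial


def derangements(n):
    """D(n) from the recurrence D(n) = n * D(n-1) + (-1)^n, starting at D(1) = 0."""
    d = 0
    for k in range(2, n + 1):
        d = k * d + (-1) ** k
    return d


def derangements_brute_force(n):
    """Count permutations of 0..n-1 with no fixed point."""
    return sum(
        1
        for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )


print([derangements(n) for n in range(1, 8)])
print(derangements(10) / factorial(10), exp(-1))
```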



� Example 2.18 (Challenging Problem: Coin in Square)

In a carnival game a player throws a coin from a distance of about 5 feet onto the surface of a table ruled in 1.5-inch squares. If the coin (1 inch in diameter) falls entirely inside a square, the player wins a large lion doll; otherwise he loses the coin. If the coin lands on the table, what is the probability of winning? What if the squares were made smaller by merely thickening the lines (from negligible width to a width of 0.1 inches)?

Answer: $\frac{1}{9}$

� Example 2.19 (Challenging Problem: Lengths of Random Chord)

If a chord is selected at random on a fixed circle, what is the probability that its length is greater than the radius of the circle?

Answer: 0.667 or 0.866 or 0.75 depending on the notion of “at random”

� Example 2.20 (Challenging Problem: Drunk Man Walk)

From where he stands, one step toward the cliff would send the drunk man over the edge. He takes random steps, either toward or away from the cliff. At any step his probability of taking a step away is $\frac{2}{3}$, of a step toward the cliff is $\frac{1}{3}$. What is the chance of escaping the cliff after five walking steps?

Answer: $\frac{136}{243} \approx 0.560$

� Example 2.21 (Challenging Problem: Random Quadratic Equations)

What is the probability that the quadratic equation (where a, b are any independent real numbers)

$$x^2 + 2ax + b = 0$$

has real roots?

Answer: P (roots are real) ≈ 1

� Example 2.22 (Challenging Problem: Needle Lies Across a Line)

A large table has been ruled with a set of parallel lines spaced $d$ units apart. A needle of length $l$ (smaller than $d$) is tossed randomly on the table. What is the probability that when it comes to rest it crosses a line?

Answer: $\dfrac{2l}{\pi d} \approx 0.637 \times \dfrac{l}{d}$


Chapter 3

Conditional Probability and Independence

� Example 3.1 (Reduced sample space ⋆ )

A bag contains 4 white balls and 3 black balls. Two balls are randomly drawn from the bag without replacement. Show that the probability that the second drawn ball is white is the same as the probability that the first drawn ball is white.

Solution

$$\begin{aligned}
P(\text{2nd ball is white}) &= P(\text{2nd is white} \cap \text{1st is white}) + P(\text{2nd is white} \cap \text{1st is black}) \\
&= P(\text{2nd is white} \mid \text{1st is white}) \times P(\text{1st is white}) \\
&\quad + P(\text{2nd is white} \mid \text{1st is black}) \times P(\text{1st is black}) \\
&= \frac{3}{6} \cdot \frac{4}{7} + \frac{4}{6} \cdot \frac{3}{7} \\
&= \frac{4}{7} \\
&= P(\text{1st ball is white}).
\end{aligned}$$

✷

Remark We may generalize the given statement: A bag contains $m$ white balls and $n$ black balls, where $m, n > 2$. Two balls are randomly drawn from the bag without replacement. Does the second drawn ball still have the same probability of being white as the first drawn ball? Answer: True.

� Example 3.2 (Reduced sample space ⋆ )

One bag contains 4 white balls and 3 black balls, and a second bag contains 3 white balls and 5 black balls. One ball is drawn from the first bag and placed unseen in the second bag. What is the probability that a ball now drawn from the second bag is black?

Solution The problem has to be divided into two cases. It follows that

$$\begin{aligned}
P(\text{2nd is black}) &= P(\text{2nd is black} \cap \text{1st is white}) + P(\text{2nd is black} \cap \text{1st is black}) \\
&= P(\text{2nd is black} \mid \text{1st is white}) \times P(\text{1st is white}) \\
&\quad + P(\text{2nd is black} \mid \text{1st is black}) \times P(\text{1st is black}) \\
&= \frac{5}{9} \cdot \frac{4}{7} + \frac{6}{9} \cdot \frac{3}{7} \\
&= \frac{38}{63} \approx 0.603.
\end{aligned}$$



� Example 3.3 (Conditional probability ⋆ )

Two fair dice are rolled and the outcome is kept secret. You are interested in the sum shown. Suppose you have been told that at least one die shows 1. How likely is it now that the sum will be 5 or more?

Solution When two dice are rolled, the set of all possible outcomes is

$$S = \{(i, j) : i, j = 1, 2, \cdots, 6\}.$$

Denote the events

$$A = \{\text{sum will be 5 or more}\} = \{(i, j) : i + j \ge 5\}$$

and

$$B = \{\text{at least one die shows 1}\} = \{(1, j) : j = 1, 2, \cdots, 6\} \cup \{(i, 1) : i = 1, 2, \cdots, 6\}.$$

Then,

$$A \cap B = \{(1, 4), (1, 5), (1, 6), (4, 1), (5, 1), (6, 1)\}.$$

The required probability is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)} = \frac{6}{11}.$$

✷

� Example 3.4 (Conditional probability ⋆ )

The probability that a married man watches a movie is 0.4 and the probability that a married woman watches the movie is 0.5. The probability that a man watches the movie, given that his wife does, is 0.7. Find the probability that a wife watches the movie given that her husband does not.

Solution Let $A$ be the event that a married man watches the movie and $B$ be the event that a married woman watches the movie. Now,

$$P(A) = 0.4, \quad P(B) = 0.5 \quad \text{and} \quad P(A \mid B) = 0.7.$$

The probability that a wife watches the movie given that her husband does not is given by

$$P(B \mid A^c) = \frac{P(A^c \cap B)}{P(A^c)},$$

where

$$P(A^c \cap B) = P(A^c \mid B) \cdot P(B) = (1 - 0.7)(0.5) = 0.15,$$

and

$$P(A^c) = 1 - 0.4 = 0.6.$$

Hence,

$$P(B \mid A^c) = \frac{P(A^c \cap B)}{P(A^c)} = \frac{0.15}{0.6} = 0.25.$$

✷

� Example 3.5 (Conditional probability ⋆⋆ )

(a) A fair die with faces 1, 2 and 3 colored green and faces 4, 5 and 6 colored red is tossed once. If you can see that the die has landed green face up (but cannot see the actual number shown), how likely is it that the outcome is an even number?

(b) Suppose further that the die in (a) is biased with

$$P(1) = P(3) = P(5) = \frac{1}{9}, \quad P(2) = P(4) = P(6) = \frac{2}{9}.$$

What is the probability that the outcome is an even number given that the die lands green face up?

Solution

(a) Denote the events

$A = \{\text{even numbers}\} = \{2, 4, 6\}$, and $B = \{\text{colored green}\} = \{1, 2, 3\}$.

Then $A \cap B = \{2\}$. The required probability is given by the conditional probability

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)} = \frac{1}{3}.$$

(b) The outcomes in (b) are not equally likely to occur. The required probability is again given by the conditional probability

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$

but now the individual probabilities $P(A \cap B)$ and $P(B)$ have to be computed first. Now,

$$P(A \cap B) = P(2) = \frac{2}{9}$$

and

$$P(B) = P(1) + P(2) + P(3) = \frac{1}{9} + \frac{2}{9} + \frac{1}{9} = \frac{4}{9}.$$

The required probability is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/9}{4/9} = \frac{1}{2}.$$

✷


In an university, the academic staff of three research groups (A, B and C) are individually invited to applyfor a research grant project. Group A has 2 staff, B has 2 and C has 3. It is assumed that all staff decideindependently whether or not to apply. Staff of groups A, B and C apply with respective probabilities 1/2,1/4 and 1/5. Given that there is just one application in total, find the probability that it comes from a staffof group B.

Solution

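\[
\begin{aligned}
P(\text{1 from B} \mid \text{1 in total})
&= \frac{P(\text{1 from B and 1 in total})}{P(\text{1 in total})} \\
&= \frac{P(\text{1 from B, 0 from A and C})}{P(\text{1 from A, 0 from B and C}) + P(\text{1 from B, 0 from A and C}) + P(\text{1 from C, 0 from A and B})} \\
&= \frac{C^{2}_{1}\left(\frac{1}{4}\right)\left(\frac{3}{4}\right) \times \left(\frac{1}{2}\right)^{2} \times \left(\frac{4}{5}\right)^{3}}{C^{2}_{1}\left(\frac{1}{2}\right)\left(\frac{1}{2}\right) \times \left(\frac{3}{4}\right)^{2} \times \left(\frac{4}{5}\right)^{3} + C^{2}_{1}\left(\frac{1}{4}\right)\left(\frac{3}{4}\right) \times \left(\frac{1}{2}\right)^{2} \times \left(\frac{4}{5}\right)^{3} + C^{3}_{1}\left(\frac{1}{5}\right)\left(\frac{4}{5}\right)^{2} \times \left(\frac{1}{2}\right)^{2} \times \left(\frac{3}{4}\right)^{2}} \\
&= \frac{(2)\left(\frac{1}{4}\right)\left(\frac{4}{5}\right)}{(2)\left(\frac{3}{4}\right)\left(\frac{4}{5}\right) + (2)\left(\frac{1}{4}\right)\left(\frac{4}{5}\right) + (3)\left(\frac{1}{5}\right)\left(\frac{3}{4}\right)} \\
&= \frac{8}{24 + 8 + 9} \\
&= \frac{8}{41} \approx 0.195.
\end{aligned}
\]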
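As a cross-check of the algebra, the same conditional probability can be obtained by brute force over all 2^7 apply/not-apply patterns of the seven staff. This is only an illustrative sketch; the variable names are not from the notes.

```python
from fractions import Fraction
from itertools import product

# (group label, application probability) for each of the 7 staff members
staff = [("A", Fraction(1, 2))] * 2 + [("B", Fraction(1, 4))] * 2 + [("C", Fraction(1, 5))] * 3

p_one_total = Fraction(0)   # P(exactly one application in total)
p_one_from_b = Fraction(0)  # P(exactly one application, and it is from group B)

for decisions in product([0, 1], repeat=len(staff)):
    prob = Fraction(1)
    for applies, (_, p) in zip(decisions, staff):
        prob *= p if applies else 1 - p
    if sum(decisions) == 1:
        p_one_total += prob
        if staff[decisions.index(1)][0] == "B":
            p_one_from_b += prob

answer = p_one_from_b / p_one_total
print(answer)  # 8/41
```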





An electrical system consists of four components as illustrated in the figure below. The system works if components A and B work and at least one of the components C or D works. The reliability (probability of working) of each component is also shown in the figure. Find the probability that component D does not work, given that the entire system works. Assume that the four components work independently.

[Figure: system diagram showing the component reliabilities A: 0.8, B: 0.7, C: 0.9, D: 0.6.]

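Solution The probability that the entire system works can be calculated as follows:

\[
\begin{aligned}
P(\text{System works}) &= P\big(A \cap B \cap (C \cup D)\big) \\
&= P(A) \times P(B) \times P(C \cup D) \\
&= P(A) \times P(B) \times \big(P(C) + P(D) - P(C \cap D)\big) \\
&= P(A) \times P(B) \times \big(P(C) + P(D) - P(C) \times P(D)\big) \\
&= 0.8 \times 0.7 \times (0.9 + 0.6 - 0.9 \times 0.6) \\
&= 0.5376.
\end{aligned}
\]

If D does not work, the system works exactly when A, B and C all work. Writing $\overline{D}$ for the event that D does not work, the required conditional probability is

\[
\begin{aligned}
P(\text{Component D does not work} \mid \text{System works})
&= \frac{P(\text{System works but Component D does not work})}{P(\text{System works})} \\
&= \frac{P(A \cap B \cap C \cap \overline{D})}{P(\text{System works})} \\
&= \frac{0.8 \times 0.7 \times 0.9 \times 0.4}{0.5376} \\
&= 0.375.
\end{aligned}
\]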

✷
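The two probabilities in this solution can also be obtained by listing all 16 working/failed states of the four components. The Python sketch below assumes only the reliabilities and the working condition stated in the example; the names are illustrative.

```python
from itertools import product

reliability = {"A": 0.8, "B": 0.7, "C": 0.9, "D": 0.6}
names = list(reliability)

p_works = 0.0          # P(system works)
p_works_d_fails = 0.0  # P(system works and D does not work)

for state in product([True, False], repeat=len(names)):
    up = dict(zip(names, state))
    prob = 1.0
    for name in names:
        prob *= reliability[name] if up[name] else 1 - reliability[name]
    if up["A"] and up["B"] and (up["C"] or up["D"]):
        p_works += prob
        if not up["D"]:
            p_works_d_fails += prob

print(round(p_works, 4))                    # 0.5376
print(round(p_works_d_fails / p_works, 4))  # 0.375
```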

Example 3.8 (Bayes’ formula ⋆ )

In a certain population, there are equal numbers of men and women. 4% of men are colorblind while 2% of women are colorblind. If a colorblind person is chosen at random, find the probability that this person is a man.

Solution Let M be the set of all men in the population, W be the set of all women, and B be the set of all colorblind people. It follows from Bayes’ formula that

\[
P(M \mid B) = \frac{P(M \cap B)}{P(B)} = \frac{P(B \cap M)}{P(B)} = \frac{P(B \mid M) \times P(M)}{P(B \mid M) P(M) + P(B \mid W) P(W)}.
\]

From the given information,

\[ P(M) = P(W) = 0.5, \qquad P(B \mid M) = 0.04, \qquad P(B \mid W) = 0.02. \]

Thus,

\[ P(M \mid B) = \frac{0.04 \times 0.5}{0.04 \times 0.5 + 0.02 \times 0.5} = \frac{2}{3}. \]

✷


A disease, which can only be diagnosed with certainty after death, exists in a proportion $p_0$ of the population. A clinical quick test is known such that

P(test is positive given that disease is present) = $p_1$ and

P(test is negative given that disease is absent) = $p_2$.

Find, in terms of $p_0$, $p_1$, $p_2$, the probability that a randomly chosen individual who tests positive actually has the disease.

Solution Let D = “has disease” and T = “test positive”. Let $\overline{D}$ denote the complement of D. Since $P(T \mid \overline{D}) = 1 - p_2$ and $P(\overline{D}) = 1 - p_0$,

\[
\begin{aligned}
P(D \mid T) &= \frac{P(D \cap T)}{P(T)} \\
&= \frac{P(T \mid D) \times P(D)}{P(T \mid D) \times P(D) + P(T \mid \overline{D}) \times P(\overline{D})} \\
&= \frac{p_1 p_0}{p_1 p_0 + (1 - p_2)(1 - p_0)}.
\end{aligned}
\]

✷
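To get a feeling for the formula, it can be evaluated for concrete numbers. The sketch below is illustrative only: the function name `posterior_disease` and the sample values (1% prevalence, p1 = 0.99, p2 = 0.95) are hypothetical and do not come from the example.

```python
from fractions import Fraction

def posterior_disease(p0, p1, p2):
    """P(disease | positive test) from p0 = P(D), p1 = P(T | D), p2 = P(not T | not D)."""
    true_pos = p1 * p0
    false_pos = (1 - p2) * (1 - p0)
    return true_pos / (true_pos + false_pos)

# Hypothetical inputs chosen for illustration.
print(posterior_disease(Fraction(1, 100), Fraction(99, 100), Fraction(95, 100)))  # 1/6
```

With these assumed values only one positive result in six corresponds to a real case, because the disease is rare compared with the false-positive rate.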


A manufacturer has recently designed a new car. In an international car show, the probability that the new car will get an award for the best design is 25%, the probability that it will get an award for “the most favorite car” is 20%, and the probability that it will get both awards is 13%.

(a) What is the probability that the car will get at least one of the two awards?

(b) Given that the car gets the award for the best design, what is the probability that it will get the award for the most favorite car?

(c) Given that the car does not get the award for the best design, what is the probability that it will not get the award for the most favorite car?

Solution

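(a) Let B and F denote the events that the car gets the award for the best design and the award for the most favorite car, respectively. From the given information,

\[ P(B) = 0.25, \qquad P(F) = 0.2, \qquad P(B \text{ and } F) = 0.13. \]

It follows from the inclusion-exclusion principle that

\[
\begin{aligned}
P(B \text{ or } F) &= P(B) + P(F) - P(B \text{ and } F) \\
&= 0.25 + 0.2 - 0.13 \\
&= 0.32.
\end{aligned}
\]

(b)

\[ P(F \mid B) = \frac{P(B \text{ and } F)}{P(B)} = \frac{0.13}{0.25} = 0.52. \]

(c) By De Morgan's law, the event “$B^c$ and $F^c$” is the complement of “B or F”, so

\[
P(F^c \mid B^c) = \frac{P(B^c \text{ and } F^c)}{P(B^c)} = \frac{P\big((B \text{ or } F)^c\big)}{P(B^c)} = \frac{1 - 0.32}{1 - 0.25} \approx 0.91.
\]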

✷


An insurance company classifies drivers according to sex and to whether they are “under 25” or “25 years and above”. It finds that 60% of its drivers are male; 25% of the male drivers and 30% of the female drivers are under 25. Find the probability of a driver being male given that the driver is under 25.

Solution Note that “25% of the male drivers are under 25” means P(under 25 | male) = 0.25.

The joint probabilities are given by

               Male    Female   Total
Under 25       0.15    0.12     0.27
25 and above   0.45    0.28     0.73
Total          0.6     0.4      1

In the above, for example,

\[
\begin{aligned}
P(\text{Male and Under 25}) &= P(\text{Under 25} \mid \text{Male}) \times P(\text{Male}) \\
&= 0.25 \times 0.6 \\
&= 0.15,
\end{aligned}
\]

and

\[
\begin{aligned}
P(\text{Female and Under 25}) &= P(\text{Under 25} \mid \text{Female}) \times P(\text{Female}) \\
&= 0.3 \times 0.4 \\
&= 0.12.
\end{aligned}
\]

It follows from Bayes’ formula that

\[
\begin{aligned}
P(\text{male} \mid \text{under 25}) &= \frac{P(\text{male and under 25})}{P(\text{male and under 25}) + P(\text{female and under 25})} \\
&= \frac{0.15}{0.27} \\
&= \frac{5}{9} \approx 0.556.
\end{aligned}
\]

The required probability is 55.6%. ✷


A large manufacturing firm purchases a certain component from three different vendors A, B and C. Vendor A supplies 40% of the components and has a defective rate of 2%, vendor B supplies 40% of the components and has a defective rate of 1%, and vendor C supplies the remainder of the components and has a defective rate of 3%. If one component is randomly selected from a shipment and is found defective, what is the probability that this shipment came from vendor C?

Solution Denote by A, B, C the events that the shipment came from vendors A, B, C respectively, and by D the event that the randomly selected component is defective. Vendor C supplies 1 − 0.4 − 0.4 = 0.2 of the components. Applying Bayes’ formula,

\[
\begin{aligned}
P(C \mid D) &= \frac{P(C \text{ and } D)}{P(D)} \\
&= \frac{P(D \mid C) \times P(C)}{P(D \mid A) \times P(A) + P(D \mid B) \times P(B) + P(D \mid C) \times P(C)} \\
&= \frac{(0.03)(0.2)}{(0.02)(0.4) + (0.01)(0.4) + (0.03)(0.2)} \\
&= \frac{0.006}{0.008 + 0.004 + 0.006} \\
&= \frac{1}{3} \approx 0.333.
\end{aligned}
\]

✷
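Bayes' formula over a partition has the same shape in every example of this kind: multiply each prior by its likelihood, then divide by the total. A small Python sketch, with the hypothetical helper name `bayes_posterior`, applied to the vendor data above:

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Posterior P(H | E) for each hypothesis H in a partition.

    priors[h] = P(H) and likelihoods[h] = P(E | H).
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # total probability P(E)
    return {h: joint[h] / total for h in joint}

priors = {"A": Fraction(40, 100), "B": Fraction(40, 100), "C": Fraction(20, 100)}
defect_rates = {"A": Fraction(2, 100), "B": Fraction(1, 100), "C": Fraction(3, 100)}

posterior = bayes_posterior(priors, defect_rates)
print(posterior["C"])  # 1/3
```

The same function also returns the posteriors for vendors A and B, which sum with that of C to 1.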


A manufacturing company employs two analytical plans for the design and development of a particular product. For cost reasons, both are used at varying times. In fact, plans 1 and 2 are used for 30% and 70% of the products respectively. The “defect rate” is different for the two plans as follows:

\[ P(D \mid P_1) = 0.03, \qquad P(D \mid P_2) = 0.01, \]

where $P(D \mid P_1)$ and $P(D \mid P_2)$ are the probabilities of a defective product, given plan 1 and plan 2, respectively. If a random product was observed and found to be defective, which plan was most likely used and thus responsible?

Solution We compare the following two conditional probabilities:

\[
\begin{aligned}
P(P_1 \mid D) &= \frac{P(D \mid P_1) \cdot P(P_1)}{P(D \mid P_1) \cdot P(P_1) + P(D \mid P_2) \cdot P(P_2)} \\
&= \frac{(0.03)(0.3)}{(0.03)(0.3) + (0.01)(0.7)} \\
&= \frac{0.009}{0.016} = 0.5625,
\end{aligned}
\]

and

\[
\begin{aligned}
P(P_2 \mid D) &= \frac{P(D \mid P_2) \cdot P(P_2)}{P(D \mid P_1) \cdot P(P_1) + P(D \mid P_2) \cdot P(P_2)} \\
&= \frac{(0.01)(0.7)}{(0.03)(0.3) + (0.01)(0.7)} \\
&= \frac{0.007}{0.016} = 0.4375.
\end{aligned}
\]

Hence, Plan 1 was most likely used and thus responsible. ✷


Suppose 30% of the women in a class received an A on the examination and 25% of the men received an A. The class is 60% women. Given that a person chosen at random received an A, what is the probability that this person is a woman?

Solution Denote the following events:

A = {receiving an A on the examination}, W = {the person being a woman}, M = {the person being a man}.

Then

\[ P(A \mid W) = 0.3, \qquad P(A \mid M) = 0.25, \qquad \text{and} \qquad P(W) = 0.6. \]

We want $P(W \mid A)$. By definition,

\[ P(W \mid A) = \frac{P(W \cap A)}{P(A)}, \]

where

\[ P(W \cap A) = P(A \mid W) \cdot P(W) = (0.3)(0.6) = 0.18, \]

and

\[ P(A) = P(W \cap A) + P(M \cap A) = 0.18 + P(A \mid M) \cdot P(M) = 0.18 + (0.25)(0.4) = 0.28. \]

Hence,

\[ P(W \mid A) = \frac{0.18}{0.28} \approx 64.3\%. \]

✷

Example 3.15 (Multiplication principle for conditional probability ⋆ )

From a pack of 52 cards, we draw five cards one at a time without replacement. What is the probability that an Ace appears for the first time on the fifth draw?

Solution Denote

$E_i$ = {the i-th card is a non-Ace}, i = 1, 2, 3, 4, and F = {the fifth card is an Ace}.

The multiplication principle for conditional probability implies that

\[
P(E_1 \cap E_2 \cap E_3 \cap E_4 \cap F) = P(E_1) \times P(E_2 \mid E_1) \times P(E_3 \mid E_1 \cap E_2) \times P(E_4 \mid E_1 \cap E_2 \cap E_3) \times P(F \mid E_1 \cap E_2 \cap E_3 \cap E_4).
\]

Hence,

\[
P(\text{Ace appears for the first time on the fifth draw}) = \frac{48}{52} \times \frac{47}{51} \times \frac{46}{50} \times \frac{45}{49} \times \frac{4}{48} \approx 0.05989.
\]

✷

Remark An alternative way of computing the probability is to count ordered draws:

\[
P(\text{Ace appears for the first time on the fifth draw}) = \frac{P^{48}_{4} \times P^{4}_{1}}{P^{52}_{5}} \approx 0.05989.
\]
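The agreement between the multiplication rule and the count of ordered draws can be confirmed exactly. The sketch below uses Python's `math.perm` (available from Python 3.8) for the permutation numbers; the variable names are illustrative.

```python
from fractions import Fraction
from math import perm

# Multiplication rule: four non-Aces in a row, then an Ace.
sequential = Fraction(1)
non_aces, cards = 48, 52
for _ in range(4):
    sequential *= Fraction(non_aces, cards)
    non_aces -= 1
    cards -= 1
sequential *= Fraction(4, cards)  # 4 Aces among the 48 remaining cards

# Counting ordered draws: P(48, 4) * P(4, 1) / P(52, 5).
counting = Fraction(perm(48, 4) * perm(4, 1), perm(52, 5))

print(sequential == counting)        # True
print(round(float(sequential), 5))   # 0.05989
```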

Example 3.16 (Independence of two events ⋆ )

A small town has one fire engine and one ambulance available for emergencies. The probability that the fire engine is available when needed is 0.98, and the probability that the ambulance is available when called is 0.92. In the event of an injury resulting from a burning building, find the probability that both the ambulance and the fire engine will be available.

Solution Let E be the event that “the ambulance is available” and F be the event that “the fire engine is available”. Since the question gives no further information, we assume that the two events are independent. Thus, the required probability is given by

\[ P(E \cap F) = P(E) \times P(F) = 0.92 \times 0.98 = 0.9016. \]

✷

Example 3.17 (Conditional probability and independence ⋆ )

Consider a coin-tossing experiment involving two coins where one is fair and the other has two heads. The experiment is conducted as follows. The fair coin is tossed first. If a head appears, the fair coin is tossed again, but if a tail appears the two-headed coin is tossed instead. Are the two events “heads on the first toss” and “heads on the second toss” independent? Explain why.

Solution The outcomes with their probabilities are as follows.

First Toss   Second Toss   Joint Probability
H            H             P(HH) = (1/2)(1/2) = 1/4
H            T             P(HT) = (1/2)(1/2) = 1/4
T            H             P(TH) = (1/2)(1) = 1/2
T            T             P(TT) = (1/2)(0) = 0

The probability that a head appears on the second toss is 1/4 + 1/2 = 3/4. However, the probability that a head appears on the second toss given that a head appears on the first toss is not equal to 3/4. That is,

\[
\begin{aligned}
P(\text{head second} \mid \text{head first}) &= \frac{P(\text{head first} \cap \text{head second})}{P(\text{head first})} \\
&= \frac{1/4}{1/4 + 1/4} \\
&= \frac{1}{2} \neq P(\text{head on second}) = \frac{3}{4}.
\end{aligned}
\]

The two given events are not independent. ✷
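Independence can equally be tested through the product rule P(E ∩ F) = P(E) × P(F). A minimal sketch, assuming the joint probabilities tabulated in the solution (the dictionary layout is an illustrative choice):

```python
from fractions import Fraction

half = Fraction(1, 2)
# Joint pmf of (first toss, second toss) under the stated rules.
joint = {
    ("H", "H"): half * half,  # fair coin tossed again
    ("H", "T"): half * half,
    ("T", "H"): half * 1,     # two-headed coin always shows heads
    ("T", "T"): half * 0,
}

p_first_h = sum(p for (a, _), p in joint.items() if a == "H")
p_second_h = sum(p for (_, b), p in joint.items() if b == "H")
p_both_h = joint[("H", "H")]

print(p_both_h, p_first_h * p_second_h)  # 1/4 3/8
independent = p_both_h == p_first_h * p_second_h
print(independent)                       # False
```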




In a deck of 52 cards, there are 13 kinds: Ace (A), King (K), Queen (Q), Jack (J) and values from 2 to 10. Each kind comes in 4 suits: Spade (♠), Heart (♥), Club (♣) and Diamond (♦). Define a Full House as a set of five cards containing three of a kind and a pair of another kind. A King Full House is a Full House with three Kings. Peter draws five cards randomly from the deck without replacement. In (a), (b) and (c) below, give your answers correct to 3 significant figures.

(a) What is the probability that Peter will get a Full House?

(b) What is the probability that Peter will get a King Full House?

(c) Suppose that the cards drawn by Peter have formed a Full House without any Kings. He then draws another five cards randomly from the remaining cards without replacement. Find the probability that these five cards form a King Full House.

Solution

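(a)

\[ P(\text{Full House}) = \frac{C^{13}_{1} C^{4}_{3} \times C^{12}_{1} C^{4}_{2}}{C^{52}_{5}} \approx 0.00144. \]

(b)

\[ P(\text{King Full House}) = \frac{C^{1}_{1} C^{4}_{3} \times C^{12}_{1} C^{4}_{2}}{C^{52}_{5}} \approx 0.000111. \]

(c) After the first hand, 47 cards remain, including all four Kings. The pair can come from any of the 10 kinds untouched by the first hand (four cards each), or from the kind that formed the pair in the first hand (exactly two cards left). The kind that formed the triple has only one card left and cannot supply a pair. Hence

\[
P(\text{King Full House} \mid \text{already drawn a Full House without any Kings}) = \frac{C^{1}_{1} C^{4}_{3} \times \left( C^{10}_{1} C^{4}_{2} + C^{1}_{1} C^{2}_{2} \right)}{C^{47}_{5}} \approx 0.000159.
\]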

✷

Remark This example was taken from “HKALE 2007 Applied Mathematics A-Level Paper 2, Qn 6”.
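The three counts can be evaluated with Python's `math.comb`, which plays the role of the C symbols used in the solution. This is a short illustrative sketch; the variable names are not from the notes.

```python
from math import comb

full_house = comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 2) / comb(52, 5)
king_full_house = comb(4, 3) * comb(12, 1) * comb(4, 2) / comb(52, 5)
# Part (c): the pair comes from 10 untouched kinds or from the 2 leftover cards
# of the kind that formed the pair in the first hand.
king_full_house_after = comb(4, 3) * (comb(10, 1) * comb(4, 2) + comb(2, 2)) / comb(47, 5)

print(f"{full_house:.3g}")             # 0.00144
print(f"{king_full_house:.3g}")        # 0.000111
print(f"{king_full_house_after:.3g}")  # 0.000159
```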


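Mary throws three fair six-sided dice simultaneously. Let X and Y be respectively the smallest and largest numbers of spots obtained. Evaluate $P(X = 5 \mid Y = 6)$.

Solution By definition,

\[
\begin{aligned}
P(X = 5 \mid Y = 6) &= \frac{P(X = 5 \text{ and } Y = 6)}{P(Y = 6)} \\
&= \frac{P(\text{all dice show} \geq 5) - P(\text{all dice show “5”}) - P(\text{all dice show “6”})}{1 - P(\text{no “6” is obtained})} \\
&= \frac{\left(\frac{2}{6}\right)^{3} - \left(\frac{1}{6}\right)^{3} - \left(\frac{1}{6}\right)^{3}}{1 - \left(\frac{5}{6}\right)^{3}} \\
&= \frac{6}{91} \approx 0.0659.
\end{aligned}
\]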

✷
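With only 216 equally likely outcomes, the answer can be confirmed by direct enumeration. A minimal Python sketch:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=3))  # 216 equally likely outcomes
max_is_6 = [r for r in rolls if max(r) == 6]
min_5_max_6 = [r for r in max_is_6 if min(r) == 5]

answer = Fraction(len(min_5_max_6), len(max_is_6))
print(len(min_5_max_6), len(max_is_6), answer)  # 6 91 6/91
```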

Example 3.20 (Conditional probability ⋆⋆⋆ )

An electronic signal, either 0 or 1, can be inputted into a device and a corresponding signal, either 0 or 1, will be generated as an output. There are four situations in total, and the (conditional) probabilities of two of them are listed below:

\[ P(\text{Output} = 0 \mid \text{Input} = 0) = 0.96, \qquad P(\text{Output} = 1 \mid \text{Input} = 1) = 0.87. \]

(a) What are the other two situations and their corresponding probabilities?

(b) Assume that the signals 0 and 1 are equally likely to be inputted. Now,

(i) a signal is inputted into the device and a corresponding output signal is generated. What is the probability that the output is 0?

(ii) two independent signals are inputted into the device and two corresponding output signals are generated. Given that both output signals are known to be 0, what is the probability that exactly one of the two input signals is 0?

(c) A signal 0 is inputted into the device, the output is then inputted back into the device as a new input, and the process continues in the same manner as a cycle. Let $P_n$ be the probability that the n-th output is 0, where n ≥ 1. Find the value of $P_3$.

Solution

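(a)

\[ P(\text{Output} = 1 \mid \text{Input} = 0) = 1 - 0.96 = 0.04, \qquad P(\text{Output} = 0 \mid \text{Input} = 1) = 1 - 0.87 = 0.13. \]

(b) (i)

\[
\begin{aligned}
P(\text{Output} = 0) &= P(\text{Output} = 0 \text{ and Input} = 0) + P(\text{Output} = 0 \text{ and Input} = 1) \\
&= P(\text{Output} = 0 \mid \text{Input} = 0) \times P(\text{Input} = 0) + P(\text{Output} = 0 \mid \text{Input} = 1) \times P(\text{Input} = 1) \\
&= 0.96 \times 0.5 + 0.13 \times 0.5 \\
&= 0.545.
\end{aligned}
\]

(ii)

\[
\begin{aligned}
\text{The probability} &= \frac{P(\text{Exactly one of the inputs is 0 and both outputs are 0})}{P(\text{Both outputs are 0})} \\
&= \frac{(0.96 \times 0.5) \times (0.13 \times 0.5) \times 2}{(0.96 \times 0.5)^{2} + (0.96 \times 0.5) \times (0.13 \times 0.5) \times 2 + (0.13 \times 0.5)^{2}} \\
&= \frac{0.0624}{0.297025} \\
&\approx 0.2101.
\end{aligned}
\]

(c)

\[
\begin{aligned}
P_1 &= 0.96, \\
P_2 &= 0.96 \times P_1 + 0.13 \times (1 - P_1) \\
&= 0.96^{2} + 0.13 \times 0.04 \\
&= 0.9268, \\
P_3 &= 0.96 \times P_2 + 0.13 \times (1 - P_2) \\
&= 0.96 \times 0.9268 + 0.13 \times 0.0732 \\
&\approx 0.8992.
\end{aligned}
\]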

✷
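The recursion in part (c) extends to any n. The following sketch iterates it in Python; the function name and parameter names are illustrative, and the starting value 1.0 encodes the assumption stated in the question that the very first input is 0.

```python
def prob_nth_output_zero(n, keep0=0.96, flip1=0.13):
    """P_n for the feedback loop, starting from input 0.

    keep0 = P(Output = 0 | Input = 0), flip1 = P(Output = 0 | Input = 1).
    """
    p = 1.0  # the initial input is 0 with certainty
    for _ in range(n):
        p = keep0 * p + flip1 * (1 - p)
    return p

for n in (1, 2, 3):
    print(n, round(prob_nth_output_zero(n), 4))  # 0.96, 0.9268, 0.8992
```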




A signal corps has to use an information channel, which is faulty. All messages are sent as a sequence of six binary digits. The receiver knows that the message is one of the following: “Advance” (A), or “Retreat” (R), or “Stay where you are” (S). From past experience he expects these messages in the respective ratios 1 : 4 : 5. The three messages are sent as

A : 0 1 0 1 0 0; R : 0 1 1 0 1 1; S : 1 0 1 0 0 1.

Independently for each character in the message, the fault causes “0” to be sent in place of “1” with probability p, and “1” to be sent in place of “0” with equal probability p, the probability that any given character is transmitted correctly being 1 − p. The message is received as 0 1 1 1 0 0.

(a) State Bayes’ theorem.

(b) Show that

\[ P(\text{0 1 1 1 0 0 is received} \mid \text{A : 0 1 0 1 0 0 is sent}) = p(1 - p)^{5} \]

and obtain similar expressions for

\[ P(\text{0 1 1 1 0 0 is received} \mid \text{R : 0 1 1 0 1 1 is sent}) \]

and

\[ P(\text{0 1 1 1 0 0 is received} \mid \text{S : 1 0 1 0 0 1 is sent}). \]

(c) Deduce that

\[ P(\text{A : 0 1 0 1 0 0 is sent} \mid \text{0 1 1 1 0 0 is received}) = \frac{(1 - p)^{3}}{(1 - p)^{3} + 4p^{2}(1 - p) + 5p^{3}}, \]

and write down similar expressions for

\[ P(\text{R : 0 1 1 0 1 1 is sent} \mid \text{0 1 1 1 0 0 is received}) \]

and

\[ P(\text{S : 1 0 1 0 0 1 is sent} \mid \text{0 1 1 1 0 0 is received}). \]

(d) If it is assumed that p is at most 0.1, which interpretation of the message is most likely to be correct?

Solution

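(a) Bayes’ theorem states that

\[ P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}, \]

where P(A) is the prior probability and $P(A \mid B)$ is the posterior probability, that is, the probability of A after the evidence B has been taken into account.

(b) The received string 0 1 1 1 0 0 differs from A : 0 1 0 1 0 0 in exactly one position (the third), so

\[
\begin{aligned}
P(\text{0 1 1 1 0 0 is received} \mid \text{A : 0 1 0 1 0 0 is sent}) &= P(\text{1 character is transmitted incorrectly and the other 5 correctly}) \\
&= p(1 - p)^{5}.
\end{aligned}
\]

Similarly,

\[
\begin{aligned}
P(\text{0 1 1 1 0 0 is received} \mid \text{R : 0 1 1 0 1 1 is sent}) &= P(\text{3 characters are transmitted incorrectly and the other 3 correctly}) \\
&= p^{3}(1 - p)^{3}
\end{aligned}
\]

and

\[
\begin{aligned}
P(\text{0 1 1 1 0 0 is received} \mid \text{S : 1 0 1 0 0 1 is sent}) &= P(\text{4 characters are transmitted incorrectly and the other 2 correctly}) \\
&= p^{4}(1 - p)^{2}.
\end{aligned}
\]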

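(c) The ratios 1 : 4 : 5 give the prior probabilities P(A is sent) = 0.1, P(R is sent) = 0.4 and P(S is sent) = 0.5. By Bayes’ theorem,

\[
\begin{aligned}
P(\text{A : 0 1 0 1 0 0 is sent} \mid \text{0 1 1 1 0 0 is received})
&= \frac{P(\text{0 1 1 1 0 0} \mid \text{A is sent}) \cdot P(\text{A is sent})}{P(\text{0 1 1 1 0 0} \mid \text{A is sent}) \cdot P(\text{A is sent}) + P(\text{0 1 1 1 0 0} \mid \text{R is sent}) \cdot P(\text{R is sent}) + P(\text{0 1 1 1 0 0} \mid \text{S is sent}) \cdot P(\text{S is sent})} \\
&= \frac{p(1 - p)^{5} \times 0.1}{p(1 - p)^{5} \times 0.1 + p^{3}(1 - p)^{3} \times 0.4 + p^{4}(1 - p)^{2} \times 0.5} \\
&= \frac{(1 - p)^{3}}{(1 - p)^{3} + 4p^{2}(1 - p) + 5p^{3}} \quad (=: p_A).
\end{aligned}
\]

Similarly,

\[ P(\text{R : 0 1 1 0 1 1 is sent} \mid \text{0 1 1 1 0 0 is received}) = \frac{4p^{2}(1 - p)}{(1 - p)^{3} + 4p^{2}(1 - p) + 5p^{3}} \quad (=: p_R) \]

and

\[ P(\text{S : 1 0 1 0 0 1 is sent} \mid \text{0 1 1 1 0 0 is received}) = \frac{5p^{3}}{(1 - p)^{3} + 4p^{2}(1 - p) + 5p^{3}} \quad (=: p_S). \]

(d) If p = 0.1, then

\[ p_A = \frac{0.729}{0.77} \approx 0.9468, \qquad p_R = \frac{0.036}{0.77} \approx 0.0468, \qquad p_S = \frac{0.005}{0.77} \approx 0.0065. \]

For smaller p the ratios $p_A / p_R = (1 - p)^{2} / (4p^{2})$ and $p_A / p_S = (1 - p)^{3} / (5p^{3})$ only increase, so $p_A$ remains the largest of the three for every 0 < p ≤ 0.1.

Hence, the message A is most likely to be correct.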

✷
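The whole calculation reduces to counting the positions in which the received string differs from each codeword. A Python sketch under the assumptions of the example (independent flips with probability p, priors 0.1, 0.4, 0.5); the function names are illustrative.

```python
codewords = {"A": "010100", "R": "011011", "S": "101001"}
priors = {"A": 0.1, "R": 0.4, "S": 0.5}  # ratios 1 : 4 : 5
received = "011100"

def likelihood(sent, got, p):
    """P(got | sent) when each digit flips independently with probability p."""
    errors = sum(s != g for s, g in zip(sent, got))
    return p**errors * (1 - p) ** (len(sent) - errors)

def posteriors(p):
    joint = {m: priors[m] * likelihood(codewords[m], received, p) for m in codewords}
    total = sum(joint.values())
    return {m: joint[m] / total for m in joint}

post = posteriors(0.1)
print({m: round(v, 4) for m, v in post.items()})
# {'A': 0.9468, 'R': 0.0468, 'S': 0.0065}
```

Calling `posteriors` with smaller values of p shows the posterior of A moving even closer to 1.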


Chapter 4

Discrete Random Variables

Example 4.1 (Probability distribution ⋆ )

Suppose the following gives the frequency distribution of the vehicles owned by all families living in a small town.

Number of Vehicles Owned   Frequency
0                          30
1                          470
2                          850
3                          490
4                          160

Let X be the number of vehicles owned by a randomly selected family.

(a) Write the probability distribution of X.

(b) Find the probability that the number of vehicles owned by a randomly selected family is

(i) from one to three;

(ii) at least three;

(iii) at most one.

Solution

(a) The probability distribution is given by the relative frequency distribution of the vehicles owned by all 2,000 families living in the town.

Number of Vehicles   Relative Frequency   Probability
x                                         p(x) = P(X = x)
0                    30/2000              0.015
1                    470/2000             0.235
2                    850/2000             0.425
3                    490/2000             0.245
4                    160/2000             0.080

(b) (i) The probability is given by

P (one to three) = P (X = 1) + P (X = 2) + P (X = 3)

= 0.235 + 0.425 + 0.245 = 0.905.

(ii) The probability is given by

P (at least three) = P (X = 3) + P (X = 4)

= 0.245 + 0.080 = 0.325.

(iii) The probability is given by

P (at most one) = P (X = 0) + P (X = 1)

= 0.015 + 0.235 = 0.25.

✷
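Turning a frequency table into a probability mass function and summing over an event is mechanical. A short Python sketch; the helper name `prob` is illustrative.

```python
from fractions import Fraction

frequency = {0: 30, 1: 470, 2: 850, 3: 490, 4: 160}
total = sum(frequency.values())  # 2000 families
pmf = {x: Fraction(f, total) for x, f in frequency.items()}

def prob(event):
    """P(X in event) for a set or range of values."""
    return sum(p for x, p in pmf.items() if x in event)

print(float(prob(range(1, 4))))  # 0.905  (one to three)
print(float(prob({3, 4})))       # 0.325  (at least three)
print(float(prob({0, 1})))       # 0.25   (at most one)
```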

Example 4.2 (Probability distribution ⋆⋆ )

Consider the following game. There are six dice. Each of the dice has five blank sides. The sixth side has a number which is either 1, 2, 3, 4, 5 or 6, with a different number on each die. The six dice are rolled and the player wins a prize depending on the total of the numbers which turn up. Let X be the total of the numbers on the six dice. What is the image of the random variable X? Evaluate P(10 ≤ X ≤ 12).

Solution Let X be the total of the numbers which turn up on the six dice. Then

Image X = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21}.

We look for

\[ P(10 \leq X \leq 12) = P(X = 10) + P(X = 11) + P(X = 12). \]

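Each die shows its number with probability 1/6 and a blank with probability 5/6, independently of the others. Write P({4, 6}) for the probability that exactly the dice numbered 4 and 6 show their numbers, and so on. Now,

\[
\begin{aligned}
P(X = 10) &= P(\{4, 6\}) + P(\{1, 3, 6\}) + P(\{1, 4, 5\}) + P(\{2, 3, 5\}) + P(\{1, 2, 3, 4\}) \\
&= \left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{4} + 3 \times \left(\frac{1}{6}\right)^{3}\left(\frac{5}{6}\right)^{3} + \left(\frac{1}{6}\right)^{4}\left(\frac{5}{6}\right)^{2} \\
&= \frac{1}{6^{6}}\left(5^{4} + 3 \times 5^{3} + 5^{2}\right) \\
&= \frac{1025}{46656}.
\end{aligned}
\]

Similarly,

\[
\begin{aligned}
P(X = 11) &= P(\{5, 6\}) + P(\{1, 4, 6\}) + P(\{2, 3, 6\}) + P(\{2, 4, 5\}) + P(\{1, 2, 3, 5\}) \\
&= \frac{1025}{46656},
\end{aligned}
\]

and

\[
\begin{aligned}
P(X = 12) &= P(\{1, 5, 6\}) + P(\{2, 4, 6\}) + P(\{3, 4, 5\}) + P(\{1, 2, 3, 6\}) + P(\{1, 2, 4, 5\}) \\
&= \frac{3 \times 5^{3} + 2 \times 5^{2}}{46656} = \frac{425}{46656}.
\end{aligned}
\]

Hence,

\[ P(10 \leq X \leq 12) = \frac{1025 + 1025 + 425}{46656} = \frac{275}{5184} \approx 0.053. \]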
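Listing the subsets by hand is error-prone, so a brute-force check over all 64 subsets of {1, ..., 6} is a useful safeguard. The sketch below is illustrative; `prob_total` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations

def prob_total(target):
    """P(X = target): sum over subsets of {1,...,6} whose numbers add up to target."""
    total = Fraction(0)
    for size in range(7):
        for shown in combinations(range(1, 7), size):
            if sum(shown) == target:
                # 'size' dice show their number (1/6 each), the rest are blank (5/6 each)
                total += Fraction(1, 6) ** size * Fraction(5, 6) ** (6 - size)
    return total

answer = sum(prob_total(t) for t in (10, 11, 12))
print(prob_total(10), answer)  # 1025/46656 275/5184
```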



Example 4.3 (Binomial distribution ⋆ )

A restaurant serves 8 main courses of fish, 12 of beef, and 10 of poultry. If customers select from these main courses randomly, what is the probability that exactly two of the next ten customers order a fish main course?

Solution Let X denote the number of fish main courses (successes) ordered by the next ten customers. Then X is binomial such that

X ∼ Bin(n, p), where n = 10 and p = 8/30 = 4/15.

Thus,

\[ P(X = 2) = C^{10}_{2} \times \left(\frac{4}{15}\right)^{2}\left(\frac{11}{15}\right)^{8} \approx 0.2676. \]

✷


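Find the probability that in a family of four children, there will be at least one boy and at least one girl. Assume that the probability of a male birth is 9/20.

Solution Let X denote the number of boys in a family of 4 children. Then X ∼ Bin(4, 9/20), and

\[
P(X = 0) = C^{4}_{0} \times \left(\frac{9}{20}\right)^{0}\left(\frac{11}{20}\right)^{4} = \frac{14641}{160000}, \qquad
P(X = 4) = C^{4}_{4} \times \left(\frac{9}{20}\right)^{4}\left(\frac{11}{20}\right)^{0} = \frac{6561}{160000}.
\]

Hence,

\[
\begin{aligned}
P(\text{at least one boy and at least one girl}) &= 1 - P(X = 0) - P(X = 4) \\
&= \frac{138798}{160000} \\
&\approx 0.8675.
\end{aligned}
\]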

The probability is 86.75%. ✷


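Explain which of the following three events is most likely: that a person gets (i) at least one “Six” when 6 dice are rolled, (ii) at least two “Sixes” when 12 dice are rolled, (iii) at least three “Sixes” when 18 dice are rolled.

Solution

\[
\begin{aligned}
\text{Probability for (i)} &= 1 - P(\text{0 Sixes}) \\
&= 1 - C^{6}_{0}\left(\frac{1}{6}\right)^{0}\left(\frac{5}{6}\right)^{6} \approx 0.6651.
\end{aligned}
\]

\[
\begin{aligned}
\text{Probability for (ii)} &= 1 - P(\text{0 Sixes}) - P(\text{1 Six}) \\
&= 1 - C^{12}_{0}\left(\frac{1}{6}\right)^{0}\left(\frac{5}{6}\right)^{12} - C^{12}_{1}\left(\frac{1}{6}\right)^{1}\left(\frac{5}{6}\right)^{11} \approx 0.6187.
\end{aligned}
\]

\[
\begin{aligned}
\text{Probability for (iii)} &= 1 - P(0) - P(1) - P(2) \\
&= 1 - C^{18}_{0}\left(\frac{1}{6}\right)^{0}\left(\frac{5}{6}\right)^{18} - C^{18}_{1}\left(\frac{1}{6}\right)^{1}\left(\frac{5}{6}\right)^{17} - C^{18}_{2}\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{16} \approx 0.5973.
\end{aligned}
\]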

In conclusion, (i) is most likely to happen. ✷
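All three probabilities are binomial upper tails computed through the complement, so one helper covers them. A minimal Python sketch with illustrative function names:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def at_least(m, n, p):
    """P(X >= m) for X ~ Bin(n, p), via the complement."""
    return 1 - sum(binom_pmf(k, n, p) for k in range(m))

for m in (1, 2, 3):
    print(m, round(at_least(m, 6 * m, 1 / 6), 4))  # 0.6651, 0.6187, 0.5973
```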



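Example 4.6 (Binomial distribution ⋆⋆ )

Throw two fair dice. We say that “a match” occurs if the outcomes of the two dice are identical. Now we play a game with the following three possible rules:

Rule 1. Throw the two fair dice 3 times. If you get at least one match, then you win the game.

Rule 2. Throw the two fair dice 7 times. If you get at least two matches, then you win the game.

Rule 3. Throw the two fair dice 11 times. If you get at least three matches, then you win the game.

Under which rule do you have the highest probability of winning the game?

Solution The probability of getting a match is given by p = 6/36 = 1/6. Let $X_i$ (i = 1, 2, 3) be the number of matches obtained under Rule i. Then

\[ X_1 \sim \mathrm{Bin}\left(3, \tfrac{1}{6}\right), \qquad X_2 \sim \mathrm{Bin}\left(7, \tfrac{1}{6}\right), \qquad X_3 \sim \mathrm{Bin}\left(11, \tfrac{1}{6}\right). \]

Now,

\[
\begin{aligned}
P(\text{Win under Rule 1}) &= P(X_1 \geq 1) \\
&= 1 - P(X_1 = 0) \\
&= 1 - \left(\frac{5}{6}\right)^{3} \approx 0.4213,
\end{aligned}
\]

\[
\begin{aligned}
P(\text{Win under Rule 2}) &= P(X_2 \geq 2) \\
&= 1 - P(X_2 = 0) - P(X_2 = 1) \\
&= 1 - \left(\frac{5}{6}\right)^{7} - C^{7}_{1} \times \left(\frac{5}{6}\right)^{6}\left(\frac{1}{6}\right) \approx 0.3302,
\end{aligned}
\]

\[
\begin{aligned}
P(\text{Win under Rule 3}) &= P(X_3 \geq 3) \\
&= 1 - P(X_3 = 0) - P(X_3 = 1) - P(X_3 = 2) \\
&= 1 - \left(\frac{5}{6}\right)^{11} - C^{11}_{1} \times \left(\frac{5}{6}\right)^{10}\left(\frac{1}{6}\right) - C^{11}_{2} \times \left(\frac{5}{6}\right)^{9}\left(\frac{1}{6}\right)^{2} \approx 0.2732.
\end{aligned}
\]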

In conclusion, one is most likely to win the game under Rule 1. ✷

Example 4.7 (Binomial distribution ⋆⋆⋆ )

Suppose there is a game in which the player can either win or lose tokens. Jack plays the game repeatedly, and the games are assumed to be independent of each other. At each game, he will win one token with a probability of 0.3 or lose one token with a probability of 0.7. Suppose that Jack has 5 tokens in the beginning (before the first play). He will quit the game play only when he has no tokens left. Let $X_k$ be the number of tokens that Jack has in hand right after playing the game k times, where k = 1, 2, 3, · · · .

(a) What are the possible outcomes of $X_2$?

(b) Find the expected value of $X_2$.

(c) Describe briefly the situation when $X_5 = 0$.

(d) Find the probability that Jack will quit the game play within the first 8 plays (i.e., after playing the game at most 8 times).

(e) Given that Jack quits the game play within the first 8 plays, what is the probability that he has exactly 5 tokens right after the second play?

Solution


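(a) The possible outcomes of $X_2$ are 3, 5, 7.

(b)

\[ P(X_2 = 3) = (0.7)^{2} = 0.49, \qquad P(X_2 = 5) = C^{2}_{1} \times (0.3)(0.7) = 0.42, \qquad P(X_2 = 7) = (0.3)^{2} = 0.09. \]

The expected value of $X_2$ is given by

\[ E(X_2) = 3 \times 0.49 + 5 \times 0.42 + 7 \times 0.09 = 4.2. \]

(c) Jack loses each of the first five games, so he has lost all 5 tokens after playing the game 5 times. He quits the game play at that point.

(d) Each play changes the number of tokens by one, so Jack can run out of tokens only after an odd number of plays. Within the first 8 plays this happens either at play 5 (five losses in a row) or at play 7 (exactly one win among the first five plays, followed by two losses). Hence

\[
\begin{aligned}
\text{The required probability} &= P(X_5 = 0) + P(X_7 = 0) \\
&= (0.7)^{5} + C^{5}_{1}(0.3)(0.7)^{4} \times (0.7)^{2} \\
&\approx 0.3445.
\end{aligned}
\]

(e) Denote by $E_1$ the event that Jack quits the game play within the first 8 plays and by $E_2$ the event that Jack has exactly 5 tokens right after the second play. If $E_2$ occurs, Jack can quit within 8 plays only by losing plays 3 to 7, so

\[
\begin{aligned}
\text{The required probability} &= P(E_2 \mid E_1) \\
&= \frac{P(E_1 \text{ and } E_2)}{P(E_1)} \\
&= \frac{P(X_2 = 5) \times (0.7)^{5}}{0.3445} \\
&= \frac{0.42 \times (0.7)^{5}}{0.3445} \approx 0.2049.
\end{aligned}
\]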

✷
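Parts (d) and (e) can be verified by enumerating all 2^8 win/lose sequences for eight plays. The Python sketch below treats plays after Jack has quit as hypothetical continuations; since their probabilities sum to 1, this does not change the probability of quitting. The names are illustrative.

```python
from itertools import product

P_WIN, START, PLAYS = 0.3, 5, 8

def quits(outcomes):
    """True if the token count hits 0 at some point in the win/lose sequence."""
    tokens = START
    for win in outcomes:
        tokens += 1 if win else -1
        if tokens == 0:
            return True
    return False

p_quit = 0.0           # P(E1): quits within the first 8 plays
p_quit_and_x2_5 = 0.0  # P(E1 and E2): also holds exactly 5 tokens after play 2

for outcomes in product([True, False], repeat=PLAYS):
    wins = sum(outcomes)
    prob = P_WIN**wins * (1 - P_WIN) ** (PLAYS - wins)
    if quits(outcomes):
        p_quit += prob
        if sum(outcomes[:2]) == 1:  # one win and one loss in the first two plays
            p_quit_and_x2_5 += prob

print(round(p_quit, 4))                    # 0.3445
print(round(p_quit_and_x2_5 / p_quit, 4))  # 0.2049
```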

Example 4.8 (Binomial vs. hypergeometric ⋆ )

A box contains 20 balls of which 6 are white and 14 are black. Eight balls are drawn at random from the box. Find the probability that the sample contains exactly 3 white balls if

(a) sampling is done without replacement. What kind of distribution is this?

(b) sampling is done with replacement. What kind of distribution is this?

Solution

(a) Let X denote the number of white balls. For sampling without replacement, the probability of obtaining 3 white balls is given by the hypergeometric distribution such that

X ∼ H(N, n, r), where N = 20, n = 8, r = 6.

Thus,

\[ P(X = 3) = \frac{C^{6}_{3}\, C^{14}_{5}}{C^{20}_{8}} = \frac{20 \times 2002}{125970} \approx 0.318. \]

(b) For sampling with replacement, the probability of obtaining 3 white balls is given by the binomial distribution such that

X ∼ Bin(n, p), where n = 8, p = 6/20 = 0.3.

Thus,

\[ P(X = 3) = C^{8}_{3}\,(0.3)^{3}(0.7)^{5} \approx 0.254. \]

✷
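The two sampling schemes differ only in the probability mass function used. A short Python sketch with both; the function names and argument order (N, n, r) follow the H(N, n, r) notation of the example but are otherwise an illustrative choice.

```python
from math import comb

def hypergeom_pmf(k, N, n, r):
    """P(X = k): k marked items in a sample of n drawn without replacement
    from N items of which r are marked."""
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(round(hypergeom_pmf(3, 20, 8, 6), 3))  # 0.318
print(round(binom_pmf(3, 8, 6 / 20), 3))     # 0.254
```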

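Example 4.9 (Hypergeometric distribution ⋆ )

In a lottery, 6 numbers are drawn out of 49. Find the probability that “8” is one of the numbers drawn.

Solution The required probability is given by

\[
\frac{C^{1}_{1}\, C^{48}_{5}}{C^{49}_{6}} = \frac{48 \times 47 \times 46 \times 45 \times 44}{5 \times 4 \times 3 \times 2 \times 1} \times \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{49 \times 48 \times 47 \times 46 \times 45 \times 44} = \frac{6}{49} \approx 0.1224.
\]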

✷


In Mark Six, 6 ordinary numbers are drawn out of 49. Find the probability that “8” and “18” are two of the numbers drawn.

Solution The required probability is given by

\[ \frac{C^{2}_{2}\, C^{47}_{4}}{C^{49}_{6}} = \frac{5}{392} \approx 0.0128. \]

✷


Suppose you plan to select four of 10 new stock issues (stock issued by new companies) and that, unknown to you, three of the 10 will result in substantial profits and seven will result in losses. What is the probability that at least two of the three profitable issues will appear in your selection?

Solution Let X be the number of profitable issues in the sample of n = 4. Using the hypergeometric probability distribution, the probability of observing at least two successes is

\[
\begin{aligned}
P(X \geq 2) &= P(X = 2) + P(X = 3) \\
&= \frac{C^{3}_{2}\, C^{7}_{2}}{C^{10}_{4}} + \frac{C^{3}_{3}\, C^{7}_{1}}{C^{10}_{4}} \\
&= \frac{3 \times 21}{210} + \frac{1 \times 7}{210} = \frac{1}{3}.
\end{aligned}
\]

The required probability is approximately equal to 33.33%. ✷


From an ordinary pack of 52 poker cards, we draw one hand of 13 cards. What is the probability that we have exactly 5 spades in our hand?

Solution Let X denote the number of spades. The probability of obtaining 5 spades in our hand is given by the hypergeometric distribution such that

X ∼ H(N, n, r), where N = 52, n = 13, r = 13.

Thus,

\[ P(X = 5) = \frac{C^{13}_{5}\, C^{39}_{8}}{C^{52}_{13}} = \frac{1164427407}{9338434700} \approx 0.1247. \]

Note that in the above we have used the computer software Mathematica for evaluations. ✷


Example 4.13 (Hypergeometric distribution and its binomial approximation ⋆⋆ )

A city is inhabited by 75,000 adults of whom 500 are university professors. In a survey on higher education carried out by the local hotline radio show, 25 people are chosen at random without replacement for questioning.

(a) What is the probability that the sample contains at most one professor? Use a suitable random variable to solve this question.

(b) Use a suitable distribution to give a numerical approximation of the answer in part (a).

Solution

(a) Let X be the number of professors in the sample. The probability that the sample contains k professors is given by the hypergeometric distribution,

\[ P(X = k) = \frac{C^{r}_{k} \cdot C^{N-r}_{n-k}}{C^{N}_{n}}, \qquad \text{where } N = 75000,\ n = 25,\ r = 500. \]

Hence,

\[
\begin{aligned}
P(\text{sample contains at most one professor}) &= P(X = 0) + P(X = 1) \\
&= \frac{C^{500}_{0} \cdot C^{74500}_{25} + C^{500}_{1} \cdot C^{74500}_{24}}{C^{75000}_{25}}.
\end{aligned}
\]

This expression is exact, but the binomial coefficients in it are far too large to evaluate by hand, so it gives little feeling for how large or small the probability is. An approximation is needed.

(b) Since n = 25 ≪ N = 75000, the binomial distribution should give an accurate approximation,

X ∼ Bin(n, p), where n = 25, p = 500/75000 = 1/150.

Hence,

\[
\begin{aligned}
P(\text{sample contains at most one professor}) &= P(X = 0) + P(X = 1) \\
&= C^{25}_{0} \times \left(\frac{1}{150}\right)^{0}\left(\frac{149}{150}\right)^{25} + C^{25}_{1} \times \left(\frac{1}{150}\right)^{1}\left(\frac{149}{150}\right)^{24} \\
&\approx 0.9880.
\end{aligned}
\]

The probability is in fact very close to 1, which is not at all apparent from the expression in part (a).

✷
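With software, the exact hypergeometric value in part (a) is within reach, because Python integers have arbitrary precision. The sketch below compares it with the binomial approximation of part (b); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N, n, r = 75_000, 25, 500

# Exact hypergeometric value of P(X = 0) + P(X = 1), kept as an exact fraction.
exact = Fraction(
    comb(r, 0) * comb(N - r, n) + comb(r, 1) * comb(N - r, n - 1),
    comb(N, n),
)

# Binomial approximation with p = r / N.
p = r / N
approx = (1 - p) ** n + n * p * (1 - p) ** (n - 1)

print(round(float(exact), 4), round(approx, 4))
```

The two printed numbers should agree closely, which supports the use of the binomial approximation when n is much smaller than N.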

Example 4.14 (Poisson distribution approximating binomial ⋆ )

In the lottery game “Mark Six”, players seek to guess which six numbers will be drawn out of a lottery machine which contains colored balls numbered 1 to 49. Suppose that 7,000,000 people play the lottery, and assume that each player independently chooses, at random and without replacement, six numbers from 1, 2, · · · , 49. Let X denote the total number of players who match all six drawn numbers. Name the exact distribution of X, use a suitable approximation to this distribution, and find, approximately, the probabilities (a) P(X = 0), (b) P(X = 1).

Solution The exact distribution of X is binomial:

\[ X \sim \mathrm{Bin}(n, p) \quad \text{with } n = 7{,}000{,}000 \text{ and } p = \frac{1}{C^{49}_{6}} = \frac{1}{13{,}983{,}816}. \]

Here n is very large and p is very small, so the Poisson distribution may be used to approximate the binomial probabilities. The Poisson approximation has the parameter (mean)

\[ \lambda = np \approx 0.50058. \]

In general, the Poisson probability mass function is given by

\[ P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}. \]

(a) $P(X = 0) = e^{-\lambda} = e^{-0.50058} \approx 0.60618$.

(b) $P(X = 1) = \lambda e^{-\lambda} = 0.50058 \cdot e^{-0.50058} \approx 0.30344$.

✷
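How good the Poisson approximation is here can be seen by evaluating the exact binomial probabilities alongside it. An illustrative Python sketch:

```python
from math import comb, exp

n = 7_000_000
p = 1 / comb(49, 6)  # 1 / 13,983,816
lam = n * p          # Poisson mean

poisson = [exp(-lam), lam * exp(-lam)]                 # P(X = 0), P(X = 1)
binomial = [(1 - p) ** n, n * p * (1 - p) ** (n - 1)]  # exact counterparts

print(round(lam, 5))  # 0.50058
print([round(v, 5) for v in poisson])
print([round(v, 5) for v in binomial])
```

The Poisson and binomial values should be indistinguishable at this number of decimal places.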

Example 4.15 (Poisson distribution approximating binomial ⋆ )

An analyst predicted that 3.5% of small corporations would file for bankruptcy in the coming year. For a random sample of 120 small corporations, use the Poisson distribution to estimate the probability that at least 5% of them will file for bankruptcy in the next year, assuming that the analyst’s prediction is correct.

Solution The number X of corporations in the sample that will file for bankruptcy is binomial with n = 120 and p = 0.035,

X ∼ Bin(120, 0.035),

and we look for the probability of at least 6 bankruptcies (120 × 5% = 6):

\[ P(X \geq 6) = 1 - P(X \leq 5) = 1 - \sum_{k=0}^{5} C^{120}_{k}\,(0.035)^{k}(0.965)^{120-k}. \]

This sum is tedious to evaluate by hand, so we approximate it using the Poisson distribution with mean λ = np = 120 × 0.035 = 4.2:

\[
\begin{aligned}
P(X \geq 6) &= 1 - P(X \leq 5) \\
&= 1 - \sum_{k=0}^{5} \frac{e^{-\lambda}\lambda^{k}}{k!} \\
&= 1 - e^{-4.2}\left( \frac{(4.2)^{0}}{0!} + \frac{(4.2)^{1}}{1!} + \frac{(4.2)^{2}}{2!} + \frac{(4.2)^{3}}{3!} + \frac{(4.2)^{4}}{4!} + \frac{(4.2)^{5}}{5!} \right) \\
&\approx 0.2469.
\end{aligned}
\]

The required probability is approximately equal to 24.69%. ✷
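The binomial sum that is troublesome by hand takes one line in software, which allows the Poisson estimate to be compared with the exact tail. A minimal sketch with illustrative names:

```python
from math import comb, exp, factorial

n, p, threshold = 120, 0.035, 6
lam = n * p  # Poisson mean, 4.2

poisson_tail = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(threshold))
binomial_tail = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold))

print(round(poisson_tail, 4))   # 0.2469
print(round(binomial_tail, 4))  # exact binomial value, for comparison
```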

Example 4.16 (Poisson distribution ⋆⋆ )

Henry is an influential player in his soccer team. On average, the team scores once every 30 minutes in his presence, but only once every 45 minutes in his absence. Henry has picked up a light injury and his chance of playing in the game tomorrow is only 50%. What is the probability that Henry’s team will score two or more goals tomorrow? (Note: A soccer game is played for 90 minutes.)

Solution Let X be the number of goals scored by Henry’s team in the 90-minute game. In Henry’s presence the mean number of goals is 90/30 = 3, so

X ∼ Poisson(3).

That is,

\[ P(X = k \mid \text{Henry present}) = \frac{3^{k}}{k!} e^{-3}. \]

In Henry’s absence the mean is 90/45 = 2, so X ∼ Poisson(2).

That is,

\[ P(X = k \mid \text{Henry absent}) = \frac{2^{k}}{k!} e^{-2}. \]

Now, by the law of total probability,

\[
\begin{aligned}
P(X \geq 2) &= 1 - P(X \leq 1) \\
&= 1 - P(X \leq 1 \mid \text{Henry present}) \times P(\text{Henry present}) - P(X \leq 1 \mid \text{Henry absent}) \times P(\text{Henry absent}) \\
&= 1 - \left(e^{-3} + 3e^{-3}\right) \times \frac{1}{2} - \left(e^{-2} + 2e^{-2}\right) \times \frac{1}{2} \\
&= 1 - 2e^{-3} - 1.5e^{-2} \\
&\approx 0.6974.
\end{aligned}
\]

The required probability is approximately equal to 69.74%. ✷

Example 4.17 (Poisson distribution ⋆⋆⋆ )

Flaws in lengths of rope made by Company A occur in a Poisson process at rate $\lambda_A$ per metre length, so that the number of flaws X in a length of l metres of rope has the Poisson probability mass function

\[ P(X = k) = \frac{\exp(-\lambda_A l)\,(\lambda_A l)^{k}}{k!}, \qquad k = 0, 1, 2, \cdots, \quad \lambda_A > 0. \]

(a) Find the probability that there are (i) no flaws, (ii) more than 2 flaws, in a 1000-metre length of rope made by Company A, given that $\lambda_A = 0.002$.

(b) Company B makes similar rope, indistinguishable in appearance from that made by Company A, in which flaws occur in a Poisson process at rate $\lambda_B = 0.003$ per metre. A boat is rigged with 100 metres of rope from Company A and 100 metres of rope from Company B. Assuming that the lengths of rope supplied by A and B are independent, find the probability that (i) there are no flaws, (ii) there is exactly one flaw, in the rigging of this boat.

(c) (i) A manufacturer of rigging for sailing boats buys 75% of his rope from Company A and 25% from Company B. The supplier’s label has become detached from a drum of rope of length 2 km which is found to have 7 flaws. Find the probability that this drum was supplied by Company A.

(ii) Suppose, instead, that the rope in this drum had been found to have 8 flaws. Find the probability that this drum was supplied by Company A. Compare this probability with your answer to part (i) and comment.

Solution

(a) X ∼ Poisson(2). P (X = k) = e−2 2k

k!.

(i) P (X = 0) = e−2 ≈ 0.1353.

(ii) P (X > 2) = 1− P (X 6 2) = 1− e−2(

1 + 2 +22

2

)

≈ 1− 0.6767 = 0.3233.

(b) XA ∼ Poisson(0.2) and XB ∼ Poisson(0.3).

(i)

P (no flaws) = P (XA = 0 and XB = 0)

= P (XA = 0)× P (XB = 0) by independence

= e−0.2 × e−0.3 = e−0.5

≈ 0.6065.

57


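(ii) It follows from independence that

\[
\begin{aligned}
P(\text{exactly one flaw}) &= P(X_A = 0 \text{ and } X_B = 1) + P(X_A = 1 \text{ and } X_B = 0) \\
&= P(X_A = 0) \times P(X_B = 1) + P(X_A = 1) \times P(X_B = 0) \\
&= e^{-0.2} \times (0.3\, e^{-0.3}) + (0.2\, e^{-0.2}) \times e^{-0.3} \\
&= 0.5\, e^{-0.5} \\
&\approx 0.3033.
\end{aligned}
\]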

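(c) (i) For a 2 km drum, the number of flaws is Poisson(4) for Company A and Poisson(6) for Company B. By Bayes' theorem,

\[
\begin{aligned}
P(A \mid 7 \text{ flaws}) &= \frac{P(7 \text{ flaws} \mid A)\, P(A)}{P(7 \text{ flaws})} \\
&= \frac{P(7 \text{ flaws} \mid A)\, P(A)}{P(7 \text{ flaws} \mid A)\, P(A) + P(7 \text{ flaws} \mid B)\, P(B)} \\
&= \frac{\dfrac{e^{-4}\, 4^7}{7!} \times 0.75}{\dfrac{e^{-4}\, 4^7}{7!} \times 0.75 + \dfrac{e^{-6}\, 6^7}{7!} \times 0.25} \\
&\approx 0.5647 \quad \text{(just larger than 0.5)}.
\end{aligned}
\]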

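(ii)

\[
\begin{aligned}
P(A \mid 8 \text{ flaws}) &= \frac{P(8 \text{ flaws} \mid A)\, P(A)}{P(8 \text{ flaws})} \\
&= \frac{P(8 \text{ flaws} \mid A)\, P(A)}{P(8 \text{ flaws} \mid A)\, P(A) + P(8 \text{ flaws} \mid B)\, P(B)} \\
&= \frac{\dfrac{e^{-4}\, 4^8}{8!} \times 0.75}{\dfrac{e^{-4}\, 4^8}{8!} \times 0.75 + \dfrac{e^{-6}\, 6^8}{8!} \times 0.25} \\
&\approx 0.4638 \quad \text{(just less than 0.5)}.
\end{aligned}
\]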

✷

Remark This problem weighs “Quantity vs. Reliability”: the manufacturer buys more rope from Company A than from Company B, but the rope from B is less reliable than that from A. As more flaws are found in the drum, the evidence shifts towards Company B, and from 8 flaws onwards the probability that the drum came from Company A falls below 0.5. In general,

\[
P(A \mid N \text{ flaws}) = \frac{\dfrac{e^{-4}\, 4^N}{N!} \times \dfrac{3}{4}}{\dfrac{e^{-4}\, 4^N}{N!} \times \dfrac{3}{4} + \dfrac{e^{-6}\, 6^N}{N!} \times \dfrac{1}{4}} = \frac{3}{3 + e^{-2} \left(\dfrac{6}{4}\right)^N}.
\]

Since (6/4)^N increases with N, it follows that P(A | N flaws) gets smaller as N gets larger.
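The posterior probabilities in part (c) and in the Remark can be checked numerically. The following is a minimal sketch, not part of the original solution; the function name `posterior_a` and its parameter names are illustrative assumptions.

```python
import math

def posterior_a(n_flaws, length_m=2000, rate_a=0.002, rate_b=0.003, prior_a=0.75):
    """P(drum came from Company A | n_flaws flaws observed), by Bayes' theorem."""
    mean_a = rate_a * length_m   # Poisson mean for a drum from A (4 flaws)
    mean_b = rate_b * length_m   # Poisson mean for a drum from B (6 flaws)
    # The 1/N! factor is common to both likelihoods and cancels, so it is omitted.
    like_a = math.exp(-mean_a) * mean_a ** n_flaws
    like_b = math.exp(-mean_b) * mean_b ** n_flaws
    return prior_a * like_a / (prior_a * like_a + (1 - prior_a) * like_b)

# The posterior decreases as the number of flaws grows.
for n in range(6, 10):
    print(n, round(posterior_a(n), 4))
```

The values for 7 and 8 flaws should agree with the answers to parts (c)(i) and (c)(ii).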


� Example 4.18 (Negative binomial distribution ⋆⋆ )

Two teams, A and B, play a series of games. If team A has probability 0.55 of winning each game, is it to its advantage to play the best two out of three games or the best three out of five games? Assume the outcomes of successive games are independent.

Solution Let X denote the random variable that follows a negative binomial distribution,

X ∼ NegBin(r, p).

The key characteristic of the negative binomial distribution is:

r successes in k trials and the rth success happens on the kth trial (k ≥ r).

Schematically,

Trials 1, 2, ..., (k − 1): (r − 1) successes, distributed as binomial over the first (k − 1) trials
Trial k: the rth success

The negative binomial probability is given by the product of two probabilities (by independence)

\[
\underbrace{\left( C^{k-1}_{r-1}\, p^{r-1} (1-p)^{(k-1)-(r-1)} \right)}_{\text{binomial probability}} \times\, p
\]

or (after simplification)

\[
P(X = k) = C^{k-1}_{r-1}\, p^{r} (1-p)^{k-r},
\]

where k = r, r + 1, r + 2, · · · .

To answer the given question, now we have the following two considerations:

1. “Two out of Three”. X ∼ NegBin(r = 2, p = 0.55).

\[
\begin{aligned}
P(\text{Team A wins}) &= P(X = 2) + P(X = 3) \\
&= 0.55^2 + C^{2}_{1} (0.55)^2 (0.45) \\
&\approx 0.5748.
\end{aligned}
\]

2. “Three out of Five”. Y ∼ NegBin(r = 3, p = 0.55).

\[
\begin{aligned}
P(\text{Team A wins}) &= P(Y = 3) + P(Y = 4) + P(Y = 5) \\
&= 0.55^3 + C^{3}_{2} (0.55)^3 (0.45) + C^{4}_{2} (0.55)^3 (0.45)^2 \\
&\approx 0.5931.
\end{aligned}
\]

It is more advantageous for team A (the better team) to play “the best three out of five games”. ✷
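The two series can be compared with a short script that sums the negative binomial probabilities over the game on which the deciding win occurs. This is only a sketch; the function name `series_win_prob` is an assumption made for illustration.

```python
from math import comb

def series_win_prob(wins_needed, p):
    """P(a team wins a 'first to wins_needed wins' series), where p is its
    probability of winning each independent game."""
    q = 1 - p
    last_game = 2 * wins_needed - 1   # e.g. best of 5 ends by game 5
    return sum(
        comb(k - 1, wins_needed - 1) * p ** wins_needed * q ** (k - wins_needed)
        for k in range(wins_needed, last_game + 1)
    )

print(round(series_win_prob(2, 0.55), 4))   # best two out of three
print(round(series_win_prob(3, 0.55), 4))   # best three out of five
```

Under the same assumptions, a longer series favours the stronger team further, which can be explored by increasing `wins_needed`.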



� Example 4.19 (Negative binomial distribution ⋆⋆⋆ )

A company takes out an insurance policy to cover accidents that occur at its manufacturing plant. The probability that one or more accidents will occur during any given month is 3/5. The number of accidents that occur in any given month is independent of the number of accidents that occur in all other months. Calculate the probability that there will be at least four months in which no accidents occur before the fourth month in which at least one accident occurs.

Solution Regard a month with one or more accidents as a success (p = 3/5), and let X be the number of failures before the fourth success. Then X follows a negative binomial distribution with r = 4, the fourth success occurring on trial k = X + 4, and the required probability is

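\[
\begin{aligned}
P(X \ge 4) &= 1 - P(X = 0) - P(X = 1) - P(X = 2) - P(X = 3) \\
&= 1 - C^{4-1}_{4-1} \left(\tfrac{3}{5}\right)^4 \left(\tfrac{2}{5}\right)^{4-4} - C^{5-1}_{4-1} \left(\tfrac{3}{5}\right)^4 \left(\tfrac{2}{5}\right)^{5-4} - C^{6-1}_{4-1} \left(\tfrac{3}{5}\right)^4 \left(\tfrac{2}{5}\right)^{6-4} - C^{7-1}_{4-1} \left(\tfrac{3}{5}\right)^4 \left(\tfrac{2}{5}\right)^{7-4} \\
&= 1 - \left(\tfrac{3}{5}\right)^4 \left(1 + \tfrac{8}{5} + \tfrac{8}{5} + \tfrac{32}{25}\right) \\
&\approx 0.2898.
\end{aligned}
\]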

The probability is 28.98%. ✷


Chapter 5

Continuous Random Variables

� Example 5.1 (Probability density function ⋆ )

The continuous random variable X has probability density function given by

\[
f(x) =
\begin{cases}
k x^2 (1-x)^2, & 0 \le x \le 1, \\
0, & \text{otherwise.}
\end{cases}
\]

Prove that \( P\left(X \le \tfrac{1}{3}\right) = \tfrac{17}{81} \).

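Solution We have \( k \int_0^1 x^2 (1-x)^2 \, dx = 1 \), so \( k \int_0^1 (x^2 - 2x^3 + x^4) \, dx = 1 \). This gives

\[
1 = k \left[ \tfrac{1}{3} x^3 - \tfrac{1}{2} x^4 + \tfrac{1}{5} x^5 \right]_0^1 = k \left( \tfrac{1}{3} - \tfrac{1}{2} + \tfrac{1}{5} \right) \implies k = 30.
\]

Hence,

\[
\begin{aligned}
P\left(X \le \tfrac{1}{3}\right) &= \int_0^{1/3} 30 (x^2 - 2x^3 + x^4) \, dx \\
&= 30 \left[ \tfrac{1}{3} x^3 - \tfrac{1}{2} x^4 + \tfrac{1}{5} x^5 \right]_0^{1/3} \\
&= 30 \left( \frac{1}{3^4} - \frac{1}{2} \cdot \frac{1}{3^4} + \frac{1}{5} \cdot \frac{1}{3^5} \right) \\
&= \frac{17}{81}.
\end{aligned}
\]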

✷
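The normalising constant and the probability can be confirmed in exact rational arithmetic. The sketch below hard-codes the antiderivative found in the solution; the function name `poly_integral` is an illustrative assumption.

```python
from fractions import Fraction

def poly_integral(x):
    """Integral from 0 to x of (t^2 - 2t^3 + t^4) dt, in exact arithmetic."""
    x = Fraction(x)
    return x**3 / 3 - x**4 / 2 + x**5 / 5

k = 1 / poly_integral(1)                   # normalising constant of the density
prob = k * poly_integral(Fraction(1, 3))   # P(X <= 1/3)
print(k, prob)
```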

� Example 5.2 (Normal distribution ⋆ )

Let X be a normally distributed random variable with µ = 1 and σ² = 4. Find P(|X| > 1).

Solution Given that X ∼ N(µ, σ²) = N(1, 2²).

\[
\begin{aligned}
P(|X| > 1) &= P(X > 1 \text{ or } X < -1) \\
&= P(X > 1) + P(X < -1) \\
&= P\left(Z > \frac{1 - 1}{2}\right) + P\left(Z < \frac{-1 - 1}{2}\right) \\
&= P(Z > 0) + P(Z < -1) \\
&\approx 0.5 + (0.5 - 0.3413) \\
&= 0.6587.
\end{aligned}
\]

✷


� Example 5.3 (Normal distribution ⋆⋆ )

According to the Bureau of Labour Statistics, the average weekly pay for a U.S. production worker was $411.84 (The World Almanac, 2000). Assume that available data indicate that production worker wages were normally distributed with a standard deviation of $90.

(a) What is the probability that a worker earned between $400 and $500?

(b) How much did a production worker have to earn to be in the top 20% of wage earners?

(c) For a randomly selected production worker, what is the probability that the worker earned less than $250 per week?

Solution

(a) Let X denote the production worker wages. Then

X ∼ N(µ, σ2), µ = 411.84, σ = 90.

Hence,

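\[
\begin{aligned}
P(400 \le X \le 500) &= P\left( \frac{400 - 411.84}{90} \le Z \le \frac{500 - 411.84}{90} \right) \\
&\approx P(-0.13 \le Z \le 0.98) \\
&\approx 0.8365 - (1 - 0.5517) \\
&= 0.3882.
\end{aligned}
\]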

(b) We must find the z-value that cuts off an area of 0.20 in the right tail. Using the standard normal table, we find that z = 0.84 cuts off approximately 0.20 in the right tail. So,

\[
x = \mu + z\sigma = 411.84 + 0.84 \times 90 = 487.44.
\]

Weekly earnings of $487.44 or above will put a production worker in the top 20%.

(c) At 250,

\[
z = \frac{250 - 411.84}{90} = -1.80.
\]

Hence,

\[
P(X \le 250) = P(Z \le -1.80) \approx 1 - 0.9641 = 0.0359.
\]

The probability that a randomly selected production worker earns less than $250 per week is 3.59%.

✷
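The table look-ups above can be cross-checked with Python's standard-library `statistics.NormalDist`. This is a sketch for checking only; the variable names are illustrative, and the results differ slightly from the table-based answers because z is not rounded to two decimals.

```python
from statistics import NormalDist

wage = NormalDist(mu=411.84, sigma=90)

p_between = wage.cdf(500) - wage.cdf(400)   # part (a)
top20_cutoff = wage.inv_cdf(0.80)           # part (b): 80th percentile
p_below_250 = wage.cdf(250)                 # part (c)

print(round(p_between, 4), round(top20_cutoff, 2), round(p_below_250, 4))
```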

� Example 5.4 (Normal approximation to binomial distribution ⋆ )

According to an estimate, 55% of the people in Hong Kong have at least one credit card. If a random sample of 30 persons is selected, what is the probability that at least 19 of them will have at least one credit card?


Solution Let n be the total number of persons in the sample, X be the number of persons in the sample who have at least one credit card, and p be the probability that a person has at least one credit card. Then this is a binomial problem with n = 30, p = 0.55, that is,

X ∼ Bin(30, 0.55),

and we look for the probability

\[
P(X \ge 19) = \sum_{k=19}^{30} C^{30}_{k} (0.55)^k (0.45)^{30-k}.
\]

The calculation of the above sum is a bit troublesome, so we use the normal distribution (µ = np = 30 × 0.55 = 16.5 and σ = √(npq) = 2.7249) to approximate the probability. Remembering to apply the continuity correction when approximating a binomial probability by a normal distribution, we find that

\[
\begin{aligned}
P(X \ge 19) &\approx P\left( Z \ge \frac{18.5 - 16.5}{2.7249} \right) \\
&\approx P(Z \ge 0.73) \\
&\approx 0.5 - 0.2673 \\
&= 0.2327.
\end{aligned}
\]

✷
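With a computer the "troublesome" binomial sum is easy to evaluate, which makes it possible to see how good the continuity-corrected normal approximation is. The sketch below is illustrative; the function names `binom_tail_exact` and `binom_tail_normal` are assumptions, not notation from these notes.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_tail_exact(n, p, k_min):
    """Exact P(X >= k_min) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def binom_tail_normal(n, p, k_min):
    """Normal approximation to P(X >= k_min) with continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return 1 - NormalDist(mu, sigma).cdf(k_min - 0.5)

print(binom_tail_exact(30, 0.55, 19))
print(binom_tail_normal(30, 0.55, 19))
```

The same two functions apply to the other normal-approximation examples in this chapter by changing `n`, `p` and `k_min`.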


In a multiple-choice examination, each question is linked with 5 possible answers, of which only one is correct. A particular candidate has probability p (0 < p < 1) of knowing the correct answer to a question. If the candidate does not know the correct answer, then he chooses one of the possible answers at random. If the examination consists of 50 questions, then it can be proved that a candidate must give 34 correct answers in order to obtain 30 marks. Based on this simple fact (no proof is needed) please answer the following question: For the case where p = 0.75 independently for each question, find approximately the probability that this candidate’s total mark for the examination is at least 30.

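Solution First find the probability that the candidate answers a question correctly in terms of p.

\[
\begin{aligned}
P(\text{answers correctly}) &= P(\text{answers correctly} \mid \text{knows}) \times P(\text{knows}) \\
&\quad + P(\text{answers correctly} \mid \text{doesn't know}) \times P(\text{doesn't know}) \\
&= 1 \times p + \tfrac{1}{5} \times (1 - p) \\
&= \tfrac{1}{5}(1 + 4p).
\end{aligned}
\]

Use the normal approximation. Let Y be the number of correct answers out of 50. Then, with p = 0.75,

\[
Y \sim \text{Bin}\left(50, \tfrac{1}{5}(1 + 4p)\right) = \text{Bin}(50, 0.8) \overset{\text{approx.}}{\sim} N(40, 8).
\]

Hence,

\[
\begin{aligned}
P(Y \ge 34) &\approx P\left( Z \ge \frac{33.5 - 40}{\sqrt{8}} \right) \\
&\approx P(Z \ge -2.30) \\
&= P(Z \le 2.30) \\
&\approx 0.9893.
\end{aligned}
\]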

✷




A large chain retailer purchases a certain kind of electronic device from a manufacturer. The manufacturer indicates that the defective rate of the device is 5%.

(a) The inspector of the retailer randomly picks 20 items from a shipment. What is the probability that there will be at least two defective items among these 20?

(b) Suppose that the retailer receives 30 shipments in a month and the inspector randomly tests 20 devices per shipment. What is the probability that there will be more than 10 shipments containing at least two defective devices? Use the normal approximation to find the answer.

Solution

(a) Let X be the number of defective items among the 20. Then

X ∼ Bin(n, p), where n = 20, p = 0.05.

Hence,

\[
\begin{aligned}
P(\text{at least two defective}) &= P(X \ge 2) \\
&= 1 - P(X = 0) - P(X = 1) \\
&= 1 - C^{20}_{0} (0.05)^0 (0.95)^{20} - C^{20}_{1} (0.05)^1 (0.95)^{19} \\
&\approx 0.2642.
\end{aligned}
\]

(b) Let Y be the number of shipments which contain at least two defective devices. Then

Y ∼ Bin(n, p), where n = 30, p = 0.2642.

The required probability is

\[
P(Y > 10) = \sum_{k=11}^{30} C^{30}_{k} (0.2642)^k (0.7358)^{30-k}.
\]

Use the normal approximation to find the value of the above expression: Y is approximately N(µ, σ²), where µ = np = 7.926 and σ² = npq ≈ 5.832.

Hence,

\[
\begin{aligned}
P(Y > 10) &= P(Y \ge 11) \\
&\approx P\left( Z \ge \frac{10.5 - 7.926}{\sqrt{5.832}} \right) \\
&\approx P(Z \ge 1.07) \\
&\approx 1 - 0.8577 \\
&= 0.1423.
\end{aligned}
\]

✷


Peter is studying for an examination that will consist of 45 multiple-choice questions selected randomly from a large test bank given in advance. For each question, four possible answers are presented but only one is correct. Peter studies all the questions of the test bank; assume that he knows the correct answers for 60% of them and that he will randomly choose an answer if he doesn’t know the correct answer. What is the probability that Peter will get more than 60% of the examination questions correct?


Solution Let X be the number of correct answers out of 45. Then

X ∼ Bin(45, p), where p = 0.6 + (0.4)(1/4) = 0.7.

Note that 60% of the 45 multiple-choice questions is 27.

\[
P(X > 27) = P(28 \le X \le 45) = \sum_{k=28}^{45} C^{45}_{k} (0.7)^k (0.3)^{45-k}.
\]

In order to find the numerical value of the above binomial probability, use the normal approximation: X is approximately N(µ, σ²), where µ = 45 × 0.7 = 31.5 and σ² = 45 × 0.7 × 0.3 = 9.45.

\[
\begin{aligned}
P(X > 27) &\approx P\left( Z > \frac{27.5 - 31.5}{\sqrt{9.45}} \right) \\
&\approx P(Z > -1.30) \\
&= P(Z < 1.30) \\
&\approx 0.9032.
\end{aligned}
\]

✷


In a manufacturing process where glass products are produced, defects or bubbles occur, occasionally rendering the piece undesirable for selling. It is known that on average 1 in every 800 of these items produced has one or more bubbles.

(a) What is the binomial probability that a random sample of 4000 will yield fewer than 5 items possessing bubbles? Using the Poisson approximation to the binomial distribution, find the approximate value of the probability.

(b) Using the normal approximation with continuity correction, find, alternatively, an approximation of the binomial probability in (a).

Solution

(a) This is essentially a binomial experiment with n = 4000 and p = 1/800 = 0.00125. If X denotes the number of items possessing bubbles, we have

\[
P(X < 5) = \sum_{k=0}^{4} P(X = k) = \sum_{k=0}^{4} C^{4000}_{k} (0.00125)^k (0.99875)^{4000-k}.
\]

The above expression gives the exact value of the required (binomial) probability. However, it is troublesome to calculate by hand; the value of \( C^{4000}_{4} \), for example, is very large. For this question, since p is very small and n is quite large, we shall approximate with the Poisson distribution using λ = 4000 × 0.00125 = 5. Hence,

\[
\begin{aligned}
P(X < 5) &\approx \sum_{k=0}^{4} \frac{\lambda^k e^{-\lambda}}{k!} \\
&= e^{-5} \left( 1 + 5 + \frac{5^2}{2!} + \frac{5^3}{3!} + \frac{5^4}{4!} \right) \\
&\approx 0.4405.
\end{aligned}
\]

The required probability is approximately equal to 44.05%.



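(b) We look for the probability by the normal approximation,

\[
X \sim \text{Bin}(n = 4000, p = 0.00125) \overset{\text{approx.}}{\sim} N(\mu = 5, \sigma^2 = 4.99375).
\]

Hence,

\[
\begin{aligned}
P(X < 5) &= P(0 \le X \le 4) \\
&\approx P\left( Z \le \frac{4.5 - 5}{\sqrt{4.99375}} \right) \\
&\approx P(Z \le -0.22) \\
&= 1 - P(Z \le 0.22) \qquad \text{(by symmetry)} \\
&= 1 - \Phi(0.22) \\
&\approx 1 - 0.5871 \\
&= 0.4129.
\end{aligned}
\]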

✷
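The exact binomial value, the Poisson approximation and the normal approximation can be put side by side. The following sketch uses only the standard library; the variable names are illustrative assumptions.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist

n, p = 4000, 1 / 800
q = 1 - p

# Exact binomial P(X < 5) = P(X <= 4)
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(5))

# Poisson approximation with lambda = n p
lam = n * p
poisson = sum(exp(-lam) * lam**k / factorial(k) for k in range(5))

# Normal approximation with continuity correction
normal = NormalDist(n * p, sqrt(n * p * q)).cdf(4.5)

print(exact, poisson, normal)
```

For a rare event with large n, the Poisson value is expected to sit much closer to the exact binomial value than the normal value does.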

� Example 5.9 (Normal approximation to binomial distribution ⋆⋆ )

A multiple-choice test consists of 50 questions each with 5 possible answers of which only one answer is correct. Suppose that the 50 questions are randomly selected from a large test bank. Assume that a student knows the answer for 60% of all questions in the test bank and the student will randomly choose an answer if the student doesn’t know the correct answer.

(a) What is the probability that the student will get more than 40 correct answers?

(b) Suppose that each correct answer is awarded 3 marks and each incorrect answer carries a penalty of 1 mark. What are the mean and the standard deviation of the total marks of the student?

Solution Let X be the number of correct answers out of 50. Then

X ∼ Bin(50, p), where p = 0.6 + (0.4)(1/5) = 0.68.

(a)

\[
P(X > 40) = \sum_{k=41}^{50} C^{50}_{k} (0.68)^k (0.32)^{50-k}.
\]

In order to find the numerical value, use the normal approximation: X is approximately N(µ, σ²), where µ = 50 × 0.68 = 34 and σ² = 50 × 0.68 × 0.32 = 10.88.

\[
\begin{aligned}
P(X > 40) &= P(X \ge 41) \\
&\approx P\left( Z \ge \frac{40.5 - 34}{\sqrt{10.88}} \right) \\
&\approx P(Z \ge 1.97) = 1 - \Phi(1.97) \\
&\approx 1 - 0.9756 \\
&= 0.0244.
\end{aligned}
\]

(b) Let Y be the total marks obtained by the student. Then

Y = 3X − (50−X) = 4X − 50.


Hence,

\[
E(Y) = E(4X - 50) = 4 \times E(X) - 50 = 4 \times 34 - 50 = 86.
\]

Besides,

\[
\text{Var}(Y) = \text{Var}(4X - 50) = 4^2 \times \text{Var}(X) = 16 \times 10.88 = 174.08,
\]

so that

\[
\sigma_Y = \sqrt{174.08} \approx 13.1939.
\]

✷

� Example 5.10 (Normal approximation to binomial distribution ⋆⋆⋆ )

There is a Liberal Studies test in a school. The test has N questions and each question has 5 possible choices, exactly two of which are correct.

• All candidates select exactly two choices for each question in the test.

• A candidate can get 1 mark in a question if his/her two selected choices in this question are both correct; otherwise, no marks will be given.

• All questions in the test are independent.

• A wild-guesser is a candidate who selects at random for all questions.

(a) What is the probability that a wild-guesser will get 1 mark in one single question?

(b) Suppose that N = 10. Denote by X the marks that a wild-guesser will get in the test.

(i) Find the probabilities: P(X = 0), P(X ≤ 1) and P(X ≤ 2).

(ii) Determine the minimum passing mark (in integers) such that the probability that a wild-guesser will pass the test is less than 5%.

(c) Suppose that N = 90. By using the normal approximation to binomial probabilities, determine the minimum passing mark (in integers) such that more than 67% of wild-guessers will not pass the test.

Solution

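(a) \( P(1 \text{ mark}) = \dfrac{1}{C^{5}_{2}} = \dfrac{1}{10}. \)

(b) N = 10. X ∼ Bin(10, 1/10).

(i)

\[
\begin{aligned}
P(X = 0) &= \left(\tfrac{9}{10}\right)^{10} \approx 0.3487, \\
P(X \le 1) &= \left(\tfrac{9}{10}\right)^{10} + C^{10}_{9} \left(\tfrac{1}{10}\right) \left(\tfrac{9}{10}\right)^{9} \approx 0.7361, \\
P(X \le 2) &= \left(\tfrac{9}{10}\right)^{10} + C^{10}_{9} \left(\tfrac{1}{10}\right) \left(\tfrac{9}{10}\right)^{9} + C^{10}_{8} \left(\tfrac{1}{10}\right)^{2} \left(\tfrac{9}{10}\right)^{8} \approx 0.9298.
\end{aligned}
\]

(ii)

\[
P(X \le 3) \approx 0.9298 + C^{10}_{7} \left(\tfrac{1}{10}\right)^{3} \left(\tfrac{9}{10}\right)^{7} \approx 0.9872.
\]

With a passing mark of 3 the probability of passing is 1 − 0.9298 = 0.0702 > 5%, while with a passing mark of 4 it is 1 − 0.9872 = 0.0128 < 5%. The minimum passing mark is therefore chosen as 4.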



(c) Let the integer k be the passing mark. A candidate fails whenever he/she gets at most k − 1 marks, and passes whenever he/she gets at least k marks. Here

\[
Y \sim \text{Bin}\left(90, \tfrac{1}{10}\right) \overset{\text{approx.}}{\sim} N(9, 8.1).
\]

To find the value of k, we solve the inequality

\[
\begin{aligned}
P(Y \le k - 1) &> 0.67, \\
P\left( Z \le \frac{k - 0.5 - 9}{\sqrt{8.1}} \right) &> 0.67, \\
\frac{k - 0.5 - 9}{\sqrt{8.1}} &> 0.44, \\
k &> 10.75.
\end{aligned}
\]

The minimum passing mark is therefore chosen as 11.

✷
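The passing marks in parts (b)(ii) and (c) can also be found by a direct search over the exact binomial distribution, without the normal approximation. The sketch below is illustrative only; the function names `binom_cdf` and `min_pass_mark` are assumptions.

```python
from math import comb

def binom_cdf(n, p, k):
    """P(X <= k) for X ~ Bin(n, p); the sum is empty (zero) when k < 0."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(0, k + 1))

def min_pass_mark(n, p, max_pass_prob):
    """Smallest integer mark k such that P(X >= k) < max_pass_prob."""
    for k in range(n + 2):
        if 1 - binom_cdf(n, p, k - 1) < max_pass_prob:
            return k

print(min_pass_mark(10, 0.1, 0.05))   # part (b)(ii)
print(min_pass_mark(90, 0.1, 0.33))   # part (c): more than 67% fail
```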

� Example 5.11 (Normal approximation to Poisson distribution ⋆⋆ )

A certain insurance company offers life, fire and vehicle coverage. The numbers of claims arriving on any given day on these three types of policy are independent Poisson random variables with means equal to 33, 25 and 45, respectively. What is the probability that, on any given day, the company will receive claims on more than 120 policies of all three types? Give the formula for calculating the exact probability. Using the normal approximation with continuity correction, find the best guess of the probability. (Hint: The sum of independent Poisson random variables is also a Poisson random variable.)

Solution Let X1, X2 and X3 be the numbers of claims on life insurance, fire insurance and vehicle insurance, respectively. We are given

X1 ∼ Poisson(33), X2 ∼ Poisson(25), X3 ∼ Poisson(45).

Since X1, X2, X3 are independent,

X1 +X2 +X3 ∼ Poisson(33 + 25 + 45).

Denote X = X1 + X2 + X3. Then X ∼ Poisson(103).

To find the exact probability,

\[
P(X > 120) = 1 - P(X \le 120) = 1 - \sum_{k=0}^{120} \frac{103^k}{k!} e^{-103}.
\]

Using the normal approximation, X is approximately N(103, 103). With continuity correction, we have

\[
\begin{aligned}
P(X > 120) &= P(X \ge 121) \\
&\approx P\left( Z \ge \frac{120.5 - 103}{\sqrt{103}} \right) \\
&\approx P(Z \ge 1.72) \\
&= 1 - \Phi(1.72) \\
&\approx 1 - 0.9573 \\
&= 0.0427.
\end{aligned}
\]

✷
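The exact Poisson sum is straightforward to evaluate by building each term from the previous one, which avoids large factorials. A minimal sketch, with the helper name `poisson_cdf` chosen here for illustration:

```python
from math import exp, sqrt
from statistics import NormalDist

def poisson_cdf(k_max, lam):
    """P(X <= k_max) for X ~ Poisson(lam), using P(X=k) = P(X=k-1) * lam / k."""
    term = exp(-lam)   # P(X = 0)
    total = term
    for k in range(1, k_max + 1):
        term *= lam / k
        total += term
    return total

lam = 33 + 25 + 45
exact_tail = 1 - poisson_cdf(120, lam)
normal_tail = 1 - NormalDist(lam, sqrt(lam)).cdf(120.5)

print(exact_tail, normal_tail)
```

Because the Poisson distribution is skewed to the right, the exact upper-tail probability can be expected to be slightly larger than the normal approximation.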



The number of fatal traffic accidents which occur at random at a known ‘black spot’ follows a Poisson distribution with mean 4 per year. Let X denote the actual number of fatal accidents. Let Y denote the annual number of non-fatal accidents at the same place, which may be assumed to be a Poisson random variable with mean 12. Note that X and Y are independent.

(a) (Poisson distribution) Given that there were in total 20 accidents in one year (i.e., X + Y = 20), find the conditional probability that 5 of these are fatal.

(b) (Normal approximation to Poisson distribution) The total number of traffic accidents is denoted by V = X + Y. State the distribution of V and write down the mean and variance of V. Use the normal approximation to calculate the probability that there will be more than 18 accidents in the next year.

Solution

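(a) Note that we have the following fact: If X ∼ Poisson(λ1) and Y ∼ Poisson(λ2) are independent, then

(i) X + Y ∼ Poisson(λ1 + λ2) and

(ii)

\[
P(X = k \mid X + Y = n) = C^{n}_{k} \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} \right)^{k} \left( \frac{\lambda_2}{\lambda_1 + \lambda_2} \right)^{n-k},
\]

that is, given X + Y = n, X follows the binomial distribution Bin(n, λ1/(λ1 + λ2)).

It follows that, given X + Y = 20, X ∼ Bin(20, 4/16 = 0.25), so

\[
P(X = 5 \mid X + Y = 20) = C^{20}_{5} (0.25)^5 (0.75)^{15} \approx 0.2023.
\]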


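(b) V is Poisson with

Mean = E(V) = λ = 16 and Variance = Var(V) = λ = 16.

Use the normal approximation

\[
V \sim \text{Poisson}(16) \overset{\text{approx.}}{\sim} N(16, 16).
\]

Then

\[
\begin{aligned}
P(V > 18) &= P(V \ge 19) \\
&\approx P\left( Z \ge \frac{18.5 - 16}{4} \right) \\
&= P(Z \ge 0.625) \\
&\approx 0.2643.
\end{aligned}
\]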


✷
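Fact (ii) in part (a) can be checked numerically by computing the conditional probability directly from the Poisson probabilities and comparing it with the binomial formula. This is a sketch; `poisson_pmf` is a helper name introduced here for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Direct definition: P(X = 5, Y = 15) / P(X + Y = 20), with X + Y ~ Poisson(16)
cond_direct = poisson_pmf(5, 4) * poisson_pmf(15, 12) / poisson_pmf(20, 16)

# Binomial form: Bin(20, 4/16)
cond_binom = comb(20, 5) * 0.25**5 * 0.75**15

print(cond_direct, cond_binom)
```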


Many Chinese tourists are enthusiastic about visiting Hong Kong every year during the Golden Week holidays. They usually travel by air, rail or road. The numbers of mainland Chinese people visiting Hong Kong on any given day through these three types of transportation are assumed to be mutually independent Poisson random variables with means equal to 16, 59 and 33 (thousands), respectively.

(a) What is the probability that, on any given day, more than 110 thousand people visit Hong Kong via all three types of transportation? Give the formula for calculating the exact probability.

(b) Using the normal approximation with continuity correction, find the best guess of the probability in (a).

Solution



(a) Let X1, X2 and X3 be the numbers of people (in thousands) visiting Hong Kong by air, rail and road, respectively. We are given

X1 ∼ Poisson(16), X2 ∼ Poisson(59) and X3 ∼ Poisson(33).

Now let X = X1 + X2 + X3. Since X1, X2 and X3 are mutually independent variables, we have

X ∼ Poisson(108).

To find the exact probability,

\[
P(X > 110) = 1 - \sum_{k=0}^{110} \frac{108^k}{k!} e^{-108},
\]

which, however, is very tedious to evaluate by hand.

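(b) The result in (a) leads to the normal approximation,

\[
X \sim \text{Poisson}(108) \overset{\text{approx.}}{\sim} N(108, 108).
\]

With continuity correction, we have

\[
\begin{aligned}
P(X > 110) &= P(X \ge 111) \\
&\approx P\left( Z \ge \frac{110.5 - 108}{\sqrt{108}} \right) \\
&\approx P(Z \ge 0.24) \\
&= 1 - \Phi(0.24) \\
&\approx 1 - 0.5948 \\
&= 0.4052.
\end{aligned}
\]

The probability is approximately equal to 40.52%.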

✷

� Example 5.14 (Sum of independent normal random variables ⋆⋆ )

The weight of a randomly selected can of a new soft drink is known to have a normal distribution with mean 12.1 ounces and a standard deviation of 0.1 ounce.

(a) What is the probability that, if a can is drawn at random, it weighs between 11.9 and 12.3 ounces?

(b) What weight should be printed on the can so that the average weight of cans in a six-pack is underweight for only 1% of all six-packs?

Solution Let X be the weight of a can of the new soft drink. Then X ∼ N(12.1, 0.12).

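(a)

\[
\begin{aligned}
P(11.9 < X < 12.3) &= P\left( \frac{11.9 - 12.1}{0.1} < Z < \frac{12.3 - 12.1}{0.1} \right) \\
&= P(-2 < Z < 2) \\
&= (0.9772 - 0.5) \times 2 \\
&= 0.9544.
\end{aligned}
\]

(b) Denote by X̄ the average weight of the cans in a six-pack. Then

\[
\bar{X} = \frac{1}{6} \left( X_1 + X_2 + X_3 + X_4 + X_5 + X_6 \right)
\]

and

\[
E(\bar{X}) = \frac{1}{6} \times 6 \times 12.1 = 12.1 \quad \text{and} \quad \text{Var}(\bar{X}) = \frac{1}{6^2} \times 6 \times 0.1^2 = \frac{0.1^2}{6}.
\]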


Thus,

\[
\bar{X} \sim N\left( 12.1, \frac{0.1^2}{6} \right).
\]

Our target is to find x̄ such that

\[
-Z_{0.01} = \frac{\bar{x} - 12.1}{0.1/\sqrt{6}} = -2.325.
\]

Solving the above equation for x̄ gives x̄ = 12.00508 ounces.

✷


Lemon tea is dispensed by a machine into bottles. The nominal volume of lemon tea in a bottle is 1 litre (1000 ml). The actual volumes of lemon tea put into the bottles can be regarded as being independently normally distributed, with mean set at 1010 ml and standard deviation 8 ml.

(a) Find the proportion, in a long run of production, of bottles containing less than the nominal volume.

(b) Bottles of lemon tea are often sold in packs of 6 bottles. Write down the distribution of the total volume of lemon tea in a pack of 6 bottles. Find the probability that the total volume of lemon tea in a pack of 6 bottles is less than 6 litres. Explain why this probability is less than your answer to part (a).

(c) A new and more accurate machine is now available, for which the volume of lemon tea dispensed is normally distributed but with smaller standard deviation 4 ml. By how much could the existing mean volume of lemon tea dispensed into each bottle be reduced without increasing the existing proportion of bottles with less than the nominal volume? Supposing that the additional cost of the more accurate machine is 18,000 dollars, and the cost of the lemon tea is 8 dollars per litre, how many bottles of lemon tea would have to be filled by the more accurate machine in order to justify its greater cost?

Solution

(a) Let X be the actual volume of lemon tea put into a bottle. Then X ∼ N(1010, 8²).

\[
\begin{aligned}
\text{The required probability} &= P(X < 1000) \\
&= P\left( Z < \frac{1000 - 1010}{8} \right) \\
&= P(Z < -1.25) \\
&\approx 0.1056.
\end{aligned}
\]

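(b) Let Y denote the total volume in a 6-pack. Then Y = X1 + X2 + X3 + X4 + X5 + X6 and

\[
\begin{aligned}
E(Y) &= E(X_1 + X_2 + X_3 + X_4 + X_5 + X_6) \\
&= E(X_1) + E(X_2) + E(X_3) + E(X_4) + E(X_5) + E(X_6) \\
&= 1010 + 1010 + 1010 + 1010 + 1010 + 1010 \\
&= 6060,
\end{aligned}
\]

\[
\begin{aligned}
\text{Var}(Y) &= \text{Var}(X_1) + \text{Var}(X_2) + \text{Var}(X_3) + \text{Var}(X_4) + \text{Var}(X_5) + \text{Var}(X_6) \\
&= 8^2 + 8^2 + 8^2 + 8^2 + 8^2 + 8^2 \\
&= 384.
\end{aligned}
\]

Thus, Y ∼ N(6060, 384) = N(6060, (8√6)²).

\[
\begin{aligned}
\text{The required probability} &= P(Y < 6000) \\
&= P\left( Z < \frac{6000 - 6060}{8\sqrt{6}} \right) \\
&\approx P(Z < -3.06) \\
&\approx 0.0011.
\end{aligned}
\]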



This probability is considerably smaller than that in part (a). In practical terms, the six volumes are independent, so there is a tendency for heavier and lighter bottles in a 6-pack to balance each other out, and a pack falls below 6 litres only when the bottles are low on average. Alternatively, one could use X̄ ∼ N(1010, 64/6) and compare P(X̄ < 1000) with P(X < 1000). In terms of these probability distributions, X̄ has the same mean as X but only one-sixth of the variance, so less of the lower tail of the distribution of X̄ is below the nominal volume of 1000.

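(c) Let µ be the new mean volume of lemon tea dispensed. Then

\[
P\left( Z < \frac{1000 - \mu}{4} \right) = 0.1056.
\]

Hence,

\[
\frac{1000 - \mu}{4} = -1.25,
\]

which implies that µ = 1005.

Therefore the existing mean volume of lemon tea dispensed could be reduced by 1010 − 1005 = 5 (ml). Each bottle then saves 5 × 0.001 litres of lemon tea, worth 8 × 5 × 0.001 = 0.04 dollars, so

\[
\text{The number of bottles that have to be filled} = \frac{18000}{8 \times 5 \times 0.001} = 450{,}000.
\]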

✷
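The three parts of this example chain together naturally in code. The sketch below uses `statistics.NormalDist` and illustrative variable names; it is a numerical check under the assumptions of the example, not a replacement for the table-based working.

```python
from math import sqrt
from statistics import NormalDist

old_machine = NormalDist(mu=1010, sigma=8)
p_under = old_machine.cdf(1000)                    # part (a)

pack = NormalDist(mu=6 * 1010, sigma=8 * sqrt(6))  # total volume of 6 bottles
p_pack_under = pack.cdf(6000)                      # part (b)

# Part (c): keep the same underfill proportion with sigma = 4 ml
z = NormalDist().inv_cdf(p_under)                  # equals -1.25
new_mean = 1000 - z * 4
saving_ml = 1010 - new_mean
bottles = 18000 / (8 * saving_ml / 1000)           # tea costs 8 dollars per litre

print(round(p_under, 4), round(p_pack_under, 4), round(new_mean, 2), round(bottles))
```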


Orange juice is dispensed by a machine into bottles. Bottles of orange juice are often sold in packs of 6 bottles. The nominal volume of orange juice in a bottle is 500 ml. The actual volumes of orange juice put into the bottles can be regarded as being independently normally distributed, with mean set at 505 ml and standard deviation 4 ml. The production of a bottle of orange juice will only be accepted if its volume is within 7 ml of the target 505 ml.

(a) What is the probability that a randomly selected bottle of orange juice is acceptable?

(b) A customer specifies that a pack of 6 bottles will only be accepted if all the six bottles of orange juice produced are acceptable.

(i) What is the probability that a randomly selected pack will be acceptable?

(ii) What is the probability that in 15 packs, no more than two packs will be rejected?

(c) The production manager feels that it is simpler to check the total volume of the six bottles. He thinks that the production of a pack is acceptable if the total volume of its six bottles is within 42 ml of the target volume 3030 ml (i.e., 505 ml × 6). What is the probability that a randomly selected pack will satisfy this criterion?

(d) The customer and the manager inspect the packs of 6 bottles sold in a supermarket according to their own criteria as specified in (b) and (c). Determine whether each of the following cases is possible, and give an example with concrete data if it is (no example is needed for impossible cases):

(i) A pack is accepted by the customer but rejected by the manager.

(ii) A pack is rejected by the customer but accepted by the manager.

Solution

(a) X ∼ N(505, 4²).

\[
P(-7 < X - 505 < 7) = P\left( -\frac{7}{4} < Z < \frac{7}{4} \right) = 0.9198.
\]

Let p denote the above probability (p = 0.9198).


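(b) (i)

\[
\text{The required probability} = p^6 = 0.6056.
\]

Let q denote the above probability (q = 0.6056).

(ii) At most two of the 15 packs are rejected, so

\[
\text{The required probability} = q^{15} + C^{15}_{1}\, q^{14} (1 - q) + C^{15}_{2}\, q^{13} (1 - q)^2 \approx 0.02987.
\]

(c) Let Y = X1 + X2 + X3 + X4 + X5 + X6. Then Y ∼ N(3030, 6 × 4²) = N(3030, (4√6)²).

\[
P(-42 < Y - 3030 < 42) = P\left( -\frac{42}{4\sqrt{6}} < Z < \frac{42}{4\sqrt{6}} \right) \approx P(-4.29 < Z < 4.29) \approx 1.
\]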

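(d) (i) Impossible. If every bottle is within 7 ml of 505 ml, then the total is within 6 × 7 = 42 ml of 3030 ml.

(ii) Possible. Consider the particular example:

X1 = 500, X2 = 500, X3 = 500, X4 = 500, X5 = 500, X6 = 513.

The sixth bottle is 8 ml above the target, so the customer rejects the pack, but the total volume 3013 ml is within 42 ml of 3030 ml, so the manager accepts it.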

✷

� Example 5.17 (Sum of independent normal random variables ⋆⋆⋆ )

My cycle journey to work is 3 km, and my cycling time (in minutes) if there are no delays is distributed N(15, 1), i.e. normally with mean 15 and variance 1.

(a) Find the probability that, if there are no delays, I get to work in at most 17 minutes.

(b) On my route there are three sets of traffic lights. Each time I meet a red traffic light, I am delayed by a random time that is distributed N(0.7, 0.09). These lights operate independently. Find the probability of my getting to work in at most 17 minutes

(i) if just one light is set at red when I reach it.

(ii) if just two lights are set at red when I reach them.

(iii) if all three lights are set at red when I reach them.

(c) Suppose that, for each set of lights, the chance of delay is 0.5. Find the mean value of T, my total journey time, in minutes.

(d) Given that Var(T) = 1.5025, use a suitable approximation to calculate the probability that, over 10 journeys, my average journey time to work is at most 17 minutes.

Solution

(a) Let X be the cycling time in minutes for the 3 km journey. Then X ∼ N(15, 1).

\[
\begin{aligned}
\text{The required probability} &= P(X \le 17) \\
&= P\left( Z \le \frac{17 - 15}{1} \right) \\
&= P(Z \le 2) \\
&\approx 0.9772.
\end{aligned}
\]



(b) (i) The total journey time follows the distribution N(15 + 0.7, 1 + 0.09) = N(15.7, 1.09).

\[
\text{The required probability} = P\left( Z \le \frac{17 - 15.7}{\sqrt{1.09}} \right) \approx P(Z \le 1.245) \approx 0.8935.
\]

(ii) The total journey time follows the distribution N(15 + 0.7 × 2, 1 + 0.09 × 2) = N(16.4, 1.18).

\[
\text{The required probability} = P\left( Z \le \frac{17 - 16.4}{\sqrt{1.18}} \right) \approx P(Z \le 0.552) \approx 0.7092.
\]

(iii) The total journey time follows the distribution N(15 + 0.7 × 3, 1 + 0.09 × 3) = N(17.1, 1.27).

\[
\text{The required probability} = P\left( Z \le \frac{17 - 17.1}{\sqrt{1.27}} \right) \approx P(Z \le -0.089) \approx 0.4646.
\]

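(c) Let K be the number of lights set at red when I reach them, so that K ∼ Bin(3, 0.5), and let Tk be the total journey time if just k lights are set at red, where k = 0, 1, 2, 3. Then

Tk ∼ N(15 + 0.7k, 1 + 0.09k), for k = 0, 1, 2, 3.

The total journey time T equals Tk with probability P(K = k) = C^3_k (0.5)^3. Conditioning on K, the mean value of T is

\[
\begin{aligned}
E(T) &= (0.5)^3 E(T_0) + C^{3}_{1} (0.5)^3 E(T_1) + C^{3}_{2} (0.5)^3 E(T_2) + (0.5)^3 E(T_3) \\
&= 0.125 \times \left[ E(T_0) + 3E(T_1) + 3E(T_2) + E(T_3) \right] \\
&= 0.125 \times \left[ 15 + 3 \times 15.7 + 3 \times 16.4 + 17.1 \right] \\
&= 16.05.
\end{aligned}
\]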

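(d) Let T̄ be the average journey time over 10 journeys. By the central limit theorem, T̄ is approximately N(16.05, 1.5025/10). Hence,

\[
\begin{aligned}
\text{The required probability} &= P(\bar{T} \le 17) \\
&\approx P\left( Z \le \frac{17 - 16.05}{\sqrt{1.5025/10}} \right) \\
&\approx P(Z \le 2.45) \\
&\approx 0.9929.
\end{aligned}
\]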

✷


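Remark The variance Var(T) = 1.5025 given in part (d) is consistent with the model. T takes the value Tk with probability C^3_k (0.5)^3, so it is a mixture of the Tk and not the linear combination 0.125(T0 + 3T1 + 3T2 + T3); treating it as that linear combination of independent variables would give 0.125² × [1 + 9 × 1.09 + 9 × 1.18 + 1.27] ≈ 0.3547, which is not the variance of T. Instead, condition on the number K ∼ Bin(3, 0.5) of red lights, for which E(K) = 1.5 and Var(K) = 0.75:

\[
\begin{aligned}
\text{Var}(T) &= E\left[ \text{Var}(T \mid K) \right] + \text{Var}\left( E(T \mid K) \right) \\
&= E(1 + 0.09K) + \text{Var}(15 + 0.7K) \\
&= (1 + 0.09 \times 1.5) + 0.7^2 \times 0.75 \\
&= 1.135 + 0.3675 = 1.5025.
\end{aligned}
\]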
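As a numerical check on E(T) and Var(T), one can treat T as a mixture: with probability C^3_k (0.5)^3 the journey time is N(15 + 0.7k, 1 + 0.09k). The sketch below computes the first two moments of that mixture; the mixture interpretation and the variable names are assumptions made for this check.

```python
from math import comb

# P(K = k) for K ~ Bin(3, 0.5), the number of red lights met
weights = [comb(3, k) * 0.5**3 for k in range(4)]
cond_mean = [15 + 0.7 * k for k in range(4)]   # E(T | K = k)
cond_var = [1 + 0.09 * k for k in range(4)]    # Var(T | K = k)

mean_t = sum(w * m for w, m in zip(weights, cond_mean))
# E(T^2) = sum over k of P(K = k) * (Var(T | K = k) + E(T | K = k)^2)
second_moment = sum(w * (v + m**2) for w, m, v in zip(weights, cond_mean, cond_var))
var_t = second_moment - mean_t**2

print(round(mean_t, 4), round(var_t, 4))
```

Under this reading of the problem, the script gives a mean of 16.05 and a variance of 1.5025, the value quoted in part (d).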

� Example 5.18 (Exponential distribution ⋆⋆ )

Suppose that, in a call centre, 60% of all calls are personal and the rest are business calls. Furthermore, the lengths of personal and business calls have exponential distributions with means µp = 1 minute and µb = 3 minutes, respectively.

(a) What is the probability that a given call lasts less than 1 minute?

(b) Tom always claims that a call that lasts less than 1 minute is a personal call, and that otherwise it is a business call. What is the probability that he makes an incorrect claim on a given call?

Solution

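(a) Let Tp and Tb denote the lengths (in minutes) of a personal call and a business call, respectively. Note that

\[
P(T_p < 1) = 1 - e^{-1} \quad \text{and} \quad P(T_b < 1) = 1 - e^{-1/3}.
\]

Hence,

\[
\begin{aligned}
P(\text{a call lasts less than 1 minute}) &= P(T_p < 1 \text{ and personal}) + P(T_b < 1 \text{ and business}) \\
&= P(T_p < 1 \mid \text{personal call}) \times P(\text{personal call}) \\
&\quad + P(T_b < 1 \mid \text{business call}) \times P(\text{business call}) \\
&= \left(1 - e^{-1}\right) \times 0.6 + \left(1 - e^{-1/3}\right) \times 0.4 \\
&\approx 0.4927.
\end{aligned}
\]

(b)

\[
\begin{aligned}
P(\text{incorrect claim}) &= P(T_b < 1 \text{ and business}) + P(T_p \ge 1 \text{ and personal}) \\
&= P(T_b < 1 \mid \text{business call}) \times P(\text{business call}) \\
&\quad + P(T_p \ge 1 \mid \text{personal call}) \times P(\text{personal call}) \\
&= \left(1 - e^{-1/3}\right) \times 0.4 + e^{-1} \times 0.6 \\
&\approx 0.3341.
\end{aligned}
\]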

✷
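Both answers follow from the exponential distribution function F(t) = 1 − e^(−t/µ) and the law of total probability, which is short enough to verify directly. A minimal sketch with illustrative names:

```python
from math import exp

p_personal, p_business = 0.6, 0.4
mean_personal, mean_business = 1.0, 3.0   # mean call lengths in minutes

def exp_cdf(t, mean):
    """P(T < t) for an exponential random variable with the given mean."""
    return 1 - exp(-t / mean)

# (a) a call lasts less than 1 minute
p_short = exp_cdf(1, mean_personal) * p_personal + exp_cdf(1, mean_business) * p_business

# (b) Tom is wrong for a short business call or a long personal call
p_wrong = exp_cdf(1, mean_business) * p_business + (1 - exp_cdf(1, mean_personal)) * p_personal

print(round(p_short, 4), round(p_wrong, 4))
```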


Chapter 6

Mathematical Expectation

� Example 6.1 (Expected value ⋆ )

On the average, how many times must a die be thrown until one gets a “Six”?

Solution (Geometric distribution) Let p denote the probability of success and q that of failure (p + q = 1).

Trial                        1    2     3      4      · · ·   k
P(first success on trial)    p    qp    q²p    q³p    · · ·   q^(k−1) p

Note that

\[
p + qp + q^2 p + q^3 p + \cdots = p \left( 1 + q + q^2 + \cdots \right) = \frac{p}{1 - q} = 1.
\]

Now,

\[
\begin{aligned}
\text{mean number of trials} &= p + 2qp + 3q^2 p + 4q^3 p + \cdots \quad (=: N), \\
qN &= qp + 2q^2 p + 3q^3 p + \cdots, \\
(1 - q)N &= 1, \quad \text{(why?)} \\
N &= \frac{1}{p} = 6, \quad \text{if } p = \frac{1}{6}.
\end{aligned}
\]

✷


Consider the following two boxes in which each contains 7,000 dollars. After counting we find that

Box A contains: $1, 000× 1 paper, $500× 10 papers, $10× 100 papers.

Box B contains: $500× 2 papers, $100× 50 papers, $20× 50 papers.

In which box do you expect to get more if you pick one paper money from each box?

Solution There are 1 + 10 + 100 = 111 papers of money in Box A and 2 + 50 + 50 = 102 in Box B.The expected values of the paper money taken from each box are given by

E(dollars from Box A

)= 1000× 1

111+ 500× 10

111+ 10× 100

111= 63

7

111,

and

E(dollars from Box B

)= 500× 2

102+ 100× 50

102+ 20× 50

102= 68

64

102.

It is more likely one can get more money from Box B. ✷



The probability distribution for damage claims paid by the Automobile Insurance Company on collision in-surance follows.

Payment (dollars) Probability

0 0.90

400 0.04

1,000 0.03

2,000 0.01

4,000 0.01

6,000 0.01

(a) Use the expected collision payment to determine the collision insurance premium that would enable the company to break even.

(b) The insurance company charges an annual rate of $260 for the collision coverage. What is the expected value of the collision policy for a policy holder? (Hint: It is the expected payments from the company minus the cost of coverage.) Why does the policy holder purchase a collision policy with this expected value?

Solution

(a)

x       f(x)    x f(x)
0       0.90      0.00
400     0.04     16.00
1000    0.03     30.00
2000    0.01     20.00
4000    0.01     40.00
6000    0.01     60.00

Hence,

\[
\begin{aligned}
E(X) &= \sum x f(x) \\
&= 400 \times 0.04 + 1000 \times 0.03 + 2000 \times 0.01 + 4000 \times 0.01 + 6000 \times 0.01 \\
&= 166.
\end{aligned}
\]

If the company charged a premium of $166.00 they would break even.

(b)

Gain to policy holder    f(Gain)    Gain × f(Gain)
−260                     0.90       −234.00
140                      0.04          5.60
740                      0.03         22.20
1740                     0.01         17.40
3740                     0.01         37.40
5740                     0.01         57.40

Hence,

\[
E(\text{Gain}) = -234 + 5.6 + 22.2 + 17.4 + 37.4 + 57.4 = -94.
\]

The policy holder is more concerned that a big accident will ruin him than with the expected annual loss of $94.

✷
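Both expected values are weighted sums over the payment table, so they can be reproduced in a few lines. In this sketch the dictionary `payments` and the other names are illustrative.

```python
# Payment (dollars) -> probability, from the table in the question
payments = {0: 0.90, 400: 0.04, 1000: 0.03, 2000: 0.01, 4000: 0.01, 6000: 0.01}
premium = 260

# (a) break-even premium = expected payment
expected_payment = sum(x * p for x, p in payments.items())

# (b) expected gain to the policy holder = expected payment minus premium
expected_gain = sum((x - premium) * p for x, p in payments.items())

print(round(expected_payment, 2), round(expected_gain, 2))
```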



A Personal Identification Number (PIN) consists of five digits in order, each of which may be any one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Two PINs are chosen independently and at random, and you are given that each PIN consists of five different digits. Let X be the random variable denoting the number of digits that the two PINs have in common. Write explicitly the probability mass function of X and hence find the mean of X.

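Solution The image of X is {0, 1, 2, 3, 4, 5}. Only the set of digits used by each PIN matters, so

\[
\begin{aligned}
P(X = 0) &= P(\text{no digit in common}) = \frac{C^{5}_{5}}{C^{10}_{5}} = \frac{1}{252},\\
P(X = 1) &= P(\text{1 digit in common}) = \frac{C^{5}_{1}\times C^{5}_{4}}{C^{10}_{5}} = \frac{25}{252},\\
P(X = 2) &= P(\text{2 digits in common}) = \frac{C^{5}_{2}\times C^{5}_{3}}{C^{10}_{5}} = \frac{100}{252},\\
P(X = 3) &= P(\text{3 digits in common}) = \frac{C^{5}_{3}\times C^{5}_{2}}{C^{10}_{5}} = \frac{100}{252},\\
P(X = 4) &= P(\text{4 digits in common}) = \frac{C^{5}_{4}\times C^{5}_{1}}{C^{10}_{5}} = \frac{25}{252},\\
P(X = 5) &= P(\text{5 digits in common}) = \frac{C^{5}_{5}}{C^{10}_{5}} = \frac{1}{252}.
\end{aligned}
\]

By definition, the expectation of X is

\[
E(X) = 1\times\frac{25}{252} + 2\times\frac{100}{252} + 3\times\frac{100}{252} + 4\times\frac{25}{252} + 5\times\frac{1}{252} = \frac{630}{252} = 2.5.
\]

In fact, there is an easier method. Each of the five digits of the first PIN appears in the second PIN with probability 5/10 = 1/2, so by linearity of expectation

\[
E(X) = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = \frac{5}{2} = 2.5. \qquad ✷
\]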
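As a check on the distribution of the number of common digits, the probabilities can be computed exactly. This is a sketch only; the helper name `pmf` is an assumption made for illustration.

```python
from fractions import Fraction
from math import comb


def pmf(k):
    """P(X = k): k of the first PIN's 5 digits and 5 - k of the other 5 digits."""
    return Fraction(comb(5, k) * comb(5, 5 - k), comb(10, 5))


distribution = {k: pmf(k) for k in range(6)}
mean = sum(k * p for k, p in distribution.items())
```

Exact fractions are used so that the total probability and the mean can be compared with 1 and 5/2 without rounding error.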

Example 6.5 (Expected value ⋆⋆)

There are 6 pairs of identical socks placed in a drawer, of which 6 are left socks and 6 are right socks. Now, 7 socks (each could be either left or right) are taken from them at random. Evaluate the expected number of pairs of socks obtained.

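Solution Let X be the number of pairs of socks taken from the drawer. The possible outcomes of the chosen socks and the corresponding probabilities are given in the following table:

(No. of left socks, No. of right socks)     Probability of the outcome

Outcome 1: (6, 1)     $\dfrac{C^{6}_{6}\,C^{6}_{1}}{C^{12}_{7}} = \dfrac{6}{792} = \dfrac{1}{132}$

Outcome 2: (5, 2)     $\dfrac{C^{6}_{5}\,C^{6}_{2}}{C^{12}_{7}} = \dfrac{90}{792} = \dfrac{5}{44}$

Outcome 3: (4, 3)     $\dfrac{C^{6}_{4}\,C^{6}_{3}}{C^{12}_{7}} = \dfrac{300}{792} = \dfrac{25}{66}$

Outcome 4: (3, 4)     $\dfrac{C^{6}_{3}\,C^{6}_{4}}{C^{12}_{7}} = \dfrac{25}{66}$

Outcome 5: (2, 5)     $\dfrac{C^{6}_{2}\,C^{6}_{5}}{C^{12}_{7}} = \dfrac{5}{44}$

Outcome 6: (1, 6)     $\dfrac{C^{6}_{1}\,C^{6}_{6}}{C^{12}_{7}} = \dfrac{1}{132}$

The number of pairs is the smaller of the two counts, so clearly X ∈ {1, 2, 3}. By definition, the expectation is

\[
E(X) = 1\times\left(\frac{1}{132}\times 2\right) + 2\times\left(\frac{5}{44}\times 2\right) + 3\times\left(\frac{25}{66}\times 2\right) = \frac{181}{66} \approx 2.7424.
\]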

✷
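The table and the expectation can be reproduced in a few lines. This sketch assumes, as in the solution, that the number of pairs equals the smaller of the left and right counts; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(12, 7)  # number of ways to choose 7 of the 12 socks

# left runs over 1..6, so the number of right socks is 7 - left.
expected_pairs = sum(
    min(left, 7 - left) * Fraction(comb(6, left) * comb(6, 7 - left), total)
    for left in range(1, 7)
)
```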




Let two fair dice be rolled and let the numbers shown on them be X and Y. Find the expectation of |X − Y|.

Solution

\[
\begin{aligned}
P(|X - Y| = 1) &= P\bigl(\{(1,2), (2,1), (2,3), (3,2), (3,4), (4,3), (4,5), (5,4), (5,6), (6,5)\}\bigr) = \frac{10}{36},\\
P(|X - Y| = 2) &= P\bigl(\{(1,3), (3,1), (2,4), (4,2), (3,5), (5,3), (4,6), (6,4)\}\bigr) = \frac{8}{36},\\
P(|X - Y| = 3) &= P\bigl(\{(1,4), (4,1), (2,5), (5,2), (3,6), (6,3)\}\bigr) = \frac{6}{36},\\
P(|X - Y| = 4) &= P\bigl(\{(1,5), (5,1), (2,6), (6,2)\}\bigr) = \frac{4}{36},\\
P(|X - Y| = 5) &= P\bigl(\{(1,6), (6,1)\}\bigr) = \frac{2}{36}.
\end{aligned}
\]

By definition,

\[
E\bigl(|X - Y|\bigr) = 1\times\frac{10}{36} + 2\times\frac{8}{36} + 3\times\frac{6}{36} + 4\times\frac{4}{36} + 5\times\frac{2}{36} = \frac{35}{18}.
\]

✷
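The same expectation can be obtained by brute-force enumeration of the 36 equally likely outcomes. A minimal sketch (variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (X, Y) pairs for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Average of |x - y| over the outcomes, kept as an exact fraction.
expected_abs_diff = sum(Fraction(abs(x - y), len(outcomes)) for x, y in outcomes)
```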


Toss a fair coin until the first head appears, and let X be the number of tosses required.

(a) What is the name of the distribution of X? Write the probability distribution function of X.

(b) What is the expected value of X?

Solution

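(a) The random variable X, whose probability distribution function is given by

\[
f(1) = \frac{1}{2},\quad f(2) = \frac{1}{4},\quad f(3) = \frac{1}{8},\quad f(4) = \frac{1}{16},\quad \cdots,\quad f(n) = \frac{1}{2^{n}},\quad \cdots
\]

is said to be a geometric random variable with parameter $p = \dfrac{1}{2}$.

(b) For convenience, let S = E(X). Then

\[
\begin{aligned}
S &\overset{\text{def.}}{=} \sum_{x=1}^{\infty} x f(x)\\
  &= 1\times\frac{1}{2} + 2\times\frac{1}{4} + 3\times\frac{1}{8} + 4\times\frac{1}{16} + 5\times\frac{1}{32} + \cdots\\
  &= \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \frac{4}{16} + \frac{5}{32} + \cdots,\\
2S &= 1 + \frac{2}{2} + \frac{3}{4} + \frac{4}{8} + \frac{5}{16} + \cdots.
\end{aligned}
\]

Subtraction gives $2S - S = 1 + \dfrac{1}{2} + \dfrac{1}{4} + \dfrac{1}{8} + \dfrac{1}{16} + \cdots$, a geometric series with ratio 1/2, or

\[
E(X) = S = \frac{1}{1 - \frac{1}{2}} = 2.
\]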

✷
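The convergence of the series to 2 can be illustrated with exact partial sums. This is only a sketch; `partial_mean` is a hypothetical helper name, and the closed form used in the usage note, 2 − (n + 2)/2^n, is the standard partial-sum identity for this series.

```python
from fractions import Fraction


def partial_mean(n_terms):
    """Sum of x * (1/2)**x for x = 1..n_terms, as an exact fraction."""
    return sum(x * Fraction(1, 2**x) for x in range(1, n_terms + 1))


approx = partial_mean(60)
```

With 60 terms the partial sum differs from 2 by (60 + 2)/2^60, which is far below any practical tolerance.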


Example 6.8 (Average number ⋆)

The following are two examples of the same problem.

(I) From a shuffled deck, cards are laid out on a table one at a time, face up from left to right, and then another deck is laid out so that each of its cards is beneath a card of the first deck. What is the average number of matches of the card above and the card below in repetitions of this experiment?

(II) A typist types letters and envelopes to n different persons. The letters are randomly put into the envelopes. On the average, how many letters are put into their own envelopes?

Solution

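(I) With 52 cards in a deck, each card of the first deck has 1 chance in 52 of matching the card beneath it, so the probability of a match at any given position is $\dfrac{1}{52}$. Hence

\[
\text{Average number of matches} = \overbrace{52}^{52\ \text{opportunities}} \times \frac{1}{52} = 1.
\]

(II) Similarly, each letter has probability $\dfrac{1}{n}$ of being put into its own envelope, so

\[
\text{Average number of letters} = \overbrace{n}^{n\ \text{opportunities}} \times \frac{1}{n} = 1.
\]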

✷
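For small n the claim in (II) can be verified by listing all n! ways of putting letters into envelopes. A minimal sketch, assuming each arrangement is equally likely; `average_matches` is an illustrative name.

```python
from fractions import Fraction
from itertools import permutations


def average_matches(n):
    """Average number of fixed points over all n! permutations of 0..n-1."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, v in enumerate(p) if i == v) for p in perms)
    return Fraction(total, len(perms))
```

For example, `average_matches(4)` averages the number of correctly placed letters over all 24 arrangements of 4 letters.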


A well-shuffled ordinary pack of 52 poker cards is divided randomly into four hands of 13 each. Counting jack, queen, and king as numbers 11, 12 and 13, respectively, we say that “a match” occurs in a hand if the j-th card is j. What is the expected value of the total number of matches in all four hands?

Solution Unlike Example 6.8, we shall present the full details (which may be harder to read) of the solution to this problem in the following. Let $X_i$ ($i = 1, 2, 3, 4$) be the number of matches in the i-th hand. Then

\[
X = X_1 + X_2 + X_3 + X_4
\]

is the total number of matches in all four hands, and $E(X) = E(X_1) + E(X_2) + E(X_3) + E(X_4)$ is our target. To calculate each $E(X_i)$, we let $A_{ij}$ be the event that the j-th card in the i-th hand is j ($1 \le i \le 4$, $1 \le j \le 13$). Then by defining

\[
X_{ij} =
\begin{cases}
1, & \text{if } A_{ij} \text{ occurs},\\
0, & \text{otherwise},
\end{cases}
\]

we have that

\[
X_i = \sum_{j=1}^{13} X_{ij} = X_{i1} + X_{i2} + X_{i3} + \cdots + X_{i,13}.
\]

Now, for each fixed pair (i, j),

\[
P(A_{ij}) = \frac{4}{52} = \frac{1}{13}
\quad\text{implies that}\quad
E(X_{ij}) = 1\times P(A_{ij}) + 0\times P(A_{ij}^{c}) = \frac{1}{13}.
\]

Hence,

\[
E(X_i) = E\Bigl(\sum_{j=1}^{13} X_{ij}\Bigr) = \sum_{j=1}^{13} E(X_{ij}) = \sum_{j=1}^{13} \frac{1}{13} = 1.
\]

Thus on average there is one match in every hand. From this we finally get

\[
E(X) = E(X_1) + E(X_2) + E(X_3) + E(X_4) = 4\times 1 = 4,
\]

showing that on average there are a total of four matches in all four hands. ✷




Eight boys and seven girls are randomly seated in a row, for example, BBGGBBGBGBGBBGG. What is the expected number of unlike adjacent pairs? What if the number of boys is b and the number of girls is g?

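Solution Take the following as an example:

BBGGBBGBGBGBBGG.

There are 9 “BG or GB” unlike adjacent pairs.

For $i = 1, 2, \cdots, 14$, let $X_i = 1$ if the i-th and (i+1)-th seats hold an unlike pair and $X_i = 0$ otherwise, so that $X = X_1 + X_2 + \cdots + X_{14}$ is the number of unlike adjacent pairs. By symmetry, every adjacent pair of seats behaves like the first two seats, for which

\[
P(\text{being BG or GB in the first two seats}) = \frac{8}{15}\times\frac{7}{14} + \frac{7}{15}\times\frac{8}{14} = \frac{8}{15}.
\]

Since there are 14 adjacent pairs in total,

\[
E(X) = E(X_1 + X_2 + X_3 + \cdots + X_{14}) = 14\times E(X_1) = 14\times\frac{8}{15} = 7\tfrac{7}{15} \approx 7.4667.
\]

In general, if the number of boys is b and the number of girls is g, then

\[
E(X) = (g + b - 1)\left[\frac{gb}{(g+b)(g+b-1)} + \frac{bg}{(g+b)(g+b-1)}\right] = \frac{2gb}{g+b}.
\]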

✷
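The general formula 2gb/(g + b) can be checked by enumerating every equally likely arrangement of boys and girls. This is a sketch under that assumption; `expected_unlike_pairs` is an illustrative name.

```python
from fractions import Fraction
from itertools import combinations


def expected_unlike_pairs(b, g):
    """Average number of adjacent BG/GB pairs over all equally likely seatings."""
    n = b + g
    total = 0
    count = 0
    for boy_seats in combinations(range(n), b):  # choose which seats hold boys
        row = ['G'] * n
        for s in boy_seats:
            row[s] = 'B'
        total += sum(row[i] != row[i + 1] for i in range(n - 1))
        count += 1
    return Fraction(total, count)
```

For 8 boys and 7 girls this loops over the 6435 distinct B/G patterns, which is small enough for an exact check.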


Consider the quadratic equation

\[
Ax^2 + Bx + C = 0.
\]

The coefficients A, B and C (which are assumed to be independent) each take the value 1 or −1 with equal probability. Find the expected number of real roots and the corresponding variance.

Solution Denote $\Delta = B^2 - 4AC$. Consider the following cases.

1. (A,B,C) = (1, 1, 1) =⇒ ∆ < 0 =⇒ No real root.

2. (A,B,C) = (1, 1,−1) =⇒ ∆ > 0 =⇒ Two real roots.

3. (A,B,C) = (1,−1, 1) =⇒ ∆ < 0 =⇒ No real root.

4. (A,B,C) = (−1, 1, 1) =⇒ ∆ > 0 =⇒ Two real roots.

5. (A,B,C) = (1,−1,−1) =⇒ ∆ > 0 =⇒ Two real roots.

6. (A,B,C) = (−1, 1,−1) =⇒ ∆ < 0 =⇒ No real root.

7. (A,B,C) = (−1,−1, 1) =⇒ ∆ > 0 =⇒ Two real roots.

8. (A,B,C) = (−1,−1,−1) =⇒ ∆ < 0 =⇒ No real root.

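Let X denote the number of real roots. Then

\[
P(X = 0) = \frac{4}{8} = \frac{1}{2}, \qquad P(X = 2) = \frac{4}{8} = \frac{1}{2}.
\]

Hence,

\[
E(X) = 0\times\frac{1}{2} + 2\times\frac{1}{2} = 1,
\]

\[
\operatorname{Var}(X) = E(X^2) - \bigl(E(X)\bigr)^2 = \left(0^2\times\frac{1}{2} + 2^2\times\frac{1}{2}\right) - 1^2 = 1.
\]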

✷
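The eight cases can also be enumerated mechanically. A minimal sketch; the helper `real_roots` is an illustrative name and relies on the fact that the discriminant 1 − 4AC is never zero here.

```python
from fractions import Fraction
from itertools import product


def real_roots(a, b, c):
    """Number of real roots of a*x^2 + b*x + c = 0 when the discriminant is nonzero."""
    return 2 if b * b - 4 * a * c > 0 else 0


# One entry per equally likely choice of (A, B, C) in {1, -1}^3.
counts = [real_roots(a, b, c) for a, b, c in product((1, -1), repeat=3)]
mean = Fraction(sum(counts), len(counts))
variance = Fraction(sum(r * r for r in counts), len(counts)) - mean**2
```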


Example 6.12 (Mean and variance of a random variable ⋆⋆⋆)

The random variable X follows the binomial Bin(n, p) distribution with probability mass function

\[
f(x) = C^{n}_{x}\, p^{x} q^{n-x}, \qquad x = 0, 1, 2, \cdots, n, \quad 0 < p < 1, \quad q = 1 - p.
\]

(a) Prove that E(X) = np and Var(X) = npq.

A mathematics class in a school is divided into group A with 12 students and group B with 25 students. Both groups are given a test consisting of 16 short questions. For any student in group A, the score (that is, the number of correct answers) is distributed as Bin(16, 0.75); for any student in group B, the score is distributed as Bin(16, 0.5). All students answer independently.

(b) Find the probability that

(i) a given group A student gets all 16 questions right.

(ii) at least one student in group A gets all 16 questions right.

(c) Use an appropriate approximation to find the probability that a given group B student scores more than a given group A student. Check the approximate solution by direct calculation.

(d) Let $\bar{X}$ and $\bar{Y}$ denote the mean scores of students in group A and group B respectively. Find $E(\bar{X})$, $E(\bar{Y})$, $\operatorname{Var}(\bar{X})$, and $\operatorname{Var}(\bar{Y})$.

Solution

(a) Please refer to Lecture Notes Chapter 6 for the proofs.

(b) (i)

\[
\begin{aligned}
\text{The required probability} &= P(\text{a given group A student gets all 16 questions right})\\
&= (0.75)^{16}\\
&\approx 0.010022595\\
&\approx 0.0100.
\end{aligned}
\]

(ii)

\[
\begin{aligned}
\text{The required probability} &= P(\text{at least one student in group A gets all 16 questions right})\\
&= 1 - P(\text{no student gets all 16 questions right})\\
&= 1 - \bigl[1 - (0.75)^{16}\bigr]^{12}\\
&\approx 0.113857868\\
&\approx 0.1139.
\end{aligned}
\]

(c) Let X and Y be the respective scores of a given group A student and a given group B student. Then X ∼ Bin(16, 0.75) and Y ∼ Bin(16, 0.5). Since 16 × 0.75 = 12 > 5 and 16 × (1 − 0.75) = 4 ≈ 5, X follows approximately a normal distribution N(12, 3). Similarly, since 16 × 0.5 = 8 > 5, Y follows approximately N(8, 4). Now, since X and Y are independent,

\[
X - Y \text{ follows approximately } N(12 - 8,\ 3 + 4) = N(4, 7).
\]

\[
\begin{aligned}
\text{The required probability} &= P(X - Y < 0)\\
&= P(X - Y \le -1)\\
&\approx P\left(Z \le \frac{-1 + 0.5 - 4}{\sqrt{7}}\right)\\
&\approx P(Z \le -1.70)\\
&\approx 0.0446.
\end{aligned}
\]



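(d) Let $X_i$ and $Y_j$ (where $1 \le i \le 12$ and $1 \le j \le 25$) be the respective scores of the 12 group A students and the 25 group B students. By the assumption that all students answer independently, the $X_i$'s and $Y_j$'s are all independent. Denote

\[
\bar{X} = \frac{1}{12}\sum_{i=1}^{12} X_i, \quad\text{and}\quad \bar{Y} = \frac{1}{25}\sum_{j=1}^{25} Y_j.
\]

Hence,

\[
\begin{aligned}
E(\bar{X}) &= \frac{1}{12}\, E\Bigl(\sum_{i=1}^{12} X_i\Bigr)\\
&= \frac{1}{12}\sum_{i=1}^{12} E(X_i), \quad\text{by linearity of expectation}\\
&= \frac{1}{12}\sum_{i=1}^{12} 16\times 0.75\\
&= 12.
\end{aligned}
\]

Similarly, $E(\bar{Y}) = 16\times 0.5 = 8$.

To find the variances:

\[
\begin{aligned}
\operatorname{Var}(\bar{X}) &= \frac{1}{12^2}\operatorname{Var}\Bigl(\sum_{i=1}^{12} X_i\Bigr)\\
&= \frac{1}{12^2}\sum_{i=1}^{12}\operatorname{Var}(X_i), \quad\text{since the } X_i \text{ are independent}\\
&= \frac{1}{12^2}\sum_{i=1}^{12} 16\times 0.75\times 0.25\\
&= \frac{1}{12^2}\times 12\times 3 = \frac{1}{4}.
\end{aligned}
\]

Similarly,

\[
\operatorname{Var}(\bar{Y}) = \frac{1}{25}\times 16\times 0.5\times 0.5 = \frac{4}{25} = 0.16.
\]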

✷

Remark The approximation used in (c) may need further justification since it fails to fulfill the conditions n > 30 and n(1 − p) > 5. If we do not use approximations,

\[
\begin{aligned}
P(X < Y) &= \sum_{k=0}^{15} P(X = k)\times P(Y > k)\\
&= \sum_{k=0}^{15}\left( C^{16}_{k} (0.75)^{k}(0.25)^{16-k} \sum_{r=k+1}^{16} C^{16}_{r} (0.5)^{16} \right)\\
&= \frac{1}{2^{48}} \sum_{k=0}^{15}\left( C^{16}_{k}\, 3^{k}\cdot \sum_{r=k+1}^{16} C^{16}_{r} \right)\\
&\approx 0.0460.
\end{aligned}
\]
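The double sum in the Remark is tedious by hand but short in code. The sketch below evaluates the last displayed expression with exact integer arithmetic; the names `numerator` and `p_exact` are illustrative.

```python
from fractions import Fraction
from math import comb

N = 16  # number of questions

# Sum over k of C(16, k) * 3^k * (sum of C(16, r) for r = k+1..16), then divide by 2^48.
numerator = sum(
    comb(N, k) * 3**k * sum(comb(N, r) for r in range(k + 1, N + 1))
    for k in range(N)  # k = 0..15
)
p_exact = Fraction(numerator, 2**48)
```

Printing `float(p_exact)` allows a comparison with the normal approximation 0.0446 obtained in (c).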

Example 6.13 (Mean and variance of a random variable ⋆⋆)

Of the adult population in a large city, 60% favour a new leisure centre, 30% oppose it and 10% are indifferent. A random sample of 4 adults is taken from the population and their opinions on the new centre are noted.

(a) Find the probability that

(i) all four think alike.

(ii) none of the four is opposed to the new centre.


(iii) all three opinions (in favour, oppose, indifferent) are represented in the sample.

(iv) all four are in favour of the new centre, if it is given that none of the four is opposed.

(b) State the expectation and variance of the number in the sample who are in favour of the new centre.

(c) In this city, one quarter of adults are classified as “young” (aged under 30) and three-quarters are “older” (aged at least 30). You are told that 12% of young adults oppose the new leisure centre; deduce the proportion of older adults who are opposed.

(d) Given that the sample consists of one young adult and three older adults, find the probability that exactly one member of the sample opposes the new centre.

Solution

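(a) (i) The required probability $= (0.6)^4 + (0.3)^4 + (0.1)^4 = 0.1378$.

(ii) The required probability $= (1 - 0.3)^4 = 0.2401$.

(iii)

\[
\begin{aligned}
\text{The required probability} &= (0.6)(0.3)(0.1)^2\times\frac{4!}{1!\,1!\,2!} + (0.6)(0.1)(0.3)^2\times\frac{4!}{1!\,1!\,2!} + (0.1)(0.3)(0.6)^2\times\frac{4!}{1!\,1!\,2!}\\
&= 0.216.
\end{aligned}
\]

(iv)

\[
\text{The required probability} = \frac{(0.6)^4}{0.2401} = \frac{1296}{2401} \approx 0.5398.
\]

(b) Let X denote the number in the sample who are in favour of the new centre. Then X ∼ Bin(4, 0.6). Hence,

\[
E(X) = 4\times 0.6 = 2.4,
\]

and

\[
\operatorname{Var}(X) = 4\times 0.6\times(1 - 0.6) = 0.96.
\]

(c) Let x be the proportion of older adults who oppose the new leisure centre. Then

\[
\frac{1}{4}(0.12) + \frac{3}{4}x = 0.3.
\]

Solving for x gives x = 0.36. Hence, the required proportion is 0.36.

(d) Either the young adult is the only one opposed, or exactly one of the three older adults is the only one opposed:

\[
\begin{aligned}
\text{The required probability} &= (0.12)\times(1 - 0.36)^3 + (1 - 0.12)\times C^{3}_{1}\,(0.36)(0.64)^2\\
&= \frac{164352}{390625}\\
&\approx 0.4207.
\end{aligned}
\]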

✷
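Parts (a)(i), (a)(iii) and (c) can be checked with a small multinomial helper. This is a sketch; `multinomial` and the variable names are illustrative and not part of the original solution.

```python
from math import factorial

p_favour, p_oppose, p_indiff = 0.6, 0.3, 0.1
probs = (p_favour, p_oppose, p_indiff)


def multinomial(counts, probs):
    """Multinomial probability of observing the given category counts."""
    coeff = factorial(sum(counts))
    value = 1.0
    for c, p in zip(counts, probs):
        coeff //= factorial(c)  # exact: builds the multinomial coefficient
        value *= p**c
    return coeff * value


# (a)(i): all four share one opinion.
p_all_alike = sum(p**4 for p in probs)

# (a)(iii): all three opinions appear, so one opinion appears twice.
p_all_three = sum(multinomial(c, probs) for c in [(2, 1, 1), (1, 2, 1), (1, 1, 2)])

# (c): solve (1/4)(0.12) + (3/4) x = 0.3 for x.
p_older_oppose = (p_oppose - 0.25 * 0.12) / 0.75
```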


Chapter 7

Joint Distribution of Two Random Variables

Example 7.1 (Joint distribution ⋆)

The joint probability distribution of the random variables X and Y is summarized in the following table.

x\y      0     1     2     3     pX(x)
0        k     6k    9k    4k
1        8k    18k   12k   2k
2        k     6k    9k    4k
pY(y)

(a) Find k.

(b) Find the marginal distributions of X and Y , i.e., pX(x) and pY (y).

(c) Find the conditional distribution of X given that Y = 2.

(d) State with a reason whether or not X and Y are independent.

Solution

(a) The sum of all the entries in the table is 80k. Hence, $k = \dfrac{1}{80}$.

(b) Row and column sums give the marginal distributions of X and Y:

X       0     1     2
P(X)    1/4   1/2   1/4

and

Y       0     1     2     3
P(Y)    1/8   3/8   3/8   1/8

(c) Since

\[
P(X = x \mid Y = 2) = \frac{P(X = x \text{ and } Y = 2)}{P(Y = 2)},
\]

the conditional distribution of X given that Y = 2 is given by

X             0                              1                       2
Probability   (9/80)/(3/8) = 9/30 = 0.3      (12/80)/(3/8) = 0.4     (9/80)/(3/8) = 0.3


(d) For independence, every individual P(X = x, Y = y) in the table must be the product of its two marginal probabilities. However, taking x = y = 0, we have

\[
P(X = 0, Y = 0) = k = \frac{1}{80},
\quad\text{but}\quad
p_X(0)\times p_Y(0) = \frac{1}{4}\times\frac{1}{8} = \frac{1}{32}.
\]

So, X and Y are not independent.

✷
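All four parts follow mechanically from the table of weights. The sketch below stores the entries in units of k; the names `weights`, `joint`, `p_x`, `p_y` are illustrative.

```python
from fractions import Fraction

# Table entries in units of k; rows are x = 0, 1, 2 and columns are y = 0, 1, 2, 3.
weights = [[1, 6, 9, 4], [8, 18, 12, 2], [1, 6, 9, 4]]

k = Fraction(1, sum(map(sum, weights)))          # (a): probabilities sum to 1
joint = [[w * k for w in row] for row in weights]

p_x = [sum(row) for row in joint]                # (b): row sums
p_y = [sum(col) for col in zip(*joint)]          # (b): column sums

# (c): conditional distribution of X given Y = 2.
cond_x_given_y2 = [joint[x][2] / p_y[2] for x in range(3)]

# (d): independence requires every cell to equal the product of its marginals.
independent = all(
    joint[x][y] == p_x[x] * p_y[y] for x in range(3) for y in range(4)
)
```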


Two fair dice of different colors are rolled. Let X be the number on the red die and Y be the number on the green die. Denote the new random variables U = X + Y and V = |X − Y|.

(a) Find the marginal probability distributions of U and V, i.e., pU(u) and pV(v).

(b) Determine whether or not U and V are independent random variables.

(c) Find the conditional probability of U = 8 given that V ≤ 3.

Solution

(a) The marginal probability distribution of U is given by

u        2      3      4      5      6      7      8      9      10     11     12
pU(u)    1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

The marginal probability distribution of V is given by

v        0      1       2      3      4      5
pV(v)    6/36   10/36   8/36   6/36   4/36   2/36

(b) Since

\[
P(U = 2) = \frac{1}{36}, \qquad P(V = 5) = \frac{2}{36}, \qquad P(U = 2, V = 5) = 0,
\]

we have

\[
P(U = 2)\cdot P(V = 5) \ne P(U = 2, V = 5).
\]

Consequently, the two random variables U and V are not independent.

(c) The required conditional probability is given by

\[
P(U = 8 \mid V \le 3) = \frac{P(U = 8 \text{ and } V \le 3)}{P(V \le 3)} = \frac{\dfrac{3}{36}}{\dfrac{6 + 10 + 8 + 6}{36}} = 0.1.
\]

✷
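The joint behaviour of U and V can be tabulated directly from the 36 outcomes. A minimal sketch; `joint` and `prob` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely (X, Y) pairs

# Number of outcomes giving each (U, V) = (X + Y, |X - Y|) pair.
joint = Counter((x + y, abs(x - y)) for x, y in rolls)


def prob(event):
    """Probability that event(u, v) holds."""
    return Fraction(sum(c for (u, v), c in joint.items() if event(u, v)), len(rolls))


# (c): P(U = 8 | V <= 3).
p_u8_given_v_le_3 = prob(lambda u, v: u == 8 and v <= 3) / prob(lambda u, v: v <= 3)
```

The same `prob` helper gives the marginals in (a), for example `prob(lambda u, v: v == 1)` for pV(1).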



Roll a balanced die and let the outcome be X. Then toss a fair coin X times and let Y denote the number of tails. Are X and Y independent? Why? What is the joint probability density function of X and Y?

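Solution Let p(x, y) be the joint probability function of X and Y. Clearly,

\[
X \in \{1, 2, 3, 4, 5, 6\} \quad\text{and}\quad Y \in \{0, 1, 2, 3, 4, 5, 6\}.
\]

Now, if X = 1, then Y = 0 or 1, and we have

\[
\begin{aligned}
p(1, 0) &= P(X = 1, Y = 0) = P(X = 1)\times P(Y = 0 \mid X = 1) = \frac{1}{6}\times\frac{1}{2} = \frac{1}{12},\\
p(1, 1) &= P(X = 1, Y = 1) = P(X = 1)\times P(Y = 1 \mid X = 1) = \frac{1}{6}\times\frac{1}{2} = \frac{1}{12}.
\end{aligned}
\]

If X = 2, then Y = 0, 1 or 2, where

\[
\begin{aligned}
p(2, 0) &= P(X = 2, Y = 0) = P(X = 2)\times P(Y = 0 \mid X = 2) = \frac{1}{6}\times\frac{1}{4} = \frac{1}{24},\\
p(2, 1) &= P(X = 2, Y = 1) = P(X = 2)\times P(Y = 1 \mid X = 2) = \frac{1}{6}\times\frac{1}{2} = \frac{1}{12},\\
p(2, 2) &= P(X = 2, Y = 2) = P(X = 2)\times P(Y = 2 \mid X = 2) = \frac{1}{6}\times\frac{1}{4} = \frac{1}{24}.
\end{aligned}
\]

If X = 3, then Y = 0, 1, 2 or 3, where

\[
\begin{aligned}
p(3, 0) &= P(X = 3)\times P(Y = 0 \mid X = 3) = \frac{1}{6}\times C^{3}_{0}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{3} = \frac{1}{48},\\
p(3, 1) &= P(X = 3)\times P(Y = 1 \mid X = 3) = \frac{1}{6}\times C^{3}_{1}\left(\frac{1}{2}\right)^{1}\left(\frac{1}{2}\right)^{2} = \frac{3}{48},\\
p(3, 2) &= P(X = 3)\times P(Y = 2 \mid X = 3) = \frac{1}{6}\times C^{3}_{2}\left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{1} = \frac{3}{48},\\
p(3, 3) &= P(X = 3)\times P(Y = 3 \mid X = 3) = \frac{1}{6}\times C^{3}_{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{0} = \frac{1}{48}.
\end{aligned}
\]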



Similar calculations will yield the following table for p(x, y).

x\y 0 1 2 3 4 5 6 pX(x)

1 1/12 1/12 0 0 0 0 0 1/6

2 1/24 2/24 1/24 0 0 0 0 1/6

3 1/48 3/48 3/48 1/48 0 0 0 1/6

4 1/96 4/96 6/96 4/96 1/96 0 0 1/6

5 1/192 5/192 10/192 10/192 5/192 1/192 0 1/6

6 1/384 6/384 15/384 20/384 15/384 6/384 1/384 1/6

pY (y) 63/384 120/384 99/384 64/384 29/384 8/384 1/384

Note that pX(x) = P(X = x) and pY(y) = P(Y = y), the probability functions of X and Y, are obtained by summing up the rows and the columns of this table, respectively. The two variables X and Y are not independent: for example, p(1, 2) = 0, whereas pX(1) × pY(2) = (1/6)(99/384) ≠ 0. ✷
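The whole table follows from the single rule p(x, y) = P(X = x) × P(Y = y | X = x), with Y given X = x binomial. The sketch below regenerates the column sums; the function name `p` mirrors the notation above, and `p_y` is an illustrative name.

```python
from fractions import Fraction
from math import comb


def p(x, y):
    """Joint probability P(X = x, Y = y): die shows x, then y tails in x fair tosses."""
    if not (1 <= x <= 6 and 0 <= y <= x):
        return Fraction(0)
    return Fraction(1, 6) * Fraction(comb(x, y), 2**x)


# Marginal distribution of Y: column sums of the table, for y = 0..6.
p_y = [sum(p(x, y) for x in range(1, 7)) for y in range(7)]
```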


Toss a fair coin three times and let the random variable X be 0 if the outcome of the first toss is a head and 1 if the outcome of the first toss is a tail. Let another random variable Y denote the number of heads. What is the joint probability density function of X and Y? Are X and Y independent? Why?

Solution Let p(x, y) be the joint probability function of X and Y. Clearly, X ∈ {0, 1} and Y ∈ {0, 1, 2, 3}. Now, if X = 0, then Y = 1, 2 or 3, and we have

\[
\begin{aligned}
p(0, 1) &= P(X = 0, Y = 1) = P\bigl(\{HTT\}\bigr) = \frac{1}{8},\\
p(0, 2) &= P(X = 0, Y = 2) = P\bigl(\{HTH, HHT\}\bigr) = \frac{2}{8},\\
p(0, 3) &= P(X = 0, Y = 3) = P\bigl(\{HHH\}\bigr) = \frac{1}{8}.
\end{aligned}
\]

If X = 1, then Y = 0, 1 or 2, where

\[
\begin{aligned}
p(1, 0) &= P(X = 1, Y = 0) = P\bigl(\{TTT\}\bigr) = \frac{1}{8},\\
p(1, 1) &= P(X = 1, Y = 1) = P\bigl(\{TTH, THT\}\bigr) = \frac{2}{8},\\
p(1, 2) &= P(X = 1, Y = 2) = P\bigl(\{THH\}\bigr) = \frac{1}{8}.
\end{aligned}
\]

The above calculations will yield the following table for p(x, y).

x\y 0 1 2 3 pX(x)

0 0 1/8 2/8 1/8 1/2

1 1/8 2/8 1/8 0 1/2

pY (y) 1/8 3/8 3/8 1/8

For example,

\[
p(0, 0) = 0 \ne \frac{1}{2}\times\frac{1}{8} = p_X(0)\times p_Y(0).
\]

Hence the two variables X and Y are not independent. ✷


Example 7.5 (Joint distribution ⋆⋆)

Two balls are drawn from an urn containing one yellow, two red and three blue balls. Let X be the number of red balls and Y be the number of blue balls drawn. Find the joint distribution and the marginal distributions of X and Y. Are X and Y independent? Why? Given that exactly one of the drawn balls is known to be red, use the joint distribution to find the probability that the other drawn ball is blue.

Solution The joint distribution of X and Y can be expressed as the following table for p(x, y).

x\y 0 1 2 pX(x)

0 0 1/5 1/5 2/5

1 2/15 2/5 0 8/15

2 1/15 0 0 1/15

pY (y) 1/5 3/5 1/5

Note that the marginal distributions of X and Y are given by pX(x) = P(X = x) and pY(y) = P(Y = y), respectively. These probability functions of X and Y are obtained by summing up the rows and the columns of this table, respectively. The variables X and Y are not independent because

\[
p(0, 0) = 0 \ne p_X(0)\times p_Y(0).
\]

Finally we need to find the conditional probability

\[
P(\text{the other is blue} \mid \text{exactly one is red}) = \frac{p(1, 1)}{p_X(1)} = \frac{2}{5}\times\frac{15}{8} = \frac{3}{4} = 0.75.
\]

✷
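The joint table can be rebuilt by listing the 15 equally likely pairs of balls. This is a sketch; the labels 'Y', 'R', 'B' and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

balls = ['Y', 'R', 'R', 'B', 'B', 'B']            # one yellow, two red, three blue
draws = list(combinations(range(len(balls)), 2))  # 15 equally likely pairs

# Count how many draws give each (number of red, number of blue) pair.
counts = Counter(
    (sum(balls[i] == 'R' for i in d), sum(balls[i] == 'B' for i in d)) for d in draws
)
joint = {xy: Fraction(c, len(draws)) for xy, c in counts.items()}

p_x1 = sum(p for (x, _), p in joint.items() if x == 1)  # marginal P(X = 1)
p_blue_given_one_red = joint[(1, 1)] / p_x1
```

Pairs that never occur, such as (0, 0), are simply absent from `joint`, which corresponds to the zero entries of the table.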


Consider an experiment that consists of two tosses of a fair die. Let X be the number of 4's and Y be the number of 5's obtained in the two tosses of the die.

(a) Find the joint probability distribution of X and Y. Find also the marginal distributions of the random variables.

(b) Are X and Y independent?

(c) Find $P\bigl((X, Y) \in A\bigr)$, where A is the region $\{(x, y) \text{ such that } x + 2y < 3\}$.

Solution

(a) The joint distribution of X and Y can be expressed as the following table for p(x, y).

x\y 0 1 2 pX(x)

0 16/36 8/36 1/36 25/36

1 8/36 2/36 0 10/36

2 1/36 0 0 1/36

pY (y) 25/36 10/36 1/36

The marginal distributions of X and Y are given by pX(x) = P(X = x) and pY(y) = P(Y = y), respectively. These probability functions of X and Y are obtained by summing up the rows and the columns of this table, respectively.

(b) The random variables X and Y are not independent because (for example)

\[
P(X = 2, Y = 2) = 0 \ne P(X = 2)\times P(Y = 2).
\]



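(c) The pairs (x, y) in the table with x + 2y < 3 are (0, 0), (1, 0), (2, 0) and (0, 1). Hence

\[
\begin{aligned}
P\bigl((X, Y) \in A\bigr) &= P(X = 0, Y = 0) + P(X = 1, Y = 0) + P(X = 2, Y = 0) + P(X = 0, Y = 1)\\
&= \frac{16}{36} + \frac{8}{36} + \frac{1}{36} + \frac{8}{36}\\
&= \frac{33}{36} = \frac{11}{12}.
\end{aligned}
\]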

✷

Example 7.7 (Joint distribution ⋆⋆⋆)

The table below shows the joint distribution of two random variables X and Y.

x\y      1     2     3     4
1        6k    3k    2k    4k
2        4k    2k    4k    0
3        2k    k     0     2k

(a) Calculate the expectation E(X).

(b) New random variables U and V are defined by

\[
U =
\begin{cases}
1, & \text{if } X = 1 \text{ or } 3,\\
0, & \text{if } X = 2,
\end{cases}
\quad\text{and}\quad
V =
\begin{cases}
1, & \text{if } Y = 1 \text{ or } 3,\\
0, & \text{if } Y = 2 \text{ or } 4.
\end{cases}
\]

Write down the joint distribution of U and V in a table and state with a reason whether or not U and V are independent.

Solution

(a) The sum of all the entries in the table is 30k. Hence, $k = \dfrac{1}{30}$. The marginal probabilities of X are given by

\[
P(X = 1) = 15k = \frac{1}{2}, \qquad P(X = 2) = 10k = \frac{1}{3}, \qquad P(X = 3) = 5k = \frac{1}{6}.
\]

Therefore,

\[
E(X) = 1\times\frac{1}{2} + 2\times\frac{1}{3} + 3\times\frac{1}{6} = \frac{5}{3}.
\]

(b) The table below shows the joint distribution of the new random variables U and V, together with the row and column totals.

u\v         0              1              P(U = u)
0           2k = 1/15      8k = 4/15      10k = 1/3
1           10k = 1/3      10k = 1/3      20k = 2/3
P(V = v)    12k = 2/5      18k = 3/5

Note that

\[
P(U = 0, V = 0) = \frac{1}{15},
\]

\[
P(U = 0)\times P(V = 0) = \frac{1}{3}\times\frac{2}{5} = \frac{2}{15} \ne P(U = 0, V = 0).
\]

Thus, the new random variables U and V are not independent.

✷


Suppose X and Y are independent random variables having Poisson distributions with respective means λ and µ, where λ, µ > 0.

(a) Show that X + Y also follows a Poisson distribution.

(b) Find $P(X = k \mid X + Y = n)$ when k and n are integers with $0 \le k \le n$. For given fixed n > 0, name the distribution you have obtained.

(c) Telephone calls arriving at a computer helpline are classed as urgent or standard; urgent calls average 8 per hour, standard calls average 24 per hour. Ten calls arrive within 30 minutes; find (to two significant figures) the probability that at most two of them are urgent, stating any assumptions you make.

Solution

(a) As an exercise.

(b) As an exercise. Answer: $P(X = k \mid X + Y = n) = C^{n}_{k}\, p^{k}(1 - p)^{n-k}$, where $p = \dfrac{\lambda}{\lambda + \mu}$.

(c) As an exercise. Answer: 0.5256.

✷
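One way to arrive at the quoted answer to (c) is to assume that urgent and standard calls arrive as independent Poisson streams, so that by (b) the number of urgent calls among the ten is binomial with p = 8/(8 + 24). The sketch below carries out that calculation; this modelling assumption and the variable names are not spelled out in the original answer.

```python
from fractions import Fraction
from math import comb

# p = lambda / (lambda + mu); the 30-minute window scales both rates equally.
p = Fraction(8, 8 + 24)
n = 10  # total number of calls observed

# P(at most two of the ten calls are urgent) under Bin(10, p).
p_at_most_two = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))
```

Converting with `float(p_at_most_two)` gives a value that rounds to the stated 0.5256.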


Jane chooses a number X at random from the set of numbers {1, 2, 3, 4}, so that

P (X = k) =1

4, for k = 1, 2, 3, 4.

She then chooses a number Y at random from the subset of numbers {X, · · · , 4}; for example, if X = 3, then Y is chosen at random from {3, 4}.

(a) Find the joint probability distribution of X and Y and display it in the form of a two-way table.

(b) Find the marginal probability distribution of Y , and hence find E(Y ) and Var(Y ).

(c) Find the probability distribution of U = X + Y .

Solution

(a) As an exercise. Answer: $P(X = x, Y = y) = P(Y = y \mid X = x)\times P(X = x)$.

(b) As an exercise. Answer: $E(Y) = \dfrac{13}{4}$, $\operatorname{Var}(Y) = \dfrac{41}{48}$.

(c) As an exercise. Answer: $P(U = 2) = P(U = 3) = \dfrac{1}{16}$, $P(U = 4) = P(U = 5) = \dfrac{7}{48}$, $P(U = 6) = \dfrac{5}{24}$, $P(U = 7) = \dfrac{1}{8}$, $P(U = 8) = \dfrac{1}{4}$.

✷
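The stated answers can be reproduced from the hint in (a). A minimal sketch, assuming Y is uniform on {X, ..., 4} as described; the dictionary names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution: P(X = x) = 1/4 and Y is uniform on {x, ..., 4} (5 - x values).
joint = {}
for x in range(1, 5):
    for y in range(x, 5):
        joint[(x, y)] = Fraction(1, 4) * Fraction(1, 5 - x)

p_y = defaultdict(Fraction)  # marginal distribution of Y
p_u = defaultdict(Fraction)  # distribution of U = X + Y
for (x, y), pr in joint.items():
    p_y[y] += pr
    p_u[x + y] += pr

mean_y = sum(y * pr for y, pr in p_y.items())
var_y = sum(y * y * pr for y, pr in p_y.items()) - mean_y**2
```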




Two tennis players, A and B, are playing a match. Let X be the number of serves faster than 125 mph served by A in one of his service games and let Y be the number of these serves returned by B. The following probability model is proposed:

P(X = 0) = 0.4, P(X = 1) = 0.3, P(X = 2) = 0.2 and P(X = 3) = 0.1.

The conditional distribution of Y (given that X = x > 0) is binomial with parameters x and 0.4, and $P(Y = 0 \mid X = 0) = 1$. Assume that this model is correct when answering the following questions.

(a) Find the joint probability distribution of X and Y and display it in the form of a two-way table.

(b) Find the marginal distribution of Y and evaluate E(Y).

(c) Use your joint probability distribution table to find the probability distribution of the number of serves faster than 125 mph that are not returned by B in a game.

Solution

(a) As an exercise. Answer: P (0, 0) = 0.4, P (1, 0) = 0.18, P (1, 1) = 0.12, P (2, 0) = 0.072,

P (2, 1) = 0.096, P (2, 2) = 0.032, P (3, 0) = 0.0216, P (3, 1) = 0.0432, P (3, 2) = 0.0288, P (3, 3) = 0.0064.

(b) As an exercise. Answer: E(Y ) = 0.4.

(c) As an exercise. Answer: U = X − Y . P (U = 0) = 0.5584, P (U = 1) = 0.3048,

P (U = 2) = 0.1152, P (U = 3) = 0.0216.

✷
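The answers above follow from p(x, y) = P(X = x) × P(Y = y | X = x). The sketch below builds the table and the distribution of U = X − Y; `RETURN_PROB` and the dictionary names are illustrative.

```python
from math import comb

p_x = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
RETURN_PROB = 0.4  # probability that B returns any given fast serve

# Joint table: Y | X = x is Bin(x, 0.4); for x = 0 this gives P(Y = 0 | X = 0) = 1.
joint = {
    (x, y): px * comb(x, y) * RETURN_PROB**y * (1 - RETURN_PROB)**(x - y)
    for x, px in p_x.items()
    for y in range(x + 1)
}

mean_y = sum(y * pr for (_, y), pr in joint.items())

# Distribution of U = X - Y, the number of fast serves not returned.
p_unreturned = {
    u: sum(pr for (x, y), pr in joint.items() if x - y == u) for u in range(4)
}
```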


The joint probability distribution of the random variables X and Y is summarized in the following table.

x\y −1 1 2 pX(x)

1 6c 12c 6c

2 3c 6c 3c

3 3c 6c 3c

pY (y)

(a) Find the value of c.

(b) Find the marginal distributions of X and Y , i.e., pX(x) and pY (y).

(c) Are the random variables X and Y independent? State with a reason.

(d) Find the conditional distribution of X given that Y = −1.

Solution

(a) The sum of all the entries in the table is 48c. Hence, $c = \dfrac{1}{48}$.

(b) Row and column sums give the marginal distributions of X and Y:

X       1     2     3
P(X)    1/2   1/4   1/4

and

Y       −1    1     2
P(Y)    1/4   1/2   1/4


(c) For independence, every individual P(X = x, Y = y) in the table must be the product of its two marginal probabilities. Checking all nine entries of the table (for example, P(X = 1, Y = −1) = 6c = 1/8 = 1/2 × 1/4) shows that

\[
P(X = x, Y = y) = P(X = x)\cdot P(Y = y), \quad\text{for all } x, y.
\]

By definition, X and Y are independent.

(d) Since

\[
P(X = x \mid Y = -1) = \frac{P(X = x \text{ and } Y = -1)}{P(Y = -1)},
\]

the conditional distribution of X given that Y = −1 is given by

X                    1                 2                 3
P(X = x | Y = −1)    6c/12c = 1/2      3c/12c = 1/4      3c/12c = 1/4

✷


Assume that k is a certain constant. The joint probability density function of (X,Y ) is given by

x\y −3 −2 2 pX(x)

0 20k 10k 10k

1 10k 5k 15k

2 10k 15k 5k

pY (y)

(a) Evaluate the constant k. Find the probability density function of Y and hence the expectation of Y, i.e., E(Y).

(b) Are the random variables X and Y independent? Why?

(c) Find the probability P (X + Y < 0).

Solution

(a) Since the probabilities sum to 1, 20k + 10k + 10k + 10k + 5k + 15k + 10k + 15k + 5k = 100k = 1 =⇒ k = 0.01. Hence,

\[
p_Y(y) =
\begin{cases}
40k = 0.4, & \text{when } y = -3,\\
30k = 0.3, & \text{when } y = -2,\\
30k = 0.3, & \text{when } y = 2.
\end{cases}
\]

\[
E(Y) = \sum y\times p_Y(y) = (-3)\times 0.4 + (-2)\times 0.3 + (2)\times 0.3 = -1.2.
\]

(b) Note that p(0, −3) = 0.2, pX(0) = 0.4 and pY(−3) = 0.4. Thus, X and Y are not independent because

\[
p(0, -3) = 0.2 \ne 0.16 = p_X(0)\times p_Y(-3).
\]

(c)

\[
\begin{aligned}
P(X + Y < 0) &= P(X = 0, Y = -3) + P(X = 0, Y = -2) + P(X = 1, Y = -3)\\
&\quad + P(X = 1, Y = -2) + P(X = 2, Y = -3)\\
&= 0.2 + 0.1 + 0.1 + 0.05 + 0.1\\
&= 0.55. \qquad ✷
\end{aligned}
\]



Example 7.13 (Joint distribution ⋆⋆⋆)

A stick is cut into three pieces at random, so that every point on the stick is equally likely to be a cutting point. Evaluate the probability that the three resulting pieces of stick form a triangle.

Solution Without any loss of generality, assume that the length of the stick is one unit. Let X denote the distance from the left cutting point to the left end of the stick and Y the distance from the right cutting point to the left end of the stick. Then, X and Y are jointly uniformly distributed over the region

\[
R = \bigl\{(x, y) : 0 < x < 1 \text{ and } x < y < 1\bigr\}.
\]

The three resulting parts of the stick have lengths X, Y − X and 1 − Y after it is cut into three parts. These three parts form a triangle if and only if

\[
X + (Y - X) > 1 - Y \quad\text{and}\quad (Y - X) + (1 - Y) > X \quad\text{and}\quad (1 - Y) + X > Y - X.
\]

In other words (after simplifications),

\[
Y > \frac{1}{2} \quad\text{and}\quad 0 < X < \frac{1}{2} \quad\text{and}\quad 0 < Y - X < \frac{1}{2}.
\]

Therefore, if

\[
R' = \Bigl\{(x, y) : 0 < x < \frac{1}{2} \text{ and } y > \frac{1}{2} \text{ and } 0 < y - x < \frac{1}{2}\Bigr\},
\]

the probability that the three resulting parts of the stick form a triangle is given by

\[
P\Bigl(0 < X < \frac{1}{2} \text{ and } Y > \frac{1}{2} \text{ and } 0 < Y - X < \frac{1}{2}\Bigr)
= P\bigl((X, Y) \text{ belongs to } R'\bigr)
= \frac{\text{Area of } R'}{\text{Area of } R}
= \frac{1/8}{1/2} = \frac{1}{4} = 0.25.
\]

✷
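The value 1/4 can be made plausible by simulation. The sketch below is a Monte Carlo estimate, not a proof; it assumes the two cutting points are independent and uniform on the stick, and the function name, trial count and seed are arbitrary illustrative choices.

```python
import random


def estimate_triangle_probability(trials, seed=0):
    """Estimate P(three pieces form a triangle) by simulating two uniform cuts."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = sorted((rng.random(), rng.random()))  # left and right cutting points
        pieces = (x, y - x, 1 - y)
        # The triangle inequalities hold exactly when every piece is shorter than 1/2.
        if max(pieces) < 0.5:
            hits += 1
    return hits / trials


estimate = estimate_triangle_probability(200_000)
```

With 200,000 trials the standard error is about 0.001, so the estimate should land close to 0.25.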


Chapter 8

Markov Chains and Applications

Example 8.1 (Markov chains ⋆)

Suppose that there are 2 audio CDs of classical music, 3 of jazz music, and 4 of popular music. A person first listens to classical music and, after each CD, randomly selects the next CD to listen to from among the CDs of the other two music styles. Construct a Markov chain model and find the corresponding transition probability matrix. Find the probability that the fourth CD listened to is not of popular music style.

Solution The state space is given by {S1, S2, S3}, where S1 = classical, S2 = jazz, S3 = popular. The corresponding transition probability matrix is given by

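\[
M =
\begin{pmatrix}
0 & \frac{3}{7} & \frac{4}{7}\\[2pt]
\frac{2}{6} & 0 & \frac{4}{6}\\[2pt]
\frac{2}{5} & \frac{3}{5} & 0
\end{pmatrix}
\]

and hence

\[
M^2 =
\begin{pmatrix}
0 & \frac{3}{7} & \frac{4}{7}\\[2pt]
\frac{2}{6} & 0 & \frac{4}{6}\\[2pt]
\frac{2}{5} & \frac{3}{5} & 0
\end{pmatrix}
\begin{pmatrix}
0 & \frac{3}{7} & \frac{4}{7}\\[2pt]
\frac{2}{6} & 0 & \frac{4}{6}\\[2pt]
\frac{2}{5} & \frac{3}{5} & 0
\end{pmatrix}
=
\begin{pmatrix}
\frac{13}{35} & \frac{12}{35} & \frac{2}{7}\\[2pt]
\frac{4}{15} & \frac{19}{35} & \frac{4}{21}\\[2pt]
\frac{1}{5} & \frac{6}{35} & \frac{22}{35}
\end{pmatrix}.
\]

Now,

\[
\begin{aligned}
P\bigl(\{\text{the fourth CD is of popular style}\}\bigr) &= p^{(3)}_{13}\\
&= \text{the } (1, 3)\text{-entry of the matrix } M^3\\
&= \frac{13}{35}\times\frac{4}{7} + \frac{12}{35}\times\frac{4}{6}\\
&= \frac{108}{245}.
\end{aligned}
\]

Hence,

\[
P\bigl(\{\text{the fourth CD is not of popular style}\}\bigr) = 1 - \frac{108}{245} = \frac{137}{245} \approx 0.5592.
\]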

✷
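The matrix powers can be verified with exact fractions. This is a minimal sketch; the helper `matmul` and the alias `F` are illustrative, and states are indexed 0, 1, 2 for classical, jazz, popular.

```python
from fractions import Fraction as F

M = [
    [F(0), F(3, 7), F(4, 7)],   # from classical
    [F(2, 6), F(0), F(4, 6)],   # from jazz
    [F(2, 5), F(3, 5), F(0)],   # from popular
]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


M2 = matmul(M, M)
M3 = matmul(M2, M)

# The first CD is classical, so the fourth CD is three transitions later.
p_not_popular = 1 - M3[0][2]
```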


Example 8.2 (Markov chains ⋆)

Consider an experiment of throwing a balanced die of six faces. Let Xn be the minimum number of the outcomes in the first n trials.

(a) What is the state space?

(b) What is the transition probability matrix?

(c) Based on the transition probability matrix, find the probability that the minimum number will have changed after throwing two more times when the minimum number is initially 3.

Solution

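(a) $\{X_n, n \ge 1\}$ is a Markov chain with the state space:

S = {1, 2, 3, 4, 5, 6}.

(b) The probability transition matrix is given by

\[
M =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\[2pt]
\frac{1}{6} & \frac{5}{6} & 0 & 0 & 0 & 0\\[2pt]
\frac{1}{6} & \frac{1}{6} & \frac{4}{6} & 0 & 0 & 0\\[2pt]
\frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{3}{6} & 0 & 0\\[2pt]
\frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{2}{6} & 0\\[2pt]
\frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6}
\end{pmatrix}.
\]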

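(c)

\[
\begin{aligned}
\text{The required probability} &= P(X_{n+2} = 1 \mid X_n = 3) + P(X_{n+2} = 2 \mid X_n = 3)\\
&= p^{(2)}_{31} + p^{(2)}_{32}\\
&= \text{“the } (3, 1)\text{-entry of } M^2\text{”} + \text{“the } (3, 2)\text{-entry of } M^2\text{”}\\
&= \text{“(the third row of } M) \bullet (\text{the first column of } M)\text{”}\\
&\quad + \text{“(the third row of } M) \bullet (\text{the second column of } M)\text{”}\\
&= \Bigl(\tfrac{1}{6}, \tfrac{1}{6}, \tfrac{4}{6}, 0, 0, 0\Bigr) \bullet \Bigl(1, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}\Bigr)
 + \Bigl(\tfrac{1}{6}, \tfrac{1}{6}, \tfrac{4}{6}, 0, 0, 0\Bigr) \bullet \Bigl(0, \tfrac{5}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}\Bigr)\\
&= \Bigl(\frac{1}{6}\times 1 + \frac{1}{6}\times\frac{1}{6} + \frac{4}{6}\times\frac{1}{6}\Bigr) + \Bigl(\frac{1}{6}\times 0 + \frac{1}{6}\times\frac{5}{6} + \frac{4}{6}\times\frac{1}{6}\Bigr)\\
&= \frac{11 + 9}{36}\\
&= \frac{5}{9} \approx 0.5556.
\end{aligned}
\]

Note that in the above we don't have to find all entries of $M^2$.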

✷
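The transition matrix has a simple rule behind it, which the following sketch makes explicit: a new throw lowers the minimum to j with probability 1/6 for each j below the current minimum, and otherwise leaves it unchanged. The function name `transition` is illustrative; list indices are 0-based, so state 3 is row index 2.

```python
from fractions import Fraction


def transition(i, j):
    """P(new minimum = j | current minimum = i) after one more fair-die throw."""
    if j < i:
        return Fraction(1, 6)       # the new throw shows j, a smaller value
    if j == i:
        return Fraction(7 - i, 6)   # the new throw shows i or more
    return Fraction(0)              # the minimum can never increase


M = [[transition(i, j) for j in range(1, 7)] for i in range(1, 7)]

# (c): from state 3, probability of being in state 1 or 2 after two throws.
row3 = M[2]
p_changed = sum(sum(row3[k] * M[k][j] for k in range(6)) for j in (0, 1))
```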


Example 8.3 (Markov chains ⋆⋆)

A total of 8 marbles, identical except for color (5 of them are black and 3 of them are white), are randomly distributed in two separate boxes such that each box contains 4 marbles. Consider a trial of an experiment as follows:

• Draw one marble from each box.

• Interchange the marbles.

• Put the interchanged marbles back in the boxes.

Let Xn be the number of black marbles in the first box after n trials, where n = 1, 2, · · · .

(a) What is the state space?

(b) Find the transition probability matrix of the Markov chain.

(c) Assume that exactly one of the marbles in the first box is initially white. Use (b) or otherwise to find the probability that the number of white marbles in the first box will have increased right after 2 trials.

Solution

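(a) $\{X_n, n \ge 1\}$ is a Markov chain with the state space:

S = {1, 2, 3, 4}.

(b) Denote a black marble by B and a white marble by W. Consider the following four situations, which correspond to the states 1, 2, 3 and 4 respectively:

First box          Second box
{B, W, W, W}       {B, B, B, B}
{B, B, W, W}       {B, B, B, W}
{B, B, B, W}       {B, B, W, W}
{B, B, B, B}       {B, W, W, W}

The probability transition matrix is given by

\[
M =
\begin{pmatrix}
\frac{1}{4} & \frac{3}{4} & 0 & 0\\[2pt]
\frac{2}{4}\cdot\frac{1}{4} & \frac{2}{4}\cdot\frac{3}{4} + \frac{2}{4}\cdot\frac{1}{4} & \frac{2}{4}\cdot\frac{3}{4} & 0\\[2pt]
0 & \frac{3}{4}\cdot\frac{2}{4} & \frac{3}{4}\cdot\frac{2}{4} + \frac{1}{4}\cdot\frac{2}{4} & \frac{1}{4}\cdot\frac{2}{4}\\[2pt]
0 & 0 & \frac{3}{4} & \frac{1}{4}
\end{pmatrix}
=
\begin{pmatrix}
\frac{1}{4} & \frac{3}{4} & 0 & 0\\[2pt]
\frac{1}{8} & \frac{1}{2} & \frac{3}{8} & 0\\[2pt]
0 & \frac{3}{8} & \frac{1}{2} & \frac{1}{8}\\[2pt]
0 & 0 & \frac{3}{4} & \frac{1}{4}
\end{pmatrix}.
\]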


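(c) One white marble in the first box means three black marbles, i.e. state 3; the number of white marbles has increased exactly when the chain is in state 2 or state 1 two trials later.

\[
\begin{aligned}
\text{The required probability} &= P(X_{n+2} = 2 \mid X_n = 3) + P(X_{n+2} = 1 \mid X_n = 3)\\
&= p^{(2)}_{32} + p^{(2)}_{31}\\
&= \text{“the } (3, 2)\text{-entry of } M^2\text{”} + \text{“the } (3, 1)\text{-entry of } M^2\text{”}\\
&= \text{“(the third row of } M) \bullet (\text{the second column of } M)\text{”}\\
&\quad + \text{“(the third row of } M) \bullet (\text{the first column of } M)\text{”}\\
&= \Bigl(0, \tfrac{3}{8}, \tfrac{1}{2}, \tfrac{1}{8}\Bigr) \bullet \Bigl(\tfrac{3}{4}, \tfrac{1}{2}, \tfrac{3}{8}, 0\Bigr)
 + \Bigl(0, \tfrac{3}{8}, \tfrac{1}{2}, \tfrac{1}{8}\Bigr) \bullet \Bigl(\tfrac{1}{4}, \tfrac{1}{8}, 0, 0\Bigr)\\
&= \Bigl(\frac{3}{8}\times\frac{1}{2} + \frac{1}{2}\times\frac{3}{8}\Bigr) + \Bigl(\frac{3}{8}\times\frac{1}{8}\Bigr)\\
&= \frac{27}{64} \approx 0.4219.
\end{aligned}
\]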

✷
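The rows of M can be generated from the composition of the two boxes rather than typed in by hand. The following is a sketch under the model described above (one marble drawn uniformly from each box and swapped); `transition_row` is an illustrative name, and state 3 is row index 2.

```python
from fractions import Fraction


def transition_row(b):
    """Transition probabilities from state b (black marbles in box 1) to states 1..4."""
    b2 = 5 - b                                  # black marbles in box 2
    draw_b1, draw_b2 = Fraction(b, 4), Fraction(b2, 4)
    row = {s: Fraction(0) for s in range(1, 5)}
    down = draw_b1 * (1 - draw_b2)              # box 1 gives black, receives white
    up = (1 - draw_b1) * draw_b2                # box 1 gives white, receives black
    if down:
        row[b - 1] = down
    if up:
        row[b + 1] = up
    row[b] = 1 - down - up                      # same colors swapped: no change
    return [row[s] for s in range(1, 5)]


M = [transition_row(b) for b in range(1, 5)]

# (c): from state 3, probability of being in state 2 or 1 after two trials.
p_more_white = sum(sum(M[2][k] * M[k][j] for k in range(4)) for j in (0, 1))
```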


Formula Sheet (MTH3105/MTH4105 Probability)

August 7, 2015 Preliminary Version

Summarizing Data

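Sample mean:

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
\]

Median: List the numbers in ascending order. The median is:
the $\frac{n+1}{2}$-th value if n is odd;
the mean of the $\frac{n}{2}$-th and $\bigl(\frac{n}{2}+1\bigr)$-th values if n is even.

Sample variance:

\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(x_i - \bar{x}\bigr)^2 = \frac{1}{n-1}\Bigl(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\Bigr).
\]

Sample standard deviation:

\[
s = \sqrt{\text{sample variance}}.
\]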

Range:

range = largest value− smallest value.

Interquartile range:

IQR = Q3 −Q1 = upper quartile− lower quartile.

Frequency table:

value        x1    x2    · · ·    xk
frequency    f1    f2    · · ·    fk

• total number of observations: $n = \sum_{i=1}^{k} f_i$.

• sample mean: $\bar{x} = \sum_{i=1}^{k} \dfrac{f_i}{n}\, x_i$.

• sample variance: $s^2 = \dfrac{1}{n-1}\Bigl(\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2\Bigr)$.

Probability

Two events A and B are said to be

• mutually exclusive if P (A ∩B) = 0.

• exhaustive if P (A ∪B) = 1.

• independent if P (A ∩B) = P (A) · P (B).

Addition rule:

P (A ∪B) = P (A) + P (B)− P (A ∩B).

Conditional probability:

\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\]

Multiplication rule:

\[
P(A \cap B) = P(A)\cdot P(B \mid A) = P(B)\cdot P(A \mid B).
\]

Total probability:

\[
P(A) = P(A \mid B)\cdot P(B) + P(A \mid B^{c})\cdot P(B^{c}).
\]

Partition law:

\[
P(A) = \sum_{i=1}^{k} P(A \cap B_i) = \sum_{i=1}^{k} P(A \mid B_i)\cdot P(B_i),
\]

provided that $B_1, B_2, \cdots, B_k$ are mutually exclusive and exhaustive events.

Bayes' formula:

\[
P(B \mid A) = \frac{P(A \mid B)\cdot P(B)}{P(A \mid B)\cdot P(B) + P(A \mid B^{c})\cdot P(B^{c})}
\]

or more generally

\[
P(B_i \mid A) = \frac{P(A \mid B_i)\cdot P(B_i)}{P(A)} = \frac{P(A \mid B_i)\cdot P(B_i)}{\sum_{i=1}^{k} P(A \mid B_i)\cdot P(B_i)}.
\]

Discrete distributions

Mean value:

\[ E(X) = \mu = \sum_{x_i \in S} x_i \cdot f(x_i), \qquad f(x_i) := P(X = x_i). \]

Expectation rules:

\[ E(a) = a; \qquad E(aX) = aE(X); \qquad E(X + Y) = E(X) + E(Y). \]

Variance:

\[ \operatorname{Var}(X) = \sum_{x_i \in S} (x_i - \mu)^2 \cdot f(x_i) = \sum_{x_i \in S} x_i^2 \cdot f(x_i) - \mu^2. \]

Variance rules:

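\[ \operatorname{Var}(a) = 0; \qquad \operatorname{Var}(aX) = a^2 \operatorname{Var}(X); \]

\[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) \quad \text{if $X$, $Y$ are independent.} \]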
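A short sketch of the mean and variance formulas, using a fair six-sided die as the example (the die is a standard illustration and is not taken from the sheet). It also checks the rule Var(X + Y) = Var(X) + Var(Y) for two independent dice by building the pmf of the sum from the product of the marginals.

```python
from fractions import Fraction
from itertools import product

# pmf of a fair six-sided die: f(x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(pmf):
    """E(X) = sum of x * f(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of x^2 * f(x), minus mu^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2

# pmf of X + Y for two independent dice: joint probability = product of marginals
pmf_sum = {}
for (x, px), (y, py) in product(pmf.items(), repeat=2):
    pmf_sum[x + y] = pmf_sum.get(x + y, 0) + px * py

print(mean(pmf), variance(pmf))            # 7/2 35/12
print(mean(pmf_sum), variance(pmf_sum))    # 7 35/6
```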

Binomial distribution:

\[ X \sim \mathrm{Bin}(n, p), \qquad P(X = k) = C^n_k \, p^k (1-p)^{n-k}, \]

with mean $np$ and variance $np(1-p)$.
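The binomial formula and its stated mean and variance can be checked directly. In this sketch the parameters n = 10 and p = 3/10 are hypothetical, and `math.comb(n, k)` plays the role of $C^n_k$.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p); comb(n, k) is the binomial coefficient C^n_k."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, Fraction(3, 10)   # hypothetical parameters
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                              # should equal 1
mu = sum(k * q for k, q in enumerate(pmf))                    # should equal n * p
var = sum(k * k * q for k, q in enumerate(pmf)) - mu ** 2     # should equal n * p * (1 - p)

print(total, mu, var)   # 1 3 21/10
```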

Geometric distribution:

\[ X \sim G(p), \qquad P(X = k) = (1-p)^{k-1} p. \]

Hypergeometric distribution:

\[ X \sim H(N, n, r), \qquad P(X = k) = \frac{C^r_k \, C^{N-r}_{n-k}}{C^N_n}. \]

Approximating Hypergeometric by Binomial when N ≫ n:

\[ X \sim H(N, n, r) \overset{\text{approx.}}{\sim} \mathrm{Bin}(n, p), \quad \text{where } p = \frac{r}{N}. \]
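To see how close the binomial approximation is when N is much larger than n, the two pmfs can be tabulated side by side. The values N = 1000, n = 5, r = 300 are hypothetical, and the reading of the parameters (n draws without replacement, r marked items among N) is inferred from the formula for H(N, n, r).

```python
from math import comb

def hypergeom_pmf(k, N, n, r):
    """P(X = k) for X ~ H(N, n, r): n draws without replacement, r marked items among N."""
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

N, n, r = 1000, 5, 300       # hypothetical values with N much larger than n
p = r / N

for k in range(n + 1):
    print(k, round(hypergeom_pmf(k, N, n, r), 4), round(binom_pmf(k, n, p), 4))

# Largest pointwise difference between the two pmfs
max_gap = max(abs(hypergeom_pmf(k, N, n, r) - binom_pmf(k, n, p)) for k in range(n + 1))
print(max_gap)
```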

Poisson distribution:

\[ X \sim \mathrm{Poisson}(\lambda), \qquad P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}. \]

Approximating Poisson by Normal:

\[ X \sim \mathrm{Poisson}(\lambda) \overset{\text{approx.}}{\sim} N(\mu, \sigma^2), \]

where $\mu = \text{mean} = \lambda$, $\sigma^2 = \text{variance} = \lambda$.

Negative Binomial distribution:

\[ X \sim \mathrm{NegBin}(r, p), \qquad P(X = k) = C^{k-1}_{r-1} \, p^r (1-p)^{k-r}, \]

where $k = r, r+1, r+2, \cdots$.

Continuous distributions

Distribution function:

\[ F(y) = P(X \le y) = \int_{-\infty}^{y} f(x)\,dx. \]

Evaluating probabilities:

\[ P(a < X < b) = \int_a^b f(x)\,dx = F(b) - F(a). \]

Uniform distribution:

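\[ X \sim U(\alpha, \beta), \qquad f(x) = \begin{cases} \dfrac{1}{\beta - \alpha}, & \text{if } \alpha < x < \beta, \\ 0, & \text{otherwise.} \end{cases} \]

\[ P(a < X < b) = \int_a^b f(x)\,dx. \]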

Normal distribution:

\[ X \sim N(\mu, \sigma^2), \quad \text{where } \mu = \text{mean and } \sigma^2 = \text{variance}. \]

Standardization by change of variables: $Z = \dfrac{X - \mu}{\sigma}$.

Evaluating probabilities by Standard Normal Table.

Approximating Binomial by Normal:

When $np(1-p) > 10$,

\[ X \sim \mathrm{Bin}(n, p) \overset{\text{approx.}}{\sim} N(\mu, \sigma^2), \quad \text{where } \mu = np \text{ and } \sigma^2 = np(1-p). \]

Continuity correction factor:

\[ P(a \le X \le b) = P\left( \frac{(a - 0.5) - \mu}{\sigma} \le Z \le \frac{(b + 0.5) - \mu}{\sigma} \right). \]
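The continuity correction can be compared against the exact binomial probability. The sketch below assumes hypothetical parameters n = 100, p = 0.5 (so np(1 − p) = 25) and the interval 45 ≤ X ≤ 55. In place of a Standard Normal Table it evaluates the standard normal distribution function through `math.erf`, which is a substitution made here for convenience.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal distribution function, written via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.5              # hypothetical parameters; n * p * (1 - p) = 25
a, b = 45, 55
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Exact binomial probability P(a <= X <= b)
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(a, b + 1))

# Normal approximation with the continuity correction (a - 0.5, b + 0.5)
approx = phi((b + 0.5 - mu) / sigma) - phi((a - 0.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```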

Exponential distribution:

\[ X \sim \mathrm{Exp}(\mu), \qquad f(x) = \frac{1}{\mu} e^{-x/\mu}, \]

with mean $\mu$ and variance $\mu^2$.

Relationship between Exponential and Poisson: $\mu = \dfrac{1}{\lambda}$.
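The rule P(a < X < b) = F(b) − F(a) can be illustrated with the exponential density. In this sketch the mean µ = 2 and the interval (1, 3) are hypothetical, the support x ≥ 0 is left implicit on the sheet and made explicit here, and the closed form F(x) = 1 − e^{−x/µ} comes from integrating f from 0 to x. A midpoint-rule sum stands in for the integral.

```python
from math import exp

mu = 2.0                     # hypothetical mean
a, b = 1.0, 3.0              # hypothetical interval

def f(x):
    """Density of Exp(mu) for x >= 0."""
    return exp(-x / mu) / mu

def F(x):
    """Distribution function: the integral of f from 0 to x (0 for negative x)."""
    return 1 - exp(-x / mu) if x >= 0 else 0.0

# Midpoint-rule approximation of the integral of f over (a, b)
steps = 10_000
h = (b - a) / steps
integral = sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

print(integral, F(b) - F(a))
```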

Joint distributions

Joint probability distribution function:

\[ p(x, y) = P(X = x, Y = y). \]

Condition for independence of X and Y :

\[ p(x, y) = p_X(x) \times p_Y(y) \quad \text{holds for all } x, y, \]

where $p_X(x) = P(X = x)$ and $p_Y(y) = P(Y = y)$.
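The independence condition can be tested mechanically on a joint table. The joint pmf below is hypothetical, and the marginals are obtained in the usual way by summing the joint probabilities over the other variable (a step the sheet does not spell out).

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y), stored as {(x, y): probability}
joint = {
    (0, 0): Fraction(1, 6), (0, 1): Fraction(1, 3),
    (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 3),
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals: p_X(x) = sum over y of p(x, y), and similarly for p_Y(y)
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# X and Y are independent only if the product rule holds for every pair (x, y)
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x in xs for y in ys)
print(independent)   # True
```

A single pair (x, y) that violates the product rule is enough to conclude that X and Y are dependent.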

Sum of independent Binomial random variables:

\[ X \sim \mathrm{Bin}(n, p), \ Y \sim \mathrm{Bin}(m, p) \implies X + Y \sim \mathrm{Bin}(n + m, p). \]

Sum of independent Poisson random variables:

\[ X \sim \mathrm{Poisson}(\lambda_1), \ Y \sim \mathrm{Poisson}(\lambda_2) \implies X + Y \sim \mathrm{Poisson}(\lambda_1 + \lambda_2). \]

Conditional probability of X, given that X + Y = n:

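\[ X \sim \mathrm{Poisson}(\lambda_1), \ Y \sim \mathrm{Poisson}(\lambda_2), \ \text{and $X$, $Y$ are independent} \]

\[ \implies P(X = k \mid X + Y = n) = C^n_k \, p^k (1-p)^{n-k}, \quad \text{where } p = \frac{\lambda_1}{\lambda_1 + \lambda_2}. \]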
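Both Poisson results can be checked numerically. In this sketch the rates λ1 = 2, λ2 = 3 and the total n = 6 are hypothetical. P(X + Y = n) is computed by summing over the ways to split n between X and Y, which should agree with the Poisson(λ1 + λ2) pmf, and the conditional probabilities are then compared with the binomial formula.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * exp(-lam) / factorial(k)

lam1, lam2, n = 2.0, 3.0, 6  # hypothetical rates and total
p = lam1 / (lam1 + lam2)

# P(X + Y = n) for independent X, Y: sum over all splits j + (n - j) = n
p_sum = sum(poisson_pmf(j, lam1) * poisson_pmf(n - j, lam2) for j in range(n + 1))

# Conditional pmf of X given X + Y = n, next to the Bin(n, p) pmf
conditional = [poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2) / p_sum for k in range(n + 1)]
binomial = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

for k in range(n + 1):
    print(k, round(conditional[k], 6), round(binomial[k], 6))
```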
