
MTH 6130 Semester II, 2015/16

Probability and Statistics

Review Questions for Quiz and Examination

Dr. Tony Yee

Department of Mathematics and Information Technology

The Hong Kong Institute of Education

December 24, 2015

AMENDMENTS.

Chapter 1.

Chapter 2.

Chapter 3.

Chapter 4.

Chapter 5.

Chapter 6.

Chapter 7.

Chapter 8.

Chapter 9.

Chapter 10.

Contents

Table of Contents

1 Descriptive Statistics

2 Probability

3 Discrete Random Variables

4 Continuous Random Variables

5 Mathematical Expectation

6 Joint Distribution of Two Random Variables

7 Sampling Distributions

8 Estimation and Confidence Interval

9 Hypothesis Testing

10 Simple Linear Regression


Table of Contents (Keywords for your reference)

Chapter 1. Descriptive Statistics

✷ Parameters and statistics ✷ Scales of measurement ✷ Data handling

✷ Stem-and-leaf plot ✷ Frequency distribution ✷ Histogram ✷ Box-plot

✷ Measures of central tendency (mean, weighted mean, median, mode)

✷ Measures of spread (range, quartiles, inter-quartile range, variance, standard deviation)

✷ Use calculator to find the mean and standard deviation

Chapter 2. Probability

✷ Basic principles of counting ✷ Addition rule ✷ Multiplication rule

✷ Permutation without repetition ✷ Combination without repetition

✷ Permutation with repetition ✷ Combination with repetition

✷ Sample space and events ✷ Axioms of probability ✷ Inclusion-exclusion principle

✷ Venn diagram ✷ Equally likely outcomes ✷ Simple probability

✷ Reduced sample space ✷ Conditional probability

✷ Multiplication rule for conditional probability ✷ Conditioning ✷ Total probability

✷ Bayes’ theorem / formula ✷ Dependent and independent events

Chapter 3. Discrete Random Variables

✷ Random variable ✷ Probability density function (pdf) ✷ Bernoulli distribution

✷ Binomial distribution ✷ Geometric distribution ✷ Hypergeometric distribution

✷ Poisson distribution ✷ Negative binomial distribution

✷ Approximating Binomial by Poisson ✷ Approximating Hypergeometric by Binomial

Chapter 4. Continuous Random Variables

✷ Continuous random variable ✷ Cumulative distribution function ✷ Uniform distribution

✷ Normal distribution ✷ Approximating Binomial by Normal

✷ Approximating Poisson by Normal ✷ Continuity correction factor

✷ Exponential distribution ✷ Relationship between Exponential and Poisson

Chapter 5. Mathematical Expectation

✷ Expected value ✷ Favourable / unfavourable game

✷ Mean and variance of a random variable ✷ Expectation rules ✷ Variance rules

Chapter 6. Joint Distribution of Two Random Variables

✷ Joint probability density function (as a table) ✷ Marginal probability density functions

✷ Necessary and sufficient condition for independent random variables

✷ Sum of two independent Binomial random variables

✷ Sum of two independent Poisson random variables


Chapter 7. Sampling Distributions

✷ Sampling distribution of sample mean ✷ Sampling methods

Chapter 8. Estimation and Confidence Interval

✷ Estimation of mean ✷ Student’s t distribution ✷ Student’s t-table

✷ Confidence interval of mean

✷ Confidence interval of difference of population means

✷ Confidence interval of proportion

Chapter 9. Hypothesis Testing

✷ Hypothesis testing, one-tailed ✷ Hypothesis testing, two-tailed

✷ Hypothesis testing for mean

✷ Hypothesis testing for difference of means from two populations

Chapter 10. Simple Linear Regression

✷ Equation of linear regression line ✷ Coefficient of correlation

✷ Computations using scientific calculator


Chapter 1

Descriptive Statistics

Example 1.1 (Data plot and presentation ⋆)

The following table summarizes the heights (in cm) recorded from a sample of 15 people:

Person   Height   Person   Height
1        174      9        164
2        176      10       188
3        167      11       175
4        162      12       142
5        146      13       158
6        181      14       166
7        198      15       169
8        169

(a) Construct a stem-and-leaf plot of the heights of the sample.

(b) Compute the mean and standard deviation of the heights of the sample using a calculator.

(c) Identify if there is/are any outlying value(s) in the sample.

(d) Construct a box-plot and identify the values constituting the 5-number summary.

Solution

(a)

Stem (in 10 cm) Leaf (in 1 cm)

14 2 6

15 8

16 2 4 6 7 9 9

17 4 5 6

18 1 8

19 8

(b) By using a calculator (verify yourself!),

Mean = 169, Standard deviation = $s_{n-1}$ ≈ 14.4469.


(c) Q1 = 162, Q2 = 169, Q3 = 176, so IQR = inter-quartile range = 176 − 162 = 14.

Q1 − 1.5 × IQR = 162 − 1.5 × 14 = 141 and Q3 + 1.5 × IQR = 176 + 1.5 × 14 = 197.

As all data values except 198 lie between 141 and 197, there is only one outlying value, namely 198.

(d) The 5 numbers are: Min, Q1, Q2, Q3 and Max. With the outlying value 198 set aside, they are

142, 162, 169, 176 and 188.

You may use these 5 numbers to construct the box-plot, marking the outlying value 198 as a separate point beyond the upper whisker.

✷
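The figures in Example 1.1 can be cross-checked with a short script. This is only a sketch: the helper `quartiles` is a hypothetical name, and it assumes the median-of-halves rule (the median is left out of both halves when the sample size is odd), which is the rule that reproduces Q1 = 162 and Q3 = 176 above.

```python
from statistics import mean, median, stdev

heights = [174, 176, 167, 162, 146, 181, 198, 169,
           164, 188, 175, 142, 158, 166, 169]

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule (assumed here)."""
    data = sorted(values)
    half = len(data) // 2          # size of the lower and upper halves
    return median(data[:half]), median(data), median(data[-half:])

q1, q2, q3 = quartiles(heights)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [h for h in heights if h < low_fence or h > high_fence]

print("mean:", mean(heights), "sample s.d.:", round(stdev(heights), 4))
print("Q1, Q2, Q3:", q1, q2, q3, "IQR:", iqr, "outliers:", outliers)
```

The same helper applies unchanged to an even-sized sample such as the 16 salaries of Example 1.2.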


Example 1.2 (Data plot and presentation ⋆)

The following table summarizes the monthly salaries (in HK dollars) reported by a sample of 16 fresh university graduates:

Graduate   Monthly Salary   Graduate   Monthly Salary
1          14,000           9          8,500
2          8,200            10         13,600
3          10,500           11         8,800
4          11,000           12         11,500
5          10,000           13         9,900
6          13,300           14         12,600
7          9,500            15         9,000
8          12,500           16         13,000

(a) Construct a stem-and-leaf plot of the monthly salaries.

(b) Compute the mean and the sample standard deviation of the monthly salaries.

(c) Identify if there is/are any outlying value(s) in the data.

(d) Construct a box-plot and identify the values constituting the 5-number summary.

Solution

(a)

Stem (thousands) Leaf (hundreds)

8 2 5 8

9 0 5 9

10 0 5

11 0 5

12 5 6

13 0 3 6

14 0

(b) Mean = 10993.75, S.D. = $s_{n-1}$ = 1962.1311.

(c) $Q_2 = \text{Median} = \frac{10500 + 11000}{2} = 10750$, $Q_1 = \frac{9000 + 9500}{2} = 9250$, $Q_3 = \frac{12600 + 13000}{2} = 12800$.

IQR = Inter-Quartile Range = 12800 − 9250 = 3550.

Q1 − 1.5 × IQR = 9250 − 1.5 × 3550 = 3925 and Q3 + 1.5 × IQR = 12800 + 1.5 × 3550 = 18125.

As all data are in between 3925 and 18125, there is no outlying value.

(d) The 5 numbers are: 8200, 9250, 10750, 12800 and 14000. The box-plot is skipped here.

✷



Example 1.3 (Mean and variance ⋆)

In a marketing survey, a fashion company takes a sample of its retail stores. The sales of the stores in a certain month were as follows:

Sales (hundred thousands)   Class mark, x   Number of stores, f
0 – less than 2             1               2
2 – less than 4             3               5
4 – less than 6             5               11
6 – less than 8             7               16
8 – less than 10            9               12
10 – less than 12           11              5
12 – less than 14           13              2

(a) Evaluate from first principles the mean and the sample standard deviation of the monthly sales.

(b) Estimate, from the above table, the minimum monthly sales that a store must exceed to be in the top 30% of the sample.

(c) The company plans to distribute a new fashion product through stores having monthly sales of at least $880,000. Estimate, from the above table, the probability that a retail store will offer this new product.

Solution

(a) We evaluate the mean and the sample standard deviation from first principles:

$$n = \sum f = 53, \qquad \sum fx = 373, \qquad \sum fx^2 = 3021.$$

Using the above data,

$$\text{Mean} = \bar{x} = \frac{1}{n}\sum fx = \frac{373}{53} = 7.0377,$$

$$\text{Variance} = s^2 = \frac{1}{n-1}\left(\sum fx^2 - n\bar{x}^2\right) = \frac{1}{52}\left(3021 - 53 \times 7.0377^2\right) = 7.6144,$$

and hence Standard Deviation = s = 2.7594.

Alternatively, you might use your calculator to calculate the mean and the standard deviation of the given data directly. But you are reminded to get more familiar with how your calculator accepts data with the corresponding frequencies.

(b) First note that n × 30% = 53 × 0.3 = 15.9. We need to find the sales level x (hundred thousands) above which the top 30% of the stores lie. Since 15.9 − 2 − 5 = 8.9, the level x must be in the interval [8, 10), which has length 2 and contains 12 stores. To find x, we use the above table and assume linear proportionality within this interval, so that

$$\frac{8.9}{12} = \frac{10 - x}{2}.$$

Solving gives the monthly sales:

x = 8.5167 (hundred thousands).

(c)

$$P(X \ge 8.8) = \frac{2 + 5 + 12 \times \frac{1.2}{2}}{53} = \frac{14.2}{53} = 0.2679.$$

The probability is 26.79%.

✷
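The grouped-data arithmetic of Example 1.3 can be sketched in a few lines. The frequencies 5, 11 and 16 for the classes 2–4, 4–6 and 6–8 are an assumption: they are the values implied by Σf = 53, Σfx = 373 and Σfx² = 3021. Carried at full precision the variance comes out as about 7.6139 (s ≈ 2.7593); the small difference from 7.6144 comes from rounding the mean to 7.0377 before squaring.

```python
from math import sqrt

# class mark -> number of stores (5, 11, 16 are inferred from the stated sums)
freq = {1: 2, 3: 5, 5: 11, 7: 16, 9: 12, 11: 5, 13: 2}

n = sum(freq.values())
sum_fx = sum(f * x for x, f in freq.items())
sum_fx2 = sum(f * x * x for x, f in freq.items())

mean = sum_fx / n
variance = (sum_fx2 - n * mean ** 2) / (n - 1)
sd = sqrt(variance)

# (b) linear interpolation inside [8, 10) for the top 30% cut-off
stores_above_10 = freq[11] + freq[13]
x_cut = 10 - 2 * (0.30 * n - stores_above_10) / freq[9]

# (c) estimated share of stores with sales of at least 8.8
p_new_product = (stores_above_10 + freq[9] * (10 - 8.8) / 2) / n

print(n, sum_fx, sum_fx2)
print(round(mean, 4), round(variance, 4), round(sd, 4))
print(round(x_cut, 4), round(p_new_product, 4))
```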


Example 1.4 (Mean, median, quartiles and variance ⋆)

A manufacturer of metal alloys is concerned about customer complaints concerning the lack of uniformity in the melting points of one of the firm’s alloy filaments. Fifty filaments are selected and their melting points are determined. The following results are obtained:

320 326 325 318 322 320 329 317 316

331 320 320 317 329 316 308 321 319

322 335 318 313 327 314 329 323 327

323 324 314 308 305 328 330 322 310

324 314 312 318 313 320 324 311 317

325 328 319 310 324

(a) Construct a frequency distribution and display the histogram.

(b) Calculate the mean, median, quartiles, sample variance, and standard deviation. How many observations lie within one standard deviation of the mean? Within two standard deviations?

Solution

(a) The frequency distribution is given by

Class Frequency

301 – 305 1

306 – 310 4

311 – 315 7

316 – 320 15

321 – 325 12

326 – 330 9

331 – 335 2

Total 50

We skip the drawing of the histogram here.

(b) With the given raw data values, we find by a calculator that

Mean = $\bar{x}$ = 320.1, S.D. = $s_{n-1}$ = s = 6.750, Variance = $s^2$ = 45.561.

In principle, we may further find the median and the quartiles by sorting the given 50 data values in ascending order. However, with the help of the frequency distribution of part (a), we find that

Median = mean of 25th and 26th values = $\frac{1}{2}(320 + 320)$ = 320, Q2 = Median = 320,

Q1 = Median of the lower 25 values = 13th data value = 316,

Q3 = Median of the upper 25 values = 38th data value = 325, IQR = Q3 − Q1 = 9.

The interval within one standard deviation of the mean, which is given by

$(\bar{x} - s, \bar{x} + s)$ = (313.35, 326.85),

contains 31 data values, whereas the interval within two standard deviations, which is given by

$(\bar{x} - 2s, \bar{x} + 2s)$ = (306.60, 333.60),

contains 48 data values.

✷
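The counts in part (b) of Example 1.4 are easy to miscount by hand, so a quick check may help. The sketch below uses only the 50 values listed in the example; the helper name `count_within` is hypothetical.

```python
from statistics import mean, stdev

melting_points = [
    320, 326, 325, 318, 322, 320, 329, 317, 316,
    331, 320, 320, 317, 329, 316, 308, 321, 319,
    322, 335, 318, 313, 327, 314, 329, 323, 327,
    323, 324, 314, 308, 305, 328, 330, 322, 310,
    324, 314, 312, 318, 313, 320, 324, 311, 317,
    325, 328, 319, 310, 324,
]

m = mean(melting_points)
s = stdev(melting_points)          # sample standard deviation (divisor n - 1)

def count_within(k):
    """Number of observations strictly within k standard deviations of the mean."""
    return sum(1 for x in melting_points if abs(x - m) < k * s)

print(m, round(s, 3), round(s * s, 3))
print(count_within(1), count_within(2))
```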


Chapter 2

Probability

Example 2.1 (Multiplication rule ⋆)

How many ways are there to place 10 identical balls in 10 boxes of all different colors so that exactly one box is empty?

Solution The key point is “exactly one box is empty”. That is to say, among the 10 given boxes, one box is empty, one box contains two balls, and each of the remaining 8 boxes contains exactly one ball. There are 10 ways of choosing the empty box and 9 ways of choosing the box with two balls. In summary, there are

10 × 9 = 90

ways of placing the 10 indistinguishable balls in 10 boxes of all different colors so that exactly one box is empty. ✷

Remark We generalize the given question. How many ways are there to place n identical balls in n boxes of all different colors so that exactly one box is empty, where n is an integer larger than 1? Borrowing the same idea as above, there are

n × (n − 1) = n(n − 1)

ways of placing n indistinguishable balls in n boxes of all different colors so that exactly one box is empty.


(a) In how many ways can 6 people be lined up to get a bus?

(b) If 2 specific persons, among 6, insist on following each other, how many ways are possible?

(c) If 2 specific persons, among 6, refuse to follow each other, how many ways are possible?

Solution

(a) Required number of ways = 6! = 6× 5× 4× 3× 2× 1 = 720.

(b) Required number of ways = 5!× 2! = 240.

(c) Required number of ways = 6!− 5!× 2! = 480.

✷

Remarks (1) Remember that “people must all be different”. We are counting permutations without repetition. (2) If 2 specific persons insist on following each other, then treat them as “one”.
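The three counts for the bus queue can be confirmed by listing all 6! line-ups. In this sketch the labels A to F are hypothetical, with A and B standing for the two specific persons.

```python
from itertools import permutations

people = "ABCDEF"   # A and B are the two specific persons (hypothetical labels)

def adjacent(line):
    """True if A and B stand directly next to each other in the line."""
    return abs(line.index("A") - line.index("B")) == 1

lines = list(permutations(people))
total = len(lines)
together = sum(1 for line in lines if adjacent(line))
apart = total - together

print(total, together, apart)
```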


Example 2.3 (Addition rule ⋆)

Select 3 different digits from 0, 1, 2, 3, 4, 5 and 6.

(a) How many three-digit numbers can be formed?

(b) How many of these are odd numbers?

(c) How many are greater than 330?

Solution

(a) The digit in the hundreds position cannot be zero.

Required number of three-digit numbers = 6× 6× 5 = 180.

(b) The digit in the units position is odd and the digit in the hundreds position is not zero.

Required number of three-digit numbers = 5× 5× 3 = 75.

(c) Case 1. The digit in the hundreds position is greater than 3.

Number of three-digit numbers = 3× 6× 5 = 90.

Case 2. The digit in the hundreds position is 3 and the digit in the tens position is greater than 3.

Number of three-digit numbers = 1× 3× 5 = 15.

In the above, the two cases that we considered are mutually exclusive and exhaustive.

Required number of three-digit numbers = 90 + 15 = 105.

✷
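A brute-force enumeration offers a check on all three parts of Example 2.3. The sketch assumes, as the solution does, that the three digits are distinct and that the leading digit is non-zero.

```python
from itertools import permutations

# all three-digit numbers built from distinct digits 0..6 with a non-zero leading digit
numbers = [100 * a + 10 * b + c
           for a, b, c in permutations(range(7), 3) if a != 0]

count_all = len(numbers)
count_odd = sum(1 for n in numbers if n % 2 == 1)
count_above_330 = sum(1 for n in numbers if n > 330)

print(count_all, count_odd, count_above_330)
```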


12321, 234432, 11511 are examples of palindromic numbers. How many 5-digit numbers are palindromic?

Solution The number of all possible 5-digit palindromic numbers is given by

9 × 10 × 10 × 1 × 1 = 900.

✷


Consider a 3-digit combination lock with digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. How many 3-digit passwords are possible for the combination lock if consecutive repetition is not allowed? For example, 454 is an allowed password while 445 is not.

Solution The number of all possible 3-digit passwords is given by

10 × 9 × 9 = 810.

✷
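Both of the last two counts (5-digit palindromic numbers and lock passwords without consecutive repetition) are small enough to enumerate directly. The following is a sketch of such a check, not part of the original solutions.

```python
from itertools import product

# 5-digit numbers that read the same forwards and backwards
palindromes = [n for n in range(10000, 100000) if str(n) == str(n)[::-1]]

# 3-digit lock passwords in which no digit is immediately repeated
passwords = [p for p in product(range(10), repeat=3)
             if p[0] != p[1] and p[1] != p[2]]

print(len(palindromes), len(passwords))
```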

Example 2.6 (Multiplication rule ⋆⋆)

Find the number of ways in which 5 boys and 4 girls can be seated alternately in a row if, in particular, John and Mary have to sit next to each other (note that only one boy is named John and only one girl is named Mary among the 9 people).


Solution The seating plan must be in this form (G=girl, B=boy):

B G B G B G B G B.

Imagine that John is sitting in the 1st seat (Case 1; then what is the position of Mary?), in the 3rd seat (Case 2), in the 5th seat (Case 3), in the 7th seat (Case 4), or in the 9th seat, i.e., the last seat (Case 5). Case 1 (John sitting in the 1st seat) introduces

3! × 4!

different seating arrangements, while each case from Case 2 to Case 4 introduces

(3! × 4!) × 2

different seating arrangements. The last case, Case 5, introduces

3! × 4!

different seating arrangements. Hence, the total number of such arrangements is given by

(1 + 3 × 2 + 1) × 3! × 4! = 1,152.

✷

Remark In the above, the five cases that we considered are mutually exclusive and exhaustive.


A bookshelf contains 3 German books, 4 French books and 5 Chinese books in a row. Each book is different from one another. What is the number of arrangements in which no two Chinese books are next to each other?

Solution Align the German books and French books first. Putting these 3 + 4 = 7 books in a row creates 7 + 1 = 8 spaces (we count the space before the first book, the spaces between books and the space after the last book):

_ 1st book _ 2nd book _ 3rd book _ 4th book _ 5th book _ 6th book _ 7th book _

To guarantee that no two Chinese books are next to each other, we put them into these spaces, at most one book per space. The first Chinese book can be put into any of 8 spaces, the second into any of the 7 remaining spaces, etc., and the fifth Chinese book can be put into any of 4 spaces. Now, the non-Chinese books can be permuted in 7! ways. Thus the total number of permutations is

(8 × 7 × 6 × 5 × 4) × 7! = 33,868,800.

There are more than 33 million arrangements of the books. ✷


Two Germans, three French and four Chinese are to be seated in a row. What is the number of different seating arrangements in which no Chinese sits next to another Chinese but the two Germans sit next to each other?

Solution Treat the two Germans as “a single person”. Then align the Germans and the French first. Putting these 1 + 3 = 4 “people” together in a row creates 5 spaces (we count the space before the first, the spaces between them and the space after the last). To ensure that no two Chinese are seated next to each other, we put them into these spaces. The first Chinese can be seated in any of 5 spaces, the second in any of 4 spaces, the third in any of 3 spaces, and the fourth in any of 2 spaces. Now, the non-Chinese can be seated in 4! × 2! different ways. Thus, based on the given rules, the number of different seating arrangements is

(5 × 4 × 3 × 2) × 4! × 2! = 5,760.

The LHS of the above equation can be rewritten as

$$P^{5}_{4} \times P^{4}_{4} \times P^{2}_{2} = 5{,}760.$$

✷

Remark Compare the similarities and differences between Example 2.7 and Example 2.8. Could you catch the difference between the usage of “German books” and “German people”, respectively, in the two examples?
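The seating count of 5,760 for the two Germans, three French and four Chinese can be verified by exhaustive search over all 9! rows. This is a sketch only; the labels G1, F1, C1 and so on are hypothetical names for the nine distinct people.

```python
from itertools import permutations

people = ["G1", "G2", "F1", "F2", "F3", "C1", "C2", "C3", "C4"]

def valid(row):
    """Germans adjacent, and no two Chinese adjacent."""
    if abs(row.index("G1") - row.index("G2")) != 1:
        return False
    return not any(a[0] == "C" and b[0] == "C" for a, b in zip(row, row[1:]))

count = sum(1 for row in permutations(people) if valid(row))
print(count)
```

The book problem with 12 distinct books is too large for this approach (12! rows), which is one reason the gap-filling argument is preferable.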

Example 2.9 (Multiplication rule ⋆⋆⋆)

5 red marbles and 5 white marbles are to be placed in a row. All marbles are identical except for colors. At no point in the row may three or more consecutive marbles have the same color. How many such arrangements are possible?

Solution Let R and W denote red and white marbles respectively. Each color is split into runs of length 1 or 2, and the runs of the two colors alternate. In the table below, each product is (number of orderings of the red runs) × (number of orderings of the white runs) × (number of ways to interleave them).

             Run lengths        Permutation(s)
Case 1.      R: 1 1 1 1 1       1
  (i):       W: 1 1 1 1 1       1 × 1 × 2 = 2
  (ii):      W: 2 1 1 1         1 × 4 × 1 = 4
Case 2.      R: 2 1 1 1         4
  (i):       W: 1 1 1 1 1       4 × 1 × 1 = 4
  (ii):      W: 2 1 1 1         4 × 4 × 2 = 32
  (iii):     W: 2 2 1           4 × 3 × 1 = 12
Case 3.      R: 2 2 1           3
  (i):       W: 2 1 1 1         3 × 4 × 1 = 12
  (ii):      W: 2 2 1           3 × 3 × 2 = 18

The above 7 cases (Case 1 (i); Case 1 (ii); Case 2 (i); Case 2 (ii); Case 2 (iii); Case 3 (i); Case 3 (ii)) are mutually exclusive and exhaustive. The total number of such arrangements is simply the sum of all these numbers:

2 + 4 + 4 + 32 + 12 + 12 + 18 = 84.

✷

Remark We may change the given question: 5 red marbles and 4 white marbles are to be placed in a row. All marbles are identical except for colors. At no point in the row may three or more consecutive marbles have the same color. How many such arrangements are possible? What is the answer to this changed question?

Answer: 45
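Both marble counts (84 for 5 red and 5 white, 45 for 5 red and 4 white) can be checked by enumerating every placement of the red marbles. The function name `count_arrangements` and its `max_run` parameter are hypothetical; this is a sketch rather than the method used in the solution.

```python
from itertools import combinations

def count_arrangements(reds, whites, max_run=2):
    """Count rows of R/W marbles with no run longer than max_run."""
    n = reds + whites
    total = 0
    for red_positions in combinations(range(n), reds):
        row = ["W"] * n
        for i in red_positions:
            row[i] = "R"
        text = "".join(row)
        if "R" * (max_run + 1) not in text and "W" * (max_run + 1) not in text:
            total += 1
    return total

print(count_arrangements(5, 5), count_arrangements(5, 4))
```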

Example 2.10 (Inclusion-exclusion principle ⋆)

Consider an experiment that consists of six horses, numbered 1 through 6, running a race, and suppose that the sample space consists of the 6! possible orders in which the horses finish. Let A be the event that the number 1 horse is among the top three finishers, and let B be the event that the number 2 horse comes in second. How many outcomes are in the event A ∪ B?


Solution Since there are 5! = 120 outcomes in which the position of the number 1 horse is specified, it follows that n(A) = 3 × 120 = 360, where A is the event that the number 1 horse is among the top three finishers. Similarly, n(B) = 120, and n(A ∩ B) = 2 × 4! = 48, because the number 1 horse must then finish first or third. It follows from the inclusion-exclusion principle that

n(A ∪ B) = n(A) + n(B) − n(A ∩ B).

We obtain that n(A ∪ B) = 360 + 120 − 48 = 432.

✷
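The inclusion-exclusion count for the horse race can be confirmed by building the events A and B as sets of finishing orders. A minimal sketch:

```python
from itertools import permutations

orders = list(permutations(range(1, 7)))        # all 6! finishing orders

A = {o for o in orders if o.index(1) < 3}       # horse 1 in the top three
B = {o for o in orders if o[1] == 2}            # horse 2 finishes second

print(len(A), len(B), len(A & B), len(A | B))
```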

Example 2.11 (Round table ⋆)

12 people are randomly seated at a round table. In how many seating arrangements will John and Mary sit next to each other?

Solution Let us assume that only one male is named John and only one female is named Mary among the 12 people. Let us also assume that the 12 people are evenly spaced around a circle. We “cut” the circle at the location of John into a straight row as shown in the figure below. There are 2 possible cases of seating arrangements in which John and Mary sit next to each other.

[Figure: the circle cut at John into a row, with the seats of John and Mary marked. Case 1 and Case 2 show Mary on either side of John.]

The above two cases are mutually exclusive and exhaustive. In each case the remaining 10 people can be seated in 10! ways, so the total number of seating arrangements is

10! + 10! = 10! × 2 = 7,257,600

which is a large number (larger than 7 million). ✷

Example 2.12 (Round table ⋆⋆)

Assume that 2 married couples and one single man (five people in total) are seated randomly at a round table. How many seating arrangements can be made if no wife sits next to her husband?

Solution Denote the married couples and the single man as (A1, B1), (A2, B2) and B3, respectively. There are two possible cases of seating arrangements, shown in the following figure.

[Figure: seating diagrams for Case 1 and Case 2, each fixing the seats of A1 and B1 and showing two placements of A2 and B2, each marked ×2.]

The above two cases are mutually exclusive and exhaustive. The required number of different seating arrangements is given by

2× 2 + 2× 2 = 8.

✷


Example 2.13 (Permutation vs. combination ⋆)

A password code consists of six digits. How many different password codes may be formed from

(a) six digits chosen from {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} allowing repetition?

(b) six different digits chosen from {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}?

(c) six different digits chosen from {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, restricted to be in ascending order?

Solution

(a) There are

10 × 10 × 10 × 10 × 10 × 10 = $10^6$ = 1,000,000

different password codes.

(b) There are

10 × 9 × 8 × 7 × 6 × 5 = $P^{10}_{6}$ = 151,200

different password codes.

(c) Each selection (without order) of six different digits corresponds to exactly one password code, so there are

$$C^{10}_{6} = \frac{151200}{6!} = 210$$

different password codes. As an illustrative example, for the selected six digits {4, 6, 9, 2, 1, 5}, the corresponding (one and only one) password code is 124569.

✷

Example 2.14 (Combination rule ⋆)

A shipment of 12 computer monitors contains 3 defective ones. In how many ways can an office purchase 5 of these monitors and receive at least 2 of the defective monitors?

Solution

Required number of ways = n(two defective among five) + n(three defective among five)

$$= C^{3}_{2} \times C^{9}_{3} + C^{3}_{3} \times C^{9}_{2}$$

= 252 + 36 = 288.

✷


From a group of 7 women and 9 men, a committee consisting of 4 women and 5 men is being formed. How many different committees can be formed if two of the women in the group do not really like each other and refuse to serve on the committee together?

Solution It follows from the multiplication rule that there are

$$C^{7}_{4} \times C^{9}_{5} = 35 \times 126 = 4{,}410$$

possible committees consisting of 4 women and 5 men in total. However, according to the given question, two of the women in the group refuse to serve on the committee together, so there are

$$C^{2}_{0} \times C^{5}_{4} + C^{2}_{1} \times C^{5}_{3} = 5 + 20 = 25$$

groups of 4 women not containing both of the feuding women. Since there are $C^{9}_{5} = 126$ ways to choose the 5 men, it follows that, in this case, there are

25 × 126 = 3,150

possible committees. ✷
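The monitor and committee counts can both be reproduced with `math.comb`, and the committee count can additionally be checked by listing the admissible groups of women. In this sketch the women are labeled 0 to 6, with 0 and 1 standing for the two who refuse to serve together (hypothetical labels).

```python
from itertools import combinations
from math import comb

# Monitors: 5 chosen from 3 defective + 9 good, with at least 2 defective
at_least_two_defective = sum(comb(3, d) * comb(9, 5 - d) for d in (2, 3))

# Committees: 4 of 7 women (not both woman 0 and woman 1) and 5 of 9 men
women_groups = [g for g in combinations(range(7), 4)
                if not (0 in g and 1 in g)]
committees = len(women_groups) * comb(9, 5)

print(at_least_two_defective, len(women_groups), committees)
```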



A six-digit password code is palindromic if reversing it gives the same code. For example, both 321123 and 142241 are palindromic password codes, but 134413 is not. It follows that a palindromic password code can involve at most three different digits (i.e., abccba). How many palindromic six-digit password codes can be formed using some of the nine digits 1, 2, 3, 4, 5, 6, 7, 8, 9, if it is further required that one digit of the palindromic password code is used four times?

Solution If one digit is used 4 times in a palindromic code, another one must be used twice. These two digits may be chosen from {1, 2, 3, 4, 5, 6, 7, 8, 9} and there are

$C^{9}_{2} = 36$

ways of selecting them. Once the two digits have been chosen, say 1 and 2 for example, only 6 patterns are possible for the first three digits, which are

1 1 2     1 2 1     2 1 1

2 2 1     2 1 2     1 2 2

which correspond to the 6 password codes:

1 1 2 2 1 1     1 2 1 1 2 1     2 1 1 1 1 2

2 2 1 1 2 2     2 1 2 2 1 2     1 2 2 2 2 1

The total number of six-digit password codes satisfying the given conditions is given by

6× 36 = 216.

✷
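Since a palindromic code abccba is fixed by its first three digits, the count of 216 can be checked by looping over the 9³ choices of (a, b, c). A minimal sketch:

```python
from collections import Counter
from itertools import product

codes = []
for a, b, c in product("123456789", repeat=3):
    code = a + b + c + c + b + a            # palindromic by construction
    if 4 in Counter(code).values():         # some digit is used exactly four times
        codes.append(code)

print(len(codes))
```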

Example 2.17 (Combination rule ⋆⋆)

A certain bank assigns each credit card holder a four-digit PIN (personal identity number). Each PIN is composed of 4 digits using any of the following digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. How many different PINs are there if the first digit cannot be zero and no digit may occur more than twice?

Solution All of the 9× 10× 10× 10 = 9,000 possible PINs are allowed (first digit is nonzero) except:

(i) the 9 PINs where all four digits are identical: 1111, 2222, 3333, · · · , 9999.

(ii) those where one digit occurs three times and another just once. There are $C^{10}_{2} = 45$ ways of choosing any two digits, say 1 and 2. Note that there are 8 different PINs once the two digits are fixed:

1222 2122 2212 2221

2111 1211 1121 1112

But be careful that the PINs of the following 4 patterns must be omitted since the first digit may not be zero (where a can be any digit from 1, 2, 3, 4, 5, 6, 7, 8, 9):

0aaa 0a00 00a0 000a

There are 45 × 8 − 4 × 9 = 324 PINs of this type.

The required number of all possible PINs is given by

9,000− 9− 324 = 8,667.

✷
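With only 9,000 candidate PINs, the subtraction 9,000 − 9 − 324 can be checked by direct enumeration. The sketch below tallies the PINs by the largest number of times any digit occurs.

```python
from collections import Counter

pins = [f"{n:04d}" for n in range(1000, 10000)]     # first digit is non-zero

by_max_repeat = Counter(max(Counter(p).values()) for p in pins)
allowed = by_max_repeat[1] + by_max_repeat[2]       # no digit more than twice

print(len(pins), by_max_repeat[4], by_max_repeat[3], allowed)
```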


Example 2.18 (Combinations rule ⋆)

A standard pack of 52 poker cards consists of 4 suits (Spades ♠, Hearts ♥, Diamonds ♦ and Clubs ♣). A well-shuffled pack is dealt to 4 players so that each receives 13 cards. What is the number of ways that each player receives at least 3 Spades? You may have your answer in terms of factorials.

Solution The Spades must be divided 3, 3, 3, 4 among the players. Any one of the 4 players can be the person who receives 4 Spades, so there are 4 ways for this. The number of ways that each player receives at least 3 Spades is given by

$$4 \times C^{13}_{3} C^{39}_{10} \times C^{10}_{3} C^{29}_{10} \times C^{7}_{3} C^{19}_{10} \times C^{4}_{4} C^{9}_{9}$$

$$= 4 \times \frac{13!}{3!\,10!} \frac{39!}{10!\,29!} \times \frac{10!}{3!\,7!} \frac{29!}{10!\,19!} \times \frac{7!}{3!\,4!} \frac{19!}{10!\,9!} \times 1$$

$$= \frac{13!\,39!}{9!\,(3!)^4\,(10!)^3}.$$

We have no idea about the numerical value of the above expression without the help of computer software. But it is clear enough that it must be a (large) integer. In fact I have used the computer software Mathematica to successfully find its value:

5,652,079,478,333,572,557,297,024,000 ≈ 5.652 octillion

≈ 5,652 trillion trillion.

How much is a trillion (dollars)? Check this out: http://www.pagetutor.com/trillion ✷

Remark We may change the given question: · · · What is the number of ways that each player receives at least 3 Spades and at least 3 Hearts? What is the answer to this changed question?

Answer:

$$C^{13}_{4} C^{13}_{4} C^{26}_{5} \times C^{9}_{3} C^{9}_{3} C^{21}_{7} \times C^{6}_{3} C^{6}_{3} C^{14}_{7} \times 4 \;+\; C^{13}_{4} C^{13}_{3} C^{26}_{6} \times C^{9}_{3} C^{10}_{4} C^{20}_{6} \times C^{6}_{3} C^{6}_{3} C^{14}_{7} \times 12.$$
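Python integers have arbitrary precision, so the Spades count of Example 2.18 can be evaluated exactly without special software. The sketch below computes the product of binomial coefficients and compares it with the simplified factorial form; the variable names are hypothetical.

```python
from math import comb, factorial

# product form: 4 choices for the player with four Spades, then deal hand by hand
ways = 4 * (comb(13, 3) * comb(39, 10)
            * comb(10, 3) * comb(29, 10)
            * comb(7, 3) * comb(19, 10)
            * comb(4, 4) * comb(9, 9))

# simplified form 13! 39! / (9! (3!)^4 (10!)^3)
closed_form = factorial(13) * factorial(39) // (
    factorial(9) * factorial(3) ** 4 * factorial(10) ** 3)

print(f"{ways:,}")
print(ways == closed_form)
```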

Example 2.19 (Combination rule ⋆⋆)

Ants love sweets. The figure below shows a network of routes from an ant’s initial position (labeled A) to the place of sweets (labeled C). Assume that the ant can only go either east or north at each junction (such as B).

[Figure: grid of routes from A to C passing through junctions such as B, with the directions North and East indicated.]

(a) How many routes from A to C can the ant choose?

(b) Once the ant arrives at a junction, the probability that it goes north is 0.4.

(i) Find the probability that the ant will arrive at C.

(ii) Find the probability that the ant will arrive at C without passing through the junction B.

Solution Let N denote “North” and E denote “East”.


(a) Number of routes = n(permutations of 4 E’s and 3 N’s) = $C^{7}_{3}$ = 35.

(b) (i) P(Ant will arrive at C) = $C^{7}_{3} \times (0.4)^3 (0.6)^4$ = 0.2903.

(ii) P(Ant will arrive at C without passing B) = $\left(C^{7}_{3} - C^{3}_{1} \times C^{4}_{2}\right) \times (0.4)^3 (0.6)^4$ = 0.1410.

✷
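The route counts and probabilities can be recomputed as follows. This is a sketch under an assumption: the figure is not reproduced here, so the position of B (2 steps east and 1 step north of A) is inferred from the factor $C^{3}_{1} \times C^{4}_{2}$ in the solution.

```python
from math import comb

EAST, NORTH = 4, 3            # steps needed from A to C
B_EAST, B_NORTH = 2, 1        # assumed position of B, inferred from the solution

def paths(east, north):
    """Number of east/north routes covering the given numbers of steps."""
    return comb(east + north, north)

all_routes = paths(EAST, NORTH)
routes_via_b = paths(B_EAST, B_NORTH) * paths(EAST - B_EAST, NORTH - B_NORTH)

p_one_route = 0.4 ** NORTH * 0.6 ** EAST      # probability of one specific route
p_reach_c = all_routes * p_one_route
p_avoid_b = (all_routes - routes_via_b) * p_one_route

print(all_routes, routes_via_b, round(p_reach_c, 4), round(p_avoid_b, 4))
```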

Example 2.20 (Permutations with repetition ⋆)

How many different letter arrangements can be made from the letters in the word PROBABILITY?

Solution By permutations with repetition, there are

$$\frac{11!}{1!\,2!\,2!\,1!\,1!\,1!\,1!\,1!\,1!} = 9{,}979{,}200$$

different letter arrangements. Here we have 11 letters in total, of which 2 letters (B, I) appear twice each, and all remaining letters (A, L, O, P, R, T, Y) appear once each. ✷


In how many ways can 8 graduate students be assigned to one double and two triple hotel rooms during a conference?

Solution By permutations with repetition, there are

$$\frac{8!}{2!\,3!\,3!} = 560$$

different hotel-room arrangements. ✷
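Both of the last two answers are multinomial coefficients, so one small helper covers them. The function name `multinomial` is hypothetical, and the hotel-room count assumes, as the solution does, that the two triple rooms are distinguishable.

```python
from collections import Counter
from math import factorial

def multinomial(counts):
    """(sum of counts)! divided by the product of the factorials of the counts."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

letter_arrangements = multinomial(Counter("PROBABILITY").values())
room_assignments = multinomial([2, 3, 3])

print(letter_arrangements, room_assignments)
```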


Six identical fair dice are rolled and subsequently arranged in a row. How many different arrangements give three pairs? (“Three pairs” means, for example, “a pair of 1, a pair of 2 and a pair of 5”.)

Solution “Three pairs” means a choice of 3 numbers out of the 6 numbers from 1 to 6. One can now ask the question “which three pairs”? The number of possible answers is $C^{6}_{3} = 20$. Now we can focus on one of the 20 cases, say {1, 2, 5}, and count the number of ways of getting “a pair of 1, a pair of 2 and a pair of 5”. The number of ways that the 6 dice can show the pattern (1, 1, 2, 2, 5, 5) is given by

$$\frac{6!}{2!\,2!\,2!} = 90.$$

Finally, we multiply this by the number of choices of the 3 numbers to get

$$C^{6}_{3} \times \frac{6!}{2!\,2!\,2!} = 20 \times 90 = 1{,}800.$$

✷

Remark Six identical fair dice are rolled and subsequently arranged in a row. There are a total of

$6^6$ = 46,656

different arrangements. Among all these arrangements, 1,800 belong to the class of “three pairs”. However, “three pairs” is only one class of arrangements. In the following we find all mutually exclusive and exhaustive classes of arrangements.

Class              An example     No. of arrangements
Three pairs:       2 2 3 3 5 5    $C^{6}_{3} \times (C^{6}_{2} \times C^{4}_{2} \times C^{2}_{2}) = 1{,}800$
Two pairs:         2 2 3 3 4 5    $(C^{6}_{2} \times C^{4}_{2}) \times (C^{6}_{2} \times C^{4}_{2} \times 2!) = 16{,}200$
One pair:          2 2 3 4 5 6    $(C^{6}_{1} \times C^{5}_{4}) \times (C^{6}_{2} \times 4!) = 10{,}800$
No pair:           1 2 3 4 5 6    $6! = 720$
Three of a kind:   2 2 2 3 4 5    $(C^{6}_{1} \times C^{5}_{3}) \times (C^{6}_{3} \times 3!) = 7{,}200$
Four of a kind:    2 2 2 2 4 5    $(C^{6}_{1} \times C^{5}_{2}) \times (C^{6}_{4} \times 2!) = 1{,}800$
Five of a kind:    2 2 2 2 2 5    $(C^{6}_{1} \times C^{5}_{1}) \times (C^{6}_{5} \times 1) = 180$
Six of a kind:     2 2 2 2 2 2    $C^{6}_{1} = 6$
Three and Two:     2 2 2 3 3 4    $(C^{6}_{1} \times C^{5}_{1} \times C^{4}_{1}) \times (C^{6}_{3} \times C^{3}_{2} \times 1) = 7{,}200$
Four and Two:      2 2 2 2 4 4    $(C^{6}_{1} \times C^{5}_{1}) \times (C^{6}_{4} \times 1) = 450$
Three and Three:   2 2 2 3 3 3    $C^{6}_{2} \times C^{6}_{3} = 300$

Adding the numbers of arrangements altogether gives

1,800 + 16,200 + 10,800 + 720 + 7,200 + 1,800 + 180 + 6 + 7,200 + 450 + 300 = 46,656.
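The whole classification can be cross-checked by enumerating all 6⁶ rolls and grouping them by how many times each face occurs. In the sketch below each class is identified by its sorted tuple of multiplicities, for example (2, 2, 2) for “three pairs” and (3, 2, 1) for “three and two”; this encoding is a choice made here for illustration.

```python
from collections import Counter
from itertools import product

classes = Counter()
for roll in product(range(1, 7), repeat=6):
    # multiplicities of the faces, largest first, e.g. (2, 2, 1, 1) for two pairs
    pattern = tuple(sorted(Counter(roll).values(), reverse=True))
    classes[pattern] += 1

for pattern, count in sorted(classes.items(), reverse=True):
    print(pattern, count)
print(sum(classes.values()))
```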

Example 2.23 (Permutations with repetition ⋆⋆)

A Personal Identification Number (PIN) consists of five digits in order, each of which may be any one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Find the number of PINs satisfying each of the following requirements.

(a) All five digits are different.

(b) There are exactly four different digits being used.

(c) There are exactly three different digits being used, two of which occur twice.

(d) Exactly one of the digits occurs three times.

Solution

(a) n(PINs with all five digits different) = $P^{10}_{5}$ = 30,240.

(b) n(PINs with exactly four different digits being used) = $C^{10}_{1} \times C^{9}_{3} \times \frac{5!}{2!\,1!\,1!\,1!}$ = 50,400.

(c) n(PINs with exactly three different digits being used, two of which occur twice)

= $C^{10}_{2} \times C^{8}_{1} \times \frac{5!}{2!\,2!\,1!}$ = 10,800.

(d) n(PINs in which exactly one digit occurs three times) = $C^{10}_{3} \times C^{3}_{1} \times \frac{5!}{3!\,1!\,1!} + C^{10}_{2} \times C^{2}_{1} \times \frac{5!}{3!\,2!}$ = 8,100.

✷
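All four parts of Example 2.23 can be checked by tallying the 10⁵ possible PINs by their digit multiplicities. As a sketch, each PIN is summarized by the sorted tuple of how often its digits occur, so part (d) is the sum of the patterns (3, 1, 1) and (3, 2).

```python
from collections import Counter
from itertools import product

tally = Counter()
for pin in product(range(10), repeat=5):
    tally[tuple(sorted(Counter(pin).values(), reverse=True))] += 1

part_a = tally[(1, 1, 1, 1, 1)]
part_b = tally[(2, 1, 1, 1)]
part_c = tally[(2, 2, 1)]
part_d = tally[(3, 1, 1)] + tally[(3, 2)]

print(part_a, part_b, part_c, part_d)
```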


Example 2.24 (Combination with repetition ⋆)

Let n, x1, x2, · · · , xr be positive integers. How many distinct integer-valued solutions of

x1 + x2 + · · ·+ xr = n (n > r)

are possible?

Solution We rewrite the given equation as

$$x_1 + x_2 + \cdots + x_r = n = \overbrace{1 + 1 + \cdots + 1}^{\text{sum of } n \text{ ones}}.$$

On the rightmost side of the above equation there are (n − 1) plus-signs “+”, whereas on the leftmost side there are (r − 1) plus-signs “+”. Choosing which (r − 1) of the (n − 1) plus-signs separate x1, x2, · · · , xr determines a solution, so the total number of distinct integer-valued solutions to the given equation is given by

$$C^{n-1}_{r-1}.$$

✷

Remark You may get stuck on why the answer is given by $C^{n-1}_{r-1}$. We now give the reasoning and the details through an illustrative example. Let x1, x2, x3 be positive integers. How many distinct integer-valued solutions of

x1 + x2 + x3 = 6

are possible? The answer to this (simple) question is 10. For illustration, we can of course list all these integer-valued solutions in the table below. In the last column, the two plus-signs outside the brackets are the ones that separate x1, x2 and x3.

Solution No.   x1   x2   x3   x1 + x2 + x3 = 6
1.             1    1    4    (1) + (1) + (1 + 1 + 1 + 1) = 6
2.             1    2    3    (1) + (1 + 1) + (1 + 1 + 1) = 6
3.             1    3    2    (1) + (1 + 1 + 1) + (1 + 1) = 6
4.             1    4    1    (1) + (1 + 1 + 1 + 1) + (1) = 6
5.             2    1    3    (1 + 1) + (1) + (1 + 1 + 1) = 6
6.             2    2    2    (1 + 1) + (1 + 1) + (1 + 1) = 6
7.             2    3    1    (1 + 1) + (1 + 1 + 1) + (1) = 6
8.             3    1    2    (1 + 1 + 1) + (1) + (1 + 1) = 6
9.             3    2    1    (1 + 1 + 1) + (1 + 1) + (1) = 6
10.            4    1    1    (1 + 1 + 1 + 1) + (1) + (1) = 6

Apart from carefully listing all these solutions (so that after counting we know the answer is 10), there is a quicker and more elegant method to “count” the total number of solutions (= 10). This can be done by selecting two plus-signs “+” from the available five “+”. Look at the column “x1 + x2 + x3” in the above table for details. The total number of solutions to the equation x1 + x2 + x3 = 6 is then given by the total number of such selections. Counting the number of selections when you select 2 from 5 distinct objects gives $C^{5}_{2}$, which is equal to 10.


Let n be a positive integer. Let x1, x2, · · · , xr be non-negative integers (i.e., either positive or zero). How many distinct integer-valued solutions of

x1 + x2 + · · ·+ xr = n

are possible?


Solution Denote $y_i = x_i + 1$ for all $i$, so that every $y_i$ is a positive integer. Next rewrite the equation $x_1 + x_2 + \cdots + x_r = n$ as

\[ (x_1 + 1) + (x_2 + 1) + \cdots + (x_r + 1) = n + r, \]

\[ y_1 + y_2 + \cdots + y_r = n + r. \]

Our focus is on the last equation, whose right-hand side can be rewritten as

\[ y_1 + y_2 + \cdots + y_r = n + r = \overbrace{1 + 1 + 1 + \cdots + 1}^{\text{summing } (n+r) \text{ number of ``1''s}}. \]

On the right-hand side of the above equation there are (n + r − 1) plus signs “+”, whereas on the left-hand side there are (r − 1) plus signs “+”. The total number of distinct integer-valued solutions to the given equation is therefore

\[ C^{n+r-1}_{r-1}. \]

✷

Remark We revisit the last illustrative example. Let $x_1, x_2, x_3$ be non-negative integers. How many distinct integer-valued solutions of

\[ x_1 + x_2 + x_3 = 6 \]

are possible? The answer to this question is 28. We may list all these integer-valued solutions in the table below.

Solution No. x1 x2 x3 Solution No. x1 x2 x3

1. 0 0 6 15. 2 1 3

2. 0 1 5 16. 2 2 2

3. 0 2 4 17. 2 3 1

4. 0 3 3 18. 2 4 0

5. 0 4 2 19. 3 0 3

6. 0 5 1 20. 3 1 2

7. 0 6 0 21. 3 2 1

8. 1 0 5 22. 3 3 0

9. 1 1 4 23. 4 0 2

10. 1 2 3 24. 4 1 1

11. 1 3 2 25. 4 2 0

12. 1 4 1 26. 5 0 1

13. 1 5 0 27. 5 1 0

14. 2 0 4 28. 6 0 0

Again, the number of solutions can be deduced by selecting two “+” from the available eight “+”. For example,

Solution No.   x1   x2   x3   y1 + y2 + y3 = (x1 + 1) + (x2 + 1) + (x3 + 1) = 9
 1.            0    0    6    (1) + (1) + (1 + 1 + 1 + 1 + 1 + 1 + 1) = 9

The total number of solutions to the equation $x_1 + x_2 + x_3 = 6$ (where $x_1, x_2, x_3$ are non-negative integers), which is equal to the total number of solutions to the equation $y_1 + y_2 + y_3 = 9$ (where $y_1, y_2, y_3$ are positive integers), is given by the total number of such selections. Selecting 2 from 8 distinct objects gives $C^{8}_{2} = 28$.



You have a box with red sweets, a box with yellow sweets and a box with black sweets. In how many ways can you choose 10 sweets from these 3 boxes if you must take at least one sweet of each color? Assume that each box has a lot of sweets.

Solution The question is equivalent to finding the number of positive integer-valued solutions to

\[ x_1 + x_2 + x_3 = 10, \]

where $x_1, x_2, x_3$ are all positive integers. By Example 2.24 (page 21), we know that

\[ \text{total number of ways} = C^{10-1}_{3-1} = C^{9}_{2} = 36. \]

Note that this question is a distribution problem and thus a combination-with-repetition problem. ✷


There are five flavors of ice cream: banana, chocolate, lemon, strawberry and vanilla. We can have three scoops. How many variations will there be?

Solution Denote the five flavors as: b, c, l, s, v. Examples of selections of three scoops include

{c, c, c} means 3 scoops of chocolate,
{b, l, v} means one each of banana, lemon and vanilla,
{b, v, v} means one of banana, two of vanilla.

Note that in the above notation the order does not matter: for example, {b, l, v} = {l, v, b} = {v, b, l} all mean one each of banana, lemon and vanilla. The five-flavor ice cream problem is equivalent to finding the number of non-negative integer-valued solutions to

\[ x_1 + x_2 + x_3 + x_4 + x_5 = 3, \]

where $x_1, \dots, x_5$ are non-negative integers counting the scoops of b, c, l, s, v respectively. This question is also equivalent to distributing 3 identical ping-pong balls into 5 differently colored containers (some containers can be empty), and we are interested in the number of different ways of doing so. Each way represents a possible combination (repetition is allowed). For example,

$(x_1, x_2, x_3, x_4, x_5) = (0, 3, 0, 0, 0)$ is equivalent to {c, c, c},
$(x_1, x_2, x_3, x_4, x_5) = (1, 0, 1, 0, 1)$ is equivalent to {b, l, v},
$(x_1, x_2, x_3, x_4, x_5) = (1, 0, 0, 0, 2)$ is equivalent to {b, v, v}.

By Example 2.25 (page 21), we know that

\[ \text{total number of variations} = C^{3+5-1}_{5-1} = C^{7}_{4} = 35. \]

✷
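The ice cream selections can also be listed directly. The sketch below uses the standard-library function `itertools.combinations_with_replacement`; the variable names `flavors` and `selections` are chosen here for illustration only.

```python
from itertools import combinations_with_replacement
from math import comb

flavors = ["b", "c", "l", "s", "v"]

# Every unordered choice of 3 scoops, repetition allowed.
selections = list(combinations_with_replacement(flavors, 3))

print(len(selections))         # number of listed selections
print(comb(3 + 5 - 1, 5 - 1))  # value given by the formula
```

Each tuple in `selections` is one multiset of scoops, for instance `("b", "v", "v")` for one banana and two vanilla.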


Find the number of integer-valued solutions to

\[ x_1 + x_2 + x_3 + x_4 = 100, \]

where $x_1 \geq 30$, $x_2 > 21$, $x_3 \geq 1$ and $x_4 \geq 0$.

Solution Let $y_1 = x_1 - 29 \geq 1$, $y_2 = x_2 - 21 \geq 1$, $y_3 = x_3 \geq 1$, $y_4 = x_4 + 1 \geq 1$. Next rewrite the given equation $x_1 + x_2 + x_3 + x_4 = 100$ as

\[ (x_1 - 29) + (x_2 - 21) + (x_3) + (x_4 + 1) = 100 - 29 - 21 + 1, \]

\[ y_1 + y_2 + y_3 + y_4 = 51. \]

Since all $y_i$ are positive integers, by Example 2.24 (page 21),

\[ \text{total number of ways} = C^{51-1}_{4-1} = C^{50}_{3} = 19{,}600. \]

✷


Example 2.29 (Combination with repetition ⋆⋆ )

How many non-negative integer-valued solutions are there to the inequality

\[ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 < 10 ? \]

Solution The question is equivalent to finding the number of integer-valued solutions to the equation

\[ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 = 10, \]

where $x_1, x_2, x_3, x_4, x_5, x_6 \geq 0$ and $x_7 > 0$ (the extra variable $x_7$ absorbs the gap between the left-hand side of the inequality and 10). Let $y_i = x_i \geq 0$ for $1 \leq i \leq 6$ and $y_7 = x_7 - 1 \geq 0$. Next rewrite the above equation as

\[ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + (x_7 - 1) = 10 - 1, \]

\[ y_1 + y_2 + y_3 + y_4 + y_5 + y_6 + y_7 = 9. \]

By Example 2.25 (page 21),

\[ \text{total number of ways} = C^{9+7-1}_{7-1} = C^{15}_{6} = 5{,}005. \]

✷


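In how many ways can we distribute 12 identical folders into 5 distinct drawers such that the last drawer has at most 3 folders in it? (Assume that the drawers can be empty.)

Solution The question is equivalent to finding the number of distinct integer solutions to

\[ x_1 + x_2 + x_3 + x_4 + x_5 = 12 \quad \text{in which } x_5 \in \{0, 1, 2, 3\} \text{ and } x_1, x_2, x_3, x_4 \geq 0. \]

It follows from Example 2.25 (page 21) that when

$x_5 = 0$: $x_1 + x_2 + x_3 + x_4 = 12$, $x_1, x_2, x_3, x_4 \geq 0$. The number of solutions is $C^{12+4-1}_{4-1} = C^{15}_{3}$.

$x_5 = 1$: $x_1 + x_2 + x_3 + x_4 = 11$, $x_1, x_2, x_3, x_4 \geq 0$. The number of solutions is $C^{11+4-1}_{4-1} = C^{14}_{3}$.

$x_5 = 2$: $x_1 + x_2 + x_3 + x_4 = 10$, $x_1, x_2, x_3, x_4 \geq 0$. The number of solutions is $C^{10+4-1}_{4-1} = C^{13}_{3}$.

$x_5 = 3$: $x_1 + x_2 + x_3 + x_4 = 9$, $x_1, x_2, x_3, x_4 \geq 0$. The number of solutions is $C^{9+4-1}_{4-1} = C^{12}_{3}$.

\[ \text{Total number of ways} = C^{15}_{3} + C^{14}_{3} + C^{13}_{3} + C^{12}_{3} = 455 + 364 + 286 + 220 = 1{,}325. \]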

✷


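In how many ways can we distribute eight identical balls into four distinct containers so that the fourth container has an odd number of balls in it? (Assume that the other containers can be empty.)

Solution This question is equivalent to finding the number of distinct integer solutions to

\[ x_1 + x_2 + x_3 + x_4 = 8 \quad \text{in which } x_1, x_2, x_3 \geq 0 \text{ and } x_4 = 1, 3, 5 \text{ or } 7. \]

It follows from Example 2.25 (page 21) that when

$x_4 = 1$: $x_1 + x_2 + x_3 = 7$, $x_1, x_2, x_3 \geq 0$. The number of solutions is $C^{7+3-1}_{3-1} = C^{9}_{2}$.

$x_4 = 3$: $x_1 + x_2 + x_3 = 5$, $x_1, x_2, x_3 \geq 0$. The number of solutions is $C^{5+3-1}_{3-1} = C^{7}_{2}$.

$x_4 = 5$: $x_1 + x_2 + x_3 = 3$, $x_1, x_2, x_3 \geq 0$. The number of solutions is $C^{3+3-1}_{3-1} = C^{5}_{2}$.

$x_4 = 7$: $x_1 + x_2 + x_3 = 1$, $x_1, x_2, x_3 \geq 0$. The number of solutions is $C^{1+3-1}_{3-1} = C^{3}_{2}$.

\[ \text{Total number of ways} = C^{9}_{2} + C^{7}_{2} + C^{5}_{2} + C^{3}_{2} = 36 + 21 + 10 + 3 = 70. \]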

✷
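The case-by-case sums in the folder and ball problems follow one pattern: fix the restricted variable, then count non-negative solutions for the remaining ones. A minimal sketch, assuming the hypothetical helper names `nonneg_solutions` and `count_with_restricted_last`:

```python
from math import comb


def nonneg_solutions(total: int, variables: int) -> int:
    """Number of non-negative integer solutions of x1 + ... + x_variables = total."""
    if total < 0:
        return 0
    return comb(total + variables - 1, variables - 1)


def count_with_restricted_last(total: int, free_variables: int, allowed_last) -> int:
    """Sum over each allowed value of the last variable; the rest are unrestricted (>= 0)."""
    return sum(nonneg_solutions(total - v, free_variables) for v in allowed_last)


folders = count_with_restricted_last(12, 4, [0, 1, 2, 3])  # last drawer holds at most 3
balls = count_with_restricted_last(8, 3, [1, 3, 5, 7])     # fourth container holds an odd number
print(folders, balls)
```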


Example 2.32 (Sudoku counting ⋆⋆⋆ )

Sudoku puzzles usually start with some hint entries. If you start with a completely empty 4 × 4 Sudoku board, how many different ways are there to fill it in?

The figure above, which shows a completed Sudoku board, is counted as one way of filling.

Solution Method 1 (Wrong answer just for illustration)

\[ \text{Total number of distinct Sudoku} = \frac{16!}{4!\,4!\,4!\,4!} = 63{,}063{,}000. \]

Method 1 is incorrect and the answer is wrong. (Why?)

Method 2 (Wrong answer just for illustration)

1st Row −→ 4× 3× 2× 1

2nd Row −→ 2× 1× 2× 1

3rd Row −→ 2× 2× 1× 1

4th Row −→ 1× 1× 1× 1

Total number of distinct Sudoku = 24× 4× 4 = 384.

Method 2 is incorrect and the answer is wrong. (Why?)

Method 3 (Suggested method)

Step 1.

First row: 4! = 24 possibilities.

Step 2.

To fill the first block there are now 2 possibilities (i.e., 3 4 or 4 3).


Step 3.

[Figure: the two boards for cases (a) and (b)]

To fill the second block there are 2 possible cases, (a) and (b) (respectively, 1 2 and 2 1).

Step 4.

[Figure: the two boards for cases (a) and (b)]

For case (a), there are 4 possibilities for the third row:

2 1 4 3, 2 3 4 1, 4 1 2 3, 4 3 2 1.

For case (b), there are only 2 possibilities for the third row:

2 1 4 3, 4 3 1 2.

Step 5 (final step).

\[ \text{Total number of distinct Sudoku} = 4! \times 2 \times (4 + 2) = 288. \]

Method 4 (Alternative method)

[Figure: four boards filled in successive stages, labelled 4!, ×2, ×2 and ×3]

In the third Sudoku board, the symbol ∗ can be either 1, 2 or 3 (3 possibilities).

Total number of distinct Sudoku = 4!× 2× 2× 3 = 288.

✷
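The count of 288 can be confirmed by exhaustive search. The following is a sketch of a plain backtracking counter (the function name `count_sudoku_4x4` is chosen here for illustration and does not come from the notes); it fills the 16 cells one by one and rejects any digit already present in the same row, column or 2 × 2 block.

```python
def count_sudoku_4x4() -> int:
    """Count all completed 4 x 4 Sudoku boards by backtracking."""
    grid = [[0] * 4 for _ in range(4)]  # 0 marks an empty cell

    def allowed(row: int, col: int, digit: int) -> bool:
        if any(grid[row][c] == digit for c in range(4)):
            return False
        if any(grid[r][col] == digit for r in range(4)):
            return False
        top, left = 2 * (row // 2), 2 * (col // 2)
        return all(
            grid[r][c] != digit
            for r in range(top, top + 2)
            for c in range(left, left + 2)
        )

    def fill(cell: int) -> int:
        if cell == 16:  # every cell is filled: one complete board
            return 1
        row, col = divmod(cell, 4)
        total = 0
        for digit in range(1, 5):
            if allowed(row, col, digit):
                grid[row][col] = digit
                total += fill(cell + 1)
                grid[row][col] = 0  # undo and try the next digit
        return total

    return fill(0)


print(count_sudoku_4x4())
```

The same search does not scale to the 9 × 9 board; it is meant only as a check of the hand count.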

Remark How many distinct completed 9 × 9 Sudoku grids are there? Many more than you can imagine! There are

6,670,903,752,021,072,936,960 distinct Sudoku grids.

And how many of them are essentially distinct (inequivalent)? We can create many Sudoku grids out of a given one by: transposing or flipping it, interchanging “stacks”, interchanging “bands”, interchanging columns within a stack, interchanging rows within a band, relabeling the digits, or rotating it. All these changes define group actions on the set of all Sudoku grids. In fact,

5,472,730,538 of the above are essentially distinct.

Felgenhauer & Jarvis, Mathematics of Sudoku I (2006); Russell & Jarvis, Mathematics of Sudoku II (2006)


Example 2.33 (Mark Six ⋆⋆⋆ )

In Mark Six, 6 numbers are drawn out of a possible 49 (the “extra number” has been neglected here).

(a) What is the number of all possible outcomes?

(b) How many of these outcomes contain a “consecutive three” (three consecutive numbers)?

Solution

(a)

The number of all possible outcomes = C496 = 13,983,816.

Note. Before we can answer part (b) we have to state clearly the meaning of the keyword in this example: the outcome must contain exactly one run of exactly three consecutive numbers. The following are some examples of “consecutive-three”:

{22, 27, 28, 29, 42, 46}, {22, 23, 28, 41, 42, 43}, {22, 38, 39, 41, 42, 43},

whereas some examples of “not-consecutive-three” are

{26, 27, 28, 29, 42, 46}, {25, 26, 27, 28, 29, 48}, {26, 27, 28, 41, 42, 43}.

(b) Let $x_1, x_2, x_3, x_4, x_5, x_6$ be the drawn numbers such that

\[ 1 \leq x_1 < x_2 < x_3 < x_4 < x_5 < x_6 \leq 49. \]

We further define

\[ x_0 := 0, \quad x_7 := 50 \quad \text{and} \quad c_i := x_i - x_{i-1} \ \text{ for } i = 1, 2, 3, 4, 5, 6, 7. \]

Note that in the above definition of $c_i$, for example, $c_2$ is the difference between the smallest two drawn numbers and $c_6$ is the difference between the largest two drawn numbers; in general, $c_i$ is the difference between two consecutive drawn numbers. Again by the above definitions,

\[ c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 = 50, \quad \text{where the } c_i \text{ are all positive integers.} \]

Now we consider the following two cases: Case 1: with a consecutive three and without a consecutive two; Case 2: with a consecutive three and a consecutive two.

Case 1: with a consecutive three and without a consecutive two. In the bracket notation below, the five entries describe $c_2, c_3, c_4, c_5, c_6$ in this order.

(i) $x_1, x_2, x_3$ are consecutive. Then $c_2 = 1$, $c_3 = 1$, $c_4, c_5, c_6 > 1$ and this sub-case is denoted by [1 1 >1 >1 >1].

(ii) $x_2, x_3, x_4$ are consecutive. Then $c_3 = 1$, $c_4 = 1$, $c_2, c_5, c_6 > 1$ and this sub-case is denoted by [>1 1 1 >1 >1].

(iii) $x_3, x_4, x_5$ are consecutive. Then $c_4 = 1$, $c_5 = 1$, $c_2, c_3, c_6 > 1$ and this sub-case is denoted by [>1 >1 1 1 >1].

(iv) $x_4, x_5, x_6$ are consecutive. Then $c_5 = 1$, $c_6 = 1$, $c_2, c_3, c_4 > 1$ and this sub-case is denoted by [>1 >1 >1 1 1].


The above four sub-cases are symmetric, so the number of possible solutions is the same for each sub-case. Take (i) as an example ($c_2 = c_3 = 1$). Denoting

\[ c'_1 = c_1, \quad c'_4 = c_4 - 1, \quad c'_5 = c_5 - 1, \quad c'_6 = c_6 - 1, \quad c'_7 = c_7 \]

simplifies the equation $c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 = 50$ to

\[ (c'_1) + (1) + (1) + (c'_4 + 1) + (c'_5 + 1) + (c'_6 + 1) + (c'_7) = 50 \]

or

\[ c'_1 + c'_4 + c'_5 + c'_6 + c'_7 = 45, \quad \text{where the } c'_i \text{ are all positive integers.} \]

By Example 2.24 (page 21) we know that the number of possible solutions for this simplified equation is $C^{45-1}_{5-1} = C^{44}_{4}$ and hence the total number of possible solutions for Case 1 is

\[ 4 \times C^{44}_{4}. \]

Case 2: with a consecutive three and a consecutive two. There are a total of 6 sub-cases which are symmetric.

Sub-case           An example for reference
[1 1 >1 1 >1]      {7, 8, 9, 19, 20, 36}
[1 1 >1 >1 1]      {9, 10, 11, 29, 42, 43}
[>1 1 1 >1 1]      {12, 17, 18, 19, 22, 23}
[1 >1 1 1 >1]      {14, 15, 18, 19, 20, 42}
[1 >1 >1 1 1]      {18, 19, 26, 31, 32, 33}
[>1 1 >1 1 1]      {22, 38, 39, 41, 42, 43}

Take the first sub-case as an example ($c_2 = c_3 = c_5 = 1$). Denoting

\[ c'_1 = c_1, \quad c'_4 = c_4 - 1, \quad c'_6 = c_6 - 1, \quad c'_7 = c_7 \]

simplifies the equation $c_1 + c_2 + c_3 + c_4 + c_5 + c_6 + c_7 = 50$ to

\[ c'_1 + c'_4 + c'_6 + c'_7 = 45, \quad \text{where the } c'_i \text{ are all positive integers.} \]

By Example 2.24 (page 21) we know that the number of possible solutions for this simplified equation is $C^{45-1}_{4-1} = C^{44}_{3}$ and hence the total number of possible solutions for Case 2 is

\[ 6 \times C^{44}_{3}. \]

Combining Case 1 and Case 2, the total number of all combinations is given by

\[ 4 \times C^{44}_{4} + 6 \times C^{44}_{3} = 543{,}004 + 79{,}464 = 622{,}468. \]

✷
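Listing all 13,983,816 draws is slow in plain Python, but the same argument can be tested on a smaller pool. The sketch below assumes that “consecutive three” means exactly one run of exactly three consecutive numbers (run lengths 3, 1, 1, 1 or 3, 2, 1), as in the two cases above. For a pool of N numbers the substitution gives $4 \times C^{N-5}_{4} + 6 \times C^{N-5}_{3}$; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def run_lengths(numbers):
    """Lengths of the maximal runs of consecutive integers in a sorted tuple."""
    runs, current = [], 1
    for previous, following in zip(numbers, numbers[1:]):
        if following == previous + 1:
            current += 1
        else:
            runs.append(current)
            current = 1
    runs.append(current)
    return runs


def brute_force(pool_size: int) -> int:
    """Count 6-number draws from 1..pool_size with run lengths {3,1,1,1} or {3,2,1}."""
    wanted = ([1, 1, 1, 3], [1, 2, 3])
    return sum(
        1
        for draw in combinations(range(1, pool_size + 1), 6)
        if sorted(run_lengths(draw)) in wanted
    )


def formula(pool_size: int) -> int:
    """Case 1 (4 sub-cases) plus Case 2 (6 sub-cases), generalized from 49 to pool_size."""
    return 4 * comb(pool_size - 5, 4) + 6 * comb(pool_size - 5, 3)


print(brute_force(15), formula(15))  # small pool: enumeration and formula side by side
print(formula(49))                   # the Mark Six case
```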


Example 2.34 (Hong Kong mahjong ⋆⋆⋆ )

What is the number of different combinations of “13 wans” in a Hong Kong mahjong game?

Solution The question is equivalent to counting the integer solutions $(k_1, k_2, k_3, k_4, k_5, k_6, k_7, k_8, k_9)$ of the equation

\[ k_1 + k_2 + k_3 + k_4 + k_5 + k_6 + k_7 + k_8 + k_9 = 13, \]

where all $k_i$ are integers such that $0 \leq k_i \leq 4$. Examples of some integer solutions include (compare the third one with the figure above):

(4, 4, 4, 1, 0, 0, 0, 0, 0), (2, 2, 2, 2, 2, 2, 1, 0, 0), (3, 1, 1, 1, 1, 1, 1, 1, 3).

Below we consider the mutually exclusive and exhaustive “classes”. For each class we may count the number of permutations with repetitions:

Class No.   Integer-solution               No. of permutations with repetitions
 1.         (4, 4, 4, 1, 0, 0, 0, 0, 0)    9!/(3! 1! 5!) = 504
 2.         (4, 4, 3, 2, 0, 0, 0, 0, 0)    9!/(2! 1! 1! 5!) = 1,512
 3.         (4, 4, 3, 1, 1, 0, 0, 0, 0)    9!/(2! 1! 2! 4!) = 3,780
 4.         (4, 4, 2, 2, 1, 0, 0, 0, 0)    9!/(2! 2! 1! 4!) = 3,780
 5.         (4, 4, 2, 1, 1, 1, 0, 0, 0)    9!/(2! 1! 3! 3!) = 5,040
 6.         (4, 4, 1, 1, 1, 1, 1, 0, 0)    9!/(2! 5! 2!) = 756
 7.         (4, 3, 3, 3, 0, 0, 0, 0, 0)    9!/(1! 3! 5!) = 504
 8.         (4, 3, 3, 2, 1, 0, 0, 0, 0)    9!/(1! 2! 1! 1! 4!) = 7,560
 9.         (4, 3, 3, 1, 1, 1, 0, 0, 0)    9!/(1! 2! 3! 3!) = 5,040
10.         (4, 3, 2, 2, 2, 0, 0, 0, 0)    9!/(1! 1! 3! 4!) = 2,520
11.         (4, 3, 2, 2, 1, 1, 0, 0, 0)    9!/(1! 1! 2! 2! 3!) = 15,120
12.         (4, 3, 2, 1, 1, 1, 1, 0, 0)    9!/(1! 1! 1! 4! 2!) = 7,560
13.         (4, 3, 1, 1, 1, 1, 1, 1, 0)    9!/(1! 1! 6! 1!) = 504
14.         (4, 2, 2, 2, 2, 1, 0, 0, 0)    9!/(1! 4! 1! 3!) = 2,520
15.         (4, 2, 2, 2, 1, 1, 1, 0, 0)    9!/(1! 3! 3! 2!) = 5,040
16.         (4, 2, 2, 1, 1, 1, 1, 1, 0)    9!/(1! 2! 5! 1!) = 1,512
17.         (4, 2, 1, 1, 1, 1, 1, 1, 1)    9!/(1! 1! 7!) = 72
18.         (3, 3, 3, 3, 1, 0, 0, 0, 0)    9!/(4! 1! 4!) = 630
19.         (3, 3, 3, 2, 2, 0, 0, 0, 0)    9!/(3! 2! 4!) = 1,260
20.         (3, 3, 3, 2, 1, 1, 0, 0, 0)    9!/(3! 1! 2! 3!) = 5,040
21.         (3, 3, 3, 1, 1, 1, 1, 0, 0)    9!/(3! 4! 2!) = 1,260
22.         (3, 3, 2, 2, 2, 1, 0, 0, 0)    9!/(2! 3! 1! 3!) = 5,040
23.         (3, 3, 2, 2, 1, 1, 1, 0, 0)    9!/(2! 2! 3! 2!) = 7,560
24.         (3, 3, 2, 1, 1, 1, 1, 1, 0)    9!/(2! 1! 5! 1!) = 1,512
25.         (3, 3, 1, 1, 1, 1, 1, 1, 1)    9!/(2! 7!) = 36
26.         (3, 2, 2, 2, 2, 2, 0, 0, 0)    9!/(1! 5! 3!) = 504
27.         (3, 2, 2, 2, 2, 1, 1, 0, 0)    9!/(1! 4! 2! 2!) = 3,780
28.         (3, 2, 2, 2, 1, 1, 1, 1, 0)    9!/(1! 3! 4! 1!) = 2,520
29.         (3, 2, 2, 1, 1, 1, 1, 1, 1)    9!/(1! 2! 6!) = 252
30.         (2, 2, 2, 2, 2, 2, 1, 0, 0)    9!/(6! 1! 2!) = 252
31.         (2, 2, 2, 2, 2, 1, 1, 1, 0)    9!/(5! 3! 1!) = 504
32.         (2, 2, 2, 2, 1, 1, 1, 1, 1)    9!/(4! 5!) = 126

Total = 93,600

The total number of permutations (= 93,600) gives the answer to the original question. ✷

Remark

\[ C^{36}_{13} = 2{,}310{,}789{,}600 \approx 2.31 \text{ billion} \]

is simply a wrong answer (why?). Think carefully about what $C^{36}_{13}$ counts and compare its value with 93,600.


Alternative method

Class No.   4 of a kind   3 of a kind   a pair   a single   No. of combinations
 1.         3             0             0        1          C^9_3 × C^6_1 = 504
 2.         2             1             1        0          C^9_2 × C^7_1 × C^6_1 = 1,512
 3.         2             1             0        2          C^9_2 × C^7_1 × C^6_2 = 3,780
 4.         2             0             2        1          C^9_2 × C^7_2 × C^5_1 = 3,780
 5.         2             0             1        3          C^9_2 × C^7_1 × C^6_3 = 5,040
 6.         2             0             0        5          C^9_2 × C^7_5 = 756
 7.         1             3             0        0          C^9_1 × C^8_3 = 504
 8.         1             2             1        1          C^9_1 × C^8_2 × C^6_1 × C^5_1 = 7,560
 9.         1             2             0        3          C^9_1 × C^8_2 × C^6_3 = 5,040
10.         1             1             3        0          C^9_1 × C^8_1 × C^7_3 = 2,520
11.         1             1             2        2          C^9_1 × C^8_1 × C^7_2 × C^5_2 = 15,120
12.         1             1             1        4          C^9_1 × C^8_1 × C^7_1 × C^6_4 = 7,560
13.         1             1             0        6          C^9_1 × C^8_1 × C^7_6 = 504
14.         1             0             4        1          C^9_1 × C^8_4 × C^4_1 = 2,520
15.         1             0             3        3          C^9_1 × C^8_3 × C^5_3 = 5,040
16.         1             0             2        5          C^9_1 × C^8_2 × C^6_5 = 1,512
17.         1             0             1        7          C^9_1 × C^8_1 × C^7_7 = 72
18.         0             4             0        1          C^9_4 × C^5_1 = 630
19.         0             3             2        0          C^9_3 × C^6_2 = 1,260
20.         0             3             1        2          C^9_3 × C^6_1 × C^5_2 = 5,040
21.         0             3             0        4          C^9_3 × C^6_4 = 1,260
22.         0             2             3        1          C^9_2 × C^7_3 × C^4_1 = 5,040
23.         0             2             2        3          C^9_2 × C^7_2 × C^5_3 = 7,560
24.         0             2             1        5          C^9_2 × C^7_1 × C^6_5 = 1,512
25.         0             2             0        7          C^9_2 × C^7_7 = 36
26.         0             1             5        0          C^9_1 × C^8_5 = 504
27.         0             1             4        2          C^9_1 × C^8_4 × C^4_2 = 3,780
28.         0             1             3        4          C^9_1 × C^8_3 × C^5_4 = 2,520
29.         0             1             2        6          C^9_1 × C^8_2 × C^6_6 = 252
30.         0             0             6        1          C^9_6 × C^3_1 = 252
31.         0             0             5        3          C^9_5 × C^4_3 = 504
32.         0             0             4        5          C^9_4 × C^5_5 = 126

Total = 93,600

The total number of combinations (= 93,600) gives the answer to the original question. ✷
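Both tables can be cross-checked without listing the 32 classes. The sketch below is not the method of the notes: it counts the solutions of $k_1 + \cdots + k_9 = 13$ with $0 \leq k_i \leq 4$ by adding one rank at a time (a small dynamic programme, equivalent to reading off a coefficient of $(1 + x + x^2 + x^3 + x^4)^9$). The function name `count_hands` and its parameters are chosen for illustration.

```python
from math import comb


def count_hands(tiles: int = 13, ranks: int = 9, copies: int = 4) -> int:
    """Number of ways to pick `tiles` tiles from `ranks` ranks, at most `copies` of each."""
    ways = [1] + [0] * tiles  # ways[t] = number of ways to hold t tiles so far
    for _ in range(ranks):
        updated = [0] * (tiles + 1)
        for held, count in enumerate(ways):
            if count:
                # take k copies of the current rank, k = 0, ..., copies
                for k in range(min(copies, tiles - held) + 1):
                    updated[held + k] += count
        ways = updated
    return ways[tiles]


print(count_hands())
print(comb(36, 13))  # treats the 36 physical tiles as distinct, hence the overcount
```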


Example 2.35 (Simple probability ⋆ )

A bookshelf contains 3 German books, 4 French books and 5 Chinese books in a row. All the books are different from one another. If the books are arranged at random, what is the probability that no two Chinese books are next to each other?

Solution In Example 2.7 (page 13), the number of arrangements with no two Chinese books next to each other has been calculated as

\[ (8 \times 7 \times 6 \times 5 \times 4) \times 7! = 33{,}868{,}800. \]

The required probability is therefore

\[ \frac{33{,}868{,}800}{12!} = \frac{33{,}868{,}800}{479{,}001{,}600} \approx 0.0707. \]

The probability is approximately equal to 7.07%. ✷

Remark The number of favorable arrangements of the books (≈ 33.9 million) sounds huge but, interestingly, the corresponding probability is quite small.


Two Germans, three French and four Chinese are to be seated in a row at random. What is the probability that no Chinese sits next to another Chinese and the two Germans sit next to each other?

Solution In Example 2.8 (page 13), the number of such arrangements has been calculated as

\[ (5 \times 4 \times 3 \times 2) \times 4! \times 2! = 5{,}760. \]

The required probability is therefore

\[ \frac{5{,}760}{9!} = \frac{5{,}760}{362{,}880} = \frac{1}{63} \approx 0.0159. \]



Six fair dice are rolled. What is the probability of getting three pairs? (“Three pairs” means for example “a pair of 1, a pair of 2 and a pair of 5”.)

Solution In Example 2.22 (page 19), the total number of different arrangements has been calculated as

\[ C^{6}_{3} \times \frac{6!}{2!\,2!\,2!} = 1{,}800. \]

The required probability is given by

\[ P(\text{Three pairs}) = \frac{1{,}800}{6^6} = \frac{25}{648} \approx 0.03858. \]

✷

Remark An alternative way of computing the probability is

\[ P(\text{Three pairs}) = \frac{C^{6}_{3} \times \left( C^{6}_{2} \cdot C^{4}_{2} \cdot C^{2}_{2} \right)}{6^6} \approx 0.03858. \]

Example 2.38 (Round table ⋆ )

12 people are randomly seated at a round table. What is the probability that John and Mary will sit next to each other?

Solution In Example 2.11 (page 15), the number of seating arrangements in which John and Mary sit next to each other has been calculated as

\[ 10! + 10! = 10! \times 2. \]

The required probability is

\[ \frac{10! \times 2}{11!} = \frac{2}{11} \approx 0.1818. \]

✷

Remark Alternatively, we may use a more elegant method. Assume the position of John is fixed; then there are 11 available seats for Mary, of which only two meet the requirement of the question. We can quickly write the required probability as

\[ \frac{2}{11} \approx 0.1818. \]


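2 red balls and 13 green balls are randomly put into five identical boxes, so that each box contains 3 balls. Find the probability that the 2 red balls are put in different boxes.

Solution Denote R as a red ball and X as a ball of any color (including red). Fix the position of one red ball:

{R, X, X} {X, X, X} {X, X, X} {X, X, X} {X, X, X}

The other red ball is equally likely to occupy any of the 14 remaining positions, 12 of which lie in a different box. The required probability is given by

\[ \frac{12}{14} = \frac{6}{7} \approx 0.8571. \]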

✷


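A fair six-sided die is tossed n times. Let P(n) be the probability that the total number of times a “2” is obtained in the n tosses is an odd number. Find P(1), P(2) and P(3).

Solution

\[ P(1) = P(\text{toss once and one “2”}) = \frac{1}{6} \approx 0.1667, \]

\[ P(2) = P(\text{toss twice and one “2”}) = \frac{1}{6} \times \frac{5}{6} + \frac{5}{6} \times \frac{1}{6} = \frac{5}{18} \approx 0.2778, \]

\[ P(3) = P(\text{toss three times and one “2”}) + P(\text{toss three times and three “2”s}) \]

\[ = C^{3}_{1} \left( \frac{1}{6} \right) \left( \frac{5}{6} \right)^2 + \left( \frac{1}{6} \right)^3 = \frac{75}{216} + \frac{1}{216} = \frac{19}{54} \approx 0.3519. \]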

✷
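The three values above come from summing binomial probabilities over odd counts. A minimal sketch with exact fractions, assuming the hypothetical function name `prob_odd_twos`:

```python
from fractions import Fraction
from math import comb


def prob_odd_twos(n: int) -> Fraction:
    """P(the number of 2s in n tosses of a fair die is odd), as an exact fraction."""
    p = Fraction(1, 6)
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(1, n + 1, 2)  # k = 1, 3, 5, ...
    )


for n in (1, 2, 3):
    print(n, prob_odd_twos(n))
```

As an aside that is not in the notes, the same sum has the closed form $\frac{1}{2}\left(1 - (2/3)^n\right)$, which agrees with the three values computed above.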


Roll six fair dice. What is the probability that the outcome of the rolled dice is “one pair”? (For example, {2, 2, 3, 4, 5, 6} is called “one pair”, or generally in symbols {a, a, b, c, d, e}, where a, b, c, d, e are all different.)

Solution Refer to Example 2.22 (page 19) for the class “One pair”. The required probability is given by

\[ \frac{\left( C^{6}_{1} \times C^{5}_{4} \right) \times \left( C^{6}_{2} \times 4! \right)}{6^6} = \frac{10{,}800}{46{,}656} = \frac{25}{108} \approx 0.2315. \]

✷
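With only $6^6 = 46{,}656$ outcomes, the dice patterns can be tallied by full enumeration. The sketch below classifies each roll by its sorted multiplicities, for example (2, 2, 2) for three pairs and (2, 1, 1, 1, 1) for one pair; the names `pattern_counts`, `p_three_pairs` and `p_one_pair` are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally every ordered roll of six dice by how often each face value repeats.
pattern_counts = Counter(
    tuple(sorted(Counter(roll).values(), reverse=True))
    for roll in product(range(1, 7), repeat=6)
)

total = 6**6
p_three_pairs = Fraction(pattern_counts[(2, 2, 2)], total)
p_one_pair = Fraction(pattern_counts[(2, 1, 1, 1, 1)], total)
print(p_three_pairs, p_one_pair)
```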



We are playing with a selected deck of 16 poker cards, as shown below:

J♥ A♦ J♣ A♠

Q♥ 2♦ Q♣ 2♠

K♥ 3♦ K♣ 3♠

A♥ 4♦ A♣ 4♠

Let H be the event that the drawn card is a heart (♥), D the event that the drawn card is a diamond (♦), and A the event that the drawn card is an ace (A).

(a) What is the probability P (H ∩ D)?

(b) What is the probability P (H ∩ A)?

(c) What is the probability P (H ∪ D)?

(d) What is the probability P (H ∪ A)?

(e) Are H and D independent events? Why?

(f) Are H and A independent events? Why?

(g) If three cards are drawn from the deck, one at a time without replacement, what is the probability that an ace appears for the first time on the third draw?

Solution

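(a) $P(H \cap D) = 0$.

(b) $P(H \cap A) = \frac{1}{16}$.

(c) $P(H \cup D) = P(H) + P(D) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$.

(d) $P(H \cup A) = P(H) + P(A) - P(H \cap A) = \frac{1}{4} + \frac{1}{4} - \frac{1}{16} = \frac{7}{16}$.

(e) The events H and D are not independent: they are mutually exclusive, so $P(H \cap D) = 0 \neq P(H) \times P(D)$.

(f) The events H and A are independent because $P(H \cap A) = \frac{1}{16} = \frac{1}{4} \times \frac{1}{4} = P(H) \times P(A)$.

(g) $P(\text{ace appears for the first time on the third draw}) = \frac{12}{16} \times \frac{11}{15} \times \frac{4}{14} = \frac{11}{70}$.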

✷


We are playing with a selected deck of 16 poker cards, as shown below:

♠A ♥3 ♦J ♣A

♠ 2 ♥ 5 ♦ Q ♣ 3

♠ 4 ♥ 7 ♦ K ♣ 5

♠ 6 ♥ 9 ♦ A ♣ 7

If three cards are randomly drawn from the deck, one at a time, what is the probability that

(a) one and only one card is an ace?

(b) at least one card is an ace?

(c) exactly two cards are spades (♠)?

Solution

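(a)

\[ P(\text{one and only one ace}) = \frac{3}{16} \times \frac{13}{15} \times \frac{12}{14} \times 3 = \frac{1404}{3360} = \frac{117}{280} \approx 0.4179. \]

(b)

\[ P(\text{at least one ace}) = 1 - P(\text{no ace}) = 1 - \frac{13}{16} \times \frac{12}{15} \times \frac{11}{14} = \frac{1644}{3360} = \frac{137}{280} \approx 0.4893. \]

(c)

\[ P(\text{exactly two spades}) = \frac{4}{16} \times \frac{3}{15} \times \frac{12}{14} \times 3 = \frac{432}{3360} = \frac{9}{70} \approx 0.1286. \]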

✷

Remark Alternatively,

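(a)

\[ P(\text{one and only one ace}) = \frac{C^{3}_{1} \times C^{13}_{2}}{C^{16}_{3}} = \frac{234}{560} = \frac{117}{280}. \]

(b)

\[ P(\text{at least one ace}) = 1 - P(\text{no ace}) = 1 - \frac{C^{3}_{0} \times C^{13}_{3}}{C^{16}_{3}} = 1 - \frac{286}{560} = \frac{137}{280}. \]

(c)

\[ P(\text{exactly two spades}) = \frac{C^{4}_{2} \times C^{12}_{1}}{C^{16}_{3}} = \frac{72}{560} = \frac{9}{70}. \]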
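The three ratios in this remark share one shape: choose some cards from a marked group and the rest from the other cards. A minimal sketch with exact fractions, assuming the hypothetical helper name `draw_probability`:

```python
from fractions import Fraction
from math import comb


def draw_probability(marked: int, deck: int, draws: int, marked_drawn: int) -> Fraction:
    """P(exactly `marked_drawn` of the `draws` cards come from the `marked` cards)."""
    favorable = comb(marked, marked_drawn) * comb(deck - marked, draws - marked_drawn)
    return Fraction(favorable, comb(deck, draws))


p_one_ace = draw_probability(3, 16, 3, 1)               # 3 aces in the 16-card deck
p_at_least_one_ace = 1 - draw_probability(3, 16, 3, 0)  # complement of "no ace"
p_two_spades = draw_probability(4, 16, 3, 2)            # 4 spades in the deck
print(p_one_ace, p_at_least_one_ace, p_two_spades)
```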

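Example 2.44 (Simple probability ⋆⋆ )

The events E, F and G are such that E is independent of F, E is independent of G, and

\[ P(E) = \frac{5}{9}, \quad P(F) = \frac{2}{5}, \quad P(G) = \frac{1}{2}, \quad P(E \cap F \cap G) = \frac{1}{4}, \quad P(\overline{E} \cap \overline{F} \cap G) = \frac{1}{6}, \]

where $\overline{E}$ means the complement of E. Find

(a) $P(\overline{E} \cap \overline{F})$.

(b) $P(\overline{E} \cap \overline{F} \cap \overline{G})$.

(c) $P(F \cap G)$.

(d) $P(F \mid G)$.

(e) $P(E \mid F \cap G)$.

(f) $P(E \cap F \mid E \cap G)$.

(g) Are $E \cap F$ and $E \cap G$ independent? Why?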

Solution

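(a) Since E and F are independent, so are $\overline{E}$ and $\overline{F}$:

\[ P(\overline{E} \cap \overline{F}) = P(\overline{E}) \times P(\overline{F}) = \bigl( 1 - P(E) \bigr) \times \bigl( 1 - P(F) \bigr) = \left( 1 - \frac{5}{9} \right) \left( 1 - \frac{2}{5} \right) = \frac{4}{15}. \]

(b)

\[ P(\overline{E} \cap \overline{F} \cap \overline{G}) = P(\overline{E} \cap \overline{F}) - P(\overline{E} \cap \overline{F} \cap G) = \frac{4}{15} - \frac{1}{6} = \frac{1}{10}. \]

(c)

\[ P(F \cap G) = P(E \cap F \cap G) + P(\overline{E} \cap F \cap G) \]

\[ = P(E \cap F \cap G) + \left[ P(\overline{E} \cap G) - P(\overline{E} \cap \overline{F} \cap G) \right] \]

\[ = P(E \cap F \cap G) + \left[ \bigl( 1 - P(E) \bigr) \times P(G) - P(\overline{E} \cap \overline{F} \cap G) \right] \]

\[ = \frac{1}{4} + \frac{4}{9} \times \frac{1}{2} - \frac{1}{6} = \frac{11}{36}. \]

(d)

\[ P(F \mid G) = \frac{P(F \cap G)}{P(G)} = \frac{11/36}{1/2} = \frac{11}{18}. \]

(e)

\[ P(E \mid F \cap G) = \frac{P(E \cap F \cap G)}{P(F \cap G)} = \frac{1/4}{11/36} = \frac{9}{11}. \]

(f)

\[ P(E \cap F \mid E \cap G) = \frac{P(E \cap F \cap G)}{P(E \cap G)} = \frac{P(E \cap F \cap G)}{P(E) \times P(G)} = \frac{1/4}{\frac{5}{9} \times \frac{1}{2}} = \frac{9}{10}. \]

(g) $E \cap F$ and $E \cap G$ are not independent because

\[ P(E \cap F \mid E \cap G) = \frac{9}{10}, \]

whereas

\[ P(E \cap F) = P(E) \times P(F) = \frac{5}{9} \times \frac{2}{5} = \frac{2}{9} \neq \frac{9}{10}. \]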

✷

Remark If “E and F are independent” and “E and G are independent”, F and G are not necessarily independent. Note that $P(F \cap G) = \frac{11}{36} \neq \frac{2}{5} \times \frac{1}{2} = P(F) \times P(G)$, so F and G are dependent in this question.


Example 2.45 (Independent vs mutually exclusive ⋆ )

(a) Let A and B be two events of a sample space. Prove that

(i) $P(A \cup B) \leq P(A) + P(B)$. (ii) $P(A \cup B) \geq 1 - P(A^c) - P(B^c)$.

(b) Prove that if two events A and B with nonzero probabilities are mutually exclusive, they are not independent.

(c) Assume that P(A) = a and P(B) = b. Find the probabilities $P(A^c \cap B)$ and $P(A^c \mid B)$ in terms of a and b for each of the following cases:

(i) A and B are mutually exclusive. (ii) A and B are independent.

(d) Consider an experiment of tossing two fair dice of different colors. Let A be the event that the outcome on the red die is odd, B the event that the outcome on the green die is odd, and C the event that the sum of the two outcomes is odd. Prove that

(i) A, B, C are pairwise independent. (ii) A, B, C are not independent.

Solution

(a) (i) By the inclusion-exclusion principle,

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \leq P(A) + P(B), \]

since $P(A \cap B) \geq 0$.

(ii) By the inclusion-exclusion principle,

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

\[ = 1 - P(A^c) + P(B) - P(A \cap B) \]

\[ = 1 - P(A^c) + P(B \setminus A) \]

\[ \geq 1 - P(A^c) \geq 1 - P(A^c) - P(B^c). \]

(b) If $A \cap B = \emptyset$, then $P(A \cap B) = 0$. However, $P(A) \cdot P(B) \neq 0$. Hence $P(A \cap B) \neq P(A) \cdot P(B)$ and the events are not independent.

(c) (i) $A \cap B = \emptyset$, so $B \subseteq A^c$. Thus,

\[ P(A^c \cap B) = P(B) = b, \quad P(A^c \mid B) = 1. \]

(ii) $A^c$ and B are also independent. Thus,

\[ P(A^c \cap B) = P(A^c) P(B) = (1 - a)\, b, \quad P(A^c \mid B) = P(A^c) = 1 - a. \]

(d) (i) We have P(A) = 18/36 = 1/2, P(B) = 18/36 = 1/2, P(C) = 18/36 = 1/2, P(A ∩ B) = 9/36 = 1/4, P(B ∩ C) = 9/36 = 1/4, and P(C ∩ A) = 9/36 = 1/4. Thus we have

\[ P(A \cap B) = P(A) P(B), \quad P(B \cap C) = P(B) P(C), \quad P(C \cap A) = P(C) P(A). \]

This shows that the events A, B, C are pairwise independent.

(ii) If both outcomes are odd, their sum is even, so $A \cap B \cap C = \emptyset$. We have

\[ P(A \cap B \cap C) = P(\emptyset) = 0 \neq \frac{1}{8} = P(A) P(B) P(C). \]

Thus the events A, B, C are not independent.

✷
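Part (d) can be verified by listing the 36 equally likely outcomes. The sketch below encodes the three events as predicates on the pair (red, green); the helper name `prob` and the predicate names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (red die, green die)


def prob(event) -> Fraction:
    """Probability of an event given as a predicate on one outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def red_odd(o):    # event A
    return o[0] % 2 == 1


def green_odd(o):  # event B
    return o[1] % 2 == 1


def sum_odd(o):    # event C
    return (o[0] + o[1]) % 2 == 1


p_ab = prob(lambda o: red_odd(o) and green_odd(o))
p_bc = prob(lambda o: green_odd(o) and sum_odd(o))
p_ca = prob(lambda o: sum_odd(o) and red_odd(o))
p_abc = prob(lambda o: red_odd(o) and green_odd(o) and sum_odd(o))
print(p_ab, p_bc, p_ca, p_abc)
```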



If three married couples are seated at random at a round table, what is the probability that no wife sits next to her husband?

Solution Denote the events: E = first couple sitting together, F = second couple sitting together and G = third couple sitting together. Our target is to look for the probability

\[ 1 - P(E \cup F \cup G). \]

In order to find $P(E \cup F \cup G)$ we may use “the inclusion-exclusion principle for three sets”:

\[ P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(E \cap F) - P(E \cap G) - P(F \cap G) + P(E \cap F \cap G). \]

Note that the events E, F and G are indistinguishable and can be treated as symmetric, so that

\[ P(E) = P(F) = P(G) \quad \text{and} \quad P(E \cap F) = P(E \cap G) = P(F \cap G). \]

We may deduce that (Do you know how to find them? Ask me if you cannot.)

\[ P(E) = \frac{n(E)}{n(S)} = \frac{4! \times 2}{5!}, \]

\[ P(E \cap F) = \frac{n(E \cap F)}{n(S)} = \frac{3! \times 2 \times 2}{5!}, \]

\[ P(E \cap F \cap G) = \frac{n(E \cap F \cap G)}{n(S)} = \frac{2! \times 2 \times 2 \times 2}{5!}. \]

\[ \text{The required probability} = 1 - \left[ 3 \times P(E) - 3 \times P(E \cap F) + P(E \cap F \cap G) \right] \]

\[ = 1 - \left[ 3 \times \frac{4! \times 2}{5!} - 3 \times \frac{3! \times 2^2}{5!} + \frac{2! \times 2^3}{5!} \right] \]

\[ = 1 - \frac{6}{5} + \frac{3}{5} - \frac{2}{15} \]

\[ = \frac{4}{15} \approx 0.2667. \]

✷


Assume that 2 married couples and one single man (five people in total) are seated randomly at a round table. What is the probability that no wife sits next to her husband?

Solution In Example 2.12 (page 15), the number of such seating arrangements has been calculated as

\[ 2 \times 2 + 2 \times 2 = 8. \]

The required probability is

\[ \frac{8}{4!} = \frac{8}{24} = \frac{1}{3} \approx 0.3333. \]

✷
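Both round-table answers (three couples, and two couples plus a single man) can be checked by listing seatings. The sketch below fixes the first person's seat to remove rotations and tests every circular neighbor pair; the function name `prob_no_couple_adjacent` and the labels H1, W1, S and so on are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def prob_no_couple_adjacent(people, couples) -> Fraction:
    """P(no couple sits side by side) for a uniformly random round-table seating."""
    first, rest = people[0], people[1:]
    good = total = 0
    for order in permutations(rest):
        seating = (first,) + order  # fixing one seat removes rotations
        n = len(seating)
        neighbours = {
            frozenset((seating[i], seating[(i + 1) % n])) for i in range(n)
        }
        total += 1
        if not any(frozenset(couple) in neighbours for couple in couples):
            good += 1
    return Fraction(good, total)


three_couples = prob_no_couple_adjacent(
    ("H1", "W1", "H2", "W2", "H3", "W3"),
    [("H1", "W1"), ("H2", "W2"), ("H3", "W3")],
)
two_couples_and_single = prob_no_couple_adjacent(
    ("H1", "W1", "H2", "W2", "S"),
    [("H1", "W1"), ("H2", "W2")],
)
print(three_couples, two_couples_and_single)
```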

Remark The graphical method that we used in Example 2.12 (page 15) can also be applied in Example 2.46. Can you divide the cases and draw the corresponding figures again to solve the problem? Try it out and ask me if you find any difficulties. In fact there is a geometric method, namely the “Circle-and-Chord” method, which requires you to find the numbers a and b such that

\[ \text{the required probability} = \frac{1}{5} \times a + \frac{2}{5} \times b. \]

This is an interesting method. Let me know if you want the details. Ans: $a = \frac{4}{6}$, $b = \frac{2}{6}$


Example 2.48 (Birthday problem ⋆ )

Suppose that there are n people in a room and that the birthdays of these people were randomly chosen from the 365 days of the year. Let p(n) denote the probability that there is at least one person in the room whose birthday is on 1st October.

(a) Find an expression of p(n) in terms of n.

(b) Show that if there are at least 253 people in the room, then it is more likely than not that someone will have their birthday on 1st October, i.e., $p(n) > 0.5$ whenever $n \geq 253$.

(c) For what values of n do we have $p(n) > 0.9$?

Solution

(a)

\[ p(n) = 1 - P(\text{all } n \text{ birthdays are not 1st October}) = 1 - \left( \frac{364}{365} \right)^n. \]

(b) Since $\frac{364}{365} < 1$, the quantity $\left( \frac{364}{365} \right)^n$ decreases as n increases, and hence p(n) is an increasing function of n. Also, $p(253) \approx 0.5005$. Thus,

if $n \geq 253$, then $p(n) > 0.5$.

(c) We first solve the equation p(n) = q for n in terms of q as follows.

\[ 1 - \left( \frac{364}{365} \right)^n = q, \]

\[ \left( \frac{364}{365} \right)^n = 1 - q, \]

\[ n \ln\left( \frac{364}{365} \right) = \ln(1 - q), \]

\[ n = \frac{\ln(1 - q)}{\ln(364/365)} \approx -364.5 \ln(1 - q). \]

By the above expression, and because p(n) is increasing,

\[ p(n) > 0.9 \quad \text{if} \quad n > -364.5 \ln(1 - 0.9) \approx 839.29. \]

So, if $n \geq 840$, then $p(n) > 0.9$.

✷
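The two thresholds can be reproduced numerically. A minimal sketch in floating-point arithmetic, assuming the hypothetical helper names `p` and `smallest_n`:

```python
from math import ceil, log


def p(n: int) -> float:
    """Probability that at least one of n people has a birthday on a given day."""
    return 1 - (364 / 365) ** n


def smallest_n(q: float) -> int:
    """Smallest n with p(n) > q, from n > ln(1 - q) / ln(364/365)."""
    return ceil(log(1 - q) / log(364 / 365))


print(smallest_n(0.5), p(253))
print(smallest_n(0.9), p(840))
```

Taking the ceiling is valid here because neither bound is an integer; if the bound were a whole number, the next integer would be needed.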

Example 2.49 (Inclusion-exclusion principle ⋆⋆ )

Five balls are randomly chosen, without replacement, from an urn that contains 5 red, 6 white and 7 blue balls. Find the probability that at least one ball of each color is chosen.

Solution Let R, W and B denote the events that there are no red, no white and no blue balls chosen, respectively. By the inclusion-exclusion principle,

\[ P(R \cup W \cup B) = P(R) + P(W) + P(B) - P(R \cap W) - P(R \cap B) - P(W \cap B) + P(R \cap W \cap B) \]

\[ = \frac{C^{13}_{5}}{C^{18}_{5}} + \frac{C^{12}_{5}}{C^{18}_{5}} + \frac{C^{11}_{5}}{C^{18}_{5}} - \frac{C^{7}_{5}}{C^{18}_{5}} - \frac{C^{6}_{5}}{C^{18}_{5}} - \frac{C^{5}_{5}}{C^{18}_{5}} = \frac{359}{1224} \approx 0.2933, \]

where $P(R \cap W \cap B) = 0$ because the five chosen balls cannot avoid all three colors. Hence,

\[ P(\text{at least one ball of each color is chosen}) \approx 1 - 0.2933 = 0.7067. \]

✷
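The inclusion-exclusion result can be compared with a direct count of the favorable selections. The sketch below sums over all splits of the five balls into at least one red, one white and one blue; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

red, white, blue, draws = 5, 6, 7, 5
total = comb(red + white + blue, draws)

# Direct count: r red, w white and the remaining balls blue, each at least 1.
favorable = sum(
    comb(red, r) * comb(white, w) * comb(blue, draws - r - w)
    for r in range(1, draws + 1)
    for w in range(1, draws - r + 1)
    if draws - r - w >= 1
)
direct = Fraction(favorable, total)

# Inclusion-exclusion: probability that at least one color is missing.
missing = Fraction(
    comb(13, 5) + comb(12, 5) + comb(11, 5) - comb(7, 5) - comb(6, 5) - comb(5, 5),
    total,
)
print(direct, 1 - missing, float(direct))
```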


Example 2.50 (Derangements ⋆ )

Suppose each person in a group of 3 friends brings a gift to a party. The 3 gifts will be distributed so that each person receives one gift. Find the probability that no person will receive his/her own gift.

Solution There are a total of 3! = 6 permutations for distributing the gifts. However, there are only 2 derangements:

No.   Person A   Person B   Person C
1.    Gift B     Gift C     Gift A
2.    Gift C     Gift A     Gift B

The required probability is

\[ \frac{2}{3!} = \frac{1}{3} \approx 0.3333. \]

✷

Example 2.51 (Derangements ⋆⋆ )

In a special remedial class, there are 4 students, namely A, B, C and D. The students have taken a short test. The class lecturer wants to let the students grade each other’s test. Find the probability that no student receives his/her own test for grading.

Solution There are a total of 4! = 24 possible permutations for handling the grading. There are only 9 derangements:

No.   Student A   Student B   Student C   Student D   Outcome
1.    Test B      Test A      Test D      Test C      BADC
2.    Test B      Test C      Test D      Test A      BCDA
3.    Test B      Test D      Test A      Test C      BDAC
4.    Test C      Test A      Test D      Test B      CADB
5.    Test C      Test D      Test B      Test A      CDBA
6.    Test C      Test D      Test A      Test B      CDAB
7.    Test D      Test A      Test B      Test C      DABC
8.    Test D      Test C      Test B      Test A      DCBA
9.    Test D      Test C      Test A      Test B      DCAB

The required probability is

\[ \frac{9}{4!} = \frac{3}{8} = 0.375. \]

✷

Remark Let D(n) be the number of derangements of n objects, where n is any positive integer. It is natural to write D(1) = 0 and D(2) = 1. Furthermore, by Example 2.50 and Example 2.51 we know that

\[ D(3) = 2 \quad \text{and} \quad D(4) = 9. \]

We would like to know if there is a general explicit formula for D(n). In fact, by mathematical induction, we can deduce the recursive relation

\[ D(n) - n D(n-1) = (-1)^n, \]

where n = 2, 3, 4, · · · . Based on this recursive relation we can compute any number of derangements recursively. Note (without proof) that as $n \to \infty$, the probability $\frac{D(n)}{n!}$ approaches $e^{-1} \approx 0.3679$.
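The recursive relation lends itself to a short check against direct enumeration. A minimal sketch, assuming the hypothetical function names `derangements_recursive` and `derangements_brute`:

```python
from itertools import permutations
from math import exp, factorial


def derangements_recursive(n: int) -> int:
    """D(n) from D(1) = 0 and D(k) = k * D(k - 1) + (-1)**k for k >= 2."""
    d = 0
    for k in range(2, n + 1):
        d = k * d + (-1) ** k
    return d


def derangements_brute(n: int) -> int:
    """Count permutations of 0..n-1 that leave no element in its own position."""
    return sum(
        1
        for perm in permutations(range(n))
        if all(perm[i] != i for i in range(n))
    )


for n in range(1, 8):
    print(n, derangements_recursive(n), derangements_brute(n))

print(derangements_recursive(12) / factorial(12), exp(-1))  # ratio compared with 1/e
```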


Example 2.52 (Challenging Problem: Coin in Square ⋆⋆⋆ )

In a carnival game a player throws a coin from a distance of about 5 feet onto the surface of a table ruled in 1.5-inch squares. If the coin (1 inch in diameter) falls entirely inside a square, the player wins a large lion doll; otherwise he loses the coin. If the coin lands on the table, what is the probability of winning? What if the squares were made smaller by merely thickening the lines (from negligible width to a width of 0.1 inches)?

Answer: $\frac{1}{9}$

Example 2.53 (Challenging Problem: Lengths of Random Chord ⋆⋆⋆ )

If a chord is selected at random on a fixed circle, what is the probability that its length is greater than the radius of the circle?

Answer: 0.667 or 0.866 or 0.75, depending on the notion of “at random”

Example 2.54 (Challenging Problem: Drunk Man Walk ⋆⋆⋆ )

From where he stands, one step toward the cliff would send the drunk man over the edge. He takes random steps, either toward or away from the cliff. At any step his probability of taking a step away is $\frac{2}{3}$, and of a step toward the cliff is $\frac{1}{3}$. What is the chance of escaping the cliff after five walking steps?

Answer: $\frac{136}{243} \approx 0.560$
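One way to reach the drunk-man answer is to track, step by step, the probability of each position that is still on solid ground. This is a sketch only, not necessarily the intended solution; the function name `prob_safe` and its parameters are hypothetical. Position 0 is the cliff edge and the man starts at position 1.

```python
from fractions import Fraction


def prob_safe(steps: int, start: int = 1, p_away: Fraction = Fraction(2, 3)) -> Fraction:
    """Probability of not having gone over the edge (position 0) after `steps` steps."""
    alive = {start: Fraction(1)}  # position -> probability, counting safe paths only
    for _ in range(steps):
        updated = {}
        for position, probability in alive.items():
            # step away from the cliff
            updated[position + 1] = updated.get(position + 1, 0) + probability * p_away
            # step toward the cliff; reaching position 0 means falling, so it is dropped
            if position - 1 > 0:
                updated[position - 1] = (
                    updated.get(position - 1, 0) + probability * (1 - p_away)
                )
        alive = updated
    return sum(alive.values())


print(prob_safe(5), float(prob_safe(5)))
```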

Example 2.55 (Challenging Problem: Random Quadratic Equations ⋆⋆⋆ )

What is the probability that the quadratic equation (where a, b are any independent real numbers)

\[ x^2 + 2ax + b = 0 \]

has real roots?

Answer: P(roots are real) ≈ 1

Example 2.56 (Challenging Problem: Needle Lies Across a Line ⋆⋆⋆ )

A large table has been ruled with a set of parallel lines spaced d units apart. A needle of length l (smaller than d) is tossed randomly on the table. What is the probability that when it comes to rest it crosses a line?

Answer: $\frac{2l}{\pi d} \approx 0.637 \times \frac{l}{d}$


2. Probability

� Example 2.57 (Reduced sample space ⋆ )

A bag contains 4 white balls and 3 black balls. Two balls are randomly drawn from the bag without replacement. Show that the probability that the second ball drawn is white is the same as the probability that the first ball drawn is white.

Solution

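$$
\begin{aligned}
P(\text{2nd ball is white}) &= P(\text{2nd is white} \cap \text{1st is white}) + P(\text{2nd is white} \cap \text{1st is black}) \\
&= P(\text{2nd is white} \mid \text{1st is white}) \times P(\text{1st is white}) \\
&\quad + P(\text{2nd is white} \mid \text{1st is black}) \times P(\text{1st is black}) \\
&= \frac{3}{6} \cdot \frac{4}{7} + \frac{4}{6} \cdot \frac{3}{7} \\
&= \frac{4}{7} \\
&= P(\text{1st ball is white}).
\end{aligned}
$$

✷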

Remark We may generalize the given statement: A bag contains m white balls and n black balls, where m, n ≥ 2. Two balls are randomly drawn from the bag without replacement. Is it still true that the second ball drawn is white with the same probability as the first? Yes, since
$$P(\text{2nd is white}) = \frac{m-1}{m+n-1} \cdot \frac{m}{m+n} + \frac{m}{m+n-1} \cdot \frac{n}{m+n} = \frac{m}{m+n} = P(\text{1st is white}).$$
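The general claim in the remark can be checked with exact arithmetic for a range of bag compositions. This is only an illustrative sketch; the function name `p_second_white` is an assumption, not notation from the notes.

```python
from fractions import Fraction

def p_second_white(m, n):
    """P(second ball is white) for m white and n black balls, no replacement."""
    total = m + n
    p_first_white = Fraction(m, total)
    p_first_black = Fraction(n, total)
    # Condition on the colour of the first ball (total probability).
    return (Fraction(m - 1, total - 1) * p_first_white
            + Fraction(m, total - 1) * p_first_black)

# Check the claim P(2nd white) = m/(m+n) for several small bags.
claim_holds = all(
    p_second_white(m, n) == Fraction(m, m + n)
    for m in range(2, 8)
    for n in range(2, 8)
)
```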

� Example 2.58 (Reduced sample space ⋆ )

One bag contains 4 white balls and 3 black balls, and a second bag contains 3 white balls and 5 black balls. One ball is drawn from the first bag and placed unseen in the second bag. What is the probability that a ball now drawn from the second bag is black?

Solution The problem has to be divided into two cases. It follows that

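$$
\begin{aligned}
P(\text{2nd is black}) &= P(\text{2nd is black} \cap \text{1st is white}) + P(\text{2nd is black} \cap \text{1st is black}) \\
&= P(\text{2nd is black} \mid \text{1st is white}) \times P(\text{1st is white}) \\
&\quad + P(\text{2nd is black} \mid \text{1st is black}) \times P(\text{1st is black}) \\
&= \frac{5}{9} \cdot \frac{4}{7} + \frac{6}{9} \cdot \frac{3}{7} \\
&= \frac{38}{63} \approx 0.603.
\end{aligned}
$$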



� Example 2.59 (Conditional probability ⋆ )

Two fair dice are rolled and the outcome is kept secret. You are interested in the sum shown. Suppose you have been told that at least one die shows 1. How likely is it now that the sum will be 5 or more?

Solution When two dice are rolled, the set of all possible outcomes is

$$S = \{(i, j) : i, j = 1, 2, \ldots, 6\}.$$

Denote the events

$$A = \{\text{sum will be 5 or more}\} = \{(i, j) : i + j \ge 5\}$$

and

$$B = \{\text{at least one die shows 1}\} = \{(1, j) : j = 1, 2, \ldots, 6\} \cup \{(i, 1) : i = 1, 2, \ldots, 6\}.$$

Then,

$$A \cap B = \{(1, 4), (1, 5), (1, 6), (4, 1), (5, 1), (6, 1)\}.$$

The required probability is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)} = \frac{6}{11}.$$

✷
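Because the 36 outcomes are equally likely, the reduced sample space can be enumerated directly. A small sketch (variable names are illustrative only):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 ordered pairs (i, j)
B = [o for o in outcomes if 1 in o]               # at least one die shows 1
A_and_B = [o for o in B if sum(o) >= 5]           # ... and the sum is 5 or more
p_A_given_B = Fraction(len(A_and_B), len(B))
```

Note that n(B) = 11 rather than 12, since the outcome (1, 1) belongs to both parts of the union.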

� Example 2.60 (Conditional probability ⋆ )

The probability that a married man watches a movie is 0.4 and the probability that a married woman watches the movie is 0.5. The probability that a man watches the movie, given that his wife does, is 0.7. Find the probability that a wife watches the movie given that her husband does not.

Solution Let A be the event that a married man watches the movie, and B be the event that a married woman watches the movie. Now,

$$P(A) = 0.4, \quad P(B) = 0.5 \quad \text{and} \quad P(A \mid B) = 0.7.$$

The probability that a wife watches the movie given that her husband does not is given by

$$P(B \mid A^c) = \frac{P(A^c \cap B)}{P(A^c)},$$

where

$$P(A^c \cap B) = P(A^c \mid B) \cdot P(B) = (1 - 0.7)(0.5) = 0.15,$$

and

$$P(A^c) = 1 - 0.4 = 0.6.$$

Hence,

$$P(B \mid A^c) = \frac{P(A^c \cap B)}{P(A^c)} = \frac{0.15}{0.6} = 0.25.$$

✷

� Example 2.61 (Conditional probability ⋆⋆ )

(a) A fair die with faces 1, 2, and 3 colored green and faces 4, 5 and 6 colored red is tossed once. If you can see that the die has landed green face up (but cannot see the actual number shown), how likely is it that the outcome is an even number?

(b) Suppose further that the die in (a) is biased with

$$P(1) = P(3) = P(5) = \frac{1}{9}, \quad P(2) = P(4) = P(6) = \frac{2}{9}.$$

What is the probability that the outcome is an even number given that the die lands green face up?

Solution


(a) Denote the events

$$A = \{\text{even numbers}\} = \{2, 4, 6\}, \quad \text{and} \quad B = \{\text{colored green}\} = \{1, 2, 3\}.$$

Then $A \cap B = \{2\}$. The required probability is given by the conditional probability

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)} = \frac{1}{3}.$$

(b) The outcomes in (b) are not equally likely. The required probability is again given by the conditional probability

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$

but now the individual probabilities $P(A \cap B)$ and $P(B)$ have to be computed first. Now,

$$P(A \cap B) = P(2) = \frac{2}{9}$$

and

$$P(B) = P(1) + P(2) + P(3) = \frac{1}{9} + \frac{2}{9} + \frac{1}{9} = \frac{4}{9}.$$

The required probability is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/9}{4/9} = \frac{1}{2}.$$

✷


In a university, the academic staff of three research groups (A, B and C) are individually invited to apply for a research grant project. Group A has 2 staff, B has 2 and C has 3. It is assumed that all staff decide independently whether or not to apply. Staff of groups A, B and C apply with respective probabilities 1/2, 1/4 and 1/5. Given that there is just one application in total, find the probability that it comes from a member of group B.

Solution

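$$
\begin{aligned}
&P(\text{1 from B} \mid \text{1 in total}) \\
&= \frac{P(\text{1 from B and 1 in total})}{P(\text{1 in total})} \\
&= \frac{P(\text{1 from B, 0 from A and C})}{P(\text{1 from A, 0 from B and C}) + P(\text{1 from B, 0 from A and C}) + P(\text{1 from C, 0 from A and B})} \\
&= \frac{C^{2}_{1}\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right) \times \left(\tfrac{1}{2}\right)^{2} \times \left(\tfrac{4}{5}\right)^{3}}
{C^{2}_{1}\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) \times \left(\tfrac{3}{4}\right)^{2} \times \left(\tfrac{4}{5}\right)^{3}
 + C^{2}_{1}\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right) \times \left(\tfrac{1}{2}\right)^{2} \times \left(\tfrac{4}{5}\right)^{3}
 + C^{3}_{1}\left(\tfrac{1}{5}\right)\left(\tfrac{4}{5}\right)^{2} \times \left(\tfrac{1}{2}\right)^{2} \times \left(\tfrac{3}{4}\right)^{2}} \\
&= \frac{(2)\left(\tfrac{1}{4}\right)\left(\tfrac{4}{5}\right)}
{(2)\left(\tfrac{3}{4}\right)\left(\tfrac{4}{5}\right) + (2)\left(\tfrac{1}{4}\right)\left(\tfrac{4}{5}\right) + (3)\left(\tfrac{1}{5}\right)\left(\tfrac{3}{4}\right)} \\
&= \frac{8}{24 + 8 + 9} \\
&= \frac{8}{41} \approx 0.195.
\end{aligned}
$$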
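The same posterior can be obtained mechanically by multiplying binomial probabilities for each group. The sketch below uses exact fractions; the helper names `p_exactly` and `p_single_from` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction
from math import comb

# group name -> (number of staff, probability that each member applies)
groups = {"A": (2, Fraction(1, 2)), "B": (2, Fraction(1, 4)), "C": (3, Fraction(1, 5))}

def p_exactly(k, size, p):
    """Binomial probability of exactly k applications from a group."""
    return comb(size, k) * p**k * (1 - p) ** (size - k)

def p_single_from(name):
    """P(exactly one application overall, and it comes from group `name`)."""
    prob = Fraction(1)
    for g, (size, p) in groups.items():
        prob *= p_exactly(1 if g == name else 0, size, p)
    return prob

p_one_in_total = sum(p_single_from(g) for g in groups)
posterior_B = p_single_from("B") / p_one_in_total
```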




An electrical system consists of four components as illustrated in the figure below. The system works if components A and B work and either of the components C or D works. The reliability (probability of working) of each component is also shown in the figure below. Find the probability that component D does not work, given that the entire system works. Assume that the four components work independently.

[Figure: system diagram with component reliabilities A: 0.8, B: 0.7, C: 0.9, D: 0.6.]

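Solution The probability that the entire system works can be calculated as follows:

$$
\begin{aligned}
P(\text{System works}) &= P\big(A \cap B \cap (C \cup D)\big) \\
&= P(A) \times P(B) \times P(C \cup D) \\
&= P(A) \times P(B) \times \big(P(C) + P(D) - P(C \cap D)\big) \\
&= P(A) \times P(B) \times \big(P(C) + P(D) - P(C) \times P(D)\big) \\
&= 0.8 \times 0.7 \times (0.9 + 0.6 - 0.9 \times 0.6) \\
&= 0.5376.
\end{aligned}
$$

To calculate the required conditional probability, note that if D fails then the system works only when A, B and C all work:

$$
\begin{aligned}
P(\text{Component D does not work} \mid \text{System works})
&= \frac{P(\text{System works but Component D does not work})}{P(\text{System works})} \\
&= \frac{P(A \cap B \cap C \cap D^c)}{P(\text{System works})} \\
&= \frac{0.8 \times 0.7 \times 0.9 \times 0.4}{0.5376} \\
&= 0.375.
\end{aligned}
$$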

✷
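As a cross-check, the 16 working/failed configurations of the four components can be enumerated and weighted by their probabilities. This is a sketch only; `system_works` encodes the rule stated in the problem and the other names are illustrative.

```python
from itertools import product

reliability = {"A": 0.8, "B": 0.7, "C": 0.9, "D": 0.6}

def system_works(state):
    """System works if A and B work and at least one of C, D works."""
    return state["A"] and state["B"] and (state["C"] or state["D"])

p_works = 0.0
p_works_and_D_fails = 0.0
for bits in product([True, False], repeat=4):
    state = dict(zip("ABCD", bits))
    prob = 1.0
    for name, up in state.items():          # independence: multiply marginals
        prob *= reliability[name] if up else 1 - reliability[name]
    if system_works(state):
        p_works += prob
        if not state["D"]:
            p_works_and_D_fails += prob

p_D_fails_given_works = p_works_and_D_fails / p_works
```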

� Example 2.64 (Bayes’ formula ⋆ )

In a certain population, there are equal numbers of men and women. 4% of men are colorblind while 2% of women are colorblind. If a colorblind person is chosen at random, find the probability that this person is a man.

Solution Let M be the set of all men in the population, W be the set of all women, and B be the set of all colorblind people. It follows from Bayes' formula that

$$
P(M \mid B) = \frac{P(M \cap B)}{P(B)} = \frac{P(B \cap M)}{P(B)}
= \frac{P(B \mid M) \times P(M)}{P(B \mid M)\,P(M) + P(B \mid W)\,P(W)}.
$$

We are given that

$$P(M) = P(W) = 0.5, \quad P(B \mid M) = 0.04, \quad P(B \mid W) = 0.02.$$

Thus,

$$P(M \mid B) = \frac{0.04 \times 0.5}{0.04 \times 0.5 + 0.02 \times 0.5} = \frac{2}{3}.$$

✷


A disease, which can only be diagnosed with certainty after death, exists in a proportion $p_0$ of the population. A clinical quick test is known such that

P(test is positive given that disease is present) = $p_1$ and

P(test is negative given that disease is absent) = $p_2$.

Find, in terms of $p_0$, $p_1$, $p_2$, the probability that a randomly chosen individual who tests positive actually has the disease.

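Solution Let D = "has disease" and T = "tests positive". Let $\bar{D}$ denote the complement of D. Then $P(T \mid \bar{D}) = 1 - p_2$ and $P(\bar{D}) = 1 - p_0$, so

$$
\begin{aligned}
P(D \mid T) &= \frac{P(D \cap T)}{P(T)} \\
&= \frac{P(T \mid D) \times P(D)}{P(T \mid D) \times P(D) + P(T \mid \bar{D}) \times P(\bar{D})} \\
&= \frac{p_1 p_0}{p_1 p_0 + (1 - p_2)(1 - p_0)}.
\end{aligned}
$$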

✷
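To get a feel for the formula, it can be wrapped in a small function and evaluated. The function name `posterior_disease` and the numerical values in the first call (1% prevalence, 95% for both $p_1$ and $p_2$) are hypothetical, chosen only for illustration; the second call reuses the numbers of the colorblindness example, with "man" in the role of "disease present".

```python
def posterior_disease(p0, p1, p2):
    """P(disease | positive test) for prevalence p0,
    p1 = P(positive | disease), p2 = P(negative | no disease)."""
    true_pos = p1 * p0
    false_pos = (1 - p2) * (1 - p0)
    return true_pos / (true_pos + false_pos)

# Hypothetical test characteristics: a rare disease and a fairly accurate test.
rare_disease = posterior_disease(p0=0.01, p1=0.95, p2=0.95)

# Colorblindness example: P(M) = 0.5, P(B | M) = 0.04, P(B | W) = 0.02 = 1 - p2.
colorblind_man = posterior_disease(p0=0.5, p1=0.04, p2=0.98)
```

With the hypothetical numbers, most positive results come from healthy individuals, which is why the posterior stays well below $p_1$.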


A manufacturer has recently designed a new car. In an international car show, the probability that the new car will get an award for the best design is 25%, the probability that it will get an award for "the most favorite car" is 20%, and the probability that it will get both awards is 13%.

(a) What is the probability that the car will get at least one of the two awards?

(b) Given that the car gets the award for the best design, what is the probability that it will get the award for the most favorite car?

(c) Given that the car does not get the award for the best design, what is the probability that it will not get the award for the most favorite car?

Solution

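(a) Let B be the event that the car gets the award for the best design and F the event that it gets the award for the most favorite car. We are given that

$$P(B) = 0.25, \quad P(F) = 0.2 \quad \text{and} \quad P(B \text{ and } F) = 0.13.$$

It follows from the inclusion-exclusion principle that

$$
\begin{aligned}
P(B \text{ or } F) &= P(B) + P(F) - P(B \text{ and } F) \\
&= 0.25 + 0.2 - 0.13 \\
&= 0.32.
\end{aligned}
$$

(b)
$$P(F \mid B) = \frac{P(B \text{ and } F)}{P(B)} = \frac{0.13}{0.25} = 0.52.$$

(c)
$$
P(F^c \mid B^c) = \frac{P(B^c \text{ and } F^c)}{P(B^c)} = \frac{P\big((B \text{ or } F)^c\big)}{P(B^c)} = \frac{1 - 0.32}{1 - 0.25} \approx 0.91.
$$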

✷


An insurance company classifies drivers according to sex and to whether they are "under 25" or "25 years and above". It finds that 60% of its drivers are male; 25% of the male drivers and 30% of the female drivers are under 25. Find the probability of a driver being male given that the driver is under 25.

Solution Note that "25% of the male drivers are under 25" means P(under 25 | male) = 0.25.

The joint probabilities are given by

               Male    Female   Total
Under 25       0.15    0.12     0.27
25 and above   0.45    0.28     0.73
Total          0.60    0.40     1.00

In the above, for example,

$$P(\text{Male and Under 25}) = P(\text{Under 25} \mid \text{Male}) \times P(\text{Male}) = 0.25 \times 0.6 = 0.15,$$

and

$$P(\text{Female and Under 25}) = P(\text{Under 25} \mid \text{Female}) \times P(\text{Female}) = 0.3 \times 0.4 = 0.12.$$

It follows from Bayes' formula that

$$
P(\text{male} \mid \text{under 25}) = \frac{P(\text{male and under 25})}{P(\text{male and under 25}) + P(\text{female and under 25})}
= \frac{0.15}{0.27} = \frac{5}{9} \approx 0.556.
$$

The required probability is 55.6%. ✷


A large manufacturing firm purchases a certain component from three different vendors A, B and C. Vendor A supplies 40% of the components and has a defective rate of 2%, vendor B supplies 40% of the components and has a defective rate of 1%, and vendor C supplies the remainder of the components and has a defective rate of 3%. If one component is randomly selected from a shipment and is found defective, what is the probability that this shipment came from vendor C?

Solution Denote the events A, B, C: the shipment came from vendor A, B, C respectively; and D: the randomly selected component is defective. Applying Bayes' formula,

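$$
\begin{aligned}
P(C \mid D) &= \frac{P(C \text{ and } D)}{P(D)} \\
&= \frac{P(D \mid C) \times P(C)}{P(D \mid A) \times P(A) + P(D \mid B) \times P(B) + P(D \mid C) \times P(C)} \\
&= \frac{(0.03)(0.2)}{(0.02)(0.4) + (0.01)(0.4) + (0.03)(0.2)} \\
&= \frac{0.006}{0.008 + 0.004 + 0.006} \\
&= \frac{1}{3} \approx 0.333.
\end{aligned}
$$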

✷
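Bayes' formula over a partition has the same shape in every one of these examples, so it can be written once and reused. The function below is a minimal sketch (the name `posterior` and the dictionary layout are assumptions for illustration), applied here to the three vendors.

```python
def posterior(priors, likelihoods):
    """Bayes' formula over a partition: return P(H | E) for each hypothesis H.

    priors[h] = P(H = h), likelihoods[h] = P(E | H = h).
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())           # total probability P(E)
    return {h: joint[h] / evidence for h in joint}

vendors = posterior(
    priors={"A": 0.4, "B": 0.4, "C": 0.2},
    likelihoods={"A": 0.02, "B": 0.01, "C": 0.03},
)
```

Although vendor C supplies only 20% of the components, its higher defective rate lifts its posterior share to one third.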


A manufacturing company employs two analytical plans for the design and development of a particular product. For cost reasons, both are used at varying times. In fact, plans 1 and 2 are used for 30% and 70% of the products respectively. The "defect rate" is different for the two plans as follows:

$$P(D \mid P_1) = 0.03, \quad P(D \mid P_2) = 0.01,$$

where $P(D \mid P_1)$ and $P(D \mid P_2)$ are the probabilities of a defective product, given plan 1 and plan 2, respectively. If a random product was observed and found to be defective, which plan was most likely used and thus responsible?

Solution We compare the following two conditional probabilities:

$$
P(P_1 \mid D) = \frac{P(D \mid P_1) \cdot P(P_1)}{P(D \mid P_1) \cdot P(P_1) + P(D \mid P_2) \cdot P(P_2)}
= \frac{(0.03)(0.3)}{(0.03)(0.3) + (0.01)(0.7)} = \frac{0.009}{0.016} = 0.5625,
$$

and

$$
P(P_2 \mid D) = \frac{P(D \mid P_2) \cdot P(P_2)}{P(D \mid P_1) \cdot P(P_1) + P(D \mid P_2) \cdot P(P_2)}
= \frac{(0.01)(0.7)}{(0.03)(0.3) + (0.01)(0.7)} = \frac{0.007}{0.016} = 0.4375.
$$

Hence, plan 1 was most likely used and thus responsible. ✷


Suppose 30% of the women in a class received an A on the examination and 25% of the men received an A. The class is 60% women. Given that a person chosen at random received an A, what is the probability this person is a woman?

Solution Denote the following events:

A = {receiving an A on the examination}, W = {the person being a woman}, M = {the person being a man}.

Then
$$P(A \mid W) = 0.3, \quad P(A \mid M) = 0.25, \quad \text{and} \quad P(W) = 0.6.$$

We want $P(W \mid A)$. By definition,

$$P(W \mid A) = \frac{P(W \cap A)}{P(A)},$$

where
$$P(W \cap A) = P(A \mid W) \cdot P(W) = (0.3)(0.6) = 0.18,$$

and

$$P(A) = P(W \cap A) + P(M \cap A) = 0.18 + P(A \mid M) \cdot P(M) = 0.18 + (0.25)(0.4) = 0.28.$$

Hence,

$$P(W \mid A) = \frac{0.18}{0.28} \approx 64.3\%.$$

✷

� Example 2.71 (Multiplication principle for conditional probability ⋆⋆ )

From a pack of 52 cards, we draw five cards one at a time. What is the probability that an Ace will appear for the first time at the fifth draw?

Solution Denote

$E_i$ = {The i-th card is a non-Ace}, i = 1, 2, 3, 4 and F = {Fifth card is an Ace}.

The multiplication principle for conditional probability implies that

$$
P(E_1 \cap E_2 \cap E_3 \cap E_4 \cap F) = P(E_1) \times P(E_2 \mid E_1) \times P(E_3 \mid E_1 \cap E_2) \times P(E_4 \mid E_1 \cap E_2 \cap E_3) \times P(F \mid E_1 \cap E_2 \cap E_3 \cap E_4).
$$

Hence,

$$
P(\text{Ace appears for the first time at the fifth draw}) = \frac{48}{52} \times \frac{47}{51} \times \frac{46}{50} \times \frac{45}{49} \times \frac{4}{48} \approx 0.05989.
$$

✷

Remark An alternative way of computing the probability is

$$
P(\text{Ace appears for the first time at the fifth draw}) = \frac{P^{48}_{4} \times P^{4}_{1}}{P^{52}_{5}} \approx 0.05989.
$$

� Example 2.72 (Independence of two events ⋆ )

A small town has one fire engine and one ambulance available for emergencies. The probability that the fire engine is available when needed is 0.98, and the probability that the ambulance is available when called is 0.92. In the event of an injury resulting from a burning building, find the probability that both the ambulance and the fire engine will be available.

Solution Let E be the event that "the ambulance is available" and F be the event that "the fire engine is available", so that P(E) = 0.92 and P(F) = 0.98. Without further information given by the question, we may assume that the two events are independent. Thus, the required probability is given by

$$P(E \cap F) = P(E) \times P(F) = 0.92 \times 0.98 = 0.9016.$$

✷

� Example 2.73 (Conditional probability and independence ⋆ )

Consider a coin-tossing experiment involving two coins where one is fair and the other has two heads. The experiment is conducted as follows. The fair coin is tossed first. If a head appears, the fair coin is tossed again, but if a tail appears the two-headed coin is tossed instead. Are the two events "heads on the first toss" and "heads on the second toss" independent? Explain why.

Solution The outcomes with their probabilities are as follows.

First Toss   Second Toss   Joint Probability
H            H             P(HH) = (1/2)(1/2) = 1/4
H            T             P(HT) = (1/2)(1/2) = 1/4
T            H             P(TH) = (1/2)(1) = 1/2
T            T             P(TT) = (1/2)(0) = 0

The probability that a head appears on the second toss is 1/4 + 1/2 = 3/4. However, the probability that a head appears on the second toss given that a head appears on the first toss is not equal to 3/4. That is,

$$
P(\text{head second} \mid \text{head first}) = \frac{P(\text{head first} \cap \text{head second})}{P(\text{head first})}
= \frac{1/4}{1/4 + 1/4} = \frac{1}{2} \ne P(\text{head on second}) = \frac{3}{4}.
$$

The two given events are not independent. ✷
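The independence check P(first H and second H) = P(first H) × P(second H) can also be carried out directly from the joint table. A short sketch with exact fractions (the dictionary layout is only an illustrative choice):

```python
from fractions import Fraction

half = Fraction(1, 2)
joint = {
    ("H", "H"): half * half,  # fair coin tossed twice
    ("H", "T"): half * half,
    ("T", "H"): half * 1,     # two-headed coin always shows heads
    ("T", "T"): half * 0,
}

p_first_H = sum(p for (first, _), p in joint.items() if first == "H")
p_second_H = sum(p for (_, second), p in joint.items() if second == "H")

# Independence would require the joint probability to equal the product.
independent = joint[("H", "H")] == p_first_H * p_second_H
```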



In a deck of 52 cards, there are 13 kinds: Ace (A), King (K), Queen (Q), Jack (J) and values from 2 to 10. Each kind comes in 4 suits: Spade (♠), Heart (♥), Club (♣) and Diamond (♦). Define a Full House as a set of five cards containing three of a kind and a pair of another kind. A King Full House is a Full House with three Kings. Peter draws five cards randomly from the deck without replacement. In (a), (b) and (c) below, give your answers correct to 3 significant figures.

(a) What is the probability that Peter will get a Full House?

(b) What is the probability that Peter will get a King Full House?

(c) Suppose that the cards drawn by Peter have formed a Full House without any Kings. He then draws another five cards randomly from the remaining cards without replacement. Find the probability that these five cards form a King Full House.

Solution

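(a) $P(\text{Full House}) = \dfrac{C^{13}_{1}\, C^{4}_{3} \times C^{12}_{1}\, C^{4}_{2}}{C^{52}_{5}} \approx 0.00144.$

(b) $P(\text{King Full House}) = \dfrac{C^{1}_{1}\, C^{4}_{3} \times C^{12}_{1}\, C^{4}_{2}}{C^{52}_{5}} \approx 0.000111.$

(c)
$$
P(\text{King Full House} \mid \text{already drawn a Full House without any Kings})
= \frac{C^{1}_{1}\, C^{4}_{3} \times \left(C^{10}_{1}\, C^{4}_{2} + C^{1}_{1}\, C^{2}_{2}\right)}{C^{47}_{5}} \approx 0.000159.
$$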

✷
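The three counting formulas can be evaluated with Python's `math.comb`. This is just a numerical check of the expressions above; the variable names are illustrative.

```python
from math import comb

hands = comb(52, 5)

# (a) choose the kind of the triple, its suits, the kind of the pair, its suits
p_full_house = comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 2) / hands

# (b) the triple must be Kings
p_king_full_house = comb(1, 1) * comb(4, 3) * comb(12, 1) * comb(4, 2) / hands

# (c) 47 cards remain: 10 untouched non-King kinds (4 cards each), one kind
#     with 2 cards left, and one kind with a single card left (no pair possible)
pairs_available = comb(10, 1) * comb(4, 2) + comb(1, 1) * comb(2, 2)
p_king_after = comb(4, 3) * pairs_available / comb(47, 5)
```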

Remark This example was taken from “HKALE 2007 Applied Mathematics A-Level Paper 2, Qn 6”.


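Mary throws three fair six-sided dice simultaneously. Let X and Y be respectively the smallest and largest numbers of spots obtained. Evaluate $P(X = 5 \mid Y = 6)$.

Solution By definition,

$$
\begin{aligned}
P(X = 5 \mid Y = 6) &= \frac{P(X = 5 \text{ and } Y = 6)}{P(Y = 6)} \\
&= \frac{P(\text{all dice show} \ge 5) - P(\text{all dice show "5"}) - P(\text{all dice show "6"})}{1 - P(\text{no "6" is obtained})} \\
&= \frac{\left(\tfrac{2}{6}\right)^{3} - \left(\tfrac{1}{6}\right)^{3} - \left(\tfrac{1}{6}\right)^{3}}{1 - \left(\tfrac{5}{6}\right)^{3}} \\
&= \frac{6}{91} \approx 0.0659.
\end{aligned}
$$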

✷

� Example 2.76 (Conditional probability ⋆⋆⋆ )

An electronic signal, either 0 or 1, can be inputted into a device and a corresponding signal, either 0 or 1, will be generated as an output. There are four situations in total, and the (conditional) probabilities of two of them are listed below:

$$P(\text{Output} = 0 \mid \text{Input} = 0) = 0.96,$$

$$P(\text{Output} = 1 \mid \text{Input} = 1) = 0.87.$$

(a) What are the other two situations and their corresponding probabilities?

(b) Assume that the signals 0 and 1 are equally likely to be inputted. Now,

(i) a signal is inputted into the device and a corresponding output signal is generated. What is the probability that the output is 0?

(ii) two independent signals are inputted into the device and two corresponding output signals are generated. Given that both output signals are known to be 0, what is the probability that exactly one of the two input signals is 0?

(c) A signal 0 is inputted into the device, the output is then inputted back into the device as a new input, and the process may continue in the same manner as a cycle. Let $P_n$ be the probability that the n-th output is 0, where n ≥ 1. Find the value of $P_3$.

Solution

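(a)
$$P(\text{Output} = 1 \mid \text{Input} = 0) = 0.04,$$
$$P(\text{Output} = 0 \mid \text{Input} = 1) = 0.13.$$

(b) (i)
$$
\begin{aligned}
P(\text{Output} = 0) &= P(\text{Output} = 0 \text{ and Input} = 0) + P(\text{Output} = 0 \text{ and Input} = 1) \\
&= P(\text{Output} = 0 \mid \text{Input} = 0) \times P(\text{Input} = 0) \\
&\quad + P(\text{Output} = 0 \mid \text{Input} = 1) \times P(\text{Input} = 1) \\
&= 0.96 \times 0.5 + 0.13 \times 0.5 \\
&= 0.545.
\end{aligned}
$$

(ii)
$$
\begin{aligned}
\text{The probability} &= \frac{P(\text{Exactly one of the inputs is 0 and both outputs are 0})}{P(\text{Both outputs are 0})} \\
&= \frac{(0.96 \times 0.5) \times (0.13 \times 0.5) \times 2}{(0.96 \times 0.5)^{2} + (0.96 \times 0.5) \times (0.13 \times 0.5) \times 2 + (0.13 \times 0.5)^{2}} \\
&= \frac{0.0624}{0.297025} \\
&\approx 0.2101.
\end{aligned}
$$

(c)
$$
\begin{aligned}
P_1 &= 0.96, \\
P_2 &= 0.96 \times P_1 + 0.13 \times (1 - P_1) = 0.96^{2} + 0.13 \times 0.04 = 0.9268, \\
P_3 &= 0.96 \times P_2 + 0.13 \times (1 - P_2) = 0.96 \times 0.9268 + 0.13 \times 0.0732 \approx 0.8992.
\end{aligned}
$$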

✷
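The recursion in part (c), $P_n = 0.96\,P_{n-1} + 0.13\,(1 - P_{n-1})$, is easy to iterate for any n. A minimal sketch (the function name `p_output_zero_after` is hypothetical):

```python
P_00 = 0.96  # P(Output = 0 | Input = 0)
P_01 = 0.13  # P(Output = 0 | Input = 1) = 1 - 0.87

def p_output_zero_after(n):
    """P_n: probability that the n-th output is 0 when the first input is 0."""
    p = 1.0  # the initial input is 0 with certainty
    for _ in range(n):
        # Total probability over the value fed into the device at this pass.
        p = P_00 * p + P_01 * (1 - p)
    return p

P1, P2, P3 = (p_output_zero_after(n) for n in (1, 2, 3))
```

Iterating further shows the values decreasing toward the fixed point of the recursion, 0.13 / (0.04 + 0.13) ≈ 0.765.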



A signal corps has to use an information channel, which is faulty. All messages are sent as a sequence of six binary digits. The receiver knows that the message is one of the following: "Advance" (A), or "Retreat" (R), or "Stay where you are" (S). From past experience he expects these messages in the respective ratios 1 : 4 : 5. The three messages are sent as

A : 0 1 0 1 0 0; R : 0 1 1 0 1 1; S : 1 0 1 0 0 1.

Independently for each character in the message, the fault causes "0" to be sent in place of "1" with probability p, and "1" to be sent in place of "0" with equal probability p, the probability that any given character is transmitted correctly being 1 − p. The message is received as 0 1 1 1 0 0.

(a) State Bayes' theorem.

(b) Show that
$$P(\text{0 1 1 1 0 0 is received} \mid \text{A : 0 1 0 1 0 0 is sent}) = p(1 - p)^{5}$$
and obtain similar expressions for
$$P(\text{0 1 1 1 0 0 is received} \mid \text{R : 0 1 1 0 1 1 is sent})$$
and
$$P(\text{0 1 1 1 0 0 is received} \mid \text{S : 1 0 1 0 0 1 is sent}).$$

(c) Deduce that
$$P(\text{A : 0 1 0 1 0 0 is sent} \mid \text{0 1 1 1 0 0 is received}) = \frac{(1 - p)^{3}}{(1 - p)^{3} + 4p^{2}(1 - p) + 5p^{3}},$$
and write down similar expressions for
$$P(\text{R : 0 1 1 0 1 1 is sent} \mid \text{0 1 1 1 0 0 is received})$$
and
$$P(\text{S : 1 0 1 0 0 1 is sent} \mid \text{0 1 1 1 0 0 is received}).$$

(d) If it is assumed that p is at most 0.1, which interpretation of the message is most likely to be correct?

Solution

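(a) Bayes' theorem states that

$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)},$$

where $P(A)$ is the prior probability and $P(A \mid B)$ is the posterior probability, that is, the probability of A after taking into account the evidence B.

(b)
$$
\begin{aligned}
&P(\text{0 1 1 1 0 0 is received} \mid \text{A : 0 1 0 1 0 0 is sent}) \\
&= P(\text{1 character is transmitted incorrectly and the other 5 correctly}) \\
&= p(1 - p)^{5}.
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
&P(\text{0 1 1 1 0 0 is received} \mid \text{R : 0 1 1 0 1 1 is sent}) \\
&= P(\text{3 characters are transmitted incorrectly and the other 3 correctly}) \\
&= p^{3}(1 - p)^{3}
\end{aligned}
$$

and

$$
\begin{aligned}
&P(\text{0 1 1 1 0 0 is received} \mid \text{S : 1 0 1 0 0 1 is sent}) \\
&= P(\text{4 characters are transmitted incorrectly and the other 2 correctly}) \\
&= p^{4}(1 - p)^{2}.
\end{aligned}
$$

(c) Writing "rec" for the event that 0 1 1 1 0 0 is received,

$$
\begin{aligned}
&P(\text{A is sent} \mid \text{rec}) \\
&= \frac{P(\text{rec} \mid \text{A is sent}) \cdot P(\text{A is sent})}{P(\text{rec} \mid \text{A is sent}) \cdot P(\text{A is sent}) + P(\text{rec} \mid \text{R is sent}) \cdot P(\text{R is sent}) + P(\text{rec} \mid \text{S is sent}) \cdot P(\text{S is sent})} \\
&= \frac{p(1 - p)^{5} \times 0.1}{p(1 - p)^{5} \times 0.1 + p^{3}(1 - p)^{3} \times 0.4 + p^{4}(1 - p)^{2} \times 0.5} \\
&= \frac{(1 - p)^{3}}{(1 - p)^{3} + 4p^{2}(1 - p) + 5p^{3}} \quad (=: p_A).
\end{aligned}
$$

Similarly,

$$P(\text{R : 0 1 1 0 1 1 is sent} \mid \text{0 1 1 1 0 0 is received}) = \frac{4p^{2}(1 - p)}{(1 - p)^{3} + 4p^{2}(1 - p) + 5p^{3}} \quad (=: p_R)$$

and

$$P(\text{S : 1 0 1 0 0 1 is sent} \mid \text{0 1 1 1 0 0 is received}) = \frac{5p^{3}}{(1 - p)^{3} + 4p^{2}(1 - p) + 5p^{3}} \quad (=: p_S).$$

(d) If p = 0.1, then

$$p_A = \frac{0.729}{0.77} \approx 0.9468, \quad p_R = \frac{0.036}{0.77} \approx 0.0468, \quad p_S = \frac{0.005}{0.77} \approx 0.0065.$$

For smaller p the numerator of $p_A$ grows while those of $p_R$ and $p_S$ shrink, so $p_A$ is even larger. Hence, the message A is most likely to be correct.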

✷
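Part (d) asks about every p up to 0.1, not only p = 0.1, so it is natural to evaluate the posteriors as a function of p. The sketch below counts the mismatched digits directly from the code words; the function name `message_posteriors` is an assumption, and the case p = 0 is excluded because the received word would then have probability zero under all three messages.

```python
def message_posteriors(p, received="011100"):
    """Posterior probabilities of A, R, S given the received word (0 < p < 1)."""
    codes = {"A": "010100", "R": "011011", "S": "101001"}
    priors = {"A": 0.1, "R": 0.4, "S": 0.5}      # ratios 1 : 4 : 5
    joint = {}
    for name, code in codes.items():
        errors = sum(r != c for r, c in zip(received, code))
        likelihood = p**errors * (1 - p) ** (len(code) - errors)
        joint[name] = priors[name] * likelihood
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}

at_p_01 = message_posteriors(0.1)
```

Calling `message_posteriors` for a few values such as 0.01, 0.05 and 0.1 shows that A remains the most probable message throughout that range.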


Chapter 3

Discrete Random Variables

� Example 3.1 (Probability distribution ⋆ )

Suppose the following table gives the frequency distribution of the vehicles owned by all families living in a small town.

Number of Vehicles Owned   Frequency
0                          30
1                          470
2                          850
3                          490
4                          160

Let X be the number of vehicles owned by a randomly selected family.

(a) Write the probability distribution of X.

(b) Find the probability that the number of vehicles owned by a randomly selected family is

(i) from one to three;

(ii) at least three;

(iii) at most one.

Solution

(a) The probability distribution is given by the relative frequency distribution of the vehicles owned by all 2,000 families living in the town.

Number of Vehicles x   Relative Frequency   Probability p(x) = P(X = x)
0                      30/2000              0.015
1                      470/2000             0.235
2                      850/2000             0.425
3                      490/2000             0.245
4                      160/2000             0.080

(b) (i) The probability is given by

P (one to three) = P (X = 1) + P (X = 2) + P (X = 3)

= 0.235 + 0.425 + 0.245 = 0.905.

(ii) The probability is given by

P (at least three) = P (X = 3) + P (X = 4)

= 0.245 + 0.080 = 0.325.

(iii) The probability is given by

P (at most one) = P (X = 0) + P (X = 1)

= 0.015 + 0.235 = 0.25.

✷

� Example 3.2 (Probability distribution ⋆⋆ )

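Consider the following game. There are six dice. Each of the dice has five blank sides. The sixth side has a number which is either 1, 2, 3, 4, 5 or 6, with a different number on each die. The six dice are rolled and the player wins a prize depending on the total of the numbers which turn up. Let X be the total of the numbers on the six dice. What is the image of the random variable X? Evaluate $P(10 \le X \le 12)$.

Solution Let X be the total of the numbers which turn up on the six dice. Then

Image X = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21}.

We look for
$$P(10 \le X \le 12) = P(X = 10) + P(X = 11) + P(X = 12).$$

Each numbered side turns up with probability 1/6, independently, so a particular set of k numbers turns up with probability $(1/6)^{k}(5/6)^{6-k}$. Now,

$$
\begin{aligned}
P(X = 10) &= P(\{4, 6\}) + P(\{1, 3, 6\}) + P(\{1, 4, 5\}) + P(\{2, 3, 5\}) + P(\{1, 2, 3, 4\}) \\
&= \left(\tfrac{1}{6}\right)^{2}\left(\tfrac{5}{6}\right)^{4} + 3 \times \left(\tfrac{1}{6}\right)^{3}\left(\tfrac{5}{6}\right)^{3} + \left(\tfrac{1}{6}\right)^{4}\left(\tfrac{5}{6}\right)^{2} \\
&= \frac{1}{6^{6}}\left(5^{4} + 3 \times 5^{3} + 5^{2}\right) \\
&= \frac{1025}{46656}.
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
P(X = 11) &= P(\{5, 6\}) + P(\{1, 4, 6\}) + P(\{2, 3, 6\}) + P(\{2, 4, 5\}) + P(\{1, 2, 3, 5\}) \\
&= \frac{1025}{46656},
\end{aligned}
$$

and

$$
\begin{aligned}
P(X = 12) &= P(\{1, 5, 6\}) + P(\{2, 4, 6\}) + P(\{3, 4, 5\}) + P(\{1, 2, 3, 6\}) + P(\{1, 2, 4, 5\}) \\
&= \frac{3 \times 5^{3} + 2 \times 5^{2}}{46656} = \frac{425}{46656}.
\end{aligned}
$$

Hence,

$$P(10 \le X \le 12) = \frac{1025 + 1025 + 425}{46656} = \frac{275}{5184} \approx 0.053.$$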
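Listing the subsets by hand is error-prone, so the whole distribution of X can be built by enumerating which of the six numbered sides turn up. This is a sketch with exact fractions; the function name `total_distribution` is illustrative only.

```python
from fractions import Fraction
from itertools import product

def total_distribution():
    """Exact distribution of X, the total shown by the six special dice."""
    dist = {}
    # shown[i] is True when die i+1 lands on its numbered side (probability 1/6).
    for shown in product([False, True], repeat=6):
        total = sum(face for face, up in zip(range(1, 7), shown) if up)
        k = sum(shown)
        prob = Fraction(1, 6) ** k * Fraction(5, 6) ** (6 - k)
        dist[total] = dist.get(total, 0) + prob
    return dist

dist = total_distribution()
p_10_to_12 = dist[10] + dist[11] + dist[12]
```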



� Example 3.3 (Binomial distribution ⋆ )

A restaurant serves 8 main courses of fish, 12 of beef, and 10 of poultry. If customers select from these main courses randomly, what is the probability that two of the next ten customers order a fish main course?

Solution Let X denote the number of fish main courses (successes) ordered by the next ten customers. Then X is binomial such that

$$X \sim \text{Bin}(n, p),$$

where n = 10 and $p = \dfrac{8}{30} = \dfrac{4}{15}$. Thus,

$$P(X = 2) = C^{10}_{2} \times \left(\frac{4}{15}\right)^{2}\left(\frac{11}{15}\right)^{8} \approx 0.2676.$$

✷


Find the probability that in a family of four children, there will be at least one boy and at least one girl. Assume that the probability of a male birth is $\dfrac{9}{20}$.

Solution Let X denote the number of boys in a family of 4 children. Then

$$P(X = 0) = C^{4}_{0} \times \left(\frac{9}{20}\right)^{0}\left(\frac{11}{20}\right)^{4} = \frac{14641}{160000},$$

$$P(X = 4) = C^{4}_{4} \times \left(\frac{9}{20}\right)^{4}\left(\frac{11}{20}\right)^{0} = \frac{6561}{160000}.$$

Hence,

$$
P(\text{at least one boy and at least one girl}) = 1 - P(X = 0) - P(X = 4) = \frac{138798}{160000} \approx 0.8675.
$$

The probability is 86.75%. ✷


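Explain which of the following three events is most likely: that a person gets (i) at least one "Six" when 6 dice are rolled, (ii) at least two "Sixes" when 12 dice are rolled, (iii) at least three "Sixes" when 18 dice are rolled.

Solution

$$
\begin{aligned}
\text{Probability for (i)} &= 1 - P(\text{0 Sixes}) \\
&= 1 - C^{6}_{0}\left(\tfrac{1}{6}\right)^{0}\left(\tfrac{5}{6}\right)^{6} \approx 0.6651.
\end{aligned}
$$

$$
\begin{aligned}
\text{Probability for (ii)} &= 1 - P(\text{0 Sixes}) - P(\text{1 Six}) \\
&= 1 - C^{12}_{0}\left(\tfrac{1}{6}\right)^{0}\left(\tfrac{5}{6}\right)^{12} - C^{12}_{1}\left(\tfrac{1}{6}\right)^{1}\left(\tfrac{5}{6}\right)^{11} \approx 0.6187.
\end{aligned}
$$

$$
\begin{aligned}
\text{Probability for (iii)} &= 1 - P(0) - P(1) - P(2) \\
&= 1 - C^{18}_{0}\left(\tfrac{1}{6}\right)^{0}\left(\tfrac{5}{6}\right)^{18} - C^{18}_{1}\left(\tfrac{1}{6}\right)^{1}\left(\tfrac{5}{6}\right)^{17} - C^{18}_{2}\left(\tfrac{1}{6}\right)^{2}\left(\tfrac{5}{6}\right)^{16} \approx 0.5973.
\end{aligned}
$$

In conclusion, (i) is most likely to happen. ✷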



� Example 3.6 (Binomial distribution ⋆⋆ )

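Throw two fair dice. We say that "a match" occurs if the outcomes of the two dice are identical. Now we play a game with the following three possible rules:

Rule 1. Throw the two fair dice 3 times. If you get at least one match, then you win the game.

Rule 2. Throw the two fair dice 7 times. If you get at least two matches, then you win the game.

Rule 3. Throw the two fair dice 11 times. If you get at least three matches, then you win the game.

Under which rule will you have the highest probability of winning the game?

Solution The probability of getting a match is given by $p = \dfrac{6}{36} = \dfrac{1}{6}$. Let $X_i$ (i = 1, 2, 3) be the number of matches obtained under Rule i. Then

$$X_1 \sim \text{Bin}\left(3, \tfrac{1}{6}\right), \quad X_2 \sim \text{Bin}\left(7, \tfrac{1}{6}\right), \quad X_3 \sim \text{Bin}\left(11, \tfrac{1}{6}\right).$$

Now,

$$
\begin{aligned}
P(\text{Win under Rule 1}) &= P(X_1 \ge 1) \\
&= 1 - P(X_1 = 0) \\
&= 1 - \left(\tfrac{5}{6}\right)^{3} \approx 0.4213,
\end{aligned}
$$

$$
\begin{aligned}
P(\text{Win under Rule 2}) &= P(X_2 \ge 2) \\
&= 1 - P(X_2 = 0) - P(X_2 = 1) \\
&= 1 - \left(\tfrac{5}{6}\right)^{7} - C^{7}_{1} \times \left(\tfrac{5}{6}\right)^{6}\left(\tfrac{1}{6}\right) \approx 0.3302,
\end{aligned}
$$

$$
\begin{aligned}
P(\text{Win under Rule 3}) &= P(X_3 \ge 3) \\
&= 1 - P(X_3 = 0) - P(X_3 = 1) - P(X_3 = 2) \\
&= 1 - \left(\tfrac{5}{6}\right)^{11} - C^{11}_{1} \times \left(\tfrac{5}{6}\right)^{10}\left(\tfrac{1}{6}\right) - C^{11}_{2} \times \left(\tfrac{5}{6}\right)^{9}\left(\tfrac{1}{6}\right)^{2} \approx 0.2732.
\end{aligned}
$$

In conclusion, one is most likely to win the game under Rule 1. ✷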
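All three rules (and the "Sixes" comparison in the previous example) reduce to a binomial upper-tail probability, which can be computed with one helper. This is a minimal sketch; the name `binom_tail` is an assumption for illustration.

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Bin(n, p), via the complement of P(X <= k - 1)."""
    return 1 - sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))

p_match = 1 / 6
rules = {
    1: binom_tail(3, p_match, 1),
    2: binom_tail(7, p_match, 2),
    3: binom_tail(11, p_match, 3),
}
best_rule = max(rules, key=rules.get)
```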

� Example 3.7 (Binomial distribution ⋆⋆⋆ )

Suppose there is a game in which the player can either win or lose tokens. Jack plays the game repeatedly, and we assume that the games are independent of each other. At each game, he will win one token with a probability of 0.3 or lose one token with a probability of 0.7. Suppose that Jack has 5 tokens in the beginning (before the first play). He will quit only when he has no tokens left. Let $X_k$ be the number of tokens that Jack has in hand right after playing the game k times, where k = 1, 2, 3, ....

(a) What are the possible outcomes of $X_2$?

(b) Find the expected value of $X_2$.

(c) Describe briefly the situation when $X_5 = 0$.

(d) Find the probability that Jack will quit within the first 8 plays (i.e., after playing the game 8 times or fewer).

(e) Given that Jack quits within the first 8 plays, what is the probability that he has exactly 5 tokens right after the second play?

Solution

(a) The possible outcomes of $X_2$ are 3, 5, 7.

(b)
$$P(X_2 = 3) = (0.7)^{2} = 0.49,$$
$$P(X_2 = 5) = C^{2}_{1} \times (0.3)(0.7) = 0.42,$$
$$P(X_2 = 7) = (0.3)^{2} = 0.09.$$

The expected value of $X_2$ is given by

$$E(X_2) = 3 \times 0.49 + 5 \times 0.42 + 7 \times 0.09 = 4.2.$$

(c) Jack loses all 5 tokens after playing the game 5 times, that is, he loses five games in a row. He will quit accordingly.

(d) Jack can only reach zero tokens after an odd number of plays, so within 8 plays he quits either at play 5 (five losses) or at play 7 (exactly one win among the first five plays, followed by two losses):

$$
\begin{aligned}
\text{The required probability} &= P(X_5 = 0) + P(X_7 = 0) \\
&= (0.7)^{5} + C^{5}_{1}(0.3)(0.7)^{4} \times (0.7)^{2} \\
&\approx 0.3445.
\end{aligned}
$$

(e) Denote $E_1$ as the event that Jack quits within the first 8 plays and $E_2$ as the event that Jack has exactly 5 tokens right after the second play. If $X_2 = 5$, Jack can quit within 8 plays only by losing the next five games.

$$
\begin{aligned}
\text{The required probability} &= P(E_2 \mid E_1) \\
&= \frac{P(E_1 \text{ and } E_2)}{P(E_1)} \\
&\approx \frac{P(X_2 = 5) \times (0.7)^{5}}{0.3445} \\
&= \frac{0.42 \times (0.7)^{5}}{0.3445} \approx 0.2049.
\end{aligned}
$$

✷
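Part (d) can be verified by propagating the distribution of Jack's token count play by play and recording when it first hits zero. The following is a sketch with exact fractions; the function name `quit_probabilities` is hypothetical.

```python
from fractions import Fraction

def quit_probabilities(max_plays, tokens=5, p_win=Fraction(3, 10)):
    """Return {k: P(Jack first reaches 0 tokens at play k)} for k <= max_plays."""
    alive = {tokens: Fraction(1)}   # token count -> probability, still playing
    quits = {}
    for k in range(1, max_plays + 1):
        nxt = {}
        for t, prob in alive.items():
            nxt[t + 1] = nxt.get(t + 1, 0) + prob * p_win
            if t - 1 == 0:          # losing the last token: Jack quits at play k
                quits[k] = quits.get(k, 0) + prob * (1 - p_win)
            else:
                nxt[t - 1] = nxt.get(t - 1, 0) + prob * (1 - p_win)
        alive = nxt
    return quits

quits = quit_probabilities(8)
p_quit_within_8 = sum(quits.values())
```

The dictionary `quits` contains only the plays 5 and 7, in line with the parity argument in part (d).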

� Example 3.8 (Binomial vs. hypergeometric ⋆ )

A box contains 20 balls of which 6 are white and 14 are black. Eight balls are drawn at random from the box. Find the probability that the sample contains exactly 3 white balls if

(a) sampling is done without replacement. What kind of distribution is this?

(b) sampling is done with replacement. What kind of distribution is this?

Solution

(a) Let X denote the number of white balls. For sampling without replacement, the probability of obtaining 3 white balls is given by the hypergeometric distribution such that

$$X \sim H(N, n, r), \quad \text{where } N = 20,\ n = 8,\ r = 6.$$

Thus,

$$P(X = 3) = \frac{C^{6}_{3}\, C^{14}_{5}}{C^{20}_{8}} = \frac{20 \times 2002}{125970} \approx 0.318.$$

(b) For sampling with replacement, the probability of obtaining 3 white balls is given by the binomial distribution such that

$$X \sim \text{Bin}(n, p), \quad \text{where } n = 8,\ p = \frac{6}{20} = 0.3.$$

Thus,
$$P(X = 3) = C^{8}_{3}\,(0.3)^{3}\,(0.7)^{5} \approx 0.254.$$

✷

� Example 3.9 (Hypergeometric distribution ⋆ )

In a lottery, 6 numbers are drawn out of 49. Find the probability that “8” is one of the numbers drawn.

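Solution The required probability is given by

$$
\frac{C^{1}_{1}\, C^{48}_{5}}{C^{49}_{6}} = \frac{48 \times 47 \times 46 \times 45 \times 44}{5 \times 4 \times 3 \times 2 \times 1} \times \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{49 \times 48 \times 47 \times 46 \times 45 \times 44} = \frac{6}{49} \approx 0.1224.
$$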

✷


In Mark Six, 6 ordinary numbers are drawn out of 49. Find the probability that "8" and "18" are two of the numbers drawn.

Solution The required probability is given by

$$\frac{C^{2}_{2}\, C^{47}_{4}}{C^{49}_{6}} = \frac{5}{392} \approx 0.0128.$$

✷


Suppose you plan to select four of 10 new stock issues (stock issued by new companies) and that, unknown to you, three of the 10 will result in substantial profits and seven will result in losses. What is the probability that at least two of the three profitable issues will appear in your selection?

Solution We use the formula for the hypergeometric probability distribution. The probability of observing at least two successes in the sample of n = 4 is

$$
\begin{aligned}
P(X \ge 2) &= P(X = 2) + P(X = 3) \\
&= \frac{C^{3}_{2}\, C^{7}_{2}}{C^{10}_{4}} + \frac{C^{3}_{3}\, C^{7}_{1}}{C^{10}_{4}} \\
&= \frac{3 \times 21}{210} + \frac{1 \times 7}{210} = \frac{1}{3}.
\end{aligned}
$$

The required probability is approximately equal to 33.33%. ✷


From an ordinary pack of 52 poker cards, we draw one hand of 13 cards. What is the probability that we have 5 spades in our hand?

Solution Let X denote the number of spades. The probability of obtaining 5 spades in our hand is given by the hypergeometric distribution such that

$$X \sim H(N, n, r), \quad \text{where } N = 52,\ n = 13,\ r = 13.$$

Thus,

$$P(X = 5) = \frac{C^{13}_{5}\, C^{39}_{8}}{C^{52}_{13}} = \frac{1164427407}{9338434700} \approx 0.1247.$$

Note that in the above we have used the computer software Mathematica for evaluations. ✷


� Example 3.13 (Hypergeometric distribution and its binomial approximation ⋆⋆ )

A city is inhabited by 75,000 adults of whom 500 are university professors. In a survey on higher education carried out by the local hotline radio show, 25 people are chosen at random without replacement for questioning.

(a) What is the probability that the sample contains at most one professor? Use a suitable random variable to solve this question.

(b) Use a suitable distribution to give a numerical approximation of the answer in part (a).

Solution

(a) The probability that the sample contains k professors is given by the hypergeometric distribution,

$$P(X = k) = \frac{C^{r}_{k} \cdot C^{N-r}_{n-k}}{C^{N}_{n}}, \quad \text{where } N = 75000,\ n = 25,\ r = 500.$$

Hence,

$$
P(\text{sample contains at most one professor}) = P(X = 0) + P(X = 1)
= \frac{C^{500}_{0} \cdot C^{74500}_{25} + C^{500}_{1} \cdot C^{74500}_{24}}{C^{75000}_{25}}.
$$

The above expression is an exact answer, but it does not tell us how large or how small the value is, because the combinations involved are far too large to evaluate by hand or on an ordinary calculator. We need an approximation of this expression.

(b) In the case when n = 25 ≪ N = 75000, the binomial distribution should give an accurate approximation,

$$X \sim \text{Bin}(n, p), \quad \text{where } n = 25,\ p = \frac{500}{75000} = \frac{1}{150}.$$

Hence,

$$
\begin{aligned}
P(\text{sample contains at most one professor}) &= P(X = 0) + P(X = 1) \\
&\approx C^{25}_{0} \times \left(\tfrac{1}{150}\right)^{0}\left(\tfrac{149}{150}\right)^{25} + C^{25}_{1} \times \left(\tfrac{1}{150}\right)^{1}\left(\tfrac{149}{150}\right)^{24} \\
&\approx 0.9880.
\end{aligned}
$$

The value is indeed very close to 1, which is not at all apparent from the expression in part (a) alone.

✷
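With arbitrary-precision integers the exact hypergeometric expression in part (a) can in fact be evaluated by software, which gives a direct check on the quality of the binomial approximation. A minimal sketch using Python's `math.comb` (the helper name `hypergeom_pmf` is an assumption):

```python
from fractions import Fraction
from math import comb

N, n, r = 75000, 25, 500   # population, sample size, number of professors

def hypergeom_pmf(k):
    """Exact P(X = k) as a fraction of (very large) integers."""
    return Fraction(comb(r, k) * comb(N - r, n - k), comb(N, n))

exact = float(hypergeom_pmf(0) + hypergeom_pmf(1))

# Binomial approximation with p = r / N, as in part (b).
p = r / N
approx = (1 - p) ** n + n * p * (1 - p) ** (n - 1)
```

Comparing `exact` and `approx` shows how little is lost by ignoring the lack of replacement when the sample is tiny relative to the population.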

� Example 3.14 (Poisson distribution approximating binomial ⋆ )

In the lottery game “Mark Six”, players seek to guess which six numbers will be drawn out of a lottery machinewhich contains colored balls numbered 1 to 49. Suppose that 7,000,000 people play the lottery, and assume thateach player independently chooses, at random and without replacement, six numbers from 1, 2, · · · , 49. LetX denote the total number of players who win the six drawn numbers. Name the exact distribution of X, anduse a suitable approximation to this distribution and find, approximately, the probabilities (a) P (X = 0),(b) P (X = 1).

Solution The exact distribution of X is binomial:

\[ X \sim \mathrm{Bin}(n, p) \quad \text{with } n = 7{,}000{,}000 \text{ and } p = \frac{1}{C^{49}_{6}} = \frac{1}{13{,}983{,}816}. \]

Here n is very large and p is very small, so the Poisson distribution may be used to approximate the binomial probabilities. The Poisson approximation has the parameter (mean)

\[ \lambda = np \approx 0.50058. \]



In general, the Poisson probability mass function is given by

\[ P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots \]

(a) $P(X = 0) = e^{-\lambda} = e^{-0.50058} \approx 0.60618$.

(b) $P(X = 1) = \lambda e^{-\lambda} = 0.50058 \cdot e^{-0.50058} \approx 0.30344$.

✷
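As a rough check of how good the Poisson approximation is here, the exact binomial values for k = 0 and k = 1 can be evaluated directly, since they need no large binomial coefficients. This is a sketch only; the variable names are illustrative.

```python
from math import comb, exp

n = 7_000_000
p = 1 / comb(49, 6)      # chance that a single ticket matches all six numbers
lam = n * p              # Poisson mean, about 0.50058

# Exact binomial probabilities for k = 0 and k = 1
binom_p0 = (1 - p) ** n
binom_p1 = n * p * (1 - p) ** (n - 1)

# Poisson approximations
pois_p0 = exp(-lam)
pois_p1 = lam * exp(-lam)

print(f"P(X=0): binomial {binom_p0:.6f}, Poisson {pois_p0:.6f}")
print(f"P(X=1): binomial {binom_p1:.6f}, Poisson {pois_p1:.6f}")
```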

Example 3.15 (Poisson distribution approximating binomial ⋆)

An analyst predicted that 3.5% of small corporations would file for bankruptcy in the coming year. For a random sample of 120 small corporations, use the Poisson distribution to estimate the probability that at least 5% of them will file for bankruptcy in the next year, assuming that the analyst’s prediction is correct.

Solution The number X of corporations in the sample that will file for bankruptcy is binomial with n = 120 and p = 0.035,

\[ X \sim \mathrm{Bin}(120, 0.035), \]

and we look for the probability of at least 6 bankruptcies (120 × 5% = 6):

\[ P(X \ge 6) = 1 - P(X \le 5) = 1 - \sum_{k=0}^{5} C^{120}_{k} (0.035)^{k} (0.965)^{120-k}. \]

This sum is tedious to evaluate by hand, so we use the Poisson distribution with mean λ = np = 120 × 0.035 = 4.2 to approximate the probability:

\[
\begin{aligned}
P(X \ge 6) &= 1 - P(X \le 5) \\
&\approx 1 - \sum_{k=0}^{5} \frac{e^{-\lambda} \lambda^{k}}{k!} \\
&= 1 - e^{-4.2} \left( \frac{(4.2)^0}{0!} + \frac{(4.2)^1}{1!} + \frac{(4.2)^2}{2!} + \frac{(4.2)^3}{3!} + \frac{(4.2)^4}{4!} + \frac{(4.2)^5}{5!} \right) \\
&\approx 0.2469.
\end{aligned}
\]

The required probability is approximately equal to 24.69%. ✷
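The binomial sum that was set aside as troublesome is easy to evaluate by machine, so the two tails can be compared. A minimal sketch, with illustrative variable names:

```python
from math import comb, exp, factorial

n, p = 120, 0.035
lam = n * p  # Poisson mean 4.2

# P(X >= 6) = 1 - P(X <= 5) under each model
binom_tail = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(6))
pois_tail = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(6))

print(f"binomial: {binom_tail:.4f}, Poisson approximation: {pois_tail:.4f}")
```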

Example 3.16 (Poisson distribution ⋆⋆)

Henry is an influential player in his soccer team. On average, the team scores once every 30 minutes in his presence, but only once every 45 minutes in his absence. Henry has picked up a light injury and his chance of playing in the game tomorrow is only 50%. What is the probability that Henry’s team will score two or more goals tomorrow? (Note: A soccer game is played for 90 minutes.)

Solution Let X be the number of goals scored by Henry’s team in the 90-minute game. In Henry’s presence the mean is 90/30 = 3 goals, so X ∼ Poisson(3). That is,

\[ P(X = k \mid \text{Henry present}) = \frac{3^{k}}{k!} e^{-3}. \]

In Henry’s absence the mean is 90/45 = 2 goals, so X ∼ Poisson(2). That is,

\[ P(X = k \mid \text{Henry absent}) = \frac{2^{k}}{k!} e^{-2}. \]


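Now,

\[
\begin{aligned}
P(X \ge 2) &= 1 - P(X \le 1) \\
&= 1 - P(X \le 1 \mid \text{Henry present}) \times P(\text{Henry present}) - P(X \le 1 \mid \text{Henry absent}) \times P(\text{Henry absent}) \\
&= 1 - \left( e^{-3} + 3e^{-3} \right) \times \frac{1}{2} - \left( e^{-2} + 2e^{-2} \right) \times \frac{1}{2} \\
&= 1 - 2e^{-3} - 1.5e^{-2} \\
&\approx 0.6974.
\end{aligned}
\]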

The required probability is approximately equal to 69.74%. ✷
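The conditioning argument above can be sketched in a few lines of Python. The function name `poisson_cdf_at_1` is an illustrative helper, not something from the notes.

```python
from math import exp


def poisson_cdf_at_1(lam):
    """P(X <= 1) for X ~ Poisson(lam)."""
    return exp(-lam) * (1 + lam)


p_present = 0.5
mean_present = 90 / 30   # goals per game when Henry plays
mean_absent = 90 / 45    # goals per game when he does not

# Law of total probability over Henry present / absent
p_at_most_one = (p_present * poisson_cdf_at_1(mean_present)
                 + (1 - p_present) * poisson_cdf_at_1(mean_absent))
p_two_or_more = 1 - p_at_most_one
print(f"P(X >= 2) = {p_two_or_more:.4f}")
```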

Example 3.17 (Poisson distribution ⋆⋆⋆)

Flaws in lengths of rope made by Company A occur in a Poisson process at rate $\lambda_A$ per metre length, so that the number of flaws X in a length of l metres of rope has the Poisson probability mass function

\[ P(X = k) = \frac{\exp(-\lambda_A l)\,(\lambda_A l)^{k}}{k!}, \quad k = 0, 1, 2, \ldots, \quad \lambda_A > 0. \]

(a) Find the probability that there are (i) no flaws, (ii) more than 2 flaws, in a 1000-metre length of rope made by Company A, given that $\lambda_A = 0.002$.

(b) Company B makes similar rope, indistinguishable in appearance from that made by Company A, in which flaws occur in a Poisson process at rate $\lambda_B = 0.003$ per metre. A boat is rigged with 100 metres of rope from Company A and 100 metres of rope from Company B. Assuming that the lengths of rope supplied by A and B are independent, find the probability that (i) there are no flaws, (ii) there is exactly one flaw, in the rigging of this boat.

(c) (i) A manufacturer of rigging for sailing boats buys 75% of his rope from Company A and 25% from Company B. The supplier’s label has become detached from a drum of rope of length 2 km which is found to have 7 flaws. Find the probability that this drum was supplied by Company A.

(ii) Suppose, instead, that the rope in this drum had been found to have 8 flaws. Find the probability that this drum was supplied by Company A. Compare this probability with your answer to part (i) and comment.

Solution

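(a) The mean number of flaws in a length of l metres is $\lambda = \lambda_A l$, so X ∼ Poisson(0.002 × 1000) = Poisson(2) and $P(X = k) = e^{-2}\, 2^{k} / k!$.

(i) $P(X = 0) = e^{-2} \approx 0.1353$.

(ii) $P(X > 2) = 1 - P(X \le 2) = 1 - e^{-2} \left( 1 + 2 + \frac{2^2}{2} \right) \approx 1 - 0.6767 = 0.3233$.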

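(b) $X_A \sim \mathrm{Poisson}(0.002 \times 100) = \mathrm{Poisson}(0.2)$ and $X_B \sim \mathrm{Poisson}(0.003 \times 100) = \mathrm{Poisson}(0.3)$.

(i)

\[
\begin{aligned}
P(\text{no flaws}) &= P(X_A = 0 \text{ and } X_B = 0) \\
&= P(X_A = 0) \times P(X_B = 0) \quad \text{by independence} \\
&= e^{-0.2} \times e^{-0.3} = e^{-0.5} \\
&\approx 0.6065.
\end{aligned}
\]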



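(ii) It follows from independence that

\[
\begin{aligned}
P(\text{exactly one flaw}) &= P(X_A = 0 \text{ and } X_B = 1) + P(X_A = 1 \text{ and } X_B = 0) \\
&= P(X_A = 0) \times P(X_B = 1) + P(X_A = 1) \times P(X_B = 0) \\
&= e^{-0.2} \times (0.3\, e^{-0.3}) + (0.2\, e^{-0.2}) \times e^{-0.3} \\
&= 0.5\, e^{-0.5} \\
&\approx 0.3033.
\end{aligned}
\]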

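(c) A 2 km drum from Company A has a Poisson(0.002 × 2000) = Poisson(4) number of flaws, and one from Company B has a Poisson(0.003 × 2000) = Poisson(6) number of flaws.

(i) By Bayes’ theorem,

\[
\begin{aligned}
P(A \mid 7 \text{ flaws}) &= \frac{P(7 \text{ flaws} \mid A)\, P(A)}{P(7 \text{ flaws})} \\
&= \frac{P(7 \text{ flaws} \mid A)\, P(A)}{P(7 \text{ flaws} \mid A)\, P(A) + P(7 \text{ flaws} \mid B)\, P(B)} \\
&= \frac{\frac{e^{-4}\, 4^{7}}{7!} \times 0.75}{\frac{e^{-4}\, 4^{7}}{7!} \times 0.75 + \frac{e^{-6}\, 6^{7}}{7!} \times 0.25} \\
&\approx 0.5647.
\end{aligned}
\]

This probability is just larger than 0.5, so the drum is slightly more likely to have come from Company A than from Company B.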

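(ii)

\[
\begin{aligned}
P(A \mid 8 \text{ flaws}) &= \frac{P(8 \text{ flaws} \mid A)\, P(A)}{P(8 \text{ flaws})} \\
&= \frac{P(8 \text{ flaws} \mid A)\, P(A)}{P(8 \text{ flaws} \mid A)\, P(A) + P(8 \text{ flaws} \mid B)\, P(B)} \\
&= \frac{\frac{e^{-4}\, 4^{8}}{8!} \times 0.75}{\frac{e^{-4}\, 4^{8}}{8!} \times 0.75 + \frac{e^{-6}\, 6^{8}}{8!} \times 0.25} \\
&\approx 0.4638.
\end{aligned}
\]

This probability is just less than 0.5, so with 8 flaws the drum is now slightly more likely to have come from Company B than from Company A.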

✷

Remark This problem weighs “quantity” against “reliability”: the manufacturer buys more rope from Company A than from Company B, but the rope from B is less reliable than that from A. Thus, as more flaws are found in the drum, the probability that it came from Company A decreases, and from 8 flaws onwards it falls below 0.5. Indeed,

\[ P(A \mid N \text{ flaws}) = \frac{\frac{e^{-4}\, 4^{N}}{N!} \times \frac{3}{4}}{\frac{e^{-4}\, 4^{N}}{N!} \times \frac{3}{4} + \frac{e^{-6}\, 6^{N}}{N!} \times \frac{1}{4}} = \frac{3}{3 + e^{-2} \left( \frac{6}{4} \right)^{N}}. \]

It follows from the above that $P(A \mid N \text{ flaws})$ gets smaller as N gets larger.
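The dependence of the posterior probability on the number of flaws can be tabulated with a short Python sketch. The function `posterior_A` and its parameter names are illustrative assumptions; the default means 4 and 6 are the expected flaw counts for a 2 km drum from each company.

```python
from math import exp, factorial


def posterior_A(n_flaws, prior_A=0.75, mean_A=4.0, mean_B=6.0):
    """P(drum from A | n_flaws) by Bayes' theorem with Poisson likelihoods."""
    like_A = exp(-mean_A) * mean_A**n_flaws / factorial(n_flaws)
    like_B = exp(-mean_B) * mean_B**n_flaws / factorial(n_flaws)
    return like_A * prior_A / (like_A * prior_A + like_B * (1 - prior_A))


for n_flaws in range(5, 11):
    print(n_flaws, round(posterior_A(n_flaws), 4))
```

The printed values decrease steadily and cross 0.5 between 7 and 8 flaws, in line with parts (c)(i) and (c)(ii).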


Example 3.18 (Negative binomial distribution ⋆⋆)

Two teams, A and B, play a series of games. If team A has probability 0.55 of winning each game, is it to its advantage to play the best two out of three games or the best three out of five games? Assume the outcomes of successive games are independent.

Solution Let X denote a random variable that follows a negative binomial distribution,

\[ X \sim \mathrm{NegBin}(r, p), \]

that is, X is the number of the trial on which the r-th success occurs. The key characteristic of the negative binomial distribution is:

r successes in k trials, and the r-th success happens on the k-th trial (k ≥ r).

Schematically, whenever k = r, r + 1, r + 2, ...,

\[ \underbrace{1\text{st} \quad 2\text{nd} \quad 3\text{rd} \quad \cdots \quad (k-2)\text{th} \quad (k-1)\text{th}}_{(r-1) \text{ successes distributed as binomial over first } (k-1) \text{ trials}} \qquad \underbrace{k\text{th}}_{r\text{th success}} \]

The negative binomial probability is given by the product of two probabilities (by independence),

\[ \underbrace{\left( C^{k-1}_{r-1}\, p^{r-1} (1 - p)^{(k-1)-(r-1)} \right)}_{\text{binomial probability}} \times\ p, \]

or (after simplification)

\[ P(X = k) = C^{k-1}_{r-1}\, p^{r} (1 - p)^{k-r}, \quad \text{where } k = r, r + 1, r + 2, \ldots \]

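To answer the given question, we consider the two formats in turn. Team A wins a series exactly when its r-th win occurs within the maximum number of games.

1. “Two out of Three”. X ∼ NegBin(r = 2, p = 0.55).

\[
\begin{aligned}
P(\text{Team A wins}) &= P(X = 2) + P(X = 3) \\
&= 0.55^{2} + C^{2}_{1}\, (0.55)^{2} (0.45) \\
&\approx 0.5748.
\end{aligned}
\]

2. “Three out of Five”. Y ∼ NegBin(r = 3, p = 0.55).

\[
\begin{aligned}
P(\text{Team A wins}) &= P(Y = 3) + P(Y = 4) + P(Y = 5) \\
&= 0.55^{3} + C^{3}_{2}\, (0.55)^{3} (0.45) + C^{4}_{2}\, (0.55)^{3} (0.45)^{2} \\
&\approx 0.5931.
\end{aligned}
\]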

It is more advantageous for team A (the better team) to play “the best three out of five games”. ✷
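The same calculation extends to longer series. The sketch below, with illustrative function names, sums the negative binomial probabilities for a "best r out of 2r − 1" series.

```python
from math import comb


def negbin_pmf(k, r, p):
    """P(the r-th success occurs on trial k), for k >= r."""
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


def series_win_prob(r, p):
    """P(team wins a best-r-out-of-(2r-1) series), game win probability p."""
    return sum(negbin_pmf(k, r, p) for k in range(r, 2 * r))


for r in (2, 3, 4, 5):
    print(f"best {r} out of {2 * r - 1}: {series_win_prob(r, 0.55):.4f}")
```

For p = 0.55 the winning probability grows with the length of the series, which supports the conclusion that a longer series favours the better team.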


Chapter 4

Continuous Random Variables

Example 4.1 (Probability density function ⋆)

The continuous random variable X has probability density function given by

\[ f(x) = \begin{cases} k x^{2} (1 - x)^{2}, & 0 \le x \le 1, \\ 0, & \text{otherwise.} \end{cases} \]

Prove that $P\left( X \le \frac{1}{3} \right) = \frac{17}{81}$.

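Solution We have $k \int_{0}^{1} x^{2} (1 - x)^{2}\, dx = 1$, so $k \int_{0}^{1} (x^{2} - 2x^{3} + x^{4})\, dx = 1$. This gives

\[ 1 = k \left[ \frac{1}{3} x^{3} - \frac{1}{2} x^{4} + \frac{1}{5} x^{5} \right]_{0}^{1} = k \left( \frac{1}{3} - \frac{1}{2} + \frac{1}{5} \right) = \frac{k}{30} \implies k = 30. \]

Hence,

\[
\begin{aligned}
P\left( X \le \frac{1}{3} \right) &= \int_{0}^{1/3} 30 (x^{2} - 2x^{3} + x^{4})\, dx \\
&= 30 \left[ \frac{1}{3} x^{3} - \frac{1}{2} x^{4} + \frac{1}{5} x^{5} \right]_{0}^{1/3} \\
&= 30 \left( \frac{1}{3^{4}} - \frac{1}{2} \cdot \frac{1}{3^{4}} + \frac{1}{5} \cdot \frac{1}{3^{5}} \right) \\
&= \frac{17}{81}.
\end{aligned}
\]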

✷
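Because the density is a polynomial, the result can be verified in exact rational arithmetic. A minimal sketch using the standard `fractions` module; the helper name `antiderivative` is illustrative.

```python
from fractions import Fraction


def antiderivative(x):
    """Value at x of the antiderivative of x^2 - 2x^3 + x^4 that vanishes at 0."""
    x = Fraction(x)
    return x**3 / 3 - x**4 / 2 + x**5 / 5


k = 1 / antiderivative(1)                    # normalising constant
prob = k * antiderivative(Fraction(1, 3))    # P(X <= 1/3)
print(k, prob)
```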

Example 4.2 (Normal distribution ⋆)

Let X be a normally distributed random variable with µ = 1 and σ² = 4. Find $P(|X| > 1)$.

Solution Given that $X \sim N(\mu, \sigma^{2}) = N(1, 2^{2})$,

\[
\begin{aligned}
P(|X| > 1) &= P(X > 1 \text{ or } X < -1) \\
&= P(X > 1) + P(X < -1) \\
&= P\left( Z > \frac{1 - 1}{2} \right) + P\left( Z < \frac{-1 - 1}{2} \right) \\
&= P(Z > 0) + P(Z < -1) \\
&\approx 0.5 + (0.5 - 0.3413) \\
&= 0.6587.
\end{aligned}
\]

✷


Example 4.3 (Normal distribution ⋆⋆)

According to the Bureau of Labour Statistics, the average weekly pay for a U.S. production worker was $411.84 (The World Almanac, 2000). Assume that available data indicate that production worker wages were normally distributed with a standard deviation of $90.

(a) What is the probability that a worker earned between $400 and $500?

(b) How much did a production worker have to earn to be in the top 20% of wage earners?

(c) For a randomly selected production worker, what is the probability that the worker earned less than $250 per week?

Solution

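(a) Let X denote the weekly wage of a production worker. Then

\[ X \sim N(\mu, \sigma^{2}), \quad \mu = 411.84, \quad \sigma = 90. \]

Hence,

\[
\begin{aligned}
P(400 \le X \le 500) &= P\left( \frac{400 - 411.84}{90} \le Z \le \frac{500 - 411.84}{90} \right) \\
&\approx P(-0.13 \le Z \le 0.98) \\
&\approx 0.8365 - (1 - 0.5517) \\
&= 0.3882.
\end{aligned}
\]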

(b) We must find the z-value that cuts off an area of 0.20 in the right tail. Using the standard normal table, we find that z = 0.84 cuts off approximately 0.20 in the right tail. So,

\[ x = \mu + z\sigma = 411.84 + 0.84 \times 90 = 487.44. \]

Weekly earnings of $487.44 or above will put a production worker in the top 20%.

(c) At 250,

\[ z = \frac{250 - 411.84}{90} \approx -1.80. \]

Hence,

\[ P(X \le 250) = P(Z \le -1.80) \approx 1 - 0.9641 = 0.0359. \]

The probability that a randomly selected production worker earned less than $250 per week is about 3.59%.

✷
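The table look-ups in this example can be cross-checked with Python's standard `statistics.NormalDist`. This is a sketch; since no z-values are rounded, the results differ slightly from the table-based answers.

```python
from statistics import NormalDist

wages = NormalDist(mu=411.84, sigma=90)

p_between = wages.cdf(500) - wages.cdf(400)   # part (a)
top20_cutoff = wages.inv_cdf(0.80)            # part (b): 80th percentile
p_below_250 = wages.cdf(250)                  # part (c)

print(f"(a) {p_between:.4f}  (b) {top20_cutoff:.2f}  (c) {p_below_250:.4f}")
```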

Example 4.4 (Normal approximation to binomial distribution ⋆)

According to an estimate, 55% of the people in Hong Kong have at least one credit card. If a random sample of 30 persons is selected, what is the probability that at least 19 of them will have at least one credit card?


Solution Let n be the total number of persons in the sample, X be the number of persons in the sample who have at least one credit card, and p be the probability that a person has at least one credit card. Then this is a binomial problem with n = 30, p = 0.55,

\[ X \sim \mathrm{Bin}(30, 0.55), \]

and we look for the probability

\[ P(X \ge 19) = \sum_{k=19}^{30} C^{30}_{k} (0.55)^{k} (0.45)^{30-k}. \]

This sum is tedious to evaluate by hand, so we use the normal distribution with $\mu = np = 30 \times 0.55 = 16.5$ and $\sigma = \sqrt{npq} \approx 2.7249$ to approximate the probability. Remember that we have to apply the continuity correction when approximating a binomial probability by a normal distribution. We find that

\[
\begin{aligned}
P(X \ge 19) &\approx P\left( Z \ge \frac{18.5 - 16.5}{2.7249} \right) \\
&\approx P(Z \ge 0.73) \\
&\approx 0.5 - 0.2673 \\
&= 0.2327.
\end{aligned}
\]

✷
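The quality of the continuity-corrected approximation can be judged against the exact binomial sum. A minimal sketch with illustrative variable names:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.55

# Exact binomial tail P(X >= 19)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(19, n + 1))

# Normal approximation with continuity correction: P(X >= 19) ~ P(Y >= 18.5)
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sigma).cdf(18.5)

print(f"exact: {exact:.4f}, normal approximation: {approx:.4f}")
```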


In a multiple-choice examination, each question is linked with 5 possible answers, of which only one is correct. A particular candidate has probability p (0 < p < 1) of knowing the correct answer to a question. If the candidate does not know the correct answer, then he chooses one of the possible answers at random. If the examination consists of 50 questions, then it can be proved that a candidate must give 34 correct answers in order to obtain 30 marks. Based on this simple fact (no proof is needed) please answer the following question: For the case where p = 0.75 independently for each question, find approximately the probability that this candidate’s total mark for the examination is at least 30.

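Solution First find the probability, in terms of p, that the candidate answers a question correctly:

\[
\begin{aligned}
P(\text{answers correctly}) &= P(\text{answers correctly} \mid \text{knows}) \times P(\text{knows}) + P(\text{answers correctly} \mid \text{doesn't know}) \times P(\text{doesn't know}) \\
&= 1 \times p + \frac{1}{5} \times (1 - p) \\
&= \frac{1}{5} (1 + 4p).
\end{aligned}
\]

Use the normal approximation. Let Y be the number of correct answers out of 50. Then, with p = 0.75,

\[ Y \sim \mathrm{Bin}\left( 50, \frac{1}{5}(1 + 4p) \right) = \mathrm{Bin}(50, 0.8) \overset{\text{approx.}}{\sim} N(40, 8). \]

Hence, with the continuity correction,

\[
\begin{aligned}
P(Y \ge 34) &\approx P\left( Z \ge \frac{33.5 - 40}{\sqrt{8}} \right) \\
&\approx P(Z \ge -2.30) \\
&= P(Z \le 2.30) \\
&\approx 0.9893.
\end{aligned}
\]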

✷




A large chain retailer purchases a certain kind of electronic device from a manufacturer. The manufacturer indicates that the defective rate of the device is 5%.

(a) The inspector of the retailer randomly picks 20 items from a shipment. What is the probability that there will be at least two defective items among these 20?

(b) Suppose that the retailer receives 30 shipments in a month and the inspector randomly tests 20 devices per shipment. What is the probability that there will be more than 10 shipments containing at least two defective devices? Use the normal approximation to find the answer.

Solution

(a) Let X be the number of defective items among the 20. Then

\[ X \sim \mathrm{Bin}(n, p), \quad \text{where } n = 20,\ p = 0.05. \]

Hence,

\[
\begin{aligned}
P(\text{at least two defective}) &= P(X \ge 2) \\
&= 1 - P(X = 0) - P(X = 1) \\
&= 1 - C^{20}_{0} (0.05)^{0} (0.95)^{20} - C^{20}_{1} (0.05)^{1} (0.95)^{19} \\
&\approx 0.2642.
\end{aligned}
\]

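(b) Let Y be the number of shipments which contain at least two defective devices. Then

\[ Y \sim \mathrm{Bin}(n, p), \quad \text{where } n = 30,\ p = 0.2642, \]

and

\[ P(Y > 10) = \sum_{k=11}^{30} C^{30}_{k} (0.2642)^{k} (0.7358)^{30-k}. \]

Use the normal approximation to find the value of the above expression,

\[ Y \overset{\text{approx.}}{\sim} N(\mu, \sigma^{2}), \quad \text{where } \mu = np = 7.926,\ \sigma^{2} = npq \approx 5.832. \]

Hence, with the continuity correction,

\[
\begin{aligned}
P(Y > 10) &= P(Y \ge 11) \\
&\approx P\left( Z \ge \frac{10.5 - 7.926}{\sqrt{5.832}} \right) \\
&\approx P(Z \ge 1.07) \\
&\approx 1 - 0.8577 \\
&= 0.1423.
\end{aligned}
\]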

✷


Peter is studying for an examination that will consist of 45 multiple-choice questions selected randomly from a large test bank given in advance. For each question, four possible answers are presented but only one is correct. Peter studies all the questions of the test bank. Assume that he knows the correct answers for 60% of them and that he will randomly choose an answer if he doesn’t know the correct answer. What is the probability that Peter will get more than 60% of the examination questions correct?


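Solution Let X be the number of correct answers out of 45. Then

\[ X \sim \mathrm{Bin}(45, p), \quad \text{where } p = 0.6 + 0.4 \times \frac{1}{4} = 0.7. \]

Note that 60% of the 45 multiple-choice questions is 27, so

\[ P(X > 27) = P(28 \le X \le 45) = \sum_{k=28}^{45} C^{45}_{k} (0.7)^{k} (0.3)^{45-k}. \]

To find the numerical value of this binomial probability, use the normal approximation

\[ X \overset{\text{approx.}}{\sim} N(\mu, \sigma^{2}), \quad \text{where } \mu = 45 \times 0.7 = 31.5 \text{ and } \sigma^{2} = 45 \times 0.7 \times 0.3 = 9.45. \]

With the continuity correction,

\[
\begin{aligned}
P(X > 27) &\approx P\left( Z \ge \frac{27.5 - 31.5}{\sqrt{9.45}} \right) \\
&\approx P(Z \ge -1.30) \\
&= P(Z \le 1.30) \\
&\approx 0.9032.
\end{aligned}
\]

✷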


In a manufacturing process where glass products are produced, defects or bubbles occur, occasionally rendering the piece undesirable for selling. It is known that on average 1 in every 800 of these items produced has one or more bubbles.

(a) What is the binomial probability that a random sample of 4000 will yield fewer than 5 items possessing bubbles? Using the Poisson approximation to the binomial distribution, find the approximate value of the probability.

(b) Using the normal approximation with continuity correction, find, alternatively, the approximation of the binomial probability in (a).

Solution

(a) This is essentially a binomial experiment with n = 4000 and p = 1/800 = 0.00125. If X denotes the number of items possessing bubbles, we have

\[ P(X < 5) = \sum_{k=0}^{4} P(X = k) = \sum_{k=0}^{4} C^{4000}_{k} (0.00125)^{k} (0.99875)^{4000-k}. \]

The above expression gives the exact value of the required (binomial) probability. However, it is troublesome to evaluate by hand; the value of $C^{4000}_{4}$, for example, is very large. Since p is very small and n is quite large, we approximate with the Poisson distribution using λ = 4000 × 0.00125 = 5. Hence,

\[
\begin{aligned}
P(X < 5) &\approx \sum_{k=0}^{4} \frac{\lambda^{k} e^{-\lambda}}{k!} \\
&= e^{-5} \left( 1 + 5 + \frac{5^{2}}{2!} + \frac{5^{3}}{3!} + \frac{5^{4}}{4!} \right) \\
&\approx 0.4405.
\end{aligned}
\]

The required probability is approximately equal to 44.05%.



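(b) We look for the probability by the normal approximation,

\[ X \sim \mathrm{Bin}(n = 4000, p = 0.00125) \overset{\text{approx.}}{\sim} N(\mu = 5, \sigma^{2} = 4.99375). \]

Hence,

\[
\begin{aligned}
P(X < 5) &= P(0 \le X \le 4) \\
&\approx P\left( Z \le \frac{4.5 - 5}{\sqrt{4.99375}} \right) \\
&\approx P(Z \le -0.22) \\
&= 1 - P(Z \le 0.22) \quad \text{(by symmetry)} \\
&\approx 1 - 0.5871 \\
&= 0.4129.
\end{aligned}
\]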

✷

Example 4.9 (Normal approximation to binomial distribution ⋆⋆)

A multiple-choice test consists of 50 questions, each with 5 possible answers of which only one answer is correct. Suppose that the 50 questions are randomly selected from a large test bank. Assume that a student knows the answer for 60% of all questions in the test bank and the student will randomly choose an answer if the student doesn’t know the correct answer.

(a) What is the probability that the student will get more than 40 correct answers?

(b) Suppose that each correct answer is awarded 3 marks and each incorrect answer carries a penalty of 1 mark. What are the mean and the standard deviation of the total marks of the student?

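Solution Let X be the number of correct answers out of 50. Then

\[ X \sim \mathrm{Bin}(50, p), \quad \text{where } p = 0.6 + (0.4)\left( \frac{1}{5} \right) = 0.68. \]

(a)

\[ P(X > 40) = \sum_{k=41}^{50} C^{50}_{k} (0.68)^{k} (0.32)^{50-k}. \]

To find the numerical answer, use the normal approximation

\[ X \overset{\text{approx.}}{\sim} N(\mu, \sigma^{2}), \quad \text{where } \mu = 50 \times 0.68 = 34 \text{ and } \sigma^{2} = 50 \times 0.68 \times 0.32 = 10.88. \]

\[
\begin{aligned}
P(X > 40) &= P(X \ge 41) \\
&\approx P\left( Z \ge \frac{40.5 - 34}{\sqrt{10.88}} \right) \\
&\approx P(Z \ge 1.97) \\
&\approx 1 - 0.9756 \\
&= 0.0244.
\end{aligned}
\]

(b) Let Y be the total marks obtained by the student. Then

\[ Y = 3X - (50 - X) = 4X - 50. \]

Hence,

\[ E(Y) = E(4X - 50) = 4 \times E(X) - 50 = 4 \times 34 - 50 = 86. \]

Besides,

\[ \mathrm{Var}(Y) = \mathrm{Var}(4X - 50) = 4^{2} \times \mathrm{Var}(X) = 16 \times 10.88 = 174.08, \]

or

\[ \sigma_Y = \sqrt{174.08} \approx 13.1939. \]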

✷

Example 4.10 (Normal approximation to binomial distribution ⋆⋆⋆)

There is a Liberal Studies test in a school. The test has N questions and each question has 5 possible choices, exactly two of which are correct.

• All candidates select exactly two choices for each question in the test.

• A candidate gets 1 mark in a question if his/her two selected choices in this question are both correct; otherwise, no marks will be given.

• All questions in the test are independent.

• A wild-guesser is a candidate who selects at random for all questions.

(a) What is the probability that a wild-guesser will get 1 mark in one single question?

(b) Suppose that N = 10. Denote by X the marks that a wild-guesser will get in the test.

(i) Find the probabilities: P(X = 0), P(X ≤ 1) and P(X ≤ 2).

(ii) Determine the minimum passing mark (in integers) such that the probability that a wild-guesser will pass the test is less than 5%.

(c) Suppose that N = 90. By using the normal approximation to binomial probabilities, determine the minimum passing mark (in integers) such that more than 67% of wild-guessers will not pass the test.

Solution

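(a) $P(1 \text{ mark}) = \dfrac{1}{C^{5}_{2}} = \dfrac{1}{10}$.

(b) N = 10. $X \sim \mathrm{Bin}\left( 10, \frac{1}{10} \right)$.

(i)

\[
\begin{aligned}
P(X = 0) &= \left( \tfrac{9}{10} \right)^{10} \approx 0.3487, \\
P(X \le 1) &= \left( \tfrac{9}{10} \right)^{10} + C^{10}_{9} \left( \tfrac{1}{10} \right) \left( \tfrac{9}{10} \right)^{9} \approx 0.7361, \\
P(X \le 2) &= \left( \tfrac{9}{10} \right)^{10} + C^{10}_{9} \left( \tfrac{1}{10} \right) \left( \tfrac{9}{10} \right)^{9} + C^{10}_{8} \left( \tfrac{1}{10} \right)^{2} \left( \tfrac{9}{10} \right)^{8} \approx 0.9298.
\end{aligned}
\]

(ii) With passing mark m, a wild-guesser passes with probability P(X ≥ m) = 1 − P(X ≤ m − 1). From (i), P(X ≥ 3) = 1 − 0.9298 = 0.0702, which is still above 5%. Next,

\[ P(X \le 3) \approx 0.9298 + C^{10}_{7} \left( \tfrac{1}{10} \right)^{3} \left( \tfrac{9}{10} \right)^{7} \approx 0.9872, \]

so P(X ≥ 4) ≈ 0.0128 < 5%. The minimum passing mark is therefore chosen as 4.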



(c) Let the integer k be the passing mark. A candidate fails whenever he/she gets k − 1 marks or fewer, and passes whenever he/she gets k marks or more. With N = 90,

\[ Y \sim \mathrm{Bin}\left( 90, \tfrac{1}{10} \right) \overset{\text{approx.}}{\sim} N(9, 8.1), \]

since np = 9 and npq = 8.1. To find the value of k, we solve the inequality

\[
\begin{aligned}
P(Y \le k - 1) &> 0.67, \\
P\left( Z \le \frac{k - 0.5 - 9}{\sqrt{8.1}} \right) &> 0.67, \\
\frac{k - 0.5 - 9}{\sqrt{8.1}} &> 0.44, \\
k &> 10.75.
\end{aligned}
\]

The minimum passing mark is therefore chosen as 11.

✷
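The passing marks in parts (b)(ii) and (c) can also be found by a direct search over the exact binomial distribution. The sketch below uses illustrative helper names; for part (c) the condition "more than 67% do not pass" is rewritten as P(X ≥ m) < 0.33.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def min_pass_mark(n, p, max_pass_prob):
    """Smallest integer m with P(X >= m) < max_pass_prob."""
    m = 0
    while 1 - binom_cdf(m - 1, n, p) >= max_pass_prob:
        m += 1
    return m


print(min_pass_mark(10, 0.1, 0.05))   # part (b)(ii)
print(min_pass_mark(90, 0.1, 0.33))   # part (c), exact rather than normal approximation
```

In both cases the exact search gives the same passing mark as the worked solution.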

Example 4.11 (Normal approximation to Poisson distribution ⋆⋆)

A certain insurance company offers life, fire and vehicle coverage. The numbers of claims arriving on any given day on these three types of policy are independent Poisson random variables with mean equal to 33, 25 and 45, respectively. What is the probability that, on any given day, the company will receive claims on more than 120 policies of all three types? Give the formula for calculating the exact probability. Using the normal approximation with continuity correction, find the best guess of the probability. (Hint: The sum of independent Poisson random variables is also a Poisson random variable.)

Solution Let $X_1$, $X_2$ and $X_3$ be the numbers of claims on life insurance, fire insurance and vehicle insurance, respectively. We are given

\[ X_1 \sim \mathrm{Poisson}(33), \quad X_2 \sim \mathrm{Poisson}(25), \quad X_3 \sim \mathrm{Poisson}(45). \]

Since $X_1, X_2, X_3$ are independent,

\[ X_1 + X_2 + X_3 \sim \mathrm{Poisson}(33 + 25 + 45). \]

Denote $X = X_1 + X_2 + X_3$. Then X ∼ Poisson(103).

The exact probability is

\[ P(X > 120) = 1 - P(X \le 120) = 1 - \sum_{k=0}^{120} \frac{103^{k}}{k!} e^{-103}. \]

Using the normal approximation $X \overset{\text{approx.}}{\sim} N(103, 103)$ with continuity correction, we have

\[
\begin{aligned}
P(X > 120) &= P(X \ge 121) \\
&\approx P\left( Z \ge \frac{120.5 - 103}{\sqrt{103}} \right) \\
&\approx P(Z \ge 1.72) \\
&\approx 1 - 0.9573 \\
&= 0.0427.
\end{aligned}
\]

✷
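The exact Poisson sum is awkward by hand but straightforward by machine. The sketch below evaluates each term on the log scale through `math.lgamma` to avoid overflow in 103^k and k!; the helper name `poisson_pmf` is illustrative.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed on the log scale."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


lam = 33 + 25 + 45  # mean of the total number of claims

exact = 1 - sum(poisson_pmf(k, lam) for k in range(121))   # P(X > 120)
approx = 1 - NormalDist(lam, sqrt(lam)).cdf(120.5)         # continuity correction

print(f"exact: {exact:.4f}, normal approximation: {approx:.4f}")
```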



The number of fatal traffic accidents which occur at random at a known ‘black spot’ follows a Poisson distribution with mean 4 per year. Let X denote the actual number of fatal accidents. Let Y denote the annual number of non-fatal accidents at the same place, which may be assumed to be a Poisson random variable with mean 12. Note that X and Y are independent.

(a) (Poisson distribution) Given that there were in total 20 accidents one year (i.e., X + Y = 20), find the conditional probability that 5 of these are fatal.

(b) (Normal approximation to Poisson distribution) The total number of traffic accidents is denoted by V = X + Y. State the distribution of V and write down the mean and variance of V. Use the normal approximation to calculate the probability that there will be more than 18 accidents in the next year.

Solution

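(a) We use the following fact: if $X \sim \mathrm{Poisson}(\lambda_1)$ and $Y \sim \mathrm{Poisson}(\lambda_2)$ are independent, then

(i) $X + Y \sim \mathrm{Poisson}(\lambda_1 + \lambda_2)$, and

(ii) $P(X = k \mid X + Y = n) = C^{n}_{k} \left( \dfrac{\lambda_1}{\lambda_1 + \lambda_2} \right)^{k} \left( \dfrac{\lambda_2}{\lambda_1 + \lambda_2} \right)^{n-k}$, that is, given X + Y = n, X is binomial with parameters n and $\lambda_1 / (\lambda_1 + \lambda_2)$.

Here $\lambda_1 = 4$ and $\lambda_2 = 12$, so given X + Y = 20, X is distributed as Bin(20, 4/16 = 0.25), and

\[ P(X = 5 \mid X + Y = 20) = C^{20}_{5} (0.25)^{5} (0.75)^{15} \approx 0.2023. \]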


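(b) V is Poisson with

Mean = E(V) = λ = 16 and Variance = Var(V) = λ = 16.

Use the normal approximation

\[ V \sim \mathrm{Poisson}(16) \overset{\text{approx.}}{\sim} N(16, 16). \]

Then

\[
\begin{aligned}
P(V > 18) &= P(V \ge 19) \\
&\approx P\left( Z \ge \frac{18.5 - 16}{4} \right) \\
&= P(Z \ge 0.625) \\
&\approx 0.2643 \quad \text{(using } z = 0.63 \text{ in the standard normal table).}
\end{aligned}
\]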


✷
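Fact (ii) in part (a) can be checked numerically by computing the conditional probability directly from the Poisson probabilities. A minimal sketch, with an illustrative helper `poisson_pmf`:

```python
from math import comb, exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam1, lam2, n, k = 4, 12, 20, 5

# Direct definition: P(X = k, Y = n - k) / P(X + Y = n), using independence
direct = poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2) / poisson_pmf(n, lam1 + lam2)

# Binomial form from fact (ii)
p = lam1 / (lam1 + lam2)
via_binomial = comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"direct: {direct:.4f}, binomial form: {via_binomial:.4f}")
```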


Many Chinese tourists are enthusiastic about visiting Hong Kong every year during the Golden Week holidays. They usually travel via airlines, railway or automobiles. The numbers of mainland Chinese people visiting Hong Kong on any given day through these three types of transportation are assumed to be mutually independent Poisson random variables with mean equal to 16, 59 and 33 (thousands), respectively.

(a) What is the probability that, on any given day, there are more than 110 thousand people visiting Hong Kong via all three types of transportation? Give the formula for calculating the exact probability.

(b) Using the normal approximation with continuity correction, find the best guess of the probability in (a).

Solution



(a) Let $X_1$, $X_2$ and $X_3$ be the numbers of people (in thousands) visiting Hong Kong via airlines, railway and automobiles, respectively. We are given

\[ X_1 \sim \mathrm{Poisson}(16), \quad X_2 \sim \mathrm{Poisson}(59) \quad \text{and} \quad X_3 \sim \mathrm{Poisson}(33). \]

Now let $X = X_1 + X_2 + X_3$. Since $X_1$, $X_2$ and $X_3$ are mutually independent, we have

\[ X \sim \mathrm{Poisson}(108). \]

The exact probability is

\[ P(X > 110) = 1 - \sum_{k=0}^{110} \frac{108^{k}}{k!} e^{-108}, \]

which, however, is very troublesome to calculate by hand.

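(b) The result in (a) leads to the normal approximation

\[ X \sim \mathrm{Poisson}(108) \overset{\text{approx.}}{\sim} N(108, 108). \]

With continuity correction, we have

\[
\begin{aligned}
P(X > 110) &= P(X \ge 111) \\
&\approx P\left( Z \ge \frac{110.5 - 108}{\sqrt{108}} \right) \\
&\approx P(Z \ge 0.24) \\
&\approx 1 - 0.5948 \\
&= 0.4052.
\end{aligned}
\]

The probability is approximately equal to 40.52%.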

✷

Example 4.14 (Sum of independent normal random variables ⋆⋆)

The weight of a randomly selected can of a new soft drink is known to have a normal distribution with mean 12.1 ounces and a standard deviation of 0.1 ounce.

(a) What is the probability that a can drawn at random weighs between 11.9 and 12.3 ounces?

(b) What weight should be printed on the can so that the average weight of the cans in a six-pack is below the printed weight for only 1% of all six-packs?

Solution Let X be the weight of a can of the new soft drink. Then $X \sim N(12.1, 0.1^{2})$.

(a)

\[
\begin{aligned}
P(11.9 < X < 12.3) &= P\left( \frac{11.9 - 12.1}{0.1} < Z < \frac{12.3 - 12.1}{0.1} \right) \\
&= P(-2 < Z < 2) \\
&= (0.9772 - 0.5) \times 2 \\
&= 0.9544.
\end{aligned}
\]

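(b) Denote by $\bar{X}$ the average weight of the cans in a six-pack. Then

\[ \bar{X} = \frac{1}{6} \left( X_1 + X_2 + X_3 + X_4 + X_5 + X_6 \right) \]

and

\[ E(\bar{X}) = \frac{1}{6} \times 6 \times 12.1 = 12.1 \quad \text{and} \quad \mathrm{Var}(\bar{X}) = \frac{1}{6^{2}} \times 6 \times 0.1^{2} = \frac{0.1^{2}}{6}. \]

Thus,

\[ \bar{X} \sim N\left( 12.1, \frac{0.1^{2}}{6} \right). \]

Our target is to find x such that

\[ -Z_{0.01} = \frac{x - 12.1}{0.1 / \sqrt{6}} = -2.325. \]

Solving the above equation for x gives x ≈ 12.00508 ounces.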

✷
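Both parts can be reproduced with `statistics.NormalDist`. This is a sketch; it uses the exact 1% quantile (about −2.326) rather than the table value −2.325, so the last digits differ slightly.

```python
from math import sqrt
from statistics import NormalDist

can = NormalDist(mu=12.1, sigma=0.1)
p_can = can.cdf(12.3) - can.cdf(11.9)                 # part (a)

# Mean weight of six independent cans: same mean, sigma / sqrt(6)
pack_mean = NormalDist(mu=12.1, sigma=0.1 / sqrt(6))
label_weight = pack_mean.inv_cdf(0.01)                # part (b): 1% quantile

print(f"(a) {p_can:.4f}  (b) {label_weight:.5f} ounces")
```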


Lemon tea is dispensed by a machine into bottles. The nominal volume of lemon tea in a bottle is 1 litre (1000 ml). The actual volumes of lemon tea put into the bottles can be regarded as being independently normally distributed, with mean set at 1010 ml and standard deviation 8 ml.

(a) Find the proportion, in a long run of production, of bottles containing less than the nominal volume.

(b) Bottles of lemon tea are often sold in packs of 6 bottles. Write down the distribution of the total volume of lemon tea in a pack of 6 bottles. Find the probability that the total volume of lemon tea in a pack of 6 bottles is less than 6 litres. Explain why this probability is less than your answer to part (a).

(c) A new and more accurate machine is now available, for which the volume of lemon tea dispensed is normally distributed but with smaller standard deviation 4 ml. By how much could the existing mean volume of lemon tea dispensed into each bottle be reduced without increasing the existing proportion of bottles with less than the nominal volume? Supposing that the additional cost of the more accurate machine is 18,000 dollars, and the cost of the lemon tea is 8 dollars per litre, how many bottles of lemon tea would have to be filled by the more accurate machine in order to justify its greater cost?

Solution

(a) Let X be the actual volume of lemon tea put into a bottle. Then $X \sim N(1010, 8^{2})$.

\[
\begin{aligned}
\text{The required probability} &= P(X < 1000) \\
&= P\left( Z < \frac{1000 - 1010}{8} \right) \\
&= P(Z < -1.25) \\
&\approx 0.1056.
\end{aligned}
\]

(b) Let Y denote the total volume in a 6-pack. Then $Y = X_1 + X_2 + X_3 + X_4 + X_5 + X_6$ and

\[
\begin{aligned}
E(Y) &= E(X_1 + X_2 + X_3 + X_4 + X_5 + X_6) \\
&= E(X_1) + E(X_2) + E(X_3) + E(X_4) + E(X_5) + E(X_6) \\
&= 6 \times 1010 = 6060, \\
\mathrm{Var}(Y) &= \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) + \mathrm{Var}(X_4) + \mathrm{Var}(X_5) + \mathrm{Var}(X_6) \\
&= 6 \times 8^{2} = 384.
\end{aligned}
\]

Thus, $Y \sim N(6060, 384) = N(6060, (8\sqrt{6})^{2})$.

\[
\begin{aligned}
\text{The required probability} &= P(Y < 6000) \\
&= P\left( Z < \frac{6000 - 6060}{8\sqrt{6}} \right) \\
&\approx P(Z < -3.06) \\
&\approx 0.0011.
\end{aligned}
\]



This probability is considerably smaller than the answer in part (a). In practical terms, this is because the six volumes are independent, so bottles with more and with less lemon tea in a 6-pack tend to balance each other out. Alternatively, one could use the mean volume per bottle in a pack, $\bar{X} \sim N(1010, 64/6)$, and compare $P(\bar{X} < 1000)$ with $P(X < 1000)$: $\bar{X}$ has the same mean as X but only one-sixth of the variance, so less of the lower tail of the distribution of $\bar{X}$ is below the nominal volume of 1000 ml.

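(c) Let µ be the new mean volume of lemon tea dispensed. Then

\[ P\left( Z < \frac{1000 - \mu}{4} \right) = 0.1056. \]

Hence,

\[ \frac{1000 - \mu}{4} = -1.25, \]

which implies that µ = 1005.

Therefore the existing mean volume of lemon tea dispensed could be reduced by 1010 − 1005 = 5 (ml). Each bottle then saves 5 ml = 0.005 litre of lemon tea, worth 8 × 0.005 = 0.04 dollars, so

\[ \text{The number of bottles that have to be filled} = \frac{18000}{8 \times 5 \times 0.001} = 450{,}000. \]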

✷
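The three parts of this solution chain together, which a short script makes explicit. This is a sketch with illustrative variable names; costs are in dollars and volumes in millilitres.

```python
from math import sqrt
from statistics import NormalDist

# (a) one bottle below the nominal 1000 ml
p_bottle = NormalDist(1010, 8).cdf(1000)

# (b) total of six independent bottles below 6000 ml
p_pack = NormalDist(6 * 1010, 8 * sqrt(6)).cdf(6000)

# (c) keep the same z-value with the new standard deviation of 4 ml
z = (1000 - 1010) / 8                  # -1.25
new_mean = 1000 - 4 * z                # 1005 ml
saving_per_bottle = (1010 - new_mean) / 1000 * 8   # litres saved times price per litre
break_even_bottles = 18000 / saving_per_bottle

print(f"(a) {p_bottle:.4f}  (b) {p_pack:.4f}  (c) {new_mean} ml, {break_even_bottles:.0f} bottles")
```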


Orange juice is dispensed by a machine into bottles. Bottles of orange juice are often sold in packs of 6 bottles. The nominal volume of orange juice in a bottle is 500 ml. The actual volumes of orange juice put into the bottles can be regarded as being independently normally distributed, with mean set at 505 ml and standard deviation 4 ml. A bottle of orange juice will only be accepted if its volume is within 7 ml of the target 505 ml.

(a) What is the probability that a randomly selected bottle of orange juice is acceptable?

(b) A customer specifies that a pack of 6 bottles will only be accepted if all six bottles of orange juice are acceptable.

(i) What is the probability that a randomly selected pack will be acceptable?

(ii) What is the probability that in 15 packs, no more than two packs will be rejected?

(c) The production manager feels that it is simpler to check the total volume of the six bottles. He thinks that a pack is acceptable if the total volume of its six bottles is within 42 ml of the target volume 3030 ml (i.e., 505 ml × 6). What is the probability that a randomly selected pack will satisfy this criterion?

(d) The customer and the manager inspect the packs of 6 bottles sold in a supermarket according to their own criteria as specified in (b) and (c). Determine whether each of the following cases is possible, and give an example with concrete data if it is (no example is needed for impossible cases):

(i) A pack is accepted by the customer but rejected by the manager.

(ii) A pack is rejected by the customer but accepted by the manager.

Solution

(a) $X \sim N(505, 4^{2})$.

\[ P(-7 < X - 505 < 7) = P\left( -\frac{7}{4} < Z < \frac{7}{4} \right) \approx 0.9198. \]

Let p denote the above probability (p = 0.9198).


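(b) (i)

\[ \text{The required probability} = p^{6} \approx 0.6056. \]

Let q denote the above probability (q = 0.6056).

(ii) No more than two rejected packs means at least 13 accepted packs out of 15, so

\[ \text{The required probability} = q^{15} + C^{15}_{1}\, q^{14} (1 - q) + C^{15}_{2}\, q^{13} (1 - q)^{2} \approx 0.02987. \]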

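(c) Let $Y = X_1 + X_2 + X_3 + X_4 + X_5 + X_6$. Then $Y \sim N(3030, 6 \times 4^{2}) = N(3030, (4\sqrt{6})^{2})$.

\[ P(-42 < Y - 3030 < 42) = P\left( -\frac{42}{4\sqrt{6}} < Z < \frac{42}{4\sqrt{6}} \right) \approx P(-4.29 < Z < 4.29) \approx 1. \]

(d) (i) Impossible. If every bottle is within 7 ml of 505 ml, then the total volume is within 6 × 7 = 42 ml of 3030 ml.

(ii) Possible. Consider the particular example:

\[ X_1 = 500, \quad X_2 = 500, \quad X_3 = 500, \quad X_4 = 500, \quad X_5 = 500, \quad X_6 = 512. \]

The sixth bottle deviates from 505 ml by 7 ml, which fails the strict criterion −7 < X − 505 < 7 used in (a), so the customer rejects the pack. The total volume is 3012 ml, only 18 ml from 3030 ml, so the manager accepts it.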

✷

Example 4.17 (Sum of independent normal random variables ⋆⋆⋆)

My cycle journey to work is 3 km, and my cycling time (in minutes) if there are no delays is distributed N(15, 1), i.e. normally with mean 15 and variance 1.

(a) Find the probability that, if there are no delays, I get to work in at most 17 minutes.

(b) On my route there are three sets of traffic lights. Each time I meet a red traffic light, I am delayed by a random time that is distributed N(0.7, 0.09). These lights operate independently. Find the probability of my getting to work in at most 17 minutes

(i) if just one light is set at red when I reach it.

(ii) if just two lights are set at red when I reach them.

(iii) if all three lights are set at red when I reach them.

(c) Suppose that, for each set of lights, the chance of delay is 0.5. Find the mean value of T, my total journey time, in minutes.

(d) Given that Var(T) = 1.5025, use a suitable approximation to calculate the probability that, over 10 journeys, my average journey time to work is at most 17 minutes.

Solution

(a) Let X be the cycling time in minutes for the 3 km journey. Then X ∼ N(15, 1).

\[
\begin{aligned}
\text{The required probability} &= P(X \le 17) \\
&= P\left( Z \le \frac{17 - 15}{1} \right) \\
&= P(Z \le 2) \\
&\approx 0.9772.
\end{aligned}
\]



(b) (i) The total journey time follows the distribution N(15 + 0.7, 1 + 0.09) = N(15.7, 1.09).

\[ \text{The required probability} = P\left( Z \le \frac{17 - 15.7}{\sqrt{1.09}} \right) \approx P(Z \le 1.245) \approx 0.8935. \]

(ii) The total journey time follows the distribution N(15 + 0.7 × 2, 1 + 0.09 × 2) = N(16.4, 1.18).

\[ \text{The required probability} = P\left( Z \le \frac{17 - 16.4}{\sqrt{1.18}} \right) \approx P(Z \le 0.552) \approx 0.7092. \]

(iii) The total journey time follows the distribution N(15 + 0.7 × 3, 1 + 0.09 × 3) = N(17.1, 1.27).

\[ \text{The required probability} = P\left( Z \le \frac{17 - 17.1}{\sqrt{1.27}} \right) \approx P(Z \le -0.089) \approx 0.4646. \]

(c) Let K be the number of lights that are set at red when I reach them, so that K ∼ Bin(3, 0.5), and let $T_k$ be the total journey time if just k lights are red, where k = 0, 1, 2, 3. Then

\[ T_k \sim N(15 + 0.7k,\ 1 + 0.09k), \quad \text{for } k = 0, 1, 2, 3. \]

The journey time T equals $T_k$ with probability $P(K = k) = C^{3}_{k} (0.5)^{3}$, so by the law of total expectation the mean value of T is

\[
\begin{aligned}
E(T) &= \sum_{k=0}^{3} P(K = k)\, E(T_k) \\
&= 0.125 \times \left[ E(T_0) + 3E(T_1) + 3E(T_2) + E(T_3) \right] \\
&= 0.125 \times \left[ 15 + 3 \times 15.7 + 3 \times 16.4 + 17.1 \right] \\
&= 16.05.
\end{aligned}
\]

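(d) Let $\bar{T}$ be the average journey time over 10 journeys. By the central limit theorem, $\bar{T}$ is approximately N(16.05, 1.5025/10).

\[
\begin{aligned}
\text{The required probability} &= P(\bar{T} \le 17) \\
&\approx P\left( Z \le \frac{17 - 16.05}{\sqrt{1.5025 / 10}} \right) \\
&\approx P(Z \le 2.45) \\
&\approx 0.9929.
\end{aligned}
\]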

✷


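Remark The value \(\mathrm{Var}(T) = 1.5025\) given in part (d) can be checked as follows. The journey time \(T\) equals \(T_k\) when exactly \(k\) lights are red, so \(T\) is a mixture of the \(T_k\) and not the weighted sum \(0.125\,(T_0 + 3T_1 + 3T_2 + T_3)\). Treating it as a weighted sum of independent variables would give

\[
0.125^2 \times \left[ 1 + 9 \times 1.09 + 9 \times 1.18 + 1.27 \right] \approx 0.3547,
\]

which is not the variance of \(T\). With \(K \sim \mathrm{Bin}(3, 0.5)\) denoting the number of red lights, the law of total variance gives

\[
\begin{aligned}
\mathrm{Var}(T) &= E\left[\mathrm{Var}(T \mid K)\right] + \mathrm{Var}\left(E[T \mid K]\right) \\
&= E(1 + 0.09K) + 0.7^2 \times \mathrm{Var}(K) \\
&= (1 + 0.09 \times 1.5) + 0.49 \times 0.75 \\
&= 1.5025.
\end{aligned}
\]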
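
The mean and variance of \(T\) can also be checked numerically. The sketch below is only an illustration: it assumes that \(T\) equals \(T_k\) with probability \(C^3_k (0.5)^3\), and the names `normal_cdf`, `weights`, `mean_T` and `var_T` are introduced here and are not part of the original notes.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


P_RED = 0.5  # chance that a given light is red, from part (c)

# P(K = k) for K ~ Bin(3, 0.5); given K = k, T ~ N(15 + 0.7k, 1 + 0.09k).
weights = [comb(3, k) * P_RED**k * (1 - P_RED) ** (3 - k) for k in range(4)]
means = [15 + 0.7 * k for k in range(4)]
variances = [1 + 0.09 * k for k in range(4)]

# Mixture moments: E(T) and E(T^2) are weighted averages over k.
mean_T = sum(w * m for w, m in zip(weights, means))
second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
var_T = second_moment - mean_T**2

# Part (d): normal approximation for the average of 10 journeys.
prob_avg = normal_cdf((17 - mean_T) / sqrt(var_T / 10))

print(round(mean_T, 4), round(var_T, 4), round(prob_avg, 4))
```

Under these assumptions the computed mean is 16.05 and the computed variance agrees with the value 1.5025 quoted in part (d).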

Example 4.18 (Exponential distribution ⋆⋆)

Suppose that, in a call center, 60% of all calls are personal and the rest are business calls. Furthermore, the lengths of personal and business calls have exponential distributions with means \(\mu_p = 1\) minute and \(\mu_b = 3\) minutes, respectively.

(a) What is the probability that a given call lasts less than 1 minute?

(b) Tom always claims that a call lasting less than 1 minute is a personal call, and that otherwise it is a business call. What is the probability that he makes an incorrect claim on a given call?

Solution

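(a) Note that the pdf of the exponential distribution with mean \(\mu\) is given by \(f(x) = \frac{1}{\mu} e^{-x/\mu}\) for \(x \ge 0\), and hence

\[
P(T < c) = 1 - e^{-c/\mu}.
\]

In particular, \(P(T_p < 1) = 1 - e^{-1}\) and \(P(T_b < 1) = 1 - e^{-1/3}\).

Using the above results,

\[
\begin{aligned}
P(\text{a call lasts less than 1 minute}) &= P(T_p < 1 \text{ and personal}) + P(T_b < 1 \text{ and business}) \\
&= P(T_p < 1 \mid \text{personal call}) \times P(\text{personal call}) \\
&\quad + P(T_b < 1 \mid \text{business call}) \times P(\text{business call}) \\
&= \left(1 - e^{-1}\right) \times 0.6 + \left(1 - e^{-1/3}\right) \times 0.4 \\
&\approx 0.4927.
\end{aligned}
\]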

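(b)
\[
\begin{aligned}
P(\text{incorrect claim}) &= P(T_b < 1 \text{ and business}) + P(T_p > 1 \text{ and personal}) \\
&= P(T_b < 1 \mid \text{business call}) \times P(\text{business call}) \\
&\quad + P(T_p > 1 \mid \text{personal call}) \times P(\text{personal call}) \\
&= \left(1 - e^{-1/3}\right) \times 0.4 + e^{-1} \times 0.6 \\
&\approx 0.3341.
\end{aligned}
\]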

✷
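
Both answers in Example 4.18 follow from the exponential CDF and the total probability rule, so they are easy to check numerically. This is a minimal sketch; the helper `exp_cdf` and the variable names are illustrative and not from the notes.

```python
from math import exp

P_PERSONAL, P_BUSINESS = 0.6, 0.4        # proportions of call types
MEAN_PERSONAL, MEAN_BUSINESS = 1.0, 3.0  # mean call lengths in minutes


def exp_cdf(c, mean):
    """P(T < c) for an exponential distribution with the given mean."""
    return 1.0 - exp(-c / mean)


# (a) Total probability that a call lasts less than 1 minute.
p_short = (exp_cdf(1, MEAN_PERSONAL) * P_PERSONAL
           + exp_cdf(1, MEAN_BUSINESS) * P_BUSINESS)

# (b) Tom is wrong when a short call is business or a long call is personal.
p_wrong = (exp_cdf(1, MEAN_BUSINESS) * P_BUSINESS
           + (1 - exp_cdf(1, MEAN_PERSONAL)) * P_PERSONAL)

print(round(p_short, 4), round(p_wrong, 4))
```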


Chapter 5

Mathematical Expectation

Example 5.1 (Expected value ⋆)

On average, how many times must a die be thrown until one gets a “Six”?

Solution (Geometric distribution) Let \(p\) denote the probability of success and \(q\) the probability of failure (\(p + q = 1\)).

Trial                        1      2      3          4          ...    k
P(first success on trial)    p      qp     q^2 p      q^3 p      ...    q^(k-1) p

Note that

\[
p + qp + q^2 p + q^3 p + \cdots = p\left(1 + q + q^2 + \cdots\right) = \frac{p}{1 - q} = 1.
\]

Now,

\[
\begin{aligned}
\text{mean number of trials} = N &= p + 2qp + 3q^2 p + 4q^3 p + \cdots, \\
qN &= qp + 2q^2 p + 3q^3 p + \cdots.
\end{aligned}
\]

Subtraction gives \(N - qN = p + qp + q^2 p + \cdots = 1\), that is,

\[
(1 - q)N = 1.
\]

If \(p = \frac{1}{6}\), then

\[
N = \frac{1}{1 - q} = \frac{1}{p} = 6.
\]

✷


Consider the following two boxes, each of which contains 7,000 dollars. After counting we find that

Box A contains: $1,000 × 1 note, $500 × 10 notes, $10 × 100 notes.

Box B contains: $500 × 2 notes, $100 × 50 notes, $20 × 50 notes.

From which box do you expect to get more if you pick one note at random from each box?

Solution There are 1 + 10 + 100 = 111 notes in Box A and 2 + 50 + 50 = 102 notes in Box B. The expected values of the note taken from each box are given by

\[
E(\text{dollars from Box A}) = 1000 \times \frac{1}{111} + 500 \times \frac{10}{111} + 10 \times \frac{100}{111} = \frac{7000}{111} = 63\tfrac{7}{111},
\]

and

\[
E(\text{dollars from Box B}) = 500 \times \frac{2}{102} + 100 \times \frac{50}{102} + 20 \times \frac{50}{102} = \frac{7000}{102} = 68\tfrac{64}{102}.
\]

On average, one gets more money from Box B. ✷


The probability distribution for damage claims paid by the Automobile Insurance Company on collision insurance follows.

Payment (dollars)    Probability
0                    0.90
400                  0.04
1,000                0.03
2,000                0.01
4,000                0.01
6,000                0.01

(a) Use the expected collision payment to determine the collision insurance premium that would enable the company to break even.

(b) The insurance company charges an annual rate of $260 for the collision coverage. What is the expected value of the collision policy for a policy holder? (Hint: It is the expected payments from the company minus the cost of coverage.) Why does the policy holder purchase a collision policy with this expected value?

Solution

(a)

x        f(x)     x f(x)
0        0.90       0.00
400      0.04      16.00
1000     0.03      30.00
2000     0.01      20.00
4000     0.01      40.00
6000     0.01      60.00

Hence,

\[
\begin{aligned}
E(X) &= \sum x f(x) \\
&= 400 \times 0.04 + 1000 \times 0.03 + 2000 \times 0.01 + 4000 \times 0.01 + 6000 \times 0.01 \\
&= 166.
\end{aligned}
\]

If the company charged a premium of $166.00, it would break even.

(b)

Gain to policy holder    f(Gain)    Gain × f(Gain)
−260                     0.90       −234.00
140                      0.04          5.60
740                      0.03         22.20
1740                     0.01         17.40
3740                     0.01         37.40
5740                     0.01         57.40

Hence,

\[
E(\text{Gain}) = -234 + 5.6 + 22.2 + 17.4 + 37.4 + 57.4 = -94.
\]

The policy holder is more concerned that a big accident would ruin him than with the expected annual loss of $94.

✷


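A Personal Identification Number (PIN) consists of five digits in order, each of which may be any one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Two PINs are chosen independently and at random, and you are given that each PIN consists of five different digits. Let \(X\) be the random variable denoting the number of digits that the two PINs have in common. Write explicitly the probability density function of \(X\) and hence find the mean of \(X\).

Solution The image of \(X\) is \(\{0, 1, 2, 3, 4, 5\}\).

\[
\begin{aligned}
P(X = 0) &= P(\text{no digit in common}) = \frac{C^5_5}{C^{10}_5} = \frac{1}{252}, \\
P(X = 1) &= P(\text{1 digit in common}) = \frac{C^5_1 \times C^5_4}{C^{10}_5} = \frac{25}{252}, \\
P(X = 2) &= P(\text{2 digits in common}) = \frac{C^5_2 \times C^5_3}{C^{10}_5} = \frac{100}{252}, \\
P(X = 3) &= P(\text{3 digits in common}) = \frac{C^5_3 \times C^5_2}{C^{10}_5} = \frac{100}{252}, \\
P(X = 4) &= P(\text{4 digits in common}) = \frac{C^5_4 \times C^5_1}{C^{10}_5} = \frac{25}{252}, \\
P(X = 5) &= P(\text{5 digits in common}) = \frac{C^5_5}{C^{10}_5} = \frac{1}{252}.
\end{aligned}
\]

By definition, the expectation of \(X\) is

\[
E(X) = 1 \times \frac{25}{252} + 2 \times \frac{100}{252} + 3 \times \frac{100}{252} + 4 \times \frac{25}{252} + 5 \times \frac{1}{252} = \frac{630}{252} = 2.5.
\]

In fact, there is an easy method: each of the five digits of the first PIN appears in the second PIN with probability \(\frac{5}{10} = \frac{1}{2}\), so \(E(X) = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} = \frac{5}{2} = 2.5\). ✷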
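
The probabilities above can be reproduced with exact rational arithmetic. The sketch below assumes, as in the solution, that only the sets of digits matter; the function name `pmf_common_digits` is illustrative.

```python
from fractions import Fraction
from math import comb


def pmf_common_digits(x):
    """P(X = x): x shared digits out of the first PIN's 5, the rest from the other 5."""
    return Fraction(comb(5, x) * comb(5, 5 - x), comb(10, 5))


pmf = {x: pmf_common_digits(x) for x in range(6)}
mean_common = sum(x * p for x, p in pmf.items())

for x, p in pmf.items():
    print(x, p)
print("E(X) =", mean_common)
```

Note that `Fraction` prints each probability in lowest terms, so 100/252 is shown as 25/63.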

Example 5.5 (Expected value ⋆⋆)

There are 6 pairs of identical socks in a drawer, so that 6 of the socks are left socks and 6 are right socks. Now, 7 socks (each could be either left or right) are taken from the drawer at random. Evaluate the expected number of pairs of socks obtained.

Solution Let \(X\) be the number of pairs of socks taken from the drawer. The possible outcomes of the chosen socks and the corresponding probabilities are given in the following table:

Outcome    (No. of left socks, No. of right socks)    Probability of the outcome
1          (6, 1)                                      C^6_6 C^6_1 / C^12_7 = 6/792 = 1/132
2          (5, 2)                                      C^6_5 C^6_2 / C^12_7 = 90/792 = 5/44
3          (4, 3)                                      C^6_4 C^6_3 / C^12_7 = 300/792 = 25/66
4          (3, 4)                                      C^6_3 C^6_4 / C^12_7 = 25/66
5          (2, 5)                                      C^6_2 C^6_5 / C^12_7 = 5/44
6          (1, 6)                                      C^6_1 C^6_6 / C^12_7 = 1/132

The number of pairs is the smaller of the two counts, so clearly \(X \in \{1, 2, 3\}\). By definition, the expectation is

\[
E(X) = 1 \times \left(\frac{1}{132} \times 2\right) + 2 \times \left(\frac{5}{44} \times 2\right) + 3 \times \left(\frac{25}{66} \times 2\right) = \frac{181}{66} \approx 2.7424.
\]

✷



Let 2 fair dice be rolled and let the numbers shown on them be \(X\) and \(Y\). Find the expectation of \(|X - Y|\).

Solution

\[
\begin{aligned}
P(|X - Y| = 1) &= P\left(\{(1,2), (2,1), (2,3), (3,2), (3,4), (4,3), (4,5), (5,4), (5,6), (6,5)\}\right) = \frac{10}{36}, \\
P(|X - Y| = 2) &= P\left(\{(1,3), (3,1), (2,4), (4,2), (3,5), (5,3), (4,6), (6,4)\}\right) = \frac{8}{36}, \\
P(|X - Y| = 3) &= P\left(\{(1,4), (4,1), (2,5), (5,2), (3,6), (6,3)\}\right) = \frac{6}{36}, \\
P(|X - Y| = 4) &= P\left(\{(1,5), (5,1), (2,6), (6,2)\}\right) = \frac{4}{36}, \\
P(|X - Y| = 5) &= P\left(\{(1,6), (6,1)\}\right) = \frac{2}{36}.
\end{aligned}
\]

By definition,

\[
E(|X - Y|) = 1 \times \frac{10}{36} + 2 \times \frac{8}{36} + 3 \times \frac{6}{36} + 4 \times \frac{4}{36} + 5 \times \frac{2}{36} = \frac{35}{18}.
\]

✷


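Toss a fair coin until the first head appears, and let \(X\) be the number of tosses required.

(a) What is the name of the distribution of \(X\)? Write the probability distribution function of \(X\).

(b) What is the expected value of \(X\)?

Solution

(a) The random variable \(X\) whose probability distribution function is given by

\[
f(1) = \frac{1}{2}, \quad f(2) = \frac{1}{4}, \quad f(3) = \frac{1}{8}, \quad f(4) = \frac{1}{16}, \quad \cdots, \quad f(n) = \frac{1}{2^n}, \quad \cdots
\]

is said to be a geometric random variable with parameter \(p = \frac{1}{2}\).

(b) For convenience, let \(S = E(X)\). By definition,

\[
\begin{aligned}
S = \sum_{x=1}^{\infty} x f(x) &= 1 \times \frac{1}{2} + 2 \times \frac{1}{4} + 3 \times \frac{1}{8} + 4 \times \frac{1}{16} + 5 \times \frac{1}{32} + \cdots \\
&= \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \frac{4}{16} + \frac{5}{32} + \cdots, \\
2S &= 1 + \frac{2}{2} + \frac{3}{4} + \frac{4}{8} + \frac{5}{16} + \cdots.
\end{aligned}
\]

Subtraction gives \(2S - S = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots\), or

\[
E(X) = S = \frac{1}{1 - \frac{1}{2}} = 2.
\]

✷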

Example 5.8 (Average number ⋆)

The following are two examples of the same problem.

(I) From a shuffled deck, cards are laid out on a table one at a time, face up from left to right, and then another deck is laid out so that each of its cards is beneath a card of the first deck. What is the average number of matches of the card above and the card below in repetitions of this experiment?

(II) A typist types letters and envelopes to \(n\) different persons. The letters are randomly put into the envelopes. On average, how many letters are put into their own envelopes?

Solution

(I) Given 52 cards in a deck, each card has 1 chance in 52 of being paired with its matching card, so the probability of a match at any position is \(\frac{1}{52}\). With 52 opportunities for a match,

\[
\text{Average number of matches} = 52 \times \frac{1}{52} = 1.
\]

(II) With \(n\) opportunities, each of probability \(\frac{1}{n}\),

\[
\text{Average number of letters} = n \times \frac{1}{n} = 1.
\]

✷


A well-shuffled ordinary pack of 52 poker cards is divided randomly into four hands of 13 each. Counting jack, queen, and king as numbers 11, 12 and 13, respectively, we say that “a match” occurs in a hand if the \(j\)-th card is \(j\). What is the expected value of the total number of matches in all four hands?

Solution Unlike Example 5.8, we shall present the full details (which may be harder to read) of the solution to this problem in the following. Let \(X_i\) (\(i = 1, 2, 3, 4\)) be the number of matches in the \(i\)-th hand. Then

\[
X = X_1 + X_2 + X_3 + X_4
\]

is the total number of matches in all four hands, and \(E(X) = E(X_1) + E(X_2) + E(X_3) + E(X_4)\) is our target. To calculate each \(E(X_i)\), we let \(A_{ij}\) be the event that the \(j\)-th card in the \(i\)-th hand is \(j\) (\(1 \le i \le 4\), \(1 \le j \le 13\)). Then by defining

\[
X_{ij} =
\begin{cases}
1, & \text{if } A_{ij} \text{ occurs}, \\
0, & \text{otherwise},
\end{cases}
\]

we have that

\[
X_i = \sum_{j=1}^{13} X_{ij} = X_{i1} + X_{i2} + X_{i3} + \cdots + X_{i,13}.
\]

Now, for each fixed pair \((i, j)\), \(P(A_{ij}) = \frac{4}{52} = \frac{1}{13}\) implies that

\[
E(X_{ij}) = 1 \times P(A_{ij}) + 0 \times P(\overline{A_{ij}}) = \frac{1}{13}.
\]

Hence,

\[
E(X_i) = E\left(\sum_{j=1}^{13} X_{ij}\right) = \sum_{j=1}^{13} E(X_{ij}) = \sum_{j=1}^{13} \frac{1}{13} = 1.
\]

Thus on average there is one match in every hand. From this we finally get

\[
E(X) = E(X_1) + E(X_2) + E(X_3) + E(X_4) = 4 \times 1 = 4,
\]

showing that on average there are a total of four matches in all four hands. ✷
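
The indicator argument can be sanity-checked by simulation. The sketch below is an illustration only: it represents each card by its rank 1 to 13, ignores suits, and the trial count and seed are arbitrary choices rather than anything from the notes.

```python
import random


def count_matches(rng):
    """Deal four 13-card hands and count positions j whose card has rank j."""
    deck = [rank for rank in range(1, 14) for _ in range(4)]  # four suits per rank
    rng.shuffle(deck)
    matches = 0
    for hand_index in range(4):
        hand = deck[13 * hand_index: 13 * (hand_index + 1)]
        matches += sum(1 for position, rank in enumerate(hand, start=1)
                       if rank == position)
    return matches


def estimate_mean_matches(trials=20000, seed=0):
    """Monte Carlo estimate of the expected total number of matches."""
    rng = random.Random(seed)
    return sum(count_matches(rng) for _ in range(trials)) / trials


print(estimate_mean_matches())
```

The estimate should come out close to 4, with small fluctuations that depend on the seed and the number of trials.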




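Eight boys and seven girls are randomly seated in a row, for example, BBGGBBGBGBGBBGG. On average, what is the expected number of unlike adjacent pairs? What if the number of boys is \(b\) and the number of girls is \(g\)?

Solution Take the following as an example:

BBGGBBGBGBGBBGG.

There are 9 “BG or GB” unlike adjacent pairs.

\[
P(\text{being BG or GB in the first two seats}) = \frac{8}{15} \times \frac{7}{14} + \frac{7}{15} \times \frac{8}{14} = \frac{8}{15}.
\]

Let \(X_i\) be 1 if the \(i\)-th adjacent pair is unlike and 0 otherwise. By symmetry every adjacent pair has the same probability of being unlike, and there are 14 adjacent pairs in total, so

\[
\begin{aligned}
E(X) &= E(X_1 + X_2 + X_3 + \cdots + X_{14}) \\
&= 14 \times E(X_1) \\
&= 14 \times \frac{8}{15} \\
&= 7\tfrac{7}{15} \approx 7.4667.
\end{aligned}
\]

In general, if the number of boys is \(b\) and the number of girls is \(g\), then

\[
E(X) = (g + b - 1)\left[\frac{gb}{(g + b)(g + b - 1)} + \frac{bg}{(g + b)(g + b - 1)}\right] = \frac{2gb}{g + b}.
\]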

✷
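
For small group sizes the formula \(\frac{2gb}{g+b}\) can be confirmed by listing every seating pattern. The following sketch enumerates all choices of seats for the boys, which are equally likely; the function name `expected_unlike_pairs` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations


def expected_unlike_pairs(boys, girls):
    """Average number of BG/GB adjacent pairs over all equally likely seatings."""
    seats = boys + girls
    total_pairs = 0
    arrangements = 0
    for boy_seats in combinations(range(seats), boys):
        occupied = set(boy_seats)
        # Seats i and i+1 form an unlike pair when exactly one holds a boy.
        total_pairs += sum((i in occupied) != (i + 1 in occupied)
                           for i in range(seats - 1))
        arrangements += 1
    return Fraction(total_pairs, arrangements)


print(expected_unlike_pairs(8, 7))  # compare with 2gb/(g+b) = 112/15
```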


Consider the quadratic equation

\[
Ax^2 + Bx + C = 0.
\]

The coefficients \(A\), \(B\) and \(C\) (which are assumed to be independent) take the values 1 or −1 with equal probability. Find the expected number of real roots and the corresponding variance.

Solution Denote \(\Delta = B^2 - 4AC\). Consider the following cases.

1. (A, B, C) = (1, 1, 1) ⟹ Δ < 0 ⟹ No real root.

2. (A, B, C) = (1, 1, −1) ⟹ Δ > 0 ⟹ Two real roots.

3. (A, B, C) = (1, −1, 1) ⟹ Δ < 0 ⟹ No real root.

4. (A, B, C) = (−1, 1, 1) ⟹ Δ > 0 ⟹ Two real roots.

5. (A, B, C) = (1, −1, −1) ⟹ Δ > 0 ⟹ Two real roots.

6. (A, B, C) = (−1, 1, −1) ⟹ Δ < 0 ⟹ No real root.

7. (A, B, C) = (−1, −1, 1) ⟹ Δ > 0 ⟹ Two real roots.

8. (A, B, C) = (−1, −1, −1) ⟹ Δ < 0 ⟹ No real root.

Let \(X\) denote the number of real roots.

\[
P(X = 0) = \frac{4}{8} = \frac{1}{2}, \qquad P(X = 2) = \frac{4}{8} = \frac{1}{2}.
\]

Hence,

\[
\begin{aligned}
E(X) &= 0 \times \frac{1}{2} + 2 \times \frac{1}{2} = 1, \\
\mathrm{Var}(X) &= E(X^2) - \left(E(X)\right)^2 = \left(0^2 \times \frac{1}{2} + 2^2 \times \frac{1}{2}\right) - 1^2 = 1.
\end{aligned}
\]

✷

Example 5.12 (Mean and variance of a random variable ⋆⋆⋆)

The random variable \(X\) follows the binomial \(\mathrm{Bin}(n, p)\) distribution with probability mass function

\[
f(x) = C^n_x\, p^x q^{n-x}, \quad x = 0, 1, 2, \cdots, n, \quad 0 < p < 1, \quad q = 1 - p.
\]

(a) Prove that \(E(X) = np\) and \(\mathrm{Var}(X) = npq\).

A mathematics class in a school is divided into group A with 12 students and group B with 25 students. Both groups are given a test consisting of 16 short questions. For any student in group A, the score (that is, the number of correct answers) is distributed as Bin(16, 0.75); for any student in group B, the score is distributed as Bin(16, 0.5). All students answer independently.

(b) Find the probability that

(i) a given group A student gets all 16 questions right.

(ii) at least one student in group A gets all 16 questions right.

(c) Use an appropriate approximation to find the probability that a given group B student scores more than a given group A student. Justify the approximated solution by direct calculations.

(d) Let \(\bar{X}\) and \(\bar{Y}\) denote the mean scores of students in group A and group B respectively. Find \(E(\bar{X})\), \(E(\bar{Y})\), \(\mathrm{Var}(\bar{X})\), and \(\mathrm{Var}(\bar{Y})\).

Solution

(a) Please refer to Lecture Notes Chapter 6 for the proofs.

(b) (i)

\[
\begin{aligned}
\text{The required probability} &= P(\text{a given group A student gets all 16 questions right}) \\
&= (0.75)^{16} \\
&\approx 0.010022595 \\
&\approx 0.0100.
\end{aligned}
\]

(ii)

\[
\begin{aligned}
\text{The required probability} &= P(\text{at least one student in group A gets all 16 questions right}) \\
&= 1 - P(\text{no student gets all 16 questions right}) \\
&= 1 - \left[1 - (0.75)^{16}\right]^{12} \\
&\approx 0.113857868 \\
&\approx 0.1139.
\end{aligned}
\]

(c) Let \(X\) and \(Y\) be the respective scores of a given group A student and a given group B student. Then \(X \sim \mathrm{Bin}(16, 0.75)\) and \(Y \sim \mathrm{Bin}(16, 0.5)\). Since \(16 \times 0.75 = 12 > 5\) and \(16 \times (1 - 0.75) = 4 \approx 5\), \(X\) follows approximately a normal distribution \(N(12, 3)\). Similarly, since \(16 \times 0.5 = 8 > 5\), \(Y\) follows approximately \(N(8, 4)\). Now,

\(X - Y\) follows approximately \(N(12 - 8,\ 3 + 4) = N(4, 7)\).

\[
\begin{aligned}
\text{The required probability} &= P(X - Y < 0) \\
&= P(X - Y \le -1) \\
&\approx P\left(Z \le \frac{-1 + 0.5 - 4}{\sqrt{7}}\right) \\
&\approx P(Z \le -1.70) \\
&\approx 0.0446.
\end{aligned}
\]


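(d) Let \(X_i\) and \(Y_j\) (where \(1 \le i \le 12\) and \(1 \le j \le 25\)) be the respective scores of the 12 group A students and the 25 group B students. By the assumption that all students answer independently, the \(X_i\)'s and \(Y_j\)'s are all independent. Denote

\[
\bar{X} = \frac{1}{12} \sum_{i=1}^{12} X_i, \quad \text{and} \quad \bar{Y} = \frac{1}{25} \sum_{j=1}^{25} Y_j.
\]

Hence,

\[
\begin{aligned}
E(\bar{X}) &= \frac{1}{12} E\left(\sum_{i=1}^{12} X_i\right) \\
&= \frac{1}{12} \sum_{i=1}^{12} E(X_i), \quad \text{by linearity of expectation} \\
&= \frac{1}{12} \sum_{i=1}^{12} 16 \times 0.75 \\
&= 12.
\end{aligned}
\]

Similarly, \(E(\bar{Y}) = 16 \times 0.5 = 8\).

To find the variances:

\[
\begin{aligned}
\mathrm{Var}(\bar{X}) &= \frac{1}{12^2} \mathrm{Var}\left(\sum_{i=1}^{12} X_i\right) \\
&= \frac{1}{12^2} \sum_{i=1}^{12} \mathrm{Var}(X_i), \quad \text{since the } X_i \text{ are independent} \\
&= \frac{1}{12^2} \sum_{i=1}^{12} 16 \times 0.75 \times 0.25 \\
&= \frac{1}{12^2} \times 12 \times 3 = \frac{1}{4}.
\end{aligned}
\]

Similarly,

\[
\mathrm{Var}(\bar{Y}) = \frac{1}{25} \times 16 \times 0.5 \times 0.5 = \frac{4}{25} = 0.16.
\]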

✷

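Remark The approximation used in (c) may need further justification since it fails to fulfill the conditions \(n > 30\) and \(n(1 - p) > 5\). If we do not use approximations,

\[
\begin{aligned}
P(X < Y) &= \sum_{k=0}^{15} P(X = k) \times P(Y > k) \\
&= \sum_{k=0}^{15} \left( C^{16}_k (0.75)^k (0.25)^{16-k} \sum_{r=k+1}^{16} C^{16}_r (0.5)^{16} \right) \\
&= \frac{1}{2^{48}} \sum_{k=0}^{15} \left( C^{16}_k\, 3^k \cdot \sum_{r=k+1}^{16} C^{16}_r \right) \\
&\approx 0.0460.
\end{aligned}
\]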
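
The double sum in the Remark is tedious by hand but short to evaluate exactly. The sketch below uses rational arithmetic so that no rounding enters; `binom_pmf` and the other names are illustrative and not from the notes.

```python
from fractions import Fraction
from math import comb

N_QUESTIONS = 16
P_A, P_B = Fraction(3, 4), Fraction(1, 2)  # success probabilities in groups A and B


def binom_pmf(k, p):
    """P(score = k) for a Bin(16, p) score."""
    return comb(N_QUESTIONS, k) * p**k * (1 - p) ** (N_QUESTIONS - k)


# P(X < Y) = sum over k of P(X = k) * P(Y > k), with X from group A and Y from group B.
prob_b_beats_a = sum(
    binom_pmf(k, P_A) * sum(binom_pmf(r, P_B) for r in range(k + 1, N_QUESTIONS + 1))
    for k in range(N_QUESTIONS)
)

print(float(prob_b_beats_a))
```

The printed value can be compared with the normal approximation 0.0446 from part (c) and with the figure quoted in the Remark.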

Example 5.13 (Mean and variance of a random variable ⋆⋆)

Of the adult population in a large city, 60% favour a new leisure centre, 30% oppose it and 10% are indifferent. A random sample of 4 adults is taken from the population and their opinions on the new centre are noted.

(a) Find the probability that

(i) all four think alike.

(ii) none of the four is opposed to the new centre.

(iii) all three opinions (in favour, oppose, indifferent) are represented in the sample.

(iv) all four are in favour of the new centre, if it is given that none of the four is opposed.

(b) State the expectation and variance of the number in the sample who are in favour of the new centre.

(c) In this city, one quarter of adults are classified as “young” (aged under 30) and three-quarters are “older” (aged at least 30). You are told that 12% of young adults oppose the new leisure centre; deduce the proportion of older adults who are opposed.

(d) Given that the sample consists of one young adult and three older adults, find the probability that exactly one member of the sample opposes the new centre.

Solution

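(a) (i) The required probability \(= (0.6)^4 + (0.3)^4 + (0.1)^4 = 0.1378\).

(ii) The required probability \(= (1 - 0.3)^4 = 0.2401\).

(iii)

\[
\begin{aligned}
\text{The required probability} &= (0.6)(0.3)(0.1)^2 \times \frac{4!}{1!\,1!\,2!} + (0.6)(0.1)(0.3)^2 \times \frac{4!}{1!\,1!\,2!} + (0.1)(0.3)(0.6)^2 \times \frac{4!}{1!\,1!\,2!} \\
&= 0.216.
\end{aligned}
\]

(iv)

\[
\text{The required probability} = \frac{(0.6)^4}{0.2401} = \frac{1296}{2401} \approx 0.5398.
\]

(b) Let \(X\) denote the number in the sample who are in favour of the new centre. Then \(X \sim \mathrm{Bin}(4, 0.6)\). Hence,

\[
E(X) = 4 \times 0.6 = 2.4,
\]

and

\[
\mathrm{Var}(X) = 4 \times 0.6 \times (1 - 0.6) = 0.96.
\]

(c) Let \(x\) be the proportion of older adults who oppose the new leisure centre. Then

\[
\frac{1}{4}(0.12) + \frac{3}{4}x = 0.3.
\]

Solving for \(x\) gives \(x = 0.36\). Hence, the required proportion is 0.36.

(d) Either the young adult is the only one opposed, or exactly one of the three older adults is the only one opposed:

\[
\begin{aligned}
\text{The required probability} &= (0.12) \times (1 - 0.36)^3 + (1 - 0.12) \times C^3_1 (0.36)(0.64)^2 \\
&= \frac{164352}{390625} \\
&\approx 0.4207.
\end{aligned}
\]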

✷
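
Part (a) involves only \(3^4 = 81\) possible ordered samples, so the four answers can be checked by brute-force enumeration. This is a sketch under the assumption that the four opinions are independent draws from the stated proportions; `sample_probability` is a helper introduced here.

```python
from itertools import product
from math import prod

OPINION_PROBS = {"favour": 0.6, "oppose": 0.3, "indifferent": 0.1}


def sample_probability(event, size=4):
    """Total probability of the ordered samples of opinions that satisfy `event`."""
    total = 0.0
    for sample in product(OPINION_PROBS, repeat=size):
        if event(sample):
            total += prod(OPINION_PROBS[opinion] for opinion in sample)
    return total


p_all_alike = sample_probability(lambda s: len(set(s)) == 1)
p_none_opposed = sample_probability(lambda s: "oppose" not in s)
p_all_three = sample_probability(lambda s: len(set(s)) == 3)
p_all_favour_given_none_opposed = (
    sample_probability(lambda s: set(s) == {"favour"}) / p_none_opposed
)

print(p_all_alike, p_none_opposed, p_all_three, p_all_favour_given_none_opposed)
```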


Chapter 6

Joint Distribution of Two Random Variables

Example 6.1 (Joint distribution ⋆)

The joint probability distribution of the random variables \(X\) and \(Y\) is summarized in the following table.

x\y       0      1      2      3      pX(x)
0         k      6k     9k     4k
1         8k     18k    12k    2k
2         k      6k     9k     4k
pY(y)

(a) Find \(k\).

(b) Find the marginal distributions of \(X\) and \(Y\), i.e., \(p_X(x)\) and \(p_Y(y)\).

(c) Find the conditional distribution of \(X\) given that \(Y = 2\).

(d) State with a reason whether or not \(X\) and \(Y\) are independent.

Solution

(a) The sum of all the entries in the table is \(80k\). Hence, \(k = \frac{1}{80}\).

(b) Row and column sums give the marginal distributions of \(X\) and \(Y\):

X        0      1      2
P(X)     1/4    1/2    1/4

and

Y        0      1      2      3
P(Y)     1/8    3/8    3/8    1/8

(c) Since

\[
P(X = x \mid Y = 2) = \frac{P(X = x \text{ and } Y = 2)}{P(Y = 2)},
\]

the conditional distribution of \(X\) given that \(Y = 2\) is given by

X              0                              1                       2
Probability    (9/80)/(3/8) = 9/30 = 0.3      (12/80)/(3/8) = 0.4     (9/80)/(3/8) = 0.3

(d) For independence, every individual \(P(X = x, Y = y)\) in the table must be the product of its two marginal probabilities. However, for \(x = y = 0\), we have

\[
P(X = 0, Y = 0) = k = \frac{1}{80}.
\]

But

\[
p_X(0) \times p_Y(0) = \frac{1}{4} \times \frac{1}{8} = \frac{1}{32}.
\]

So, \(X\) and \(Y\) are not independent.

✷


Two fair dice of different colors are rolled; let \(X\) be the number on the red die and \(Y\) be the number on the green die. Denote the new random variables \(U = X + Y\) and \(V = |X - Y|\).

(a) Find the marginal probability distributions of \(U\) and \(V\), i.e., \(p_U(u)\) and \(p_V(v)\).

(b) Determine whether or not \(U\) and \(V\) are independent random variables.

(c) Find the conditional probability of \(U = 8\) given that \(V \le 3\).

Solution

(a) The marginal probability distribution of \(U\) is given by

u        2      3      4      5      6      7      8      9      10     11     12
pU(u)    1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

The marginal probability distribution of \(V\) is given by

v        0      1       2      3      4      5
pV(v)    6/36   10/36   8/36   6/36   4/36   2/36

(b) We have

\[
P(U = 2) = \frac{1}{36}, \qquad P(V = 5) = \frac{2}{36}, \qquad P(U = 2, V = 5) = 0.
\]

Hence, \(P(U = 2) \cdot P(V = 5) \ne P(U = 2, V = 5)\).

Consequently, the two random variables \(U\) and \(V\) are not independent.

(c) The required conditional probability is given by

\[
P(U = 8 \mid V \le 3) = \frac{P(U = 8 \text{ and } V \le 3)}{P(V \le 3)} = \frac{\frac{3}{36}}{\frac{6 + 10 + 8 + 6}{36}} = 0.1.
\]

✷


Roll a balanced die and let the outcome be \(X\). Then toss a fair coin \(X\) times and let \(Y\) denote the number of tails. Are \(X\) and \(Y\) independent? Why? What is the joint probability density function of \(X\) and \(Y\)?

Solution Let \(p(x, y)\) be the joint probability function of \(X\) and \(Y\). Clearly,

\[
X \in \{1, 2, 3, 4, 5, 6\} \quad \text{and} \quad Y \in \{0, 1, 2, 3, 4, 5, 6\}.
\]

Now, if \(X = 1\), then \(Y = 0\) or 1, and we have

\[
\begin{aligned}
p(1, 0) &= P(X = 1, Y = 0) = P(X = 1) \times P(Y = 0 \mid X = 1) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}, \\
p(1, 1) &= P(X = 1, Y = 1) = P(X = 1) \times P(Y = 1 \mid X = 1) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}.
\end{aligned}
\]

If \(X = 2\), then \(Y = 0\), 1, or 2, where

\[
\begin{aligned}
p(2, 0) &= P(X = 2, Y = 0) = P(X = 2) \times P(Y = 0 \mid X = 2) = \frac{1}{6} \times \frac{1}{4} = \frac{1}{24}, \\
p(2, 1) &= P(X = 2, Y = 1) = P(X = 2) \times P(Y = 1 \mid X = 2) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}, \\
p(2, 2) &= P(X = 2, Y = 2) = P(X = 2) \times P(Y = 2 \mid X = 2) = \frac{1}{6} \times \frac{1}{4} = \frac{1}{24}.
\end{aligned}
\]

If \(X = 3\), then \(Y = 0\), 1, 2 or 3, where

\[
\begin{aligned}
p(3, 0) &= P(X = 3) \times P(Y = 0 \mid X = 3) = \frac{1}{6} \times C^3_0 \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^3 = \frac{1}{48}, \\
p(3, 1) &= P(X = 3) \times P(Y = 1 \mid X = 3) = \frac{1}{6} \times C^3_1 \left(\frac{1}{2}\right)^1 \left(\frac{1}{2}\right)^2 = \frac{3}{48}, \\
p(3, 2) &= P(X = 3) \times P(Y = 2 \mid X = 3) = \frac{1}{6} \times C^3_2 \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^1 = \frac{3}{48}, \\
p(3, 3) &= P(X = 3) \times P(Y = 3 \mid X = 3) = \frac{1}{6} \times C^3_3 \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^0 = \frac{1}{48}.
\end{aligned}
\]

Similar calculations will yield the following table for \(p(x, y)\).

x\y      0        1         2         3         4        5       6       pX(x)
1        1/12     1/12      0         0         0        0       0       1/6
2        1/24     2/24      1/24      0         0        0       0       1/6
3        1/48     3/48      3/48      1/48      0        0       0       1/6
4        1/96     4/96      6/96      4/96      1/96     0       0       1/6
5        1/192    5/192     10/192    10/192    5/192    1/192   0       1/6
6        1/384    6/384     15/384    20/384    15/384   6/384   1/384   1/6
pY(y)    63/384   120/384   99/384    64/384    29/384   8/384   1/384

Note that \(p_X(x) = P(X = x)\) and \(p_Y(y) = P(Y = y)\), the probability functions of \(X\) and \(Y\), are obtained by summing up the rows and the columns of this table, respectively. The two variables \(X\) and \(Y\) are clearly not independent according to the above table; for example, \(p(1, 2) = 0 \ne p_X(1) \times p_Y(2)\). ✷
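
The whole table follows from the single rule \(p(x, y) = \frac{1}{6} \times C^x_y \left(\frac{1}{2}\right)^x\) for \(0 \le y \le x\), so it can be generated rather than typed. A minimal sketch with exact fractions follows; the names `joint_pmf` and `marginal_y` are illustrative.

```python
from fractions import Fraction
from math import comb


def joint_pmf(x, y):
    """p(x, y): die shows x, then y tails in x tosses of a fair coin."""
    if not (1 <= x <= 6 and 0 <= y <= x):
        return Fraction(0)
    return Fraction(1, 6) * Fraction(comb(x, y), 2**x)


# Column sums give the marginal distribution of Y.
marginal_y = {y: sum(joint_pmf(x, y) for x in range(1, 7)) for y in range(7)}

for y, p in marginal_y.items():
    print(y, p)

# Independence fails as soon as one cell differs from the product of its marginals.
print(joint_pmf(1, 2) == Fraction(1, 6) * marginal_y[2])  # False
```

`Fraction` reduces each value to lowest terms, so 63/384 is printed as 21/128.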


Toss a fair coin three times and let the random variable \(X\) be 0 if the outcome of the first toss is a head and 1 if the outcome of the first toss is a tail. Let another random variable \(Y\) denote the number of heads. What is the joint probability density function of \(X\) and \(Y\)? Are \(X\) and \(Y\) independent? Why?

Solution Let \(p(x, y)\) be the joint probability function of \(X\) and \(Y\). Clearly, \(X \in \{0, 1\}\) and \(Y \in \{0, 1, 2, 3\}\). Now, if \(X = 0\), then \(Y = 1\), 2 or 3, and we have

\[
\begin{aligned}
p(0, 1) &= P(X = 0, Y = 1) = P(\{HTT\}) = \frac{1}{8}, \\
p(0, 2) &= P(X = 0, Y = 2) = P(\{HTH, HHT\}) = \frac{2}{8}, \\
p(0, 3) &= P(X = 0, Y = 3) = P(\{HHH\}) = \frac{1}{8}.
\end{aligned}
\]

If \(X = 1\), then \(Y = 0\), 1 or 2, where

\[
\begin{aligned}
p(1, 0) &= P(X = 1, Y = 0) = P(\{TTT\}) = \frac{1}{8}, \\
p(1, 1) &= P(X = 1, Y = 1) = P(\{TTH, THT\}) = \frac{2}{8}, \\
p(1, 2) &= P(X = 1, Y = 2) = P(\{THH\}) = \frac{1}{8}.
\end{aligned}
\]

The above calculations yield the following table for \(p(x, y)\).

x\y      0      1      2      3      pX(x)
0        0      1/8    2/8    1/8    1/2
1        1/8    2/8    1/8    0      1/2
pY(y)    1/8    3/8    3/8    1/8

It is obvious that, for example, \(p(0, 0) \ne p_X(0) \times p_Y(0)\).

The two variables \(X\) and \(Y\) are not independent. ✷

Example 6.5 (Joint distribution ⋆⋆)

Two balls are drawn from an urn containing one yellow, two red and three blue balls. Let \(X\) be the number of red balls and \(Y\) be the number of blue balls drawn. Find the joint distribution and the marginal distributions of \(X\) and \(Y\). Are \(X\) and \(Y\) independent? Why? Given that exactly one of the drawn balls is known to be red, use the joint distribution to find the probability that the other drawn ball is blue.

Solution The joint distribution of \(X\) and \(Y\) can be expressed as the following table for \(p(x, y)\).

x\y      0       1      2      pX(x)
0        0       1/5    1/5    2/5
1        2/15    2/5    0      8/15
2        1/15    0      0      1/15
pY(y)    1/5     3/5    1/5

Note that the marginal distributions of \(X\) and \(Y\) are given by \(p_X(x) = P(X = x)\) and \(p_Y(y) = P(Y = y)\), respectively. These probability functions of \(X\) and \(Y\) are obtained by summing up the rows and the columns of this table, respectively. The variables \(X\) and \(Y\) are not independent because

\[
p(0, 0) = 0 \ne p_X(0) \times p_Y(0).
\]

Finally we need to find the conditional probability

\[
P(\text{one is blue} \mid \text{one is red}) = \frac{p(1, 1)}{p_X(1)} = \frac{2}{5} \times \frac{15}{8} = \frac{3}{4} = 0.75.
\]

✷


Consider an experiment that consists of two tosses of a fair die. Let \(X\) be the number of 4's and \(Y\) be the number of 5's obtained in the two tosses of the die.

(a) Find the joint probability distribution of \(X\) and \(Y\). Find also the marginal distributions of the random variables.

(b) Are \(X\) and \(Y\) independent?

(c) Find \(P\left((X, Y) \in A\right)\), where \(A\) is the region \(\{(x, y) \text{ such that } x + 2y < 3\}\).

Solution

(a) The joint distribution of \(X\) and \(Y\) can be expressed as the following table for \(p(x, y)\).

x\y      0        1        2       pX(x)
0        16/36    8/36     1/36    25/36
1        8/36     2/36     0       10/36
2        1/36     0        0       1/36
pY(y)    25/36    10/36    1/36

The marginal distributions of \(X\) and \(Y\) are given by \(p_X(x) = P(X = x)\) and \(p_Y(y) = P(Y = y)\), respectively. These probability functions of \(X\) and \(Y\) are obtained by summing up the rows and the columns of this table, respectively.

(b) The random variables \(X\) and \(Y\) are not independent because (for example)

\[
P(X = 2, Y = 2) = 0 \ne P(X = 2) \times P(Y = 2).
\]

(c)

\[
\begin{aligned}
P\left((X, Y) \in A\right) &= P(X = 0, Y = 0) + P(X = 1, Y = 0) + P(X = 2, Y = 0) + P(X = 0, Y = 1) \\
&= \frac{16}{36} + \frac{8}{36} + \frac{1}{36} + \frac{8}{36} \\
&= \frac{33}{36} = \frac{11}{12}.
\end{aligned}
\]

✷

Example 6.7 (Joint distribution ⋆⋆⋆)

The table below shows the joint distribution of two random variables \(X\) and \(Y\).

x\y      1      2      3      4
1        6k     3k     2k     4k
2        4k     2k     4k     0
3        2k     k      0      2k

(a) Calculate the expectation \(E(X)\).

(b) New random variables \(U\) and \(V\) are defined by

\[
U =
\begin{cases}
1, & \text{if } X = 1 \text{ or } 3, \\
0, & \text{if } X = 2,
\end{cases}
\quad \text{and} \quad
V =
\begin{cases}
1, & \text{if } Y = 1 \text{ or } 3, \\
0, & \text{if } Y = 2 \text{ or } 4.
\end{cases}
\]

Write down the joint distribution of \(U\) and \(V\) in a table and state with a reason whether or not \(U\) and \(V\) are independent.

Solution

(a) The sum of all the entries in the table is \(30k\). Hence, \(k = \frac{1}{30}\). The marginal probabilities of \(X\) are given by

\[
P(X = 1) = 15k = \frac{1}{2}, \quad P(X = 2) = 10k = \frac{1}{3}, \quad P(X = 3) = 5k = \frac{1}{6}.
\]

Therefore,

\[
E(X) = 1 \times \frac{1}{2} + 2 \times \frac{1}{3} + 3 \times \frac{1}{6} = \frac{5}{3}.
\]

(b) The table below shows the joint distribution of the new random variables \(U\) and \(V\), together with the row and column totals.

u\v        0              1              Total
0          2k = 1/15      8k = 4/15      10k = 1/3
1          10k = 1/3      10k = 1/3      20k = 2/3
Total      12k = 2/5      18k = 3/5

Note that

\[
\begin{aligned}
P(U = 0, V = 0) &= \frac{1}{15}, \\
P(U = 0) \times P(V = 0) &= \frac{1}{3} \times \frac{2}{5} = \frac{2}{15} \ne P(U = 0, V = 0).
\end{aligned}
\]

Thus, the new random variables \(U\) and \(V\) are not independent.

✷


Suppose \(X\) and \(Y\) are independent random variables having Poisson distributions with respective means \(\lambda\) and \(\mu\), where \(\lambda, \mu > 0\).

(a) Show that \(X + Y\) also follows a Poisson distribution.

(b) Find \(P(X = k \mid X + Y = n)\) when \(k\) and \(n\) are integers with \(0 \le k \le n\). For given fixed \(n > 0\), name the distribution you have obtained.

(c) Telephone calls arriving at a computer helpline are classed as urgent or standard; urgent calls average 8 per hour, standard calls average 24 per hour. Ten calls arrive within 30 minutes; find (to two significant figures) the probability that at most two of them are urgent, stating any assumptions you make.

Solution

(a) As an exercise.

(b) As an exercise. Answer: \(P(X = k \mid X + Y = n) = C^n_k\, p^k (1 - p)^{n-k}\), where \(p = \frac{\lambda}{\lambda + \mu}\).

(c) As an exercise. Answer: 0.5256.

✷
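
The answer to part (c) can be reproduced from the result of part (b). The sketch below assumes, as the question invites, that urgent and standard calls arrive as independent Poisson processes, so that the number of urgent calls among the ten is binomial with \(p = \frac{8}{8 + 24}\); the variable names are illustrative.

```python
from math import comb

RATE_URGENT, RATE_STANDARD = 8, 24  # average calls per hour
N_CALLS = 10

# By part (b), each of the 10 calls is urgent with probability lambda / (lambda + mu).
# The ratio is the same for any time window, so the 30-minute window does not change it.
p_urgent = RATE_URGENT / (RATE_URGENT + RATE_STANDARD)

prob_at_most_two = sum(
    comb(N_CALLS, k) * p_urgent**k * (1 - p_urgent) ** (N_CALLS - k)
    for k in range(3)
)

print(round(prob_at_most_two, 4))
```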


Jane chooses a number \(X\) at random from the set of numbers \(\{1, 2, 3, 4\}\), so that

\[
P(X = k) = \frac{1}{4}, \quad \text{for } k = 1, 2, 3, 4.
\]

She then chooses a number \(Y\) at random from the subset of numbers \(\{X, \cdots, 4\}\); for example, if \(X = 3\), then \(Y\) is chosen at random from \(\{3, 4\}\).

(a) Find the joint probability distribution of \(X\) and \(Y\) and display it in the form of a two-way table.

(b) Find the marginal probability distribution of \(Y\), and hence find \(E(Y)\) and \(\mathrm{Var}(Y)\).

(c) Find the probability distribution of \(U = X + Y\).

Solution

(a) As an exercise. Answer: \(P(X = x, Y = y) = P(Y = y \mid X = x) \times P(X = x)\).

(b) As an exercise. Answer: \(E(Y) = \frac{13}{4}\), \(\mathrm{Var}(Y) = \frac{41}{48}\).

(c) As an exercise. Answer: \(P(U = 2) = P(U = 3) = \frac{1}{16}\), \(P(U = 4) = P(U = 5) = \frac{7}{48}\), \(P(U = 6) = \frac{5}{24}\), \(P(U = 7) = \frac{1}{8}\), \(P(U = 8) = \frac{1}{4}\).

✷



Two tennis players, A and B, are playing a match. Let \(X\) be the number of serves faster than 125 mph served by A in one of his service games and let \(Y\) be the number of these serves returned by B. The following probability model is proposed:

\[
P(X = 0) = 0.4, \quad P(X = 1) = 0.3, \quad P(X = 2) = 0.2 \quad \text{and} \quad P(X = 3) = 0.1.
\]

The conditional distribution of \(Y\) (given that \(X = x > 0\)) is binomial with parameters \(x\) and 0.4, and \(P(Y = 0 \mid X = 0) = 1\). Assume that this model is correct when answering the following questions.

(a) Find the joint probability distribution of \(X\) and \(Y\) and display it in the form of a two-way table.

(b) Find the marginal distribution of \(Y\) and evaluate \(E(Y)\).

(c) Use your joint probability distribution table to find the probability distribution of the number of serves faster than 125 mph that are not returned by B in a game.

Solution

(a) As an exercise. Answer: \(P(0, 0) = 0.4\), \(P(1, 0) = 0.18\), \(P(1, 1) = 0.12\), \(P(2, 0) = 0.072\), \(P(2, 1) = 0.096\), \(P(2, 2) = 0.032\), \(P(3, 0) = 0.0216\), \(P(3, 1) = 0.0432\), \(P(3, 2) = 0.0288\), \(P(3, 3) = 0.0064\).

(b) As an exercise. Answer: \(E(Y) = 0.4\).

(c) As an exercise. Answer: Let \(U = X - Y\). Then \(P(U = 0) = 0.5584\), \(P(U = 1) = 0.3048\), \(P(U = 2) = 0.1152\), \(P(U = 3) = 0.0216\).

✷


The joint probability distribution of the random variables \(X\) and \(Y\) is summarized in the following table.

x\y      −1     1      2      pX(x)
1        6c     12c    6c
2        3c     6c     3c
3        3c     6c     3c
pY(y)

(a) Find the value of \(c\).

(b) Find the marginal distributions of \(X\) and \(Y\), i.e., \(p_X(x)\) and \(p_Y(y)\).

(c) Are the random variables \(X\) and \(Y\) independent? State with a reason.

(d) Find the conditional distribution of \(X\) given that \(Y = -1\).

Solution

(a) The sum of all the entries in the table is \(48c\). Hence, \(c = \frac{1}{48}\).

(b) Row and column sums give the marginal distributions of \(X\) and \(Y\):

X        1      2      3
P(X)     1/2    1/4    1/4

and

Y        −1     1      2
P(Y)     1/4    1/2    1/4

(c) For independence, every individual \(P(X = x, Y = y)\) in the table must be the product of its two marginal probabilities. Checking all nine entries (for example, \(P(X = 1, Y = -1) = 6c = \frac{1}{8} = \frac{1}{2} \times \frac{1}{4}\)) shows that

\[
P(X = x, Y = y) = P(X = x) \cdot P(Y = y), \quad \text{for all } x, y.
\]

By definition, \(X\) and \(Y\) are independent.

(d) Since

\[
P(X = x \mid Y = -1) = \frac{P(X = x \text{ and } Y = -1)}{P(Y = -1)},
\]

the conditional distribution of \(X\) given that \(Y = -1\) is given by

X                      1                 2                 3
P(X = x | Y = −1)      6c/12c = 1/2      3c/12c = 1/4      3c/12c = 1/4

✷


Assume that \(k\) is a certain constant. The joint probability density function of \((X, Y)\) is given by

x\y      −3     −2     2      pX(x)
0        20k    10k    10k
1        10k    5k     15k
2        10k    15k    5k
pY(y)

(a) Evaluate the constant \(k\). Find the probability density function of \(Y\) and hence the expectation of \(Y\), i.e., \(E(Y)\).

(b) Are the random variables \(X\) and \(Y\) independent? Why?

(c) Find the probability \(P(X + Y < 0)\).

Solution

(a) Since the probabilities sum to 1, \(20k + 10k + 10k + 10k + 5k + 15k + 10k + 15k + 5k = 100k = 1 \implies k = 0.01\). Hence,

\[
p_Y(y) =
\begin{cases}
40k = 0.4 & \text{when } y = -3, \\
30k = 0.3 & \text{when } y = -2, \\
30k = 0.3 & \text{when } y = 2.
\end{cases}
\]

\[
E(Y) = \sum y \times p_Y(y) = (-3) \times 0.4 + (-2) \times 0.3 + (2) \times 0.3 = -1.2.
\]

(b) Note that \(p(0, -3) = 0.2\), \(p_X(0) = 0.4\) and \(p_Y(-3) = 0.4\). Thus, \(X\) and \(Y\) are not independent because

\[
p(0, -3) \ne p_X(0) \times p_Y(-3).
\]

(c)

\[
\begin{aligned}
P(X + Y < 0) &= P(X = 0, Y = -3) + P(X = 0, Y = -2) + P(X = 1, Y = -3) \\
&\quad + P(X = 1, Y = -2) + P(X = 2, Y = -3) \\
&= 0.2 + 0.1 + 0.1 + 0.05 + 0.1 \\
&= 0.55.
\end{aligned}
\]

✷

Chapter 7

Sampling Distributions

Example 7.1 (Sampling distribution of mean ⋆)

A population has a mean of 200 and a standard deviation of 50. Suppose a random sample of size 100 is selected and \(\bar{x}\) is used to estimate \(\mu\).

(a) What is the probability that the sample mean will be within ±5 of the population mean?

(b) What is the probability that the sample mean will be within ±10 of the population mean?

Solution

(a) Since the sample size is large, the sampling distribution of \(\bar{x}\) is approximately normal with

\[
E(\bar{x}) = \mu = 200, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{50}{\sqrt{100}} = 5.
\]

For ±5, \((\bar{x} - \mu) = 5\),

\[
z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{5}{5} = 1.
\]

The required probability is given by the area under the standard normal curve from \(z = -1\) to \(z = 1\):

\[
2 \times (0.8413 - 0.5) = 0.6826.
\]

The probability that the sample mean will be within ±5 of the population mean is 68.26%.

(b) For ±10, \((\bar{x} - \mu) = 10\),

\[
z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{10}{5} = 2.
\]

The required probability is given by the area under the standard normal curve from \(z = -2\) to \(z = 2\):

\[
2 \times (0.9772 - 0.5) = 0.9544.
\]

The probability that the sample mean will be within ±10 of the population mean is 95.44%.

✷

Chapter 8

Estimation and Confidence Interval

Example 8.1 (Estimation of the mean ⋆)

The mean annual starting salary for marketing majors is $34,000 (Time, May 8, 2000). Assume that for the population of graduates with a marketing major, the mean annual starting salary is $34,000, and the standard deviation is \(\sigma = 2{,}000\).

(a) What is the probability that a random sample of marketing majors will have a sample mean within ±$250 of the population mean for each of the following sample sizes: 30, 50, 100, 200 and 400?

(b) What is the advantage of a larger sample size when attempting to estimate the population mean?

Solution

(a) \(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{\bar{x} - 34000}{2000/\sqrt{n}}\), where error \(= \bar{x} - 34000 = 250\).

n = 30:   z = 250/(2000/√30) ≈ 0.68,   2 × (0.7517 − 0.5) = 0.5034;

n = 50:   z = 250/(2000/√50) ≈ 0.88,   2 × (0.8106 − 0.5) = 0.6212;

n = 100:  z = 250/(2000/√100) ≈ 1.25,  2 × (0.8944 − 0.5) = 0.7888;

n = 200:  z = 250/(2000/√200) ≈ 1.77,  2 × (0.9616 − 0.5) = 0.9232;

n = 400:  z = 250/(2000/√400) ≈ 2.50,  2 × (0.9938 − 0.5) = 0.9876.

(b) A larger sample increases the probability that the sample mean will be within a specified distance from the population mean. In the salary example, the probability of being within ±250 of \(\mu\) ranges from 0.5034 for a sample of size 30 to 0.9876 for a sample of size 400.

✷
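
The table in part (a) can be regenerated without looking up the Standard Normal Table. The sketch below writes the normal CDF with the error function; `normal_cdf` and `prob_within` are helper names introduced here. Because it does not round \(z\) to two decimals, its results differ slightly from the table-based figures above.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_within(margin, sigma, n):
    """P(|sample mean - mu| <= margin) for a sample of size n."""
    z = margin / (sigma / sqrt(n))
    return 2.0 * normal_cdf(z) - 1.0


for n in (30, 50, 100, 200, 400):
    print(n, round(prob_within(250, 2000, n), 4))
```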

� Example 8.2 (Confidence interval ⋆ )

An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributedwith at standard deviation of 40 hours. If a sample of 30 bulbs has an average life of 780 hours, find a 96%confidence interval for the population mean of all bulbs produced by this firm.

105

8. Estimation and Confidence Interval

Solution n = 30, \(\bar{x} = 780\), σ = 40, α = 0.04. Since σ² (the population variance) is known, we evaluate the critical value by the Standard Normal Table:

\[ z_{\alpha/2} = z_{0.02} \approx 2.055. \]

A 96% confidence interval for µ is therefore given by

\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \approx 780 \pm (2.055)\,\frac{40}{\sqrt{30}} \approx 780 \pm 15.0076 = (764.9924,\ 795.0076). \]

✷
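As a cross-check of the known-σ interval, here is a short Python sketch using the standard library's `statistics.NormalDist`. The helper name `z_interval` is an assumption made for illustration. It computes the critical value directly instead of reading it from the table, so the endpoints agree with the worked answer only to about two decimal places.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for mu when the population sigma is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z_crit * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = z_interval(780, 40, 30, 0.96)
print(round(low, 2), round(high, 2))
```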


A machine is producing metal pieces that are cylindrical in shape. A sample of pieces is taken and the diameters are 1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01 and 1.03 centimeters. Find a 99% confidence interval for the mean diameter of pieces from this machine, assuming an approximate normal distribution.

Solution n = 9, ν = degrees of freedom = n − 1 = 8, α = 0.01, and by calculation,

\[ \bar{x} = \frac{1.01 + 0.97 + \cdots + 1.03}{9} \approx 1.00556, \qquad s \approx 0.02455. \]

Since σ² (the population variance) is unknown and n is small, we evaluate the critical t-value by the Student's t-Table:

\[ t_{(\alpha/2,\,\nu)} = t_{(0.005,\,8)} \approx 3.355. \]

A 99% confidence interval for µ is therefore given by

\[ \bar{x} \pm t_{(\alpha/2,\,\nu)} \cdot \frac{s}{\sqrt{n}} \approx 1.00556 \pm (3.355)\,\frac{0.02455}{\sqrt{9}} \approx 1.00556 \pm 0.02745 \approx (0.9781,\ 1.0330). \]

✷
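The same interval can be reproduced from the raw diameters. This is a sketch under the assumption that the critical value is still read from the Student's t table (the Python standard library has no t quantile function); the helper name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_crit):
    """Confidence interval for mu when sigma is unknown.

    t_crit is t(alpha/2, n-1), taken from the Student's t table.
    """
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                     # sample s.d. (divides by n - 1)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


diameters = [1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01, 1.03]
low, high = t_interval(diameters, 3.355)   # t(0.005, 8) from the table
print(round(low, 4), round(high, 4))
```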


Sales personnel for Skillings Distributors submit weekly reports listing the customer contacts made during the week. A sample of 61 weekly reports showed a sample mean of 19.5 customer contacts per week. The sample standard deviation was 5.2. Provide 90% and 95% confidence intervals for the population mean number of weekly customer contacts for the sales personnel.

Solution Although the population variance is unknown, n = 61 is large enough, so we evaluate the critical value by the Standard Normal Table and use s to approximate σ.

A 90% confidence interval for µ is given by \(\bar{x} \pm z_{\alpha/2} \times \frac{s}{\sqrt{n}}\), where α/2 = 0.1/2 = 0.05. Hence,

\[ 19.5 \pm 1.645 \times \frac{5.2}{\sqrt{61}} \quad\text{or}\quad 19.5 \pm 1.0952 \quad\text{or}\quad (18.4048,\ 20.5952). \]

A 95% confidence interval for µ is given by \(\bar{x} \pm z_{\alpha/2} \times \frac{s}{\sqrt{n}}\), where α/2 = 0.05/2 = 0.025. Hence,

\[ 19.5 \pm 1.96 \times \frac{5.2}{\sqrt{61}} \quad\text{or}\quad 19.5 \pm 1.3050 \quad\text{or}\quad (18.195,\ 20.805). \]

✷



The American Association of Advertising Agencies records data on nonprogram minutes on half-hour, prime-time television shows. Representative data in minutes for a sample of 20 prime-time shows on major networks at 8:30pm follow.

6.0 7.0 7.2 7.0 6.0 7.3 6.0 6.6 6.3 5.7

6.5 6.5 7.6 6.2 5.8 6.2 6.4 6.2 7.2 6.8

Assume a normal population and provide a point estimate and a 95% confidence interval for the mean number of nonprogram minutes on half-hour, prime-time television shows at 8:30pm.

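Solution Using a calculator,

\[ \bar{x} \approx 6.53 \text{ minutes}, \qquad s \approx 0.54 \text{ minutes}. \]

The point estimate of µ is \(\bar{x} \approx 6.53\) minutes. Since σ² is unknown and n = 20 is not large enough, we use the Student's t Table. A 95% confidence interval for µ is given by \(\bar{x} \pm t_{(\alpha/2,\,\nu)} \times \frac{s}{\sqrt{n}}\), where α/2 = 0.05/2 = 0.025 and ν = n − 1 = 19. Hence,

\[ 6.53 \pm 2.093 \times \frac{0.54}{\sqrt{20}} \quad\text{or}\quad 6.53 \pm 0.25 \quad\text{or}\quad (6.28,\ 6.78). \]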

✷


A machine is producing metal pieces that are cylindrical in shape. A sample of pieces is taken and the diameters are 1.01, 0.97, 1.03, 1.04, 0.99, 0.98, 0.99, 1.01 and 1.03 centimeters.

(a) Find a 90% confidence interval for the mean diameter of pieces from this machine (assume that the population has an approximate normal distribution).

(b) Is this sample size large enough to be 90% confident that the estimating error of the mean diameter of pieces from this machine is less than 0.03, and why?

(c) Does the mean diameter of all pieces from this machine certainly lie in the confidence interval in (a)?

Solution

(a) \(\bar{x} \approx 1.0056\), s ≈ 0.02456 and n = 9. Since σ² is unknown, we use the Student's t Table, which gives \(t_{(8,\,0.05)} \approx 1.86\). A 90% confidence interval is given by

\[ \bar{x} \pm t_{(8,\,0.05)} \frac{s}{\sqrt{n}} \approx \left( 1.0056 - 1.86 \times \frac{0.02456}{\sqrt{9}},\ 1.0056 + 1.86 \times \frac{0.02456}{\sqrt{9}} \right) \approx (0.99037,\ 1.02083). \]

(b) Note that the length of the 90% confidence interval is 1.02083 − 0.99037 = 0.03046, so the margin of error (half the length) is about 0.0152 < 0.03. Hence this sample size is large enough to be 90% confident that the estimating error \(|\bar{x} - \mu|\) of the mean diameter of pieces from this machine is less than 0.03.

(c) No, it is not certain; we are only 90% confident.

✷


Students may choose between a 3-semester-hour course in physics without labs and a 4-semester-hour course with labs. The final written examination is the same for each section. If 12 students in the section with labs made an average examination grade of 84 with a standard deviation of 4, and 18 students in the section without labs made an average grade of 77 with a standard deviation of 6, find a 99% confidence interval for the difference between the average grades for the two courses. Assume the populations to be approximately normally distributed with equal variances.


Solution

With labs: \(n_1 = 12\), \(\bar{x}_1 = 84\), \(s_1 = 4\). Without labs: \(n_2 = 18\), \(\bar{x}_2 = 77\), \(s_2 = 6\).

Note also that

\[ \nu = \text{degrees of freedom} = n_1 + n_2 - 2 = 12 + 18 - 2 = 28, \qquad \alpha = 0.01. \]

It is given that \(\sigma_1^2 = \sigma_2^2\), where \(\sigma_1^2\) and \(\sigma_2^2\) are the unknown population variances. Since the population variances are unknown, we evaluate the critical t-value by the Student's t-Table:

\[ t_{(\alpha/2,\,\nu)} = t_{(0.005,\,28)} = 2.763. \]

The pooled variance and the corresponding standard deviation are given by

\[ s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} = \frac{(11)(4^2) + (17)(6^2)}{28} \approx 28.1429, \qquad s_p \approx 5.3050. \]

A 99% confidence interval for \(\mu_1 - \mu_2\) is therefore given by

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{(\alpha/2,\,\nu)} \cdot s_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx (84 - 77) \pm (2.763)(5.3050)\sqrt{\frac{1}{12} + \frac{1}{18}} \approx 7 \pm 5.4626 = (1.5374,\ 12.4626). \]

✷
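The pooled-variance interval above follows a fixed recipe, which the sketch below spells out in Python. The function name `pooled_interval` is an illustrative assumption, and the critical value 2.763 is taken from the Student's t table as in the solution.

```python
from math import sqrt


def pooled_interval(n1, xbar1, s1, n2, xbar2, s2, t_crit):
    """CI for mu1 - mu2 assuming equal, unknown population variances."""
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    half_width = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# With labs vs. without labs; t(0.005, 28) = 2.763 from the table.
low, high = pooled_interval(12, 84, 4, 18, 77, 6, 2.763)
print(round(low, 4), round(high, 4))
```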

Example 8.8 (Confidence interval ⋆⋆)

Samples of girls aged 6 and 7 are given a music aptitude test, with the results following (i.e., the sample mean scores and the sample standard deviations). Assume that the population variances \(\sigma_1^2\) and \(\sigma_2^2\) are unknown and equal.

Age 7        Age 6
n1 = 16      n2 = 12
x̄1 = 44.0    x̄2 = 27.5
s1 = 13.2    s2 = 10.2

(a) Construct a 99% confidence interval estimate of µ1 − µ2.

(b) Does the interval contain 0? What is the implication if, for instance, the interval does not contain 0?

Solution

(a) By calculation, the pooled variance is given by

\[ s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} = \frac{(15)(13.2)^2 + (11)(10.2)^2}{26} \approx 144.54. \]

\[ \nu = 16 + 12 - 2 = 26, \qquad t_{\alpha/2;\,\nu} = t_{0.005;\,26} \approx 2.779. \]

A 99% confidence interval for the difference of the population means is

\[ \left( (\bar{x}_1 - \bar{x}_2) - t_{\alpha/2;\,\nu}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2;\,\nu}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right) \approx (16.5 - 12.759,\ 16.5 + 12.759) = (3.741,\ 29.259). \]

That is, we are 99% confident that the difference between the mean score for 7-year-old girls and the mean score for 6-year-old girls is between 3.741 and 29.259.

(b) This confidence interval does not contain 0, suggesting we are highly confident that the difference between the two means, \(\mu_1 - \mu_2\), is significant and positive.

✷


Chapter 9

Hypothesis Testing

Example 9.1 (Hypothesis testing, one-tailed ⋆)

A university has been using brand A laser printers for several years and is considering the feasibility of switching to brand B. The salesman for brand B claims that their printers are better than brand A in terms of lower monthly cost. Experience over several years has shown that brand A laser printers have a mean cost of 750 dollars per month. From brand B, 120 of their laser printers, purchased from regular retail sources, were tested. This sample yields the values \(\bar{x} = 740\) and s = 60.

(a) What is your opinion on the claim of the salesman for brand B at 0.05 level of significance?

(b) If we reduce the level of significance from 0.05 to 0.001, are you going to change your opinion in (a)?

Solution

(a) n = 120, \(\bar{x} = 740\), s = 60, α = 0.05. Let µ be the mean cost of the laser printers from brand B.

\[ \begin{cases} H_0 : \mu = 750, \\ H_1 : \mu < 750. \end{cases} \]

σ² (the population variance) is unknown, but the sample size is large (n = 120), so we evaluate the critical value by the Standard Normal Table:

Critical value (one-tailed) = \(z_{0.05} \approx 1.645\).

The test statistic is given by

\[ z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{740 - 750}{60/\sqrt{120}} \approx -1.826. \]

Since z ≈ −1.826 < −1.645, the test statistic falls in the rejection region.

∴ Reject H0 ⟹ The mean cost of the laser printers from brand B is significantly lower than 750 dollars.

(b) n = 120, \(\bar{x} = 740\), s = 60, α = 0.001. Let µ be the mean cost of the laser printers from brand B.

\[ \begin{cases} H_0 : \mu = 750, \\ H_1 : \mu < 750. \end{cases} \]

Critical value (one-tailed) = \(z_{0.001} \approx 3.09\).

As calculated before, the test statistic is z ≈ −1.826. Since −1.826 > −3.09, the test statistic falls in the non-rejection region.

∴ Cannot reject H0 ⟹ The mean cost of the laser printers from brand B is not significantly lower than 750 dollars.

✷
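The two decisions in this example differ only in the critical value. The following Python sketch (standard library only; the function name `left_tailed_z_test` is an assumption for illustration) computes the statistic once and compares it with the critical value at each significance level.

```python
from math import sqrt
from statistics import NormalDist


def left_tailed_z_test(xbar, mu0, s, n, alpha):
    """Large-sample test of H0: mu = mu0 against H1: mu < mu0."""
    z = (xbar - mu0) / (s / sqrt(n))
    z_crit = NormalDist().inv_cdf(alpha)   # negative critical value, -z_alpha
    return z, z_crit, z < z_crit           # True means "reject H0"


for alpha in (0.05, 0.001):
    z, z_crit, reject = left_tailed_z_test(740, 750, 60, 120, alpha)
    print(alpha, round(z, 3), round(z_crit, 3), reject)
```

At α = 0.05 the statistic lies below the critical value and H0 is rejected; at α = 0.001 it does not.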



A random sample, of size 500, of the response time, during the peak period, of the online inquiry system at a local bank yields a sample mean \(\bar{x}\) of 21 time units with a sample standard deviation s of 12 time units. A primary performance requirement for the system is that the mean response time for the peak period should not exceed 20 time units; if the system usage increases to the extent that this criterion is violated, changes are to be made to the system to bring the performance to this standard.

(a) At the 1% level of significance, test whether or not the performance criterion has been maintained.

(b) If we change the level of significance from 1% to 5%, are you going to change your opinion in (a)?

(c) Find the p-value of the test. State the p-value criterion for hypothesis testing at the level of significance α in general.

Solution

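(a) n = 500, \(\bar{x} = 21\), s = 12, α = 0.01. Let µ be the mean response time (in time units) for the peak period.

\[ \begin{cases} H_0 : \mu \le 20, \\ H_1 : \mu > 20. \end{cases} \]

Since the sample size is large (n = 500), we evaluate the critical value by the Standard Normal Table:

Critical value (right-tailed) = \(z_{0.01} \approx 2.326\).

The test statistic is given by

\[ z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{21 - 20}{12/\sqrt{500}} \approx 1.863. \]

Since z ≈ 1.863 < 2.326, the test statistic falls in the non-rejection region.

∴ Do not reject H0 at the 0.01 level of significance ⟹ we would make no change to the existing online system based on the test.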

(b) Repeat the test in (a) with the 0.05 level of significance. We evaluate the critical value by the Standard Normal Table:

Critical value (right-tailed) = \(z_{0.05} \approx 1.645\).

Based on the same data sample, the test statistic is z ≈ 1.863 > 1.645, so the test statistic falls in the rejection region.

∴ Reject H0 ⟹ changes would be required to the existing online system.

(c) Calculate the p-value. The > sign in the alternative hypothesis indicates that the test is one-tailed (right-tailed). The p-value is equal to the area in the right tail of the sampling distribution curve of \(\bar{x}\) to the right of \(\bar{x} = 21\). To find this area, we need the z-value for \(\bar{x} = 21\), which we have just found: z = 1.863. Hence,

\[ p\text{-value} = P(Z > 1.863) \approx 1 - 0.9688 = 0.0312. \]

In general, the p-value criterion for hypothesis testing is:

Reject the null hypothesis H0 if the p-value < α.

Do not reject the null hypothesis H0 if the p-value ≥ α.

✷
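The p-value criterion can be illustrated with the numbers of this example. This is a minimal Python sketch using the standard library; the function name `right_tailed_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_p_value(xbar, mu0, s, n):
    """p-value for H1: mu > mu0 using the large-sample z statistic."""
    z = (xbar - mu0) / (s / sqrt(n))
    return 1 - NormalDist().cdf(z)       # area to the right of z


p = right_tailed_p_value(21, 20, 12, 500)
print(round(p, 4))
for alpha in (0.01, 0.05):
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(alpha, decision)
```

The single p-value reproduces both decisions: it is larger than 0.01 (part (a)) and smaller than 0.05 (part (b)).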


Example 9.3 (Hypothesis testing, two-tailed ⋆)

A city has been purchasing brand A light bulbs for several years but is contemplating switching to brand B because of a better price. The salesman for brand B claims that their product is just as good as brand A. Experience over several years has shown that brand A bulbs have a mean life of 1160 hours. From brand B, 100 of their bulbs, purchased from regular retail sources, were tested. This sample yields the values \(\bar{x} = 1140\) and s = 80.

(a) What is your opinion on the claim of the salesman for brand B at 0.05 level of significance?

(b) If we reduce the level of significance from 0.05 to 0.001, are you going to change your opinion in (a)?

(c) Are there any possible mistakes on your opinions in (a)?

Solution

(a) n = 100, \(\bar{x} = 1140\), s = 80, α = 0.05. Let µ be the mean life of the bulbs from brand B.

\[ \begin{cases} H_0 : \mu = 1160, \\ H_1 : \mu \ne 1160. \end{cases} \]

σ² (the population variance) is unknown, but the sample size is large (n = 100), so we evaluate the critical value by the Standard Normal Table:

Critical value (two-tailed) = \(z_{0.025} \approx 1.96\).

The test statistic is given by

\[ z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1140 - 1160}{80/\sqrt{100}} = -2.5. \]

Since |z| = 2.5 > 1.96, the test statistic falls in the rejection region.

∴ Reject H0 ⟹ The mean life of the bulbs from brand B is significantly different from 1160 hours.

Since the test statistic calculated above is negative, suppose we change the alternative hypothesis to

\[ \begin{cases} H_0 : \mu = 1160, \\ H_1 : \mu < 1160. \end{cases} \]

Then \(z = -2.5 < z_{\text{critical}} = -z_{0.05} \approx -1.645\), which still falls in the rejection region. Therefore we can still reject H0, and hence the mean life of the bulbs from brand B is significantly shorter than 1160 hours.

(b) n = 100, \(\bar{x} = 1140\), s = 80, α = 0.001. Let µ be the mean life of the bulbs from brand B.

\[ \begin{cases} H_0 : \mu = 1160, \\ H_1 : \mu < 1160. \end{cases} \]

σ² (the population variance) is unknown, but the sample size is large (n = 100), so we evaluate the critical value by the Standard Normal Table:

Critical value (one-tailed) = \(z_{0.001} \approx 3.09\).

The test statistic is given by

\[ z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1140 - 1160}{80/\sqrt{100}} = -2.5. \]

Since −2.5 > −3.09, the test statistic falls in the non-rejection region.

∴ Cannot reject H0 ⟹ The mean life of the bulbs from brand B is not significantly shorter than 1160 hours.

(c) Yes. Since H0 was rejected in (a), the conclusion may be a Type I error, i.e., rejecting H0 when it is in fact true.

✷




The average time it takes a clerk to serve a customer at a service counter is 3 minutes with a standard deviation of 1 minute; the service time is normally distributed. To test the feasibility of installing an online computer system to provide better customer service with fewer clerks, a representative clerk is randomly selected and is trained to use such a system. A specific clerk, using the online system, processes 16 customers, yielding a sample mean of 2 minutes with a sample standard deviation of 40 seconds. The management officials are convinced to implement the proposed online system unless the actual mean clerk service time exceeds 100 seconds.

(a) At the 1% level of significance does the test indicate that the online system should be implemented?

(b) If we change the level of significance from 1% to 5%, are you going to change your opinion in (a)?

(c) Are there any possible errors on the test result in (a) and explain briefly?

Solution

(a) We assume the service time of the clerk remains normally distributed. n = 16, \(\bar{x} = 120\), s = 40 (in seconds), α = 0.01. Let µ be the population mean service time of the clerk.

\[ \begin{cases} H_0 : \mu = 100, \\ H_1 : \mu > 100. \end{cases} \]

σ² (the population variance) is unknown and the sample size is small (n = 16), so we evaluate the critical value by the Student's t distribution:

Critical value (right-tailed) = \(t_{(\alpha,\,n-1)} = t_{(0.01,\,15)} \approx 2.602\).

The test statistic is given by

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{120 - 100}{40/\sqrt{16}} = 2 < 2.602, \]

so the test statistic falls in the non-rejection region.

∴ Fail to reject H0 at the 1% level of significance ⟹ it is not evident that the online system should not be implemented based on the sample ⟹ the online system should be implemented accordingly.

(b) Repeat the test in (a) with the 0.05 level of significance. We evaluate the critical value by the t-table:

Critical value (right-tailed) = \(t_{(0.05,\,15)} \approx 1.753\).

Based on the same data sample, the test statistic is t = 2 > 1.753, so the test statistic falls in the rejection region.

∴ Reject H0 at the 5% level of significance ⟹ it is evident that the online system should not be implemented based on the sample ⟹ the online system should not be implemented accordingly.

(c) Yes. The result in (a) may be a Type II error. That is to say, based on the given sample, it is possible that the result (i.e., failure to reject H0) is an error because the null hypothesis is indeed false. Of course, the management would be wise to repeat the sampling with other clerks and more customers to test the feasibility of installing the online system.

✷


Example 9.5 (Hypothesis testing: two populations ⋆⋆)

It is claimed that a new marketing campaign is effective in increasing the sales of the retail shops of a corporation. The following data are collected concerning the daily sales, in thousand dollars, of 6 randomly selected retail shops of the corporation both before and after the marketing campaign is launched.

Retail shop 1 2 3 4 5 6

Before 13.9 14.7 18.2 17.5 14.9 16.6

After 14.8 15.3 17.4 16.7 15.2 17.4

(a) Assume the two distributions to be normally distributed with equal variances. Test the claim at the 5% level of significance.

(b) Construct a 95% confidence interval for the difference of the sales.

Solution

(a) Before: \(n_1 = 6\), \(\bar{x}_1 \approx 15.9667\), \(s_1 \approx 1.7178\). After: \(n_2 = 6\), \(\bar{x}_2 \approx 16.1333\), \(s_2 \approx 1.1725\). ν = degrees of freedom = \(n_1 + n_2 - 2 = 6 + 6 - 2 = 10\), α = 0.05. It is also given that \(\sigma_1^2 = \sigma_2^2\), where \(\sigma_1^2\) and \(\sigma_2^2\) are the unknown population variances. Let \(\mu_1\), \(\mu_2\) be the average daily sales of the 2 groups (before and after the campaign), respectively.

\[ \begin{cases} H_0 : \mu_1 - \mu_2 = 0, \\ H_1 : \mu_1 - \mu_2 < 0. \end{cases} \]

Since \(\sigma_1^2\) and \(\sigma_2^2\) (the population variances) are unknown, we use the t-test.

Critical value (one-tailed) = \(t_{\text{critical}} = t_{(\alpha,\,\nu)} = t_{(0.05,\,10)} \approx 1.812\).

The pooled variance and the corresponding S.D. are given by

\[ s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} \approx \frac{(5)(1.7178^2) + (5)(1.1725^2)}{10} \approx 2.1628, \qquad s_p \approx \sqrt{2.1628} \approx 1.4706. \]

Now the test statistic under the t-test is given by

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \approx \frac{15.9667 - 16.1333}{1.4706 \sqrt{\frac{1}{6} + \frac{1}{6}}} \approx -0.1962, \]

\[ |t| \approx 0.1962 < t_{\text{critical}} \approx 1.812, \]

which means that it falls in the non-rejection region.

∴ Cannot reject H0 ⟹ Based on the sample data, there is no evidence to show that the new marketing campaign is effective.

(b) A 95% confidence interval for \(\mu_1 - \mu_2\) is given by

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{(0.025,\,10)}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx (15.9667 - 16.1333) \pm 2.228 \sqrt{2.1628 \left( \frac{1}{6} + \frac{1}{6} \right)} \approx -0.1666 \pm 1.8917 = (-2.0583,\ 1.7251). \]

Note that the confidence interval contains zero.

✷
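The pooled t statistic of part (a) can be recomputed from the raw sales figures. The sketch below follows the solution's equal-variance, two-sample treatment; the function name `pooled_t_statistic` is an assumption, and the critical value 1.812 is read from the t table.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample1, sample2):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    # variance() is the sample variance (divides by n - 1).
    sp2 = ((n1 - 1) * variance(sample1)
           + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    return (mean(sample1) - mean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))


before = [13.9, 14.7, 18.2, 17.5, 14.9, 16.6]
after = [14.8, 15.3, 17.4, 16.7, 15.2, 17.4]
t = pooled_t_statistic(before, after)
t_crit = 1.812                      # t(0.05, 10) from the table
print(round(t, 4), t < -t_crit)     # left-tailed: reject H0 only if t < -t_crit
```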



Example 9.6 (Hypothesis testing: two populations ⋆)

A manufacturer claims that the average tensile strength of thread A exceeds the average tensile strength of thread B by at least 12 kilograms. To test his claim, 50 pieces of each type of thread are tested under similar conditions. Type A thread had an average tensile strength of 86.7 kilograms with a standard deviation of 6.28 kilograms, while type B thread had an average tensile strength of 77.8 kilograms with a standard deviation of 5.61 kilograms. Test the manufacturer's claim using a 0.05 level of significance.

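Solution \(\bar{x}_A = 86.7\), \(s_A = 6.28\) and \(n_A = 50\); \(\bar{x}_B = 77.8\), \(s_B = 5.61\) and \(n_B = 50\).

\[ \begin{cases} H_0 : \mu_A - \mu_B = 12, \\ H_1 : \mu_A - \mu_B \ne 12 \ (\text{or } \mu_A - \mu_B < 12). \end{cases} \]

\[ \bar{x}_A - \bar{x}_B = 8.9. \]

\[ s_p = \sqrt{\frac{49 \times 6.28^2 + 49 \times 5.61^2}{49 + 49}} \approx 5.9544 \quad\text{and}\quad z \approx \frac{8.9 - 12}{5.9544 \times \sqrt{\frac{1}{50} + \frac{1}{50}}} \approx -2.60312. \]

For α = 0.05, \(z_{0.025} \approx 1.96\) (two-tailed) and \(z_{0.05} \approx 1.65\) (one-tailed).

We can reject H0 in both tests because \(|z| > |z_c|\).

✷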

Example 9.7 (Hypothesis testing: two populations ⋆)

A pilot study was conducted to determine the effectiveness of a feed supplement for cattle. 20 steers are randomly divided into a group of 10 steers to be fed the supplement and another group of 10 steers to be fed exactly the same diet except for the supplement. The weight gains (in lbs) over a period of 6 months were as follows:

Supplement 330 470 540 590 600 610 730 730 740 750

Control 390 400 410 440 470 530 600 630 710 780

(a) Assume the population variances of the weight gains for the two groups are the same. Test if the mean weight gains per steer of the two groups over a period of 6 months are the same at the 5% significance level.

(b) Construct a 95% confidence interval for the effect of the supplement.

Solution

(a) Let \(\mu_s\) be the mean weight gain of the supplement group and \(\mu_c\) be the mean weight gain of the control group. Also, let \(\bar{x}_s\) be the sample mean of the supplement group and \(\bar{x}_c\) be the sample mean of the control group.

From the data, \(\bar{x}_s = 609\), \(s_s \approx 136.906\); \(\bar{x}_c = 536\), \(s_c \approx 138.259\). Then

\[ s_p^2 = \text{pooled variance} \approx \frac{9(136.906)^2 + 9(138.259)^2}{9 + 9} \approx 18929.402. \]

\[ \begin{cases} H_0 : \mu_s - \mu_c = 0, \\ H_1 : \mu_s - \mu_c \ne 0, \end{cases} \qquad \alpha = 0.05. \]

Since the population variance \(\sigma_s^2\) (\(= \sigma_c^2\)) is unknown, we use the t-test. The test statistic is

\[ t = \frac{\bar{x}_s - \bar{x}_c}{s_p \sqrt{\frac{1}{10} + \frac{1}{10}}} \approx \frac{609 - 536}{\sqrt{18929.402 \times \frac{1}{5}}} \approx 1.1864. \]

Besides, \(t_{\text{critical}} = t_{(0.025,\,18)} \approx 2.101\). Since

\[ t \approx 1.1864 < t_{\text{critical}} \approx 2.101, \]

the test statistic falls in the non-rejection region and hence H0 cannot be rejected. In conclusion, based on the sample data, there is no significant difference between the two groups in the weight gains of the steers.

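(b) A 95% confidence interval for \(\mu_s - \mu_c\) is given by

\[ (\bar{x}_s - \bar{x}_c) \pm t_{(0.025,\,18)}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx (609 - 536) \pm 2.101 \sqrt{18929.402 \left( \frac{1}{10} + \frac{1}{10} \right)} \approx (-56.273,\ 202.273). \]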

✷

Example 9.8 (Hypothesis testing: two populations ⋆⋆)

To find out whether a new serum will arrest leukemia (i.e. blood cancer), 15 mice, all with an advanced stage of the disease, are selected. 8 mice receive the treatment and 7 do not. Survival times, in years, from the time the experiment commenced are as follows:

Treatment 2.1 5.3 1.4 4.6 0.9 2.0 3.3 1.2

No Treatment 1.9 0.5 2.8 2.2 3.1 0.9 2.6

(a) At the 0.05 level of significance can the serum be said to be effective? Assume the two distributions to be normally distributed with equal variances.

(b) Construct a 95% confidence interval for the difference between the means of the survival times with and without the treatment.

(c) Explain briefly the relationship between the hypothesis testing and the confidence interval.

Solution

(a) Let \(\mu_T\) be the mean of survival times of the group with treatment and \(\mu_N\) be the mean of survival times of the group without treatment. Also, let \(\bar{x}_T\) be the sample mean of the group with treatment and \(\bar{x}_N\) be the sample mean of the group without treatment.

From the data, \(\bar{x}_T = 2.6\), \(s_T \approx 1.6336\), \(n_T = 8\); \(\bar{x}_N = 2\), \(s_N \approx 0.9764\), \(n_N = 7\). Then

\[ s_p^2 = \text{pooled variance} \approx \frac{7(1.6336)^2 + 6(0.9764)^2}{7 + 6} \approx 1.8770. \]

\[ \begin{cases} H_0 : \mu_T - \mu_N = 0, \\ H_1 : \mu_T - \mu_N > 0, \end{cases} \qquad \alpha = 0.05. \]

Since the population variance \(\sigma_T^2\) (\(= \sigma_N^2\)) is unknown, we use the t-test. The test statistic is

\[ t = \frac{\bar{x}_T - \bar{x}_N}{s_p \sqrt{\frac{1}{8} + \frac{1}{7}}} \approx \frac{2.6 - 2}{\sqrt{1.8770 \times \frac{15}{56}}} \approx 0.8462. \]

Besides, \(t_{\text{critical}} = t_{(0.05,\,13)} \approx 1.771\). Since

\[ t \approx 0.8462 < t_{\text{critical}} \approx 1.771, \]

the test statistic falls in the non-rejection region and hence H0 cannot be rejected. In conclusion, based on the sample data, there is not sufficient evidence that the serum is effective.

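(b) A 95% confidence interval for \(\mu_T - \mu_N\) is given by

\[ (\bar{x}_T - \bar{x}_N) \pm t_{(0.025,\,13)}\, s_p \sqrt{\frac{1}{n_T} + \frac{1}{n_N}} \approx (2.6 - 2) \pm 2.160 \sqrt{1.8770 \left( \frac{1}{8} + \frac{1}{7} \right)} \approx (-0.9316,\ 2.1316). \]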

(c) When a 95% confidence interval is constructed, all values in the interval are considered as plausible values for the parameter being estimated, here the difference of means \(\mu_T - \mu_N\). Values outside the interval are rejected as relatively implausible. Since zero, the value specified by the null hypothesis, is in the confidence interval, the null hypothesis of no difference between the means of survival times with and without the treatment cannot be rejected at the 0.05 level of significance, because zero is one of the plausible values of \(\mu_T - \mu_N\). In more detail, the interval contains both positive and negative numbers and therefore \(\mu_T\) may be larger or smaller than \(\mu_N\). None of the three possible relationships between \(\mu_T\) and \(\mu_N\) (\(\mu_T - \mu_N = 0\), \(\mu_T - \mu_N > 0\), and \(\mu_T - \mu_N < 0\)) can be ruled out. The data are inconclusive.

✷
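The link between the interval in (b) and the two-sided test described in (c) can be made concrete: the null value 0 is not rejected at level 0.05 exactly when it lies inside the 95% interval. The Python sketch below recomputes the interval from the raw survival times; the function name `pooled_interval_from_data` is hypothetical and the critical value 2.160 is taken from the t table.

```python
from math import sqrt
from statistics import mean, variance


def pooled_interval_from_data(sample1, sample2, t_crit):
    """CI for mu1 - mu2 from raw data, assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = ((n1 - 1) * variance(sample1)
           + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    half_width = t_crit * sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(sample1) - mean(sample2)
    return diff - half_width, diff + half_width


treatment = [2.1, 5.3, 1.4, 4.6, 0.9, 2.0, 3.3, 1.2]
no_treatment = [1.9, 0.5, 2.8, 2.2, 3.1, 0.9, 2.6]
low, high = pooled_interval_from_data(treatment, no_treatment, 2.160)
contains_zero = low <= 0 <= high    # True: "no difference" stays plausible
print(round(low, 4), round(high, 4), contains_zero)
```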


Chapter 10

Simple Linear Regression

Example 10.1 (Linear regression ⋆)

The relationship between energy consumption and household income was studied, yielding the following data on household income x (in units of $1,000/year) and energy consumption y (in units of 10^8 Btu/year).

x 20 25 31 37 40 49 55 60 66 75 88 95

y 13 11 12 7 6 8 8 11 17 17 18 21

(a) Fit the linear regression line of y on x.

(b) Compute the sample correlation coefficient.

Solution

(a) By using calculator, the regression line of y on x is given by

y = a+ bx ≈ 4.8874 + 0.1410x.

(b) By using calculator, the sample correlation coefficient is

r ≈ 0.7.

This reflects a positive linear relationship which is indeed not very strong.

✷
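The solution relies on a calculator's regression mode. For readers who want to see the underlying arithmetic, the following Python sketch computes the least-squares coefficients and the sample correlation coefficient from the sums of squares. The function name `least_squares` and the intermediate names `sxx`, `syy`, `sxy` are illustrative assumptions.

```python
from math import sqrt


def least_squares(x, y):
    """Return (a, b, r) for the fitted line y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                  # slope
    a = ybar - b * xbar            # intercept
    r = sxy / sqrt(sxx * syy)      # sample correlation coefficient
    return a, b, r


income = [20, 25, 31, 37, 40, 49, 55, 60, 66, 75, 88, 95]
energy = [13, 11, 12, 7, 6, 8, 8, 11, 17, 17, 18, 21]
a, b, r = least_squares(income, energy)
print(round(a, 4), round(b, 4), round(r, 3))
```

The same function can be applied to the data of the other examples in this chapter.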


The respective marks of a class of 8 students on the assignment (x) and on the final examination (y) are as follows:

x 120 95 91 111 102 108 87 75

y 98 82 76 95 90 92 76 61

(a) Use the estimated regression line to predict the value of y when x = 100.

(b) Compute the sample correlation coefficient and interpret the meaning of the computed value.

Solution

(a) With the use of a calculator, the regression line is given by

y = 1.0927 + 0.8381x.

The predicted value of y when x = 100 is therefore y = 1.0927 + 0.8381 × 100 = 84.9027.

(b) With the use of a calculator, the sample correlation coefficient is

r ≈ 0.9802.

This reflects a positive linear relationship which is indeed very strong.

✷



The following data list a sample of heights (x) and weights (y) recorded from 10 adults.

x (cm) 169 188 175 154 166 171 180 176 190 142

y (kg) 55 62 56 48 51 50 59 61 95 49

(a) Find the linear regression line of y on x.

(b) Compute the sample correlation coefficient and interpret the meaning of the computed value.

Solution

(a) With the use of a calculator, the regression line of y on x is given by

y = a+ bx ≈ −52.9087 + 0.6517x.


(b) With the use of a calculator, the sample correlation coefficient is

r ≈ 0.6949.

This reflects a positive linear relationship which is indeed not very strong.

✷

Example 10.4 (Linear regression ⋆⋆)

Interest rates provide an excellent leading indicator for predicting housing starts. As interest rates decline, housing starts increase, and vice versa. Suppose the data given in the following table represent the dominant interest rates on first mortgages and the recorded building permits in a certain region over a 12-year span.

Year 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996

Interest rates (%) 6.5 6.0 6.5 7.5 8.5 9.5 10.0 9.0 7.5 9.0 11.5 15.0

Building permits 2165 2984 2780 1940 1750 1535 962 1310 2050 1695 856 510

(a) Define clearly what x and y are in this question. Find the linear regression line of y on x, i.e., find the constants a and b such that y = a + bx.

(b) Compute the correlation coefficient r for these data and explain the meaning of the value of r.

Solution

(a) Define the following variables:

interest rates (x) and building permits (y).

By using calculator, the regression line of y on x is given by

y = a+ bx ≈ 4094.3980− 268.5049x.

When the interest rate increases by 1 percentage point, the predicted number of building permits decreases by about 268.5.

(b) By using a calculator, the correlation coefficient is

r ≈ −0.909.

This reflects a fairly strong negative linear relationship.

✷


Table of the Student's t-distribution

The table gives the values of \(t_{\alpha;\,\nu}\) where \(\Pr(T_\nu > t_{\alpha;\,\nu}) = \alpha\), with ν degrees of freedom.

[Figure: upper-tail area α to the right of \(t_{\alpha;\,\nu}\).]

  ν \ α     0.1    0.05   0.025    0.01   0.005    0.001   0.0005
    1     3.078   6.314  12.706  31.821  63.657  318.310  636.620
    2     1.886   2.920   4.303   6.965   9.925   22.326   31.598
    3     1.638   2.353   3.182   4.541   5.841   10.213   12.924
    4     1.533   2.132   2.776   3.747   4.604    7.173    8.610
    5     1.476   2.015   2.571   3.365   4.032    5.893    6.869
    6     1.440   1.943   2.447   3.143   3.707    5.208    5.959
    7     1.415   1.895   2.365   2.998   3.499    4.785    5.408
    8     1.397   1.860   2.306   2.896   3.355    4.501    5.041
    9     1.383   1.833   2.262   2.821   3.250    4.297    4.781
   10     1.372   1.812   2.228   2.764   3.169    4.144    4.587
   11     1.363   1.796   2.201   2.718   3.106    4.025    4.437
   12     1.356   1.782   2.179   2.681   3.055    3.930    4.318
   13     1.350   1.771   2.160   2.650   3.012    3.852    4.221
   14     1.345   1.761   2.145   2.624   2.977    3.787    4.140
   15     1.341   1.753   2.131   2.602   2.947    3.733    4.073
   16     1.337   1.746   2.120   2.583   2.921    3.686    4.015
   17     1.333   1.740   2.110   2.567   2.898    3.646    3.965
   18     1.330   1.734   2.101   2.552   2.878    3.610    3.922
   19     1.328   1.729   2.093   2.539   2.861    3.579    3.883
   20     1.325   1.725   2.086   2.528   2.845    3.552    3.850
   21     1.323   1.721   2.080   2.518   2.831    3.527    3.819
   22     1.321   1.717   2.074   2.508   2.819    3.505    3.792
   23     1.319   1.714   2.069   2.500   2.807    3.485    3.767
   24     1.318   1.711   2.064   2.492   2.797    3.467    3.745
   25     1.316   1.708   2.060   2.485   2.787    3.450    3.725
   26     1.315   1.706   2.056   2.479   2.779    3.435    3.707
   27     1.314   1.703   2.052   2.473   2.771    3.421    3.690
   28     1.313   1.701   2.048   2.467   2.763    3.408    3.674
   29     1.311   1.699   2.045   2.462   2.756    3.396    3.659
   30     1.310   1.697   2.042   2.457   2.750    3.385    3.646
   40     1.303   1.684   2.021   2.423   2.704    3.307    3.551
   60     1.296   1.671   2.000   2.390   2.660    3.232    3.460
  120     1.289   1.658   1.980   2.358   2.617    3.160    3.373
    ∞     1.282   1.645   1.960   2.326   2.576    3.090    3.291

Table of the Chi-square Distribution

[Figure: upper-tail area α to the right of \(\chi^2_{\alpha;\,\nu}\).]

Columns: α = 0.995, 0.99, 0.98, 0.975, 0.95, 0.90, 0.80, 0.20, 0.10, 0.05, 0.025, 0.02, 0.01, 0.005, 0.001. Each row starts with ν.

ν = 1: 0.0000393 0.000157 0.000628 0.000982 0.00393 0.0158 0.0642 1.642 2.706 3.841 5.024 5.412 6.635 7.879 10.827
ν = 2: 0.0100 0.0201 0.0404 0.0506 0.103 0.211 0.446 3.219 4.605 5.991 7.378 7.824 9.210 10.597 13.815
ν = 3: 0.0717 0.115 0.185 0.216 0.352 0.584 1.005 4.642 6.251 7.815 9.348 9.837 11.345 12.838 16.268
ν = 4: 0.207 0.297 0.429 0.484 0.711 1.064 1.649 5.989 7.779 9.488 11.143 11.668 13.277 14.860 18.465
ν = 5: 0.412 0.554 0.752 0.831 1.145 1.610 2.343 7.289 9.236 11.070 12.832 13.388 15.086 16.750 20.517
ν = 6: 0.676 0.872 1.134 1.237 1.635 2.204 3.070 8.558 10.645 12.592 14.449 15.033 16.812 18.548 22.457
ν = 7: 0.989 1.239 1.564 1.690 2.167 2.833 3.822 9.803 12.017 14.067 16.013 16.622 18.475 20.278 24.322
ν = 8: 1.344 1.646 2.032 2.180 2.733 3.490 4.594 11.030 13.362 15.507 17.535 18.168 20.090 21.955 26.125
ν = 9: 1.735 2.088 2.532 2.700 3.325 4.168 5.380 12.242 14.684 16.919 19.023 19.679 21.666 23.589 27.877
ν = 10: 2.156 2.558 3.059 3.247 3.940 4.865 6.179 13.442 15.987 18.307 20.483 21.161 23.209 25.188 29.588
ν = 11: 2.603 3.053 3.609 3.816 4.575 5.578 6.989 14.631 17.275 19.675 21.920 22.618 24.725 26.757 31.264
ν = 12: 3.074 3.571 4.178 4.404 5.226 6.304 7.807 15.812 18.549 21.026 23.337 24.054 26.217 28.300 32.909
ν = 13: 3.565 4.107 4.765 5.009 5.892 7.042 8.634 16.985 19.812 22.362 24.736 25.472 27.688 29.819 34.528
ν = 14: 4.075 4.660 5.368 5.629 6.571 7.790 9.467 18.151 21.064 23.685 26.119 26.873 29.141 31.319 36.123
ν = 15: 4.601 5.229 5.985 6.262 7.261 8.547 10.307 19.311 22.307 24.996 27.488 28.259 30.578 32.801 37.697
ν = 16: 5.142 5.812 6.614 6.908 7.962 9.312 11.152 20.465 23.542 26.296 28.845 29.633 32.000 34.267 39.252
ν = 17: 5.697 6.408 7.255 7.564 8.672 10.085 12.002 21.615 24.769 27.587 30.191 30.995 33.409 35.718 40.790
ν = 18: 6.265 7.015 7.906 8.231 9.390 10.865 12.857 22.760 25.989 28.869 31.526 32.346 34.805 37.156 42.312
ν = 19: 6.844 7.633 8.567 8.907 10.117 11.651 13.716 23.900 27.204 30.144 32.852 33.687 36.191 38.582 43.820
ν = 20: 7.434 8.260 9.237 9.591 10.851 12.443 14.578 25.038 28.412 31.410 34.170 35.020 37.566 39.997 45.315
ν = 21: 8.034 8.897 9.915 10.283 11.591 13.240 15.445 26.171 29.615 32.671 35.479 36.343 38.932 41.401 46.797
ν = 22: 8.643 9.542 10.600 10.982 12.338 14.041 16.314 27.301 30.813 33.924 36.781 37.659 40.289 42.796 48.268
ν = 23: 9.260 10.196 11.293 11.688 13.091 14.848 17.187 28.429 32.007 35.172 38.076 38.968 41.638 44.181 49.728
ν = 24: 9.886 10.856 11.992 12.401 13.848 15.659 18.062 29.553 33.196 36.415 39.364 40.270 42.980 45.558 51.179
ν = 25: 10.520 11.524 12.697 13.120 14.611 16.473 18.940 30.675 34.382 37.652 40.646 41.566 44.314 46.928 52.620
ν = 26: 11.160 12.198 13.409 13.844 15.379 17.292 19.820 31.795 35.563 38.885 41.923 42.856 45.642 48.290 54.052
ν = 27: 11.808 12.879 14.125 14.573 16.151 18.114 20.703 32.912 36.741 40.113 43.194 44.140 46.963 49.645 55.476
ν = 28: 12.461 13.565 14.847 15.308 16.928 18.939 21.588 34.027 37.916 41.337 44.461 45.419 48.278 50.993 56.893
ν = 29: 13.121 14.256 15.574 16.047 17.708 19.768 22.475 35.139 39.087 42.557 45.722 46.693 49.588 52.336 58.302
ν = 30: 13.787 14.953 16.306 16.791 18.493 20.599 23.364 36.250 40.256 43.773 46.979 47.962 50.892 53.672 59.703
ν = 40: 20.706 22.164 23.838 24.433 26.509 29.051 32.345 47.269 51.805 55.759 59.342 60.436 63.691 66.766 73.402
ν = 50: 27.991 29.707 31.664 32.357 34.764 37.689 41.449 58.164 63.167 67.505 71.420 72.613 76.154 79.490 86.661
ν = 60: 35.535 37.485 39.699 40.482 43.188 46.459 50.641 68.972 74.397 79.082 83.298 84.580 88.379 91.952 99.607
ν = 70: 43.275 45.442 47.893 48.758 51.739 55.329 59.898 79.715 85.527 90.531 95.023 96.388 100.425 104.215 112.317
ν = 80: 51.171 53.539 56.213 57.153 60.391 64.278 69.207 90.405 96.578 101.880 106.629 108.069 112.329 116.321 124.839
ν = 90: 59.196 61.754 64.634 65.646 69.126 73.291 78.558 101.054 107.565 113.145 118.136 119.648 124.116 128.299 137.208
ν = 100: 67.327 70.065 73.142 74.222 77.929 82.358 87.945 111.667 118.498 124.342 129.561 131.142 135.807 140.170 149.449

Formula Sheet (MTH6130 Probability and Statistics)

December 22, 2015 Preliminary Version

Summarizing Data

Sample mean:

x =1

n

n∑

i=1

xi.

Median:List the numbers in ascending order. Median is:n+12

-th value if n is odd;

mean of n2-th and n+1

2-th value if n is even.

Sample variance:

s2 =1

n− 1

n∑

i=1

(xi − x)2 =1

n− 1

( n∑

i=1

x2i − nx2

)

.

Sample standard deviation:

\[ s = \sqrt{\text{sample variance}}. \]

Range:

range = largest value − smallest value.

Interquartile range:

IQR = $Q_3 - Q_1$ = upper quartile − lower quartile.

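Frequency table:

value       $x_1$  $x_2$  $\cdots$  $x_k$
frequency   $f_1$  $f_2$  $\cdots$  $f_k$

• total number of observations: $n = \sum_{i=1}^{k} f_i$.

• sample mean: $\bar{x} = \sum_{i=1}^{k} \frac{f_i}{n}\, x_i$.

• sample variance: $s^2 = \frac{1}{n-1} \left( \sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2 \right)$.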
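As an illustration, the frequency-table formulas can be sketched in Python. The function name `frequency_table_stats` and the data are made up for this example; they are not part of the course material.

```python
from math import sqrt

def frequency_table_stats(values, freqs):
    """Return (n, mean, variance, std) for a frequency table.

    values[i] is x_i and freqs[i] is f_i (hypothetical argument names).
    """
    n = sum(freqs)                                        # n = sum of f_i
    mean = sum(f * x for x, f in zip(values, freqs)) / n  # sum of (f_i / n) x_i
    sum_sq = sum(f * x * x for x, f in zip(values, freqs))
    variance = (sum_sq - n * mean**2) / (n - 1)           # shortcut formula for s^2
    return n, mean, variance, sqrt(variance)

# Made-up data: 1 occurs twice, 2 three times, 3 once, 4 twice.
n, mean, var, sd = frequency_table_stats([1, 2, 3, 4], [2, 3, 1, 2])
print(n, mean, var, sd)
```

The shortcut form of the variance is used here because it matches the sheet; for data with a very large mean, the definition form $\sum f_i (x_i - \bar{x})^2$ is numerically safer.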

Probability

Two events $A$ and $B$ are said to be

• mutually exclusive if $P(A \cap B) = 0$.

• exhaustive if $P(A \cup B) = 1$.

• independent if $P(A \cap B) = P(A) \cdot P(B)$.

Addition principle:

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B). \]

Conditional probability:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \]

Multiplication principle:

\[ P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B). \]

Total probability:

\[ P(A) = P(A \mid B) \cdot P(B) + P(A \mid B^c) \cdot P(B^c). \]

Partition law:

\[ P(A) = \sum_{i=1}^{k} P(A \cap B_i) = \sum_{i=1}^{k} P(A \mid B_i) \cdot P(B_i), \]

provided that $B_1, B_2, \cdots, B_k$ are mutually exclusive and exhaustive events.

Bayes' formula:

\[ P(B \mid A) = \frac{P(A \mid B) \cdot P(B)}{P(A \mid B) \cdot P(B) + P(A \mid B^c) \cdot P(B^c)} \]

or more generally

\[ P(B_i \mid A) = \frac{P(A \mid B_i) \cdot P(B_i)}{P(A)} = \frac{P(A \mid B_i) \cdot P(B_i)}{\sum_{i=1}^{k} P(A \mid B_i) \cdot P(B_i)}. \]
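A minimal Python sketch of the partition law and Bayes' formula follows. The function name `bayes_posterior` and the screening-test numbers are hypothetical and chosen only to show the arithmetic.

```python
def bayes_posterior(priors, likelihoods):
    """priors[i] = P(B_i), likelihoods[i] = P(A | B_i).

    The B_i are assumed mutually exclusive and exhaustive.
    Returns the list of posteriors P(B_i | A).
    """
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(A | B_i) P(B_i)
    p_a = sum(joint)                                      # partition law gives P(A)
    return [j / p_a for j in joint]

# Hypothetical numbers: P(B) = 0.01, P(A | B) = 0.95, P(A | B^c) = 0.05.
posterior = bayes_posterior([0.01, 0.99], [0.95, 0.05])
print(posterior[0])  # P(B | A) = 0.0095 / 0.059
```

With these inputs the posterior is about 0.16, even though $P(A \mid B)$ is 0.95, because the prior $P(B)$ is small.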

Discrete distributions

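Mean value:

\[ E(X) = \mu = \sum_{x_i \in S} x_i \cdot f(x_i), \qquad f(x_i) := P(X = x_i). \]

Expectation rules:

\[ E(a) = a; \quad E(aX) = aE(X); \quad E(X+Y) = E(X) + E(Y). \]

Variance:

\[ \mathrm{Var}(X) = \sum_{x_i \in S} (x_i - \mu)^2 \cdot f(x_i) = \sum_{x_i \in S} x_i^2 \cdot f(x_i) - \mu^2. \]

Variance rules:

\[ \mathrm{Var}(a) = 0; \quad \mathrm{Var}(aX) = a^2\, \mathrm{Var}(X); \]

\[ \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) \quad \text{if } X, Y \text{ are independent.} \]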

Binomial distribution:

\[ X \sim \mathrm{Bin}(n, p), \qquad P(X = k) = C^n_k\, p^k (1-p)^{n-k} \]

with mean $np$ and variance $np(1-p)$.
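The binomial formulas can be checked numerically. The sketch below, with illustrative values $n = 10$ and $p = 0.3$ chosen for this example, computes the probabilities and recovers the mean and variance from the general definitions of $E(X)$ and $\mathrm{Var}(X)$.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p); comb(n, k) is the coefficient C^n_k."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # illustrative values only
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance from the definitions: sum of x f(x) and sum of x^2 f(x) - mu^2.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum(k * k * pk for k, pk in enumerate(pmf)) - mean**2
print(mean, variance)  # close to np = 3 and np(1 - p) = 2.1
```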

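Geometric distribution:

\[ X \sim G(p), \qquad P(X = k) = (1-p)^{k-1}\, p. \]

Hypergeometric distribution:

\[ X \sim H(N, n, r), \qquad P(X = k) = \frac{C^r_k\, C^{N-r}_{n-k}}{C^N_n}. \]

Approximating Hypergeometric by Binomial when $N \gg n$:

\[ X \sim H(N, n, r) \overset{\text{approx.}}{\sim} \mathrm{Bin}(n, p), \quad \text{where } p = \frac{r}{N}. \]

Poisson distribution:

\[ X \sim \mathrm{Poisson}(\lambda), \qquad P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}. \]

Approximating Poisson by Normal:

\[ X \sim \mathrm{Poisson}(\lambda) \overset{\text{approx.}}{\sim} N(\mu, \sigma^2), \]

where $\mu$ = mean = $\lambda$, $\sigma^2$ = variance = $\lambda$.

Negative Binomial distribution:

\[ X \sim \mathrm{NegBin}(r, p), \qquad P(X = k) = C^{k-1}_{r-1}\, p^r (1-p)^{k-r}, \]

where $k = r, r+1, r+2, \cdots$.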

Continuous distributions

Distribution function:

\[ F(y) = P(X \le y) = \int_{-\infty}^{y} f(x)\, dx. \]

Evaluating probabilities:

\[ P(a < X < b) = \int_{a}^{b} f(x)\, dx = F(b) - F(a). \]

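Uniform distribution:

\[ X \sim U(\alpha, \beta), \qquad f(x) = \begin{cases} \dfrac{1}{\beta - \alpha}, & \text{if } \alpha < x < \beta, \\ 0, & \text{otherwise.} \end{cases} \]

\[ P(a < X < b) = \int_{a}^{b} f(x)\, dx. \]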

Normal distribution:

$X \sim N(\mu, \sigma^2)$, where $\mu$ = mean and $\sigma^2$ = variance.

Standardization by change of variables: $Z = \dfrac{X - \mu}{\sigma}$.

Evaluating probabilities by Standard Normal Table.

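Approximating Binomial by Normal:

When $np(1-p) > 10$, $X \sim \mathrm{Bin}(n, p) \overset{\text{approx.}}{\sim} N(\mu, \sigma^2)$,

where $\mu = np$ and $\sigma^2 = np(1-p)$.

Continuity correction factor:

\[ P(a \le X \le b) \approx P\left( \frac{(a - 0.5) - \mu}{\sigma} \le Z \le \frac{(b + 0.5) - \mu}{\sigma} \right). \]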
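To see how close the continuity-corrected approximation is, the sketch below compares it with the exact binomial sum. The function names and the choice $n = 100$, $p = 0.5$ (so that $np(1-p) = 25 > 10$) are assumptions made for this illustration; `statistics.NormalDist` stands in for the Standard Normal Table.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_exact(a, b, n, p):
    """Exact P(a <= X <= b) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, b + 1))

def binomial_normal_approx(a, b, n, p):
    """Normal approximation to P(a <= X <= b) with continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = NormalDist()  # standard normal, replaces the table lookup
    return z.cdf((b + 0.5 - mu) / sigma) - z.cdf((a - 0.5 - mu) / sigma)

n, p = 100, 0.5  # illustrative values; np(1 - p) = 25 > 10
exact = binomial_exact(45, 55, n, p)
approx = binomial_normal_approx(45, 55, n, p)
print(exact, approx)
```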

Exponential distribution:

\[ X \sim \mathrm{Exp}(\mu), \qquad f(x) = \frac{1}{\mu}\, e^{-x/\mu} \]

with mean $\mu$ and variance $\mu^2$.

Relationship between Exponential and Poisson: $\mu = \dfrac{1}{\lambda}$.

Joint distributions

Joint probability distribution function:

\[ p(x, y) = P(X = x, Y = y). \]

Condition for independence of $X$ and $Y$:

\[ p(x, y) = p_X(x) \times p_Y(y) \quad \text{holds for all } x, y, \]

where $p_X(x) = P(X = x)$ and $p_Y(y) = P(Y = y)$.

Sum of independent Binomial random variables:

\[ X \sim \mathrm{Bin}(n, p),\ Y \sim \mathrm{Bin}(m, p) \implies X + Y \sim \mathrm{Bin}(n + m, p). \]

Sum of independent Poisson random variables:

\[ X \sim \mathrm{Poisson}(\lambda_1),\ Y \sim \mathrm{Poisson}(\lambda_2) \implies X + Y \sim \mathrm{Poisson}(\lambda_1 + \lambda_2). \]

Conditional probability of $X$, given that $X + Y = n$:

$X \sim \mathrm{Poisson}(\lambda_1)$, $Y \sim \mathrm{Poisson}(\lambda_2)$, and $X$, $Y$ are independent

\[ \implies P(X = k \mid X + Y = n) = C^n_k\, p^k (1-p)^{n-k}, \quad \text{where } p = \frac{\lambda_1}{\lambda_1 + \lambda_2}. \]
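The conditional result for independent Poisson variables can be verified numerically from the definition of conditional probability. In the sketch below the rates $\lambda_1 = 2$, $\lambda_2 = 3$ and the total $n = 6$ are arbitrary illustrative choices.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

lam1, lam2, n = 2.0, 3.0, 6  # illustrative values only
p = lam1 / (lam1 + lam2)

conditional = []  # P(X = k | X + Y = n) from the definition
binomial = []     # C^n_k p^k (1 - p)^(n - k)
for k in range(n + 1):
    # P(X = k, Y = n - k) / P(X + Y = n), using independence and
    # X + Y ~ Poisson(lam1 + lam2).
    joint = poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2)
    conditional.append(joint / poisson_pmf(n, lam1 + lam2))
    binomial.append(comb(n, k) * p**k * (1 - p)**(n - k))

print(max(abs(c - b) for c, b in zip(conditional, binomial)))
```

The printed maximum difference should be at the level of floating-point rounding.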

Confidence interval for population mean µ

Data: Sample data normally distributed. Sample size = n.

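Method ($n$ is large, $n > 30$; $\sigma^2$ known):

• Find sample mean = $\bar{x}$, population variance = $\sigma^2$.

• Confidence level = $(1 - \alpha) \times 100\%$, identify $\alpha$.

• Look in Standard Normal Table, find $z_{\alpha/2}$.

• CI is given by $\bar{x} \pm z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$, or

\[ \left( \bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \right). \]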
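The known-variance method above can be sketched as follows. The function name `z_confidence_interval` and the input numbers are hypothetical, and `NormalDist().inv_cdf` is used in place of the Standard Normal Table lookup.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """(1 - alpha) * 100% CI for the population mean, sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, upper-tail area alpha/2
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Made-up inputs: sample mean 50, sigma 10, n = 100, 95% confidence.
low, high = z_confidence_interval(50.0, 10.0, 100)
print(low, high)  # roughly (48.04, 51.96)
```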

Method ($n$ small; $\sigma^2$ unknown):

• Find sample mean = $\bar{x}$, sample variance = $s^2$.

• Look in t-Table, $\nu$ = degrees of freedom = $n - 1$, find $t_{(\alpha/2,\,\nu)}$.

• CI is given by $\bar{x} \pm t_{(\alpha/2,\,\nu)} \cdot \dfrac{s}{\sqrt{n}}$.

Confidence interval for difference in population means

Data: Sample data from population 1: $x_1, x_2, \cdots, x_n$, normally distributed with sample size $n$; sample data from population 2: $y_1, y_2, \cdots, y_m$, normally distributed with sample size $m$.

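Method ($\sigma_x^2$, $\sigma_y^2$ known):

• Find $\bar{x}$, $\bar{y}$.

• Look in Standard Normal Table, find $z_{\alpha/2}$.

• CI is given by $(\bar{x} - \bar{y}) \pm z_{\alpha/2} \cdot \sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}}$.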

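Method (Large samples; $\sigma_x^2$, $\sigma_y^2$ unknown):

• Find $\bar{x}$, $s_x^2$; $\bar{y}$, $s_y^2$.

• CI is given by $(\bar{x} - \bar{y}) \pm z_{\alpha/2} \cdot \sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}$.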

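Method (Small samples; $\sigma_x^2$, $\sigma_y^2$ unknown and $\sigma_x^2 \neq \sigma_y^2$):

• Look in t-Table, $\nu$ = degrees of freedom = $n + m - 2$, find $t_{(\alpha/2,\,\nu)}$.

• CI is given by $(\bar{x} - \bar{y}) \pm t_{(\alpha/2,\,\nu)} \cdot \sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}$.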

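Method (Small samples; $\sigma_x^2$, $\sigma_y^2$ unknown but $\sigma_x^2 = \sigma_y^2$):

• Find pooled variance $s_p^2 = \dfrac{(n-1)\, s_x^2 + (m-1)\, s_y^2}{n + m - 2}$.

• CI is given by $(\bar{x} - \bar{y}) \pm t_{(\alpha/2,\,\nu)} \cdot s_p \cdot \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$.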

Hypothesis test for population mean

Data: Sample data normally distributed. Sample size = n.

Null hypothesis: $H_0 : \mu = \mu_0$.

Method ($n$ is large, $n > 30$; $\sigma^2$ known):

• Find $\bar{x}$.

• Test statistic is given by $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.

• Identify $\alpha$, the significance level of the test.

• Look for (positive) critical value, $z_c$, in Standard Normal Table based on whether the test is one-tailed or two-tailed:

$z_c = z_\alpha$ (one-tailed) or $z_c = z_{\alpha/2}$ (two-tailed).

• Reject $H_0$ at the $\alpha$ level of significance if $z$ falls in the rejection region(s), i.e., $|z| > |z_c|$.
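The steps of the known-variance test can be sketched as below. The function name `z_test` and the sample numbers are invented for illustration, and `NormalDist().inv_cdf` replaces the table lookup. The sketch applies the sheet's rule $|z| > |z_c|$ as written; for a one-tailed test the sign of $z$ should also be checked against the direction of the alternative hypothesis.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, two_tailed=True):
    """Return (z, z_c, reject) for H0: mu = mu0 with sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))  # test statistic
    tail = alpha / 2 if two_tailed else alpha    # upper-tail area of z_c
    z_c = NormalDist().inv_cdf(1 - tail)         # positive critical value
    return z, z_c, abs(z) > z_c

# Made-up inputs: sample mean 52, mu0 = 50, sigma = 10, n = 100.
z, z_c, reject = z_test(52.0, 50.0, 10.0, 100)
print(z, z_c, reject)
```

Here $z = 2$ exceeds the two-tailed critical value of about 1.96, so $H_0$ is rejected at the 5% level.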

Method ($n$ is small; $\sigma^2$ unknown):

• Find sample mean = $\bar{x}$, sample variance = $s^2$.

• Test statistic is given by $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.

• $\nu$ = degrees of freedom = $n - 1$. Look for (positive) critical value, $t_c$, in t-Table based on whether the test is one-tailed or two-tailed:

$t_c = t_{(\alpha,\,\nu)}$ (one-tailed) or $t_c = t_{(\alpha/2,\,\nu)}$ (two-tailed).

• Reject $H_0$ at the $\alpha$ level of significance if $t$ falls in the rejection region(s), i.e., $|t| > |t_c|$.

Errors involved in hypothesis testing:

• Type I: A Type I error occurs if we reject a null hypothesis when it is true. The probability of committing a Type I error is usually denoted by $\alpha$.

• Type II: A Type II error occurs if we accept a null hypothesis when it is false. The probability of making a Type II error is usually denoted by $\beta$.

Hypothesis test about the difference between the means of two populations

Data 1: Sample data from population 1 normally distributed. Sample size = $n_1$.

Data 2: Sample data from population 2 normally distributed. Sample size = $n_2$.

Null hypothesis: $H_0 : \mu_1 = \mu_2$.

Method (Large samples; $\sigma_1^2$, $\sigma_2^2$ known):

• Find sample means $\bar{x}_1$ and $\bar{x}_2$.

• Test statistic is given by $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$.

• Look for (positive) critical value, $z_c$, in Standard Normal Table based on whether the test is one-tailed or two-tailed:

$z_c = z_\alpha$ (one-tailed) or $z_c = z_{\alpha/2}$ (two-tailed).

• Reject $H_0$ at the $\alpha$ level of significance if $z$ falls in the rejection region(s), i.e., $|z| > |z_c|$.

Method (Small samples; $\sigma_1^2$, $\sigma_2^2$ unknown and equal):

• Find sample variances $s_1^2$ and $s_2^2$ as well.

• Test statistic is given by $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where

\[ s_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}. \]

• $\nu$ = degrees of freedom = $n_1 + n_2 - 2$. Look for (positive) critical value, $t_c$, in t-Table based on whether the test is one-tailed or two-tailed:

$t_c = t_{(\alpha,\,\nu)}$ (one-tailed) or $t_c = t_{(\alpha/2,\,\nu)}$ (two-tailed).

• Reject $H_0$ at the $\alpha$ level of significance if $t$ falls in the rejection region(s), i.e., $|t| > |t_c|$.
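A short Python sketch of the pooled test statistic is given below. The function name `pooled_t_statistic` and the two tiny samples are hypothetical; the critical value $t_c$ still has to be read from the t-Table, since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Return (t, degrees of freedom) for H0: mu1 = mu2, equal unknown variances."""
    n1, n2 = len(x), len(y)
    s1_sq, s2_sq = variance(x), variance(y)  # sample variances, divisor n - 1
    sp = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t = (mean(x) - mean(y)) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Made-up samples: means 4 and 2, sample variances 4 and 1.
t, df = pooled_t_statistic([2, 4, 6], [1, 2, 3])
print(t, df)  # compare |t| with t_c from the t-Table at df degrees of freedom
```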

Regression and correlation

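Linear regression model:

\[ y_i = a + b x_i \]

for a sample of pairs $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$.

\[ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2, \qquad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} y_i \right)^2, \]

\[ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right). \]

\[ b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\, \bar{x}. \]

\[ \mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad r^2 = \frac{S_{yy} - \mathrm{SSE}}{S_{yy}}, \qquad \hat{\sigma}^2 = \frac{\mathrm{SSE}}{n - 2}, \]

where $\hat{y}_i = a + b x_i$ is the fitted value at $x_i$.

Correlation coefficient: $r$, which reflects how strong the linear relationship is.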
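The regression quantities can be computed directly from the sums, as in the following sketch. The function name `simple_linear_regression` and the five data pairs are made up for this illustration.

```python
def simple_linear_regression(xs, ys):
    """Return (a, b, SSE, r^2, sigma^2 estimate) for the fitted line y = a + b x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x**2 / n
    syy = sum(y * y for y in ys) - sum_y**2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = sxy / sxx                      # slope
    a = sum_y / n - b * sum_x / n      # intercept: y-bar minus b times x-bar
    sse = sum((y - (a + b * x))**2 for x, y in zip(xs, ys))
    r_squared = (syy - sse) / syy
    sigma_sq_hat = sse / (n - 2)
    return a, b, sse, r_squared, sigma_sq_hat

# Made-up data pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, sse, r_squared, sigma_sq_hat = simple_linear_regression(xs, ys)
print(a, b, sse, r_squared, sigma_sq_hat)
```

For these pairs $S_{xx} = 10$, $S_{xy} = 6$ and $S_{yy} = 6$, so the fitted line is $y = 2.2 + 0.6x$ with $r^2 = 0.6$.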

Goodness of Fit Test

Null hypothesis: The sample follows the specified distribution.

Test statistic:

\[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, \]

$O_i$ = observed frequency, $E_i$ = expected frequency asserted by $H_0$. The critical value is $\chi^2_{\text{critical}} = \chi^2_{\alpha,\ k-p-1}$.
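The test statistic is simple to compute once the expected frequencies are known. The sketch below uses a made-up example of a die rolled 60 times and tested against a fair die. It assumes the usual reading of the degrees of freedom, with $k$ the number of categories and $p$ the number of parameters estimated from the data; the sheet itself does not define these symbols.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e)**2 / e for o, e in zip(observed, expected))

# Made-up counts for 60 rolls; a fair die gives expected frequency 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
chi_sq = chi_square_statistic(observed, expected)

# Assumed reading: k = 6 categories, p = 0 estimated parameters.
degrees_of_freedom = len(observed) - 0 - 1
print(chi_sq, degrees_of_freedom)
```

The statistic is then compared with the tabulated critical value at the chosen $\alpha$ and these degrees of freedom; $H_0$ is rejected when the statistic exceeds it.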
