8/4/2019 M1905 Topic 2A Probability
Monday, 15 August 2011

Lecture 7 - Content
- Sets
- Probability and counting
- Conditional probability

[Lecturer's note: many students are not familiar with set theory, so spend some extra time explaining the various operations carefully. If time is left, discuss the cardinality of N, Q and R.]
Statistics (Advanced): Lecture 7 1
Sets

Before we look at probability it is necessary to understand sets, because probabilities are typically described in terms of sets on which an event occurs.

Definition 1. The set of all possible outcomes of an experiment is called a sample space, denoted by Ω. Any subset A of the sample space, denoted A ⊆ Ω, is called an event.

Definition 2. The counting operator N(A) is a set function that counts how many elements belong to the set (event) A.

Example (Sample spaces).
- Coin: Ω = {H, T}, N(Ω) = 2.
- Dice: Ω = {1, 2, 3, 4, 5, 6}, N({1, 2, 5}) = 3.
- Weight: Ω = R⁺, N(R⁺) = ∞. (We want to count objects for any event.)
Set Notation

Before we introduce probability we need to introduce some notation. Let A, B ⊆ Ω.

symbol          set theory                probability
Ω               largest set               certain event
∅               empty set                 impossible event
A ∪ B           union of A and B          event A or event B
A ∩ B           intersection of A and B   event A and event B
A^c = Ω \ A     complement of A           not event A
Intersection Operator

The set A ∩ B denotes the set such that if C ∈ A ∩ B then C ∈ A and C ∈ B (∩ is called the intersection operator).
Intersection Operator

Examples:
- {1, 2} ∩ {red, white} = ∅.
- {1, 2, green} ∩ {red, white, green} = {green}.
- {1, 2} ∩ {1, 2} = {1, 2}.

Some basic properties of intersections:
- A ∩ B = B ∩ A.
- A ∩ (B ∩ C) = (A ∩ B) ∩ C.
- A ∩ B ⊆ A.
- A ∩ A = A.
- A ∩ ∅ = ∅.
- A ⊆ B if and only if A ∩ B = A.
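These properties can be checked directly with a built-in set type. The course's software is R; the following Python sketch is used here only as an independent illustrative check on small finite sets:

```python
# Verify the basic intersection properties on small finite sets.
A = {1, 2, "green"}
B = {"red", "white", "green"}
C = {"green", "blue"}

assert A & B == {"green"}                 # {1,2,green} ∩ {red,white,green}
assert A & B == B & A                     # commutativity
assert A & (B & C) == (A & B) & C         # associativity
assert (A & B) <= A                       # A ∩ B is a subset of A
assert A & A == A                         # A ∩ A = A
assert A & set() == set()                 # A ∩ ∅ = ∅
assert ({1, 2} <= A) == ({1, 2} & A == {1, 2})  # A ⊆ B iff A ∩ B = A
```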
Union Operator

The set A ∪ B denotes the set such that if C ∈ A ∪ B then C ∈ A and/or C ∈ B (∪ is called the union operator).
Union Operator

Examples:
- {1, 2} ∪ {red, white} = {1, 2, red, white}.
- {1, 2, green} ∪ {red, white, green} = {1, 2, red, white, green}.
- {1, 2} ∪ {1, 2} = {1, 2}.

Some basic properties of unions:
- A ∪ B = B ∪ A.
- A ∪ (B ∪ C) = (A ∪ B) ∪ C.
- A ⊆ (A ∪ B).
- A ∪ A = A.
- A ∪ ∅ = A.
- A ⊆ B if and only if A ∪ B = B.
Set Minus

The set A \ B denotes the set such that if C ∈ A \ B then C ∈ A and C ∉ B.
Set Complement

The set A^c = Ω \ A denotes the set such that if C ∈ A^c then C ∉ A.
Set Minus

Examples:
- {1, 2} \ {red, white} = {1, 2}.
- {1, 2, green} \ {red, white, green} = {1, 2}.
- {1, 2} \ {1, 2} = ∅.
- {1, 2, 3, 4} \ {1, 3} = {2, 4}.

Some basic properties of complements:
- A \ B = A ∩ B^c (note that in general A \ B ≠ B \ A).
- A ∪ A^c = Ω.
- A ∩ A^c = ∅.
- (A^c)^c = A.
- A \ A = ∅.
Theorem 1. The complement of the union of A and B equals the intersection of the complements:

(A ∪ B)^c = A^c ∩ B^c.

Proof. Use Venn diagrams for the LHS and RHS and colour the areas.

Theorem 2 (De Morgan's Laws).

(∪_{i=1}^n A_i)^c = ∩_{i=1}^n A_i^c   and   (∩_{i=1}^n A_i)^c = ∪_{i=1}^n A_i^c.
Counting: Ordered Sampling without Replacement

Example (Ordered samples without replacement). The number of ordered samples of size r we can draw without replacement from n objects is

n(n − 1) · · · (n − r + 1) = n!/(n − r)!.

Recall: 0! = 1.
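A quick numeric sanity check of this formula (a Python sketch with hypothetical values n = 10 and r = 3; the slides otherwise use R):

```python
from math import factorial

n, r = 10, 3
# Build the product n(n-1)...(n-r+1) term by term.
ordered = 1
for k in range(n, n - r, -1):
    ordered *= k
# It must agree with n!/(n-r)!.
assert ordered == factorial(n) // factorial(n - r) == 720  # 10*9*8
```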
Counting: Unordered Sampling without Replacement

Example (Unordered samples without replacement).

nCr = C(n, r) = n!/(r!(n − r)!) = "n choose r".

Recall that nCr = nC(n−r), since

C(n, n − r) = n!/((n − r)!(n − (n − r))!) = n!/((n − r)! r!),

and so

C(n, 0) = C(n, n) = 1.
Sampling without Replacement (cont.)

Example. Consider {A, B, C} and select r = 2 items:

order important:     (A,B) (A,C) (B,C) (B,A) (C,A) (C,B)   →   3 · 2 = 6 samples
order not important: {A,B} {A,C} {B,C}                     →   C(3,2) = 3 samples
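The two counts above can be reproduced by exhaustive enumeration (a Python sketch; itertools plays the role of hand-listing the samples):

```python
from itertools import combinations, permutations

items = ["A", "B", "C"]
ordered = list(permutations(items, 2))    # order important: 3 * 2 samples
unordered = list(combinations(items, 2))  # order not important: C(3,2) samples
assert len(ordered) == 6
assert len(unordered) == 3
```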
Sampling in R

# Creating ordered lists
n = 158; x = 1:n
set.seed(6)         # set random seed to 6 to reproduce results
sample(x)           # random permutation of 1,2,...,158: n! possibilities
sample(x, 10)       # choose 10 numbers without replacement
sample(x, 10, TRUE) # choose 10 numbers with replacement = bootstrap sampling
What is Probability?

1. Subjective probability expresses the strength of one's belief (the basis of Bayesian statistics; a bit more on that later).

2. The classical probability concept gives a mathematical answer for equally likely outcomes.

Theorem 3. If there are n equally likely possibilities, of which one must occur and s are regarded as favourable (= successes), then the probability P of a success is given by s/n.
What is Probability?

3. The frequency interpretation of probability:

Theorem 4. The probability of an event (or outcome) is the proportion of times the event occurs in a long run of repeated experiments.

Or in words: if an experiment is repeated n times under identical conditions, and the event A occurs m times, then as n becomes large (i.e. in the long run) the probability of A occurring is the ratio m/n.
What is Probability?

- The constancy of the gender ratio at birth: in Australia, the proportion of male births is fairly stable at 0.51. This long-run relative frequency is used to estimate the probability that a randomly chosen birth is male.
- Cancer council records show the age-standardised mortality rate from breast cancer in NSW was close to 20 per 100,000 over the years 1972-2000. For a randomly chosen woman, we use 0.0002 as the probability of dying from breast cancer.

Example (Coin tossing).
Buffon (1707-1788): n = 4,040, P({H}) = 50.7%.
Pearson (1857-1936): n = 24,000, P({H}) = 50.05%.

Coin tossing in R:

table(sample(c("H","T"),4040,T))/4040
table(sample(c("H","T"),24000,T))/24000
Coin Tossing in the 2010s

In the 2010s, Stanford professor Persi Diaconis developed the "Coin Tosser 3000". However, the machine is designed to flip a coin with the same result every time!
What is Probability?

4. Mathematical formulation of probability:

Definition 3 (due to Andrey Kolmogorov, 1933). Given a sample space Ω and an event A ⊆ Ω, we define P(A), the probability of A, to be a value of a non-negative additive set function that satisfies the following axioms:

A1: For any event A, 0 ≤ P(A) ≤ 1.
A2: P(Ω) = 1.
A3: If A and B are mutually exclusive events (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B).
A3': More generally, if A1, A2, A3, . . . is a finite or infinite sequence of mutually exclusive events in Ω, then P(A1 ∪ A2 ∪ A3 ∪ . . .) = P(A1) + P(A2) + P(A3) + . . . .
Example (Lotto). A lotto-type barrel contains 10 balls numbered 1, 2, . . . , 10. Three balls are drawn.

i. How many distinct samples can be drawn?

n = 10C3 = C(10, 3) = (10 · 9 · 8)/(1 · 2 · 3) = 120.

ii. Event A = (all drawn numbers are in {1, 2, . . . , 7}), i.e. all numbers at most seven:

C(7, 3) = (7 · 6 · 5)/(1 · 2 · 3) = 35 successes, so P(A) = 35/120 = 7/24.

iii. B = (all drawn numbers are even): P(B) = C(5, 3)/120 = 10/120 = 1/12.

A ∩ B = {(2, 4, 6)}, so P(A ∩ B) = 1/120.

iv. P(A ∪ B)? To answer this we need our next theorem.
Addition Theorem

Theorem 5 (Addition Theorem). If A and B are any events in Ω, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof. Use Venn diagrams, i.e. draw pictures and colour regions.

Alternatively, use the axioms only. First note that, by A3,

P(A) = P((A \ (A ∩ B)) ∪ (A ∩ B)) = P(A \ (A ∩ B)) + P(A ∩ B).   (1)

Similarly, P(B) = P(B \ (A ∩ B)) + P(A ∩ B). Next,

P(A ∪ B) = P((A \ (A ∩ B)) ∪ (B \ (A ∩ B)) ∪ (A ∩ B))
         = P(A \ (A ∩ B)) + P(B \ (A ∩ B)) + P(A ∩ B)     (by A3)
         = [P(A \ (A ∩ B)) + P(A ∩ B)] + [P(B \ (A ∩ B)) + P(A ∩ B)] − P(A ∩ B)
         = P(A) + P(B) − P(A ∩ B),

which follows from result (1).
Example (Lotto, cont.). A lotto-type barrel contains 10 balls numbered 1, 2, . . . , 10. Three balls are drawn.

i. How many distinct samples can be drawn? 120.
ii. Event A = (all drawn numbers are at most seven): P(A) = 7/24.
iii. B = (all drawn numbers are even): P(B) = 1/12. Also P(A ∩ B) = 1/120.
iv. P(A ∪ B)?

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 7/24 + 1/12 − 1/120 = 44/120 = 11/30.
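With only 120 possible draws, both the individual probabilities and the addition theorem can be verified by brute-force enumeration (a Python sketch; the course otherwise uses R):

```python
from itertools import combinations
from fractions import Fraction

draws = list(combinations(range(1, 11), 3))   # all unordered draws of 3 balls
assert len(draws) == 120

in_A = [d for d in draws if max(d) <= 7]                 # all numbers at most 7
in_B = [d for d in draws if all(x % 2 == 0 for x in d)]  # all numbers even

def prob(event):
    return Fraction(len(event), len(draws))

P_A, P_B = prob(in_A), prob(in_B)
P_AB = prob([d for d in draws if d in in_A and d in in_B])
P_AorB = prob([d for d in draws if d in in_A or d in in_B])

assert P_A == Fraction(7, 24) and P_B == Fraction(1, 12)
assert P_AB == Fraction(1, 120)
assert P_AorB == P_A + P_B - P_AB    # the addition theorem, exactly 11/30
```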
Poincaré's Theorem

Theorem 6 (Poincaré's formula, not part of M1905). Let A1, A2, . . . , An be any events in Ω. Then

P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − . . . + (−1)^{n+1} P(A_1 ∩ A_2 ∩ . . . ∩ A_n).
(Unconditional) Probability

- Recall the three axioms of probability.
- P(A^c) = 1 − P(A), since A ∪ A^c = Ω and hence 1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c).
- P(∅) = 0, because ∅ = Ω^c, hence P(∅) = 1 − P(Ω) = 0.
- etc.
Conditional Probability: Motivating Example

Consider the following (fictional) table of sports mortality rates compiled over the last decade:

SPORT          Description                                            DEATHS
Chess          Board game considered the national sport of Russia          0
Boxing         Barbaric sport where two people hit each other              5
Chess Boxing   5 minutes of chess followed by 2 minutes of boxing          0
Sky Diving     Jumping out of a plane with a parachute                    10
Lawn Bowls     Rolling a ball across grass to hit other balls           1000
Hence, Lawn Bowls is the most dangerous sport by far!
Conditional Probability: Motivating Example

However, the number of deaths given that the sportsperson is young is zero, so that

P(Dying from Lawn Bowls | sportsperson is young) ≈ 0,

even though P(Dying from Lawn Bowls) is large.
Conditional Probability: Another Motivating Example

What is the probability of the important event

A = (starting salary after uni ≥ 60k)?

What is the sample space Ω? Possibilities are:

Ω1 = {all students},
Ω2 = {all male students},
Ω3 = {all students with a maths degree}.
Conclusion

- Probability depends on the underlying sample space Ω!
- Hence, if it is unclear which sample space A refers to, then make it clear by writing P(A|Ω) instead of P(A), which we read as "the conditional probability of A relative to Ω" or "given Ω", respectively.

Definition 4. If A and B are any events in Ω and P(B) ≠ 0, then the conditional probability of A given B is

P(A|B) = P(A ∩ B)/P(B).
Additional Material for Lecture 7

A combinatorial proof of the binomial theorem

The binomial theorem says

(x + y)^n = Σ_{k=0}^n C(n, k) x^k y^(n−k).

Consider the more complicated product

(x1 + y1)(x2 + y2) · · · (xn + yn).

Its expansion consists of the sum of 2^n terms, each term being the product of n factors. Each term contains either xk or yk for each k = 1, . . . , n. For example,

(x1 + y1)(x2 + y2) = x1x2 + x1y2 + y1x2 + y1y2.

Now, there is 1 = C(n, 0) term with y factors only, there are n = C(n, 1) terms with one x factor and (n − 1) y factors, etc. In general, there are C(n, k) terms with exactly k x's and (n − k) y's. The theorem follows by setting xk = x and yk = y for all k.
More on Set Theory

The operations of forming unions, intersections and complements of events obey rules similar to the rules of algebra. The following hold for events A, B and C:

Commutative law: A ∪ B = B ∪ A and A ∩ B = B ∩ A.
Associative law: (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C).
Distributive law: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).
Tuesday, 16 August 2011

Lecture 8 - Content
- Conditional probability
- Bayes' rule
- Integer-valued random variables

Conditional probability equation:

P(A|Ω) = P(A ∩ Ω)/P(Ω) = P(A);   for P(B) > 0: P(A|B) = P(A ∩ B)/P(B).
Conditional Probability (cont.)

Example (Defective machine parts). Suppose that 500 machine parts are inspected before they are shipped. Let

- I = (a machine part is improperly assembled),
- D = (a machine part contains one or more defective components),

with N(Ω) = 500, N(I) = 30, N(D) = 15, N(I ∩ D) = 10.
Example (cont.)

Assumption: each machine part is equally likely to be selected. Using the classical concept of probability we get:

P(D) = P(D|Ω) = N(D)/N(Ω) = 15/500 = 3/100,

P(D|I) = N(D ∩ I)/N(I) = 10/30 = 1/3 > 3/100.

Note that if N(Ω) > 0, then

N(D ∩ I)/N(I) = [N(D ∩ I) · (1/N(Ω))] / [N(I) · (1/N(Ω))] = P(D ∩ I)/P(I).
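The counts-to-probabilities calculation above can be mirrored with exact rational arithmetic (a Python sketch using the counts from the example):

```python
from fractions import Fraction

N_omega, N_I, N_D, N_ID = 500, 30, 15, 10   # counts from the example

P_D = Fraction(N_D, N_omega)                # P(D) = P(D|Omega) = 3/100
P_D_given_I = Fraction(N_ID, N_I)           # P(D|I) = N(D ∩ I)/N(I) = 1/3

# The same value via probabilities: P(D ∩ I)/P(I), as in the note above.
assert P_D_given_I == Fraction(N_ID, N_omega) / Fraction(N_I, N_omega)
assert P_D == Fraction(3, 100)
assert P_D_given_I == Fraction(1, 3)
```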
General Multiplication Rule of Probability

Theorem 7 (General multiplication rule of probability). If A and B are any events in Ω, then

P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0; interchanging A and B yields
P(A ∩ B) = P(A) · P(B|A), if P(A) ≠ 0.

Proof. This holds because P(A|B) := P(A ∩ B)/P(B), etc.

What happens if P(A|B) = P(A)? The additional information of B is of no use, and we obtain the special multiplication rule:

P(A ∩ B) = P(A) · P(B).
Definition of Independence of Events

Definition 5. If A and B are any two events in a sample space Ω, we say that A is independent of B if and only if P(A|B) = P(A).

From the general multiplication rule it follows that if P(A|B) = P(A) then P(B|A) = P(B), and we say simply that A and B are independent.
Alternative View of Independence

Alternatively, if A and B are independent then P(A ∩ B) = P(A) · P(B), and hence

P(B|A) = P(A ∩ B)/P(A)        (definition of conditional probability)
       = P(A) · P(B)/P(A)     (using independence)
       = P(B),

which can also be interpreted as saying that knowing A does not affect the probability of B.
Independence

In other words, the events A and B are independent if the chance that one happens remains the same regardless of how the other turns out.

Example. Suppose that we toss a fair coin twice. Let A = {heads on the first toss} and B = {heads on the second toss}. Now suppose A occurred. Then

P(B knowing A has happened) = 1/2 = P(B).
Independence Example 2

Example. Consider six boxes, 1 2 3 1 2 3, where the second set of three boxes is green. Suppose we select a box at random and, as it is drawn, you see that it is green. Then

P(A = {getting a 2}) = 2/6 = 1/3,
P(B = {getting a 2 given it is green}) = 1/3.

Knowing the selected box is green has not changed our knowledge about which numbers might be drawn. Hence, the events A and B are independent.
Independence Example 3

Example. Consider six boxes, 1 1 2 1 2 2, where the first set of three boxes (1, 1, 2) is green. Suppose we select a box at random and, as it is drawn, you see that it is green. Then

P(A = {getting a 2}) = 3/6 = 1/2,
P(B = {getting a 2 given it is green}) = 1/3.

Knowing the selected box is green HAS CHANGED our knowledge about which numbers might be drawn. Hence, the events A and B are NOT independent.
Independence Example 4

Example. Two cards are drawn at random from an ordinary deck of 52 playing cards. What is the probability of getting two aces if

(a) the first card is replaced before the second is drawn?
(Solution: 4/52 · 4/52 = 1/169, since here P(A1 ∩ A2) = P(A1) · P(A2).)

(b) the first card is not replaced before the second card is drawn?
(Solution: 4/52 · 3/51 = 1/221, but unlike above P(A2|A1) ≠ P(A2).)

Independence is violated when the sampling is without replacement.
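The two card probabilities can be checked with exact fractions (a Python sketch):

```python
from fractions import Fraction

# (a) With replacement: the two draws are independent.
p_with = Fraction(4, 52) * Fraction(4, 52)
# (b) Without replacement: P(A2 | A1) = 3/51 differs from P(A2) = 4/52.
p_without = Fraction(4, 52) * Fraction(3, 51)

assert p_with == Fraction(1, 169)
assert p_without == Fraction(1, 221)
assert p_without < p_with   # removing an ace makes a second ace less likely
```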
Independence Example 5

Medical records indicate that the proportion of children who have had measles by the age of 8 is 0.4. The corresponding proportion for chicken pox is 0.5. The proportion who have had both diseases by the age of 8 is 0.3. An infant is randomly selected. Let A represent the event that he contracts measles, and B that he contracts chicken pox, by the age of 8 years.

- Estimate P(A), P(B) and P(A ∩ B):
  P(A) = 0.4, P(B) = 0.5 and P(A ∩ B) = 0.3.
- Are A and B independent?
  P(A) · P(B) = 0.2 ≠ P(A ∩ B) = 0.3, so NO, A and B are not independent.
Bayes' Rule

Example (The burgers are better...). Assume you get your burgers

- 60% from supplier B1 [HJ],
- 30% from supplier B2 [McD],
- 10% from supplier B3 [RR],

so P(B1) = 0.6, P(B2) = 0.3, and P(B3) = 0.1. We are interested in the event A = (good burger).

[Draw a picture showing that B1, B2 and B3 are mutually exclusive events with B1 ∪ B2 ∪ B3 = Ω.]
Example (cont.)

It follows that

A = A ∩ (B1 ∪ B2 ∪ B3) = (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3).

Note that (A ∩ B1), (A ∩ B2) and (A ∩ B3) are mutually exclusive. By Axiom 3 we get

P(A) = P((A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3)) = P(A ∩ B1) + P(A ∩ B2) + P(A ∩ B3).

Remember the general multiplication rule: we already know that

P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0,
         = P(A) · P(B|A), if P(A) ≠ 0.
Example (cont.)

So we can write

P(A) = P(B1) · P(A|B1) + P(B2) · P(A|B2) + P(B3) · P(A|B3)
     = 0.6 · 0.95 (very good) + 0.3 · 0.80 (sufficient) + 0.1 · 0.65 (insufficient)
     = 0.875.

[The above probabilities 0.95, 0.80 and 0.65 are from personal experience, i.e. subjective probability.]

What did the example teach us? Strategy: decompose complicated events into mutually exclusive simple(r) events!
Total Probability Rule

Theorem 8 (Total probability rule). If B1, B2, . . . , Bn are mutually exclusive events such that B1 ∪ B2 ∪ . . . ∪ Bn = Ω, then for any event A ⊆ Ω,

P(A) = Σ_{i=1}^n P(Bi) · P(A|Bi).

Example (Burgers, cont.). We know already that supplier B3 is bad. So what is P(B3|A) (if a burger is good, is it from B3)? By the definition of conditional probability, since P(A) > 0,

P(B3|A) = P(A ∩ B3)/P(A) = P(B3) · P(A|B3) / Σ_{i=1}^3 P(Bi) · P(A|Bi) = (0.1 · 0.65)/0.875 = 0.074.

After we know that a burger is good, the probability that it comes from B3 decreases from 0.1 to 0.074.
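The total probability rule and the posterior update above can be reproduced numerically (a Python sketch with the subjective probabilities from the slide):

```python
priors = {"B1": 0.6, "B2": 0.3, "B3": 0.1}     # supplier probabilities P(Bi)
p_good = {"B1": 0.95, "B2": 0.80, "B3": 0.65}  # subjective P(A|Bi) from the slide

# Total probability rule: P(A) = sum of P(Bi) * P(A|Bi)
P_A = sum(priors[b] * p_good[b] for b in priors)
# Bayes' rule for the posterior P(B3|A)
post_B3 = priors["B3"] * p_good["B3"] / P_A

assert abs(P_A - 0.875) < 1e-12
assert round(post_B3, 3) == 0.074   # down from the prior 0.1
```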
Bayes' Rule or Theorem

What we just derived is the famous formula called Bayes' rule or theorem.

Theorem 9 (Bayes' Rule). If B1, B2, . . . , Bn are mutually exclusive events such that B1 ∪ B2 ∪ . . . ∪ Bn = Ω, then for any event A ⊆ Ω,

P(Bj|A) = P(A|Bj) · P(Bj) / Σ_{i=1}^n P(A|Bi) · P(Bi).

The probabilities P(Bi) are called the prior probabilities and the probabilities P(Bi|A) the posterior probabilities, i = 1, . . . , n.
Reverend Thomas Bayes (1701-1761)

- Born in Hertfordshire (London, England),
- was a Presbyterian minister,
- studied theology and mathematics,
- best known for "An Essay Towards Solving a Problem in the Doctrine of Chances",
- where Bayes' Theorem was first proposed.
- Words: Bayes' rule, Bayes' Theorem, Bayesian statistics.
Example of Bayes' Rule: Screening Test for Tuberculosis

                        TB (D+)   No TB (D−)   Total
X-ray positive (S+)          22           51      73
X-ray negative (S−)           8         1739    1747
Total                        30         1790    1820

What is the probability that a randomly selected individual has tuberculosis given that his or her X-ray is positive, given that P(D+) = 0.000093?

- P(D+) = 0.000093, which implies that P(D−) = 0.999907.
- P(S+|D+) = 22/30 = 0.7333.
- P(S+|D−) = 51/1790 = 0.0285.

P(D+|S+) = P(S+|D+)P(D+) / [P(S+|D+)P(D+) + P(S+|D−)P(D−)]
         = (0.7333 · 0.000093)/(0.7333 · 0.000093 + 0.0285 · 0.999907) = 0.00239.
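The screening calculation is easy to mirror in code (a Python sketch using the table counts and the given prevalence):

```python
p_D = 0.000093          # prevalence P(D+), as given
sens = 22 / 30          # P(S+|D+), estimated from the table
fpr = 51 / 1790         # P(S+|D-), estimated from the table

p_pos = sens * p_D + fpr * (1 - p_D)   # total probability of a positive X-ray
post = sens * p_D / p_pos              # Bayes' rule: P(D+|S+)

assert round(post, 5) == 0.00239       # a positive X-ray is still weak evidence
```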
Integer-Valued Random Variables

Many observed numbers are the random result of many possible numbers.

Definition 6. A random variable X is a real-valued function of the elements of a sample space Ω.

Note that such functions are denoted with capital letters and their images (outcomes) with lower-case letters, e.g. x.

Examples:
- How many times (X) will you be caught speeding?
- What will your final mark (Y) for MATH1905 be?
- How old (Z, in years) do you think your stats lecturer is?
Random Variable Example: 3 Coins

Consider tossing three coins. The number of heads showing when the coins land is a random variable: it assigns the number 0 to the outcome (T, T, T), the number 1 to outcomes such as (T, T, H), the number 2 to outcomes such as (T, H, H), and the number 3 to the outcome (H, H, H).
Random Variable Example: 3 Coins

Let X = number of heads.

Outcomes           Value of X    Probability
TTT                X = 0         P(X = 0) = 1/8
TTH, THT, HTT      X = 1         P(X = 1) = 3/8
THH, HTH, HHT      X = 2         P(X = 2) = 3/8
HHH                X = 3         P(X = 3) = 1/8
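The table can be generated by enumerating the 8 equally likely outcomes (a Python sketch):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

outcomes = list(product("HT", repeat=3))          # the 8 equally likely outcomes
heads = Counter(o.count("H") for o in outcomes)   # X = number of heads

pmf = {x: Fraction(c, len(outcomes)) for x, c in heads.items()}
assert pmf[0] * 8 == 1 and pmf[3] * 8 == 1        # P(X=0) = P(X=3) = 1/8
assert pmf[1] * 8 == 3 and pmf[2] * 8 == 3        # P(X=1) = P(X=2) = 3/8
```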
Random Variable Notation: 3 Coins

We use upper-case letters to denote unobserved random variables, say X, and lower-case letters for their observed values, in this case x. For example, before the three coins land we denote the number of heads by X; after the coins have landed we denote the observed number of heads by x, so that we can write P(X = x).
The Mother of All Examples: Bernoulli Trials!

Definition 7. Bernoulli trials satisfy the following assumptions:
(i) there are only two possible outcomes for each trial,
(ii) the probability of success is the same for each trial,
(iii) the outcomes from different trials are independent,
(iv) there is a fixed number n of Bernoulli trials conducted.

Example (n = 1, coin). Ω: Head or Tail. We can describe the trial (before flipping the coin) in full detail. Consider a function

X : {H, T} → {0, 1}   s.t.   X(H) = xH = 1 and X(T) = xT = 0.

What is the probability that X = xH = 1?

P(X = 1) = P(X = xH) = P(H) = p = 1/2, and P(X = 0) = 1/2.
Jacob Bernoulli (1654-1705)

- Born in Basel (Switzerland),
- one of 8 mathematicians in his family,
- studied theology, then mathematics and astronomy,
- best known for "Ars Conjectandi" (The Art of Conjecture),
- applied probability theory to games of chance and introduced the law of large numbers.
- Words: Bernoulli trial, Bernoulli numbers.
Monday, 22 August 2011

Lecture 9 - Content
- Distribution of a random variable
- Binomial distribution
- Mean of a distribution
Revised Axioms of Probability

In Lecture 7 we used the following definition of probability.

Definition 8 (due to Andrey Kolmogorov, 1933). Given a sample space Ω and an event A ⊆ Ω, we define P(A), the probability of A, to be a value of a non-negative additive set function that satisfies the following three axioms:

A1: For any event A, 0 ≤ P(A) ≤ 1.
A2: P(Ω) = 1.
A3: If A1, A2, A3, . . . is a finite or infinite sequence of mutually exclusive events in Ω, then P(A1 ∪ A2 ∪ A3 ∪ . . .) = P(A1) + P(A2) + P(A3) + . . . .

However, nothing is lost if we replace A1: 0 ≤ P(A) ≤ 1 with A1: 0 ≤ P(A).
Proof by Contradiction

Assume the following three axioms:

A1: For any event A ⊆ Ω, 0 ≤ P(A).
A2: P(Ω) = 1.
A3: If A1, A2, A3, . . . is a finite or infinite sequence of mutually exclusive events in Ω, then P(A1 ∪ A2 ∪ A3 ∪ . . .) = P(A1) + P(A2) + P(A3) + . . . .

Now assume additionally that A4: for some event A ⊆ Ω, P(A) > 1. Then:
- 1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c).
- Rearranging, we have P(A^c) = 1 − P(A).
- By A4 we have P(A^c) < 0. This contradicts A1, hence A4 cannot hold; that is, P(A) ≤ 1 already follows from A1-A3.
Revised Axioms of Probability

Definition 9 (due to Andrey Kolmogorov, 1933). Given a sample space Ω and an event A ⊆ Ω, we define P(A), the probability of A, to be a value of a non-negative additive set function that satisfies the following three axioms:

A1: For any event A, 0 ≤ P(A).
A2: P(Ω) = 1.
A3: If A1, A2, A3, . . . is a finite or infinite sequence of mutually exclusive events in Ω, then P(A1 ∪ A2 ∪ A3 ∪ . . .) = P(A1) + P(A2) + P(A3) + . . . .

This is the minimal set of axioms needed to define probability.

Random Variables Reminder

n = 1 (Coin): Ω = {H, T}, X : Ω → {0, 1} ⊂ R. Thus X(H) = xH = 1 and X(T) = xT = 0, so P(X = 1) = P(H) = p.
Distribution of a Random Variable

Definition 10. The probability distribution of an integer-valued random variable X is a list of the possible values of X together with their probabilities

pi = P(X = i) ≥ 0   with   Σ_i pi = 1.

There is nothing special about the subscript i; we could and will equally well use j, k, x, etc.

Definition 11. The probability that the value of a random variable X is less than or equal to x, that is,

F(x) = P(X ≤ x),

is called the cumulative distribution function, or just the distribution function.

Also note that for integer-valued random variables,

P(X = x) = F(x) − F(x − 1).
Example (n = 3, IT problems). A network is fragile. By experience, P(F) = 0.1 = 1 − p is the probability that in any given week there is at least 1 major problem, and P(S) = 0.9 = p the probability that there is none. Out of 3 weeks, how many weeks, X, are problem-free, and with what probability?

(a) All possible outcomes:
FFF SFF FSF FFS SSF FSS SFS SSS

(b) What is the probability of each outcome? Use the special multiplication rule of probability, because the weeks are independent(!?).

(c) What is the probability distribution of the number of successes, X, among the 3 weeks?
Example (cont.)

P(X = 0) = P(FFF) = P(F) · P(F) · P(F) = (1 − p)³.

P(X = 1) = P(SFF ∪ FSF ∪ FFS)   (mutually exclusive events)
         = P(SFF) + P(FSF) + P(FFS)
         = 3(1 − p)²p = C(3, 1) p(1 − p)²,   select one S out of 3 trials.

Similarly, for X = 2 and X = 3:

P(X = 2) = C(3, 2) p²(1 − p),   select two S out of 3 trials,
P(X = 3) = C(3, 3) p³,          select three S out of 3 trials.
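With p = 0.9 as in the example, the four probabilities can be evaluated directly (a Python sketch; the course would use dbinom in R):

```python
from math import comb

p = 0.9   # success probability P(S) from the example
# P(X = i) = C(3, i) p^i (1-p)^(3-i) for i = 0, 1, 2, 3
pmf = [comb(3, i) * p**i * (1 - p) ** (3 - i) for i in range(4)]

assert [round(q, 3) for q in pmf] == [0.001, 0.027, 0.243, 0.729]
assert abs(sum(pmf) - 1.0) < 1e-12   # the probabilities sum to 1
```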
Binomial Distribution

We can generalise this result to any n ≥ 1 and success probability p ∈ [0, 1].

Definition 12. The probability distribution of the number of successes X = i in n ∈ N independent Bernoulli trials is called the binomial distribution,

pi = P(X = i) = C(n, i) p^i (1 − p)^(n−i),

where p is the success probability of a single Bernoulli trial and i = 0, 1, . . . , n.

To say that the random variable X has the binomial distribution with parameters n and p we write X ~ B(n, p). This defines a family of probability distributions, with each member characterised by a given value of the parameter p and the number of trials n.
Binomial Distribution

Since pi, 0 ≤ i ≤ n, is a probability distribution, we have the identity (which we will use later on)

Σ_{i=0}^n C(n, i) p^i (1 − p)^(n−i) = 1

for any 0 ≤ p ≤ 1.

A special case of the binomial distribution is the Bernoulli distribution, where n = 1 and

P(X = i) = p^i (1 − p)^(1−i),   i = 0, 1.

There is another special relationship between the Bernoulli distribution and the binomial distribution: if X1, . . . , Xn are independent with Xi ~ Bernoulli(p) and Y = Σ_{i=1}^n Xi, then Y ~ B(n, p).
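The sum-of-Bernoullis relationship can be checked numerically by building the distribution of the sum one trial at a time (a discrete convolution) and comparing it with the binomial formula. A Python sketch, using n = 9 and p = 1/6 purely as example values:

```python
from math import comb

def bernoulli_sum_pmf(n, p):
    """Distribution of Y = X1 + ... + Xn for independent Bernoulli(p) Xi,
    built by adding one trial at a time (a discrete convolution)."""
    pmf = [1.0]                        # empty sum: P(Y = 0) = 1
    for _ in range(n):
        nxt = [0.0] * (len(pmf) + 1)
        for y, q in enumerate(pmf):
            nxt[y] += q * (1 - p)      # the next trial is a failure
            nxt[y + 1] += q * p        # the next trial is a success
        pmf = nxt
    return pmf

n, p = 9, 1 / 6
by_sum = bernoulli_sum_pmf(n, p)
by_formula = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
assert all(abs(a - b) < 1e-12 for a, b in zip(by_sum, by_formula))
```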
Example (Dice). Roll a fair die 9 times. Let X be the number of sixes obtained. Then X ~ B(9, 1/6); that is,

pi = P(X = i) = C(9, i) (1/6)^i (5/6)^(9−i) = C(9, i) 5^(9−i)/6^9,   i = 0, 1, . . . , 9.
With your calculator or with R:

> n = 9
> p = 1/6
> round(dbinom(0:n,n,p),4) # dbinom for B(n,p) probabilities
 [1] 0.1938 0.3489 0.2791 0.1302 0.0391
 [6] 0.0078 0.0010 0.0001 0.0000 0.0000
> pbinom(1,n,p) # pbinom for B(n,p) cumulative probabilities
[1] 0.5426588

Hence, P(X = 4) = 0.0391 and P(X < 2) = F(1) = 0.5426588.
Shape of the Binomial Distribution

- We get a binomial distribution if
  1. we are counting something over a fixed number of trials or repetitions,
  2. the trials are independent, and
  3. the probability of the outcome of interest is constant across trials.
- The binomial distribution is centred at n · p,
- the closer p is to 1/2, the more symmetric the distribution/histogram,
- the larger n, the closer the shape to a bell (normal).
[Figure: bar charts of the probabilities for X ~ B(n = 10, p = 0.5), X ~ B(10, 0.1), X ~ B(10, 0.8) and X ~ B(10, 0.4).]
Example. In a small pond there are 50 fish, 20 of which have been tagged. Seven fish are caught and X represents the number of tagged fish in the catch. Assume each fish in the pond has the same chance of being caught. Is X binomial

(a) if each fish is returned before the next catch?

Yes, provided the fish do not learn from their experience, i.e. the probability of catching each fish stays the same for each of the 7 trials.

P(X = 1) = C(7, 1) (20/50)¹ (30/50)⁶ ≈ 0.131 = dbinom(1,7,0.4)
(b) if the fish are not returned once they are caught?
This situation cannot be modelled by a binomial as the proportion of tagged fish
changes at each trial.
If there were 5,000 fish, 2,000 of which had been tagged, then the change in the proportion would be negligible and we could model X with a binomial.

P(X = 1) = C(20, 1) C(30, 6) / C(50, 7)
         = choose(20,1)*choose(30,6)/choose(50,7) (in R)
         = 0.119 (to 3 d.p.)

P(X = 1) = C(2000, 1) C(3000, 6) / C(5000, 7)
         = choose(2000,1)*choose(3000,6)/choose(5000,7) (in R)
         = 0.131 (to 3 d.p.)
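Both the exact (hypergeometric) and approximate (binomial) answers can be reproduced from integer binomial coefficients; a sketch in Python rather than R (the helper name `hyper_pmf` is ours, not from the lecture):

```python
from math import comb

def hyper_pmf(k, tagged, total, draws):
    """Hypergeometric P(X = k): exactly k tagged fish in a catch of size `draws`."""
    return comb(tagged, k) * comb(total - tagged, draws - k) / comb(total, draws)

small_pond = hyper_pmf(1, 20, 50, 7)       # 50 fish, 20 tagged: 0.119 (3 d.p.)
large_pond = hyper_pmf(1, 2000, 5000, 7)   # 5000 fish, 2000 tagged: 0.131 (3 d.p.)
binom_approx = 7 * 0.4 * 0.6**6            # B(7, 0.4) pmf at k = 1: 0.131 (3 d.p.)
print(small_pond, large_pond, binom_approx)
```

With many fish, the hypergeometric and binomial answers agree to three decimal places, as the slide claims.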
Mean of a distribution

Definition 13. For a random variable X taking values 0, 1, 2, ... with

P(X = i) = p_i,  i = 0, 1, 2, ...,

the mean or expected value of X is defined to be

μ = E(X) = Σ_i i p_i.

Interpretation of E(X)
- Long-run average of observations of X, because p_i ≈ f_i/n.
- Centre of balance of the probability density (histogram). (draw picture)
- Measure of location of the distribution.

Definition 14. For any function g(X) we define the expected value E(g(X)) by

E(g(X)) = Σ_i g(i) p_i.
Expectation of a Dice Roll

Let X = {face showing from a dice roll}, where p_i = P(X = i) = 1/6 for i = 1, 2, ..., 6. Then

μ = E(X) = Σ_{i=1}^{6} i p_i = Σ_i i · 1/6 = 21/6 = 3.5.
Note: the expected value in this case is not one of the observed values.
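The defining sum can be evaluated directly; a minimal sketch in Python:

```python
# E(X) for a fair die: weighted sum of the outcomes, each with weight 1/6
faces = range(1, 7)
mu = sum(i * (1/6) for i in faces)
print(mu)  # 21/6 = 3.5 (up to float rounding)
```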
Mean of a distribution (cont)

Theorem 10. For constants a and b,

E(aX + b) = a E(X) + b.

Proof.

E(aX + b) = Σ_{all i} g(i) p_i, where g(i) = a i + b,
          = Σ_{all i} [(a i) p_i + b p_i]
          = a Σ_{all i} i p_i + b Σ_{all i} p_i
          = a E(X) + b.
Expectation of X ~ B(n, p)

Theorem 11. The expectation of X ~ B(n, p) is E(X) = np.

Proof.

E(X) = Σ_{i=0}^{n} i p_i = Σ_{i=0}^{n} i · n!/(i!(n-i)!) · p^i (1-p)^(n-i);  the i = 0 term vanishes, so change to i = 1, ..., n,
     = Σ_{i=1}^{n} i · n!/(i!(n-i)!) · p^i (1-p)^(n-i);  simplify,
     = Σ_{i=1}^{n} i · n(n-1)!/(i(i-1)!(n-i)!) · p^i (1-p)^(n-i),
     = np Σ_{i=1}^{n} (n-1)!/((i-1)!(n-i)!) · p^(i-1) (1-p)^(n-i);  sub. j = i-1, m = n-1.

Hence, E(X) = np Σ_{j=0}^{m} C(m, j) p^j (1-p)^(m-j) = np, since the sum runs over the probabilities of Y ~ B(m, p) and therefore equals 1.
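A quick numerical sanity check of the theorem, sketched in Python:

```python
from math import comb

# E(X) = sum of i * P(X = i) for X ~ B(9, 1/6) should equal n*p = 1.5
n, p = 9, 1/6
mean = sum(i * comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1))
print(mean, n * p)  # agree up to float rounding
```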
Example (Multiple choice section in M1905 exam is worth 35%).

20 questions, each with 5 possible answers. A student decides to answer the questions by selecting an answer at random.

(a) What is the expected number of correct responses? Let X denote the number of correct answers. Then X ~ B(20, 0.2), so the expected number of correct answers is np = 4.

(b) What is the probability that the student has more than 10 correct answers?

P(X > 10) = 1 − P(X ≤ 10) = 1 − 0.9994 = 0.0006, with 1-pbinom(10,20,0.2) in R.

(c) If the student scores 4 for a correct answer but −1 for a wrong response, what is his expected score?

E[4X + (−1)(20 − X)] = E(5X − 20) = 5 · 4 − 20 = 0.
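Parts (b) and (c) can be verified by brute-force summation over the pmf; a Python sketch (the helper name `binom_pmf` is ours):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.2
tail = sum(binom_pmf(k, n, p) for k in range(11, n + 1))   # P(X > 10)
# expected score: 4k for correct minus (20 - k) for wrong, i.e. 5k - 20
exp_score = sum((5 * k - 20) * binom_pmf(k, n, p) for k in range(n + 1))
print(round(tail, 4), exp_score)  # tail rounds to 0.0006; score is 0 up to rounding
```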
Tuesday, 23 August 2011
Lecture 10 - Content
- Variance of a distribution
- More integer-valued distributions
- Probability generating functions
Statistics (Advanced): Lecture 10 84
Expectation of a distribution: Reminders

The expectation of a distribution (or expectation of a random variable) is the mean of the probability distribution (a measure of distribution location). Note that

- E(X) = Σ_i i p_i = Σ_i i P(X = i) and
- E(g(X)) = Σ_i g(i) p_i = Σ_i g(i) P(X = i).
Variance of a distribution

Example. Suppose X (e.g. the number of shoes in a suitcase) takes the values 2, 4 and 6 with probabilities

i    2    4    6
p_i  0.1  0.3  0.6

Hence,

μ = E(X) = Σ_i i p_i = 2 · 0.1 + 4 · 0.3 + 6 · 0.6 = 5.
What is E(X²)?

Suppose X (e.g. the number of shoes in a suitcase) takes the values 2, 4 and 6 with probabilities

i    2    4    6
p_i  0.1  0.3  0.6

What is E(X²)?

Solution 1: by Definition 14, E(X²) = Σ_i g(i) p_i = Σ_i i² p_i = 26.8 ≠ 5² = (E X)².

Solution 2: map i → i² = j and X → X² = Y, and use E(Y) = Σ_j j p_j:

j    4    16   36
p_j  0.1  0.3  0.6

The distribution of Y can be hard to get (e.g. for continuous random variables).
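Both solution routes give the same number, which is easy to confirm; a Python sketch:

```python
xs = [2, 4, 6]
ps = [0.1, 0.3, 0.6]

# Solution 1: apply g(x) = x^2 inside the expectation sum
ex2_direct = sum(x**2 * p for x, p in zip(xs, ps))

# Solution 2: first build the distribution of Y = X^2, then take E(Y)
ys = [x**2 for x in xs]
ex2_via_y = sum(y * p for y, p in zip(ys, ps))

print(ex2_direct, ex2_via_y)  # both 26.8 (up to float rounding), not 5^2 = 25
```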
Definition 15. The variance of the random variable X is defined by

Var(X) = σ² = E(X − μ)² = E(X²) − μ²,

where μ = E(X). The variance σ² is a measure of spread.

This is like the large-sample limit of a sample variance.

The standard deviation of X is σ = √σ².
Variance of a Linear Transformation

Theorem 12. For any constants a and b,

Var(aX + b) = a² Var(X).

Proof.

Var(aX + b) = E[(aX + b)²] − (E[aX + b])²
            = E[a²X² + 2abX + b²] − (a E[X] + b)²
            = a² E[X²] + 2ab E[X] + b² − (a² E[X]² + 2ab E[X] + b²)
            = a² (E[X²] − E[X]²)
            = a² Var(X).
Example. If X ~ B(n, p) then we will show later that Var(X) = n p (1 − p).
- Hence, if p = 0 or 1 then the variance is 0.
- The variance is largest when p = 0.5, and in this case it is σ² = n/4.
More integer-valued distributions

Geometric distribution

The binomial random variable is just one possible integer-valued random variable. Suppose we have an infinite sequence of independent trials, each of which gives a success with probability p and failure with probability q = 1 − p.

Definition 16. The geometric distribution with parameter p (= success probability) has probabilities for the number of failures X before the first success,

p_i = P(X = i) = q^i p,  i = 0, 1, 2, ....

Note the probabilities add to 1:

p_0 + p_1 + ... = p + qp + q²p + ... = p(1 + q + q² + ...) = p · 1/(1 − q) = 1.

[By induction we can prove that 1 + q + ... + q^n = (1 − q^(n+1))/(1 − q).]
Example. A fair die is thrown repeatedly until it shows a six. Let X be the number of failures before the first six.

(a) What is the probability that more than 7 throws are required?

P(X > 7) = 1 − P(X ≤ 7) = 1 − Σ_{i=0}^{7} (5/6)^i · 1/6 = 0.233 (3 dp),

with 1-pgeom(7,1/6) or with 1-sum(dgeom(0:7,1/6)).

(b) Is it more likely that an odd number of throws is required or an even number?

Because 0 ≤ P(X = i) ≤ 1 and F(∞) = 1 we find

P(X even) − P(X odd) = Σ_{j=1}^{∞} P(X = 2(j−1)) − Σ_{k=1}^{∞} P(X = 2k−1)
                     = Σ_{j=1}^{∞} [P(X = 2(j−1)) − P(X = 2j−1)]
                     = Σ_{j=1}^{∞} (q^{2(j−1)} p − q^{2j−1} p)
                     = p Σ_{j=1}^{∞} (q^{2(j−1)} − q^{2j−1}) ≥ 0,

so an even number of failures X, that is, an odd number of throws, is more likely.
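Both parts can be checked numerically. Summing the even- and odd-index geometric series gives the closed forms P(X even) = p/(1 − q²) and P(X odd) = qp/(1 − q²); a Python sketch:

```python
p = 1/6
q = 1 - p

# (a) P(X > 7): the complement of the first eight geometric probabilities
tail = 1 - sum(q**i * p for i in range(8))   # telescopes to q**8

# (b) closed forms from summing the even- and odd-index geometric series
p_even = p / (1 - q**2)        # = 6/11 for p = 1/6
p_odd = q * p / (1 - q**2)     # = 5/11 for p = 1/6
print(round(tail, 3), p_even, p_odd)
```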
The Poisson approximation to the Binomial

The Poisson distribution often serves as a first theoretical model for counts which do not have a natural upper bound. Possible examples:

- modelling the number of accidents, crashes, breakdowns
- modelling radioactivity measured by a Geiger counter
- modelling so-called rare events (meteorite impacts, heart attacks)

Whether or not the Poisson distribution sufficiently describes count data is not answered at this early stage but postponed to later lectures in statistics.
The Poisson distribution can be seen as the limiting distribution of B(n, p): let n → ∞ while p → 0 and np → λ ∈ (0, ∞).

For X ~ B(n, p) we know that

P(X = k) = C(n, k) p^k (1 − p)^(n−k) = (∗) · (∗∗).

Then

(∗) = C(n, k) p^k = n(n−1)···(n−k+1)/k! · p^k ≈ n^k p^k / k! → λ^k / k!,

and

(∗∗) = (1 − p)^(n−k) = (1 − np/n)^(n−k) → e^(−λ).

Hence,

P(X = k) → e^(−λ) λ^k / k!,  for k = 0, 1, 2, ....
Approximation is good if n p² is small!

X ~ B(158, 1/365) and n p² = 0.001186:

> # What is the probability that of 158 people, exactly k have a birthday today?
> n = 158; p=1/365;
> round(dbinom(0:7,n,p),5);
[1] 0.64826 0.28139 0.06068 0.00867 0.00092 0.00008 0.00001 0.00000
> round(dpois(0:7,p*n),5);
[1] 0.64864 0.28078 0.06077 0.00877 0.00095 0.00008 0.00001 0.00000

But for n = 10, where n p² = 10/9 is not small:

> n = 10; p=1/3;
> round(dbinom(0:4,n,p),5);
[1] 0.01734 0.08671 0.19509 0.26012 0.22761
> round(dpois(0:4,p*n),5);
[1] 0.03567 0.11891 0.19819 0.22021 0.18351
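The same comparison can be sketched in Python with only the standard library, measuring the worst-case gap between the two pmfs:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pois_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# n*p^2 small: the Poisson approximation is close
n, p = 158, 1/365
gap_small = max(abs(binom_pmf(k, n, p) - pois_pmf(k, n * p)) for k in range(8))

# n*p^2 large: the approximation is visibly off
n, p = 10, 1/3
gap_large = max(abs(binom_pmf(k, n, p) - pois_pmf(k, n * p)) for k in range(8))
print(gap_small, gap_large)  # roughly 6e-4 versus 4e-2
```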
Probability distribution for X ~ B(n, p) and X ~ P(λ)

[Four bar plots: probabilities for X ~ B(158, 1/365), X ~ Poi(158/365), X ~ B(10, 1/3) and X ~ Poi(10/3).]
Probability generating functions

Let X ∈ ℕ and p_i = P(X = i), i = 0, 1, 2, ....

Definition 17. The probability generating function is defined as

Π(s) = p_0 + p_1 s + p_2 s² + p_3 s³ + ....

Example. If X only takes a finite number of values (e.g. X ~ B(n, p)) then Π(s) is a polynomial. Alternatively (e.g. X ~ P(λ)), Π(s) is a power series.
Properties of Π(s)

Let s ∈ [0, 1]. Then
- 0 ≤ Π(s) ≤ 1,
- Π(1) = p_0 + p_1 + ... = 1,
- Π′(s) = p_1 + 2 p_2 s + 3 p_3 s² + ... ≥ 0 for s ≥ 0,
- Π′(1) = p_1 + 2 p_2 + 3 p_3 + ... = E(X) (if E(X) is finite),
- Π″(s) = 2 p_2 + 6 p_3 s + 4 · 3 p_4 s² + ..., so Π″(1) = E(X(X − 1)) and
  Var(X) = E(X²) − (E X)² = Π″(1) + Π′(1) − (Π′(1))².
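These identities can be checked numerically on the three-point distribution from the earlier variance example (values 2, 4, 6 with probabilities 0.1, 0.3, 0.6); a Python sketch:

```python
# pgf derivatives evaluated at s = 1 for a finite distribution
dist = {2: 0.1, 4: 0.3, 6: 0.6}

def pgf_d1(s):
    """First derivative: sum of i * p_i * s^(i-1)."""
    return sum(i * p * s**(i - 1) for i, p in dist.items())

def pgf_d2(s):
    """Second derivative: sum of i * (i-1) * p_i * s^(i-2)."""
    return sum(i * (i - 1) * p * s**(i - 2) for i, p in dist.items())

mean = pgf_d1(1.0)                     # E(X) = 5
var = pgf_d2(1.0) + mean - mean**2     # E(X(X-1)) + E(X) - (E X)^2 = 1.8
print(mean, var)
```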
Example (Poisson distribution). For X ~ P(λ),

Π(s) = Σ_{i=0}^{∞} e^(−λ) λ^i s^i / i! = e^(−λ) Σ_{i=0}^{∞} (λs)^i / i! = e^(−λ) e^(λs) = e^(λ(s−1)).

Hence,

Π′(s) = λ e^(λ(s−1)), so E(X) = λ (= Π′(1)),
Π″(s) = λ² e^(λ(s−1)), so E[X(X − 1)] = λ²,

and

Var(X) = E(X²) − (E X)² = λ² + λ − λ² = λ.
Example (Binomial distribution). Let X ~ B(n, p). First, note that

(x + y)^n = Σ_{i=0}^{n} C(n, i) x^i y^(n−i).   (2)

Then

Π(s) = Σ_{i=0}^{n} s^i C(n, i) p^i (1 − p)^(n−i) = Σ_{i=0}^{n} C(n, i) (sp)^i (1 − p)^(n−i) = (1 − p + ps)^n,

which follows from (2). Then

Π′(s) = np (1 − p + ps)^(n−1), so that Π′(1) = E(X) = np,
Π″(s) = np² (n − 1)(1 − p + ps)^(n−2), so that Π″(1) = np² (n − 1),

and finally,

Var(X) = Π″(1) − (Π′(1))² + Π′(1) = np²(n − 1) − n²p² + np = np(1 − p).
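As a final check, Π′(1) and Π″(1) can be computed by direct summation for a particular binomial (n = 10, p = 0.3 is our arbitrary choice); a Python sketch:

```python
from math import comb

n, p = 10, 0.3
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

d1 = sum(i * pmf[i] for i in range(1, n + 1))             # pgf'(1) = E(X)
d2 = sum(i * (i - 1) * pmf[i] for i in range(2, n + 1))   # pgf''(1) = E[X(X-1)]
var = d2 - d1**2 + d1
print(d1, var)  # n*p = 3 and n*p*(1-p) = 2.1, up to float rounding
```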