8/4/2019 M1905 Topic 2A Probability
Monday, 15 August 2011

Lecture 7 - Content
- Sets
- Probability and counting
- Conditional probability

[Lecturer's note: many students are not familiar with set theory, so spend some extra time explaining the various operations carefully. If time is left, discuss the cardinality of N, Q and R.]
Statistics (Advanced): Lecture 7 1
Sets

Before we look at probability it is necessary to understand sets, because probabilities are typically described in terms of sets on which an event occurs.

Definition 1. The set of all possible outcomes of an experiment is called a sample space, denoted by Ω. Any subset A of the sample space, denoted A ⊆ Ω, is called an event.

Definition 2. The counting operator N(A) is a set function that counts how many elements belong to the set (event) A.

Example (Sample spaces).
- Coin: Ω = {H, T}, N(Ω) = 2.
- Dice: Ω = {1, 2, 3, 4, 5, 6}, N({1, 2, 5}) = 3.
- Weight: Ω = R⁺, N(R⁺) = ∞. (We want to count objects for any event.)
Set Notation

Before we introduce probability we need to introduce some notation. Let A, B ⊆ Ω.

symbol          set theory                probability
Ω               largest set               certain event
∅               empty set                 impossible event
A ∪ B           union of A and B          event A or event B
A ∩ B           intersection of A and B   event A and event B
A^c = Ω \ A     complement of A           not event A
Intersection Operator

The set A ∩ B denotes the set such that if C ∈ A ∩ B then C ∈ A and C ∈ B (∩ is called the intersection operator).
Intersection Operator

Examples:
- {1, 2} ∩ {red, white} = ∅.
- {1, 2, green} ∩ {red, white, green} = {green}.
- {1, 2} ∩ {1, 2} = {1, 2}.

Some basic properties of intersections:
- A ∩ B = B ∩ A.
- A ∩ (B ∩ C) = (A ∩ B) ∩ C.
- A ∩ B ⊆ A.
- A ∩ A = A.
- A ∩ ∅ = ∅.
- A ⊆ B if and only if A ∩ B = A.
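These properties can be checked directly with a built-in set type. The course's software is R; the following Python sketch is used here only as an independent illustrative check on small finite sets:

```python
# Verify the basic intersection properties on small finite sets.
A = {1, 2, "green"}
B = {"red", "white", "green"}
C = {"green", "blue"}

assert A & B == {"green"}                 # {1,2,green} ∩ {red,white,green}
assert A & B == B & A                     # commutativity
assert A & (B & C) == (A & B) & C         # associativity
assert (A & B) <= A                       # A ∩ B is a subset of A
assert A & A == A                         # A ∩ A = A
assert A & set() == set()                 # A ∩ ∅ = ∅
assert ({1, 2} <= A) == ({1, 2} & A == {1, 2})  # A ⊆ B iff A ∩ B = A
```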
Union Operator

The set A ∪ B denotes the set such that if C ∈ A ∪ B then C ∈ A and/or C ∈ B (∪ is called the union operator).
Union Operator

Examples:
- {1, 2} ∪ {red, white} = {1, 2, red, white}.
- {1, 2, green} ∪ {red, white, green} = {1, 2, red, white, green}.
- {1, 2} ∪ {1, 2} = {1, 2}.

Some basic properties of unions:
- A ∪ B = B ∪ A.
- A ∪ (B ∪ C) = (A ∪ B) ∪ C.
- A ⊆ (A ∪ B).
- A ∪ A = A.
- A ∪ ∅ = A.
- A ⊆ B if and only if A ∪ B = B.
Set Minus

The set A \ B denotes the set such that if C ∈ A \ B then C ∈ A and C ∉ B.
Set Complement

The set A^c = Ω \ A denotes the set such that if C ∈ A^c then C ∉ A.
Set Minus

Examples:
- {1, 2} \ {red, white} = {1, 2}.
- {1, 2, green} \ {red, white, green} = {1, 2}.
- {1, 2} \ {1, 2} = ∅.
- {1, 2, 3, 4} \ {1, 3} = {2, 4}.

Some basic properties of complements:
- A \ B = A ∩ B^c (note that in general A \ B ≠ B \ A).
- A ∪ A^c = Ω.
- A ∩ A^c = ∅.
- (A^c)^c = A.
- A \ A = ∅.
Theorem 1. The complement of the union of A and B equals the intersection of the complements:

(A ∪ B)^c = A^c ∩ B^c.

Proof. Use Venn diagrams for the LHS and RHS and colour the areas.

Theorem 2 (De Morgan's Laws).

(∪_{i=1}^n A_i)^c = ∩_{i=1}^n A_i^c   and   (∩_{i=1}^n A_i)^c = ∪_{i=1}^n A_i^c.
Counting: Ordered Sampling without Replacement

Example (Ordered samples without replacement). The number of ordered samples of size r we can draw without replacement from n objects is

n(n − 1) · · · (n − r + 1) = n!/(n − r)!.

Recall: 0! = 1.
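A quick numeric sanity check of this formula (a Python sketch with hypothetical values n = 10 and r = 3; the slides otherwise use R):

```python
from math import factorial

n, r = 10, 3
# Build the product n(n-1)...(n-r+1) term by term.
ordered = 1
for k in range(n, n - r, -1):
    ordered *= k
# It must agree with n!/(n-r)!.
assert ordered == factorial(n) // factorial(n - r) == 720  # 10*9*8
```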
Counting: Unordered Sampling without Replacement

Example (Unordered samples without replacement).

nCr = C(n, r) = n!/(r!(n − r)!) = "n choose r".

Recall that nCr = nC(n−r), since

C(n, n − r) = n!/((n − r)!(n − (n − r))!) = n!/((n − r)! r!),

and so

C(n, 0) = C(n, n) = 1.
Sampling without Replacement (cont.)

Example. Consider {A, B, C} and select r = 2 items:

order important:     (A,B) (A,C) (B,C) (B,A) (C,A) (C,B)   →   3 · 2 = 6 samples
order not important: {A,B} {A,C} {B,C}                     →   C(3,2) = 3 samples
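The two counts above can be reproduced by exhaustive enumeration (a Python sketch; itertools plays the role of hand-listing the samples):

```python
from itertools import combinations, permutations

items = ["A", "B", "C"]
ordered = list(permutations(items, 2))    # order important: 3 * 2 samples
unordered = list(combinations(items, 2))  # order not important: C(3,2) samples
assert len(ordered) == 6
assert len(unordered) == 3
```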
Sampling in R

# Creating ordered lists
n = 158; x = 1:n
set.seed(6)         # set random seed to 6 to reproduce results
sample(x)           # random permutation of 1,2,...,158: n! possibilities
sample(x, 10)       # choose 10 numbers without replacement
sample(x, 10, TRUE) # choose 10 numbers with replacement = bootstrap sampling
What is Probability?

1. Subjective probability expresses the strength of one's belief (the basis of Bayesian statistics; a bit more on that later).

2. The classical probability concept gives a mathematical answer for equally likely outcomes.

Theorem 3. If there are n equally likely possibilities, of which one must occur and s are regarded as favourable (= successes), then the probability P of a success is given by s/n.
What is Probability?

3. The frequency interpretation of probability:

Theorem 4. The probability of an event (or outcome) is the proportion of times the event occurs in a long run of repeated experiments.

Or in words: if an experiment is repeated n times under identical conditions, and the event A occurs m times, then as n becomes large (i.e. in the long run) the probability of A occurring is the ratio m/n.
What is Probability?

- The constancy of the gender ratio at birth: in Australia, the proportion of male births is fairly stable at 0.51. This long-run relative frequency is used to estimate the probability that a randomly chosen birth is male.
- Cancer council records show the age-standardised mortality rate from breast cancer in NSW was close to 20 per 100,000 over the years 1972-2000. For a randomly chosen woman, we use 0.0002 as the probability of dying from breast cancer.

Example (Coin tossing).
Buffon (1707-1788): n = 4,040, P({H}) = 50.7%.
Pearson (1857-1936): n = 24,000, P({H}) = 50.05%.

Coin tossing in R:

table(sample(c("H","T"),4040,T))/4040
table(sample(c("H","T"),24000,T))/24000
Coin Tossing in the 2010s

In the 2010s, Stanford professor Persi Diaconis developed the "Coin Tosser 3000". However, the machine is designed to flip a coin with the same result every time!
What is Probability?

4. Mathematical formulation of probability:

Definition 3 (due to Andrey Kolmogorov, 1933). Given a sample space Ω and an event A ⊆ Ω, we define P(A), the probability of A, to be a value of a non-negative additive set function that satisfies the following axioms:

A1: For any event A, 0 ≤ P(A) ≤ 1.
A2: P(Ω) = 1.
A3: If A and B are mutually exclusive events (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B).
A3': More generally, if A1, A2, A3, . . . is a finite or infinite sequence of mutually exclusive events in Ω, then P(A1 ∪ A2 ∪ A3 ∪ . . .) = P(A1) + P(A2) + P(A3) + . . . .
Example (Lotto). A lotto-type barrel contains 10 balls numbered 1, 2, . . . , 10. Three balls are drawn.

i. How many distinct samples can be drawn?

n = 10C3 = C(10, 3) = (10 · 9 · 8)/(1 · 2 · 3) = 120.

ii. Event A = (all drawn numbers are in {1, 2, . . . , 7}), i.e. all numbers at most seven:

C(7, 3) = (7 · 6 · 5)/(1 · 2 · 3) = 35 successes, so P(A) = 35/120 = 7/24.

iii. B = (all drawn numbers are even): P(B) = C(5, 3)/120 = 10/120 = 1/12.

A ∩ B = {(2, 4, 6)}, so P(A ∩ B) = 1/120.

iv. P(A ∪ B)? To answer this we need our next theorem.
Addition Theorem

Theorem 5 (Addition Theorem). If A and B are any events in Ω, then

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof. Use Venn diagrams, i.e. draw pictures and colour regions.

Alternatively, use the axioms only. First note that, by A3,

P(A) = P((A \ (A ∩ B)) ∪ (A ∩ B)) = P(A \ (A ∩ B)) + P(A ∩ B).   (1)

Similarly, P(B) = P(B \ (A ∩ B)) + P(A ∩ B). Next,

P(A ∪ B) = P((A \ (A ∩ B)) ∪ (B \ (A ∩ B)) ∪ (A ∩ B))
         = P(A \ (A ∩ B)) + P(B \ (A ∩ B)) + P(A ∩ B)     (by A3)
         = [P(A \ (A ∩ B)) + P(A ∩ B)] + [P(B \ (A ∩ B)) + P(A ∩ B)] − P(A ∩ B)
         = P(A) + P(B) − P(A ∩ B),

which follows from result (1).
Example (Lotto, cont.). A lotto-type barrel contains 10 balls numbered 1, 2, . . . , 10. Three balls are drawn.

i. How many distinct samples can be drawn? 120.
ii. Event A = (all drawn numbers are at most seven): P(A) = 7/24.
iii. B = (all drawn numbers are even): P(B) = 1/12. Also P(A ∩ B) = 1/120.
iv. P(A ∪ B)?

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 7/24 + 1/12 − 1/120 = 44/120 = 11/30.
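With only 120 possible draws, both the individual probabilities and the addition theorem can be verified by brute-force enumeration (a Python sketch; the course otherwise uses R):

```python
from itertools import combinations
from fractions import Fraction

draws = list(combinations(range(1, 11), 3))   # all unordered draws of 3 balls
assert len(draws) == 120

in_A = [d for d in draws if max(d) <= 7]                 # all numbers at most 7
in_B = [d for d in draws if all(x % 2 == 0 for x in d)]  # all numbers even

def prob(event):
    return Fraction(len(event), len(draws))

P_A, P_B = prob(in_A), prob(in_B)
P_AB = prob([d for d in draws if d in in_A and d in in_B])
P_AorB = prob([d for d in draws if d in in_A or d in in_B])

assert P_A == Fraction(7, 24) and P_B == Fraction(1, 12)
assert P_AB == Fraction(1, 120)
assert P_AorB == P_A + P_B - P_AB    # the addition theorem, exactly 11/30
```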
Poincaré's Theorem

Theorem 6 (Poincaré's formula, not part of M1905). Let A1, A2, . . . , An be any events in Ω. Then

P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − . . . + (−1)^{n+1} P(A_1 ∩ A_2 ∩ . . . ∩ A_n).
(Unconditional) Probability

- Recall the three axioms of probability.
- P(A^c) = 1 − P(A), since A ∪ A^c = Ω and hence 1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c).
- P(∅) = 0, because ∅ = Ω^c, hence P(∅) = 1 − P(Ω) = 0.
- etc.
Conditional Probability: Motivating Example

Consider the following (fictional) table of sports mortality rates compiled over the last decade:

SPORT          Description                                            DEATHS
Chess          Board game considered the national sport of Russia          0
Boxing         Barbaric sport where two people hit each other              5
Chess Boxing   5 minutes of chess followed by 2 minutes of boxing          0
Sky Diving     Jumping out of a plane with a parachute                    10
Lawn Bowls     Rolling a ball across grass to hit other balls           1000
Hence, Lawn Bowls is the most dangerous sport by far!
Conditional Probability: Motivating Example

However, the number of deaths given that the sportsperson is young is zero, so that

P(Dying from Lawn Bowls | sportsperson is young) ≈ 0,

even though P(Dying from Lawn Bowls) is large.
Conditional Probability: Another Motivating Example

What is the probability of the important event

A = (starting salary after uni ≥ 60k)?

What is the sample space Ω? Possibilities are:

Ω1 = {all students},
Ω2 = {all male students},
Ω3 = {all students with a maths degree}.
Conclusion

- Probability depends on the underlying sample space Ω!
- Hence, if it is unclear which sample space A refers to, then make it clear by writing P(A|Ω) instead of P(A), which we read as "the conditional probability of A relative to Ω" or "given Ω", respectively.

Definition 4. If A and B are any events in Ω and P(B) ≠ 0, then the conditional probability of A given B is

P(A|B) = P(A ∩ B)/P(B).
Additional Material for Lecture 7

A combinatorial proof of the binomial theorem

The binomial theorem says

(x + y)^n = Σ_{k=0}^n C(n, k) x^k y^(n−k).

Consider the more complicated product

(x1 + y1)(x2 + y2) · · · (xn + yn).

Its expansion consists of the sum of 2^n terms, each term being the product of n factors. Each term contains either xk or yk for each k = 1, . . . , n. For example,

(x1 + y1)(x2 + y2) = x1x2 + x1y2 + y1x2 + y1y2.

Now, there is 1 = C(n, 0) term with y factors only, there are n = C(n, 1) terms with one x factor and (n − 1) y factors, etc. In general, there are C(n, k) terms with exactly k x's and (n − k) y's. The theorem follows by setting xk = x and yk = y for all k.
More on Set Theory

The operations of forming unions, intersections and complements of events obey rules similar to the rules of algebra. The following hold for events A, B and C:

Commutative law: A ∪ B = B ∪ A and A ∩ B = B ∩ A.
Associative law: (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C).
Distributive law: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).
Tuesday, 16 August 2011

Lecture 8 - Content
- Conditional probability
- Bayes' rule
- Integer-valued random variables

Conditional probability equation:

P(A|Ω) = P(A ∩ Ω)/P(Ω) = P(A);   for P(B) > 0: P(A|B) = P(A ∩ B)/P(B).
Conditional Probability (cont.)

Example (Defective machine parts). Suppose that 500 machine parts are inspected before they are shipped. Let

- I = (a machine part is improperly assembled),
- D = (a machine part contains one or more defective components),

with N(Ω) = 500, N(I) = 30, N(D) = 15, N(I ∩ D) = 10.
Example (cont.)

Assumption: each machine part is equally likely to be selected. Using the classical concept of probability we get:

P(D) = P(D|Ω) = N(D)/N(Ω) = 15/500 = 3/100,

P(D|I) = N(D ∩ I)/N(I) = 10/30 = 1/3 > 3/100.

Note that if N(Ω) > 0, then

N(D ∩ I)/N(I) = [N(D ∩ I) · (1/N(Ω))] / [N(I) · (1/N(Ω))] = P(D ∩ I)/P(I).
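The counts-to-probabilities calculation above can be mirrored with exact rational arithmetic (a Python sketch using the counts from the example):

```python
from fractions import Fraction

N_omega, N_I, N_D, N_ID = 500, 30, 15, 10   # counts from the example

P_D = Fraction(N_D, N_omega)                # P(D) = P(D|Omega) = 3/100
P_D_given_I = Fraction(N_ID, N_I)           # P(D|I) = N(D ∩ I)/N(I) = 1/3

# The same value via probabilities: P(D ∩ I)/P(I), as in the note above.
assert P_D_given_I == Fraction(N_ID, N_omega) / Fraction(N_I, N_omega)
assert P_D == Fraction(3, 100)
assert P_D_given_I == Fraction(1, 3)
```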
General Multiplication Rule of Probability

Theorem 7 (General multiplication rule of probability). If A and B are any events in Ω, then

P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0; interchanging A and B yields
P(A ∩ B) = P(A) · P(B|A), if P(A) ≠ 0.

Proof. This holds because P(A|B) := P(A ∩ B)/P(B), etc.

What happens if P(A|B) = P(A)? The additional information of B is of no use, and we obtain the special multiplication rule:

P(A ∩ B) = P(A) · P(B).
Definition of Independence of Events

Definition 5. If A and B are any two events in a sample space Ω, we say that A is independent of B if and only if P(A|B) = P(A).

From the general multiplication rule it follows that if P(A|B) = P(A) then P(B|A) = P(B), and we say simply that A and B are independent.
Alternative View of Independence

Alternatively, if A and B are independent then P(A ∩ B) = P(A) · P(B), and hence

P(B|A) = P(A ∩ B)/P(A)        (definition of conditional probability)
       = P(A) · P(B)/P(A)     (using independence)
       = P(B),

which can also be interpreted as saying that knowing A does not affect the probability of B.
Independence

In other words, the events A and B are independent if the chance that one happens remains the same regardless of how the other turns out.

Example. Suppose that we toss a fair coin twice. Let A = {heads on the first toss} and B = {heads on the second toss}. Now suppose A occurred. Then

P(B knowing A has happened) = 1/2 = P(B).
Independence Example 2

Example. Consider six boxes, 1 2 3 1 2 3, where the second set of three boxes is green. Suppose we select a box at random and, as it is drawn, you see that it is green. Then

P(A = {getting a 2}) = 2/6 = 1/3,
P(B = {getting a 2 given it is green}) = 1/3.

Knowing the selected box is green has not changed our knowledge about which numbers might be drawn. Hence, the events A and B are independent.
Independence Example 3

Example. Consider six boxes, 1 1 2 1 2 2, where the first set of three boxes (1, 1, 2) is green. Suppose we select a box at random and, as it is drawn, you see that it is green. Then

P(A = {getting a 2}) = 3/6 = 1/2,
P(B = {getting a 2 given it is green}) = 1/3.

Knowing the selected box is green HAS CHANGED our knowledge about which numbers might be drawn. Hence, the events A and B are NOT independent.
Independence Example 4

Example. Two cards are drawn at random from an ordinary deck of 52 playing cards. What is the probability of getting two aces if

(a) the first card is replaced before the second is drawn?
(Solution: 4/52 · 4/52 = 1/169, since here P(A1 ∩ A2) = P(A1) · P(A2).)

(b) the first card is not replaced before the second card is drawn?
(Solution: 4/52 · 3/51 = 1/221, but unlike above P(A2|A1) ≠ P(A2).)

Independence is violated when the sampling is without replacement.
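The two card probabilities can be checked with exact fractions (a Python sketch):

```python
from fractions import Fraction

# (a) With replacement: the two draws are independent.
p_with = Fraction(4, 52) * Fraction(4, 52)
# (b) Without replacement: P(A2 | A1) = 3/51 differs from P(A2) = 4/52.
p_without = Fraction(4, 52) * Fraction(3, 51)

assert p_with == Fraction(1, 169)
assert p_without == Fraction(1, 221)
assert p_without < p_with   # removing an ace makes a second ace less likely
```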
Independence Example 5

Medical records indicate that the proportion of children who have had measles by the age of 8 is 0.4. The corresponding proportion for chicken pox is 0.5. The proportion who have had both diseases by the age of 8 is 0.3. An infant is randomly selected. Let A represent the event that he contracts measles, and B that he contracts chicken pox, by the age of 8 years.

- Estimate P(A), P(B) and P(A ∩ B):
  P(A) = 0.4, P(B) = 0.5 and P(A ∩ B) = 0.3.
- Are A and B independent?
  P(A) · P(B) = 0.2 ≠ P(A ∩ B) = 0.3, so NO, A and B are not independent.
Bayes' Rule

Example (The burgers are better...). Assume you get your burgers

- 60% from supplier B1 [HJ],
- 30% from supplier B2 [McD],
- 10% from supplier B3 [RR],

so P(B1) = 0.6, P(B2) = 0.3, and P(B3) = 0.1. We are interested in the event A = (good burger).

[Draw a picture showing that B1, B2 and B3 are mutually exclusive events with B1 ∪ B2 ∪ B3 = Ω.]
Example (cont.)

It follows that

A = A ∩ (B1 ∪ B2 ∪ B3) = (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3).

Note that (A ∩ B1), (A ∩ B2) and (A ∩ B3) are mutually exclusive. By Axiom 3 we get

P(A) = P((A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3)) = P(A ∩ B1) + P(A ∩ B2) + P(A ∩ B3).

Remember the general multiplication rule: we already know that

P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0,
         = P(A) · P(B|A), if P(A) ≠ 0.
Example (cont.)

So we can write

P(A) = P(B1) · P(A|B1) + P(B2) · P(A|B2) + P(B3) · P(A|B3)
     = 0.6 · 0.95 (very good) + 0.3 · 0.80 (sufficient) + 0.1 · 0.65 (insufficient)
     = 0.875.

[The above probabilities 0.95, 0.80 and 0.65 are from personal experience, i.e. subjective probability.]

What did the example teach us? Strategy: decompose complicated events into mutually exclusive simple(r) events!
Total Probability Rule

Theorem 8 (Total probability rule). If B1, B2, . . . , Bn are mutually exclusive events such that B1 ∪ B2 ∪ . . . ∪ Bn = Ω, then for any event A ⊆ Ω,

P(A) = Σ_{i=1}^n P(Bi) · P(A|Bi).

Example (Burgers, cont.). We know already that supplier B3 is bad. So what is P(B3|A) (if a burger is good, is it from B3)? By the definition of conditional probability, since P(A) > 0,

P(B3|A) = P(A ∩ B3)/P(A) = P(B3) · P(A|B3) / Σ_{i=1}^3 P(Bi) · P(A|Bi) = (0.1 · 0.65)/0.875 = 0.074.

After we know that a burger is good, the probability that it comes from B3 decreases from 0.1 to 0.074.
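The total probability rule and the posterior update above can be reproduced numerically (a Python sketch with the subjective probabilities from the slide):

```python
priors = {"B1": 0.6, "B2": 0.3, "B3": 0.1}     # supplier probabilities P(Bi)
p_good = {"B1": 0.95, "B2": 0.80, "B3": 0.65}  # subjective P(A|Bi) from the slide

# Total probability rule: P(A) = sum of P(Bi) * P(A|Bi)
P_A = sum(priors[b] * p_good[b] for b in priors)
# Bayes' rule for the posterior P(B3|A)
post_B3 = priors["B3"] * p_good["B3"] / P_A

assert abs(P_A - 0.875) < 1e-12
assert round(post_B3, 3) == 0.074   # down from the prior 0.1
```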
Bayes' Rule or Theorem

What we just derived is the famous formula called Bayes' rule or theorem.

Theorem 9 (Bayes' Rule). If B1, B2, . . . , Bn are mutually exclusive events such that B1 ∪ B2 ∪ . . . ∪ Bn = Ω, then for any event A ⊆ Ω,

P(Bj|A) = P(A|Bj) · P(Bj) / Σ_{i=1}^n P(A|Bi) · P(Bi).

The probabilities P(Bi) are called the prior probabilities and the probabilities P(Bi|A) the posterior probabilities, i = 1, . . . , n.
Reverend Thomas Bayes (1701-1761)

- Born in Hertfordshire (London, England),
- was a Presbyterian minister,
- studied theology and mathematics,
- best known for "An Essay Towards Solving a Problem in the Doctrine of Chances",
- where Bayes' Theorem was first proposed.
- Words: Bayes' rule, Bayes' Theorem, Bayesian statistics.
Example of Bayes' Rule: Screening Test for Tuberculosis

                        TB (D+)   No TB (D−)   Total
X-ray positive (S+)          22           51      73
X-ray negative (S−)           8         1739    1747
Total                        30         1790    1820

What is the probability that a randomly selected individual has tuberculosis given that his or her X-ray is positive, given that P(D+) = 0.000093?

- P(D+) = 0.000093, which implies that P(D−) = 0.999907.
- P(S+|D+) = 22/30 = 0.7333.
- P(S+|D−) = 51/1790 = 0.0285.

P(D+|S+) = P(S+|D+)P(D+) / [P(S+|D+)P(D+) + P(S+|D−)P(D−)]
         = (0.7333 · 0.000093)/(0.7333 · 0.000093 + 0.0285 · 0.999907) = 0.00239.
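The screening calculation is easy to mirror in code (a Python sketch using the table counts and the given prevalence):

```python
p_D = 0.000093          # prevalence P(D+), as given
sens = 22 / 30          # P(S+|D+), estimated from the table
fpr = 51 / 1790         # P(S+|D-), estimated from the table

p_pos = sens * p_D + fpr * (1 - p_D)   # total probability of a positive X-ray
post = sens * p_D / p_pos              # Bayes' rule: P(D+|S+)

assert round(post, 5) == 0.00239       # a positive X-ray is still weak evidence
```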
Integer-Valued Random Variables

Many observed numbers are the random result of many possible numbers.

Definition 6. A random variable X is a real-valued function of the elements of a sample space Ω.

Note that such functions are denoted with capital letters and their images (outcomes) with lower-case letters, e.g. x.

Examples:
- How many times (X) will you be caught speeding?
- What will your final mark (Y) for MATH1905 be?
- How old (Z, in years) do you think your stats lecturer is?
Random Variable Example: 3 Coins

Consider tossing three coins. The number of heads showing when the coins land is a random variable: it assigns the number 0 to the outcome (T, T, T), the number 1 to outcomes such as (T, T, H), the number 2 to outcomes such as (T, H, H), and the number 3 to the outcome (H, H, H).
Random Variable Example: 3 Coins

Let X = number of heads.

Outcomes           Value of X    Probability
TTT                X = 0         P(X = 0) = 1/8
TTH, THT, HTT      X = 1         P(X = 1) = 3/8
THH, HTH, HHT      X = 2         P(X = 2) = 3/8
HHH                X = 3         P(X = 3) = 1/8
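The table can be generated by enumerating the 8 equally likely outcomes (a Python sketch):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

outcomes = list(product("HT", repeat=3))          # the 8 equally likely outcomes
heads = Counter(o.count("H") for o in outcomes)   # X = number of heads

pmf = {x: Fraction(c, len(outcomes)) for x, c in heads.items()}
assert pmf[0] * 8 == 1 and pmf[3] * 8 == 1        # P(X=0) = P(X=3) = 1/8
assert pmf[1] * 8 == 3 and pmf[2] * 8 == 3        # P(X=1) = P(X=2) = 3/8
```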
Random Variable Notation: 3 Coins

We use upper-case letters to denote unobserved random variables, say X, and lower-case letters for their observed values, in this case x. For example, before the three coins land we denote the number of heads by X; after the coins have landed we denote the observed number of heads by x, so that we can write P(X = x).
The Mother of All Examples: Bernoulli Trials!

Definition 7. Bernoulli trials satisfy the following assumptions:
(i) there are only two possible outcomes for each trial,
(ii) the probability of success is the same for each trial,
(iii) the outcomes from different trials are independent,
(iv) there is a fixed number n of Bernoulli trials conducted.

Example (n = 1, coin). Ω: Head or Tail. We can describe the trial (before flipping the coin) in full detail. Consider a function

X : {H, T} → {0, 1}   s.t.   X(H) = xH = 1 and X(T) = xT = 0.

What is the probability that X = xH = 1?

P(X = 1) = P(X = xH) = P(H) = p = 1/2, and P(X = 0) = 1/2.
Jacob Bernoulli (1654-1705)

- Born in Basel (Switzerland),
- one of 8 mathematicians in his family,
- studied theology, then mathematics and astronomy,
- best known for "Ars Conjectandi" (The Art of Conjecture),
- applied probability theory to games of chance and introduced the law of large numbers.
- Words: Bernoulli trial, Bernoulli numbers.
Monday, 22 August 2011

Lecture 9 - Content
- Distribution of a random variable
- Binomial distribution
- Mean of a distribution
Revised Axioms of Probability

In Lecture 7 we used the following definition of probability.

Definition 8 (due to Andrey Kolmogorov, 1933). Given a sample space Ω and an event A ⊆ Ω, we define P(A), the probability of A, to be a value of a non-negative additive set function that satisfies the following three axioms:

A1: For any event A, 0 ≤ P(A) ≤ 1.
A2: P(Ω) = 1.
A3: If A1, A2, A3, . . . is a finite or infinite sequence of mutually exclusive events in Ω, then P(A1 ∪ A2 ∪ A3 ∪ . . .) = P(A1) + P(A2) + P(A3) + . . . .

However, nothing is lost if we replace A1: 0 ≤ P(A) ≤ 1 with A1: 0 ≤ P(A).
Proof by Contradiction

Assume the following three axioms:

A1: For any event A ⊆ Ω, 0 ≤ P(A).
A2: P(Ω) = 1.
A3: If A1, A2, A3, . . . is a finite or infinite sequence of mutually exclusive events in Ω, then P(A1 ∪ A2 ∪ A3 ∪ . . .) = P(A1) + P(A2) + P(A3) + . . . .

Now assume additionally that A4: for some event A ⊆ Ω, P(A) > 1. Then:
- 1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c).
- Rearranging, we have P(A^c) = 1 − P(A).
- By A4 we have P(A^c) < 0. This contradicts A1, hence A4 cannot hold; that is, P(A) ≤ 1 already follows from A1-A3.
Revised Axioms of Probability

Definition 9 (due to Andrey Kolmogorov, 1933). Given a sample space Ω and an event A ⊆ Ω, we define P(A), the probability of A, to be a value of a non-negative additive set function that satisfies the following three axioms:

A1: For any event A, 0 ≤ P(A).
A2: P(Ω) = 1.
A3: If A1, A2, A3, . . . is a finite or infinite sequence of mutually exclusive events in Ω, then P(A1 ∪ A2 ∪ A3 ∪ . . .) = P(A1) + P(A2) + P(A3) + . . . .

This is the minimal set of axioms needed to define probability.

Random Variables Reminder

n = 1 (Coin): Ω = {H, T}, X : Ω → {0, 1} ⊂ R. Thus X(H) = xH = 1 and X(T) = xT = 0, so P(X = 1) = P(H) = p.
Distribution of a Random Variable

Definition 10. The probability distribution of an integer-valued random variable X is a list of the possible values of X together with their probabilities

pi = P(X = i) ≥ 0   with   Σ_i pi = 1.

There is nothing special about the subscript i; we could and will equally well use j, k, x, etc.

Definition 11. The probability that the value of a random variable X is less than or equal to x, that is,

F(x) = P(X ≤ x),

is called the cumulative distribution function, or just the distribution function.

Also note that for integer-valued random variables,

P(X = x) = F(x) − F(x − 1).
Example (n = 3, IT problems). A network is fragile. By experience, P(F) = 0.1 = 1 − p is the probability that in any given week there is at least 1 major problem, and P(S) = 0.9 = p the probability that there is none. Out of 3 weeks, how many weeks, X, are problem-free, and with what probability?

(a) All possible outcomes:
FFF SFF FSF FFS SSF FSS SFS SSS

(b) What is the probability of each outcome? Use the special multiplication rule of probability, because the weeks are independent(!?).

(c) What is the probability distribution of the number of successes, X, among the 3 weeks?
Example (cont.)

P(X = 0) = P(FFF) = P(F) · P(F) · P(F) = (1 − p)³.

P(X = 1) = P(SFF ∪ FSF ∪ FFS)   (mutually exclusive events)
         = P(SFF) + P(FSF) + P(FFS)
         = 3(1 − p)²p = C(3, 1) p(1 − p)²,   select one S out of 3 trials.

Similarly, for X = 2 and X = 3:

P(X = 2) = C(3, 2) p²(1 − p),   select two S out of 3 trials,
P(X = 3) = C(3, 3) p³,          select three S out of 3 trials.
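With p = 0.9 as in the example, the four probabilities can be evaluated directly (a Python sketch; the course would use dbinom in R):

```python
from math import comb

p = 0.9   # success probability P(S) from the example
# P(X = i) = C(3, i) p^i (1-p)^(3-i) for i = 0, 1, 2, 3
pmf = [comb(3, i) * p**i * (1 - p) ** (3 - i) for i in range(4)]

assert [round(q, 3) for q in pmf] == [0.001, 0.027, 0.243, 0.729]
assert abs(sum(pmf) - 1.0) < 1e-12   # the probabilities sum to 1
```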
Binomial Distribution

We can generalise this result to any n ≥ 1 and success probability p ∈ [0, 1].

Definition 12. The probability distribution of the number of successes X = i in n ∈ N independent Bernoulli trials is called the binomial distribution,

pi = P(X = i) = C(n, i) p^i (1 − p)^(n−i),

where p is the success probability of a single Bernoulli trial and i = 0, 1, . . . , n.

To say that the random variable X has the binomial distribution with parameters n and p we write X ~ B(n, p). This defines a family of probability distributions, with each member characterised by a given value of the parameter p and the number of trials n.
Binomial Distribution

Since pi, 0 ≤ i ≤ n, is a probability distribution, we have the identity (which we will use later on)

Σ_{i=0}^n C(n, i) p^i (1 − p)^(n−i) = 1

for any 0 ≤ p ≤ 1.

A special case of the binomial distribution is the Bernoulli distribution, where n = 1 and

P(X = i) = p^i (1 − p)^(1−i),   i = 0, 1.

There is another special relationship between the Bernoulli distribution and the binomial distribution: if X1, . . . , Xn are independent with Xi ~ Bernoulli(p) and Y = Σ_{i=1}^n Xi, then Y ~ B(n, p).
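The sum-of-Bernoullis relationship can be checked numerically by building the distribution of the sum one trial at a time (a discrete convolution) and comparing it with the binomial formula. A Python sketch, using n = 9 and p = 1/6 purely as example values:

```python
from math import comb

def bernoulli_sum_pmf(n, p):
    """Distribution of Y = X1 + ... + Xn for independent Bernoulli(p) Xi,
    built by adding one trial at a time (a discrete convolution)."""
    pmf = [1.0]                        # empty sum: P(Y = 0) = 1
    for _ in range(n):
        nxt = [0.0] * (len(pmf) + 1)
        for y, q in enumerate(pmf):
            nxt[y] += q * (1 - p)      # the next trial is a failure
            nxt[y + 1] += q * p        # the next trial is a success
        pmf = nxt
    return pmf

n, p = 9, 1 / 6
by_sum = bernoulli_sum_pmf(n, p)
by_formula = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
assert all(abs(a - b) < 1e-12 for a, b in zip(by_sum, by_formula))
```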
Example (Dice). Roll a fair die 9 times. Let X be the number of sixes obtained. Then X ~ B(9, 1/6); that is,

pi = P(X = i) = C(9, i) (1/6)^i (5/6)^(9−i) = C(9, i) 5^(9−i)/6^9,   i = 0, 1, . . . , 9.
With your calculator or with R:

> n = 9
> p = 1/6
> round(dbinom(0:n,n,p),4) # dbinom for B(n,p) probabilities
 [1] 0.1938 0.3489 0.2791 0.1302 0.0391
 [6] 0.0078 0.0010 0.0001 0.0000 0.0000
> pbinom(1,n,p) # pbinom for B(n,p) cumulative probabilities
[1] 0.5426588

Hence, P(X = 4) = 0.0391 and P(X < 2) = F(1) = 0.5426588.
Shape of the Binomial Distribution

- We get a binomial distribution if
  1. we are counting something over a fixed number of trials or repetitions,
  2. the trials are independent, and
  3. the probability of the outcome of interest is constant across trials.
- The binomial distribution is centred at n · p,
- the closer p is to 1/2, the more symmetric the distribution/histogram,
- the larger n, the closer the shape to a bell (normal).
[Figure: bar charts of the probabilities for X ~ B(n = 10, p = 0.5), X ~ B(10, 0.1), X ~ B(10, 0.8) and X ~ B(10, 0.4).]
Example. In a small pond there are 50 fish, 20 of which have been tagged. Seven fish are caught and X represents the number of tagged fish in the catch. Assume each fish in the pond has the same chance of being caught. Is X binomial

(a) if each fish is returned before the next catch?

Yes, provided the fish do not learn from their experience, i.e. the probability of catching each fish stays the same for each of the 7 trials.

P(X = 1) = C(7, 1) (20/50)¹ (30/50)⁶ ≈ 0.131 = dbinom(1,7,0.4)
(b) if the fish are not returned once they are caught?
This situation cannot be modelled by a binomial as the proportion of tagged fish
changes at each trial.
If there were 5,000 fish, 2,000 of which had been tagged, then the change in the proportion would be negligible and we could model X with a binomial.

P(X = 1) = C(20, 1) C(30, 6) / C(50, 7)
         = choose(20,1)*choose(30,6)/choose(50,7) (in R)
         = 0.119 (to 3 d.p.)

P(X = 1) = C(2000, 1) C(3000, 6) / C(5000, 7)
         = choose(2000,1)*choose(3000,6)/choose(5000,7) (in R)
         = 0.131 (to 3 d.p.)
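Both the exact (hypergeometric) and approximate (binomial) answers can be reproduced from integer binomial coefficients; a sketch in Python rather than R (the helper name `hyper_pmf` is ours, not from the lecture):

```python
from math import comb

def hyper_pmf(k, tagged, total, draws):
    """Hypergeometric P(X = k): exactly k tagged fish in a catch of size `draws`."""
    return comb(tagged, k) * comb(total - tagged, draws - k) / comb(total, draws)

small_pond = hyper_pmf(1, 20, 50, 7)       # 50 fish, 20 tagged: 0.119 (3 d.p.)
large_pond = hyper_pmf(1, 2000, 5000, 7)   # 5000 fish, 2000 tagged: 0.131 (3 d.p.)
binom_approx = 7 * 0.4 * 0.6**6            # B(7, 0.4) pmf at k = 1: 0.131 (3 d.p.)
print(small_pond, large_pond, binom_approx)
```

With many fish, the hypergeometric and binomial answers agree to three decimal places, as the slide claims.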
Mean of a distribution

Definition 13. For a random variable X taking values 0, 1, 2, ... with

P(X = i) = p_i,  i = 0, 1, 2, ...,

the mean or expected value of X is defined to be

μ = E(X) = Σ_i i p_i.

Interpretation of E(X)
- Long-run average of observations of X, because p_i ≈ f_i/n.
- Centre of balance of the probability density (histogram). (draw picture)
- Measure of location of the distribution.

Definition 14. For any function g(X) we define the expected value E(g(X)) by

E(g(X)) = Σ_i g(i) p_i.
Expectation of a Dice Roll

Let X = {face showing from a dice roll}, where p_i = P(X = i) = 1/6 for i = 1, 2, ..., 6. Then

μ = E(X) = Σ_{i=1}^{6} i p_i = Σ_i i · 1/6 = 21/6 = 3.5.
Note: the expected value in this case is not one of the observed values.
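The defining sum can be evaluated directly; a minimal sketch in Python:

```python
# E(X) for a fair die: weighted sum of the outcomes, each with weight 1/6
faces = range(1, 7)
mu = sum(i * (1/6) for i in faces)
print(mu)  # 21/6 = 3.5 (up to float rounding)
```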
Mean of a distribution (cont)

Theorem 10. For constants a and b,

E(aX + b) = a E(X) + b.

Proof.

E(aX + b) = Σ_{all i} g(i) p_i, where g(i) = a i + b,
          = Σ_{all i} [(a i) p_i + b p_i]
          = a Σ_{all i} i p_i + b Σ_{all i} p_i
          = a E(X) + b.
Expectation of X ~ B(n, p)

Theorem 11. The expectation of X ~ B(n, p) is E(X) = np.

Proof.

E(X) = Σ_{i=0}^{n} i p_i = Σ_{i=0}^{n} i · n!/(i!(n-i)!) · p^i (1-p)^(n-i);  the i = 0 term vanishes, so change to i = 1, ..., n,
     = Σ_{i=1}^{n} i · n!/(i!(n-i)!) · p^i (1-p)^(n-i);  simplify,
     = Σ_{i=1}^{n} i · n(n-1)!/(i(i-1)!(n-i)!) · p^i (1-p)^(n-i),
     = np Σ_{i=1}^{n} (n-1)!/((i-1)!(n-i)!) · p^(i-1) (1-p)^(n-i);  sub. j = i-1, m = n-1.

Hence, E(X) = np Σ_{j=0}^{m} C(m, j) p^j (1-p)^(m-j) = np, since the sum runs over the probabilities of Y ~ B(m, p) and therefore equals 1.
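A quick numerical sanity check of the theorem, sketched in Python:

```python
from math import comb

# E(X) = sum of i * P(X = i) for X ~ B(9, 1/6) should equal n*p = 1.5
n, p = 9, 1/6
mean = sum(i * comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1))
print(mean, n * p)  # agree up to float rounding
```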
Example (Multiple choice section in M1905 exam is worth 35%).

20 questions, each with 5 possible answers. A student decides to answer the questions by selecting an answer at random.

(a) What is the expected number of correct responses? Let X denote the number of correct answers. Then X ~ B(20, 0.2), so the expected number of correct answers is np = 4.

(b) What is the probability that the student has more than 10 correct answers?

P(X > 10) = 1 − P(X ≤ 10) = 1 − 0.9994 = 0.0006, with 1-pbinom(10,20,0.2) in R.

(c) If the student scores 4 for a correct answer but −1 for a wrong response, what is his expected score?

E[4X + (−1)(20 − X)] = E(5X − 20) = 5 · 4 − 20 = 0.
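Parts (b) and (c) can be verified by brute-force summation over the pmf; a Python sketch (the helper name `binom_pmf` is ours):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.2
tail = sum(binom_pmf(k, n, p) for k in range(11, n + 1))   # P(X > 10)
# expected score: 4k for correct minus (20 - k) for wrong, i.e. 5k - 20
exp_score = sum((5 * k - 20) * binom_pmf(k, n, p) for k in range(n + 1))
print(round(tail, 4), exp_score)  # tail rounds to 0.0006; score is 0 up to rounding
```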
Tuesday, 23 August 2011
Lecture 10 - Content
- Variance of a distribution
- More integer-valued distributions
- Probability generating functions
Statistics (Advanced): Lecture 10 84
Expectation of a distribution: Reminders

The expectation of a distribution (or expectation of a random variable) is the mean of the probability distribution (a measure of distribution location). Note that

- E(X) = Σ_i i p_i = Σ_i i P(X = i) and
- E(g(X)) = Σ_i g(i) p_i = Σ_i g(i) P(X = i).
Variance of a distribution

Example. Suppose X (e.g. the number of shoes in a suitcase) takes the values 2, 4 and 6 with probabilities

i    2    4    6
p_i  0.1  0.3  0.6

Hence,

μ = E(X) = Σ_i i p_i = 2 · 0.1 + 4 · 0.3 + 6 · 0.6 = 5.
What is E(X²)?

Suppose X (e.g. the number of shoes in a suitcase) takes the values 2, 4 and 6 with probabilities

i    2    4    6
p_i  0.1  0.3  0.6

What is E(X²)?

Solution 1: by Definition 14, E(X²) = Σ_i g(i) p_i = Σ_i i² p_i = 26.8 ≠ 5² = (E X)².

Solution 2: map i → i² = j and X → X² = Y, and use E(Y) = Σ_j j p_j:

j    4    16   36
p_j  0.1  0.3  0.6

The distribution of Y can be hard to get (e.g. for continuous random variables).
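Both solution routes give the same number, which is easy to confirm; a Python sketch:

```python
xs = [2, 4, 6]
ps = [0.1, 0.3, 0.6]

# Solution 1: apply g(x) = x^2 inside the expectation sum
ex2_direct = sum(x**2 * p for x, p in zip(xs, ps))

# Solution 2: first build the distribution of Y = X^2, then take E(Y)
ys = [x**2 for x in xs]
ex2_via_y = sum(y * p for y, p in zip(ys, ps))

print(ex2_direct, ex2_via_y)  # both 26.8 (up to float rounding), not 5^2 = 25
```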
Definition 15. The variance of the random variable X is defined by

Var(X) = σ² = E(X − μ)² = E(X²) − μ²,

where μ = E(X). The variance σ² is a measure of spread.

This is like the large-sample limit of a sample variance.

The standard deviation of X is σ = √σ².
Variance of a Linear Transformation

Theorem 12. For any constants a and b,

Var(aX + b) = a² Var(X).

Proof.

Var(aX + b) = E[(aX + b)²] − (E[aX + b])²
            = E[a²X² + 2abX + b²] − (a E[X] + b)²
            = a² E[X²] + 2ab E[X] + b² − (a² E[X]² + 2ab E[X] + b²)
            = a² (E[X²] − E[X]²)
            = a² Var(X).
Example. If X ~ B(n, p) then we will show later that Var(X) = n p (1 − p).
- Hence, if p = 0 or 1 then the variance is 0.
- The variance is largest when p = 0.5, and in this case it is σ² = n/4.
More integer-valued distributions

Geometric distribution

The binomial random variable is just one possible integer-valued random variable. Suppose we have an infinite sequence of independent trials, each of which gives a success with probability p and failure with probability q = 1 − p.

Definition 16. The geometric distribution with parameter p (= success probability) has probabilities for the number of failures X before the first success,

p_i = P(X = i) = q^i p,  i = 0, 1, 2, ....

Note the probabilities add to 1:

p_0 + p_1 + ... = p + qp + q²p + ... = p(1 + q + q² + ...) = p · 1/(1 − q) = 1.

[By induction we can prove that 1 + q + ... + q^n = (1 − q^(n+1))/(1 − q).]
Example. A fair die is thrown repeatedly until it shows a six. Let X be the number of failures before the first six.

(a) What is the probability that more than 7 throws are required?

P(X > 7) = 1 − P(X ≤ 7) = 1 − Σ_{i=0}^{7} (5/6)^i · 1/6 = 0.233 (3 dp),

with 1-pgeom(7,1/6) or with 1-sum(dgeom(0:7,1/6)).

(b) Is it more likely that an odd number of throws is required or an even number?

Because 0 ≤ P(X = i) ≤ 1 and F(∞) = 1 we find

P(X even) − P(X odd) = Σ_{j=1}^{∞} P(X = 2(j−1)) − Σ_{k=1}^{∞} P(X = 2k−1)
                     = Σ_{j=1}^{∞} [P(X = 2(j−1)) − P(X = 2j−1)]
                     = Σ_{j=1}^{∞} (q^{2(j−1)} p − q^{2j−1} p)
                     = p Σ_{j=1}^{∞} (q^{2(j−1)} − q^{2j−1}) ≥ 0,

so an even number of failures X, that is, an odd number of throws, is more likely.
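Both parts can be checked numerically. Summing the even- and odd-index geometric series gives the closed forms P(X even) = p/(1 − q²) and P(X odd) = qp/(1 − q²); a Python sketch:

```python
p = 1/6
q = 1 - p

# (a) P(X > 7): the complement of the first eight geometric probabilities
tail = 1 - sum(q**i * p for i in range(8))   # telescopes to q**8

# (b) closed forms from summing the even- and odd-index geometric series
p_even = p / (1 - q**2)        # = 6/11 for p = 1/6
p_odd = q * p / (1 - q**2)     # = 5/11 for p = 1/6
print(round(tail, 3), p_even, p_odd)
```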
The Poisson approximation to the Binomial

The Poisson distribution often serves as a first theoretical model for counts which do not have a natural upper bound. Possible examples:

- modelling the number of accidents, crashes, breakdowns
- modelling radioactivity measured by a Geiger counter
- modelling so-called rare events (meteorite impacts, heart attacks)

Whether or not the Poisson distribution sufficiently describes count data is not answered at this early stage but postponed to later lectures in statistics.
The Poisson distribution can be seen as the limiting distribution of B(n, p): let n → ∞ while p → 0 and np → λ ∈ (0, ∞).

For X ~ B(n, p) we know that

P(X = k) = C(n, k) p^k (1 − p)^(n−k) = (∗) · (∗∗).

Then

(∗) = C(n, k) p^k = n(n−1)···(n−k+1)/k! · p^k ≈ n^k p^k / k! → λ^k / k!,

and

(∗∗) = (1 − p)^(n−k) = (1 − np/n)^(n−k) → e^(−λ).

Hence,

P(X = k) → e^(−λ) λ^k / k!,  for k = 0, 1, 2, ....
Approximation is good if n p² is small!

X ~ B(158, 1/365) and n p² = 0.001186:

> # What is the probability that of 158 people, exactly k have a birthday today?
> n = 158; p=1/365;
> round(dbinom(0:7,n,p),5);
[1] 0.64826 0.28139 0.06068 0.00867 0.00092 0.00008 0.00001 0.00000
> round(dpois(0:7,p*n),5);
[1] 0.64864 0.28078 0.06077 0.00877 0.00095 0.00008 0.00001 0.00000

But for n = 10, where n p² = 10/9 is not small:

> n = 10; p=1/3;
> round(dbinom(0:4,n,p),5);
[1] 0.01734 0.08671 0.19509 0.26012 0.22761
> round(dpois(0:4,p*n),5);
[1] 0.03567 0.11891 0.19819 0.22021 0.18351
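The same comparison can be sketched in Python with only the standard library, measuring the worst-case gap between the two pmfs:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pois_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# n*p^2 small: the Poisson approximation is close
n, p = 158, 1/365
gap_small = max(abs(binom_pmf(k, n, p) - pois_pmf(k, n * p)) for k in range(8))

# n*p^2 large: the approximation is visibly off
n, p = 10, 1/3
gap_large = max(abs(binom_pmf(k, n, p) - pois_pmf(k, n * p)) for k in range(8))
print(gap_small, gap_large)  # roughly 6e-4 versus 4e-2
```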
Probability distribution for X ~ B(n, p) and X ~ P(λ)

[Four bar plots: probabilities for X ~ B(158, 1/365), X ~ Poi(158/365), X ~ B(10, 1/3) and X ~ Poi(10/3).]
Probability generating functions

Let X ∈ ℕ and p_i = P(X = i), i = 0, 1, 2, ....

Definition 17. The probability generating function is defined as

Π(s) = p_0 + p_1 s + p_2 s² + p_3 s³ + ....

Example. If X only takes a finite number of values (e.g. X ~ B(n, p)) then Π(s) is a polynomial. Alternatively (e.g. X ~ P(λ)), Π(s) is a power series.
Properties of Π(s)

Let s ∈ [0, 1]. Then
- 0 ≤ Π(s) ≤ 1,
- Π(1) = p_0 + p_1 + ... = 1,
- Π′(s) = p_1 + 2 p_2 s + 3 p_3 s² + ... ≥ 0 for s ≥ 0,
- Π′(1) = p_1 + 2 p_2 + 3 p_3 + ... = E(X) (if E(X) is finite),
- Π″(s) = 2 p_2 + 6 p_3 s + 4 · 3 p_4 s² + ..., so Π″(1) = E(X(X − 1)) and
  Var(X) = E(X²) − (E X)² = Π″(1) + Π′(1) − (Π′(1))².
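These identities can be checked numerically on the three-point distribution from the earlier variance example (values 2, 4, 6 with probabilities 0.1, 0.3, 0.6); a Python sketch:

```python
# pgf derivatives evaluated at s = 1 for a finite distribution
dist = {2: 0.1, 4: 0.3, 6: 0.6}

def pgf_d1(s):
    """First derivative: sum of i * p_i * s^(i-1)."""
    return sum(i * p * s**(i - 1) for i, p in dist.items())

def pgf_d2(s):
    """Second derivative: sum of i * (i-1) * p_i * s^(i-2)."""
    return sum(i * (i - 1) * p * s**(i - 2) for i, p in dist.items())

mean = pgf_d1(1.0)                     # E(X) = 5
var = pgf_d2(1.0) + mean - mean**2     # E(X(X-1)) + E(X) - (E X)^2 = 1.8
print(mean, var)
```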
Example (Poisson distribution). For X ~ P(λ),

Π(s) = Σ_{i=0}^{∞} e^(−λ) λ^i s^i / i! = e^(−λ) Σ_{i=0}^{∞} (λs)^i / i! = e^(−λ) e^(λs) = e^(λ(s−1)).

Hence,

Π′(s) = λ e^(λ(s−1)), so E(X) = λ (= Π′(1)),
Π″(s) = λ² e^(λ(s−1)), so E[X(X − 1)] = λ²,

and

Var(X) = E(X²) − (E X)² = λ² + λ − λ² = λ.
Example (Binomial distribution). Let X ~ B(n, p). First, note that

(x + y)^n = Σ_{i=0}^{n} C(n, i) x^i y^(n−i).   (2)

Then

Π(s) = Σ_{i=0}^{n} s^i C(n, i) p^i (1 − p)^(n−i) = Σ_{i=0}^{n} C(n, i) (sp)^i (1 − p)^(n−i) = (1 − p + ps)^n,

which follows from (2). Then

Π′(s) = np (1 − p + ps)^(n−1), so that Π′(1) = E(X) = np,
Π″(s) = np² (n − 1)(1 − p + ps)^(n−2), so that Π″(1) = np² (n − 1),

and finally,

Var(X) = Π″(1) − (Π′(1))² + Π′(1) = np²(n − 1) − n²p² + np = np(1 − p).
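As a final check, Π′(1) and Π″(1) can be computed by direct summation for a particular binomial (n = 10, p = 0.3 is our arbitrary choice); a Python sketch:

```python
from math import comb

n, p = 10, 0.3
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

d1 = sum(i * pmf[i] for i in range(1, n + 1))             # pgf'(1) = E(X)
d2 = sum(i * (i - 1) * pmf[i] for i in range(2, n + 1))   # pgf''(1) = E[X(X-1)]
var = d2 - d1**2 + d1
print(d1, var)  # n*p = 3 and n*p*(1-p) = 2.1, up to float rounding
```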