Introduction to Probability Theory
JESPER LARSSON TRÄFF FRANCESCO VERSACI
– Lectures on Parallel Algorithms –
11 November, 2013
F. Versaci (TU Wien) Introduction to Probability Theory 11 November, 2013 1 / 44
References
C.M. Grinstead and J.L. Snell. Introduction to Probability. http://math.dartmouth.edu/~prob/prob/prob.pdf. Amer. Math. Soc., 1997.
M. Loève. Probability Theory I. Springer, 1977.
M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
Probability Space – 1/2
Sample space Ω
The set of outcomes of some random process.
E.g., {Heads, Tails} for a coin toss, or {1, 2, 3, 4, 5, 6} if we roll a die.

Measurable events F
A family of subsets of Ω which represents all the possible events for which we would like to compute the probability.
E.g., {2, 4, 6} should be in F if we want to compute the probability that, by rolling a die, we get an even number as a result.

More formally, F is a σ-algebra over Ω.
We will stick to discrete probability spaces, so we can take F to be the family of all subsets of Ω, i.e., F = 2^Ω.
Probability Space – 2/2
Probability measure Pr
Pr : F → R. It assigns probabilities to events.
E.g., if we roll a die, the probability to get an even number is one half:

Pr({2, 4, 6}) = 1/2

[Portrait: Andrey Kolmogorov]

σ-algebra F over Ω
F is non-empty (at least ∅ and Ω are in F)
E ∈ F ⇒ Ω \ E ∈ F (closed under complementation)
E1, E2, . . . ∈ F ⇒ ⋃i Ei ∈ F (closed under countable unions)
Probability Measure
Pr : F → R
Non-negativity: ∀E ∈ F, Pr(E) ≥ 0
σ-additivity: for all countable sequences of pairwise disjoint events E1, E2, . . .

Pr(⋃i Ei) = ∑i Pr(Ei)

Normalization: Pr(Ω) = 1
Null empty set: Pr(∅) = 0 (follows from the axioms above)
Banach–Tarski paradox
In general, we define probability on a σ-algebra and not simply on 2^Ω because, if Ω is infinite, weird things can happen. E.g., it is possible to divide a sphere in R³ into a finite number of pairwise disjoint subsets and, by recombining these subsets (just by moving and rotating them), get two spheres, each as big as the original one.
Probability of Complementary Events
Let E be an event and Ē := Ω \ E its complement. Then we have

Pr(Ē) = 1 − Pr(E)

Proof.
Ω = E ∪ Ē, with E and Ē disjoint. Then

1 = Pr(Ω) = Pr(E) + Pr(Ē)

and hence Pr(Ē) = 1 − Pr(E).
Probability of Non-Disjoint Events – Inclusion–Exclusion

Two events

Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)

[Venn diagram: E1, E2, and their overlap E1 ∩ E2]

Three events

Pr(E1 ∪ E2 ∪ E3) = Pr(E1) + Pr(E2) + Pr(E3)
− Pr(E1 ∩ E2) − Pr(E2 ∩ E3) − Pr(E1 ∩ E3)
+ Pr(E1 ∩ E2 ∩ E3)

n events (general case)

Pr(⋃_{i=1}^n Ei) = ∑_{l=1}^n (−1)^{l+1} ∑_{1 ≤ i1 < ··· < il ≤ n} Pr(⋂_{r=1}^l E_{i_r})
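The general formula can be checked by brute force on a small finite space. A minimal sketch in Python (the three events on a two-dice space are my own choice, not from the slides):

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two fair dice, all 36 outcomes equally likely.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event, given as a set of outcomes."""
    return Fraction(len(event), len(omega))

# Three overlapping events (chosen for illustration).
E1 = {w for w in omega if w[0] % 2 == 1}       # first die is odd
E2 = {w for w in omega if w[1] % 2 == 1}       # second die is odd
E3 = {w for w in omega if w[0] + w[1] >= 10}   # sum is at least 10

events = [E1, E2, E3]
n = len(events)

# Inclusion-exclusion: alternating sum over all non-empty index subsets.
incl_excl = Fraction(0)
for l in range(1, n + 1):
    for idx in combinations(range(n), l):
        inter = set.intersection(*(events[i] for i in idx))
        incl_excl += (-1) ** (l + 1) * pr(inter)

direct = pr(E1 | E2 | E3)
assert incl_excl == direct
```

Exact rational arithmetic (`Fraction`) makes the equality check exact rather than approximate.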
Independence
Independent events
Two events E and F are independent if and only if

Pr(E ∩ F) = Pr(E) Pr(F)

n events E1, . . . , En are mutually independent if and only if

∀I ⊆ {1, . . . , n}   Pr(⋂_{i∈I} Ei) = ∏_{i∈I} Pr(Ei)

Conditional probability
The conditional probability that event E occurs given that event F occurs is

Pr(E|F) = Pr(E ∩ F) / Pr(F)

We assume Pr(F) > 0.

Note
If E and F are two independent events, then

Pr(E|F) = Pr(E)
Law of Total Probability
Simple case
Let E and B be events, and Ē := Ω \ E. Then

Pr(B) = Pr(B ∩ E) + Pr(B ∩ Ē) = Pr(B|E) Pr(E) + Pr(B|Ē) Pr(Ē)

General case
Let E1, . . . , En be mutually disjoint events which partition Ω (i.e., ⋃_{i=1}^n Ei = Ω). Then, for all events B,

Pr(B) = ∑_{i=1}^n Pr(B ∩ Ei) = ∑_{i=1}^n Pr(B|Ei) Pr(Ei)
Bayes’ Law
Simple case
Let E and B be events, and Ē := Ω \ E. Then

Pr(E|B) = Pr(E ∩ B) / Pr(B) = Pr(B|E) Pr(E) / Pr(B)
        = Pr(B|E) Pr(E) / (Pr(B|E) Pr(E) + Pr(B|Ē) Pr(Ē))

General case
Let E1, . . . , En be mutually disjoint events which partition Ω (i.e., ⋃_{i=1}^n Ei = Ω). Then, for all j and all events B,

Pr(Ej|B) = Pr(Ej ∩ B) / Pr(B) = Pr(B|Ej) Pr(Ej) / ∑_{i=1}^n Pr(B|Ei) Pr(Ei)

[Portrait: Thomas Bayes]
Examples
We roll two dice, a white one and a black one. Let Ω1 := {1, 2, 3, 4, 5, 6} and Ω2 := {1, 2, 3, 4, 5, 6}.

What is the global sample space Ω?

Ω = Ω1 × Ω2 (Cartesian product) = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (6, 6)}

The dice are fair. What's the probability that
1. The outcome is (1, 1)? Answer: 1/36
2. White and black outcomes are equal? Answer: 6/36 = 1/6
3. White and black outcomes are different? Answer: 1 − 1/6 = 5/6
4. The maximum of the two outcomes is less than or equal to 3? Answer: (3 · 3)/36 = 1/4
5. White is larger than black? Answer: (5 + 4 + 3 + 2 + 1)/36 = 15/36
6. White is odd? Answer: 1/2
7. Both outcomes are odd? Answer: (3 · 3)/36 = 1/4
8. At least one outcome is odd? Answer: 1/2 + 1/2 − 1/4 = 3/4
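All eight answers can be verified by enumerating the 36 equally likely outcomes; a sketch (the helper `pr` is my own naming):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # (white, black)

def pr(pred):
    """Probability that a predicate holds over the 36 equally likely outcomes."""
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))

assert pr(lambda w: w == (1, 1)) == Fraction(1, 36)
assert pr(lambda w: w[0] == w[1]) == Fraction(1, 6)
assert pr(lambda w: w[0] != w[1]) == Fraction(5, 6)
assert pr(lambda w: max(w) <= 3) == Fraction(1, 4)
assert pr(lambda w: w[0] > w[1]) == Fraction(15, 36)
assert pr(lambda w: w[0] % 2 == 1) == Fraction(1, 2)
assert pr(lambda w: w[0] % 2 == 1 and w[1] % 2 == 1) == Fraction(1, 4)
assert pr(lambda w: w[0] % 2 == 1 or w[1] % 2 == 1) == Fraction(3, 4)
```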
Examples
A data packet travels through n = 10 routers. Each relay has probability p = 1% of corrupting the packet. What's the probability Pbad that the packet arrives corrupted at the destination?

Answer
The n events "packet is corrupted at router i" (with 1 ≤ i ≤ n) are independent. It is easier to compute the probability Pok for the packet to arrive unaltered, and then take the complementary event. The probability that at a given relay the packet remains unaltered is 1 − p, and hence

Pok = (1 − p)^n .

Finally we have

Pbad = 1 − Pok = 1 − (1 − p)^n ≈ 9.56% .
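The numeric value follows directly from the formula:

```python
n, p = 10, 0.01

# Probability the packet survives all n relays unaltered, then its complement.
p_ok = (1 - p) ** n
p_bad = 1 - p_ok

assert round(p_bad, 4) == 0.0956
```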
Examples
A coin is tossed twice. Consider the following events:
A: Heads on the first toss
B: Heads on the second toss
C: The two tosses come out the same
Are A, B and C pairwise independent? Are they mutually independent?
Answers: Yes, no.

We roll a fair die.
1. What's the probability that the outcome is 6? Answer: Pr(6) = 1/6
2. We are told that the outcome is greater than 3. What is now the probability that the outcome is 6?
Answer:

Pr(6 | {4, 5, 6}) = Pr({6} ∩ {4, 5, 6}) / Pr({4, 5, 6}) = Pr(6) / Pr({4, 5, 6}) = (1/6)/(1/2) = 1/3
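The "yes, no" answer for the coin example can be checked mechanically: every pair factorizes, but the triple intersection does not. A sketch:

```python
from fractions import Fraction
from itertools import product

omega = list(product(['H', 'T'], repeat=2))  # two coin tosses

def pr(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 'H'}   # heads on the first toss
B = {w for w in omega if w[1] == 'H'}   # heads on the second toss
C = {w for w in omega if w[0] == w[1]}  # the two tosses come out the same

# Pairwise independent: each pair factorizes...
for E, F in [(A, B), (A, C), (B, C)]:
    assert pr(E & F) == pr(E) * pr(F)

# ...but not mutually independent: the triple intersection does not.
assert pr(A & B & C) != pr(A) * pr(B) * pr(C)
```

Here Pr(A ∩ B ∩ C) = 1/4, while Pr(A) Pr(B) Pr(C) = 1/8.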
Examples
A medical test for some disease has probability qF = 1% of false positives and qN = 2% of false negatives. The percentage of the population having the disease is qD = 5%.

1. What is the probability that someone, chosen at random, is positive to the test?
2. What is the probability that someone, who is negative to the test, nonetheless has the disease?
Examples
Answer – 1/2
We consider the following events:

T : The person is positive to the test
D : The person has the disease

We know that

Pr(T | D̄) = qF ,   Pr(T̄ | D) = qN ,   Pr(D) = qD ,

and we want to find Pr(T). The law of total probability tells us that

Pr(T) = Pr(T | D) Pr(D) + Pr(T | D̄) Pr(D̄) .

Since Pr(T | D) = 1 − Pr(T̄ | D) we have

Pr(T) = (1 − qN) qD + qF (1 − qD) = 5.85% .
Examples
Answer – 2/2
We now want to find Pr(D | T̄). Bayes' law gives us

Pr(D | T̄) = Pr(T̄ | D) Pr(D) / Pr(T̄) = qN qD / (1 − ((1 − qN) qD + qF (1 − qD))) ≈ 0.11%

(If you don't test, you have a 5% probability of having the disease; if you test and come out negative, the probability drops to about 0.1%.)
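The two answers follow from plugging the numbers into the formulas above; note that qN qD / (1 − Pr(T)) comes out at about 0.11%, not 1% (variable names are mine):

```python
qF, qN, qD = 0.01, 0.02, 0.05

# Law of total probability: Pr(T) = Pr(T|D) Pr(D) + Pr(T|not D) Pr(not D)
p_T = (1 - qN) * qD + qF * (1 - qD)

# Bayes: Pr(D | negative test) = Pr(not T | D) Pr(D) / Pr(not T)
p_D_given_negative = qN * qD / (1 - p_T)

assert abs(p_T - 0.0585) < 1e-9
assert round(p_D_given_negative, 5) == 0.00106  # about 0.11%
```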
Combinatorics – Permutations

Let A = {a1, . . . , an} be a set of n (distinct) elements.
Let k ≤ n and consider the ordered sequences of length k: (a_{i1}, . . . , a_{ik}).
The number of such possible sequences (k-permutations of n) is

P(n, k) = n(n − 1) · · · (n − k + 1) = n! / (n − k)!

In particular, the number of permutations of n elements is n!.

Example
Let A = {♠, ♣, ♥}; the ordered sequences of 2 elements are:

(♠, ♣) , (♠, ♥) , (♣, ♠) , (♣, ♥) , (♥, ♠) , (♥, ♣) .

P(3, 2) = 3!/1! = 2 · 3 = 6
Combinatorics – Combinations

Let A = {a1, . . . , an} be a set of n (distinct) elements.
Let k ≤ n and consider the non-ordered selections of k elements: {a_{i1}, . . . , a_{ik}}.
The number of such possible selections (k-combinations of n) is

C(n, k) = (n choose k) = n! / (k!(n − k)!)

Example
Let A = {♠, ♣, ♥}; the non-ordered selections of 2 elements are:

{♠, ♣} = {♣, ♠} , {♠, ♥} = {♥, ♠} , {♣, ♥} = {♥, ♣} .

C(3, 2) = (3 choose 2) = 3!/(1! 2!) = 6/2 = 3
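Both counts on the two slides above can be checked against Python's standard library, which enumerates k-permutations and k-combinations directly:

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

A = ['spade', 'club', 'heart']

# k-permutations: ordered sequences of k distinct elements.
assert len(list(permutations(A, 2))) == perm(3, 2) == factorial(3) // factorial(1) == 6

# k-combinations: non-ordered selections of k distinct elements.
assert len(list(combinations(A, 2))) == comb(3, 2) == 3
```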
Random Variables
A random variable X on a sample space Ω is a real function on Ω:

X : Ω → R

Remark
In general we should also require the preimage of every Borel set to lie in the σ-algebra of Ω (i.e., X should be measurable), but that's not an issue for discrete probability spaces.

Independence of random variables
Two random variables X and Y are independent if and only if

∀x ∀y   Pr((X = x) ∩ (Y = y)) = Pr(X = x) Pr(Y = y)
Random Variables – Example

We roll two dice, thus having Ω = {(1, 1), (1, 2), . . . , (6, 6)} as sample space. Consider the random variable X = product of the two outcomes.

We have, e.g., X(2, 3) = 6 and X(3, 6) = 18.

We write X = a to refer to the set {ω ∈ Ω : X(ω) = a}.

Pr(X = 12) = Pr({(2, 6), (3, 4), (4, 3), (6, 2)}) = 4/36 = 1/9
Probability Distribution Functions
A random variable X is typically defined using some distribution functions:

Discrete random variables
Non-cumulative: probability mass function (pmf)
fX(a) := Pr(X = a)
Cumulative: cumulative distribution function (cdf)
FX(a) := Pr(X ≤ a)

Continuous random variables
Non-cumulative: probability density function (pdf)
∫_a^b fX(t) dt = Pr(a < X ≤ b)
Cumulative: cumulative distribution function (cdf)
FX(a) := Pr(X ≤ a) = ∫_{−∞}^a fX(t) dt
Expectation
The expectation (or expected value) E[X] of a random variable X is

Discrete r.v.
E[X] := ∑_{x∈X} x fX(x)
The sum is done over the image of X:
X := {x ∈ R : ∃ω ∈ Ω s.t. X(ω) = x}.

Continuous r.v.
E[X] := ∫_{−∞}^{+∞} t fX(t) dt

Absolute convergence of the series/integral is required.

Example
A die is rolled. If the outcome is a prime number, you win 10€, otherwise you lose 4€. What's the expected value of the game?

E[X] = −4 · (1/6) + 10 · (1/6) + 10 · (1/6) − 4 · (1/6) + 10 · (1/6) − 4 · (1/6) = −4 · (1/2) + 10 · (1/2) = 3
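The die-game expectation is a direct instance of the discrete formula (the primes on a die are 2, 3 and 5):

```python
from fractions import Fraction

# Payoff per outcome: +10 if prime (2, 3, 5), -4 otherwise (1, 4, 6).
payoff = {1: -4, 2: 10, 3: 10, 4: -4, 5: 10, 6: -4}

# E[X] = sum over outcomes of value * probability (each outcome has mass 1/6).
expected = sum(Fraction(1, 6) * v for v in payoff.values())
assert expected == 3
```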
Sum of Random Variables
Let X and Y be two random variables, defined on the sample spaces ΩX and ΩY, and let a, b and c be real parameters. Let

ΩZ = ΩX × ΩY ,   Z : ΩZ → R ,   Z = aX + bY + c .

Then Z is a random variable.

Linearity of expectation
Furthermore, we have

E[Z] = E[aX + bY + c] = a E[X] + b E[Y] + c

Note: this holds even if X and Y are not independent.
Product of Random Variables
Let X and Y be two random variables, defined on the sample spaces ΩX and ΩY.

Definition
Let

ΩZ = ΩX × ΩY ,   Z : ΩZ → R ,   Z = XY .

Then Z is a random variable.

Expectation
In general, we have

E[Z] = E[XY] ≠ E[X] E[Y]

Note: but if X and Y are independent, then E[XY] = E[X] E[Y].
Functions of a Random Variable
Let X be a random variable on the sample space Ω and let g : R → R be a (measurable) function. Then g(X) is also a random variable.

Theorem (Law of the unconscious statistician)

E[g(X)] = ∑_{x∈X} g(x) fX(x)   ( E[g(X)] = ∫_{−∞}^{+∞} g(t) fX(t) dt for continuous r.v. )

Note
In general, E[g(X)] ≠ g(E[X]).

Jensen's inequality
If g is a convex function (e.g., g : x ↦ x²), then

E[g(X)] ≥ g(E[X])
Conditional Expectation
Let X and Y be two discrete random variables and y ∈ R. We define the conditional expectation of X, given Y = y, as

E[X | Y = y] = ∑_{x∈X} x Pr(X = x | Y = y)

E[X] = ∑_{y∈Y} fY(y) E[X | Y = y]   (total probability)

E[∑_{i=1}^n Xi | Y = y] = ∑_{i=1}^n E[Xi | Y = y]   (linearity)

Continuous r.v.
The extension to continuous random variables is somewhat more complicated, and since we do not need it we are going to skip it. . .
Examples
We roll 4 dice. What is the expectation of the sum of the outcomes?
Answer:

E[Z] = E[∑_{i=1}^4 Xi] = ∑_{i=1}^4 E[Xi] = 4 · 3.5 = 14

We pay 10€ to play a game: two dice are rolled and we win (in €) the sum of the two outcomes. Additionally, if the two outcomes are equal, we win a further 12€. What's the expected value of the game?
Answer: Let ω1 and ω2 be the two outcomes. Consider the following random variables:

X(ω1, ω2) = ω1 + ω2 ,   Y(ω1, ω2) = 12 if ω1 = ω2, 0 otherwise.

Note that X and Y are not independent. However, because of the linearity of expectation we have

E[−10 + X + Y] = −10 + E[X] + E[Y] = −10 + 7 + 2 = −1
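The dice-game computation can be reproduced by enumerating the 36 outcomes; linearity holds even though X and Y are dependent:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def expectation(rv):
    """E[rv] over 36 equally likely outcomes."""
    return sum(Fraction(1, len(omega)) * rv(w) for w in omega)

X = lambda w: w[0] + w[1]                 # sum of the two outcomes
Y = lambda w: 12 if w[0] == w[1] else 0   # bonus for equal outcomes

assert expectation(X) == 7
assert expectation(Y) == 2
# Linearity of expectation, despite X and Y being dependent:
assert expectation(lambda w: -10 + X(w) + Y(w)) == -1
```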
Examples
We roll one die. If the outcome is 6, then we also roll a second die. What is the expectation of the sum of the outcomes?
Answer: Let
X describe the first outcome (i.e., X(1) = 1, X(2) = 2, etc.);
Y describe the second outcome (with Y = 0 if the second die is not rolled).
Again, X and Y are not independent. We want to compute E[X + Y] = E[X] + E[Y]. E[X] is clearly equal to 3.5; as for E[Y],

E[Y] = fX(6) E[Y | X = 6] + (1 − fX(6)) E[Y | X ≠ 6]

Since E[Y | X = 6] = 3.5 and E[Y | X ≠ 6] = 0, we finally obtain

E[X + Y] = 3.5 + 3.5/6 = 49/12 ≈ 4.08
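One way to check this exactly is to model the second die as always rolled but only counted when the first shows 6 (this modeling choice is mine; it leaves the distribution of the sum unchanged):

```python
from fractions import Fraction
from itertools import product

# (first die, potential second die); the second only counts if the first is 6.
omega = list(product(range(1, 7), repeat=2))

def total(w):
    first, second = w
    return first + (second if first == 6 else 0)

e = sum(Fraction(1, len(omega)) * total(w) for w in omega)
assert e == Fraction(49, 12)  # = 3.5 + 3.5/6 ≈ 4.08
```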
Binomial Distribution – Definition

Bernoulli indicator
An experiment has probability p of succeeding and 1 − p of failing (e.g., we toss a biased coin). We define the following random variable X:

X := 1 if the experiment succeeds, 0 otherwise

Binomial distribution
We repeat the Bernoulli experiment n times (independently, and with the same distribution). We now want to count the number of successful experiments, and hence define the new random variable Y:

Y := ∑_{i=1}^n Xi
Binomial Distribution – Properties

Expected values
Bernoulli: ∀i, E[Xi] = 1 · p + 0 · (1 − p) = p
Binomial: E[Y] = E[∑_{i=1}^n Xi] = ∑_{i=1}^n E[Xi] = np

Distribution function of the binomial r.v.

fY(k) = Pr(Y = k) = (n choose k) p^k (1 − p)^{n−k}

Reminder: binomial theorem

∀x ∀y   (x + y)^n = ∑_{k=0}^n (n choose k) x^k y^{n−k}
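The pmf and the mean np can be checked exactly with rational arithmetic; normalization is precisely the binomial theorem with x = p, y = 1 − p:

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, k, p):
    """fY(k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, Fraction(1, 4)

# The pmf sums to one (binomial theorem)...
assert sum(binom_pmf(n, k, p) for k in range(n + 1)) == 1
# ...and the mean is np.
assert sum(k * binom_pmf(n, k, p) for k in range(n + 1)) == n * p
```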
Geometric Distribution – Definition

We repeat a Bernoulli trials process until an experiment succeeds. Let X be the random variable which gives the number of trials we have to perform until the experiment finally succeeds.

Distribution function of the geometric r.v.

fX(n) = Pr(X = n) = (1 − p)^{n−1} p

Why geometric?
Let's prove that fX is really a distribution (i.e., that it sums to one). Let q := 1 − p.

∑_{n=1}^∞ fX(n) = ∑_{n=1}^∞ (1 − p)^{n−1} p = p ∑_{n=0}^∞ q^n = p / (1 − q) = 1
Geometric Distribution – Properties

Expectation

E[X] = ∑_{n=1}^∞ n fX(n) = ∑_{n=1}^∞ n (1 − p)^{n−1} p
     = p ∑_{n=1}^∞ n q^{n−1} = p ∑_{n=1}^∞ d(q^n)/dq = p d/dq [∑_{n=1}^∞ q^n]
     = p d/dq [1/(1 − q) − 1] = p d/dq [q/(1 − q)] = p / (1 − q)² = 1/p

The geometric r.v. is memoryless

Pr(X = n + k | X > k) = Pr(X = n)

Which means that if some number has not come out for 20 weeks in the Lotto game, it isn't more likely to be extracted. . .
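Both the memoryless property and E[X] = 1/p can be checked numerically, e.g. with p = 1/6 (one success per die roll, a parameter choice of mine):

```python
from fractions import Fraction

p = Fraction(1, 6)
q = 1 - p

def geom_pmf(n):
    """fX(n) = (1 - p)^(n - 1) p, for n = 1, 2, ..."""
    return q ** (n - 1) * p

# Memorylessness: Pr(X > k) = q^k, so
# Pr(X = n + k | X > k) = fX(n + k) / q^k = fX(n), exactly.
for k in range(1, 5):
    for n in range(1, 5):
        assert geom_pmf(n + k) / q**k == geom_pmf(n)

# Expectation 1/p: a long partial sum converges quickly to 6.
approx = sum(n * geom_pmf(n) for n in range(1, 500))
assert abs(approx - 1 / p) < Fraction(1, 10**6)
```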
Example: Coupon Collector’s Problem
There are n different coupons and we want to collect all of them. Each day we get a new (uniformly random) coupon. How long does it take, on average, to finish the collection?

Answer – 1/2
Consider the following random variables:

Xi : number of coupons we get while we have exactly i − 1 distinct coupons
X : number of coupons we get until we have all the coupons

We have

X = ∑_{i=1}^n Xi  ⇒  E[X] = ∑_{i=1}^n E[Xi]

Each Xi is a geometric r.v., with

pi = 1 − (i − 1)/n = (n − i + 1)/n
Example: Coupon Collector’s Problem
Answer – 2/2
We finally have

E[Xi] = 1/pi = n/(n − i + 1)

E[X] = ∑_{i=1}^n E[Xi] = ∑_{i=1}^n n/(n − i + 1) = n ∑_{i=1}^n 1/i = n Hn

If n = 80 coupons, then E[X] ≈ 397 days.

Harmonic numbers

Hn := ∑_{i=1}^n 1/i = ln n + γ + O(1/n)

with γ ≈ 0.577 being the Euler–Mascheroni constant.
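The n Hn formula gives the quoted ≈ 397 days for n = 80 directly:

```python
from fractions import Fraction

def expected_days(n):
    """E[X] = n * H_n: the sum of the n geometric phase expectations."""
    return n * sum(Fraction(1, i) for i in range(1, n + 1))

e80 = expected_days(80)
assert 397 <= e80 <= 398  # about 397 days for an 80-coupon collection
```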
Example: Randomized Quicksort
When sorting n elements, mergesort makes, in both the worst and average cases, M(n) = n log₂(n) + O(n) comparisons. We implement quicksort choosing the pivot uniformly at random. What is the average number of comparisons made by quicksort?

Answer – 1/3
Let [x1, x2, . . . , xn] be the sorted list of elements and, for all i < j, let Xij be the following Bernoulli random variable:

Xij := 1 if xi and xj get compared during the execution, 0 otherwise

and let X be

X := ∑_{i=1}^{n−1} ∑_{j=i+1}^n Xij

We want to compute E[X].
Example: Randomized Quicksort
Answer – 2/3
By linearity of expectation we have

E[X] = ∑_{i=1}^{n−1} ∑_{j=i+1}^n E[Xij] ,

and then we just need to compute the probability that a generic pair (xi, xj) gets compared during the execution of the algorithm. Consider the segment [xi, xi+1, . . . , xj]; xi and xj are compared if and only if one of the two is chosen as pivot before all the intermediate values xi+1, . . . , xj−1 (otherwise they are split into different classes and never compared against each other). The probability for this to happen is

pij = 2/(j − i + 1)  ⇒  E[Xij] = 2/(j − i + 1)
Example: Randomized Quicksort
Answer – 3/3

E[X] = ∑_{i=1}^{n−1} ∑_{j=i+1}^n E[Xij] = ∑_{i=1}^{n−1} ∑_{j=i+1}^n 2/(j − i + 1)
     = ∑_{i=1}^{n−1} ∑_{k=2}^{n−i+1} 2/k = ∑_{k=2}^n ∑_{i=1}^{n+1−k} 2/k = ∑_{k=2}^n 2(n + 1 − k)/k
     = (n + 1) ∑_{k=2}^n 2/k − 2(n − 1) = 2(n + 1) ∑_{k=1}^n 1/k − 4n
     = 2(n + 1) Hn − 4n = 2n ln(n) + O(n)
     = (2/log₂(e)) n log₂(n) + O(n) ≈ 1.39 M(n)

Randomized quicksort makes about 39% more comparisons than mergesort in the average case.
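The closed form 2(n + 1) Hn − 4n can be verified against the double sum over pairs exactly, for a concrete n (function names are mine):

```python
from fractions import Fraction

def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))

def expected_comparisons(n):
    """E[X] = 2(n + 1) H_n - 4n: expected comparisons of randomized quicksort."""
    return 2 * (n + 1) * harmonic(n) - 4 * n

# The double sum over all pairs (i, j), i < j, agrees with the closed form.
n = 50
double_sum = sum(Fraction(2, j - i + 1)
                 for i in range(1, n) for j in range(i + 1, n + 1))
assert double_sum == expected_comparisons(n)
```

The 1.39 factor is the asymptotic ratio of the leading terms; for moderate n, the −4n term makes the actual ratio noticeably smaller.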
Moments
Let X be a discrete random variable.

Raw moments
The n-th (raw) moment of X is

E[Xⁿ] = ∑_{x∈X} xⁿ fX(x)

Central moments
The n-th central moment of X is

E[(X − E[X])ⁿ] = ∑_{x∈X} (x − µ)ⁿ fX(x)

where µ := E[X].
Variance
Definition

Var[X] = σ²[X] := E[(X − E[X])²]

Note

Var[X] = E[(X − E[X])²] = E[X² − 2X E[X] + (E[X])²]
       = E[X²] − 2 E[X] E[X] + (E[X])² = E[X²] − (E[X])²

Sum of independent variables
Let X1, X2, . . . , Xn be mutually independent random variables. Then

Var[∑_{i=1}^n Xi] = ∑_{i=1}^n Var[Xi]
Examples
Bernoulli distribution
Var[X] = p(1 − p)

Binomial distribution
Var[X] = np(1 − p)

Geometric distribution
Var[X] = (1 − p)/p²
Markov’s Inequality
Theorem
Let X be a non-negative r.v. Then,

∀a ∈ R, a > 0:  Pr(X ≥ a) ≤ E[X]/a

Example
A coin is tossed 100 times. Let X be the number of heads outcomes (a binomial variable with p = 1/2). We have

E[X] = 50 ,   Var[X] = 100 · (1/2) · (1/2) = 25

An upper bound on the probability that we have at least 80 heads is given by

Pr(X ≥ 80) ≤ E[X]/80 = 5/8 = 62.5%
Chebyshev’s Inequality
Theorem
Let X be a r.v. Then,

∀a ∈ R, a > 0:  Pr(|X − E[X]| ≥ a) ≤ Var[X]/a²

Example
Again, a coin is tossed 100 times. By the symmetry of the binomial distribution with p = 1/2, Pr(X ≥ 80) = (1/2) Pr(|X − E[X]| ≥ 30), so an upper bound on the probability that we have at least 80 heads is given by

Pr(X ≥ 80) = (1/2) Pr(|X − E[X]| ≥ 30) ≤ (1/2) · Var[X]/30² = (1/2) · 25/900 = 1/72 ≈ 1.39%
Chernoff Bound – For the Sum of Poisson Trials

Theorem
X := ∑_{i=1}^n Xi, with Xi Bernoulli r.v. with parameter pi. Let µ := E[X]. Then,

∀δ ∈ ]0, 1[:  Pr(|X − µ| ≥ δµ) ≤ 2 exp(−µδ²/3)

Example
Again, a coin is tossed 100 times. Give an upper bound on the probability that we have at least 80 heads. Let µδ = 30 ⇒ δ = 3/5; then, using the one-sided version of the bound,

Pr(X ≥ 80) ≤ exp(−µδ²/3) = exp(−(50 · 9)/(3 · 25)) = exp(−6) ≈ 0.248%
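The three bounds on Pr(X ≥ 80) from the last three slides can be compared to the exact binomial tail; each successive bound is sharper, and all dominate the true probability (variable names are mine):

```python
from math import comb, exp

n, p = 100, 0.5
mu, var = n * p, n * p * (1 - p)  # 50 and 25

# Exact tail: Pr(X >= 80) for X ~ Binomial(100, 1/2).
exact = sum(comb(n, k) for k in range(80, n + 1)) / 2**n

markov = mu / 80                   # 5/8 = 62.5%
chebyshev = var / 30**2 / 2        # 1/72 ~ 1.39%, using symmetry
delta = 30 / mu
chernoff = exp(-mu * delta**2 / 3) # exp(-6) ~ 0.248%

# Bounds get sharper, and all dominate the exact probability.
assert exact < chernoff < chebyshev < markov
```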