

Introduction to Probability Theory

JESPER LARSSON TRÄFF FRANCESCO VERSACI

– Lectures on Parallel Algorithms –

11 November, 2013

F. Versaci (TU Wien) Introduction to Probability Theory 11 November, 2013 1 / 44


References

C.M. Grinstead and J.L. Snell. Introduction to Probability. http://math.dartmouth.edu/~prob/prob/prob.pdf. Amer. Math. Soc., 1997.

M. Loève. Probability Theory I. Springer, 1977.

M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.


Probability Space – 1/2

Sample space Ω
The set of outcomes of some random process

E.g., {Heads, Tails} for a coin toss, or {1, 2, 3, 4, 5, 6} if we roll a die

Measurable events F
A family of subsets of Ω which represents all the possible events for which we would like to compute the probability

E.g., {2, 4, 6} should be in F if we want to compute the probability that, by rolling a die, we get an even number as result

More formally F is a σ-algebra over Ω

We will stick to discrete probability spaces, so we can take F to be the family of all the subsets of Ω, i.e., F = 2^Ω


Probability Space – 2/2

Probability measure Pr : F → R
It assigns probabilities to events

E.g., if we roll a die, the probability to get an even number is one half:

Pr({2, 4, 6}) = 1/2

[Portrait: Andrey Kolmogorov]

σ-algebra F over Ω
E ∈ F ⇒ Ω \ E ∈ F (closed under complementation)
E1, E2, . . . ∈ F ⇒ ⋃i Ei ∈ F (closed under countable unions)
F is non-empty (at least ∅ and Ω are in F)


Probability Measure

Pr : F → R
Non-negativity: ∀E ∈ F, Pr(E) ≥ 0
σ-additivity: for all countable sequences of pairwise disjoint events E1, E2, . . .

Pr(⋃i Ei) = ∑i Pr(Ei)

Normalization: Pr(Ω) = 1
Null empty set: Pr(∅) = 0 (follows from the axioms above)

Banach–Tarski paradox

In general, we define probability on a σ-algebra and not simply on 2^Ω, because if Ω is infinite weird things can happen. E.g., it is possible to divide a sphere in R³ into a finite number of pairwise disjoint subsets and, by recombining these subsets (just by moving and rotating them), get two spheres, each as big as the original one.


Probability of Complementary Events

Let E be an event and Ē := Ω \ E its complement. Then we have

Pr(Ē) = 1 − Pr(E)

Proof. Ω = E ∪ Ē, with E and Ē disjoint. Then

Pr(Ω) = Pr(E) + Pr(Ē) = 1

and hence Pr(Ē) = 1 − Pr(E)


Probability of Non-Disjoint Events – Subadditivity

Two events

Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2)

[Venn diagram: E1, E2, E1 ∩ E2]

Three events

Pr(E1 ∪ E2 ∪ E3) = Pr(E1) + Pr(E2) + Pr(E3)

− Pr(E1 ∩ E2) − Pr(E2 ∩ E3) − Pr(E1 ∩ E3)

+ Pr(E1 ∩ E2 ∩ E3)

n events (general case)

Pr(⋃_{i=1}^{n} Ei) = ∑_{l=1}^{n} (−1)^{l+1} ∑_{1 ≤ i1 < ··· < il ≤ n} Pr(⋂_{r=1}^{l} E_{ir})
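The inclusion–exclusion formula can be checked by brute force on a small discrete space. The following Python sketch (the sample space and the three events are illustrative, not from the slides) compares the direct probability of the union with the alternating sum over all non-empty intersections:

```python
from itertools import combinations

# A small uniform sample space and three illustrative events.
omega = range(1, 13)
events = [set(range(1, 7)),      # E1 = {1..6}
          set(range(4, 10)),     # E2 = {4..9}
          {2, 4, 6, 8, 10, 12}]  # E3 = even outcomes

def pr(event):
    """Uniform probability of an event."""
    return len(event) / len(omega)

# Left-hand side: Pr(E1 ∪ E2 ∪ E3) computed directly.
lhs = pr(set.union(*events))

# Right-hand side: alternating sum over all non-empty intersections.
rhs = 0.0
n = len(events)
for l in range(1, n + 1):
    for idx in combinations(range(n), l):
        inter = set.intersection(*(events[i] for i in idx))
        rhs += (-1) ** (l + 1) * pr(inter)

assert abs(lhs - rhs) < 1e-12
```

Here the union is {1, . . . , 10, 12}, so both sides equal 11/12.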


Independence

Independent events
Two events E and F are independent if and only if

Pr(E ∩ F) = Pr(E) Pr(F)

n events E1, . . . , En are mutually independent if and only if

∀I ⊆ {1, . . . , n}  Pr(⋂_{i∈I} Ei) = ∏_{i∈I} Pr(Ei)

Conditional probability
The conditional probability that event E occurs given that event F occurs is

Pr(E|F) = Pr(E ∩ F) / Pr(F)

We assume Pr(F) > 0

Note
If E and F are two independent events, then

Pr(E|F) = Pr(E)


Law of Total Probability

Simple case

Let E and B be events, and Ē := Ω \ E. Then

Pr(B) = Pr(B ∩ E) + Pr(B ∩ Ē) = Pr(B|E) Pr(E) + Pr(B|Ē) Pr(Ē)

General case
Let E1, . . . , En be mutually disjoint events which partition Ω (i.e., ⋃_{i=1}^{n} Ei = Ω). Then, for all events B,

Pr(B) = ∑_{i=1}^{n} Pr(B ∩ Ei) = ∑_{i=1}^{n} Pr(B|Ei) Pr(Ei)


Bayes’ Law

Simple case

Let E and B be events, and Ē := Ω \ E. Then

Pr(E|B) = Pr(E ∩ B)/Pr(B) = Pr(B|E) Pr(E)/Pr(B) = Pr(B|E) Pr(E) / [Pr(B|E) Pr(E) + Pr(B|Ē) Pr(Ē)]

General case
Let E1, . . . , En be mutually disjoint events which partition Ω (i.e., ⋃_{i=1}^{n} Ei = Ω). Then, for all j and all events B,

Pr(Ej|B) = Pr(Ej ∩ B)/Pr(B) = Pr(B|Ej) Pr(Ej) / ∑_{i=1}^{n} Pr(B|Ei) Pr(Ei)

[Portrait: Thomas Bayes]


Examples

We roll two dice, a white one and a black one. Let Ω1 := {1, 2, 3, 4, 5, 6} and Ω2 := {1, 2, 3, 4, 5, 6}.

What is the global sample space Ω?

Ω = Ω1 × Ω2 (Cartesian product) = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (6, 6)}

The dice are fair. What's the probability that
1. The outcome is a given pair, e.g. (1, 1)? Answer: 1/36
2. White and black outcomes are equal? Answer: 6/36 = 1/6
3. White and black outcomes are different? Answer: 1 − 1/6 = 5/6
4. The maximum of the two outcomes is less than or equal to 3? Answer: (3 · 3)/36 = 1/4
5. White is larger than black? Answer: (5 + 4 + 3 + 2 + 1)/36 = 15/36
6. White is odd? Answer: 1/2
7. Both outcomes are odd? Answer: (3 · 3)/36 = 1/4
8. At least one outcome is odd? Answer: 1/2 + 1/2 − 1/4 = 3/4
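All eight answers can be verified by enumerating the 36 equally likely outcomes; a small Python sketch (the helper `pr` is ours, not from the slides):

```python
from fractions import Fraction
from itertools import product

# Enumerate Ω = Ω1 × Ω2 as (white, black) pairs.
omega = list(product(range(1, 7), repeat=2))

def pr(pred):
    """Probability of the event described by predicate pred(white, black)."""
    hits = sum(1 for w, b in omega if pred(w, b))
    return Fraction(hits, len(omega))

assert pr(lambda w, b: (w, b) == (1, 1)) == Fraction(1, 36)
assert pr(lambda w, b: w == b) == Fraction(1, 6)
assert pr(lambda w, b: w != b) == Fraction(5, 6)
assert pr(lambda w, b: max(w, b) <= 3) == Fraction(1, 4)
assert pr(lambda w, b: w > b) == Fraction(15, 36)
assert pr(lambda w, b: w % 2 == 1) == Fraction(1, 2)
assert pr(lambda w, b: w % 2 == 1 and b % 2 == 1) == Fraction(1, 4)
assert pr(lambda w, b: w % 2 == 1 or b % 2 == 1) == Fraction(3, 4)
```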


Examples

A data packet travels through n = 10 routers. Each relay has probability p = 1% to corrupt the packet. What's the probability Pbad that the packet arrives corrupted at the destination?

Answer
The n events "packet is corrupted at router i" (with 1 ≤ i ≤ n) are independent. It is easier to compute the probability Pok for the packet to arrive unaltered, and then take the complementary event. The probability that at a given relay the packet remains unaltered is 1 − p, and hence

Pok = (1 − p)^n .

Finally we have

Pbad = 1 − Pok = 1 − (1 − p)^n ≈ 9.56% .
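The computation above is a two-liner; a quick sketch:

```python
# Probability that a packet survives n independent relays, each
# corrupting it with probability p (values from the example above).
n, p = 10, 0.01
p_ok = (1 - p) ** n    # unaltered at every relay
p_bad = 1 - p_ok       # corrupted at least once

assert abs(p_bad - 0.0956) < 5e-4  # ≈ 9.56%
```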


Examples

A coin is tossed twice. Consider the following events:
A: Heads on the first toss
B: Heads on the second toss
C: The two tosses come out the same
Are A, B and C pairwise independent? Are they mutually independent?
Answers: Yes, no.

We roll a fair die.
1. What's the probability that the outcome is a given face, say 4? Answer: Pr(4) = 1/6
2. We are told that the outcome is greater than 3. What is now the probability that the outcome is 4? Answer:

Pr(4 | {4, 5, 6}) = Pr({4} ∩ {4, 5, 6}) / Pr({4, 5, 6}) = Pr(4) / Pr({4, 5, 6}) = (1/6)/(1/2) = 1/3
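The "yes, no" answer can be checked by enumeration; the following sketch verifies pairwise independence of A, B, C and the failure of mutual independence:

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two coin tosses.
omega = list(product("HT", repeat=2))

def pr(event):
    return Fraction(sum(1 for o in omega if o in event), len(omega))

A = {o for o in omega if o[0] == "H"}   # heads on the first toss
B = {o for o in omega if o[1] == "H"}   # heads on the second toss
C = {o for o in omega if o[0] == o[1]}  # both tosses the same

# Pairwise independence holds for every pair:
for E, F in [(A, B), (A, C), (B, C)]:
    assert pr(E & F) == pr(E) * pr(F)

# ... but mutual independence fails: Pr(A ∩ B ∩ C) = 1/4, not 1/8.
assert pr(A & B & C) != pr(A) * pr(B) * pr(C)
```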


Examples

A medical test for some disease has probability qF = 1% of false positives and qN = 2% of false negatives. The percentage of the population having the disease is qD = 5%.

1. What is the probability that someone, chosen at random, is positive to the test?

2. What is the probability that someone, who is negative to the test, nonetheless has the disease?


Examples

Answer – 1/2
We consider the following events:

T : The person is positive to the test

D : The person has the disease

We know that

Pr(T|D̄) = qF , Pr(T̄|D) = qN , Pr(D) = qD ,

and we want to find Pr(T). The law of total probability tells us that

Pr(T) = Pr(T|D) Pr(D) + Pr(T|D̄) Pr(D̄) .

Since Pr(T|D) = 1 − Pr(T̄|D) we have

Pr(T) = (1 − qN)qD + qF(1 − qD) = 5.85% .


Examples

Answer – 2/2
We now want to find Pr(D|T̄). Bayes' law gives us

Pr(D|T̄) = Pr(T̄|D) Pr(D) / Pr(T̄) = qN qD / (1 − [(1 − qN)qD + qF(1 − qD)]) ≈ 0.106%

(If you don't test, you have a 5% probability of having the disease; if you test and come out negative, the probability of having the disease drops to about 0.1%.)
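The arithmetic can be checked directly (T̄ is the negative-test event):

```python
# Medical-test example: qF false positives, qN false negatives,
# qD disease prevalence.
q_f, q_n, q_d = 0.01, 0.02, 0.05

# Law of total probability: Pr(T).
pr_t = (1 - q_n) * q_d + q_f * (1 - q_d)
assert abs(pr_t - 0.0585) < 1e-12

# Bayes' law: Pr(D | negative test) = qN·qD / (1 − Pr(T)).
pr_d_given_neg = (q_n * q_d) / (1 - pr_t)
assert abs(pr_d_given_neg - 0.00106) < 5e-5  # ≈ 0.106%
```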


Combinatorics – Permutations

Let A = {a1, . . . , an} be a set of n (distinct) elements.
Let k ≤ n and consider ordered sequences of length k: (ai1 , . . . , aik).
The number of such possible sequences (k-permutations of n) is

P(n, k) = n(n − 1) · · · (n − k + 1) = n!/(n − k)!

In particular, the permutations of n elements are n!

Example
Let A = {♠, ♣, ★}; the ordered sequences of 2 elements are:

(♠, ♣) , (♠, ★) , (♣, ♠) , (♣, ★) , (★, ♠) , (★, ♣) .

P(3, 2) = 3!/1! = 2 · 3 = 6
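The count matches what the standard library produces; a sketch:

```python
from itertools import permutations
from math import factorial

# k-permutations of a 3-element set, matching P(3, 2) = 6.
A = ["♠", "♣", "★"]
k = 2
seqs = list(permutations(A, k))  # all ordered length-k sequences

def P(n, k):
    """Number of k-permutations of n elements: n!/(n-k)!."""
    return factorial(n) // factorial(n - k)

assert len(seqs) == P(len(A), k) == 6
```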


Combinatorics – Combinations

Let A = {a1, . . . , an} be a set of n (distinct) elements.

Let k ≤ n and consider non-ordered selections of k elements: {ai1 , . . . , aik}.

The number of such possible selections (k-combinations of n) is

C(n, k) = (n choose k) = n!/(k!(n − k)!)

Example
Let A = {♠, ♣, ★}; the non-ordered selections of 2 elements are:

{♠, ♣} = {♣, ♠} , {♠, ★} = {★, ♠} , {♣, ★} = {★, ♣} .

C(3, 2) = 3!/(2! 1!) = 6/2 = 3
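Again the standard library agrees; a sketch:

```python
from itertools import combinations
from math import comb

# 2-combinations of a 3-element set, matching C(3, 2) = 3.
A = ["♠", "♣", "★"]
subsets = list(combinations(A, 2))  # unordered pairs

assert len(subsets) == comb(3, 2) == 3
```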


Random Variables

A random variable X on a sample space Ω is a real function on Ω:

X : Ω→ R

Remark
In general we should also require the preimage of Borel sets to be mapped into the σ-algebra of Ω (i.e., X should be measurable), but that's not an issue for discrete probability spaces.

Independence of random variables
Two random variables X and Y are independent if and only if

∀x∀y Pr ((X = x) ∩ (Y = y)) = Pr(X = x) Pr(Y = y)


Random Variables – Example

We roll two dice, thus having Ω = {(1, 1), (1, 2), . . . , (6, 6)} as sample space. Consider the random variable X = product of the two outcomes.

We have, e.g., X(2, 3) = 6 and X(3, 6) = 18.

We write X = a to refer to the set {ω ∈ Ω : X(ω) = a}

Pr(X = 12) = Pr({(2, 6), (3, 4), (4, 3), (6, 2)}) = 1/9
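The event {X = 12} can be enumerated explicitly; a sketch:

```python
from fractions import Fraction
from itertools import product

# X = product of the two dice outcomes.
omega = list(product(range(1, 7), repeat=2))
X = {w: w[0] * w[1] for w in omega}

event = {w for w in omega if X[w] == 12}  # the set {X = 12}
assert event == {(2, 6), (3, 4), (4, 3), (6, 2)}
assert Fraction(len(event), len(omega)) == Fraction(1, 9)
```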


Probability Distribution Functions

A random variable X is typically defined using some distribution functions:

Discrete random variables
Non-cumulative: probability mass function (or pmf)
fX(a) := Pr(X = a)
Cumulative: cumulative distribution function (or cdf)
FX(a) := Pr(X ≤ a)

Continuous random variables
Non-cumulative: probability density function (or pdf)
∫_a^b fX(t) dt = Pr(a < X ≤ b)
Cumulative: cumulative distribution function (or cdf)
FX(a) := Pr(X ≤ a) = ∫_{−∞}^{a} fX(t) dt


Expectation

The expectation (or expected value) E[X] of a random variable X is

Discrete r.v.

E[X] := ∑_{x∈X} x fX(x)

The sum is done over the image of X: X := {x ∈ R : ∃ω ∈ Ω s.t. X(ω) = x}.

Continuous r.v.

E[X] := ∫_{−∞}^{+∞} t fX(t) dt

Absolute convergence of the series/integral is required

Example
A die is rolled. If the outcome is a prime number, you win 10€, otherwise you lose 4€. What's the expected value of the game?

E[X] = −4·(1/6) + 10·(1/6) + 10·(1/6) − 4·(1/6) + 10·(1/6) − 4·(1/6) = −4·(1/2) + 10·(1/2) = 3
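A sketch of this computation (primes on a die are 2, 3 and 5):

```python
from fractions import Fraction

# Die game: win 10 on a prime outcome, lose 4 otherwise.
payoff = {o: (10 if o in (2, 3, 5) else -4) for o in range(1, 7)}
e = sum(Fraction(1, 6) * v for v in payoff.values())

assert e == 3
```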


Sum of Random Variables

Let X and Y be two random variables, defined on the sample spaces ΩX and ΩY, and let a, b and c be real parameters. Let

ΩZ = ΩX × ΩY , Z : ΩZ → R , Z = aX + bY + c .

Then Z is a random variable.

Linearity of expectation
Furthermore, we have

E[Z] = E[aX + bY + c] = a E[X] + b E[Y] + c

Note: this holds even if X and Y are not independent.


Product of Random Variables

Let X and Y be two random variables, defined on the sample spaces ΩX and ΩY.

Definition
Let

ΩZ = ΩX × ΩY , Z : ΩZ → R , Z = XY .

Then Z is a random variable.

Expectation
In general, we have

E[Z] = E[XY] ≠ E[X] E[Y]

Note: but if X and Y are independent, then E[XY] = E[X]E[Y].
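Both cases can be seen on dice: E[XY] = E[X]E[Y] for two independent dice, while for the fully dependent pair X, X we get E[X²] ≠ (E[X])². A sketch:

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice (white w, black b), uniform over 36 pairs.
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))

EX = sum(p * w for w, b in omega)        # = 7/2
EY = sum(p * b for w, b in omega)        # = 7/2
EXY = sum(p * w * b for w, b in omega)
assert EXY == EX * EY                    # independent case: equality

# Dependent case: Z = X·X for a single die.
EXX = sum(Fraction(1, 6) * x * x for x in range(1, 7))
assert EXX != EX * EX                    # E[X²] = 91/6 ≠ 49/4
```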


Functions of a Random Variable

Let X be a random variable on the sample space Ω and let g : R → R be a (measurable) function. Then g(X) is also a random variable.

Theorem (Law of the unconscious statistician)

E[g(X)] = ∑_{x∈X} g(x) fX(x)   (continuous case: E[g(X)] = ∫_{−∞}^{+∞} g(t) fX(t) dt)

Note
In general, E[g(X)] ≠ g(E[X]).

Jensen's inequality
If g is a convex function (e.g., g : x ↦ x²),

E[g(X)] ≥ g(E[X])


Conditional Expectation

Let X and Y be two discrete random variables and y ∈ R. We define the conditional expectation of X, given Y = y, as

E[X|Y = y] = ∑_{x∈X} x Pr(X = x|Y = y)

E[X] = ∑_{y∈Y} fY(y) E[X|Y = y]   (total probability)

E[∑_{i=1}^{n} Xi | Y = y] = ∑_{i=1}^{n} E[Xi|Y = y]   (linearity)

Continuous r.v.
The extension to continuous random variables is somewhat more complicated, and since we do not need it we are going to skip it. . .


Examples

We roll 4 dice. What is the expectation of the sum of the outcomes?

Answer:

E[Z] = E[∑_{i=1}^{4} Xi] = ∑_{i=1}^{4} E[Xi] = 4 · 3.5 = 14

We pay 10€ to play a game: two dice are rolled and we win (in €) the sum of the two outcomes. Additionally, if the two outcomes are equal, we win a further 12€. What's the expected value of the game?

Answer: Let ω1 and ω2 be the two outcomes. Consider the following random variables:

X(ω1, ω2) = ω1 + ω2 ,  Y(ω1, ω2) = 12 if ω1 = ω2, 0 otherwise

Note that X and Y are not independent. However, because of the linearity of expectation we have

E[−10 + X + Y] = −10 + E[X] + E[Y] = −10 + 7 + 2 = −1


Examples

We roll one die. If the outcome is 6, then we also roll a second die. What is the expectation of the sum of the outcomes?
Answer: Let
X describe the first outcome (i.e., X(1) = 1, X(2) = 2, etc.)
Y describe the second outcome (with Y = 0 if the second die is not rolled)
Again, X and Y are not independent. We want to compute E[X + Y] = E[X] + E[Y]. E[X] is clearly equal to 3.5; as for E[Y],

E[Y] = fX(6) E[Y|X = 6] + (1 − fX(6)) E[Y|X ≠ 6]

Since E[Y|X = 6] = 3.5 and E[Y|X ≠ 6] = 0, we finally obtain

E[X + Y] = 3.5 + 3.5/6 = 49/12 ≈ 4.08
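The compound experiment is small enough to enumerate exactly; a sketch confirming E[X + Y] = 49/12:

```python
from fractions import Fraction

# A second die is rolled only when the first die shows 6.
e = Fraction(0)
for d1 in range(1, 7):
    if d1 != 6:
        e += Fraction(1, 6) * d1              # Y = 0 on this branch
    else:
        for d2 in range(1, 7):
            e += Fraction(1, 36) * (d1 + d2)  # second die rolled

assert e == Fraction(49, 12)
```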


Binomial Distribution – Definition

Bernoulli indicator
An experiment has probability p to succeed and 1 − p to fail (e.g., we toss a biased coin). We define the following random variable X:

X := 1 if the experiment succeeds, 0 otherwise

Binomial distribution
We repeat the Bernoulli experiment n times (independently, and with the same distribution). We now want to count the number of successful experiments, and hence define the new random variable Y:

Y := ∑_{i=1}^{n} Xi


Binomial Distribution – Properties

Expected values
Bernoulli: ∀i, E[Xi] = 1 · p + 0 · (1 − p) = p
Binomial: E[Y] = E[∑_{i=1}^{n} Xi] = ∑_{i=1}^{n} E[Xi] = np

Distribution function of the binomial r.v.

fY(k) = Pr(Y = k) = (n choose k) p^k (1 − p)^{n−k}

Reminder: binomial theorem

∀x ∀y, (x + y)^n = ∑_{k=0}^{n} (n choose k) x^k y^{n−k}
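Both the normalization (via the binomial theorem with x = p, y = 1 − p) and the mean E[Y] = np can be checked numerically; the values n = 10, p = 0.3 are illustrative:

```python
from math import comb

# Binomial pmf f_Y(k) = C(n, k) p^k (1-p)^(n-k).
n, p = 10, 0.3

def pmf(k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

total = sum(pmf(k) for k in range(n + 1))
mean = sum(k * pmf(k) for k in range(n + 1))

assert abs(total - 1) < 1e-12   # pmf sums to one
assert abs(mean - n * p) < 1e-12  # E[Y] = np
```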


Geometric Distribution – Definition

We repeat a Bernoulli trials process until an experiment succeeds. Let X be the random variable which gives the number of trials we have to perform until the experiment finally succeeds.

Distribution function of the geometric r.v.

fX(n) = Pr(X = n) = (1 − p)^{n−1} p

Why geometric?
Let's prove that fX is really a distribution (i.e., that it sums to one). Let q := 1 − p.

∑_{n=1}^{∞} fX(n) = ∑_{n=1}^{∞} (1 − p)^{n−1} p = p ∑_{n=0}^{∞} q^n = p/(1 − q) = 1


Geometric Distribution – Properties

Expectation

E[X] = ∑_{n=1}^{∞} n fX(n) = ∑_{n=1}^{∞} n(1 − p)^{n−1} p
     = p ∑_{n=1}^{∞} n q^{n−1} = p ∑_{n=1}^{∞} d(q^n)/dq
     = p d/dq ∑_{n=1}^{∞} q^n = p d/dq [1/(1 − q) − 1] = p d/dq [q/(1 − q)]
     = p · 1/(1 − q)² = 1/p

The geometric r.v. is memoryless

Pr(X = n + k | X > k) = Pr(X = n)

Which means that if some number has not come out for 20 weeks in the Lotto game, it isn't more likely to be extracted. . .
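Both properties can be checked numerically (the value p = 0.25 is illustrative; the expectation sum is truncated, which is fine since the tail is geometrically small):

```python
# Geometric distribution with success probability p.
p = 0.25
q = 1 - p

def pmf(n):
    return q ** (n - 1) * p

# Expectation: the truncated sum of n·f_X(n) approaches 1/p = 4.
mean = sum(n * pmf(n) for n in range(1, 2000))
assert abs(mean - 1 / p) < 1e-9

# Memorylessness: Pr(X = n + k | X > k) = Pr(X = n).
n, k = 3, 5
pr_x_gt_k = q ** k  # the first k trials all fail
assert abs(pmf(n + k) / pr_x_gt_k - pmf(n)) < 1e-12
```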


Example: Coupon Collector’s Problem

There are n different coupons and we want to collect all of them. Each day we get a new (uniformly random) coupon. How long does it take, on average, to finish the collection?

Answer – 1/2
Consider the following random variables:

Xi: number of coupons we get while we have exactly i − 1 distinct coupons

X: number of coupons we get until we have all the coupons

We have

X = ∑_{i=1}^{n} Xi ⇒ E[X] = ∑_{i=1}^{n} E[Xi]

Each Xi is a geometric r.v., with

pi = 1 − (i − 1)/n = (n − i + 1)/n


Example: Coupon Collector’s Problem

Answer – 2/2
We finally have

E[X_i] = 1/p_i = n/(n − i + 1)

E[X] = ∑_{i=1}^n E[X_i] = ∑_{i=1}^n n/(n − i + 1) = n ∑_{i=1}^n 1/i = n H_n

If n = 80 coupons, then E[X] ≈ 397 days.

Harmonic numbers

H_n := ∑_{i=1}^n 1/i = ln n + γ + O(1/n)

with γ ≈ 0.577 being the Euler–Mascheroni constant
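A Monte Carlo sketch of the result above (an illustration added here, assuming plain uniform coupon draws as in the slides): the average collection time over many runs should land close to n · H_n, i.e. about 397 for n = 80.

```python
import random

# Simulate collecting n distinct coupons by uniform daily draws.
def collect(n: int, rng: random.Random) -> int:
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n, trials = 80, 2000
rng = random.Random(0)
avg = sum(collect(n, rng) for _ in range(trials)) / trials
expected = n * sum(1 / i for i in range(1, n + 1))  # n * H_80 ≈ 397
assert abs(avg - expected) / expected < 0.05        # within 5% of n * H_n
```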


Example: Randomized Quicksort

When sorting n elements, mergesort makes, in both the worst and average cases, M(n) = n log₂(n) + O(n) comparisons. We implement quicksort choosing the pivot uniformly at random. What is the average number of comparisons made by quicksort?

Answer – 1/3
Let [x₁, x₂, . . . , x_n] be the sorted list of elements and, ∀j ∀i < j, let X_ij be the following Bernoulli random variable

X_ij := 1 if x_i and x_j get compared during the execution, 0 otherwise

and let X be

X := ∑_{i=1}^{n−1} ∑_{j=i+1}^n X_ij

We want to compute E[X].


Example: Randomized Quicksort

Answer – 2/3
By linearity of expectation we have

E[X] = ∑_{i=1}^{n−1} ∑_{j=i+1}^n E[X_ij],

and then we just need to compute the probability that a generic pair (x_i, x_j) gets compared during the execution of the algorithm. Consider the segment [x_i, x_{i+1}, . . . , x_j]; x_i and x_j are compared if and only if one of the two is chosen as pivot before all the intermediate values x_{i+1}, . . . , x_{j−1} (otherwise they are split into different classes and never compared against each other). The probability for this to happen is

p_ij = 2/(j − i + 1)  ⇒  E[X_ij] = 2/(j − i + 1)


Example: Randomized Quicksort

Answer – 3/3

E[X] = ∑_{i=1}^{n−1} ∑_{j=i+1}^n E[X_ij] = ∑_{i=1}^{n−1} ∑_{j=i+1}^n 2/(j − i + 1)
     = ∑_{i=1}^{n−1} ∑_{k=2}^{n−i+1} 2/k = ∑_{k=2}^n ∑_{i=1}^{n+1−k} 2/k = ∑_{k=2}^n 2(n + 1 − k)/k
     = (n + 1) [∑_{k=2}^n 2/k] − 2(n − 1) = 2(n + 1) [∑_{k=1}^n 1/k] − 4n
     = 2(n + 1) H_n − 4n = 2n ln(n) + O(n)
     = (2/log₂(e)) n log₂(n) + O(n) ≈ 1.39 M(n)

Randomized quicksort makes about 39% more comparisons than mergesort in the average case.
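The algebra above can be checked deterministically (a check added here, not in the slides): the double sum of 2/(j − i + 1) must coincide with the closed form 2(n + 1)H_n − 4n for every n.

```python
# Compare the raw double sum with the closed form 2(n+1) H_n - 4n.
def expected_comparisons(n: int) -> float:
    return sum(2 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))

def harmonic(n: int) -> float:
    return sum(1 / k for k in range(1, n + 1))

for n in (2, 10, 100):
    assert abs(expected_comparisons(n) - (2 * (n + 1) * harmonic(n) - 4 * n)) < 1e-9
```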


Moments

Let X be a discrete random variable.

Raw moments
The n-th (raw) moment of X is

E[Xⁿ] = ∑_{x∈X} xⁿ f_X(x)

Central moments
The n-th central moment of X is

E[(X − E[X])ⁿ] = ∑_{x∈X} (x − µ)ⁿ f_X(x)

where µ := E[X].


Variance

Definition

Var[X] = σ²[X] := E[(X − E[X])²]

Note

Var[X] = E[(X − E[X])²] = E[X² − 2X E[X] + (E[X])²]
       = E[X²] − 2 E[X] E[X] + (E[X])² = E[X²] − (E[X])²

Sum of independent variables
Let X₁, X₂, . . . , X_n be mutually independent random variables. Then

Var[∑_{i=1}^n X_i] = ∑_{i=1}^n Var[X_i]
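The identity Var[X] = E[X²] − (E[X])² can be verified exactly on a small example (added here for illustration): a fair die, where both forms give 91/6 − (7/2)² = 35/12.

```python
from fractions import Fraction

# Exact-arithmetic check of Var[X] = E[X^2] - (E[X])^2 for a fair die.
vals = [Fraction(k) for k in range(1, 7)]
p = Fraction(1, 6)
mean = sum(v * p for v in vals)                  # 7/2
var_def = sum((v - mean)**2 * p for v in vals)   # E[(X - E[X])^2]
var_alt = sum(v**2 * p for v in vals) - mean**2  # E[X^2] - (E[X])^2
assert var_def == var_alt == Fraction(35, 12)
```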


Examples

Bernoulli distribution

Var[X] = p(1 − p)

Binomial distribution

Var[X] = np(1 − p)

Geometric distribution

Var[X] = (1 − p)/p²
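All three formulas can be reproduced numerically from the pmfs themselves (a sketch added here, with the geometric series truncated at a point where the tail is negligible):

```python
from math import comb

# Variance computed directly from a pmf given as (value, probability) pairs.
def var_from_pmf(pairs):
    mean = sum(x * f for x, f in pairs)
    return sum((x - mean)**2 * f for x, f in pairs)

p, n = 0.3, 12
bern = [(0, 1 - p), (1, p)]
binom = [(k, comb(n, k) * p**k * (1 - p)**(n - k)) for k in range(n + 1)]
geom = [(k, (1 - p)**(k - 1) * p) for k in range(1, 400)]  # truncated tail

assert abs(var_from_pmf(bern) - p * (1 - p)) < 1e-12
assert abs(var_from_pmf(binom) - n * p * (1 - p)) < 1e-9
assert abs(var_from_pmf(geom) - (1 - p) / p**2) < 1e-6
```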


Markov’s Inequality

Theorem
Let X be a non-negative r.v. Then,

∀a ∈ ℝ, a > 0:  Pr(X ≥ a) ≤ E[X]/a

Example
A coin is tossed 100 times. Let X be the number of heads outcomes (a binomial variable with p = 1/2). We have

E[X] = 50,  Var[X] = 100 · (1/2) · (1/2) = 25

An upper bound on the probability that we have at least 80 heads is given by

Pr(X ≥ 80) ≤ E[X]/80 = 5/8 = 62.5%
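Comparing the bound with the exact binomial tail (a check added here, not in the slides) shows how loose Markov's inequality is for this example:

```python
from math import comb

# Exact tail Pr(X >= 80) for Binomial(100, 1/2) vs Markov's bound E[X]/80.
n, p, a = 100, 0.5, 80
exact_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))
markov = (n * p) / a  # E[X]/a = 50/80 = 5/8
assert exact_tail <= markov
assert abs(markov - 0.625) < 1e-12
```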


Chebyshev’s Inequality

Theorem
Let X be a r.v. Then,

∀a ∈ ℝ, a > 0:  Pr(|X − E[X]| ≥ a) ≤ Var[X]/a²

Example
Again, a coin is tossed 100 times. An upper bound on the probability that we have at least 80 heads is given by (using the symmetry of X around its mean)

Pr(X ≥ 80) = (1/2) Pr(|X − E[X]| ≥ 30) ≤ (1/2) · Var[X]/30² = (1/2) · 25/900 = 1/72 ≈ 1.39%


Chernoff Bound – For the Sum of Poisson Trials

Theorem

X := ∑_{i=1}^n X_i, with X_i Bernoulli r.v. with parameter p_i

Let µ := E[X]; then

∀δ ∈ ]0, 1[:  Pr(|X − µ| ≥ δµ) ≤ 2 exp(−µδ²/3)

Example
Again, a coin is tossed 100 times. Give an upper bound on the probability that we have at least 80 heads. Let µδ = 30 ⇒ δ = 3/5; then, using the one-sided version of the bound,

Pr(X ≥ 80) ≤ exp(−µδ²/3) = exp(−(50 · 9)/(3 · 25)) = exp(−6) ≈ 0.248%
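Putting the three slides' example together (a side-by-side comparison added here, not in the slides): for Pr(X ≥ 80) with 100 fair tosses, the exact tail is far below all three bounds, and the bounds tighten from Markov to Chebyshev to Chernoff.

```python
from math import comb, exp

# Exact tail vs Markov, Chebyshev (with symmetry), and one-sided Chernoff.
n, p, a = 100, 0.5, 80
mu = n * p                                           # 50
exact = sum(comb(n, k) * 0.5**n for k in range(a, n + 1))
markov = mu / a                                      # 5/8 = 0.625
chebyshev = 0.5 * (n * p * (1 - p)) / (a - mu)**2    # 1/72 ≈ 0.0139
chernoff = exp(-mu * ((a - mu) / mu)**2 / 3)         # exp(-6) ≈ 0.00248
assert exact <= chernoff <= chebyshev <= markov
```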

