194
Basic Stochastics and the idea of testing Dirk Metzler http://evol.bio.lmu.de/_statgen October 10, 2017 1 / 61

Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Basic Stochastics and the idea of testing

Dirk Metzler

http://evol.bio.lmu.de/_statgen

October 10, 2017

1 / 61

Page 2: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Assume a small population of 100 individuals, and a neutralallele A that has frequency 0.3 in this generation.

What will be the frequency X of A in the next generation?

We don’t know, as X is a random variable .

However, we can ask, for example, forEX , the expectation value of X , or for

Pr(X = 0.32) , the probability that X takes a value of 0.32.Even these values (especially the second on) depend on ourmodel assumptions.

2 / 61

Page 3: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Assume a small population of 100 individuals, and a neutralallele A that has frequency 0.3 in this generation.

What will be the frequency X of A in the next generation?

We don’t know, as X is a random variable .

However, we can ask, for example, forEX , the expectation value of X , or for

Pr(X = 0.32) , the probability that X takes a value of 0.32.Even these values (especially the second on) depend on ourmodel assumptions.

2 / 61

Page 4: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Assume a small population of 100 individuals, and a neutralallele A that has frequency 0.3 in this generation.

What will be the frequency X of A in the next generation?

We don’t know, as X is a random variable .

However, we can ask, for example, forEX , the expectation value of X , or for

Pr(X = 0.32) , the probability that X takes a value of 0.32.Even these values (especially the second on) depend on ourmodel assumptions.

2 / 61

Page 5: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Assume a small population of 100 individuals, and a neutralallele A that has frequency 0.3 in this generation.

What will be the frequency X of A in the next generation?

We don’t know, as X is a random variable .

However, we can ask, for example, forEX , the expectation value of X ,

or forPr(X = 0.32) , the probability that X takes a value of 0.32.Even these values (especially the second on) depend on ourmodel assumptions.

2 / 61

Page 6: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Assume a small population of 100 individuals, and a neutralallele A that has frequency 0.3 in this generation.

What will be the frequency X of A in the next generation?

We don’t know, as X is a random variable .

However, we can ask, for example, forEX , the expectation value of X , or for

Pr(X = 0.32) , the probability that X takes a value of 0.32.

Even these values (especially the second on) depend on ourmodel assumptions.

2 / 61

Page 7: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Assume a small population of 100 individuals, and a neutralallele A that has frequency 0.3 in this generation.

What will be the frequency X of A in the next generation?

We don’t know, as X is a random variable .

However, we can ask, for example, forEX , the expectation value of X , or for

Pr(X = 0.32) , the probability that X takes a value of 0.32.Even these values (especially the second on) depend on ourmodel assumptions.

2 / 61

Page 8: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

1 Random Variables and Distributions

2 The binomial distribution

3 Principle of statistical testing

4 Expectation value

5 Variance and Correlation

3 / 61

Page 9: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Contents

1 Random Variables and Distributions

2 The binomial distribution

3 Principle of statistical testing

4 Expectation value

5 Variance and Correlation

4 / 61

Page 10: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

We start with a simpler Example: Rolling a dice, W is the result ofthe next trial.

S = {1,2, . . . ,6}Pr(W = 1) = · · · = Pr(W = 6) = 1

6( Pr(W = x) = 1

6 for all x ∈ {1, . . . ,6} )

5 / 61

Page 11: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

A Random Variable is aresult of a random incident or experiment.

The state space Sof a random variable is the set of possible values.

The distribution of a random variable Xassigns to each set A ⊆ Sthe probability Pr(X ∈ A)that X takes a value in A.

In general, we use capitals forrandom variables (X ,Y ,Z , . . . ),

and small letters (x , y , z, . . . ) for (possible) fixed values.

6 / 61

Page 12: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

A Random Variable is aresult of a random incident or experiment.

The state space Sof a random variable is the set of possible values.

The distribution of a random variable Xassigns to each set A ⊆ Sthe probability Pr(X ∈ A)that X takes a value in A.

In general, we use capitals forrandom variables (X ,Y ,Z , . . . ),

and small letters (x , y , z, . . . ) for (possible) fixed values.

6 / 61

Page 13: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

A Random Variable is aresult of a random incident or experiment.

The state space Sof a random variable is the set of possible values.

The distribution of a random variable Xassigns to each set A ⊆ Sthe probability Pr(X ∈ A)that X takes a value in A.

In general, we use capitals forrandom variables (X ,Y ,Z , . . . ),

and small letters (x , y , z, . . . ) for (possible) fixed values.

6 / 61

Page 14: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

A Random Variable is aresult of a random incident or experiment.

The state space Sof a random variable is the set of possible values.

The distribution of a random variable Xassigns to each set A ⊆ Sthe probability Pr(X ∈ A)that X takes a value in A.

In general, we use capitals forrandom variables (X ,Y ,Z , . . . ),

and small letters (x , y , z, . . . ) for (possible) fixed values.

6 / 61

Page 15: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Notations for events

The event U that X takes a value in A can be written with curlybrackets:

U = {X ∈ A}

We can interpret this as the set of results (elementary events) forwhich the event is fulfilled.

Thus, events have a lot in common with sets, and similar notations asfor sets are used for events U and V :

U ∩ V = U “and” V

is the event that takes place if and only if both U and V take place.

U ∪ V = U “or”V

is the event that takes place if and only if U or V (or both) take place.

7 / 61

Page 16: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Notations for events

The event U that X takes a value in A can be written with curlybrackets:

U = {X ∈ A}

We can interpret this as the set of results (elementary events) forwhich the event is fulfilled.Thus, events have a lot in common with sets, and similar notations asfor sets are used for events U and V :

U ∩ V = U “and” V

is the event that takes place if and only if both U and V take place.

U ∪ V = U “or”V

is the event that takes place if and only if U or V (or both) take place.

7 / 61

Page 17: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

ExampleLet X and Y be the results of two dice rolls, A = {1,2,3}, andB = {1,3,5}. Then:

{X ∈ A} ∩ {X ∈ B} = {X ∈ A ∩ B} = {X ∈ {1,3}}= {X = 1} ∪ {X = 3}

and

{Y ∈ A} ∪ {Y ∈ B} = {Y ∈ A ∪ B} = {Y ∈ {1,2,3,5}}

and{X ∈ A} ∩ {Y ∈ B} = {(X ,Y ) ∈ A× B},where

A×B = {(1,1), (1,3), (1,5), (2,1), (2,3), (2,5), (3,1), (3,3), (3,5)}.

8 / 61

Page 18: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

ExampleLet X and Y be the results of two dice rolls, A = {1,2,3}, andB = {1,3,5}. Then:

{X ∈ A} ∩ {X ∈ B} = {X ∈ A ∩ B} = {X ∈ {1,3}}= {X = 1} ∪ {X = 3}

and

{Y ∈ A} ∪ {Y ∈ B} = {Y ∈ A ∪ B} = {Y ∈ {1,2,3,5}}

and{X ∈ A} ∩ {Y ∈ B} =

{(X ,Y ) ∈ A× B},where

A×B = {(1,1), (1,3), (1,5), (2,1), (2,3), (2,5), (3,1), (3,3), (3,5)}.

8 / 61

Page 19: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

ExampleLet X and Y be the results of two dice rolls, A = {1,2,3}, andB = {1,3,5}. Then:

{X ∈ A} ∩ {X ∈ B} = {X ∈ A ∩ B} = {X ∈ {1,3}}= {X = 1} ∪ {X = 3}

and

{Y ∈ A} ∪ {Y ∈ B} = {Y ∈ A ∪ B} = {Y ∈ {1,2,3,5}}

and{X ∈ A} ∩ {Y ∈ B} = {(X ,Y ) ∈ A× B},

where

A×B = {(1,1), (1,3), (1,5), (2,1), (2,3), (2,5), (3,1), (3,3), (3,5)}.

8 / 61

Page 20: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

ExampleLet X and Y be the results of two dice rolls, A = {1,2,3}, andB = {1,3,5}. Then:

{X ∈ A} ∩ {X ∈ B} = {X ∈ A ∩ B} = {X ∈ {1,3}}= {X = 1} ∪ {X = 3}

and

{Y ∈ A} ∪ {Y ∈ B} = {Y ∈ A ∪ B} = {Y ∈ {1,2,3,5}}

and{X ∈ A} ∩ {Y ∈ B} = {(X ,Y ) ∈ A× B},where

A×B = {(1,1), (1,3), (1,5), (2,1), (2,3), (2,5), (3,1), (3,3), (3,5)}.

8 / 61

Page 21: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

The intersection

{X ∈ A} ∩ {X ∈ B} = {X ∈ A,X ∈ B} = {X ∈ A ∩ B}

is then the event that X takes a value that is in A and in B.

The join{X ∈ A} ∪ {X ∈ B} = {X ∈ A ∪ B}

is the event that the event that X takes a value in A or in B (orboth).Sometimes the curly brackets are not written:

Pr(X ∈ A,X ∈ B) = Pr({X ∈ A,X ∈ B})

9 / 61

Page 22: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

The intersection

{X ∈ A} ∩ {X ∈ B} = {X ∈ A,X ∈ B} = {X ∈ A ∩ B}

is then the event that X takes a value that is in A and in B.The join

{X ∈ A} ∪ {X ∈ B} = {X ∈ A ∪ B}

is the event that the event that X takes a value in A or in B (orboth).

Sometimes the curly brackets are not written:

Pr(X ∈ A,X ∈ B) = Pr({X ∈ A,X ∈ B})

9 / 61

Page 23: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

The intersection

{X ∈ A} ∩ {X ∈ B} = {X ∈ A,X ∈ B} = {X ∈ A ∩ B}

is then the event that X takes a value that is in A and in B.The join

{X ∈ A} ∪ {X ∈ B} = {X ∈ A ∪ B}

is the event that the event that X takes a value in A or in B (orboth).Sometimes the curly brackets are not written:

Pr(X ∈ A,X ∈ B) = Pr({X ∈ A,X ∈ B})

9 / 61

Page 24: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Calculation rules:

Example Rolling a dice W :

Pr(W ∈ {2, 3}) =26

=16+

16

= Pr(W = 2) + Pr(W = 3)

Pr(W ∈ {1, 2} ∪ {3, 4}) =46

=26+

26

= Pr(W ∈ {1, 2}) + Pr(W ∈ {3, 4})

Caution:

Pr(W ∈ {2, 3}) + Pr(W ∈ {3, 4}) =26+

26

=46

6= Pr(W ∈ {2, 3, 4}) =36

10 / 61

Page 25: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Example: rolling two dice (W1,W2):Let W1 and W2 the result of dice 1 and dice 2.

Pr(W1 ∈ {4},W2 ∈ {2,3,4})= Pr((W1,W2) ∈ {(4,2), (4,3), (4,4)})

=336

=16· 3

6= Pr(W1 ∈ {4}) · Pr(W2 ∈ {2,3,4})

In general:

Pr(W1 ∈ A,W2 ∈ B) = Pr(W1 ∈ A) · Pr(W2 ∈ B)

for all sets A,B ⊆ {1,2, . . . ,6}

11 / 61

Page 26: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

If S is the sum of the results S = W1 + W2,what is the probability that S = 5,if dice 1 shows W1 = 2?

Pr(S = 5|W1 = 2) != Pr(W2 = 3)

= 16 = 1/36

1/6 = Pr(S=5,W1=2)Pr(W1=2)

What is the probability S ∈ {4,5} under the condition W1 ∈ {1,6}?

Pr(S ∈ {4,5}|W1 ∈ {1,6})

=Pr(S ∈ {4,5},W1 ∈ {1,6})

Pr(W1 ∈ {1,6})

=Pr(W2 ∈ {3,4},W1 = 1)

Pr(W1 ∈ {1,6})

=2/362/6

=16

12 / 61

Page 27: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

If S is the sum of the results S = W1 + W2,what is the probability that S = 5,if dice 1 shows W1 = 2?

Pr(S = 5|W1 = 2) != Pr(W2 = 3)

= 16 = 1/36

1/6 = Pr(S=5,W1=2)Pr(W1=2)

What is the probability S ∈ {4,5} under the condition W1 ∈ {1,6}?

Pr(S ∈ {4,5}|W1 ∈ {1,6})

=Pr(S ∈ {4,5},W1 ∈ {1,6})

Pr(W1 ∈ {1,6})

=Pr(W2 ∈ {3,4},W1 = 1)

Pr(W1 ∈ {1,6})

=2/362/6

=16

12 / 61

Page 28: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Calculation rules:

0 ≤ Pr(U) ≤ 1 for each event U (in the probability space).

Pr(X ∈ S) = 1 if X is a random variable with state space S.

If the events U and V exclude each other, then

Pr(U ∪ V ) = Pr(U) + Pr(V )

The general rule is the inclusion-exclusion formula

Pr(U ∪ V ) = Pr(U) + Pr(V )− Pr(U ∩ V )

Definition of conditional probabilities:The probability of U under the condition V

Pr(U|V ) :=Pr(U,V )

Pr(V )

“Conditional probability of U given V ”

Note: Pr(U, V ) = Pr(V ) · Pr(U| V )

13 / 61

Page 29: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Calculation rules:

0 ≤ Pr(U) ≤ 1 for each event U (in the probability space).

Pr(X ∈ S) = 1 if X is a random variable with state space S.

If the events U and V exclude each other, then

Pr(U ∪ V ) = Pr(U) + Pr(V )

The general rule is the inclusion-exclusion formula

Pr(U ∪ V ) = Pr(U) + Pr(V )− Pr(U ∩ V )

Definition of conditional probabilities:The probability of U under the condition V

Pr(U|V ) :=Pr(U,V )

Pr(V )

“Conditional probability of U given V ”

Note: Pr(U, V ) = Pr(V ) · Pr(U| V )

13 / 61

Page 30: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

How to say

Pr(X ∈ A, Y ∈ B) = Pr(X ∈ A) · Pr(Y ∈ B | X ∈ A)

in words:

The probability of {X ∈ A, Y ∈ B} can be computed in two steps:

First, the event {X ∈ A} must take place.

Multiply its probability with the conditional probability of {Y ∈ B},given that {X ∈ A} is already known to take place.

14 / 61

Page 31: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

How to say

Pr(X ∈ A, Y ∈ B) = Pr(X ∈ A) · Pr(Y ∈ B | X ∈ A)

in words:

The probability of {X ∈ A, Y ∈ B} can be computed in two steps:

First, the event {X ∈ A} must take place.

Multiply its probability with the conditional probability of {Y ∈ B},given that {X ∈ A} is already known to take place.

14 / 61

Page 32: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Stochastic Independence of events

Definition (stochastic independence)Two events U and V are (stochastically) independent, if the identity

Pr(U,V ) = Pr(U) · Pr(V )

holds for all events U, V .

Note that Pr(U,V ) = Pr(U) · Pr(V ) is equivalent to

Pr(U|V ) = Pr(U) and also to Pr(V |U) = Pr(V )

15 / 61

Page 33: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Stochastic Independence of events

Definition (stochastic independence)Two events U and V are (stochastically) independent, if the identity

Pr(U,V ) = Pr(U) · Pr(V )

holds for all events U, V .

Note that Pr(U,V ) = Pr(U) · Pr(V ) is equivalent to

Pr(U|V ) = Pr(U) and also to Pr(V |U) = Pr(V )

15 / 61

Page 34: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Stochastic Independence of randomvariables

Definition (stochastic independence)Two random variables X and Y are (stochastically) independent, if theidentity

Pr(X ∈ A,Y ∈ B) = Pr(X ∈ A) · Pr(Y ∈ B)

holds for all (measurable) subsets A and B of the state spaces of Xand Y .

Example:

Tossing two dice:X = result dice 1, Y = result dice 2.

Pr(X = 2, Y = 5) =1

36=

16· 1

6= Pr(X = 2) · Pr(Y = 5)

16 / 61

Page 35: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Random Variables and Distributions

Stochastic Independence of randomvariables

Definition (stochastic independence)Two random variables X and Y are (stochastically) independent, if theidentity

Pr(X ∈ A,Y ∈ B) = Pr(X ∈ A) · Pr(Y ∈ B)

holds for all (measurable) subsets A and B of the state spaces of Xand Y .

Example:

Tossing two dice:X = result dice 1, Y = result dice 2.

Pr(X = 2, Y = 5) =1

36=

16· 1

6= Pr(X = 2) · Pr(Y = 5)

16 / 61

Page 36: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Contents

1 Random Variables and Distributions

2 The binomial distribution

3 Principle of statistical testing

4 Expectation value

5 Variance and Correlation

17 / 61

Page 37: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Bernoulli distribution

A Bernoulli experiment is an experiment with two possible oucomes“success” and “fail”, or 1 or 0.

Bernoulli random variable X :State space S = {0,1}.Distribution:

Pr(X = 1) = pPr(X = 0) = 1− p

The parameter p ∈ [0,1] is the success probability.

18 / 61

Page 38: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Bernoulli distribution

A Bernoulli experiment is an experiment with two possible oucomes“success” and “fail”, or 1 or 0.

Bernoulli random variable X :State space S = {0,1}.Distribution:

Pr(X = 1) = pPr(X = 0) = 1− p

The parameter p ∈ [0,1] is the success probability.

18 / 61

Page 39: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Bernoulli distribution

A Bernoulli experiment is an experiment with two possible oucomes“success” and “fail”, or 1 or 0.

Bernoulli random variable X :State space S = {0,1}.Distribution:

Pr(X = 1) = pPr(X = 0) = 1− p

The parameter p ∈ [0,1] is the success probability.

18 / 61

Page 40: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Bernoulli distribution

Examples:

Tossing a coin: Possible outcomes are “head” and “tail”

Does the Drosophila have a mutation that causes white eyes?Possible outcomes are “yes” or “no”.

The sex of a newborn child has the values “male” or “female”.

19 / 61

Page 41: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Bernoulli distribution

Examples:

Tossing a coin: Possible outcomes are “head” and “tail”

Does the Drosophila have a mutation that causes white eyes?Possible outcomes are “yes” or “no”.

The sex of a newborn child has the values “male” or “female”.

19 / 61

Page 42: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Bernoulli distribution

Examples:

Tossing a coin: Possible outcomes are “head” and “tail”

Does the Drosophila have a mutation that causes white eyes?Possible outcomes are “yes” or “no”.

The sex of a newborn child has the values “male” or “female”.

19 / 61

Page 43: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Assume a Bernoulli experiment (for example tossing a coin) withsuccess probability p is repeated n times independently.

What is the probability that it...

1 ...alway succeeds?p · p · p · · · p = pn

2 ...always fails?

(1− p) · (1− p) · · · (1− p) = (1− p)n

3 ...first succeeds k times and then fails n − k times?

pk · (1− p)n−k

4 ...succeeds in total k times and fails the other n − k times?(nk

)· pk · (1− p)n−k

20 / 61

Page 44: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Assume a Bernoulli experiment (for example tossing a coin) withsuccess probability p is repeated n times independently.What is the probability that it...

1 ...alway succeeds?

p · p · p · · · p = pn

2 ...always fails?

(1− p) · (1− p) · · · (1− p) = (1− p)n

3 ...first succeeds k times and then fails n − k times?

pk · (1− p)n−k

4 ...succeeds in total k times and fails the other n − k times?(nk

)· pk · (1− p)n−k

20 / 61

Page 45: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Assume a Bernoulli experiment (for example tossing a coin) withsuccess probability p is repeated n times independently.What is the probability that it...

1 ...alway succeeds?p · p · p · · · p = pn

2 ...always fails?

(1− p) · (1− p) · · · (1− p) = (1− p)n

3 ...first succeeds k times and then fails n − k times?

pk · (1− p)n−k

4 ...succeeds in total k times and fails the other n − k times?(nk

)· pk · (1− p)n−k

20 / 61

Page 46: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Assume a Bernoulli experiment (for example tossing a coin) withsuccess probability p is repeated n times independently.What is the probability that it...

1 ...alway succeeds?p · p · p · · · p = pn

2 ...always fails?

(1− p) · (1− p) · · · (1− p) = (1− p)n

3 ...first succeeds k times and then fails n − k times?

pk · (1− p)n−k

4 ...succeeds in total k times and fails the other n − k times?(nk

)· pk · (1− p)n−k

20 / 61

Page 47: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Assume a Bernoulli experiment (for example tossing a coin) withsuccess probability p is repeated n times independently.What is the probability that it...

1 ...alway succeeds?p · p · p · · · p = pn

2 ...always fails?

(1− p) · (1− p) · · · (1− p) = (1− p)n

3 ...first succeeds k times and then fails n − k times?

pk · (1− p)n−k

4 ...succeeds in total k times and fails the other n − k times?(nk

)· pk · (1− p)n−k

20 / 61

Page 48: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Assume a Bernoulli experiment (for example tossing a coin) withsuccess probability p is repeated n times independently.What is the probability that it...

1 ...alway succeeds?p · p · p · · · p = pn

2 ...always fails?

(1− p) · (1− p) · · · (1− p) = (1− p)n

3 ...first succeeds k times and then fails n − k times?

pk · (1− p)n−k

4 ...succeeds in total k times and fails the other n − k times?(nk

)· pk · (1− p)n−k

20 / 61

Page 49: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Assume a Bernoulli experiment (for example tossing a coin) withsuccess probability p is repeated n times independently.What is the probability that it...

1 ...alway succeeds?p · p · p · · · p = pn

2 ...always fails?

(1− p) · (1− p) · · · (1− p) = (1− p)n

3 ...first succeeds k times and then fails n − k times?

pk · (1− p)n−k

4 ...succeeds in total k times and fails the other n − k times?

(nk

)· pk · (1− p)n−k

20 / 61

Page 50: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Assume a Bernoulli experiment (for example tossing a coin) withsuccess probability p is repeated n times independently.What is the probability that it...

1 ...alway succeeds?p · p · p · · · p = pn

2 ...always fails?

(1− p) · (1− p) · · · (1− p) = (1− p)n

3 ...first succeeds k times and then fails n − k times?

pk · (1− p)n−k

4 ...succeeds in total k times and fails the other n − k times?(nk

)· pk · (1− p)n−k

20 / 61

Page 51: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Note(nk

)= n!

k!·(n−k)! (“n choose k ”) is the number of possibilities to choose ksuccesses in n trials.

21 / 61

Page 52: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Binomial distribution

Let X be the number of successes in n independent trials withsuccess probability of p each. Then,

Pr(X = k) =(

nk

)pk · (1− p)n−k

holds for all k ∈ {0,1, . . . ,n} and X is said to be binomially distributed,for short:

X ∼ bin(n,p).

22 / 61

Page 53: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

0 2 4 6 8 10

0.00

0.05

0.10

0.15

0.20

0.25

0.30

probabilities of bin(n=10,p=0.2)

k

●● ● ● ●

23 / 61

Page 54: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

0 20 40 60 80 100

0.00

0.02

0.04

0.06

0.08

0.10

probabilities of bin(n=100,p=0.2)

k

●●●●●●●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

23 / 61

Page 55: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

With the binomial distribution we can treat our initial question

Assume in a small populaiton of n = 100 individuals the neutralallele A has a frequency of 0.3.

How probable is it that X , the frequency of A in the nextgenerarion is 0.32?

Pr(X = 0.32) =?

We can only answer this on the basis of a probabilistic model,and the answer will depend on how we model the population.

25 / 61

Page 56: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

With the binomial distribution we can treat our initial question

Assume in a small populaiton of n = 100 individuals the neutralallele A has a frequency of 0.3.

How probable is it that X , the frequency of A in the nextgenerarion is 0.32?

Pr(X = 0.32) =?

We can only answer this on the basis of a probabilistic model,and the answer will depend on how we model the population.

25 / 61

Page 57: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

With the binomial distribution we can treat our initial question

Assume in a small populaiton of n = 100 individuals the neutralallele A has a frequency of 0.3.

How probable is it that X , the frequency of A in the nextgenerarion is 0.32?

Pr(X = 0.32) =?

We can only answer this on the basis of a probabilistic model,and the answer will depend on how we model the population.

25 / 61

Page 58: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

With the binomial distribution we can treat our initial question

Assume in a small populaiton of n = 100 individuals the neutralallele A has a frequency of 0.3.

How probable is it that X , the frequency of A in the nextgenerarion is 0.32?

Pr(X = 0.32) =?

We can only answer this on the basis of a probabilistic model,and the answer will depend on how we model the population.

25 / 61

Page 59: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

With the binomial distribution we can treat our initial question

Assume in a small populaiton of n = 100 individuals the neutralallele A has a frequency of 0.3.

How probable is it that X , the frequency of A in the nextgenerarion is 0.32?

Pr(X = 0.32) =?

We can only answer this on the basis of a probabilistic model,and the answer will depend on how we model the population.

25 / 61

Page 60: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Modeling approachWe make a few simplifying assumptions:

Discrete generationsThe population is haploid, that is, each individual hasexactly one parent in the generation before.constant population size n = 100

Pr(X = 0.32) still depends on whether few individuals havemany offspring or whether all individuals have similar offspringnumbers. Pr(X = 0.32) is only defined with additionalassumptions, e.g.:

Each individual chooses its parent purely randomly in thegeneration before.

“purely randomly” means independent of all others and allpotential parents with the same probability.

26 / 61

Page 61: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Modeling approachWe make a few simplifying assumptions:

Discrete generationsThe population is haploid, that is, each individual hasexactly one parent in the generation before.constant population size n = 100

Pr(X = 0.32) still depends on whether few individuals havemany offspring or whether all individuals have similar offspringnumbers.

Pr(X = 0.32) is only defined with additionalassumptions, e.g.:

Each individual chooses its parent purely randomly in thegeneration before.

“purely randomly” means independent of all others and allpotential parents with the same probability.

26 / 61

Page 62: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Modeling approachWe make a few simplifying assumptions:

Discrete generationsThe population is haploid, that is, each individual hasexactly one parent in the generation before.constant population size n = 100

Pr(X = 0.32) still depends on whether few individuals havemany offspring or whether all individuals have similar offspringnumbers. Pr(X = 0.32) is only defined with additionalassumptions, e.g.:

Each individual chooses its parent purely randomly in thegeneration before.

“purely randomly” means independent of all others and allpotential parents with the same probability.

26 / 61

Page 63: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Modeling approachWe make a few simplifying assumptions:

Discrete generationsThe population is haploid, that is, each individual hasexactly one parent in the generation before.constant population size n = 100

Pr(X = 0.32) still depends on whether few individuals havemany offspring or whether all individuals have similar offspringnumbers. Pr(X = 0.32) is only defined with additionalassumptions, e.g.:

Each individual chooses its parent purely randomly in thegeneration before.

“purely randomly” means independent of all others and allpotential parents with the same probability.

26 / 61

Page 64: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Modeling approachWe make a few simplifying assumptions:

Discrete generationsThe population is haploid, that is, each individual hasexactly one parent in the generation before.constant population size n = 100

Pr(X = 0.32) still depends on whether few individuals havemany offspring or whether all individuals have similar offspringnumbers. Pr(X = 0.32) is only defined with additionalassumptions, e.g.:

Each individual chooses its parent purely randomly in thegeneration before.

“purely randomly” means independent of all others and allpotential parents with the same probability.

26 / 61

Page 65: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Our assumptions imply that each individuals of the nextgenerations have a probability of 0.3 to obtain allele A, and theyget their alleles independently of each other.

Therefore, the number K of individuels who get allele A isbinomially distributed with n = 100 and p = 0.3:

K ∼ bin(n = 100,p = 0.3)

For X = K/n follows:

Pr(X = 0.32) = Pr(K = 32) =(

n32

)· p32 · (1− p)100−32

=

(10032

)· 0.332 · 0.768 ≈ 0.078

27 / 61

Page 66: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Our assumptions imply that each individuals of the nextgenerations have a probability of 0.3 to obtain allele A, and theyget their alleles independently of each other.

Therefore, the number K of individuels who get allele A isbinomially distributed with n = 100 and p = 0.3:

K ∼ bin(n = 100,p = 0.3)

For X = K/n follows:

Pr(X = 0.32) = Pr(K = 32) =(

n32

)· p32 · (1− p)100−32

=

(10032

)· 0.332 · 0.768 ≈ 0.078

27 / 61

Page 67: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Our assumptions imply that each individuals of the nextgenerations have a probability of 0.3 to obtain allele A, and theyget their alleles independently of each other.

Therefore, the number K of individuels who get allele A isbinomially distributed with n = 100 and p = 0.3:

K ∼ bin(n = 100,p = 0.3)

For X = K/n follows:

Pr(X = 0.32) =

Pr(K = 32) =(

n32

)· p32 · (1− p)100−32

=

(10032

)· 0.332 · 0.768 ≈ 0.078

27 / 61

Page 68: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Our assumptions imply that each individuals of the nextgenerations have a probability of 0.3 to obtain allele A, and theyget their alleles independently of each other.

Therefore, the number K of individuels who get allele A isbinomially distributed with n = 100 and p = 0.3:

K ∼ bin(n = 100,p = 0.3)

For X = K/n follows:

Pr(X = 0.32) = Pr(K = 32) =

(n32

)· p32 · (1− p)100−32

=

(10032

)· 0.332 · 0.768 ≈ 0.078

27 / 61

Page 69: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Our assumptions imply that each individuals of the nextgenerations have a probability of 0.3 to obtain allele A, and theyget their alleles independently of each other.

Therefore, the number K of individuels who get allele A isbinomially distributed with n = 100 and p = 0.3:

K ∼ bin(n = 100,p = 0.3)

For X = K/n follows:

Pr(X = 0.32) = Pr(K = 32) =(

n32

)· p32 · (1− p)100−32

=

(10032

)· 0.332 · 0.768 ≈ 0.078

27 / 61

Page 70: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Our assumptions imply that each individuals of the nextgenerations have a probability of 0.3 to obtain allele A, and theyget their alleles independently of each other.

Therefore, the number K of individuels who get allele A isbinomially distributed with n = 100 and p = 0.3:

K ∼ bin(n = 100,p = 0.3)

For X = K/n follows:

Pr(X = 0.32) = Pr(K = 32) =(

n32

)· p32 · (1− p)100−32

=

(10032

)· 0.332 · 0.768

≈ 0.078

27 / 61

Page 71: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

The binomial distribution

Our assumptions imply that each individuals of the nextgenerations have a probability of 0.3 to obtain allele A, and theyget their alleles independently of each other.

Therefore, the number K of individuels who get allele A isbinomially distributed with n = 100 and p = 0.3:

K ∼ bin(n = 100,p = 0.3)

For X = K/n follows:

Pr(X = 0.32) = Pr(K = 32) =(

n32

)· p32 · (1− p)100−32

=

(10032

)· 0.332 · 0.768 ≈ 0.078

27 / 61

Page 72: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

Contents

1 Random Variables and Distributions

2 The binomial distribution

3 Principle of statistical testing

4 Expectation value

5 Variance and Correlation

28 / 61

Page 73: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

We want to argue that some deviation in the data is not justrandom.

To this end we first specify a null hypothesis H0, i.e. wedefine, what “just random” means.Then we try to show: If H0 is true, then a deviation that is atleast at large as the observed one, is very improbable.If we can do this, we reject H0.How we measure deviation, must be clear before we seethe data.

29 / 61

Page 74: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

We want to argue that some deviation in the data is not justrandom.To this end we first specify a null hypothesis H0, i.e. wedefine, what “just random” means.

Then we try to show: If H0 is true, then a deviation that is atleast at large as the observed one, is very improbable.If we can do this, we reject H0.How we measure deviation, must be clear before we seethe data.

29 / 61

Page 75: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

We want to argue that some deviation in the data is not justrandom.To this end we first specify a null hypothesis H0, i.e. wedefine, what “just random” means.Then we try to show: If H0 is true, then a deviation that is atleast at large as the observed one, is very improbable.

If we can do this, we reject H0.How we measure deviation, must be clear before we seethe data.

29 / 61

Page 76: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

We want to argue that some deviation in the data is not justrandom.To this end we first specify a null hypothesis H0, i.e. wedefine, what “just random” means.Then we try to show: If H0 is true, then a deviation that is atleast at large as the observed one, is very improbable.If we can do this, we reject H0.

How we measure deviation, must be clear before we seethe data.

29 / 61

Page 77: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

We want to argue that some deviation in the data is not justrandom.To this end we first specify a null hypothesis H0, i.e. wedefine, what “just random” means.Then we try to show: If H0 is true, then a deviation that is atleast at large as the observed one, is very improbable.If we can do this, we reject H0.How we measure deviation, must be clear before we seethe data.

29 / 61

Page 78: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

Statistical Testing: Imporatant terms

null hypothesis H0 : says that what we want to substantiate isnot true and anything that looks like evidence in thedata is just random. We try to reject H0.

significance level α : If H0 is true, the probability to falsly rejectit, must be ≤ α (often α = 0.05).

test statistic : measures how far the data deviates from what H0

predicts into the direction of our alternativehypothesis.

p value : Probability that, if H0 is true, a dataset leads to atest statistic value that is as least as extreme as theobserved one.

30 / 61

Page 79: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

We reject the null hypothesis H0 if the p value is smallerthan α.

Thus, if H0 is true, the probability to (falsely) reject it is α(not the p value).This entails that a researcher who performs many tests withα = 0.05 on complete random data (i.e. where H0 is alwaystrue), will falsely reject H0 in 5% of the tests.Therefore it is a severe violation of academic soundness toperform tests until one shows significance, and to publishonly the latter.

31 / 61

Page 80: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

We reject the null hypothesis H0 if the p value is smallerthan α.Thus, if H0 is true, the probability to (falsely) reject it is α(not the p value).

This entails that a researcher who performs many tests withα = 0.05 on complete random data (i.e. where H0 is alwaystrue), will falsely reject H0 in 5% of the tests.Therefore it is a severe violation of academic soundness toperform tests until one shows significance, and to publishonly the latter.

31 / 61

Page 81: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

We reject the null hypothesis H0 if the p value is smallerthan α.Thus, if H0 is true, the probability to (falsely) reject it is α(not the p value).This entails that a researcher who performs many tests withα = 0.05 on complete random data (i.e. where H0 is alwaystrue), will falsely reject H0 in 5% of the tests.

Therefore it is a severe violation of academic soundness toperform tests until one shows significance, and to publishonly the latter.

31 / 61

Page 82: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

We reject the null hypothesis H0 if the p value is smallerthan α.Thus, if H0 is true, the probability to (falsely) reject it is α(not the p value).This entails that a researcher who performs many tests withα = 0.05 on complete random data (i.e. where H0 is alwaystrue), will falsely reject H0 in 5% of the tests.Therefore it is a severe violation of academic soundness toperform tests until one shows significance, and to publishonly the latter.

31 / 61

Page 83: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

Testing two-sided or one-sided?

We observe a value of x that is much larger than the H0

expectation value µ.

−4 −2 0 2 4

0.0

0.1

0.2

0.3

0.4

density

2.5%2.5%

p-value=PrH0(|X − µ| ≥ |x − µ|)

−4 −2 0 2 4

0.0

0.1

0.2

0.3

0.4

density

5.0%p-value=PrH0(X ≥ x)

32 / 61

Page 84: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

The pure teachings of statistical testing

Specify a null hypothesis H0, e.g. µ = 0.

Specify level of significance α, e.g. α = 0.05.Specify an event A such that

PrH0(A) = α

(or at least PrH0(A) ≤ α).

e.g. A = {X > q} or A = {|X − µ| > r}in general: A = {p-value ≤ α}

AND AFTER THAT: Look at the data and check if if Aoccurs.Then, the probability that H0 is rejected in the case that H0

is actually true (“Type I error”) is just α.

33 / 61

Page 85: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

The pure teachings of statistical testing

Specify a null hypothesis H0, e.g. µ = 0.Specify level of significance α, e.g. α = 0.05.

Specify an event A such that

PrH0(A) = α

(or at least PrH0(A) ≤ α).

e.g. A = {X > q} or A = {|X − µ| > r}in general: A = {p-value ≤ α}

AND AFTER THAT: Look at the data and check if if Aoccurs.Then, the probability that H0 is rejected in the case that H0

is actually true (“Type I error”) is just α.

33 / 61

Page 86: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

The pure teachings of statistical testing

Specify a null hypothesis H0, e.g. µ = 0.Specify level of significance α, e.g. α = 0.05.Specify an event A such that

PrH0(A) = α

(or at least PrH0(A) ≤ α).

e.g. A = {X > q} or A = {|X − µ| > r}in general: A = {p-value ≤ α}AND AFTER THAT: Look at the data and check if if Aoccurs.Then, the probability that H0 is rejected in the case that H0

is actually true (“Type I error”) is just α.

33 / 61

Page 87: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

The pure teachings of statistical testing

Specify a null hypothesis H0, e.g. µ = 0.Specify level of significance α, e.g. α = 0.05.Specify an event A such that

PrH0(A) = α

(or at least PrH0(A) ≤ α).e.g. A = {X > q} or A = {|X − µ| > r}

in general: A = {p-value ≤ α}AND AFTER THAT: Look at the data and check if if Aoccurs.Then, the probability that H0 is rejected in the case that H0

is actually true (“Type I error”) is just α.

33 / 61

Page 88: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

The pure teachings of statistical testing

Specify a null hypothesis H0, e.g. µ = 0.Specify level of significance α, e.g. α = 0.05.Specify an event A such that

PrH0(A) = α

(or at least PrH0(A) ≤ α).e.g. A = {X > q} or A = {|X − µ| > r}in general: A = {p-value ≤ α}

AND AFTER THAT: Look at the data and check if if Aoccurs.Then, the probability that H0 is rejected in the case that H0

is actually true (“Type I error”) is just α.

33 / 61

Page 89: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

The pure teachings of statistical testing

Specify a null hypothesis H0, e.g. µ = 0.Specify level of significance α, e.g. α = 0.05.Specify an event A such that

PrH0(A) = α

(or at least PrH0(A) ≤ α).e.g. A = {X > q} or A = {|X − µ| > r}in general: A = {p-value ≤ α}AND AFTER THAT: Look at the data and check if if Aoccurs.

Then, the probability that H0 is rejected in the case that H0

is actually true (“Type I error”) is just α.

33 / 61

Page 90: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

The pure teachings of statistical testing

Specify a null hypothesis H0, e.g. µ = 0.Specify level of significance α, e.g. α = 0.05.Specify an event A such that

PrH0(A) = α

(or at least PrH0(A) ≤ α).e.g. A = {X > q} or A = {|X − µ| > r}in general: A = {p-value ≤ α}AND AFTER THAT: Look at the data and check if if Aoccurs.Then, the probability that H0 is rejected in the case that H0

is actually true (“Type I error”) is just α.

33 / 61

Page 91: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

Violations against the pure teachings

“The two-sided test gave me a p-value of0.06. Therefore, I tested one-sided and this

worked out nicely.”

is as bad as:

“At first glance I saw that x is larger thanµH0. So, I immediately applied the one-sided

test.”

34 / 61

Page 92: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

Violations against the pure teachings

“The two-sided test gave me a p-value of0.06. Therefore, I tested one-sided and this

worked out nicely.”

is as bad as:

“At first glance I saw that x is larger thanµH0. So, I immediately applied the one-sided

test.”

34 / 61

Page 93: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

Violations against the pure teachings

“The two-sided test gave me a p-value of0.06. Therefore, I tested one-sided and this

worked out nicely.”

is as bad as:

“At first glance I saw that x is larger thanµH0. So, I immediately applied the one-sided

test.”

34 / 61

Page 94: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

ImportantThe decision between one-sided and two-sided must notdepend on the concrete data that are used in the test.

More generally: If A is the event that will lead to the rejection ofH0, (if it occurs) then A must be defined without beinginfluenced by the data that is used for testing.

35 / 61

Page 95: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

ImportantThe decision between one-sided and two-sided must notdepend on the concrete data that are used in the test.More generally: If A is the event that will lead to the rejection ofH0, (if it occurs) then A must be defined without beinginfluenced by the data that is used for testing.

35 / 61

Page 96: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

This means: Use separate data sets for exploratory dataanalysis and for testing.

In some fields these rules are followed quite strictly, e.g. testingnew pharmaceuticals for accreditation.In some other fields the practical approach is more common:Just inform the reader about the p-values of differentnull-hypotheses. Let the reader decide which null-hypothesiswould have been the most natural one.

36 / 61

Page 97: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

This means: Use separate data sets for exploratory dataanalysis and for testing.In some fields these rules are followed quite strictly, e.g. testingnew pharmaceuticals for accreditation.

In some other fields the practical approach is more common:Just inform the reader about the p-values of differentnull-hypotheses. Let the reader decide which null-hypothesiswould have been the most natural one.

36 / 61

Page 98: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

This means: Use separate data sets for exploratory dataanalysis and for testing.In some fields these rules are followed quite strictly, e.g. testingnew pharmaceuticals for accreditation.In some other fields the practical approach is more common:Just inform the reader about the p-values of differentnull-hypotheses. Let the reader decide which null-hypothesiswould have been the most natural one.

36 / 61

Page 99: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If H0 is rejected on the 5%-level, which of the followingstatements is true?

The null hypothesis is wrong.

37 / 61

Page 100: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If H0 is rejected on the 5%-level, which of the followingstatements is true?

The null hypothesis is wrong.

37 / 61

Page 101: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If H0 is rejected on the 5%-level, which of the followingstatements is true?

The null hypothesis is wrong.H0 is wrong with a probability of 95%.

37 / 61

Page 102: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If H0 is rejected on the 5%-level, which of the followingstatements is true?

The null hypothesis is wrong.H0 is wrong with a probability of 95%.

37 / 61

Page 103: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If H0 is rejected on the 5%-level, which of the followingstatements is true?

The null hypothesis is wrong.H0 is wrong with a probability of 95%.If H0 is true, you will see such an extreme event only in 5%of the data sets.

37 / 61

Page 104: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If H0 is rejected on the 5%-level, which of the followingstatements is true?

The null hypothesis is wrong.H0 is wrong with a probability of 95%.

If H0 is true, you will see such an extreme event only in 5%

of the data sets. X

37 / 61

Page 105: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.

38 / 61

Page 106: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.

38 / 61

Page 107: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is true.

38 / 61

Page 108: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is true

38 / 61

Page 109: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is trueH0 is probably true.

38 / 61

Page 110: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is trueH0 is probably true.

38 / 61

Page 111: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is trueH0 is probably true.It is safe to assume that H0 was true.

38 / 61

Page 112: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is trueH0 is probably true.It is safe to assume that H0 was true.

38 / 61

Page 113: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is trueH0 is probably true.It is safe to assume that H0 was true.Even if H0 is true, it is not so unlikely that our test statistictakes a value that is as extreme as the one we observed.

38 / 61

Page 114: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is trueH0 is probably true.It is safe to assume that H0 was true.Even if H0 is true, it is not so unlikely that our test statistictakes a value that is as extreme as the one we observed.X

38 / 61

Page 115: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is trueH0 is probably true.It is safe to assume that H0 was true.Even if H0 is true, it is not so unlikely that our test statistictakes a value that is as extreme as the one we observed.XWith this respect, H0 is compatible with the data.

38 / 61

Page 116: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Principle of statistical testing

If the test did not reject H0, which of the following statements aretrue?

We have to reject the alternative H1.H0 is trueH0 is probably true.It is safe to assume that H0 was true.Even if H0 is true, it is not so unlikely that our test statistictakes a value that is as extreme as the one we observed.XWith this respect, H0 is compatible with the data.X

38 / 61

Page 117: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Contents

1 Random Variables and Distributions

2 The binomial distribution

3 Principle of statistical testing

4 Expectation value

5 Variance and Correlation

39 / 61

Page 118: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Definition (Expectation value)Let X be a random variable with finite or countable state spaceS = {x1, x2, x3 . . . } ⊆ R.

The expectation value of X is defined by

EX =∑x ∈S

x · Pr(X = x)

It is also common to write µX instead of EX .

If we replace probabilities by relative frequencies in this definition, weget the formula for the mean value (of a sample).

40 / 61

Page 119: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Definition (Expectation value)Let X be a random variable with finite or countable state spaceS = {x1, x2, x3 . . . } ⊆ R. The expectation value of X is defined by

EX =∑x ∈S

x · Pr(X = x)

It is also common to write µX instead of EX .

If we replace probabilities by relative frequencies in this definition, weget the formula for the mean value (of a sample).

40 / 61

Page 120: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Definition (Expectation value)Let X be a random variable with finite or countable state spaceS = {x1, x2, x3 . . . } ⊆ R. The expectation value of X is defined by

EX =∑x ∈S

x · Pr(X = x)

It is also common to write µX instead of EX .

If we replace probabilities by relative frequencies in this definition, weget the formula for the mean value (of a sample).

40 / 61

Page 121: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Definition (Expectation value)If X is a random variable with finite or countable state spaceS = {x1, x2, x3 . . . } ⊆ R, the expectation value of X is defined by

EX =∑x ∈S

x · Pr(X = x)

Examples:

Let X be Bernoulli distributed with success probability p ∈ [0,1].Then we get

EX = 1 · Pr(X = 1) + 0 · Pr(X = 0) = Pr(X = 1) = p

Let W be the result of rolling a dice. Then we get

EW = 1 · Pr(W = 1) + 2 · Pr(W = 2) + . . .+ 6 · Pr(W = 6)

= 1 · 16 + 2 · 1

6 + . . .+ 6 · 16 = 211

6 = 3.5

41 / 61

Page 122: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Definition (Expectation value)If X is a random variable with finite or countable state spaceS = {x1, x2, x3 . . . } ⊆ R, the expectation value of X is defined by

EX =∑x ∈S

x · Pr(X = x)

Examples:

Let X be Bernoulli distributed with success probability p ∈ [0,1].Then we get

EX = 1 · Pr(X = 1) + 0 · Pr(X = 0) = Pr(X = 1) = p

Let W be the result of rolling a dice. Then we get

EW = 1 · Pr(W = 1) + 2 · Pr(W = 2) + . . .+ 6 · Pr(W = 6)

= 1 · 16 + 2 · 1

6 + . . .+ 6 · 16 = 211

6 = 3.5

41 / 61

Page 123: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Calculating with expectatins

Theorem (Linearity of Expectation)If X and Y are random variables with values in R and if a ∈ R, we get:

E(a · X ) = a · EX

E(X + Y ) = EX + EY

Theorem (Only if independent!)If X and Y are stochastically independent random variables withvalues in R, we get

E(X · Y ) = EX · EY.

But in general E(X · Y ) 6= EX · EY . Example:

E(W ·W ) = 916 = 15.167 > 12.25 = 3.5 · 3.5 = EW · EW

42 / 61

Page 124: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Calculating with expectatins

Theorem (Linearity of Expectation)If X and Y are random variables with values in R and if a ∈ R, we get:

E(a · X ) = a · EX

E(X + Y ) = EX + EY

Theorem (Only if independent!)If X and Y are stochastically independent random variables withvalues in R, we get

E(X · Y ) = EX · EY.

But in general E(X · Y ) 6= EX · EY . Example:

E(W ·W ) = 916 = 15.167 > 12.25 = 3.5 · 3.5 = EW · EW

42 / 61

Page 125: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Calculating with expectatins

Theorem (Linearity of Expectation)If X and Y are random variables with values in R and if a ∈ R, we get:

E(a · X ) = a · EX

E(X + Y ) = EX + EY

Theorem (Only if independent!)If X and Y are stochastically independent random variables withvalues in R, we get

E(X · Y ) = EX · EY.

But in general E(X · Y ) 6= EX · EY . Example:

E(W ·W ) = 916 = 15.167 > 12.25 = 3.5 · 3.5 = EW · EW

42 / 61

Page 126: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Proof of Linearity: If S is the state space of X and Y , and if a,b ∈ R,we obtain:

E(a · X + b · Y )

=∑x∈S

∑y∈S

(a · x + b · y)Pr(X = x ,Y = y)

= a ·∑x∈S

∑y∈S

x Pr(X = x ,Y = y) + b ·∑x∈S

∑y∈S

y Pr(X = x ,Y = y)

= a ·∑x∈S

x∑y∈S

Pr(X = x ,Y = y) + b ·∑y∈S

y∑x∈S

Pr(X = x ,Y = y)

= a ·∑x∈S

x Pr(X = x) + b ·∑y∈S

y Pr(Y = y)

= a · E(X ) + b · E(Y )

43 / 61

Page 127: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Proof of the product formula: Let S be the state space of X and Y , andlet X and Y be (stochastically) independent.

E(X · Y )

=∑x∈S

∑y∈S

(x · y)Pr(X = x ,Y = y)

=∑x∈S

∑y∈S

(x · y)Pr(X = x)Pr(Y = y)

=∑x∈S

x Pr(X = x) ·∑y∈S

y Pr(Y = y)

= EX · EY ·

44 / 61

Page 128: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

TheoremIf X is random variable with finite state space S ⊂ R, and iff : R→ R is a function, we obtain

E(f (X )) =∑x∈S

f (x) · Pr(X = x)

Exercise: proof this.

45 / 61

Page 129: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Expectation of the binomial distributionLet Y1,Y2, . . . ,Yn be the indicator variables of the n independent trials,that is

Yi =

{1 if trial i succeeds0 if trial i − fails

Then X = Y1 + · · ·+ Yn is binomially distributed with parameters(n,p), where p is the success probability of the trials.

Linearity of expectation implies

EX = E(Y1 + · · ·+ Yn)

= EY1 + · · ·+ EYn

= p + · · ·+ p = np

Note:X ∼ bin(n,p)⇒ EX = n · p

46 / 61

Page 130: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Expectation of the binomial distributionLet Y1,Y2, . . . ,Yn be the indicator variables of the n independent trials,that is

Yi =

{1 if trial i succeeds0 if trial i − fails

Then X = Y1 + · · ·+ Yn is binomially distributed with parameters(n,p), where p is the success probability of the trials.

Linearity of expectation implies

EX = E(Y1 + · · ·+ Yn)

= EY1 + · · ·+ EYn

= p + · · ·+ p = np

Note:X ∼ bin(n,p)⇒ EX = n · p

46 / 61

Page 131: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Expectation of the binomial distributionLet Y1,Y2, . . . ,Yn be the indicator variables of the n independent trials,that is

Yi =

{1 if trial i succeeds0 if trial i − fails

Then X = Y1 + · · ·+ Yn is binomially distributed with parameters(n,p), where p is the success probability of the trials.

Linearity of expectation implies

EX = E(Y1 + · · ·+ Yn)

= EY1 + · · ·+ EYn

= p + · · ·+ p = np

Note:X ∼ bin(n,p)⇒ EX = n · p

46 / 61

Page 132: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Expectation of the binomial distributionLet Y1,Y2, . . . ,Yn be the indicator variables of the n independent trials,that is

Yi =

{1 if trial i succeeds0 if trial i − fails

Then X = Y1 + · · ·+ Yn is binomially distributed with parameters(n,p), where p is the success probability of the trials.

Linearity of expectation implies

EX = E(Y1 + · · ·+ Yn)

= EY1 + · · ·+ EYn

= p + · · ·+ p = np

Note:X ∼ bin(n,p)⇒ EX = n · p

46 / 61

Page 133: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Expectation value

Expectation of the binomial distributionLet Y1,Y2, . . . ,Yn be the indicator variables of the n independent trials,that is

Yi =

{1 if trial i succeeds0 if trial i − fails

Then X = Y1 + · · ·+ Yn is binomially distributed with parameters(n,p), where p is the success probability of the trials.

Linearity of expectation implies

EX = E(Y1 + · · ·+ Yn)

= EY1 + · · ·+ EYn

= p + · · ·+ p = np

Note:X ∼ bin(n,p)⇒ EX = n · p

46 / 61

Page 134: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Contents

1 Random Variables and Distributions

2 The binomial distribution

3 Principle of statistical testing

4 Expectation value

5 Variance and Correlation

47 / 61

Page 135: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Definition (Variance, Covariance and Correlation)The Variance of a R-valued random variable X is

VarX = σ2X = E

[(X − EX )2

].

σX =√

Var X is the Standard Deviation.If Y is enother R-valued random variable,

Cov(X ,Y ) = E [(X − EX ) · (Y − EY )]

is the Covariance of X and Y .The Correlation of X and Y is

Cor(X ,Y ) =Cov(X ,Y )

σX · σY.

48 / 61

Page 136: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Definition (Variance, Covariance and Correlation)The Variance of a R-valued random variable X is

VarX = σ2X = E

[(X − EX )2

].

σX =√

Var X is the Standard Deviation.

If Y is enother R-valued random variable,

Cov(X ,Y ) = E [(X − EX ) · (Y − EY )]

is the Covariance of X and Y .The Correlation of X and Y is

Cor(X ,Y ) =Cov(X ,Y )

σX · σY.

48 / 61

Page 137: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Definition (Variance, Covariance and Correlation)The Variance of a R-valued random variable X is

VarX = σ2X = E

[(X − EX )2

].

σX =√

Var X is the Standard Deviation.If Y is enother R-valued random variable,

Cov(X ,Y ) = E [(X − EX ) · (Y − EY )]

is the Covariance of X and Y .

The Correlation of X and Y is

Cor(X ,Y ) =Cov(X ,Y )

σX · σY.

48 / 61

Page 138: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Definition (Variance, Covariance and Correlation)The Variance of a R-valued random variable X is

VarX = σ2X = E

[(X − EX )2

].

σX =√

Var X is the Standard Deviation.If Y is enother R-valued random variable,

Cov(X ,Y ) = E [(X − EX ) · (Y − EY )]

is the Covariance of X and Y .The Correlation of X and Y is

Cor(X ,Y ) =Cov(X ,Y )

σX · σY.

48 / 61

Page 139: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

The VarianceVarX = E

[(X − EX )2

]is the average squared deviation from the expectation.

The CorrelationCor(X ,Y ) =

Cov(X ,Y )

σX · σY

is always between in the range from -1 to 1. The random variables Xand Y are

positively correlated, if X and Y tend to be both above average orboth below average.

negatively correlated, if X and Y tend to deviate from theirexpectation values in opposite ways.

If X and Y are independent, they are also uncorrelated, that isCor(X ,Y ) = 0.

49 / 61

Page 140: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

The VarianceVarX = E

[(X − EX )2

]is the average squared deviation from the expectation.

The CorrelationCor(X ,Y ) =

Cov(X ,Y )

σX · σY

is always between in the range from -1 to 1. The random variables Xand Y are

positively correlated, if X and Y tend to be both above average orboth below average.

negatively correlated, if X and Y tend to deviate from theirexpectation values in opposite ways.

If X and Y are independent, they are also uncorrelated, that isCor(X ,Y ) = 0.

49 / 61

Page 141: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

The VarianceVarX = E

[(X − EX )2

]is the average squared deviation from the expectation.

The CorrelationCor(X ,Y ) =

Cov(X ,Y )

σX · σY

is always between in the range from -1 to 1. The random variables Xand Y are

positively correlated, if X and Y tend to be both above average orboth below average.

negatively correlated, if X and Y tend to deviate from theirexpectation values in opposite ways.

If X and Y are independent, they are also uncorrelated, that isCor(X ,Y ) = 0.

49 / 61

Page 142: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Example: rolling dice

Variance of result from rolling a dice W :

Var(W ) = E[(

W − EW)2]

= E[(

W − 3.5)2]

= (1− 3.5)2 · 16+ (2− 3.5)2 · 1

6+ . . .+ (6− 3.5)2 · 1

6

=17.5

6= 2.91667

50 / 61

Page 143: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Example: Empirical DistributionIf x1, . . . , xn ∈ R are data and if X is the result of a random draw fromthe data (such that Pr(X = xi) =

1n ), we get:

EX =n∑

i=1

xi Pr(X = xi) =1n

n∑i=1

xi = x

and

Var X = E[(

X − EX)2]

=1n

n∑i=1

(xi − x)2

If (x1, y1), . . . , (xn, yn) ∈ R× R are data if (X ,Y ) are drawn from thedata such that Pr((X ,Y ) = (xi , yi)) =

1n , we get

Cov (X ,Y ) = E[(

X − EX)(

Y − EY)]

=1n

n∑i=1

(xi − x)(yi − y)

51 / 61

Page 144: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Example: Empirical DistributionIf x1, . . . , xn ∈ R are data and if X is the result of a random draw fromthe data (such that Pr(X = xi) =

1n ), we get:

EX =n∑

i=1

xi Pr(X = xi) =1n

n∑i=1

xi = x

and

Var X = E[(

X − EX)2]

=1n

n∑i=1

(xi − x)2

If (x1, y1), . . . , (xn, yn) ∈ R× R are data if (X ,Y ) are drawn from thedata such that Pr((X ,Y ) = (xi , yi)) =

1n , we get

Cov (X ,Y ) = E[(

X − EX)(

Y − EY)]

=1n

n∑i=1

(xi − x)(yi − y)

51 / 61

Page 145: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Why Cov(X ,Y ) = E([X − EX ][Y − EY ])?

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

52 / 61

Page 146: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Why Cov(X ,Y ) = E([X − EX ][Y − EY ])?

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

●●●

●●

●●

● ●

●●

●●●

●●

●●

● ●

●●

[X−EX]>0[X−EX]<0

52 / 61

Page 147: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Why Cov(X ,Y ) = E([X − EX ][Y − EY ])?

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

●●

●●●

●●

●●

● ●

●●●

● ●

●●

●●

●●

●●●

●●

● ●

●●

[Y−EY]>0

[Y−EY]<0

[X−EX]>0[X−EX]<0

52 / 61

Page 148: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Why Cov(X ,Y ) = E([X − EX ][Y − EY ])?

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

●●

●●

●●

●●

[X−EX]*[Y−EY]<0

[X−EX]*[Y−EY]>0

[X−EX]*[Y−EY]>0

[X−EX]*[Y−EY]<0

52 / 61

Page 149: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Why Cov(X ,Y ) = E([X − EX ][Y − EY ])?

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

●●

●●

●●

●●

[X−EX]*[Y−EY]<0

[X−EX]*[Y−EY]>0

[X−EX]*[Y−EY]>0

[X−EX]*[Y−EY]<0

Cov(X,Y)= 1.11

52 / 61

Page 150: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Why Cov(X ,Y ) = E([X − EX ][Y − EY ])?

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

●●

●●

●●

●●

●●

●●

●●

[X−EX]*[Y−EY]<0

[X−EX]*[Y−EY]>0

[X−EX]*[Y−EY]>0

[X−EX]*[Y−EY]<0

52 / 61

Page 151: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Why Cov(X ,Y ) = E([X − EX ][Y − EY ])?

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

●●

●●

●●

●●

●●

●●

●●

[X−EX]*[Y−EY]<0

[X−EX]*[Y−EY]>0

[X−EX]*[Y−EY]>0

[X−EX]*[Y−EY]<0

Cov(X,Y)= −0.78

52 / 61

Page 152: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 0.95, σY = 0.92

●●●

●●

●●

● ●

●●●●

●●

●●

● ●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 153: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 0.95, σY = 0.92

Cov(X ,Y ) = −0.06

●●●

●●

●●

● ●

●●●●

●●

●●

● ●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 154: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 0.95, σY = 0.92

Cov(X ,Y ) = −0.06

Cor(X ,Y ) = −0.069

●●●

●●

●●

● ●

●●●●

●●

●●

● ●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 155: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 0.95, σY = 0.92

Cov(X ,Y ) = −0.06

Cor(X ,Y ) = −0.069

●●●

●●

●●

● ●

●●●●

●●

●●

● ●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.13, σY = 1.2

● ●

●●

● ●● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 156: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 0.95, σY = 0.92

Cov(X ,Y ) = −0.06

Cor(X ,Y ) = −0.069

●●●

●●

●●

● ●

●●●●

●●

●●

● ●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.13, σY = 1.2

Cov(X ,Y ) = −1.26

● ●

●●

● ●● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 157: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 0.95, σY = 0.92

Cov(X ,Y ) = −0.06

Cor(X ,Y ) = −0.069

●●●

●●

●●

● ●

●●●●

●●

●●

● ●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.13, σY = 1.2

Cov(X ,Y ) = −1.26

Cor(X ,Y ) = −0.92

● ●

●●

● ●● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 158: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 1.14, σY = 0.78

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.13, σY = 1.2

Cov(X ,Y ) = −1.26

Cor(X ,Y ) = −0.92

● ●

●●

● ●● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 159: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 1.14, σY = 0.78

Cov(X ,Y ) = 0.78

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.13, σY = 1.2

Cov(X ,Y ) = −1.26

Cor(X ,Y ) = −0.92

● ●

●●

● ●● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 160: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 1.14, σY = 0.78

Cov(X ,Y ) = 0.78

Cor(X ,Y ) = 0.71

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.13, σY = 1.2

Cov(X ,Y ) = −1.26

Cor(X ,Y ) = −0.92

● ●

●●

● ●● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 161: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 1.14, σY = 0.78

Cov(X ,Y ) = 0.78

Cor(X ,Y ) = 0.71

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.03, σY = 0.32

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●● ●

●●

●●

●●

●●●

●●

● ●● ● ●●

●●

●●

●●

●●

●●

●●

●●

●●●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 162: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 1.14, σY = 0.78

Cov(X ,Y ) = 0.78

Cor(X ,Y ) = 0.71

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.03, σY = 0.32

Cov(X ,Y ) = 0.32

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●● ●

●●

●●

●●

●●●

●●

● ●● ● ●●

●●

●●

●●

●●

●●

●●

●●

●●●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 163: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 1.14, σY = 0.78

Cov(X ,Y ) = 0.78

Cor(X ,Y ) = 0.71

●●

●●

●●

●●

●●

●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.03, σY = 0.32

Cov(X ,Y ) = 0.32

Cor(X ,Y ) = 0.95

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●● ●

●●

●●

●●

●●●

●●

● ●● ● ●●

●●

●●

●●

●●

●●

●●

●●

●●●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 164: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 0.91, σY = 0.88

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.03, σY = 0.32

Cov(X ,Y ) = 0.32

Cor(X ,Y ) = 0.95

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●● ●

●●

●●

●●

●●●

●●

● ●● ● ●●

●●

●●

●●

●●

●●

●●

●●

●●●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 165: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 0.91, σY = 0.88

Cov(X ,Y ) = 0

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.03, σY = 0.32

Cov(X ,Y ) = 0.32

Cor(X ,Y ) = 0.95

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●● ●

●●

●●

●●

●●●

●●

● ●● ● ●●

●●

●●

●●

●●

●●

●●

●●

●●●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 166: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

σX = 0.91, σY = 0.88

Cov(X ,Y ) = 0

Cor(X ,Y ) = 0

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●●

●●

0 2 4 6 8 10

02

46

810

X

Y

σX = 1.03, σY = 0.32

Cov(X ,Y ) = 0.32

Cor(X ,Y ) = 0.95

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●● ●

●●

●●

●●

●●●

●●

● ●● ● ●●

●●

●●

●●

●●

●●

●●

●●

●●●●

0 2 4 6 8 10

02

46

810

X

Y

53 / 61

Page 167: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for variances

VarX = E[(X − EX )2]

VarX = Cov(X ,X )

VarX = E(X 2)− (EX )2 (Exercise!)Var(a · X ) = a2 · VarXVar(X + Y ) = VarX + VarY + 2 · Cov(X ,Y )

Var(∑n

i=1 Xi)=∑n

i=1 Var(Xi)+ 2 ·

∑nj=1

∑j−1i=1 Cov(Xi ,Xj)

If (X ,Y ) stochastically independent we get:

Var(X + Y ) = VarX + VarY

54 / 61

Page 168: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for variances

VarX = E[(X − EX )2]

VarX = Cov(X ,X )

VarX = E(X 2)− (EX )2 (Exercise!)

Var(a · X ) = a2 · VarXVar(X + Y ) = VarX + VarY + 2 · Cov(X ,Y )

Var(∑n

i=1 Xi)=∑n

i=1 Var(Xi)+ 2 ·

∑nj=1

∑j−1i=1 Cov(Xi ,Xj)

If (X ,Y ) stochastically independent we get:

Var(X + Y ) = VarX + VarY

54 / 61

Page 169: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for variances

VarX = E[(X − EX )2]

VarX = Cov(X ,X )

VarX = E(X 2)− (EX )2 (Exercise!)Var(a · X ) = a2 · VarX

Var(X + Y ) = VarX + VarY + 2 · Cov(X ,Y )

Var(∑n

i=1 Xi)=∑n

i=1 Var(Xi)+ 2 ·

∑nj=1

∑j−1i=1 Cov(Xi ,Xj)

If (X ,Y ) stochastically independent we get:

Var(X + Y ) = VarX + VarY

54 / 61

Page 170: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for variances

VarX = E[(X − EX )2]

VarX = Cov(X ,X )

VarX = E(X 2)− (EX )2 (Exercise!)Var(a · X ) = a2 · VarXVar(X + Y ) = VarX + VarY + 2 · Cov(X ,Y )

Var(∑n

i=1 Xi)=∑n

i=1 Var(Xi)+ 2 ·

∑nj=1

∑j−1i=1 Cov(Xi ,Xj)

If (X ,Y ) stochastically independent we get:

Var(X + Y ) = VarX + VarY

54 / 61

Page 171: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for variances

VarX = E[(X − EX )2]

VarX = Cov(X ,X )

VarX = E(X 2)− (EX )2 (Exercise!)Var(a · X ) = a2 · VarXVar(X + Y ) = VarX + VarY + 2 · Cov(X ,Y )

Var(∑n

i=1 Xi)=∑n

i=1 Var(Xi)+ 2 ·

∑nj=1

∑j−1i=1 Cov(Xi ,Xj)

If (X ,Y ) stochastically independent we get:

Var(X + Y ) = VarX + VarY

54 / 61

Page 172: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for variances

VarX = E[(X − EX )2]

VarX = Cov(X ,X )

VarX = E(X 2)− (EX )2 (Exercise!)Var(a · X ) = a2 · VarXVar(X + Y ) = VarX + VarY + 2 · Cov(X ,Y )

Var(∑n

i=1 Xi)=∑n

i=1 Var(Xi)+ 2 ·

∑nj=1

∑j−1i=1 Cov(Xi ,Xj)

If (X ,Y ) stochastically independent we get:

Var(X + Y ) = VarX + VarY

54 / 61

Page 173: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Covariances

Cov(X ,Y ) = E[(X − EX ) · (Y − EY )]

If X and Y are independent, then Cov(X ,Y ) = 0(but not the other way around!)

Cov(X ,Y )=Cov(Y ,X )

Cov(X ,Y ) = E(X · Y )− EX · EY (Exercise!)Cov(a · X ,Y ) = a · Cov(X ,Y ) = Cov(X ,a · Y )

Cov(X + Z ,Y ) = Cov(X ,Y ) + Cov(Z ,Y )

Cov(X ,Z + Y ) = Cov(X ,Z ) + Cov(X ,Y )

The last three rules describe the bilinearity of covariance.

55 / 61

Page 174: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Covariances

Cov(X ,Y ) = E[(X − EX ) · (Y − EY )]

If X and Y are independent, then Cov(X ,Y ) = 0(but not the other way around!)Cov(X ,Y )=Cov(Y ,X )

Cov(X ,Y ) = E(X · Y )− EX · EY (Exercise!)Cov(a · X ,Y ) = a · Cov(X ,Y ) = Cov(X ,a · Y )

Cov(X + Z ,Y ) = Cov(X ,Y ) + Cov(Z ,Y )

Cov(X ,Z + Y ) = Cov(X ,Z ) + Cov(X ,Y )

The last three rules describe the bilinearity of covariance.

55 / 61

Page 175: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Covariances

Cov(X ,Y ) = E[(X − EX ) · (Y − EY )]

If X and Y are independent, then Cov(X ,Y ) = 0(but not the other way around!)Cov(X ,Y )=Cov(Y ,X )

Cov(X ,Y ) = E(X · Y )− EX · EY (Exercise!)

Cov(a · X ,Y ) = a · Cov(X ,Y ) = Cov(X ,a · Y )

Cov(X + Z ,Y ) = Cov(X ,Y ) + Cov(Z ,Y )

Cov(X ,Z + Y ) = Cov(X ,Z ) + Cov(X ,Y )

The last three rules describe the bilinearity of covariance.

55 / 61

Page 176: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Covariances

Cov(X ,Y ) = E[(X − EX ) · (Y − EY )]

If X and Y are independent, then Cov(X ,Y ) = 0(but not the other way around!)Cov(X ,Y )=Cov(Y ,X )

Cov(X ,Y ) = E(X · Y )− EX · EY (Exercise!)Cov(a · X ,Y ) = a · Cov(X ,Y ) = Cov(X ,a · Y )

Cov(X + Z ,Y ) = Cov(X ,Y ) + Cov(Z ,Y )

Cov(X ,Z + Y ) = Cov(X ,Z ) + Cov(X ,Y )

The last three rules describe the bilinearity of covariance.

55 / 61

Page 177: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Covariances

Cov(X ,Y ) = E[(X − EX ) · (Y − EY )]

If X and Y are independent, then Cov(X ,Y ) = 0(but not the other way around!)Cov(X ,Y )=Cov(Y ,X )

Cov(X ,Y ) = E(X · Y )− EX · EY (Exercise!)Cov(a · X ,Y ) = a · Cov(X ,Y ) = Cov(X ,a · Y )

Cov(X + Z ,Y ) = Cov(X ,Y ) + Cov(Z ,Y )

Cov(X ,Z + Y ) = Cov(X ,Z ) + Cov(X ,Y )

The last three rules describe the bilinearity of covariance.

55 / 61

Page 178: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Covariances

Cov(X ,Y ) = E[(X − EX ) · (Y − EY )]

If X and Y are independent, then Cov(X ,Y ) = 0(but not the other way around!)Cov(X ,Y )=Cov(Y ,X )

Cov(X ,Y ) = E(X · Y )− EX · EY (Exercise!)Cov(a · X ,Y ) = a · Cov(X ,Y ) = Cov(X ,a · Y )

Cov(X + Z ,Y ) = Cov(X ,Y ) + Cov(Z ,Y )

Cov(X ,Z + Y ) = Cov(X ,Z ) + Cov(X ,Y )

The last three rules describe the bilinearity of covariance.

55 / 61

Page 179: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Covariances

Cov(X ,Y ) = E[(X − EX ) · (Y − EY )]

If X and Y are independent, then Cov(X ,Y ) = 0(but not the other way around!)Cov(X ,Y )=Cov(Y ,X )

Cov(X ,Y ) = E(X · Y )− EX · EY (Exercise!)Cov(a · X ,Y ) = a · Cov(X ,Y ) = Cov(X ,a · Y )

Cov(X + Z ,Y ) = Cov(X ,Y ) + Cov(Z ,Y )

Cov(X ,Z + Y ) = Cov(X ,Z ) + Cov(X ,Y )

The last three rules describe the bilinearity of covariance.

55 / 61

Page 180: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Correlations

Cor(X ,Y ) = Cov(X ,Y )σX ·σY

−1 ≤ Cor(X ,Y ) ≤ 1Cor(X ,Y ) = Cor(Y ,X )

Cor(X ,Y ) = Cov(X/σX ,Y/σY )

Cor(X ,Y ) = 1 if and only if Y is an increasing, affine-linearfunction of X , that is, if Y = a · X + b for appropriate a > 0and b ∈ R.Cor(X ,Y ) = −1 if and only if Y is an decreasing,affine-linear function of X , that is, if Y = a · X + b forappropriate a < 0 and b ∈ R.

56 / 61

Page 181: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Correlations

Cor(X ,Y ) = Cov(X ,Y )σX ·σY

−1 ≤ Cor(X ,Y ) ≤ 1Cor(X ,Y ) = Cor(Y ,X )

Cor(X ,Y ) = Cov(X/σX ,Y/σY )

Cor(X ,Y ) = 1 if and only if Y is an increasing, affine-linearfunction of X , that is, if Y = a · X + b for appropriate a > 0and b ∈ R.

Cor(X ,Y ) = −1 if and only if Y is an decreasing,affine-linear function of X , that is, if Y = a · X + b forappropriate a < 0 and b ∈ R.

56 / 61

Page 182: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Calculation rules for Correlations

Cor(X ,Y ) = Cov(X ,Y )σX ·σY

−1 ≤ Cor(X ,Y ) ≤ 1Cor(X ,Y ) = Cor(Y ,X )

Cor(X ,Y ) = Cov(X/σX ,Y/σY )

Cor(X ,Y ) = 1 if and only if Y is an increasing, affine-linearfunction of X , that is, if Y = a · X + b for appropriate a > 0and b ∈ R.Cor(X ,Y ) = −1 if and only if Y is an decreasing,affine-linear function of X , that is, if Y = a · X + b forappropriate a < 0 and b ∈ R.

56 / 61

Page 183: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Bernoulli distributionA Bernoulli distributed random variable Y with success probabilityp ∈ [0,1] has expectation value

EY = p

and varianceVar Y = p · (1− p)

Proof: From Pr(Y = 1) = p and Pr(Y = 0) = (1− p) follows

EY = 1 · p + 0 · (1− p) = p.

variance:Var Y = E

(Y 2)− (EY

)2

= 12 · p + 02 · (1− p)− p2 = p · (1− p)

57 / 61

Page 184: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Bernoulli distributionA Bernoulli distributed random variable Y with success probabilityp ∈ [0,1] has expectation value

EY = p

and varianceVar Y = p · (1− p)

Proof: From Pr(Y = 1) = p and Pr(Y = 0) = (1− p) follows

EY = 1 · p + 0 · (1− p) = p.

variance:Var Y = E

(Y 2)− (EY

)2

= 12 · p + 02 · (1− p)− p2 = p · (1− p)

57 / 61

Page 185: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Bernoulli distributionA Bernoulli distributed random variable Y with success probabilityp ∈ [0,1] has expectation value

EY = p

and varianceVar Y = p · (1− p)

Proof: From Pr(Y = 1) = p and Pr(Y = 0) = (1− p) follows

EY = 1 · p + 0 · (1− p) = p.

variance:Var Y = E

(Y 2)− (EY

)2

= 12 · p + 02 · (1− p)− p2 = p · (1− p)

57 / 61

Page 186: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Binomial distribution

Let Y1, · · · ,Yn be independent Bernoulli distributed with successprobability p. Then follows

n∑i=1

Yi =: X ∼ bin(n,p)

and we get:

Var X =

Var( n∑

i=1

Yi

)=

n∑i=1

Var Yi = n · p · (1− p)

58 / 61

Page 187: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Binomial distribution

Let Y1, · · · ,Yn be independent Bernoulli distributed with successprobability p. Then follows

n∑i=1

Yi =: X ∼ bin(n,p)

and we get:

Var X = Var( n∑

i=1

Yi

)=

n∑i=1

Var Yi = n · p · (1− p)

58 / 61

Page 188: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Binomial distribution

Let Y1, · · · ,Yn be independent Bernoulli distributed with successprobability p. Then follows

n∑i=1

Yi =: X ∼ bin(n,p)

and we get:

Var X = Var( n∑

i=1

Yi

)=

n∑i=1

Var Yi =

n · p · (1− p)

58 / 61

Page 189: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Binomial distribution

Let Y1, · · · ,Yn be independent Bernoulli distributed with successprobability p. Then follows

n∑i=1

Yi =: X ∼ bin(n,p)

and we get:

Var X = Var( n∑

i=1

Yi

)=

n∑i=1

Var Yi = n · p · (1− p)

58 / 61

Page 190: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Binomial distribution

Theorem (Expectation value and variance of the binomial distribution)

If X is binomially distributed with parameters (n,p), we get:

EX = n · p

undVar X = n · p · (1− p)

59 / 61

Page 191: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Example: Genetic Drift

In a haploid population of n individuals, let p be the frequency ofsome allele A. We assume that (due to some simplifyingassumptions?) the absolute frequency K of A in the nextgeneration is (n,p)-binomially distributed.For X = K/n, the relative frequency in the next generationfollows:

Var(X ) = Var(K/n) = Var(K )/n2

= n · p · (1− p)/n2

=p · (1− p)

n

60 / 61

Page 192: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Example: Genetic Drift

In a haploid population of n individuals, let p be the frequency ofsome allele A. We assume that (due to some simplifyingassumptions?) the absolute frequency K of A in the nextgeneration is (n,p)-binomially distributed.For X = K/n, the relative frequency in the next generationfollows:

Var(X ) = Var(K/n) = Var(K )/n2 = n · p · (1− p)/n2

=p · (1− p)

n

60 / 61

Page 193: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Example: Genetic Drift

In a haploid population of n individuals, let p be the frequency ofsome allele A. We assume that (due to some simplifyingassumptions?) the absolute frequency K of A in the nextgeneration is (n,p)-binomially distributed.For X = K/n, the relative frequency in the next generationfollows:

Var(X ) = Var(K/n) = Var(K )/n2 = n · p · (1− p)/n2

=p · (1− p)

n

60 / 61

Page 194: Basic Stochastics and the idea of testing - evol.bio.lmu.deevol.bio.lmu.de/_statgen/maths/ws1718/stochbasics_slides.pdf · Basic Stochastics and the idea of testing Dirk Metzler October

Variance and Correlation

Example: Genetic Drift

If we consider the change of allele frequencies over mgenerations, the variances add up. If m is a small number, suchthat p will not change much over m generations, the is varianceof change of allele frequencies is approximately

m · Var(X ) =m · p · (1− p)

n

(because the changes per generation are independent of eachother) and thus, the standard deviation is about√

mn· p · (1− p)

61 / 61