36
Primer on Probability Sushmita Roy BMI/CS 576 www.biostat.wisc.edu/bmi576/ Sushmita Roy [email protected] Oct 2nd, 2012 BMI/CS 576

Primer on Probability Sushmita Roy BMI/CS 576 Sushmita Roy [email protected] Oct 2nd, 2012 BMI/CS 576

Embed Size (px)

Citation preview

Page 1: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Primer on Probability

Sushmita RoyBMI/CS 576

www.biostat.wisc.edu/bmi576/Sushmita Roy

[email protected] 2nd, 2012

BMI/CS 576

Page 2: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Definition of probability

• frequentist interpretation: the probability of an event from a random experiment is the proportion of the time events of same kind will occur in the long run, when the experiment is repeated

• examples– the probability my flight to Chicago will be on time– the probability this ticket will win the lottery– the probability it will rain tomorrow

• always a number in the interval [0,1]0 means “never occurs”1 means “always occurs”

Page 3: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Sample spaces

• sample space: a set of possible outcomes for some event• event: a subset of sample space• examples

– flight to Chicago: {on time, late}– lottery: {ticket 1 wins, ticket 2 wins,…,ticket n wins}– weather tomorrow:

{rain, not rain} or{sun, rain, snow} or{sun, clouds, rain, snow, sleet} or…

Page 4: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Random variables

• random variable: a function associating a value with an attribute of the outcome of an experiment

• example– X represents the outcome of my flight to Chicago– we write the probability of my flight being on time as P(X

= on-time)– or when it’s clear which variable we’re referring to, we

may use the shorthand P(on-time)

Page 5: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Notation

• uppercase letters and capitalized words denote random variables

• lowercase letters and uncapitalized words denote values• we’ll denote a particular value for a variable as follows

• we’ll also use the shorthand form

• for Boolean random variables, we’ll use the shorthand€

P(Fever = true)

P(X = x)

P(x) for P(X = x)

P( fever) for P(Fever = true)

P(¬ fever) for P(Fever = false)

Page 6: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Probability distributions

• if X is a random variable, the function given by P(X = x) for each x is the probability distribution of X

• requirements:

P(x) =1x

sun

cloud

sra

insn

ow sleet

0.2

0.3

0.1

P(x) ≥ 0 for every x

Page 7: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Joint distributions

• joint probability distribution: the function given by P(X = x, Y = y)

• read “X equals x and Y equals y”• example

x, y P(X = x, Y = y)

sun, on-time 0.20

rain, on-time 0.20

snow, on-time 0.05

sun, late 0.10

rain, late 0.30

snow, late 0.15

probability that it’s sunny and my flight is on time

Page 8: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Marginal distributions

• the marginal distribution of X is defined by

“the distribution of X ignoring other variables”

• this definition generalizes to more than two variables, e.g.€

P(x) = P(x, y)y

P(x) = P(x,y,z)z

∑y

Page 9: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Marginal distribution example

x, y P(X = x, Y = y)

sun, on-time 0.20

rain, on-time 0.20

snow, on-time 0.05

sun, late 0.10

rain, late 0.30

snow, late 0.15

x P(X = x)

sun 0.3

rain 0.5

snow 0.2

joint distribution marginal distribution for X

Page 10: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Conditional distributions

• the conditional distribution of X given Y is defined as:

“the distribution of X given that we know the value of Y ”

P(X = x |Y = y) =P(X = x,Y = y)

P(Y = y)

Page 11: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Conditional distribution example

x, y P(X = x, Y = y)

sun, on-time 0.20

rain, on-time 0.20

snow, on-time 0.05

sun, late 0.10

rain, late 0.30

snow, late 0.15

x P(X = x|Y=on-time)

sun 0.20/0.45 = 0.444

rain 0.20/0.45 = 0.444

snow 0.05/0.45 = 0.111

joint distributionconditional distribution for X given Y=on-time

Page 12: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Independence

•two random variables, X and Y, are independent if

P(x, y) = P(x) × P(y) for all x and y

Page 13: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Independence example #1

x, y P(X = x, Y = y)

sun, on-time 0.20

rain, on-time 0.20

snow, on-time 0.05

sun, late 0.10

rain, late 0.30

snow, late 0.15

x P(X = x)

sun 0.3

rain 0.5

snow 0.2

joint distribution marginal distributions

y P(Y = y)

on-time 0.45

late 0.55

Are X and Y independent here? NO.

Page 14: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Independence example #2

x, y P(X = x, Y = y)

sun, fly-United 0.27

rain, fly-United 0.45

snow, fly-United 0.18

sun, fly-Northwest 0.03

rain, fly-Northwest 0.05

snow, fly-Northwest 0.02

x P(X = x)

sun 0.3

rain 0.5

snow 0.2

joint distribution marginal distributions

y P(Y = y)

fly-United 0.9

fly-Northwest 0.1

Are X and Y independent here? YES.

Page 15: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

• two random variables X and Y are conditionally independent given Z if

“once you know the value of Z, knowing Y doesn’t tell you anything about X ”

• alternatively

Conditional independence

P(X |Y,Z) = P(X | Z)

P(x, y | z) = P(x | z) × P(y | z) for all x, y,z

Page 16: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Conditional independence example

Flu Fever Headache P

true true true 0.04

true true false 0.04

true false true 0.01

true false false 0.01

false true true 0.009

false true false 0.081

false false true 0.081

false false false 0.729

e.g. P( fever,headache) ≠ P( fever) × P(headache)

Are Fever and Headache independent? NO.

Page 17: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Conditional independence example

Flu Fever Headache P

true true true 0.04

true true false 0.04

true false true 0.01

true false false 0.01

false true true 0.009

false true false 0.081

false false true 0.081

false false false 0.729

Are Fever and Headache conditionally independent given Flu:

P( fever,headache | flu) = P( fever | flu) × P(headache | flu)

P( fever,headache | ¬ flu) = P( fever | ¬ flu) × P(headache | ¬ flu)

etc.

YES.

Page 18: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Chain rule of probability

• for two variables

• for three variables

• etc.• to see that this is true, note that

P(X,Y ) = P(X |Y ) × P(Y )

P(X,Y,Z) = P(X |Y,Z) × P(Y | Z) × P(Z)

P(X,Y,Z) =P(X,Y,Z)

P(Y,Z)×

P(Y,Z)

P(Z)× P(Z)

Page 19: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Bayes theorem

• this theorem is extremely useful• there are many cases when it is hard to estimate P(x | y)

directly, but it’s not too hard to estimate P(y | x) and P(x)€

P(x | y) =P(y | x)P(x)

P(y)=

P(y | x)P(x)

P(y | x)P(x)x

Page 20: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Bayes theorem example

• MDs usually aren’t good at estimating P(Disorder| Symptom)

• they’re usually better at estimating P(Symptom | Disorder) • if we can estimate P(Fever | Flu) and P(Flu) we can use Bayes’

Theorem to do diagnosis

P( flu | fever) =P( fever | flu)P( flu)

P( fever | flu)P( flu) + P( fever |¬ flu)P(¬ flu)

Page 21: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Expected values

• the expected value of a random variable that takes on numerical values is defined as:

this is the same thing as the mean

• we can also talk about the expected value of a function of a random variable

E X[ ] = x × P(x)x

E g(X)[ ] = g(x) × P(x)x

Page 22: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Expected value examples

E Shoesize[ ] =

5 × P(Shoesize = 5) + ...+14 × P(Shoesize =14)

• Suppose each lottery ticket costs $1 and the winning ticket pays out $100.

The probability that a particular ticket is the winning ticket is 0.001.

E gain(Lottery)[ ] =

gain(winning)P(winning) + gain(losing )P(losing ) =

($100 − $1) × 0.001− $1× 0.999 =

− $0.90

Page 23: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

• distribution over the number of successes in a fixed number n of independent trials (with same probability of success p in each)

• e.g. the probability of x heads in n coin flips

The binomial distribution

P(x) =n

x

⎝ ⎜

⎠ ⎟px (1− p)n−x

P(X

=x)

p=0.5 p=0.1

x x

Page 24: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

• k possible outcomes on each trial• probability pi for outcome xi in each trial• distribution over the number of occurrences xi for each

outcome in a fixed number n of independent trials

• e.g. with k=6 (a six-sided die) and n=30

The multinomial distribution

P(x) =n!

(x i!)i

∏pi

xi

i

P([7,3,0,8,10,2]) =30!

7!×3!×0!×8!×10!×2!p1

7 p23 p3

0 p48 p5

10 p62

( )

vector of outcomeoccurrences

Page 25: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Continuous distributions

• Needed for real-valued attributes– Weight of an object– Height of a tree

• Pobability distribution needs a probability density function which must be integrated over an interval to derive probability values

Page 26: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Cumulative distribution function (CDF)

• To define a probability distribution for a continuous variable, we need to integrate f(x)

Page 27: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Examples of continuous distributions

• Uniform distribution over [a,b]

• Gaussian distribution

Page 28: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Statistics of alignment scores

Q: How do we assess whether an alignment provides good evidence for homology?

A: determine how likely it is that such an alignment score would result from chance.

What is “chance”?– real but non-homologous sequences– real sequences shuffled to preserve compositional

properties– sequences generated randomly based upon a DNA/protein

sequence model

Page 29: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Model for unrelated sequences

• we’ll assume that each position in the alignment is sampled randomly from some distribution of amino acids

• let be the probability of amino acid a

• the probability of an n-character alignment of x and y is given by

1 1

Pr( , | )i i

n n

x yi i

x y U q q= =

=∏ ∏

aq

Page 30: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Model for related sequences

• we’ll assume that each pair of aligned amino acids evolved from a common ancestor

• let be the probability that evolution gave rise to amino acid a in one sequence and b in another sequence

• the probability of an alignment of x and y is given by

abp

1

Pr( , | )i i

n

x yi

x y R p=

=∏

Page 31: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Probabilistic model of alignments

• How can we decide which possibility (U or R) is more likely?

• one principled way is to consider the relative likelihood of the two possibilities

∏∏

∏∏∏

==

iyx

iyx

iy

ix

iyx

ii

ii

ii

ii

qq

p

qq

p

Uyx

Ryx

)|,Pr(

)|,Pr(

∑ ⎟⎟⎠

⎞⎜⎜⎝

⎛=

i yx

yx

ii

ii

qq

p

Uyx

Ryxlog

)|,Pr(

)|,Pr(log

• taking the log, we get

Page 32: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Probabilistic model of alignments

• the score for an alignment is thus given by:

• the substitution matrix score for the pair a, b should thus be given by:

⎟⎟⎠

⎞⎜⎜⎝

⎛=

ba

ab

qq

pbas log),(

)|,Pr(

)|,Pr(log),(

Uyx

RyxyxsS i

ii ==∑

Page 33: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Scores from random alignments

• suppose we assume• sequence lengths m and n• a particular substitution matrix and amino-acid

frequencies

• and we consider generating random sequences of lengths m and n and finding the best alignment of these sequences

• this will give us a distribution over alignment scores for random pairs of sequences

Page 34: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

The extreme value distribution

• we’re picking the best alignments, so we want to know what the distribution of max scores for alignments against a random set of sequences looks like

• this is given by an extreme value distribution• The one that is of most interest to us is called the Gumbel

distribution

PDF

CDF

Page 35: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

• x is a given score threshold• K and are constants that can be calculated from

• the substitution matrix• the frequencies of the individual amino acids

Assessing significance of scores

λ

• the probability of observing a max score at least x from N random sequence alignments is :

Effective lengths of sequences

Page 36: Primer on Probability Sushmita Roy BMI/CS 576  Sushmita Roy sroy@biostat.wisc.edu Oct 2nd, 2012 BMI/CS 576

Statistics of alignment scores

• E value in BLAST is the expected number of sequences one would expect to see of max score x– If p is the probability of observing a score > x, E=p*m*n– n is the sum of all sequences in a database.

• theory for gapped alignments not as well developed

• computational experiments suggest this analysis holds for gapped alignments (but K and must be estimated from data)

λ