Probability Review
1
CS479/679 Pattern Recognition
Dr. George Bebis
Why Bother About Probabilities?
• Accounting for uncertainty is a crucial component in decision making (e.g., classification) because of ambiguity in our measurements.
• Probability theory is the proper mechanism for accounting for uncertainty.
• Takes into account a-priori knowledge, for example: "If the fish was caught in the Atlantic Ocean, then it is more likely to be salmon than sea-bass."
2
Definitions

• Random experiment – An experiment whose result is not certain in advance (e.g., throwing a die)
• Outcome – The result of a random experiment
• Sample space – The set of all possible outcomes (e.g., {1,2,3,4,5,6})
• Event – A subset of the sample space (e.g., obtaining an odd number when throwing a die: {1,3,5})
3
Intuitive Formulation of Probability
• Intuitively, the probability of an event α can be defined as:

  P(α) = N(α) / n

  where N(α) is the number of times that event α happens in n trials.

• Assumes that all outcomes are equally likely (Laplacian definition)

4
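The frequency definition above can be checked with a quick simulation. This is a hypothetical illustration, not part of the slides; the event is "obtain an odd number" from the die example:

```python
import random

# Estimate P(odd) for a fair die as N(a)/n, following the
# frequency (Laplacian) definition on this slide.
random.seed(0)

n = 100_000
odd = {1, 3, 5}                       # the event "obtain an odd number"
n_a = sum(1 for _ in range(n) if random.randint(1, 6) in odd)

p_estimate = n_a / n                  # N(a)/n
# For a fair die the true value is 3/6 = 0.5; the estimate converges to it.
```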
Axioms of Probability
5
1. P(A) ≥ 0 for any event A
2. P(S) = 1, where S is the sample space
3. If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B)

[Figure: Venn diagram of mutually exclusive events A1, A2, A3, A4 in sample space S]
Prior (Unconditional) Probability
• This is the probability of an event in the absence of any evidence.
P(Cavity)=0.1 means that
“in the absence of any other information, there is a 10% chance that the patient has a cavity”.
6
Posterior (Conditional) Probability
• This is the probability of an event given some evidence.
P(Cavity/Toothache)=0.8 means that
“there is an 80% chance that the patient has a cavity given that he has a toothache”.
7
Posterior (Conditional) Probability (cont’d)
• Conditional probabilities can be defined in terms of unconditional probabilities:

  P(A/B) = P(A,B) / P(B),   P(B/A) = P(A,B) / P(A)

• Conditional probabilities lead to the chain rule:

  P(A,B) = P(A/B)P(B) = P(B/A)P(A)

8
Law of Total Probability

• If A1, A2, …, An is a partition of mutually exclusive events and B is any event, then:

  P(B) = P(B,A1) + P(B,A2) + … + P(B,An)
       = P(B/A1)P(A1) + P(B/A2)P(A2) + … + P(B/An)P(An)
       = Σ_{j=1}^{n} P(B/Aj)P(Aj)

• Special case (partition {A, ¬A}):

  P(B) = P(B,A) + P(B,¬A) = P(B/A)P(A) + P(B/¬A)P(¬A)

9

[Figure: the sample space S partitioned into A1, A2, …, An]
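The sum over the partition can be sketched in a few lines. The partition and its probabilities below are made-up numbers for illustration, not from the slides:

```python
# Law of total probability: P(B) = sum_j P(B/Aj) P(Aj)
# over a hypothetical three-event partition A1, A2, A3.
priors = {"A1": 0.5, "A2": 0.3, "A3": 0.2}        # P(Aj), sums to 1
likelihoods = {"A1": 0.9, "A2": 0.4, "A3": 0.1}   # P(B/Aj)

p_b = sum(likelihoods[a] * priors[a] for a in priors)
# 0.9*0.5 + 0.4*0.3 + 0.1*0.2 = 0.59
```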
Bayes’ Theorem

• Conditional probabilities lead to Bayes’ rule:

  P(A/B) = P(B/A)P(A) / P(B)

  where

  P(B) = P(B,A) + P(B,¬A) = P(B/A)P(A) + P(B/¬A)P(¬A)

  (using P(A/B) = P(A,B)/P(B) and P(B/A) = P(A,B)/P(A))

10
Example

• Consider the probability of Disease given Symptom:

  P(Disease/Symptom) = P(Symptom/Disease)P(Disease) / P(Symptom)

  where:

  P(Symptom) = P(Symptom/Disease)P(Disease) + P(Symptom/¬Disease)P(¬Disease)

11
Example (cont’d)

• Meningitis causes a stiff neck 50% of the time.
• A patient comes in with a stiff neck – what is the probability that he has meningitis?

  P(M/S) = P(S/M)P(M) / P(S)

• Need to know the following:
  – The prior probability of a patient having meningitis: P(M) = 1/50,000
  – The prior probability of a patient having a stiff neck: P(S) = 1/20

  P(M/S) = (0.5 × 1/50,000) / (1/20) = 0.0002

12
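The meningitis numbers from the slide plug straight into Bayes' rule:

```python
# Bayes' rule with the slide's numbers: P(M/S) = P(S/M) P(M) / P(S)
p_s_given_m = 0.5          # meningitis causes a stiff neck 50% of the time
p_m = 1 / 50_000           # prior P(M)
p_s = 1 / 20               # prior P(S)

p_m_given_s = p_s_given_m * p_m / p_s
# matches the slide's P(M/S) = 0.0002
```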
General Form of Bayes’ Rule

• If A1, A2, …, An is a partition of mutually exclusive events and B is any event, then Bayes’ rule is given by:

  P(Ai/B) = P(B/Ai)P(Ai) / P(B)

  where

  P(B) = Σ_{j=1}^{n} P(B/Aj)P(Aj)

13
Independence
• Two events A and B are independent iff:
P(A,B)=P(A)P(B)
• Using the formula above, we can show:
P(A/B)=P(A) and P(B/A)=P(B)
• A and B are conditionally independent given C iff:
P(A/B,C)=P(A/C)
e.g., P(WetGrass/Season,Rain)=P(WetGrass/Rain)
14
Random Variables
• In many experiments, it is easier to deal with a summary variable than with the original probability structure.
Example
• In an opinion poll, we ask 50 people whether they agree or disagree with a certain issue.
  – Suppose we record a "1" for agree and a "0" for disagree.
  – The sample space for this experiment has 2^50 elements.
• Suppose we are only interested in the number of people who agree.
  – Define the variable X = number of "1"s recorded out of 50.
  – The new sample space has only 51 elements ({0, 1, …, 50}).
15
Random Variables (cont’d)
• A random variable (r.v.) is a function that assigns a value to the outcome of a random experiment.
16
[Figure: a random variable X maps each outcome sj in S to a value X(sj)]
Random Variables (cont’d)
• How do we compute probabilities using random variables?
  – Suppose the sample space is S = <s1, s2, …, sn>
  – Suppose the range of the random variable X is <x1, x2, …, xm>
  – We observe X = xj iff the outcome of the random experiment is an sj ∈ S such that X(sj) = xj:

  P(X = xj) = Σ_{sj : X(sj) = xj} P(sj)

17
Example
• Consider the experiment of throwing a pair of dice
18
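The slide's dice table did not survive extraction; as an assumed version of the example, let X be the sum of the two dice and compute its pmf by counting outcomes:

```python
from itertools import product
from collections import Counter

# X = sum of two fair dice; 36 equally likely outcomes in the sample space.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

pmf = {x: c / 36 for x, c in counts.items()}   # P(X = x) = N(x)/36
# e.g. P(X = 7) = 6/36, the most likely sum
```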
Discrete/Continuous Random Variables
• A discrete r.v. can assume only discrete values.
• A continuous r.v. can assume a continuous range of values (e.g., sensor readings).
19
Probability mass function (pmf)
• The pmf (discrete r.v.) of a r.v. X assigns a probability for each possible value of X.
• Notation: given two r.v.'s, X and Y, their pmf are denoted as pX(x) and pY(y); for convenience, we will drop the subscripts and denote them as p(x) and p(y). However, keep in mind that these functions are different!
20
Probability density function (pdf)
• The pdf (continuous r.v.) of a r.v. X shows the probability of being “close” to some number x
• Notation: given two r.v.'s, X and Y, their pdf are denoted as pX(x) and pY(y); for convenience, we will drop the subscripts and denote them as p(x) and p(y). However, keep in mind that these functions are different!
21
Probability Distribution Function (PDF)

• Definition:

  F(x) = P(X ≤ x)

• Some properties of the PDF:
  – (1) 0 ≤ F(x) ≤ 1
  – (2) F(x) is a non-decreasing function of x

• If X is discrete, its PDF can be computed as follows:

  F(x) = P(X ≤ x) = Σ_{k=0}^{x} P(X = k) = Σ_{k=0}^{x} p(k)

22
Probability Distribution Function (PDF) (cont’d)

23

[Figure: a pmf and its corresponding PDF]
Probability Distribution Function (PDF) (cont’d)
• If X is continuous, its PDF can be computed as follows:

  F(x) = ∫_{-∞}^{x} p(t) dt   for all x

• The pdf can be obtained from the PDF using:

  p(x) = dF(x)/dx

24
Example
• Gaussian pdf and PDF
25
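A small sketch of the Gaussian pdf and its PDF F(x) = P(X ≤ x). The helper names are mine; the PDF uses the standard closed form via the error function rather than numerical integration:

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # p(x) = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = P(X <= x), closed form via erf
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# F is non-decreasing, lies in [0, 1], and F(mu) = 0.5 by symmetry.
```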
Joint pmf (discrete case)
• For n random variables, the joint pmf assigns a probability for each possible combination of values:
p(x1,x2,…,xn)=P(X1=x1, X2=x2, …, Xn=xn)
Notation: the joint pmf /pdf of the r.v.'s X1, X2, ..., Xn and Y1, Y2, ...,
Yn are denoted as pX1X2...Xn(x1,x2,...,xn) and pY1Y2...Yn(y1,y2,...,yn); for convenience, we will drop the subscripts and denote them p(x1,x2,...,xn) and p(y1,y2,...,yn), keep in mind, however, that these are two different functions.
26
Σ_{x1, x2, …, xn} p(x1, x2, …, xn) = 1
Joint pmf (discrete r.v.) (cont’d)

• Specifying the joint pmf requires a large number of values:
  – k^n values, assuming n random variables where each one can assume one of k discrete values.
  – This can be simplified if we assume independence or conditional independence.
P(Cavity, Toothache) is a 2 x 2 matrix (a joint probability table):

                Toothache    Not Toothache
  Cavity        0.04         0.06
  Not Cavity    0.01         0.89

Sum of probabilities = 1.0

27
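The joint table above is enough to recover the marginal and conditional probabilities quoted on the earlier slides:

```python
# Joint pmf from the slide's 2 x 2 table.
joint = {
    ("cavity", "toothache"): 0.04,
    ("cavity", "no_toothache"): 0.06,
    ("no_cavity", "toothache"): 0.01,
    ("no_cavity", "no_toothache"): 0.89,
}

# Marginalization: P(Cavity) = sum over the Toothache values.
p_cavity = sum(p for (c, t), p in joint.items() if c == "cavity")        # 0.10

# Conditional: P(Cavity/Toothache) = P(Cavity,Toothache) / P(Toothache).
p_toothache = sum(p for (c, t), p in joint.items() if t == "toothache")  # 0.05
p_cavity_given_toothache = joint[("cavity", "toothache")] / p_toothache  # 0.8
```

These reproduce the P(Cavity)=0.1 and P(Cavity/Toothache)=0.8 figures from the prior/posterior probability slides.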
Joint pdf (continuous r.v.)
• For n random variables, the joint pdf gives the probability of being “close” to a given combination of values:

  p(x1, x2, …, xn) ≥ 0

  ∫_{x1} … ∫_{xn} p(x1, x2, …, xn) dx1 … dxn = 1

28
Conditional pdf
• The conditional pdf can be derived from the joint pdf:

  p(y/x) = p(x,y) / p(x)

• Conditional pdfs lead to the chain rule:

  p(x,y) = p(y/x) p(x)

• General case:

  p(x1, x2, …, xn) = p(x1/x2, …, xn) p(x2/x3, …, xn) … p(xn-1/xn) p(xn)

29
Independence
• Knowledge about independence between r.v.'s is very useful because it simplifies things.
e.g., if X and Y are independent, then:

  p(x,y) = p(x) p(y)

30

[Figure: a 2D joint pdf factoring into the product of two 1D pdfs]
Law of Total Probability
• The law of total probability:
p(y) = Σ_x p(y/x) p(x)
31
Marginalization
• Given the joint pmf/pdf, we can compute the pmf/pdf of any subset of variables using marginalization:
• Examples:

  p(x) = ∫ p(x,y) dy

  p(x1, …, xi-1, xi+1, …, xn) = ∫ p(x1, x2, …, xn) dxi

  p(x1, x2) = ∫_{x3} … ∫_{xn} p(x1, x2, x3, …, xn) dx3 … dxn

32
The joint pmf/pdf is very useful!
33
Normal (Gaussian) Distribution
34
• d-dimensional:

  p(x) = 1 / ((2π)^{d/2} |Σ|^{1/2}) · exp[ -(1/2)(x-μ)ᵀ Σ⁻¹ (x-μ) ]

  where μ is a d x 1 mean vector and Σ is a d x d symmetric covariance matrix.

• 1-dimensional:

  p(x) = 1 / (√(2π) σ) · exp[ -(x-μ)² / (2σ²) ]
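The d-dimensional density above can be evaluated directly with numpy. The function name and the μ, Σ values below are illustrative assumptions:

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    # p(x) = 1/((2*pi)^(d/2) |Sigma|^(1/2)) * exp(-0.5 (x-mu)^T Sigma^-1 (x-mu))
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    expo = -0.5 * diff @ np.linalg.inv(sigma) @ diff
    return np.exp(expo) / norm

mu = np.array([0.0, 0.0])
sigma = np.eye(2)                 # 2 x 2 symmetric covariance (identity here)
peak = mvn_pdf(mu, mu, sigma)     # density at the mean: 1/(2*pi) for d=2, Sigma=I
```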
Normal (Gaussian) Distribution (cont’d)
• Parameters and shape of the Gaussian distribution
  – Number of parameters: d + d(d+1)/2 (d for μ, d(d+1)/2 for the symmetric Σ)
  – Shape determined by Σ

35
Normal (Gaussian) Distribution (cont’d)
• Mahalanobis distance:

  r² = (x-μ)ᵀ Σ⁻¹ (x-μ)

• If the variables are independent (Σ is diagonal), the multivariate normal distribution becomes a product of 1-dimensional normals:

  p(x) = Π_{i=1}^{d} 1 / (√(2π) σi) · exp[ -(xi-μi)² / (2σi²) ]

36
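The Mahalanobis distance is a one-liner in numpy; the μ, Σ, and x values below are made up for illustration:

```python
import numpy as np

# r^2 = (x - mu)^T Sigma^-1 (x - mu)
mu = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.0],
                  [0.0, 0.5]])    # diagonal Sigma: independent variables
x = np.array([3.0, 2.0])

diff = x - mu
r2 = diff @ np.linalg.inv(sigma) @ diff
# with a diagonal Sigma this reduces to sum_i (x_i - mu_i)^2 / sigma_ii:
# (3-1)^2/2 + (2-2)^2/0.5 = 2.0
```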
Expected Value

• For a discrete r.v. X, the expected value is:

  E[X] = Σ_x x p(x)

37
Expected Value (cont’d)

• The sample mean and expected value are related by:

  (1/n) Σ_{i=1}^{n} xi → E[X]   as n → ∞

• The expected value for a continuous r.v. is given by:

  E[X] = ∫_{-∞}^{∞} x p(x) dx

38
Variance and Standard Deviation

• Variance:  Var(X) = σ² = E[(X − E[X])²] = E[X²] − (E[X])²
• Standard deviation:  σ = √Var(X)

39
Covariance

• Cov(X,Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]

40
Covariance Matrix – 2 variables

  Σ = [ Var(X)     Cov(X,Y) ]
      [ Cov(Y,X)   Var(Y)   ]

  and Cov(X,Y) = Cov(Y,X) (i.e., Σ is a symmetric matrix)

• Cov(X,Y) can be computed using:  Cov(X,Y) = E[XY] − E[X]E[Y]

41
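A sketch of estimating this 2 x 2 covariance matrix from samples, using the E[XY] − E[X]E[Y] form above (the data here are made-up standard-normal samples):

```python
import numpy as np

rng = np.random.default_rng(0)
xy = rng.standard_normal((1000, 2))   # 1000 samples of (X, Y)

mean = xy.mean(axis=0)                # (E[X], E[Y])
# Sigma = E[xy xy^T] - mean mean^T, estimated from the samples
cov = (xy.T @ xy) / len(xy) - np.outer(mean, mean)
# cov[0,1] == cov[1,0]: Sigma is symmetric
```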
Covariance Matrix – n variables

  Σij = Cov(Xi, Xj) = E[(Xi − μi)(Xj − μj)],  with Σij = Σji (i.e., a symmetric matrix)

42
Uncorrelated r.v.’s

• If the r.v.’s are pairwise uncorrelated, then Cov(Xi, Xj) = 0 for i ≠ j, and Σ reduces to a diagonal matrix D of the variances.

43
Properties of covariance matrix

• Σ is symmetric and positive semi-definite: its eigenvalues are real and non-negative, and its eigenvectors can be chosen orthonormal.

44

(Note: we will review these concepts later in case you do not remember them)
Covariance Matrix Decomposition

• Σ can be decomposed as Σ = Φ Λ Φᵀ, where Φ holds the eigenvectors of Σ as columns, Λ is the diagonal matrix of its eigenvalues, and Φ⁻¹ = Φᵀ.

45
Linear Transformations

• Under a linear transformation Y = AᵀX, the mean and covariance transform as μY = Aᵀμ and ΣY = AᵀΣA.

46

Linear Transformations (cont’d)

• Whitening transform: choosing Aw = Φ Λ^{-1/2} (from Σ = Φ Λ Φᵀ) gives ΣY = I, i.e., uncorrelated components with unit variance.

47
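The whitening transform can be sketched with numpy's symmetric eigendecomposition; the covariance matrix below is a made-up example:

```python
import numpy as np

# Whitening: A_w = Phi Lambda^(-1/2), so that Y = A_w^T X has identity covariance.
sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])                  # example symmetric covariance

eigvals, eigvecs = np.linalg.eigh(sigma)        # Sigma = Phi Lambda Phi^T
A_w = eigvecs @ np.diag(eigvals ** -0.5)        # whitening matrix

sigma_y = A_w.T @ sigma @ A_w                   # covariance after the transform
# sigma_y is (numerically) the identity matrix
```

Since A_wᵀ Σ A_w = Λ^{-1/2} Φᵀ (Φ Λ Φᵀ) Φ Λ^{-1/2} = I, the transformed variables are uncorrelated with unit variance.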