Probability Review
1
CS479/679 Pattern Recognition
Dr. George Bebis
Why Bother About Probabilities?
• Accounting for uncertainty is a crucial component in decision making (e.g., classification) because of ambiguity in our measurements.
• Probability theory is the proper mechanism for accounting for uncertainty.
• Takes into account a-priori knowledge, for example: "If the fish was caught in the Atlantic Ocean, then it is more likely to be salmon than sea-bass."
2
Definitions

• Random experiment – An experiment whose result is not certain in advance (e.g., throwing a die)
• Outcome – The result of a random experiment
• Sample space – The set of all possible outcomes (e.g., {1,2,3,4,5,6})
• Event – A subset of the sample space (e.g., obtaining an odd number when throwing a die: {1,3,5})
3
Intuitive Formulation of Probability
• Intuitively, the probability of an event α can be defined as:

  P(α) = N(α) / n

  where N(α) is the number of times that event α happens in n trials.

• Assumes that all outcomes are equally likely (Laplacian definition)

4
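The frequency definition above can be checked with a quick simulation. This is a hypothetical illustration, not part of the slides; the event is "obtain an odd number" from the die example:

```python
import random

# Estimate P(odd) for a fair die as N(a)/n, following the
# frequency (Laplacian) definition on this slide.
random.seed(0)

n = 100_000
odd = {1, 3, 5}                       # the event "obtain an odd number"
n_a = sum(1 for _ in range(n) if random.randint(1, 6) in odd)

p_estimate = n_a / n                  # N(a)/n
# For a fair die the true value is 3/6 = 0.5; the estimate converges to it.
```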
Axioms of Probability
5
1. P(A) ≥ 0 for any event A
2. P(S) = 1, where S is the sample space
3. If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B)

[Figure: Venn diagram of mutually exclusive events A1, A2, A3, A4 in sample space S]
Prior (Unconditional) Probability
• This is the probability of an event in the absence of any evidence.
P(Cavity)=0.1 means that
“in the absence of any other information, there is a 10% chance that the patient has a cavity”.
6
Posterior (Conditional) Probability
• This is the probability of an event given some evidence.
P(Cavity/Toothache)=0.8 means that
“there is an 80% chance that the patient has a cavity given that he has a toothache”.
7
Posterior (Conditional) Probability (cont’d)
• Conditional probabilities can be defined in terms of unconditional probabilities:

  P(A/B) = P(A,B) / P(B),   P(B/A) = P(A,B) / P(A)

• Conditional probabilities lead to the chain rule:

  P(A,B) = P(A/B)P(B) = P(B/A)P(A)

8
Law of Total Probability

• If A1, A2, …, An is a partition of mutually exclusive events and B is any event, then:

  P(B) = P(B,A1) + P(B,A2) + … + P(B,An)
       = P(B/A1)P(A1) + P(B/A2)P(A2) + … + P(B/An)P(An)
       = Σ_{j=1}^{n} P(B/Aj)P(Aj)

• Special case (partition {A, ¬A}):

  P(B) = P(B,A) + P(B,¬A) = P(B/A)P(A) + P(B/¬A)P(¬A)

9

[Figure: the sample space S partitioned into A1, A2, …, An]
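The sum over the partition can be sketched in a few lines. The partition and its probabilities below are made-up numbers for illustration, not from the slides:

```python
# Law of total probability: P(B) = sum_j P(B/Aj) P(Aj)
# over a hypothetical three-event partition A1, A2, A3.
priors = {"A1": 0.5, "A2": 0.3, "A3": 0.2}        # P(Aj), sums to 1
likelihoods = {"A1": 0.9, "A2": 0.4, "A3": 0.1}   # P(B/Aj)

p_b = sum(likelihoods[a] * priors[a] for a in priors)
# 0.9*0.5 + 0.4*0.3 + 0.1*0.2 = 0.59
```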
Bayes’ Theorem

• Conditional probabilities lead to Bayes’ rule:

  P(A/B) = P(B/A)P(A) / P(B)

  where

  P(B) = P(B,A) + P(B,¬A) = P(B/A)P(A) + P(B/¬A)P(¬A)

  (using P(A/B) = P(A,B)/P(B) and P(B/A) = P(A,B)/P(A))

10
Example

• Consider the probability of Disease given Symptom:

  P(Disease/Symptom) = P(Symptom/Disease)P(Disease) / P(Symptom)

  where:

  P(Symptom) = P(Symptom/Disease)P(Disease) + P(Symptom/¬Disease)P(¬Disease)

11
Example (cont’d)

• Meningitis causes a stiff neck 50% of the time.
• A patient comes in with a stiff neck – what is the probability that he has meningitis?

  P(M/S) = P(S/M)P(M) / P(S)

• Need to know the following:
  – The prior probability of a patient having meningitis: P(M) = 1/50,000
  – The prior probability of a patient having a stiff neck: P(S) = 1/20

  P(M/S) = (0.5 × 1/50,000) / (1/20) = 0.0002

12
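The meningitis numbers from the slide plug straight into Bayes' rule:

```python
# Bayes' rule with the slide's numbers: P(M/S) = P(S/M) P(M) / P(S)
p_s_given_m = 0.5          # meningitis causes a stiff neck 50% of the time
p_m = 1 / 50_000           # prior P(M)
p_s = 1 / 20               # prior P(S)

p_m_given_s = p_s_given_m * p_m / p_s
# matches the slide's P(M/S) = 0.0002
```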
General Form of Bayes’ Rule

• If A1, A2, …, An is a partition of mutually exclusive events and B is any event, then Bayes’ rule is given by:

  P(Ai/B) = P(B/Ai)P(Ai) / P(B)

  where

  P(B) = Σ_{j=1}^{n} P(B/Aj)P(Aj)

13
Independence
• Two events A and B are independent iff:
P(A,B)=P(A)P(B)
• Using the formula above, we can show:
P(A/B)=P(A) and P(B/A)=P(B)
• A and B are conditionally independent given C iff:
P(A/B,C)=P(A/C)
e.g., P(WetGrass/Season,Rain)=P(WetGrass/Rain)
14
Random Variables
• In many experiments, it is easier to deal with a summary variable than with the original probability structure.
Example
• In an opinion poll, we ask 50 people whether they agree or disagree with a certain issue.
  – Suppose we record a "1" for agree and a "0" for disagree.
  – The sample space for this experiment has 2^50 elements.
• Suppose we are only interested in the number of people who agree.
  – Define the variable X = number of "1"s recorded out of 50.
  – The new sample space has only 51 elements ({0, 1, …, 50}).
15
Random Variables (cont’d)
• A random variable (r.v.) is a function that assigns a value to the outcome of a random experiment.
16
[Figure: a random variable X maps each outcome sj in S to a value X(sj)]
Random Variables (cont’d)
• How do we compute probabilities using random variables?
  – Suppose the sample space is S = <s1, s2, …, sn>
  – Suppose the range of the random variable X is <x1, x2, …, xm>
  – We observe X = xj iff the outcome of the random experiment is an sj ∈ S such that X(sj) = xj:

  P(X = xj) = Σ_{sj : X(sj) = xj} P(sj)

17
Example
• Consider the experiment of throwing a pair of dice
18
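The slide's dice table did not survive extraction; as an assumed version of the example, let X be the sum of the two dice and compute its pmf by counting outcomes:

```python
from itertools import product
from collections import Counter

# X = sum of two fair dice; 36 equally likely outcomes in the sample space.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

pmf = {x: c / 36 for x, c in counts.items()}   # P(X = x) = N(x)/36
# e.g. P(X = 7) = 6/36, the most likely sum
```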
Discrete/Continuous Random Variables
• A discrete r.v. can assume only discrete values.
• A continuous r.v. can assume a continuous range of values (e.g., sensor readings).
19
Probability mass function (pmf)
• The pmf (discrete r.v.) of a r.v. X assigns a probability for each possible value of X.
• Notation: given two r.v.'s, X and Y, their pmf are denoted as pX(x) and pY(y); for convenience, we will drop the subscripts and denote them as p(x) and p(y). However, keep in mind that these functions are different!
20
Probability density function (pdf)
• The pdf (continuous r.v.) of a r.v. X shows the probability of being “close” to some number x
• Notation: given two r.v.'s, X and Y, their pdf are denoted as pX(x) and pY(y); for convenience, we will drop the subscripts and denote them as p(x) and p(y). However, keep in mind that these functions are different!
21
Probability Distribution Function (PDF)

• Definition:

  F(x) = P(X ≤ x)

• Some properties of the PDF:
  – (1) 0 ≤ F(x) ≤ 1
  – (2) F(x) is a non-decreasing function of x

• If X is discrete, its PDF can be computed as follows:

  F(x) = P(X ≤ x) = Σ_{k=0}^{x} P(X = k) = Σ_{k=0}^{x} p(k)

22
Probability Distribution Function (PDF) (cont’d)

23

[Figure: a pmf and its corresponding PDF]
Probability Distribution Function (PDF) (cont’d)
• If X is continuous, its PDF can be computed as follows:

  F(x) = ∫_{-∞}^{x} p(t) dt   for all x

• The pdf can be obtained from the PDF using:

  p(x) = dF(x)/dx

24
Example
• Gaussian pdf and PDF
25
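A small sketch of the Gaussian pdf and its PDF F(x) = P(X ≤ x). The helper names are mine; the PDF uses the standard closed form via the error function rather than numerical integration:

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # p(x) = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = P(X <= x), closed form via erf
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# F is non-decreasing, lies in [0, 1], and F(mu) = 0.5 by symmetry.
```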
Joint pmf (discrete case)
• For n random variables, the joint pmf assigns a probability for each possible combination of values:
p(x1,x2,…,xn)=P(X1=x1, X2=x2, …, Xn=xn)
Notation: the joint pmf /pdf of the r.v.'s X1, X2, ..., Xn and Y1, Y2, ...,
Yn are denoted as pX1X2...Xn(x1,x2,...,xn) and pY1Y2...Yn(y1,y2,...,yn); for convenience, we will drop the subscripts and denote them p(x1,x2,...,xn) and p(y1,y2,...,yn), keep in mind, however, that these are two different functions.
26
Σ_{x1, x2, …, xn} p(x1, x2, …, xn) = 1
Joint pmf (discrete r.v.) (cont’d)

• Specifying the joint pmf requires a large number of values:
  – k^n values, assuming n random variables where each one can assume one of k discrete values.
  – This can be simplified if we assume independence or conditional independence.
P(Cavity, Toothache) is a 2 x 2 matrix (a joint probability table):

                Toothache    Not Toothache
  Cavity        0.04         0.06
  Not Cavity    0.01         0.89

Sum of probabilities = 1.0

27
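The joint table above is enough to recover the marginal and conditional probabilities quoted on the earlier slides:

```python
# Joint pmf from the slide's 2 x 2 table.
joint = {
    ("cavity", "toothache"): 0.04,
    ("cavity", "no_toothache"): 0.06,
    ("no_cavity", "toothache"): 0.01,
    ("no_cavity", "no_toothache"): 0.89,
}

# Marginalization: P(Cavity) = sum over the Toothache values.
p_cavity = sum(p for (c, t), p in joint.items() if c == "cavity")        # 0.10

# Conditional: P(Cavity/Toothache) = P(Cavity,Toothache) / P(Toothache).
p_toothache = sum(p for (c, t), p in joint.items() if t == "toothache")  # 0.05
p_cavity_given_toothache = joint[("cavity", "toothache")] / p_toothache  # 0.8
```

These reproduce the P(Cavity)=0.1 and P(Cavity/Toothache)=0.8 figures from the prior/posterior probability slides.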
Joint pdf (continuous r.v.)
• For n random variables, the joint pdf gives the probability of being “close” to a given combination of values:

  p(x1, x2, …, xn) ≥ 0

  ∫_{x1} … ∫_{xn} p(x1, x2, …, xn) dx1 … dxn = 1

28
Conditional pdf
• The conditional pdf can be derived from the joint pdf:

  p(y/x) = p(x,y) / p(x)

• Conditional pdfs lead to the chain rule:

  p(x,y) = p(y/x) p(x)

• General case:

  p(x1, x2, …, xn) = p(x1/x2, …, xn) p(x2/x3, …, xn) … p(xn-1/xn) p(xn)

29
Independence
• Knowledge about independence between r.v.'s is very useful because it simplifies things.
e.g., if X and Y are independent, then:

  p(x,y) = p(x) p(y)

30

[Figure: a 2D joint pdf factoring into the product of two 1D pdfs]
Law of Total Probability
• The law of total probability:
p(y) = Σ_x p(y/x) p(x)
31
Marginalization
• Given the joint pmf/pdf, we can compute the pmf/pdf of any subset of variables using marginalization:
• Examples:

  p(x) = ∫ p(x,y) dy

  p(x1, …, xi-1, xi+1, …, xn) = ∫ p(x1, x2, …, xn) dxi

  p(x1, x2) = ∫_{x3} … ∫_{xn} p(x1, x2, x3, …, xn) dx3 … dxn

32
The joint pmf/pdf is very useful!
33
Normal (Gaussian) Distribution
34
• d-dimensional:

  p(x) = 1 / ((2π)^{d/2} |Σ|^{1/2}) · exp[ -(1/2)(x-μ)ᵀ Σ⁻¹ (x-μ) ]

  where μ is a d x 1 mean vector and Σ is a d x d symmetric covariance matrix.

• 1-dimensional:

  p(x) = 1 / (√(2π) σ) · exp[ -(x-μ)² / (2σ²) ]
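The d-dimensional density above can be evaluated directly with numpy. The function name and the μ, Σ values below are illustrative assumptions:

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    # p(x) = 1/((2*pi)^(d/2) |Sigma|^(1/2)) * exp(-0.5 (x-mu)^T Sigma^-1 (x-mu))
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    expo = -0.5 * diff @ np.linalg.inv(sigma) @ diff
    return np.exp(expo) / norm

mu = np.array([0.0, 0.0])
sigma = np.eye(2)                 # 2 x 2 symmetric covariance (identity here)
peak = mvn_pdf(mu, mu, sigma)     # density at the mean: 1/(2*pi) for d=2, Sigma=I
```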
Normal (Gaussian) Distribution (cont’d)
• Parameters and shape of the Gaussian distribution
  – Number of parameters: d + d(d+1)/2 (d for μ, d(d+1)/2 for the symmetric Σ)
  – Shape determined by Σ

35
Normal (Gaussian) Distribution (cont’d)
• Mahalanobis distance:

  r² = (x-μ)ᵀ Σ⁻¹ (x-μ)

• If the variables are independent (Σ is diagonal), the multivariate normal distribution becomes a product of 1-dimensional normals:

  p(x) = Π_{i=1}^{d} 1 / (√(2π) σi) · exp[ -(xi-μi)² / (2σi²) ]

36
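The Mahalanobis distance is a one-liner in numpy; the μ, Σ, and x values below are made up for illustration:

```python
import numpy as np

# r^2 = (x - mu)^T Sigma^-1 (x - mu)
mu = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.0],
                  [0.0, 0.5]])    # diagonal Sigma: independent variables
x = np.array([3.0, 2.0])

diff = x - mu
r2 = diff @ np.linalg.inv(sigma) @ diff
# with a diagonal Sigma this reduces to sum_i (x_i - mu_i)^2 / sigma_ii:
# (3-1)^2/2 + (2-2)^2/0.5 = 2.0
```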
Expected Value

• For a discrete r.v. X, the expected value is:

  E[X] = Σ_x x p(x)

37
Expected Value (cont’d)

• The sample mean and expected value are related by:

  (1/n) Σ_{i=1}^{n} xi → E[X]   as n → ∞

• The expected value for a continuous r.v. is given by:

  E[X] = ∫_{-∞}^{∞} x p(x) dx

38
Variance and Standard Deviation

• Variance:  Var(X) = σ² = E[(X − E[X])²] = E[X²] − (E[X])²
• Standard deviation:  σ = √Var(X)

39
Covariance

• Cov(X,Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]

40
Covariance Matrix – 2 variables

  Σ = [ Var(X)     Cov(X,Y) ]
      [ Cov(Y,X)   Var(Y)   ]

  and Cov(X,Y) = Cov(Y,X) (i.e., Σ is a symmetric matrix)

• Cov(X,Y) can be computed using:  Cov(X,Y) = E[XY] − E[X]E[Y]

41
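A sketch of estimating this 2 x 2 covariance matrix from samples, using the E[XY] − E[X]E[Y] form above (the data here are made-up standard-normal samples):

```python
import numpy as np

rng = np.random.default_rng(0)
xy = rng.standard_normal((1000, 2))   # 1000 samples of (X, Y)

mean = xy.mean(axis=0)                # (E[X], E[Y])
# Sigma = E[xy xy^T] - mean mean^T, estimated from the samples
cov = (xy.T @ xy) / len(xy) - np.outer(mean, mean)
# cov[0,1] == cov[1,0]: Sigma is symmetric
```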
Covariance Matrix – n variables

  Σij = Cov(Xi, Xj) = E[(Xi − μi)(Xj − μj)],  with Σij = Σji (i.e., a symmetric matrix)

42
Uncorrelated r.v.’s

• If the r.v.’s are pairwise uncorrelated, then Cov(Xi, Xj) = 0 for i ≠ j, and Σ reduces to a diagonal matrix D of the variances.

43
Properties of covariance matrix

• Σ is symmetric and positive semi-definite: its eigenvalues are real and non-negative, and its eigenvectors can be chosen orthonormal.

44

(Note: we will review these concepts later in case you do not remember them)
Covariance Matrix Decomposition

• Σ can be decomposed as Σ = Φ Λ Φᵀ, where Φ holds the eigenvectors of Σ as columns, Λ is the diagonal matrix of its eigenvalues, and Φ⁻¹ = Φᵀ.

45
Linear Transformations

• Under a linear transformation Y = AᵀX, the mean and covariance transform as μY = Aᵀμ and ΣY = AᵀΣA.

46

Linear Transformations (cont’d)

• Whitening transform: choosing Aw = Φ Λ^{-1/2} (from Σ = Φ Λ Φᵀ) gives ΣY = I, i.e., uncorrelated components with unit variance.

47
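The whitening transform can be sketched with numpy's symmetric eigendecomposition; the covariance matrix below is a made-up example:

```python
import numpy as np

# Whitening: A_w = Phi Lambda^(-1/2), so that Y = A_w^T X has identity covariance.
sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])                  # example symmetric covariance

eigvals, eigvecs = np.linalg.eigh(sigma)        # Sigma = Phi Lambda Phi^T
A_w = eigvecs @ np.diag(eigvals ** -0.5)        # whitening matrix

sigma_y = A_w.T @ sigma @ A_w                   # covariance after the transform
# sigma_y is (numerically) the identity matrix
```

Since A_wᵀ Σ A_w = Λ^{-1/2} Φᵀ (Φ Λ Φᵀ) Φ Λ^{-1/2} = I, the transformed variables are uncorrelated with unit variance.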