Announcements
• Homework 8 due today, November 13
• ½ to 1 page description of final project due Thursday, November 15
• Current Events
  • Christian - now
  • Jeff - Thursday
• Research Paper due Tuesday, November 20
Probabilistic Reasoning
Lecture 15
Probabilistic Reasoning
• Logic deals with certainties:
  • A → B
• Probabilities are expressed in a notation similar to that of predicates in First-Order Predicate Calculus:
  • P(R) = 0.7
  • P(S) = 0.1
  • P(¬(A ∧ B) ∨ C) = 0.2
• 1 = certain; 0 = certainly not
What's the probability that either A is true or B is true?

[Venn diagram: two overlapping circles A and B; the overlap is A ∧ B]

P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Conditional Probability
• Conditional probability refers to the probability of one thing given that we already know another to be true:
  P(B|A) = P(A ∧ B) / P(A)

• This states the probability of B, given A.

[Venn diagram: two overlapping circles A and B; the overlap is A ∧ B]
Calculate
• P(R|S) given that the probability of rain is 0.7, the probability of sun is 0.1, and the probability of rain and sun is 0.01
• P(R|S) = P(R ∧ S) / P(S) = 0.01 / 0.1 = 0.1
• Note: P(A|B) ≠ P(B|A); here P(S|R) = P(R ∧ S) / P(R) = 0.01 / 0.7 ≈ 0.014
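The same arithmetic as a minimal Python sketch (numbers from this slide):

```python
def cond_prob(p_joint, p_given):
    """P(X|Y) = P(X and Y) / P(Y)."""
    return p_joint / p_given

p_rain, p_sun, p_rain_and_sun = 0.7, 0.1, 0.01

print(cond_prob(p_rain_and_sun, p_sun))   # P(R|S) = 0.1
print(cond_prob(p_rain_and_sun, p_rain))  # P(S|R) ≈ 0.014, so P(R|S) ≠ P(S|R)
```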
Joint Probability Distributions
• A joint probability distribution represents the combined probabilities of two or more variables.

         A      ¬A
  B     0.11   0.09
  ¬B    0.63   0.17

• This table shows, for example, that P(A ∧ B) = 0.11 and P(¬A ∧ B) = 0.09
• Using this, we can calculate P(A):
  P(A) = P(A ∧ B) + P(A ∧ ¬B) = 0.11 + 0.63 = 0.74
Bayes’ Theorem
• Bayes’ theorem lets us calculate a conditional probability:

  P(B|A) = P(A|B) · P(B) / P(A)

• P(B) is the prior probability of B.
• P(B|A) is the posterior probability of B.
Bayes' Theorem Deduction
• Recall:

  P(B|A) = P(A ∧ B) / P(A)

• Equally, P(A|B) = P(A ∧ B) / P(B), so P(A ∧ B) = P(A|B) · P(B).
• Substituting this into the first equation gives Bayes’ theorem:

  P(B|A) = P(A|B) · P(B) / P(A)
Medical Diagnosis
• Data
  • 80% of the time you have a cold, you also have a high temperature: P(HT|C) = 0.8
  • At any one time, 1 in every 10,000 people has a cold: P(C) = 0.0001
  • 1 in every 1,000 people has a high temperature: P(HT) = 0.001
• Suppose you have a high temperature. What is the likelihood that you have a cold?
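Bayes’ theorem answers this directly; here is a minimal Python sketch with the slide’s numbers:

```python
def posterior(p_e_given_h, p_h, p_e):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

p_ht_given_c = 0.8   # P(HT|C): high temperature given a cold
p_c = 0.0001         # P(C): prior probability of having a cold
p_ht = 0.001         # P(HT): prior probability of a high temperature

print(posterior(p_ht_given_c, p_c, p_ht))  # P(C|HT) = 0.08
```

So even with a high temperature, the probability of a cold is only 8%, because colds are so rare in the first place.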
Witness Reliability
• A hit-and-run incident has been reported, and an eyewitness has stated she is certain that the car was a white taxi.
• How likely is it that she is right?
• Facts:
  • The yellow taxi company has 90 cars
  • The white taxi company has 10 cars
  • An expert says that, given the foggy weather, the witness has a 75% chance of correctly identifying the taxi
Witness Reliability – Prior Probability
• Imagine the witness is shown a sequence of 1,000 cars
  • Expect 900 to be yellow and 100 to be white
• Given 75% accuracy, how many will she say are white and yellow?
  • Of the 900 yellow cars, she says yellow for 675 and white for 225
  • Of the 100 white cars, she says white for 75 and yellow for 25
• What is the probability the witness says white?
  • P(says white) = (225 + 75) / 1000 = 0.3
• How likely is she right?
  • P(white | says white) = 75 / (225 + 75) = 0.25
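The same frequency argument as a short Python sketch (counts from the 1,000-car thought experiment above):

```python
yellow, white = 900, 100  # expected cars out of 1,000
accuracy = 0.75           # chance the witness identifies a colour correctly

true_white_reports = white * accuracy          # 75 white cars called white
false_white_reports = yellow * (1 - accuracy)  # 225 yellow cars called white

p_says_white = (true_white_reports + false_white_reports) / (yellow + white)
p_right = true_white_reports / (true_white_reports + false_white_reports)

print(p_says_white)  # 0.3
print(p_right)       # 0.25: despite her certainty, she is right only 1 time in 4
```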
Comparing Conditional Probabilities
• Medical diagnosis
  • Probability of cold (C) is 0.0001
  • P(HT|C) = 0.8
  • Probability of plague (P) is 0.000000001
  • P(HT|P) = 0.99
• Relative likelihood of cold and plague:

  P(C|HT) = P(HT|C) · P(C) / P(HT)
  P(P|HT) = P(HT|P) · P(P) / P(HT)

  P(C|HT) / P(P|HT) = (P(HT|C) · P(C)) / (P(HT|P) · P(P))
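Plugging in the numbers (P(HT) cancels in the ratio, so it is not needed):

```python
p_c, p_ht_given_c = 0.0001, 0.8  # cold: prior and likelihood of high temperature
p_p, p_ht_given_p = 1e-9, 0.99   # plague: prior and likelihood of high temperature

ratio = (p_ht_given_c * p_c) / (p_ht_given_p * p_p)
print(ratio)  # ~80808: a cold is roughly 80,000 times more likely than plague
```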
Simple Bayesian Concept Learning (1)
• P(H|E) is used to represent the probability that some hypothesis, H, is true, given evidence E.
• Let us suppose we have a set of hypotheses H1…Hn.
• For each Hi:

  P(Hi|E) = P(E|Hi) · P(Hi) / P(E)

• Hence, given a piece of evidence, a learner can determine which is the most likely explanation by finding the hypothesis that has the highest posterior probability.
Simple Bayesian Concept Learning (2)
• In fact, this can be simplified.
• Since P(E) is independent of Hi, it will have the same value for each hypothesis.
• Hence, it can be ignored, and we can find the hypothesis with the highest value of:

  P(E|Hi) · P(Hi)

• We can simplify this further if all the hypotheses are equally likely, in which case we simply seek the hypothesis with the highest value of P(E|Hi).
• This is the likelihood of E given Hi.
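A minimal sketch of picking the most likely hypothesis this way (the three hypotheses and their numbers here are invented for illustration):

```python
# For each hypothesis: (prior P(H), likelihood P(E|H) of the observed evidence)
hypotheses = {
    "H1": (0.3, 0.2),  # hypothetical values
    "H2": (0.5, 0.6),
    "H3": (0.2, 0.9),
}

# P(E) is the same for every hypothesis, so argmax of P(E|H) * P(H) is enough.
best = max(hypotheses, key=lambda h: hypotheses[h][0] * hypotheses[h][1])
print(best)  # H2: 0.5 * 0.6 = 0.30 beats 0.3 * 0.2 = 0.06 and 0.2 * 0.9 = 0.18
```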
Bayesian Belief Networks (1)
• A belief network shows the dependencies between a group of variables.
• Two variables A and B are independent if the likelihood that A will occur has nothing to do with whether B occurs.
• C and D are dependent on A; D and E are dependent on B.
• The Bayesian belief network has probabilities associated with each link, e.g. P(C|A) = 0.2, P(C|¬A) = 0.4
Bayesian Belief Networks (2)
• A complete set of probabilities for this belief network might be:
  • P(A) = 0.1
  • P(B) = 0.7
  • P(C|A) = 0.2
  • P(C|¬A) = 0.4
  • P(D|A ∧ B) = 0.5
  • P(D|A ∧ ¬B) = 0.4
  • P(D|¬A ∧ B) = 0.2
  • P(D|¬A ∧ ¬B) = 0.0001
  • P(E|B) = 0.2
  • P(E|¬B) = 0.1
Bayesian Belief Networks (3)
• We can now calculate conditional probabilities:

  P(A,B,C,D,E) = P(E|A,B,C,D) · P(A,B,C,D)
               = P(E|A,B,C,D) · P(D|A,B,C) · P(C|A,B) · P(B|A) · P(A)

• In fact, we can simplify this, since there are no dependencies between certain pairs of variables – between E and A, for example. Hence:

  P(A,B,C,D,E) = P(E|B) · P(D|A ∧ B) · P(C|A) · P(B) · P(A)
College Life Example
• C = that you will go to college
• S = that you will study
• P = that you will party
• E = that you will be successful in your exams
• F = that you will have fun

[Network diagram: C at the top, with arrows C → S and C → P; below, E depends on S and P, and F depends on P]
The conditional probability tables for the network:

P(C) = 0.2

P(S|C):
  C = true:  0.8
  C = false: 0.2

P(P|C):
  C = true:  0.6
  C = false: 0.5

P(E|S,P):
  S = true,  P = true:  0.6
  S = true,  P = false: 0.9
  S = false, P = true:  0.1
  S = false, P = false: 0.2

P(F|P):
  P = true:  0.9
  P = false: 0.7
College Example
• Using the tables to solve problems such as
  P(C = true, S = true, P = false, E = true, F = false) = P(C, S, ¬P, E, ¬F)
• General solution:

  P(x1,…,xn) = ∏i=1..n P(xi | Parents(Xi))

• Here:

  P(C, S, ¬P, E, ¬F) = P(C) · P(S|C) · P(¬P|C) · P(E|S ∧ ¬P) · P(¬F|¬P)
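Evaluating this concrete case against the tables above, as a small Python sketch:

```python
# Values read from the conditional probability tables above
p_c = 0.2                    # P(C)
p_s_given_c = 0.8            # P(S|C)
p_notp_given_c = 1 - 0.6     # P(¬P|C) = 1 - P(P|C)
p_e_given_s_notp = 0.9       # P(E|S ∧ ¬P)
p_notf_given_notp = 1 - 0.7  # P(¬F|¬P) = 1 - P(F|¬P)

joint = p_c * p_s_given_c * p_notp_given_c * p_e_given_s_notp * p_notf_given_notp
print(joint)  # 0.01728
```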
Noisy-V Function
• Want to assume we know all the reasons for a possible event
  • E.g. the medical diagnosis system:
    • P(HT|C) = 0.8
    • P(HT|P) = 0.99
    • Assume P(HT|C ∨ P) = 1 (?)
  • This assumption is clearly not true
• Leak node – represents all other causes
  • P(HT|O) = 0.9
• Define noise parameters – conditional probabilities for ¬HT:
  • P(¬HT|C) = 1 − P(HT|C) = 0.2
  • P(¬HT|P) = 1 − P(HT|P) = 0.01
  • P(¬HT|O) = 1 − P(HT|O) = 0.1
• Further assumption – the causes of a high temperature are independent of each other, and the noise parameters are independent
Noisy-V Function
• Benefit of the noisy-V function:
  • If cold, plague, and other are all false, P(¬HT) = 1
  • Otherwise, P(¬HT) equals the product of the noise parameters for all the variables that are true
  • E.g. if plague and other are true and cold is false, P(HT) = 1 − (0.01 × 0.1) = 0.999
• Benefit – we don’t need to store as many values as the full Bayesian belief network requires
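A minimal sketch of this rule in Python (noise parameters from the previous slide; the noisy-V function is also widely known as noisy-OR):

```python
import math

# Noise parameters: P(¬HT | that cause alone) for each cause of high temperature
noise = {"cold": 0.2, "plague": 0.01, "other": 0.1}

def p_high_temp(active_causes):
    """P(HT) under the noisy-V model: P(¬HT) is the product of the noise
    parameters of the causes that are true (an empty product is 1)."""
    p_not_ht = math.prod(noise[c] for c in active_causes)
    return 1 - p_not_ht

print(p_high_temp(["plague", "other"]))  # 0.999, as on the slide
print(p_high_temp([]))                   # 0.0: with no active cause, ¬HT is certain
```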
Bayes’ Optimal Classifier
• A system that uses Bayes’ theory to classify data.
• We have a piece of data y, and are seeking the correct hypothesis from H1…H5, each of which assigns a classification to y.
• The probability that y should be classified as cj is:

  P(cj|x1,…,xn) = Σi=1..m P(cj|hi) · P(hi|x1,…,xn)

• x1 to xn are the training data, and m is the number of hypotheses.
• This method provides the best possible classification for a piece of data.
• Example: given some data, classify it as true or false

  P(H1|x1,…,xn) = 0.2   P(false|H1) = 0   P(true|H1) = 1
  P(H2|x1,…,xn) = 0.3   P(false|H2) = 0   P(true|H2) = 1
  P(H3|x1,…,xn) = 0.1   P(false|H3) = 1   P(true|H3) = 0
  P(H4|x1,…,xn) = 0.25  P(false|H4) = 0   P(true|H4) = 1
  P(H5|x1,…,xn) = 0.15  P(false|H5) = 1   P(true|H5) = 0

• P(true|x1,…,xn) = 0.2 + 0.3 + 0.25 = 0.75
• P(false|x1,…,xn) = 0.1 + 0.15 = 0.25
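The same weighted vote as a short Python sketch over the table above:

```python
# For each hypothesis: (P(H|x1..xn), P(true|H)); here P(false|H) = 1 - P(true|H)
hypotheses = [(0.2, 1), (0.3, 1), (0.1, 0), (0.25, 1), (0.15, 0)]

p_true = sum(p_h * p_t for p_h, p_t in hypotheses)
p_false = sum(p_h * (1 - p_t) for p_h, p_t in hypotheses)

print(p_true, p_false)  # 0.75 0.25, so the optimal classification is true
```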
The Naïve Bayes Classifier (1)
• A vector of data is classified as a single classification:
  P(ci|d1, …, dn)
• The classification with the highest posterior probability is chosen.
• The hypothesis which has the highest posterior probability is the maximum a posteriori, or MAP, hypothesis.
• In this case, we are looking for the MAP classification.
• Bayes’ theorem is used to find the posterior probability:

  P(ci|d1,…,dn) = P(d1,…,dn|ci) · P(ci) / P(d1,…,dn)
The Naïve Bayes Classifier (2)
• Since P(d1,…,dn) is a constant, independent of ci, we can eliminate it, and simply aim to find the classification ci for which the following is maximised:

  P(d1,…,dn|ci) · P(ci)

• We now assume that all the attributes d1,…,dn are independent, so P(d1,…,dn|ci) can be rewritten as a product, giving:

  P(ci) · ∏j=1..n P(dj|ci)

• The classification for which this is highest is chosen to classify the data.
Classifier Example

Training Data:

  x  y  z  Classification
  2  3  2  A
  4  1  4  B
  1  3  2  A
  2  4  3  A
  4  2  4  B
  2  1  3  C
  1  2  4  A
  2  3  3  B
  2  2  4  A
  3  3  3  C
  3  2  1  A
  1  2  1  B
  2  1  4  A
  4  3  4  C
  2  2  4  A

• New piece of data to classify: (x = 2, y = 3, z = 4)
• Want P(ci|x=2, y=3, z=4), so for each class compute P(ci) · P(x=2|ci) · P(y=3|ci) · P(z=4|ci)
• P(A) · P(x=2|A) · P(y=3|A) · P(z=4|A) = 8/15 · 5/8 · 2/8 · 4/8 ≈ 0.0417
• P(B) · P(x=2|B) · P(y=3|B) · P(z=4|B) = 4/15 · 1/4 · 1/4 · 2/4 ≈ 0.0083
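A compact Naïve Bayes implementation over this training data (plain frequency estimates, no smoothing yet), shown as a sketch:

```python
from collections import Counter

# Training data from the table: ((x, y, z), classification)
data = [
    ((2, 3, 2), "A"), ((4, 1, 4), "B"), ((1, 3, 2), "A"), ((2, 4, 3), "A"),
    ((4, 2, 4), "B"), ((2, 1, 3), "C"), ((1, 2, 4), "A"), ((2, 3, 3), "B"),
    ((2, 2, 4), "A"), ((3, 3, 3), "C"), ((3, 2, 1), "A"), ((1, 2, 1), "B"),
    ((2, 1, 4), "A"), ((4, 3, 4), "C"), ((2, 2, 4), "A"),
]
class_counts = Counter(cls for _, cls in data)

def score(sample, cls):
    """P(cls) times the product of P(attribute value | cls) estimates."""
    n_cls = class_counts[cls]
    p = n_cls / len(data)
    for i, value in enumerate(sample):
        p *= sum(1 for attrs, c in data if c == cls and attrs[i] == value) / n_cls
    return p

for cls in sorted(class_counts):
    print(cls, round(score((2, 3, 4), cls), 4))  # A 0.0417, B 0.0083, C 0.0148
```

A has the highest score, so (2, 3, 4) is classified as A.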
M-estimate
• Problem with too little training data, e.g. classifying (x=1, y=2, z=2):
  • P(x=1 | B) = 1/4
  • P(y=2 | B) = 2/4
  • P(z=2 | B) = 0
  • The single zero wipes out the whole product for B
• Avoid the problem by using the M-estimate, which pads the computation with additional samples:
  • Conditional probability = (a + mp) / (b + m)
  • m = equivalent sample size, e.g. m = 5
  • p = 1/num_values_for_category (1/4 for x)
  • a = number of training examples with both the attribute value and the classification (1 for x=1 and B)
  • b = number of training examples with the classification (4 for B)
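The smoothed estimate for the troublesome attribute, as a quick sketch with the slide’s numbers:

```python
def m_estimate(a, b, m, p):
    """M-estimate of a conditional probability: (a + m*p) / (b + m)."""
    return (a + m * p) / (b + m)

# P(z=2|B): no B example has z=2 (a=0), there are 4 B examples (b=4),
# z takes 4 possible values (p=1/4), and the equivalent sample size is m=5
print(m_estimate(a=0, b=4, m=5, p=1/4))  # ≈ 0.139 instead of 0
```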
Collaborative Filtering
• A method that uses Bayesian reasoning to suggest items that a person might be interested in, based on their known interests.
• If we know that Anne and Bob both like A, B, and C, and that Anne likes D, then we guess that Bob would also like D.
  • P(Bob likes Z | Bob likes A, Bob likes B, …, Bob likes Y)
• Can be calculated using decision trees. [Decision tree figure not reproduced]