Expert Systems 8 1
Bayesian Networks
1. Probability theory
2. BN as knowledge model
3. Bayes in Court
4. Dazzle examples
5. Conclusions

Reverend Thomas Bayes (1702-1761)

Jenneke IJzerman, Bayesiaanse Statistiek in de Rechtspraak, VU Amsterdam, September 2004.
http://www.few.vu.nl/onderwijs/stage/werkstuk/werkstukken/werkstuk-ijzerman.doc
Thought Experiment: Hypothesis Selection
Imagine two types of bag:
• BagA: 250 + 750
• BagB: 750 + 250

Take 5 balls from a bag:
• Result: 4 + 1
What is the type of the bag?

Probability of this result from
• BagA: 0.0144
• BagB: 0.396
Conclusion: The bag is BagB.
But…
• We don’t know how the bag was selected
• We don’t even know that type BagB exists
• The experiment is meaningful only in light of the a priori posed hypotheses (BagA, BagB) and their assumed likelihoods.
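The slide's likelihoods can be checked with a short sketch (assuming each bag holds 1000 balls, 250+750 versus 750+250, and that the 5 balls are drawn without replacement, which is what reproduces the 0.0144 figure):

```python
from math import comb

def hypergeom(k_white, n_draw, white, total):
    """P(exactly k_white white balls in n_draw draws without replacement)."""
    return comb(white, k_white) * comb(total - white, n_draw - k_white) / comb(total, n_draw)

# Likelihood of drawing 4 white + 1 other from each bag (1000 balls per bag)
p_A = hypergeom(4, 5, 250, 1000)  # BagA: 250 white, 750 other
p_B = hypergeom(4, 5, 750, 1000)  # BagB: 750 white, 250 other
print(round(p_A, 4), round(p_B, 4))  # ≈ 0.0144 and ≈ 0.396

# Posterior via Bayes' rule, assuming equal priors for the two hypotheses
prior_A = prior_B = 0.5
post_B = prior_B * p_B / (prior_A * p_A + prior_B * p_B)
print(round(post_B, 3))
```

With equal priors the posterior for BagB comes out near 96%, which is the slide's point: the conclusion "the bag is BagB" only makes sense once such priors are assumed.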
Classical and Bayesian statistics
Classical statistics:
• Compute the probability of your data, assuming a hypothesis
• Reject a hypothesis if the data becomes unlikely

Bayesian statistics:
• Compute the probability of a hypothesis, given your data
• Requires a priori probabilities for each hypothesis; these are extremely important!
Part I: Probability theory
What is a probability?
• Frequentist: relative frequency of occurrence.
• Subjectivist: amount of belief.
• Mathematician: axioms (Kolmogorov), assignment of non-negative numbers to a set of states, sum 1 (100%).

A state has several variables: product space.
With n binary variables: 2^n states.
Multi-valued variables are also possible.
              Blond  Not blond
All mothers     30      70

              Blond  Not blond
Mother blond    15      15
Mother n.b.     15      55
Conditional Probability: Using evidence
• First table: probability for any woman to deliver a blond baby.
• Second table: describes blond and non-blond mothers separately.
• Third table: describes only the blond mothers.

A row is rescaled with its weight.
Def. conditional probability:
Pr(A|B) = Pr(A & B) / Pr(B)
Rewrite:
Pr(A & B) = Pr(B) × Pr(A|B)
              Blond  Not blond
All mothers     30      70

              Blond  Not blond
Mother blond    15      15
Mother n.b.     15      55

              Blond  Not blond
Mother blond    50      50
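The definition can be checked against the tables above; a minimal sketch (the counts are the slide's, the key and helper names are illustrative):

```python
# Joint counts from the slide's table (mother's vs child's hair colour, 100 cases)
joint = {
    ("blond_mother", "blond_child"): 15,
    ("blond_mother", "other_child"): 15,
    ("nb_mother", "blond_child"): 15,
    ("nb_mother", "other_child"): 55,
}
total = sum(joint.values())  # 100

def pr(event):
    """Marginal probability of all states whose key contains `event`."""
    return sum(v for k, v in joint.items() if event in k) / total

def pr_given(a, b):
    """Pr(A|B) = Pr(A & B) / Pr(B)."""
    p_ab = sum(v for k, v in joint.items() if a in k and b in k) / total
    return p_ab / pr(b)

print(pr("blond_child"))                        # → 0.3, the first table
print(pr_given("blond_child", "blond_mother"))  # → 0.5, the rescaled third table
```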
Dependence and Independence
• The probability of a blond child is 30%, but larger for a blond mother and smaller for a non-blond mother.
• The probability of a boy is 50%, also for blond mothers, and also for non-blond mothers.

Def.: A and B are independent iff Pr(A|B) = Pr(A).

Exercise: Show that Pr(A|B) = Pr(A) is equivalent to Pr(B|A) = Pr(B) (i.e., B and A are independent).
              Blond  Not blond
Mother blond    15      15
Mother n.b.     15      55

              Boy   Girl
Mother blond    15    15
Mother n.b.     35    35

              Boy   Girl
Mother blond    50    50
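A one-line solution to the exercise, using only the definition of conditional probability and assuming Pr(A) and Pr(B) are positive:

```latex
\Pr(A\mid B)=\Pr(A)
\iff \frac{\Pr(A \,\&\, B)}{\Pr(B)}=\Pr(A)
\iff \Pr(A \,\&\, B)=\Pr(A)\,\Pr(B)
\iff \frac{\Pr(A \,\&\, B)}{\Pr(A)}=\Pr(B)
\iff \Pr(B\mid A)=\Pr(B)
```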
Bayes Rule: from data to hypothesis
         4 + 1   Other
BagA     0.0144  0.986
BagB     0.396   0.604
• Classical Probability Theory: 0.0144 is the relative weight of 4+1 in the ROW of BagA.
• Bayesian Theory describes the distribution over the COLUMN of 4+1.

Classical statistics: ROW distribution.
Bayesian statistics: COLUMN distribution.

Bayes’ Rule:
• Observe that
Pr(A & B) = Pr(A) × Pr(B|A) = Pr(B) × Pr(A|B)
• Conclude Bayes’ Rule:
Pr(A|B) = Pr(B|A) × Pr(A) / Pr(B)
Reasons for Dependence 1: Causality
• Dependency: P(B|A) ≠ P(B)
• Positive correlation: >
• Negative correlation: <

Possible explanation: A causes B.

Example:
P(headache) = 6%
P(ha | party) = 10%
P(ha | ¬party) = 2%

           h.a.  no h.a.
party        5     45
no party     1     49
Alternative explanation: B causes A.

In the same example:
P(party) = 50%
P(party | h.a.) = 83%
P(party | no h.a.) = 48%
“Headaches make students go to parties.”
In statistics, correlation has no direction.
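Both readings of the table can be computed directly (counts from the slide; the variable names are illustrative):

```python
# Joint counts from the party/headache table (a cohort of 100 students)
counts = {("party", "ha"): 5, ("party", "no_ha"): 45,
          ("no_party", "ha"): 1, ("no_party", "no_ha"): 49}
total = sum(counts.values())

p_ha = (counts[("party", "ha")] + counts[("no_party", "ha")]) / total  # 0.06
p_ha_given_party = counts[("party", "ha")] / (5 + 45)                  # 0.10
p_party = (5 + 45) / total                                             # 0.50
p_party_given_ha = counts[("party", "ha")] / (5 + 1)                   # ≈ 0.83

# Correlation is symmetric: conditioning in either direction raises the probability
print(p_ha, p_ha_given_party)
print(p_party, round(p_party_given_ha, 2))
```

The same table supports both causal stories equally well, which is exactly the slide's point.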
Reasons for Dependence 2: Common cause

1. The student party may lead to headache and is costly (money versus broke):

            h.a.         no h.a.
party      5 (2-3)      45 (18-27)
no party   1 (1-0)      49 (49-0)
(each cell: total count, with its money-broke split in parentheses)

2. Table of headache and money:

            h.a.   no h.a.
money         3      67
broke         3      27

Pr(broke) = 30%
Pr(broke | h.a.) = 50%

3. Table of headache and money for party attendants:

            h.a.   no h.a.
money         2      18
broke         3      27

This dependency disappears if the common cause variable is known.
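The disappearance of the dependency can be verified from the three-variable counts above (cell values as read from the slide; the helper names are illustrative):

```python
# Full joint over (party, headache, money-state); counts from the slides
cells = {
    ("party", "ha", "money"): 2, ("party", "ha", "broke"): 3,
    ("party", "no_ha", "money"): 18, ("party", "no_ha", "broke"): 27,
    ("no_party", "ha", "money"): 1, ("no_party", "ha", "broke"): 0,
    ("no_party", "no_ha", "money"): 49, ("no_party", "no_ha", "broke"): 0,
}

def pr(pred, given=lambda k: True):
    """Pr(pred | given) over the joint counts."""
    num = sum(v for k, v in cells.items() if pred(k) and given(k))
    den = sum(v for k, v in cells.items() if given(k))
    return num / den

broke = lambda k: k[2] == "broke"
ha = lambda k: k[1] == "ha"
party = lambda k: k[0] == "party"

print(pr(broke))      # → 0.3: unconditional
print(pr(broke, ha))  # → 0.5: headache makes "broke" more likely
# Conditioning on the common cause removes the dependency:
print(pr(broke, party))                         # → 0.6
print(pr(broke, lambda k: party(k) and ha(k)))  # → 0.6 again
```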
Reasons for Dependence 3: Common effect
A and B are independent:
Pr(B) = 80%
Pr(B|A) = 80%
B and A are independent.

Their combination stimulates C; for instances satisfying C:
Pr(B) = 90%
Pr(B|A) = 93%, Pr(B|¬A) = 80%

(count satisfying C in parentheses)
            A        non A
B        40 (14)    40 (4)
non B    10 (1)     10 (1)

(given C)
            A    non A
B          14      4
non B       1      1
This dependency appears if the common effect variable is known
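The opposite effect, dependence appearing once C is known, can be checked the same way (counts from the slide):

```python
# Full population, with the sub-count satisfying C in c_cells
cells = {("A", "B"): 40, ("A", "nB"): 10, ("nA", "B"): 40, ("nA", "nB"): 10}
c_cells = {("A", "B"): 14, ("A", "nB"): 1, ("nA", "B"): 4, ("nA", "nB"): 1}

def pr_B(table, given_A=None):
    """Pr(B), optionally conditioned on A's value, within the given table."""
    items = [(k, v) for k, v in table.items() if given_A is None or k[0] == given_A]
    return sum(v for k, v in items if k[1] == "B") / sum(v for _, v in items)

print(pr_B(cells), pr_B(cells, "A"))   # 0.8 and 0.8: independent overall
print(round(pr_B(c_cells), 2))         # 0.9: within C
print(round(pr_B(c_cells, "A"), 2), pr_B(c_cells, "nA"))  # ≈ 0.93 and 0.8
```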
Part II: Bayesian Networks
• Probabilistic Graphical Model
• Probabilistic Network
• Bayesian Network
• Belief Network

Consists of:
• Variables (n)
• Domains (here binary)
• Acyclic arc set, modeling the statistical influences
• Per variable V (indegree k): Pr(V | E), for 2^k cases of E.

Information in a node: exponential in the indegree.
Example network pa → ha, pa → br:

Pr(pa) = 50%

Pr    pa    ¬pa
ha    10%   2%

Pr    pa    ¬pa
br    40%   0%

Example network A → C ← B:

Pr    A,B    A,¬B    ¬A,B    ¬A,¬B
C     56%    10%     10%     10%
The Bayesian Network Model: Closed World Assumption

• Rule based:
IF x attends party THEN x has headache WITH cf = .10
What if x didn’t attend?
• Bayesian model: Pr(ha | ¬pa) is included as well; the claim is that all relevant info is modeled.

Direction of arcs and correlation

pa → ha:
Pr(pa) = 50%

Pr    pa    ¬pa
ha    10%   2%

ha → pa:
Pr(ha) = 6%

Pr    ha    ¬ha
pa    83%   48%

1. A BN does not necessarily model causality.
2. It is built upon the Human Expert’s understanding of relationships; often causal.
A little theorem
• A Bayesian network on n binary variables uniquely defines a probability distribution over the associated set of 2^n states.
• The full distribution has 2^n parameters (numbers in [0..1] with sum 1).
• A typical network has in-degree 2 to 3: represented by 4n to 8n parameters (PIGLET!!).
• Bayesian Networks are an efficient representation
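The comparison can be made concrete with a small sketch (the 2^n count is the slide's; treating "in-degree 3" as a uniform bound over all variables is a simplification):

```python
# Parameter counts: full joint distribution vs. Bayesian network
def full_joint_params(n):
    return 2 ** n              # one number per state of n binary variables

def bn_params(n, indegree):
    return n * 2 ** indegree   # one CPT entry per parent configuration, per variable

for n in (10, 20, 30):
    print(n, full_joint_params(n), bn_params(n, 3))
```

Already at n = 30 the full joint needs over a billion numbers, while the network needs 240.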
The Utrecht DSS group
• Initiated by Prof Linda van der Gaag from ~1990
• Focus: development of BN support tools
• Use experience from building several actual BNs
• Medical applications
• Oesoca, ~40 nodes
• Courses: Probabilistic Reasoning, Network Algorithms (Ma ACS)
How to obtain a BN model
Describe Human Expert knowledge: Metastatic Cancer may be detected by an increased level of serum calcium (SC). The Brain Tumor (BT) may be seen on a CT scan (CT). Severe headaches (SH) are indicative of the presence of a brain tumor. Both a Brain tumor and an increased level of serum calcium may bring the patient into a coma (Co).
Probabilities: Expert guess or statistical study
Learn BN structure automatically from data by means of Data Mining
• Research of Carsten
• Models not intuitive
• Not considered XS
• Helpful addition to Knowledge Acquisition from Human Expert
• Master ACS.

Network: mc → sc, mc → bt; sc → co, bt → co; bt → sh, bt → ct
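As a sketch, the structure dictated by the expert text can be written down as parent lists (node names from the slide; this records structure only, not probabilities):

```python
# The cancer network from the slide, as parent lists
parents = {
    "mc": [],            # Metastatic Cancer
    "sc": ["mc"],        # increased Serum Calcium
    "bt": ["mc"],        # Brain Tumor
    "ct": ["bt"],        # CT scan
    "sh": ["bt"],        # Severe Headaches
    "co": ["bt", "sc"],  # Coma
}
# The CPT for variable V must cover 2^k parent configurations:
for v, ps in parents.items():
    print(v, 2 ** len(ps))
```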
Inference in Bayesian Networks
The probability of a state S = (v1, .., vn): multiply the values Pr(vi | parents of vi in S).

The marginal (overall) probability of each variable follows by summing over states.

Sampling: produce a series of cases, distributed according to the probability distribution implicit in the BN.
Network pa → ha, pa → br:

Pr(pa) = 50%

Pr    pa    ¬pa
ha    10%   2%

Pr    pa    ¬pa
br    40%   0%

Pr(pa, ¬ha, ¬br) = 0.50 × 0.90 × 0.60 = 0.27
Pr(pa) = 50%
Pr(ha) = 6%
Pr(br) = 20%
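The two computations on this slide, the probability of a full state and the marginals, can be done by brute-force enumeration for a network this small (probabilities from the slide's tables):

```python
from itertools import product

# The party BN from the slide: arcs pa → ha and pa → br
p_pa = 0.5
p_ha = {True: 0.10, False: 0.02}   # Pr(ha | pa) and Pr(ha | ¬pa)
p_br = {True: 0.40, False: 0.00}   # Pr(br | pa) and Pr(br | ¬pa)

def joint(pa, ha, br):
    """Pr of one full state: product of Pr(v | parents) over all variables."""
    p = p_pa if pa else 1 - p_pa
    p *= p_ha[pa] if ha else 1 - p_ha[pa]
    p *= p_br[pa] if br else 1 - p_br[pa]
    return p

print(joint(True, False, False))   # 0.50 × 0.90 × 0.60

# Marginal of each variable by summing the joint over all 2^3 states
for i, name in enumerate(["pa", "ha", "br"]):
    marginal = sum(joint(*s) for s in product([True, False], repeat=3) if s[i])
    print(name, round(marginal, 2))
```

The marginals reproduce the slide: 50% for pa, 6% for ha, 20% for br. Real tools replace this exponential enumeration with propagation algorithms.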
Consultation: Entering Evidence
Consultation applies the BN knowledge to a specific case:
• Known variable values can be entered into the network
• Probability tables for all nodes are updated
• Obtain (something like) a new BN modeling the conditional distribution
• Again, show distributions and state probabilities
• Backward and forward propagation
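Entering evidence can likewise be sketched by enumeration: keep only the states compatible with the evidence and renormalize (same toy network; the `posterior` helper is illustrative, real tools like Dazzle use far more efficient propagation):

```python
from itertools import product

# The party BN (pa → ha, pa → br); posterior for pa after observing ha
p_pa = 0.5
p_ha = {True: 0.10, False: 0.02}   # Pr(ha | pa) and Pr(ha | ¬pa)
p_br = {True: 0.40, False: 0.00}   # Pr(br | pa) and Pr(br | ¬pa)

def joint(pa, ha, br):
    p = p_pa if pa else 1 - p_pa
    p *= p_ha[pa] if ha else 1 - p_ha[pa]
    p *= p_br[pa] if br else 1 - p_br[pa]
    return p

def posterior(evidence):
    """Pr(pa | evidence): evidence maps variable index (0=pa, 1=ha, 2=br) to value."""
    states = [s for s in product([True, False], repeat=3)
              if all(s[i] == v for i, v in evidence.items())]
    num = sum(joint(*s) for s in states if s[0])
    return num / sum(joint(*s) for s in states)

print(round(posterior({1: True}), 2))   # Pr(pa | ha): the 83% from the earlier slide
print(round(posterior({1: False}), 2))  # Pr(pa | ¬ha): the 48%
```

Observing a child (ha) and updating its parent (pa) is exactly the "backward propagation" the slide mentions.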
Test Selection (Danielle)
• In consultation, enter data until goal variable is known with sufficient probability.
• Data items are obtained at specific cost.
• Data items influence the distribution of the goal.
Problem:
• Given the current state of the consultation, find out what is the best variable to test next.
Started CS study 1996,
PhD Thesis defense Oct 2005
Some more work done in Linda’s DSS group
• Sensitivity Analysis: numerical parameters in the BN may be inaccurate; how does this influence the consultation outcome?
• More efficient inferencing: inferencing is costly, especially in the presence of
  • Cycles (NB: there are no directed cycles!)
  • Nodes with a high in-degree
  Approximate reasoning, network decompositions, …
• Writing a program tool: Dazzle
Part III: In the Courtroom
What happens in a trial?
• Prosecutor and Defense collect information
• Judge decides if there is sufficient evidence that the person is guilty

Forensic tests are far more conclusive than medical ones, but still probabilistic in nature!
Pr(symptom | sick) = 80%
Pr(trace | innocent) = 0.01%

Tempting to forget statistics. Need a priori probabilities.
Pr(A|B) = Pr(B|A) × Pr(A) / Pr(B)
Jenneke IJzerman, Bayesiaanse Statistiek in de Rechtspraak, VU Amsterdam, September 2004.
Prosecutor’s Fallacy
The story:
• A DNA sample was taken from the crime site
• Probability of a match between samples of different people is 1 in 10,000
• 20,000 inhabitants are sampled
• John’s DNA matches the sample
• Prosecutor: the chance that John is innocent is 1 in 10,000
• Judge convicts John
The analysis:
• The prosecutor confuses Pr(inn | evid) (a) with Pr(evid | inn) (b)
• Forensic experts can only shed light on (b)
• The Judge must find (a); a priori probabilities are needed!! (Bayes)
• Dangerous to convict on DNA samples alone
• Pr(some innocent person matches) = 86%
• Pr(exactly 1 such match) = 27%
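The two probabilities at the bottom of the slide follow from the binomial model (assuming 20,000 independent innocent profiles, each matching with probability 1/10,000):

```python
# Checking the slide's numbers for the prosecutor's fallacy
p, n = 1e-4, 20_000

p_no_innocent_match = (1 - p) ** n
p_some_innocent_match = 1 - p_no_innocent_match
p_exactly_one = n * p * (1 - p) ** (n - 1)

print(round(p_some_innocent_match, 2))  # ≈ 0.86: an innocent match is to be expected
print(round(p_exactly_one, 2))          # ≈ 0.27: exactly one match is far from proof
```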
Defender’s Fallacy
The story:
• Town has 100,001 people
• We expect 11 to match (1 guilty plus 10 innocent)
• Probability that John is guilty is 9%.
• John must be released

Implicit assumptions:
• Offender is from town.
• Equal a priori probability for each inhabitant
It is necessary to take other circumstances into account;why was John prosecuted andwhat other evidence exists?
Conclusions:
• PF: it is necessary to take Bayes and a priori probabilities into account
• DF: estimating the a prioris is crucial for the outcome
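The defender's 9% can be reproduced under the slide's two implicit assumptions (offender in town, uniform prior):

```python
# The defender's computation: 100,001 townspeople, match probability 1/10,000
town = 100_001
expected_innocent_matches = (town - 1) / 10_000     # 10 innocent matches expected
# With a uniform prior, each of the 11 expected matches is equally likely guilty:
p_john_guilty = 1 / (1 + expected_innocent_matches)
print(expected_innocent_matches, round(p_john_guilty, 2))  # → 10.0 0.09
```

Change the prior (say, because other evidence singles John out) and the 9% changes with it, which is exactly why the a prioris decide the outcome.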
“reports of experts containing their opinion concerning what their science teaches them about the matter submitted to their judgment”
Experts’ and Judge’s task
IJzerman’s ideas about trial:
1. Forensic Expert may not claim a priori or a posteriori probabilities (Dutch Penalty Code, 344-1.4)
2. Judge must set the a priori
3. Judge must compute the a posteriori, based on statements of experts
4. Judge must have an explicit probability threshold for “beyond reasonable doubt”
5. Threshold should be made explicit in law.

Is this realistic?
1. Avoiding confusion of Pr(G|E) and Pr(E|G) is a good idea
2. A prioris are extremely important; this almost pre-determines the verdict
3. How is this done? A Bayesian Network designed and controlled by the Judge?
4. No judge will obey a mathematical formula
5. Public agreement and acceptance?
Bayesian Alcoholism Test
• Driving under the influence of alcohol leads to a penalty
• An administrative procedure may void the licence
• Judge must decide if the subject is an alcohol addict; incidental or regular (harmful) drinking
• Psychiatrists advise the court by determining if drinking was incidental or regular
• Goal HHAU: Harmful and Hazardous Alcohol Use
• Probabilistically confirmed or denied by clinical tests
• Bayesian Alcoholism Test: developed 1999-2004 by A. Korzec, Amsterdam.
Variables in Bayesian Alcoholism Test
Hidden variables:
• HHAU: alcoholism
• Liver disease

Observable causes:
• Hepatitis risk
• Social factors
• BMI, diabetes

Observable effects:
• Skin color
• Lab: blood, breath
• Level of Response
• Smoking
• CAGE questionnaire
Knowledge Elicitation for BAT
Knowledge in the Network
• Qualitative
  - What variables are relevant
  - How do they interrelate
• Quantitative
  - A priori probabilities
  - Conditional probabilities for hidden diseases
  - Conditional probabilities for effects
  - Response of lab tests to hidden diseases

How it was obtained
• Network structure?? IJzerman does not report about this
• Probabilities
  - Literature studies: 40% of probabilities
  - Expert opinions: 60% of probabilities
Consultation with BAT
Enter evidence about subject:
• Clinical signs: skin, smoking, LRA; CAGE.
• Lab results
• Social factors

The network will return:
• Probability that Subject has HHAU
• Probabilities for liver disease and diabetes

The responsible Human Medical Expert converts this probability to a YES/NO for the judge! (Interpretation phase)

The HME may take other data into account (rare disease).
Knowing what the CAGE is used for may influence the answers that the subject gives.
Part IV: Bayes in the Field
The Dazzle program
• Tool for designing and analysing BNs
• Mouse-click the network; fill in the probabilities
• Consult by evidence submission
• Read posterior probabilities
• Development 2004-2006
• Written in Haskell
• Arjen van IJzendoorn, Martijn Schrage
• www.cs.uu.nl/dazzle
Importance of a good model
In 1998, Donna Anthony (31) was convicted of murdering her two children. She was in prison for seven years but claimed her children died of cot death.
Prosecutor:The probability of two cot deaths in one family is too small, unless the mother is guilty.
The Evidence against Donna Anthony
• BN with priors eliminates Prosecutor’s Fallacy
• Enter the evidence:both children died
• A priori probability is very small (1 in 1,000,000)
• Dazzle establishes a 97.6% probability of guilt
• Name of expert: Prof. Sir Roy Meadow (1933)
• His testimony put a dozen mothers in prison within a decade
A More Refined Model
Allow for genetic or social circumstances for which parent is not liable.
The Evidence against Donna?
Refined model: genetic defect is the most likely cause of repeated deaths
Donna Anthony was released in 2005 after 7 years in prison
6/2005: Struck from GMC register
7/2005: Appeal by Meadow
2/2006: Granted; otherwise experts refuse witnessing
Classical Swine Fever, Petra Geenen
• Swine Fever is a costly disease
• Development 2004/5
• 42 vars, 80 arcs
• 2454 Prs, but many are 0
• Pig/herd level
• Prior extremely small
• Probability elicitation with questionnaire
Conclusions
• Mathematically sound model to reason with uncertainty
• Further studied in Probabilistic Reasoning (ACS)
• Applicable to areas where knowledge is highly statistical
• Acquisition: instead of classical IF a THEN b (WITH c), obtain both Pr(b|a) and Pr(b|¬a)
• More work, but a more powerful model
• One formalism allows both diagnostic and prognostic reasoning
• Danger: apparent exactness is deceiving
• Disadvantage: lack of explanation facilities (research); the model is quite transparent, but consultations are not.
• Increasing popularity, despite difficulty in building