Reasoning with Graphical Models
Class 1, Rina Dechter
class1 compsci2020

Readings: Darwiche, chapters 1 and 3; Dechter (Morgan & Claypool book), chapters 1-2; Pearl, chapters 1-2
Congressional Briefing, December 2019
The Primary AI Challenges
• Machine learning focuses on replicating human learning.
• Automated reasoning focuses on replicating how people reason.
[Figure: a neural network (left) vs. a graphical model (right) — the ALARM patient-monitoring Bayesian network, whose nodes include PCWP, CO, HRBP, CATECHOL, SAO2, VENTLUNG, HYPOVOLEMIA, CVP, and BP]
Automated Reasoning
[Figure: a lawyer, a policy maker, and a medical doctor posing queries to an automated reasoner, which returns answers]
Queries:
• Prediction: what will happen?
• Diagnosis: what happened?
• Situation assessment: what is going on?
• Planning, decision making: what to do?
Knowledge is huge, so how do we identify what is relevant? Graphical models.
Graphical Models
Example: diagnosing liver disease (Onisko et al., 1999)

Automated reasoning:
• Develop methods to answer these queries.
• Learn the models from experts and from data.

Queries:
• Prediction
• Diagnosis
• Situation assessment
• Planning, decision making
CTBT: A Graphical Model Application
Global seismic monitoring of compliance with the Comprehensive Nuclear-Test-Ban Treaty (CTBT) (Nimar, Russell, Sudderth, 2011)
278 monitoring stations (147 seismic)
The IDC (International Data Centre) operates continuously and in real time, performing signal processing.
Given: continuous waveform measurements from a global network of seismometer stations.
Input: observed detections.
Output: a bulletin listing seismic events, each with:
• Time
• Location (latitude, longitude)
• Depth
• Magnitude
Reasoning methods infer the most likely set of seismic events given the observed detections.
Result: 60% reduction in error compared with human experts.
Complexity of Automated Reasoning
• Prediction
• Diagnosis
• Planning and scheduling
• Probabilistic inference
• Explanation
• Decision-making

[Figure: growth of linear, polynomial, and exponential functions f(n)]

Reasoning is computationally hard: complexity is exponential in general.
Remedies: approximation, anytime algorithms, bounded error.
AI Renaissance
• Deep learning
  – Fast predictions
  – "Instinctive"
  – Tools: TensorFlow, PyTorch, …
• Probabilistic models
  – Slow reasoning
  – "Logical / deliberative"
  – Tools: graphical models, probabilistic programming, Markov logic, …
Outline of classes
• Part 1: Introduction and Inference
• Part 2: Search
• Part 3: Variational Methods and Monte-Carlo Sampling
[Figure: a graphical model over variables A–M, its tree decomposition (clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM), the full OR search tree, and the context-minimal AND/OR search graph]
Probabilistic Graphical Models
• Describe structure in large problems
  – Large, complex systems
  – Made of "smaller", "local" interactions
  – Complexity emerges through interdependence
• Examples & tasks
  – Maximization (MAP): compute the most probable configuration
    • Protein structure prediction: predicting the 3D structure from a given sequence
    • Protein design (PDB): backbone algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC)
    [Yanover & Weiss 2002; Bruce R. Donald et al. 2016]
• Examples & tasks
  – Summation & marginalization, e.g., image segmentation and classification [Plath et al. 2009]: given an observation y, compute the marginals p(xi | y) and the "partition function"
    [Figure: two images (a plane against sky and grass; a cow on grass) with per-pixel segmentation labels]
• Examples & tasks
  – Mixed inference (marginal MAP, MEU, …), e.g., influence diagrams and optimal decision-making: the "oil wildcatter" problem [Raiffa 1968; Shachter 1986]
    [Figure: influence diagram with chance nodes (seismic structure, oil underground, test result, oil produced, market information), decision nodes (test, drill, oil sale policy), and value nodes (test cost, drill cost, sales cost, oil sales)]
Bayesian Networks (Pearl 1988)

BN = (G, Θ)
[Figure: DAG with Smoking → lung Cancer, Smoking → Bronchitis, {lung Cancer, Smoking} → X-ray, {lung Cancer, Bronchitis} → Dyspnoea, annotated with P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B)]

P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)

CPD for Dyspnoea:
C B | P(D=0|C,B) P(D=1|C,B)
0 0 | 0.1 0.9
0 1 | 0.7 0.3
1 0 | 0.8 0.2
1 1 | 0.9 0.1

Queries: posterior marginals, probability of evidence, MPE.

P(D=0) = Σ_{S,C,B,X} P(S) · P(C|S) · P(B|S) · P(X|C,S) · P(D=0|C,B)
MAP(P) = max_{S,C,B,X,D} P(S) · P(C|S) · P(B|S) · P(X|C,S) · P(D|C,B)

Combination: product. Marginalization: sum/max.
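Both queries can be computed naively by enumerating all 2^5 assignments, combining the CPTs by product, and marginalizing by sum or max. A minimal sketch in Python; only P(D|C,B) is given on the slide, so the remaining CPT numbers below are made-up illustrative values:

```python
from itertools import product

# All variables are binary (0 = false, 1 = true).
# Only P(D|C,B) comes from the slide; the other CPTs are assumed.
P_S = [0.5, 0.5]                                      # P(S)
P_C = {0: [0.99, 0.01], 1: [0.90, 0.10]}              # P(C|S), keyed by S
P_B = {0: [0.95, 0.05], 1: [0.70, 0.30]}              # P(B|S), keyed by S
P_X = {(0, 0): [0.95, 0.05], (0, 1): [0.90, 0.10],
       (1, 0): [0.20, 0.80], (1, 1): [0.10, 0.90]}    # P(X|C,S), keyed (C,S)
P_D = {(0, 0): [0.1, 0.9], (0, 1): [0.7, 0.3],
       (1, 0): [0.8, 0.2], (1, 1): [0.9, 0.1]}        # P(D|C,B), keyed (C,B)

def joint(s, c, b, x, d):
    """Combination by product: P(S,C,B,X,D) as a product of local CPTs."""
    return P_S[s] * P_C[s][c] * P_B[s][b] * P_X[(c, s)][x] * P_D[(c, b)][d]

# Probability of evidence: marginalize by summing out S, C, B, X.
p_d0 = sum(joint(s, c, b, x, 0) for s, c, b, x in product((0, 1), repeat=4))

# MAP/MPE: marginalize by maximizing over all variables.
map_val, map_cfg = max((joint(s, c, b, x, d), (s, c, b, x, d))
                       for s, c, b, x, d in product((0, 1), repeat=5))

print(f"P(D=0) = {p_d0:.4f}")
print("most probable configuration (S,C,B,X,D):", map_cfg)
```

Enumeration is exponential in the number of variables; the point of the algorithms in this class is to exploit the graph structure to do much better.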
An Early Example from Medical Diagnosis: the Alarm Network
• Bayes nets give a compact representation of large joint distributions.
[Figure: the ALARM network; nodes include PCWP, CO, HRBP, CATECHOL, SAO2, VENTLUNG, HYPOVOLEMIA, CVP, BP]
The "alarm" network: 37 variables, 509 parameters (rather than 2^37 ≈ 10^11!)
[Beinlich et al., 1989]
Constraint Networks
Example: map coloring
Variables: countries (A, B, C, etc.)
Values: colors (red, green, blue)
Constraints: A ≠ B, A ≠ D, D ≠ E, etc.

Allowed pairs for a not-equal constraint between A and B:
A B
red green
red yellow
green red
green yellow
yellow green
yellow red

[Figure: a map of regions A–G and the corresponding constraint graph over variables A–G]
class1 828X‐2018
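A constraint network like this can be solved by backtracking search: extend a partial assignment one variable at a time, and undo the last choice when every value violates a constraint. A minimal sketch; the slide's exact constraint graph is hard to recover, so the edge set below is an assumed illustrative not-equal network over A–G:

```python
# Backtracking search for map coloring.  EDGES is an assumed
# illustrative not-equal constraint graph over variables A..G.
VARS = "ABCDEFG"
COLORS = ("red", "green", "blue")
EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "E"),
         ("D", "E"), ("D", "F"), ("E", "G"), ("F", "G")]

def neighbors(var):
    return [v if u == var else u for u, v in EDGES if var in (u, v)]

def backtrack(assignment):
    """Extend a partial assignment; undo and retry on dead ends."""
    if len(assignment) == len(VARS):
        return dict(assignment)
    var = next(v for v in VARS if v not in assignment)
    for color in COLORS:
        # Consistent if no already-assigned neighbor has this color.
        if all(assignment.get(n) != color for n in neighbors(var)):
            assignment[var] = color
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]        # backtrack
    return None

solution = backtrack({})
print(solution)
```

In the worst case this search is exponential in the number of variables, which is exactly the complexity picture drawn later in the lecture.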
Propositional Reasoning
Example: party problem
• If Alex goes, then Becky goes: A → B
• If Chris goes, then Alex goes: C → A
• Question: is it possible that Chris goes to the party but Becky does not?
  Equivalently: is the propositional theory {A → B, C → A, C, ¬B} satisfiable?
[Figure: constraint graph over A, B, C]
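The satisfiability question can be settled by brute force over the 2^3 truth assignments. A small sketch of that check:

```python
from itertools import product

# The party theory: A -> B (if Alex goes, Becky goes) and
# C -> A (if Chris goes, Alex goes).
# Query: is {A -> B, C -> A, C, not B} satisfiable?
def implies(p, q):
    return (not p) or q

satisfying = [(a, b, c)
              for a, b, c in product((False, True), repeat=3)
              if implies(a, b) and implies(c, a) and c and not b]

print("satisfiable:", bool(satisfying))  # -> satisfiable: False
```

Since C → A and A → B force B to be true whenever C is, no assignment makes C true and B false: the theory is unsatisfiable, so Chris cannot go without Becky.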
Probabilistic Reasoning (directed)
Party example: the weather effect
• Alex is likely to go in bad weather
• Chris rarely goes in bad weather
• Becky is indifferent, but unpredictable

[Figure: DAG W → A, W → B, W → C with CPTs P(W), P(A|W), P(B|W), P(C|W)]

P(W, A, C, B) = P(B|W) · P(C|W) · P(A|W) · P(W)
P(A=1|W=bad) = 0.9, P(C=1|W=bad) = 0.1, P(B=1|W=bad) = 0.5
so P(A=1, C=1, B=1 | W=bad) = 0.9 · 0.1 · 0.5

CPT for Alex:
W A P(A|W)
good 0 .01
good 1 .99
bad 0 .1
bad 1 .9

Questions:
• Given bad weather, which group of individuals is most likely to show up at the party?
• What is the probability that Chris goes to the party but Becky does not?
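Both questions can be answered directly from the factored form, since A, C, and B are independent given W. A small sketch; the CPT for A and the bad-weather rows for C and B come from the slide, while the good-weather rows for C and B are assumed values:

```python
from itertools import product

# CPTs keyed by weather; index 1 = attends.  The good-weather rows for
# C and B are assumed; the rest comes from the slide.
P_A = {"good": [0.01, 0.99], "bad": [0.1, 0.9]}   # P(A|W)
P_C = {"good": [0.5, 0.5],   "bad": [0.9, 0.1]}   # P(C|W) (good row assumed)
P_B = {"good": [0.5, 0.5],   "bad": [0.5, 0.5]}   # P(B|W)

def cond_joint(a, c, b, w):
    """P(A=a, C=c, B=b | W=w): A, C, B are independent given W."""
    return P_A[w][a] * P_C[w][c] * P_B[w][b]

# Which group is most likely to show up given bad weather?
best = max(product((0, 1), repeat=3), key=lambda g: cond_joint(*g, "bad"))
print("most likely (A,C,B) given W=bad:", best)

# P(Chris goes, Becky does not | W=bad): sum Alex out.
p = sum(cond_joint(a, 1, 0, "bad") for a in (0, 1))
print(f"P(C=1, B=0 | W=bad) = {p:.3f}")  # 0.1 * 0.5 = 0.050
```

Note the tie in the maximization: with P(B=1|W=bad) = 0.5, the groups (A, ¬C, ¬B) and (A, ¬C, B) are equally likely at 0.405; `max` simply returns the first one it encounters.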
Mixed Probabilistic and Deterministic Networks
• Alex is likely to go in bad weather
• Chris rarely goes in bad weather
• Becky is indifferent, but unpredictable
[Figure: a probabilistic network (PN) W → A, B, C with CPTs P(W), P(A|W), P(B|W), P(C|W), combined with a constraint network (CN) over A, B, C with constraints A → B and C → A]

Query: is it likely that Chris goes to the party if Becky does not, given that the weather is bad?
P(C, ¬B | W = bad, A → B, C → A)
Example Domains for Graphical Models
• Natural language processing
  – Information extraction, semantic parsing, translation, topic models, …
• Computer vision
  – Object recognition, scene analysis, segmentation, tracking, …
• Computational biology
  – Pedigree analysis, protein folding and binding, sequence matching, …
• Networks
  – Webpage link analysis, social networks, communications, citations, …
• Robotics
  – Planning & decision making
Complexity of Reasoning Tasks
• Constraint satisfaction
• Counting solutions
• Combinatorial optimization
• Belief updating
• Most probable explanation
• Decision-theoretic planning

[Figure: growth of linear, polynomial, and exponential functions f(n)]

Reasoning is computationally hard: complexity is exponential in both time and space (memory).
Tree-Solving Is Easy
Trees are processed in linear time and memory by message passing:
• Belief updating (sum-prod)
• MPE (max-prod)
• CSP consistency (projection-join)
• #CSP (sum-prod)

[Figure: a tree with root X, children Y and Z, and leaves T, R (under Y) and L, M (under Z), with CPTs P(X), P(Y|X), P(Z|X), P(T|Y), P(R|Y), P(L|Z), P(M|Z); messages m_{Y→X}(X), m_{X→Y}(X), m_{Z→X}(X), m_{X→Z}(X), m_{T→Y}(Y), m_{Y→T}(Y), m_{R→Y}(Y), m_{Y→R}(Y), m_{L→Z}(Z), m_{Z→L}(Z), m_{M→Z}(Z), m_{Z→M}(Z) flow along the edges]
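The upward pass of tree message passing can be sketched directly: each node sends its parent a message that sums out its own value, combining its CPT with the messages from its children. A minimal sum-product sketch on the tree above, with randomly generated CPTs and an assumed evidence set {T=1, M=0}:

```python
from itertools import product
import random

random.seed(0)

def rand_cpt():
    """Random binary CPT: cpt[parent_value][child_value]."""
    rows = []
    for _ in range(2):
        p = random.random()
        rows.append([p, 1 - p])
    return rows

# Tree from the slide: root X with children Y, Z; leaves T, R under Y
# and L, M under Z.  CPTs are random; the evidence set is assumed.
children = {"X": ["Y", "Z"], "Y": ["T", "R"], "Z": ["L", "M"],
            "T": [], "R": [], "L": [], "M": []}
p_x = [0.6, 0.4]
cpt = {v: rand_cpt() for v in "YZTRLM"}
evidence = {"T": 1, "M": 0}

def message(child, parent_val):
    """m_{child->parent}(parent_val): sum out the child's own value."""
    vals = [evidence[child]] if child in evidence else [0, 1]
    return sum(cpt[child][parent_val][c] * prod_messages(child, c)
               for c in vals)

def prod_messages(node, val):
    """Product of the messages a node receives from its children."""
    out = 1.0
    for ch in children[node]:
        out *= message(ch, val)
    return out

bel = [p_x[x] * prod_messages("X", x) for x in (0, 1)]  # unnormalized
z = sum(bel)                                            # P(evidence)
posterior = [b / z for b in bel]
print("P(X | T=1, M=0) =", posterior)
```

Each edge carries one message per parent value, so the total work is linear in the number of nodes, which is exactly why tree-structured models are easy.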
Transforming into a Tree
• By inference (thinking)
  – Transform into a single, equivalent tree of sub-problems.
• By conditioning (guessing)
  – Transform into many tree-like sub-problems.
Inference and Treewidth
[Figure: a graphical model over variables A–M and its tree decomposition into clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM]
treewidth = (maximum cluster size) - 1 = 4 - 1 = 3
Inference algorithms:
Time: exp(treewidth)
Space: exp(treewidth)
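The cluster-based inference alluded to here is bucket (variable) elimination: process variables along an ordering, multiply the factors that mention the current variable, and sum it out; the largest intermediate scope is what makes time and space exponential in the treewidth. A minimal sketch on an assumed three-variable chain A - B - C:

```python
from itertools import product

# Sum-product variable (bucket) elimination over binary variables.
# A factor is a (scope, table) pair; the table maps value tuples to numbers.
# Illustrative model (assumed): a chain with P(A), P(B|A), P(C|B).
def make_factor(scope, fn):
    return (scope, {vals: fn(*vals)
                    for vals in product((0, 1), repeat=len(scope))})

factors = [
    make_factor(("A",), lambda a: [0.7, 0.3][a]),                          # P(A)
    make_factor(("A", "B"), lambda a, b: [[0.8, 0.2], [0.4, 0.6]][a][b]),  # P(B|A)
    make_factor(("B", "C"), lambda b, c: [[0.9, 0.1], [0.5, 0.5]][b][c]),  # P(C|B)
]

def multiply(f, g):
    fs, ft = f
    gs, gt = g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, vals))
        table[vals] = (ft[tuple(env[v] for v in fs)] *
                       gt[tuple(env[v] for v in gs)])
    return (scope, table)

def sum_out(f, var):
    fs, ft = f
    i = fs.index(var)
    scope = fs[:i] + fs[i + 1:]
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return (scope, table)

def eliminate(factors, order):
    """For each variable: multiply the factors mentioning it, sum it out."""
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        combined = bucket[0]
        for f in bucket[1:]:
            combined = multiply(combined, f)
        rest.append(sum_out(combined, var))
        factors = rest
    result = 1.0
    for _, table in factors:      # only empty-scope factors remain
        result *= table[()]
    return result

print("sum over all variables =", eliminate(factors, ["A", "B", "C"]))  # ≈ 1.0
```

On this chain every intermediate factor has at most two variables, matching the induced width of 1; on the slide's graph the largest cluster has four variables, hence the exp(treewidth) cost.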
Conditioning and Cycle Cutset
[Figure: a graphical model over variables A–P; instantiating A, then B, then C removes all cycles, leaving a tree]
Cycle cutset = {A, B, C}
Search over the Cutset
• Inference may require too much memory.
• Instead, condition on (search over) some of the variables.
[Figure: a graph-coloring problem; searching over assignments to the cutset (A = yellow or green; B = red, blue, green, or yellow) leaves a tree-structured subproblem over C–M for each assignment]
Bird's-Eye View of Exact Algorithms
• Inference (tree decomposition): exp(w*) time and space.
• Search (AND/OR search): exp(w*) time, O(w*) space.
• Search + inference (conditioning on a cutset of size q): space exp(q), time exp(q + c(q)), where q is user-controlled.
[Figure: a graphical model, its tree decomposition (clusters ABC, BDEF, DGF, EFH, FHK, HJ, KLM), its AND/OR search tree, and cutset conditioning over assignments to A and B]
Bird's-Eye View of Approximate Algorithms
• Inference → bounded inference
• Search
• Sampling
• Search + inference
• Sampling + bounded inference
Outline
• Basics of probability theory
• DAGs, Markov(G), Bayesian networks
• Graphoids: axioms for inferring conditional independence (CI)
• Capturing CIs by graphs
• D-separation: inferring CIs in graphs
Examples: Common-Sense Reasoning
• Pajama vs. dress (7:30 pm): I told Susannah, "you have a nice pajama," but it was just a dress. Why jump to that conclusion? 1. Because it is night time. 2. Certain designs look like pajamas.
• Cars leaving a parking lot: You enter a parking lot that is quite full (at UCI) and see a car coming toward you. Either a space has just been vacated, or the driver found no space and is leaving for another lot. What other clues could disambiguate?
• Robot exiting on the wrong floor: A robot rides the elevator down, and it stops at the 2nd floor instead of the ground floor. Stepping out, the robot should immediately recognize that it is on the wrong floor and go back inside.
• Turing quotes
  – If machines are never allowed to be fallible, they cannot be intelligent.
  – (Mathematicians are wrong from time to time, so a machine should also be allowed to be.)
Why Uncertainty?
• AI goal: a declarative, model-based framework that allows a computer system to reason.
• People reason with partial information.
• Sources of uncertainty:
  – Limitations in observing the world: e.g., a physician sees symptoms, not exactly what goes on in the body, when making a diagnosis; observations are noisy (test results are inaccurate).
  – Limitations in modeling the world.
  – Maybe the world is not deterministic.
Why/What/How Uncertainty?
• Why uncertainty?
  – Answer: it is abundant.
• What formalism to use?
  – Answer: probability theory.
• How to overcome the exponential representation?
  – Answer: graphs, graphs, graphs… to capture irrelevance, independence, and causality.
Bayesian Networks: Representation

BN = (G, Θ)
[Figure: DAG with Smoking → lung Cancer, Smoking → Bronchitis, {lung Cancer, Smoking} → X-ray, {lung Cancer, Bronchitis} → Dyspnoea, annotated with P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B)]

P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)

CPD for Dyspnoea:
C B | D=0 D=1
0 0 | 0.1 0.9
0 1 | 0.7 0.3
1 0 | 0.8 0.2
1 1 | 0.9 0.1

Conditional independencies → efficient representation.