69
Reasoning with Graphical Models Class 1 Rina Dechter class1 compsci2020 Darwiche chapters 1,3 Dechter‐Morgan&claypool book: Chapters 1‐2 Pearl chapter 1‐2

Class 1 Rina Dechter - Donald Bren School of Information ...dechter/courses/ics-276/winter-2020/slides/... · Rina Dechter class1 compsci2020 Darwichechapters 1,3 Dechter‐Morgan&claypoolbook:

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Reasoning with Graphical Models

Class 1Rina Dechter

class1  compsci2020

Darwiche chapters 1,3Dechter‐Morgan&claypool book: Chapters 1‐2Pearl chapter 1‐2

Congressional Breifing: AI at UCI

Congressional Briefing, December 2019 2

• Rina Dechter

Congressional Briefing, December 2019 3

The Primary AI Challenges

• Machine Learning focuses on replicating humans learning 

• Automated reasoning focuses on replicating how people reason.  

PCWP COHRBP

HREKG HRSAT

ERRCAUTERHRHISTORY

CATECHOL

SAO2 EXPCO2

ARTCO2

VENTALV

VENTLUNG VENITUBE

DISCONNECT

MINVOLSET

VENTMACHKINKEDTUBEINTUBATIONPULMEMBOLUS

PAP SHUNT

ANAPHYLAXIS

MINOVL

PVSAT

FIO2PRESS

INSUFFANESTHTPR

LVFAILURE

ERRBLOWOUTPUTSTROEVOLUMELVEDVOLUME

HYPOVOLEMIA

CVP

BP

A neural network

A Graphical Model

Congressional Briefing, December 2019 4

Automated Reasoning

Lawyer

Policy Maker

Medical Doctor

Queries:• Prediction: what will happen?• Diagnosis: what happened?• Situation assessment: What is going on?• Planning, decision making: what to do?

Automated Reasoning

Queries:• Prediction• Diagnosis• Situation assessment• Planning, decision makinganswers

5

Knowledge is huge, so How to identify what’s  relevant? Graphical Models

Congressional Briefing, December 2019

Graphical ModelsExample: diagnosing liver disease (Onisko et al., 1999)

6

Automated Reasoning:• Develop methods to  answer these questions.• Learning the models:  from experts and data.

Congressional Briefing, December 2019

Queries:• Prediction• Diagnosis• Situation assessment• Planning, decision making

class1  compsci2020

Global Seismic Monitoring Compliance for the Comprehensive Nuclear‐Test‐Ban Treaty (CTBT) (Nimar, Russel, Sudderth, 2011)

Congressional Briefing, December 2019 8

278 monitoring stations (147 seismic)

CNTBT:A Graphical Model Application

The IDC (International Data Centers)operates continuously and in real time,performing signal processing

Congressional Briefing, December 2019 9

Given: continuous waveform measurements from a global network of seismometer stations

Global Seismic Monitoring for the Comprehensive Nuclear‐Test‐Ban Treaty (Nimar, Russel, Sudderth, 2011)

Congressional Briefing, December 2019

Input: obsreved detectionOutput: a bulletin listing seismic events, with

• Time• Location (latitude, longitude)• Depth• Magnitude

Global Seismic Monitoring for the Comprehensive Nuclear‐Test‐Ban Treaty (Nimar, Russel, Sudderth, 2011)

Result: 60% reduction in error compared with human experts.

Reasoning methods infers the most likelyset of seismic  events given the observed detections, 

Complexity of Automated Reasoning• Prediction• Diagnosis• Planning and scheduling• Probabilistic  Inference• Explanation • Decision‐making

0

200

400

600

800

1000

1200

1 2 3 4 5 6 7 8 9 10

f(n)

n

Linear / Polynomial / Exponential

Linear

Reasoning is computationally hardComplexity is exponential

Congressional Briefing, December 2019

Approximation, anytime

Bounded error

11

AI Renaissance

• Deep learning– Fast predictions– “Instinctive”

• Probabilistic models– Slow reasoning– “Logical / deliberative”

Tools: Tensorflow, PyTorch, …

Tools: Graphical Models,

Probabilistic programming,Markov Logic, …

class1  compsci2020

Text Books, Outline, Requirements

class1  compsci2020

Class page

Outline of classes• Part 1: Introduction and Inference

• Part 2: Search

• Parr 3: Variational Methods and Monte‐Carlo Sampling

class1  compsci2020

E K

F

LH

C

BA

M

G

J

DABC

BDEF

DGF

EFH

FHK

HJ KLM

0 1 0 1 0 1 0 10 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

01010101010101010101010101010101010101010101010101010101010101010 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1EC

FD

BA 0 1

Context minimal AND/OR search graph

AOR0ANDBOR

0ANDOR E

OR F FAND 01

AND 0 1C

D D01

0 1

1EC

D D0 1

1B

0E

F F0 1

C1EC

Probabilistic Graphical models• Describe structure in large problems

– Large complex system– Made of “smaller”, “local” interactions– Complexity emerges through interdependence

class1  compsci2020

Probabilistic Graphical models• Describe structure in large problems

– Large complex system– Made of “smaller”, “local” interactions– Complexity emerges through interdependence

• Examples & Tasks– Maximization (MAP): compute the most probable configuration

[Yanover & Weiss 2002][Bruce R. Donald et. Al. 2016]

class1  compsci2020

• Protein Structure prediction: predicting the 3d structure from given sequences

• PDB: Protein design (backbone) algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC).

Probabilistic Graphical models• Describe structure in large problems

– Large complex system– Made of “smaller”, “local” interactions– Complexity emerges through interdependence

• Examples & Tasks– Summation & marginalization

grass

plane

sky

grass

cow

Observation  y Observation  yMarginals p( xi | y ) Marginals p( xi | y )

and

“partition function”

class1  compsci2020

e.g., [Plath et al. 2009]

Image segmentation and classification:

Graphical models• Describe structure in large problems

– Large complex system– Made of “smaller”, “local” interactions– Complexity emerges through interdependence

• Examples & Tasks– Mixed inference (marginal MAP, MEU, …)

Test

Drill Oil salepolicy

Testresult

Seismicstructure

Oilunderground

Oilproduced

Testcost

Drillcost

Salescost

Oil sales

Marketinformation

Influence diagrams &optimal decision‐making

(the “oil wildcatter” problem)

class1  compsci2020

e.g., [Raiffa 1968; Shachter 1986]

class1  compsci2020

In more details…

Bayesian Networks (Pearl 1988)

P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)

lung Cancer

Smoking

X-ray

Bronchitis

DyspnoeaP(D|C,B)

P(B|S)

P(S)

P(X|C,S)

P(C|S)

Θ) (G,BN

CPD:C  B   P(D|C,B)0  0    0.1  0.90  1    0.7  0.31  0    0.8  0.21  1    0.9  0.1

• Posterior marginals, probability of evidence, MPE

• P( D= 0) = ∑ P(S)· P(C|S)· P(B|S)· P(X|C,S)· P(D|C,B, , ,MAP(P)=  𝑚𝑎𝑥 , , , P(S)· P(C|S)· P(B|S)· P(X|C,S)· P(D|C,B)

Combination: ProductMarginalization: sum/max

class1  compsci2020

An early exampleFrom medical diagnosis

Alarm network• Bayes nets: compact representation of large joint distributions

PCWP CO

HRBP

HREKG HRSAT

ERRCAUTERHRHISTORY

CATECHOL

SAO2 EXPCO2

ARTCO2

VENTALV

VENTLUNG VENITUBE

DISCONNECT

MINVOLSET

VENTMACHKINKEDTUBEINTUBATIONPULMEMBOLUS

PAP SHUNT

ANAPHYLAXIS

MINOVL

PVSAT

FIO2PRESS

INSUFFANESTHTPR

LVFAILURE

ERRBLOWOUTPUTSTROEVOLUMELVEDVOLUME

HYPOVOLEMIA

CVP

BP

The “alarm” network:  37 variables, 509 parameters (rather than 237  = 1011 !) 

[Beinlich et al., 1989]

class1  compsci2020

class1  compsci2020

A Bred greenred yellowgreen redgreen yellowyellow greenyellow red

Example: map coloringVariables - countries (A,B,C,etc.)Values - colors (red, green, blue)Constraints: etc. ,ED D, AB,A

C

A

B

DE

FG

Constraint Networks

A

B

E

G

D F

C

Constraint graph

class1  828X‐2018

Propositional Reasoning

• If Alex goes, then Becky goes:• If Chris goes, then Alex goes:

• Question:Is it possible that Chris goes to the party but Becky does not?

Example: party problem

BA

A C

e?satisfiabl ,, theIs C B, ACBA

theorynalpropositio

A B

C

class1  828X‐2018

Probabilistic reasoning (directed)

• Alex is‐likely‐to‐go in bad weather• Chris rarely‐goes in bad weather• Becky is indifferent but unpredictable

Questions:• Given bad weather, which group of individuals is most 

likely to show up at the party? • What is the probability that Chris goes to the party 

but Becky does not?

Party example: the weather effect

P(W,A,C,B) = P(B|W) · P(C|W) · P(A|W) · P(W)

P(A,C,B|W=bad) = 0.9 · 0.1 · 0.5

P(A|W=bad)=.9W A

P(C|W=bad)=.1W C

P(B|W=bad)=.5W B

WP(W)

P(A|W)P(C|W)P(B|W)

B CA

W A P(A|W)

good 0 .01

good 1 .99

bad 0 .1

bad 1 .9

class1  828X‐2018

Mixed Probabilistic and Deterministic networks

P(C|W)P(B|W)

P(W)

P(A|W)

W

B A C

Query:Is it likely that Chris goes to the party if Becky does not but the weather is bad?

PN CN

),,|,( ACBAbadwBCP

A→B C→AB A CP(C|W)P(B|W)

P(W)

P(A|W)

W

B A C

A→B C→AB A C

Alex is‐likely‐to‐go in bad weatherChris rarely‐goes in bad weatherBecky is indifferent but unpredictable

class1  828X‐2018

Example domains for graphical models• Natural Language processing

– Information extraction, semantic parsing, translation, topic models, …

• Computer vision– Object recognition, scene analysis, segmentation, tracking, …

• Computational biology– Pedigree analysis, protein folding and binding, sequence matching, …

• Networks– Webpage link analysis, social networks, communications, citations, ….

• Robotics– Planning & decision making

class1  compsci2020

Complexity of Reasoning Tasks• Constraint satisfaction• Counting solutions• Combinatorial optimization• Belief updating• Most probable explanation • Decision‐theoretic planning

0

200

400

600

800

1000

1200

1 2 3 4 5 6 7 8 9 10

f(n)

n

Linear / Polynomial / Exponential

LinearPolynomialExponential

Reasoning iscomputationally hardComplexity isTime and space(memory)

class1  compsci2020

Tree‐solving is easy

Belief updating (sum-prod)

MPE (max-prod)

CSP – consistency (projection-join)

#CSP (sum-prod)

P(X)

P(Y|X) P(Z|X)

P(T|Y) P(R|Y) P(L|Z) P(M|Z)

)(XmZX

)(XmXZ

)(ZmZM)(ZmZL

)(ZmMZ)(ZmLZ

)(XmYX

)(XmXY

)(YmTY

)(YmYT

)(YmRY

)(YmYR

Trees are processed in linear time and memoryclass1  compsci2020

Transforming into a Tree • By Inference (thinking)

– Transform into a single, equivalent tree of sub‐problems

• By Conditioning (guessing)– Transform into many tree‐like sub‐problems.

class1  compsci2020

Inference and Treewidth

EK

F

L

H

C

BA

M

G

J

D

ABC

BDEF

DGF

EFH

FHK

HJ KLM

treewidth = 4 - 1 = 3treewidth = (maximum cluster size) - 1

Inference algorithm:Time: exp(tree-width)Space: exp(tree-width)

class1  compsci2020

Conditioning and Cycle cutset

C P

J A

L

B

E

DF M

O

H

K

G N

C P

J

L

B

E

DF M

O

H

K

G N

A

C P

J

L

E

DF M

O

H

K

G N

B

P

J

L

E

DF M

O

H

K

G N

C

Cycle cutset = {A,B,C}

C P

J A

L

B

E

DF M

O

H

K

G N

C P

J

L

B

E

DF M

O

H

K

G N

C P

J

L

E

DF M

O

H

K

G N

C P

J A

L

B

E

DF M

O

H

K

G N

class1  compsci2020

Search over the Cutset

A=yellow A=green

B=red B=blue B=red B=blueB=green B=yellow

C

K

G

LD

FH

M

J

E

C

K

G

LD

FH

M

J

E

C

K

G

LD

FH

M

J

E

C

K

G

LD

FH

M

J

E

C

K

G

LD

FH

M

J

E

C

K

G

LD

FH

M

J

E

• Inference may require too much memory

• Condition on some of the variablesA

C

B K

G

LD

FH

M

J

E

GraphColoringproblem

class1  compsci2020

Inference

exp(w*) time/space

A

D

B C

E

F0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1

E

C

F

D

B

A 0 1 SearchExp(w*) timeO(w*) space

E K

F

L

H

C

BA

M

G

J

D

ABC

BDEF

DGF

EFH

FHK

HJ KLM

A=yellow A=green

B=blue B=red B=blueB=green

CK

G

LD

F H

M

J

EACB K

G

LD

F H

M

J

E

CK

G

LD

F H

M

J

EC

K

G

LD

F H

M

J

EC

K

G

LD

F H

M

J

ESearch+inference:Space: exp(q)Time: exp(q+c(q))

q: usercontrolled

Bird's‐eye View of Exact Algorithms

class1  compsci2020

Inference

exp(w*) time/space

A

D

B C

E

F0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1

E

C

F

D

B

A 0 1 SearchExp(w*) timeO(w*) space

E K

F

L

H

C

BA

M

G

J

D

ABC

BDEF

DGF

EFH

FHK

HJ KLM

A=yellow A=green

B=blue B=red B=blueB=green

CK

G

LD

F H

M

J

EACB K

G

LD

F H

M

J

E

CK

G

LD

F H

M

J

EC

K

G

LD

F H

M

J

EC

K

G

LD

F H

M

J

ESearch+inference:Space: exp(q)Time: exp(q+c(q))

q: usercontrolled

Context minimal AND/OR search graph

18 AND nodes

AOR0ANDBOR

0ANDOR E

OR F FAND 0 1

AND 0 1

C

D D

0 1

0 1

1

EC

D D

0 1

1

B

0

E

F F

0 1

C1

EC

Bird's‐eye View of Exact Algorithms

class1  compsci2020

A

D

B C

E

F0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1

E

C

F

D

B

A 0 1

E K

F

L

H

C

BA

M

G

J

D

ABC

BDEF

DGF

EFH

FHK

HJ KLM

A=yellow A=green

B=blue B=red B=blueB=green

CK

G

LD

F H

M

J

EACB K

G

LD

F H

M

J

E

CK

G

LD

F H

M

J

EC

K

G

LD

F H

M

J

EC

K

G

LD

F H

M

J

E

Inference

Bounded Inference

Search

Sampling

Search + inference:

Sampling + bounded inference

Bird's‐eye View of Approximate Algorithms

class1  compsci2020

Context minimal AND/OR search graph

18 AND nodes

AOR0ANDBOR

0ANDOR E

OR F FAND 0 1

AND 0 1

C

D D

0 1

0 1

1

EC

D D

0 1

1

B

0

E

F F

0 1

C1

EC

Outline• Basics of probability theory • DAGS, Markov(G),  Bayesian networks• Graphoids: axioms of for inferring conditional independence (CI)

• D‐separation: Inferring  CIs in graphs

class1  compsci2020

Outline• Basics of probability theory • DAGS, Markov(G),  Bayesian networks• Graphoids: axioms of for inferring conditional independence (CI)

• Capturing CIs by graphs• D‐separation: Inferring  CIs in graphs

class1  compsci2020

Examples: Common Sense Reasoning

• Zebra on Pajama: (7:30 pm): I told Susannah: you have a nice pajama,  but it was just a dress. Why jump to that conclusion?: 1. because time is night time. 2. certain designs look like pajama.

• Cars going out of a parking lot: You enter a parking lot which is quite full (UCI), you see a car coming : you think ah… now there is a space (vacated), OR… there is no space and this guy is looking and leaving to another parking lot. What other clues can we have?

• Robot gets out at a wrong level:  A robot goes down the elevator. stops at 2nd floor instead of ground floor. It steps out and should immediately recognize not being in the right level, and go back inside.

• Turing quotes– If machines will not be allowed to be fallible they cannot be intelligent– (Mathematicians are wrong from time to time so a machine should also be allowed)

class1  compsci2020

Why Uncertainty?• AI goal: to  have a declarative, model‐based, framework that allows computer 

system to reason.• People  reason with partial information• Sources of uncertainty: 

– Limitation in observing the world: e.g., a physician see symptoms and not exactly what goes in the body when he performs diagnosis. Observations are noisy (test results are inaccurate)

– Limitation in modeling the world, – maybe the world is not deterministic.

class1  compsci2020

Why/What/How Uncertainty?• Why Uncertainty?

– Answer: It is abandant

• What formalism to use? – Answer: Probability theory

• How to overcome exponential representation?

– Answer: Graphs, graphs, graphs… to capture irrelevance, independence, causality

class1  compsci2020

The Burglary Example

class1  compsci2020

Earthquake  Burglary

RadioAlarm

Call

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

Alpha and beta are events

class1  compsci2020

class1  compsci2020

Burglary is independent  of Earthquake

class1  compsci2020

Earthquake is independent of burglary

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

Example

P(B,E,A,J,M)=?

class1 compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

class1  compsci2020

Bayesian Networks: Representation

= P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)

lung Cancer

Smoking

X-ray

Bronchitis

DyspnoeaP(D|C,B)

P(B|S)

P(S)

P(X|C,S)

P(C|S)

P(S, C, B, X, D)

Conditional Independencies Efficient Representation

Θ) (G,BN

CPD:C B D=0 D=10 0 0.1 0.90 1 0.7 0.31 0 0.8 0.21 1 0.9 0.1

class1 compsci2020

class1  compsci2020

End of  slides