Cooperating Intelligent Systems

Bayesian networksChapter 14, AIMA

Inference

• Inference in the statistical setting means computing probabilities for different outcomes to be true given the information

• We need an efficient method for doing this, which is more powerful than the naïve Bayes model.

)|( nInformatioOutcomeP

Bayesian networks

A Bayesian network is a directed graph in which each node is annotated with quantitative probability information:

1. A set of random variables, {X1,X2,X3,...}, makes up the nodes of the network.

2. A set of directed links connect pairs of nodes, parent child

3. Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) .

4. The graph is a directed acyclic graph (DAG).

The dentist network

Cavity

CatchToothache

Weather

The alarm network

Burglary Earthquake

JohnCalls MaryCalls

Burglar alarm responds to both earthquakes and burglars.

Two neighbors: John and Mary,who have promised to call youwhen the alarm goes off.

John always calls when there’san alarm, and sometimes whenthere’s not an alarm.

Mary sometimes misses the alarms (she likes loud music).

The cancer network

From Breese and Coller 1997

Age Gender

SmokingToxics

Cancer

SerumCalcium

LungTumour

GeneticDamage

The cancer network

From Breese and Coller 1997

Age Gender

SmokingToxics

Cancer

SerumCalcium

LungTumour

GeneticDamage

P(A,G) = P(A)P(G)

P(C|S,T,A,G) = P(C|S,T)

P(SC,C,LT,GD) = P(SC|C)P(LT|C,GD)P(C) P(GD)

P(A,G,T,S,C,SC,LT,GD) = P(A)P(G)P(T|A)P(S|A,G)P(C|T,S)P(GD)P(SC|C)

P(LT|C,GD)

The product (chain) rule

XparentsxPxxxP

xXxXxXP

))(|(),,,(

(This is for Bayesian networks, the general case comeslater in this lecture)

Bayes network node is a function

A B¬a b

a∧b a∧¬b ¬a∧b ¬a∧¬b

Min 0.1 0.3 0.7 0.0

Max 1.5 1.1 1.9 0.9

P(C|¬a,b) = U[0.7,1.9]

0.7 1.9

Bayes network node is a function

A BN node is a conditionaldistribution function• Inputs = Parent values• Output = distribution over values

Any type of function from valuesto distributions.

Example: The alarm network

Burglary Earthquake

JohnCalls MaryCalls

P(B=b)

P(E=e)

A P(J=j)

a 0.90

¬a 0.05

A P(M=m)

a 0.70

¬a 0.01

B E P(A=a)

b e 0.95

b ¬e 0.94

¬b e 0.29

¬b ¬e 0.001

Note: Each number in the tables represents aboolean distribution.Hence there is adistribution output forevery input.

Burglary Earthquake

JohnCalls MaryCalls

P(B=b)

P(E=e)

A P(J=j)

a 0.90

¬a 0.05

A P(M=m)

a 0.70

¬a 0.01

B E P(A=a)

b e 0.95

b ¬e 0.94

¬b e 0.29

¬b ¬e 0.001

00063.090.070.0001.0998.0999.0

)|()|(),|()()(

ajPamPebaPePbP

ebamjP

Probability distribution for”no earthquake, no burglary,but alarm, and both Mary andJohn make the call”

Meaning of Bayesian network

xxxPxxxxPxxxxP

xxxPxxxxPxxxP

43432321

3232121

),,,(),,,|(),,,|(

),,,(),,,|(),,,(

The general chain rule (always true):

iiin XparentsxPxxxP

121 ))(|(),,,(

The Bayesian network chain rule:

The BN is a correct representation of the domain iff each node is conditionally independent of its predecessors, given its parents.

The alarm network

Burglary Earthquake

JohnCalls MaryCalls

The fully correct alarm network might look something like the figure.

The Bayesian network (red) assumes that some of the variables are independent (or that the dependecies can be neglected since they are very weak).

The correctness of the Bayesian network of course depends on the validity of these assumptions.

Burglary Earthquake

JohnCalls MaryCalls

It is this sparse connection structure that makes the BN approachfeasible (~linear growth in complexity rather than exponential)

How construct a BN?

• Add nodes in causal order (”causal” determined from expertize).

• Determine conditional independence using either (or all) of the following semantics:– Blocking/d-separation rule– Non-descendant rule– Markov blanket rule– Experience/your beliefs

Path blocking & d-separation

Intuitively, knowledge about Serum Calcium influences our belief about Cancer, if we don’t know the value of Cancer, which in turn influences our belief about Lung Tumour, etc.

However, if we are given the value of Cancer (i.e. C= true or false), then knowledge of Serum Calcium will not tell us anything about Lung Tumour that we don’t already know.

We say that Cancer d-separates (direction-dependent separation) Serum Calcium and Lung Tumour.

Cancer

SerumCalcium

LungTumour

GeneticDamage

Path blocking & d-separation

Two nodes Xi and Xj are conditionally independent given a set

= {X1,X2,X3,...} of nodes if for every undirected path in the BN

between Xi and Xj there is some node Xk on the path having one of the following three properties:

1. Xk ∈ , and both arcs on the path

lead out of Xk.

2. Xk ∈ , and one arc on the path

leads into Xk and one arc leads out.

3. Neither Xk nor any descendant of

Xk is in , and both arcs on the

path lead into Xk.

Xk blocks the path between Xi and Xj

)|()|()|,( jiji XPXPXXP

Xi and Xj are d-separated if all paths betweeen them are blocked

Non-descendants

A node is conditionally independent of its non-descendants (Zij), given its parents.

),,|,,(),,|(

),,|,,,(

UUZZPUUXP

UUZZXP

Markov blanket

A node is conditionally independent of all other nodes in the network, given its parents, children, and children’s parents

These constitute the nodes Markov blanket.

),,,,,,,,|,,(),,,,,,,,|(

),,,,,,,,|,,,(

1111111

nnjjmknnjjm

nnjjmk

YYZZUUXXPYYZZUUXP

YYZZUUXXXP

Efficient representation of PDs

• Boolean Boolean• Boolean Discrete• Boolean Continuous• Discrete Boolean• Discrete Discrete• Discrete Continuous• Continuous Boolean• Continuous Discrete• Continuous Continuous

P(C|a,b) ?

Efficient representation of PDs

Boolean Boolean: Noisy-OR, Noisy-AND

Boolean/Discrete Discrete: Noisy-MAX

Bool./Discr./Cont. Continuous: Parametric distribution (e.g. Gaussian)

Continuous Boolean: Logit/Probit

Noisy-OR exampleBoolean → Boolean

P(E|C1,C2,C3)

C1 0 1 0 0 1 1 0 1

C2 0 0 1 0 1 0 1 1

C3 0 0 0 1 0 1 1 1

P(E=0) 1 0.1 0.1 0.1 0.01 0.01 0.01 0.001

P(E=1) 0 0.9 0.9 0.9 0.99 0.99 0.99 0.999

Example from L.E. Sucar

The effect (E) is off (false) when none of the causes are true. The probability for the effect increases with the number of true causes.

)(#10)0( TrueEP (for this example)

Noisy-OR general caseBoolean → Boolean

qCCCEP

),,,|0(1

Example on previous slide usedqi = 0.1 for all i.

P(E|C1,...)

C1 C2 Cn

Image adapted from Laskey & Mahoney 1999

Noisy-MAXBoolean → Discrete

Observed effect

C1 C2 Cn

Effect takes on the max value from different causes

Restrictions:– Each cause must have an off state,

which does not contribute to effect– Effect is off when all causes are off – Effect must have consecutive

escalating values: e.g., absent, mild, moderate, severe.

Image adapted from Laskey & Mahoney 1999

CkinkiqCCCeEP

1,21 ),,,|(

Parametric probability densitiesBoolean/Discr./Continuous → Continuous

Use parametric probability densities, e.g., the normal distribution

Gaussian networks (a = input to the node)

Probit & LogitDiscrete → Boolean

If the input is continuous but output is boolean, use probit or logit

dxxxaAP

)/)(exp(2

1)|( :Probit

/)(2exp1

1)|( :Logit

-8 -6 -4 -2 0 2 4 6 80

1The logistic sigmoid

P(A|x)

The cancer network

Age: {1-10, 11-20,...}Gender: {M, F}Toxics: {Low, Medium, High}Smoking: {No, Light, Heavy}Cancer: {No, Benign, Malignant}Serum Calcium: LevelLung Tumour: {Yes, No}

Age Gender

SmokingToxics

Cancer

SerumCalcium

LungTumour

Discrete Discrete/boolean

Discrete Discrete

Discrete

Continuous Discrete/boolean

Inference in BN

Inference means computing P(X|e), where X is a query (variable) and e is a set of evidence variables (for which we know the values).

Examples:

P(Burglary | john_calls, mary_calls)

P(Cancer | age, gender, smoking, serum_calcium)

P(Cavity | toothache, catch)

Exact inference in BN

”Doable” for boolean variables: Look up entries in conditional probability tables (CPTs).

yePePeP

ePeP ),,(),(

),()|( XX

Burglary Earthquake

JohnCalls MaryCalls

P(B=b)

P(E=e)

A P(J=j)

a 0.90

¬a 0.05

A P(M=m)

a 0.70

¬a 0.01

B E P(A=a)

b e 0.95

b ¬e 0.94

¬b e 0.29

¬b ¬e 0.001

},{ },{

),,,,(),|(eeE aaA

mjAEBmjB PP

Evidence variables = {J,M}Query variable = B

What is the probability for a burglary if both John and Mary call?

Burglary Earthquake

JohnCalls MaryCalls

P(B=b)

P(E=e)

A P(J=j)

a 0.90

¬a 0.05

A P(M=m)

a 0.70

¬a 0.01

B E P(A=a)

b e 0.95

b ¬e 0.94

¬b e 0.29

¬b ¬e 0.001

)()(),|()|()|(

),(),|()|()|(

),,()|()|(

),,(),,|,(

),,,,(

EPbPEbaPAmPAjP

EbEbaPAmPAjP

AEbAmPAjP

AEbAEbmj

mjAEbB

= 0.001 = 10-3

)(),|()|()|(10 3 EPEbAPAmPAjP

},{ },{

),,,,(),|(eeE aaA

mjAEBmjB PP

Burglary Earthquake

JohnCalls MaryCalls

P(B=b)

P(E=e)

A P(J=j)

a 0.90

¬a 0.05

A P(M=m)

a 0.70

¬a 0.01

B E P(A=a)

b e 0.95

b ¬e 0.94

¬b e 0.29

¬b ¬e 0.001

},{},{

10491.1),,(

105923.0

)](),|()|()|(

)(),|()|()|(

)(),|()|()|([10

)(),|()|()|(10

ePebaPamPajP

EPEbAPAmPAjP

eeEaaA

},{ },{

),,,,(),|(eeE aaA

mjAEBmjB PP

Burglary Earthquake

JohnCalls MaryCalls

P(B=b)

P(E=e)

A P(J=j)

a 0.90

¬a 0.05

A P(M=m)

a 0.70

¬a 0.01

B E P(A=a)

b e 0.95

b ¬e 0.94

¬b e 0.29

¬b ¬e 0.001

716.0),,(),|(

284.0),,(),|(

]10083.2[

),,(),,(),(

10491.1),,(

105923.0),,(

mjbmjb

mjbmjbmj

},{ },{

),,,,(),|(eeE aaA

mjAEBmjB PP

Burglary Earthquake

JohnCalls MaryCalls

P(B=b)

P(E=e)

A P(J=j)

a 0.90

¬a 0.05

A P(M=m)

a 0.70

¬a 0.01

B E P(A=a)

b e 0.95

b ¬e 0.94

¬b e 0.29

¬b ¬e 0.001

716.0),,(),|(

284.0),,(),|(

]10083.2[

),,(),,(),(

10491.1),,(

105923.0),,(

mjbmjb

mjbmjbmj

Answer: 28%

},{ },{

),,,,(),|(eeE aaA

mjAEBmjB PP

Use depth-first search

A lot of unneccesary repeated computation...

Complexity of exact inference

• By eliminating repeated calculation & uninteresting paths we can speed up the inference a lot.

• Linear time complexity for singly connected networks (polytrees).

• Exponential for multiply connected networks.– Clustering can improve this

Approximate inference in BN

• Exact inference is intractable in large multiply connected BNs ⇒ use approximate inference: Monte Carlo methods (random sampling).– Direct sampling– Rejection sampling– Likelihood weighting– Markov chain Monte Carlo

Markov chain Monte Carlo

1. Fix the evidence variables (E1, E2, ...) at their given values.

2. Initialize the network with values for all other variables, including the query variable.

3. Repeat the following many, many, many times:a. Pick a non-evidence variable at random (query Xi or

hidden Yj)

b. Select a new value for this variable, conditioned on the current values in the variable’s Markov blanket.

Monitor the values of the query variables.

Cooperating Intelligent Systems

Documents

Cooperating Intelligent Systems Bayesian networks Chapter 14, AIMA

Cooperating Intelligent Systems Inference in first-order logic Chapter 9, AIMA

Cooperating Intelligent Systems Learning from observations Chapter 18, AIMA Machine Learning

Intelligent User Behaviour and Intelligent Systems: a … · Intelligent User Behaviour and Intelligent Systems: ... Intelligent User Behaviour and Intelligent ... 2014) zContext-Based

I.T.S. Intelligent Transportation Systems - ELITE Growth · I.T.S. Intelligent Transportation Systems I.T.S. Intelligent Transportation Systems Ducati Energia opera nell’ambito

Distributed Systems. Interprocess Communication (IPC) Processes are either independent or cooperating – Threads provide a gray area – Cooperating processes

COSC 3407: Operating Systems Lecture 5: Independent Vs. Cooperating Threads

Intelligent Systems Investor Conference Call - …intelsys.com/Investor/ISCRightsOfferingPresentation070709.pdf · July 7, 2009. Intelligent Systems Corporation. 1. Intelligent Systems

2nd Intelligent Human Systems Integration · Intelligent Human Systems Integration: Integrating People and Intelligent Systems Intelligent Human Systems Integration onerene International

The Internet of Things (IoT) · Internet of Cyber-Physical Systems Robotics Software -Agents M2M Services Cooperating Systems Âutonomic Systems Cooperating Objects Energy Efficient

Cooperating Intelligent Systems Statistical learning methods Chapter 20, AIMA (only ANNs & SVMs)

Intelligent Systems Research - Max Planck Society · Intelligent Systems Research Curators: Joachim Spatz & Stefan Schaal1 Max Planck Institute for Intelligent Systems Intelligent

Safe Cooperating Cyber-Physical Systems using Wireless ... · Safe Cooperating Cyber-Physical Systems using Wireless Communication Report type Deliverable D3.1 Report name State-of-the-Art

Cooperating Intelligent Systems Adversarial search Chapter 6, AIMA 2 nd ed Chapter 5, AIMA 3 rd ed This presentation owes a lot to V. Pavlovic @ Rutgers,

Intelligent Systems

Intelligent Systems Lecture 13 Intelligent robots

MetiTarski's menagerie of cooperating systems

Cooperating Intelligent Systems

Cooperating Intelligent Systems Uninformed search Chapter 3, AIMA (freely available in the internet with the name AMIA_Ch3_L2.ppt, Sweden) Complement to

Modeling Reactive Systems Using Cooperating Adaptive …d.researchbib.com/f/cnZwDkAQDhpTEz.pdf · Modeling Reactive Systems Using Cooperating Adaptive Devices José Maria Novaes dos