Bayesian Networks (lecture slides, Yonsei University)
Source: sclab.yonsei.ac.kr/courses/07AI/data/6)Bayesian Networks.pdf

1

Bayesian Networks

2

Bayes’ Rule & Bayesian Inference

• Bayesian inference is statistical inference in which probabilities are interpreted as degrees of belief

• The name comes from the frequent use of the Bayes’ rule

P(H|E) = P(H, E) / P(E) = P(E|H) P(H) / P(E)

3

Bayesian Networks

• Many variables exist and the joint probability distribution is important
  – How to represent the joint probability distribution effectively?

• Bayesian networks: graphical representation (acyclic directed graph) of a joint probability distribution
  – Node: variable
  – Edge: probabilistic dependency

• A node can represent any kind of variable, be it an observed measurement, a parameter, a latent variable, or a hypothesis

• Nodes are not restricted to representing random variables

4

Why Use BNs?

• Explicit management of uncertainty

• Modularity (modular specification of a joint distribution) implies maintainability

• Better, flexible and robust decision making – MEU (Maximization of Expected Utility), VOI (Value of Information)

• Can be used to answer arbitrary queries - multiple fault problems (General purpose “inference” algorithm)

• Easy to incorporate prior knowledge and to understand

5

P(A, S, T, L, B, C, D) = P(A) P(S) P(T|A) P(L|S) P(B|S) P(C|T,L) P(D|T,L,B)

Conditional independencies → efficient representation

T L B | D=0 | D=1
0 0 0 | 0.1 | 0.9
0 0 1 | 0.7 | 0.3
0 1 0 | 0.8 | 0.2
0 1 1 | 0.9 | 0.1
...

An Example of Bayesian Network (Asia Network)

[Figure: the Asia network. Nodes: Visit to Asia (A), Tuberculosis (T, a kind of disease), Smoking (S), Lung Cancer (L, a kind of disease), Bronchitis (B), Chest X-ray (C), Dyspnoea (D, difficult respiration). Each node carries its local distribution: P(A), P(S), P(T|A), P(L|S), P(B|S), P(C|T,L), P(D|T,L,B).]

[Lauritzen & Spiegelhalter, 95]
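The factorization above can be sketched in code: seven small CPTs replace one table of 2^7 joint entries. The probability values below are illustrative assumptions (the slide gives the structure and factorization, not the numbers).

```python
from itertools import product

def p_a(a): return 0.01 if a else 0.99            # P(A): visit to Asia
def p_s(s): return 0.5                            # P(S): smoking
def p_t(t, a):                                    # P(T | A)
    p1 = 0.05 if a else 0.01
    return p1 if t else 1 - p1
def p_l(l, s):                                    # P(L | S)
    p1 = 0.10 if s else 0.01
    return p1 if l else 1 - p1
def p_b(b, s):                                    # P(B | S)
    p1 = 0.60 if s else 0.30
    return p1 if b else 1 - p1
def p_c(c, t, l):                                 # P(C | T, L)
    p1 = 0.98 if (t or l) else 0.05
    return p1 if c else 1 - p1
def p_d(d, t, l, b):                              # P(D | T, L, B)
    p1 = 0.90 if (t or l or b) else 0.10
    return p1 if d else 1 - p1

def joint(a, s, t, l, b, c, d):
    """P(A,S,T,L,B,C,D) = P(A)P(S)P(T|A)P(L|S)P(B|S)P(C|T,L)P(D|T,L,B)."""
    return (p_a(a) * p_s(s) * p_t(t, a) * p_l(l, s) *
            p_b(b, s) * p_c(c, t, l) * p_d(d, t, l, b))

# Since each local CPT is normalized, the joint sums to 1 over all 2^7 assignments:
total = sum(joint(*v) for v in product([0, 1], repeat=7))
```
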

6

Bayesian Network Knowledge Engineering

• Objective: Construct a model to perform a defined task

• Participants: Collaboration between domain expert and BN modeling expert

• Process: iterate until "done"
  – Define task objective
  – Construct model
  – Evaluate model

7

The Knowledge Acquisition Task

• Variables:

– Collectively exhaustive, mutually exclusive values

– Clarity test: value should be knowable in principle

• Structure

– If data available, can be learned

– Constructed by hand (using “expert” knowledge)

– Variable ordering matters: causal knowledge usually simplifies the structure

• Probabilities

– Can be learned from data

– Sensitivity analysis

8

Learning Bayesian Network

• Why learn a Bayesian network?
  – Gaining expert knowledge is expensive, but collecting data is cheap
  – It is difficult to encode expert knowledge directly in a BN, while applying a learning method to data is easy
  – Expert knowledge and data can be combined
  – Knowledge discovery

[Figure: data and prior information feed a learning method, which outputs the network structure (e.g., over nodes S, E, C, D) and the CPT parameters, e.g. a table for P(E|S,C).]
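The parameter side of this pipeline can be sketched as maximum-likelihood CPT estimation by counting. The variable names S, C, E match the slide's P(E|S,C) example; the data rows are hypothetical.

```python
from collections import Counter

def estimate_cpt(data, child, parents):
    """Estimate P(child | parents) from a list of dict-valued cases by counting."""
    joint_counts = Counter()
    parent_counts = Counter()
    for case in data:
        pa = tuple(case[p] for p in parents)
        joint_counts[(pa, case[child])] += 1
        parent_counts[pa] += 1
    # Normalize each child count by the count of its parent configuration.
    return {k: v / parent_counts[k[0]] for k, v in joint_counts.items()}

# Hypothetical data over binary variables S, C, E:
data = [{"S": 1, "C": 1, "E": 1}, {"S": 1, "C": 1, "E": 0},
        {"S": 1, "C": 1, "E": 1}, {"S": 0, "C": 1, "E": 0}]
cpt = estimate_cpt(data, "E", ["S", "C"])
# cpt[((1, 1), 1)] estimates P(E=1 | S=1, C=1) from the counts above.
```
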

9

Compactness and Node Ordering

• A compact BN has order k (maximum number of parents per node) < N (number of nodes)
  – Sparsely connected

• The optimal order in which to add nodes is to add the "root causes" first, then the variables they influence, and so on until the "leaves" are reached

• Example of poor ordering (which still represents the same joint distribution)

10

Network Construction Algorithm

1) Choose the set of relevant variables Xi that describe the domain

2) Choose an ordering for the variables

3) While there are variables left:

1) Pick a variable Xi and add a node to the network for it

2) Set Parents(Xi ) to some minimal set of nodes already in the net such that the conditional independence property is satisfied

3) Define the CPT (Conditional Probability Table) for Xi

P(Xi | X1, ..., Xi-1) = P(Xi | Parents(Xi))

11

K2 Algorithm (1)

• Assume nodes are given in a fixed order
• Input: a set of n nodes in order, an upper bound u on the number of parents a node may have, and a database D containing m cases

procedure K2;
    for i := 1 to n do
        π_i := ∅;
        P_old := score(i, π_i);
        OKToProceed := true;
        while OKToProceed and |π_i| < u do
            let z be the node in Pred(x_i) − π_i that maximizes score(i, π_i ∪ {z});
            P_new := score(i, π_i ∪ {z});
            if P_new > P_old then
                P_old := P_new;
                π_i := π_i ∪ {z};
            else OKToProceed := false;
        end {while};
        write('Node: ', x_i, ', Parents of this node: ', π_i);
    end {for};
end {K2};

Pred(node) returns the set of nodes that precede node in the node ordering
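The K2 procedure can be sketched in Python. As the score, this sketch assumes the Cooper–Herskovits marginal likelihood, computed in log space with `lgamma`, and binary variables; the slide only names a generic score function.

```python
from itertools import product
from math import lgamma

def ch_score(data, node, parents, values=(0, 1)):
    """Log Cooper-Herskovits marginal likelihood for one family (node, parents)."""
    r = len(values)
    score = 0.0
    for pc in product(values, repeat=len(parents)):
        rows = [c for c in data if tuple(c[p] for p in parents) == pc]
        n_ij = len(rows)
        score += lgamma(r) - lgamma(n_ij + r)      # log (r-1)! / (N_ij + r - 1)!
        for v in values:
            n_ijk = sum(1 for c in rows if c[node] == v)
            score += lgamma(n_ijk + 1)             # log N_ijk!
    return score

def k2(data, order, u):
    """Return {node: parent list} for the given node ordering, at most u parents each."""
    parents = {}
    for i, node in enumerate(order):
        pa = []
        p_old = ch_score(data, node, pa)
        while len(pa) < u:
            preds = [z for z in order[:i] if z not in pa]   # Pred(node) - pa
            if not preds:
                break
            z = max(preds, key=lambda z: ch_score(data, node, pa + [z]))
            p_new = ch_score(data, node, pa + [z])
            if p_new > p_old:                      # greedy: keep z only if it helps
                p_old, pa = p_new, pa + [z]
            else:
                break
        parents[node] = pa
    return parents

# Toy data where B copies A: K2 should recover A as B's parent.
data = [{"A": 1, "B": 1}] * 4 + [{"A": 0, "B": 0}] * 4
net = k2(data, ["A", "B"], u=1)
```
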

12

K2 Algorithm (2)

• Example: node ordering A B C D E

[Figure: successive networks produced by K2 under the ordering A B C D E; nodes are added one at a time (A; then B; then C; then D; then E), with parent edges chosen greedily from the preceding nodes at each step.]

13

Complex Structure

• Theorem: finding the maximal-scoring network structure with at most k parents for each variable (node) is NP-hard for k > 1
• We solve the problem using heuristic search
  – Traverse the search space looking for high-scoring structures
• Caching: to update the score after a local change, we only need to re-score the families that were changed in the last move (decomposability)

[Figure: local search moves on a four-node network (S, E, C, D): Add C→D, Delete C→E, Reverse C→E.]
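The three moves can be wired into a minimal greedy search. Here `score` is an assumed black-box over edge sets, and the decomposability/caching optimization mentioned above is omitted for brevity.

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """DFS check that the directed edge set (pairs (u, v)) forms a DAG."""
    children = {n: [v for (u, v) in edges if u == n] for n in nodes}
    done, on_path = set(), set()
    def dfs(n):
        if n in on_path:
            return False           # back edge: cycle found
        if n in done:
            return True
        on_path.add(n)
        ok = all(dfs(c) for c in children[n])
        on_path.discard(n)
        done.add(n)
        return ok
    return all(dfs(n) for n in nodes)

def neighbours(nodes, edges):
    """All edge sets one local move away: add, delete, or reverse an edge."""
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                  # delete u -> v
            yield (edges - {(u, v)}) | {(v, u)}     # reverse u -> v
        elif (v, u) not in edges:
            yield edges | {(u, v)}                  # add u -> v

def hill_climb(nodes, edges, score):
    """Greedily move to the best-scoring acyclic neighbour until no move helps."""
    while True:
        candidates = [e for e in neighbours(nodes, edges) if is_acyclic(nodes, e)]
        best = max(candidates, key=score)
        if score(best) <= score(edges):
            return edges
        edges = best

# Toy score that rewards two target edges and penalizes extra edges:
target = {("S", "E"), ("E", "D")}
score = lambda e: len(e & target) - 0.1 * len(e)
learned = hill_climb(["S", "E", "C", "D"], set(), score)
```
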

14

Algorithm B

• A greedy construction heuristic
• Score metric

15

Decision Networks

• A decision network represents information about
  – The agent's current state
  – Its possible actions
  – The state that will result from the agent's action
  – The utility of that state

• Also called influence diagrams

• Types of nodes
  – Chance nodes: represent random variables (same as Bayesian networks)
  – Decision nodes (rectangles): represent points where the decision maker has a choice of actions
  – Utility nodes (diamonds): represent the agent's utility function (also called value nodes)

16

Example: Studying

• Trying to decide whether to study hard for the "Advanced Artificial Intelligence" exam

• Your mark will depend on
  – How hard you study
  – How hard the exam is
  – How well you did in the "Artificial Intelligence" course (which indicates how well prepared you are)

• Expected utility for two possible actions (study hard or not), given that you received an HD (High Distinction) for AI but the exam is hard

17

Example: Studying

18

Research Overview

[Diagram: research topics around Bayesian networks — learning, structure design, inference, manual construction, ontology, fuzzy, hierarchical structure, evolutionary learning, dynamic Bayesian networks.]

19

BN+BN: Behavior Network with Bayesian Network for Intelligent Agent

20

Agenda

• Motivation• Related Works• Backgrounds

– Behavior network– Bayesian network

• BN+BN• Experimental Results• Conclusions and Future Works

21

Motivation

• ASM (action selection mechanism): combines behaviors to generate high-level behaviors
  – Impossible to insert global goals in an explicit or implicit manner

• Behavior network
  – Inserts global goals into the ASM
  – Difficult to insert a human's prior knowledge about goal activation because there are many goals to be activated

• Bayesian network
  – Computational method to represent human knowledge in a graph model with inference capability

• BN+BN
  – Applies a Bayesian network to represent prior knowledge about goal activation in a behavior network

22

Related Works

• Bayesian network structure learning from data
  – Conditional independence test
  – Scoring-based optimization
  – Hybrid of the two approaches

• Agent architectures
  – Reactive control: don't think, react
  – Deliberative control: think hard, then act
    • BDI (belief, desire, intention) model
  – Hybrid control: think and act independently, in parallel
  – Behavior-based control: think the way you act
  – Layered architecture
    • Brian Duffy's "social robot"

23

Behavior-based AI

• Behavior-based AI
  – Controller consists of a collection of "behaviors"

• Coordinating multiple behaviors
  – Deciding what behavior to execute at each point in time

[Figure: behavior-based robotics [Brooks, 1986] — sensors feed parallel behavior layers (locomote, avoid hitting things, explore, manipulate the world, build maps) that drive the actuators.]

24

Layered Architecture

The social robot

[Brian Duffy, 2000]

25

Representation

• Basic characteristic: competition among behaviors

• Behavior
  – Precondition: a set of states that have to be true for the execution of the behavior
  – Add list: a set of states that are likely to become true by the execution of the behavior
  – Delete list: a set of states that are likely to become false by the execution of the behavior
  – Activation level

• External links
  – From goals to behaviors
  – From environmental states to behaviors

• Internal links

26

An Example

[Figure: an example network of behaviors A-H connected by predecessor links and successor links.]

27

Spreading Activation

State → Behavior:
  If (state = true) and (state ∈ precondition of behavior)
  then activation(behavior) += activation(state)

Goal → Behavior:
  If (goal = true) and (goal ∈ add list of behavior)
  then activation(behavior) += activation(goal)

Predecessor link (Behavior1 → Behavior2):
  If predecessor(behavior1, behavior2) = true
  then activation(behavior2) += activation(behavior1)

Successor link (Behavior1 ← Behavior2):
  If successor(behavior1, behavior2) = true
  then activation(behavior1) += activation(behavior2)
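The four spreading rules can be collected into a single pass. The behaviour specs, link lists, and activation values below are a hypothetical example, and the goal-activity condition is simplified to "every listed goal is active".

```python
def spread(activation, states, goals, behaviours, predecessor_links, successor_links):
    """Return updated activation levels after one pass of the four spreading rules."""
    new = dict(activation)
    for b, spec in behaviours.items():
        for s in spec["preconditions"]:                 # state -> behaviour
            if states.get(s):
                new[b] += activation[s]
        for g in goals:                                 # goal -> behaviour
            if g in spec["add_list"]:
                new[b] += activation[g]
    for b1, b2 in predecessor_links:                    # predecessor link
        new[b2] += activation[b1]
    for b1, b2 in successor_links:                      # successor link
        new[b1] += activation[b2]
    return new

# Hypothetical two-behaviour network:
behaviours = {
    "follow_light": {"preconditions": ["light_visible"], "add_list": ["at_light"]},
    "explore": {"preconditions": [], "add_list": []},
}
activation = {"follow_light": 0.0, "explore": 0.2,
              "light_visible": 0.5, "at_light": 1.0}
new = spread(activation, {"light_visible": True}, ["at_light"],
             behaviours, [], [("explore", "follow_light")])
# follow_light gains 0.5 from its true precondition and 1.0 from the goal.
```
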

28

Action Selection

/* Select one action among candidates */
WHILE (1) {
    initialization();           // clear candidate list
    spreading_activation();     // update activation levels
    normalization();            // normalize activation levels of behaviors
    FOR all behaviors {
        IF (all preconditions are true && activation(behavior) > threshold) {
            candidate(behavior);    // register to candidate list
        }
    }
    /* select one candidate behavior with the highest activation */
    IF (candidate() == NULL) {      /* there is no candidate behavior in the list */
        threshold = 0.9 * threshold;    /* decrease threshold */
    } ELSE {
        select();
        break;
    }
}

29

Overview

• Bayes' rule
  – Pr(A|B) = (Pr(B|A) × Pr(A)) / Pr(B)

• How to encode dependencies among variables?
  – Impossible to encode all dependencies
  – Bayesian network: approximation of the real dependencies among variables with a graphical model

• How to construct a Bayesian network?
  – By expert: manual construction
  – By learning from data
    • Structure learning
    • Parameter learning

• Bayesian network = structure (dependencies among variables) + conditional probability table (parameters)

30

An Example

31

Overview

[Figure: BN+BN architecture. A Bayesian network (goal coordination) over variables V1 ... Vr takes environmental states S1 ... Sp as input and outputs weights w1 ... wn for the goals G1 ... Gn of the behavior network, whose behaviors are B1 ... Bk. B: behavior, S: environmental state, G: goal, V: variable of the Bayesian net.]

32

Algorithm

• Spreading activation

  Goal → Behavior:
    If (goal = true) and (goal ∈ add list of behavior)
    then activation(behavior) += activation(goal) × weight(goal)

• Action selection

    ...
    initialization();           // clear candidate list
    Bayesian();                 // infer weights of goals
    spreading_activation();     // update activation levels
    normalization();            // normalize activation levels of behaviors
    ...

33

Environments

[Figure: a four-area arena.]
  Area 1: many small obstacles
  Area 2: one light source
  Area 3: two light sources
  Area 4: long obstacles

• How to know the area without a map? → Bayesian network
• Coordinating the goals of the behavior network with area information
• Goals of the behavior network
  – Minimizing bumping in two different obstacle styles
  – Going to a light source

34

Behavior Network Design

[Figure: sensors (Obstacle Is Near, Nothing Around Robot, Light Level I, Light Level II, No Light Source) feed behaviors (Avoiding Obstacle, Going Straight, Following Light), which connect to goals (Minimizing Bumping A, Minimizing Bumping B, Going to Light Source).]

35

Bayesian Network Learning

[Figure: pipeline from sensor values (Distance 1-3, Light 1-2) through data generation and area evaluation to Bayesian network learning; example rows pair distance and light sensor readings with the area label (e.g., Area 2).]

36

Bayesian Network Learning (2)

[Figure: learned network over sensor nodes Light2, Light4, Light7, Light8, Distance1-Distance6 and the area nodes Area1-Area4.]

37

Simulation Results

[Figure: robot trajectories with the behavior network only vs. BN+BN.]

• Behavior network only: the robot ignores the light source even when it is near; minimizing bumping is well satisfied
• BN+BN: the robot does not ignore the light source, but it bumps many times in Area 1
• Controlling the degree of the Bayesian network's incorporation into the behavior network is needed

38

An Efficient Attribute Ordering Optimization in Bayesian Networks for Prognostic Modeling

of the Metabolic Syndrome

39

Outline

• Motivation
  – Bayesian Networks
  – Why Is Attribute Ordering Optimization in BN Needed?
• Backgrounds
  – Metabolic Syndrome
  – Bayesian Networks in Medical Domain
• Proposed Method
  – Overall Flow
  – Preprocessing & Attribute Selection
  – Attribute Ordering Optimization
  – Structure & Parameter Learning
• Experiments
  – Dataset
  – Parameter & Setting
  – Results & Analyses
• Conclusions & Future Works

40

Bayesian Networks

• Bayesian networks
  – Represented as a directed acyclic graph
    • Nodes: probabilistic variables
    • Arcs: dependencies between variables
  – Powerful technique for handling uncertainty

• BN structure
  – Learned from the learning data
  – Designed by a domain expert

• Several learning algorithms exist for BN structure, e.g. the K2 algorithm

41

Why Is Attribute Ordering in BN Needed?

• The K2 algorithm
  – When the BN is learned, earlier attributes can be the parents of later attributes, but not vice versa
  – The attribute ordering influences the BN structure

• Different BN structures with the same attributes (example)

[Figure: two different structures (1 and 2) learned over the same attributes.]

Attribute ordering matters!

42

Metabolic Syndrome

• It requires the presence of three or more of the following (NCEP-ATP III):
  – Abdominal obesity:
    • waist circumference >= 102 cm in men
    • waist circumference >= 88 cm in women
  – Hypertriglyceridemia:
    • triglycerides >= 150 mg/dL
  – Low HDL cholesterol:
    • HDL cholesterol < 40 mg/dL in men
    • HDL cholesterol < 50 mg/dL in women
  – High blood pressure:
    • blood pressure >= 130/85 mmHg
  – High fasting glucose:
    • fasting glucose >= 110 mg/dL

43

Bayesian Networks in Medical Domain

• Strengths of BNs
  – Allow researchers to use domain knowledge
  – Interpretable and easily understood
  – Superior in capturing interactions among input variables
  – Less influenced by small sample sizes

• BN applications in the medical domain
  – Antal et al. used a BN to construct a diagnostic model of ovarian cancer and to classify its samples (2004)
  – Aronsky & Haug used a BN to diagnose pneumonia (2000)
  – Burnside used a BN to diagnose breast cancer (2000)
  – In addition, BNs are utilized for several other purposes, such as patient caring and tuberculosis modeling

44

Overall Flow

[Diagram: Metabolic Syndrome Data → Pre-processing (attribute selection, attribute grouping) → Attribute Ordering (ordering by attribute group, ordering in groups; repeat while the ordering changes) → BN learning (structure learning, parameter learning; repeat while the structure changes) → Prediction (Metabolic Syndrome vs. Normal).]

45

Preprocessing & Attribute Selection

• Attribute selection
  – 11 informative attributes are selected with medical knowledge
  – The 11 attributes comprise 8 attributes from the definition plus 3 attributes from a reference (Girod et al., 2003)

• Preprocessing
  – BN requires discretized input
  – Medical knowledge helped the discretization process (Mykkanen et al., 2004)

46

Attribute Ordering Optimization

47

Representation & Fitness Evaluation

• Chromosome representation

  Chromosome: G1 G2 G3 ... Gm-1 Gm
  Attribute group: GID | GSize | A1 A2 A3 ... An

• Fitness evaluation: by the prediction rate on the learning data

• Initialization: performed at random

• Selection: rank-based selection
  – p(g,j), the probability that each individual I(g,j) is selected, is

    p(g,j) = (n − Rank(f(I(g,j))) + 1) / (n(n+1)/2)

48

Genetic Operations

• Cycle crossover and displacement mutation operations are used

[Figure: an example of cycle crossover — two parent orderings of G1 ... G7 exchange the positions belonging to one cycle to form two offspring.]

[Figure: an example of displacement mutation — a segment of the ordering is cut out and reinserted at another position.]
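Cycle crossover can be sketched as follows: positions belonging to the first cycle are inherited from one parent and all remaining positions from the other, so each offspring stays a valid permutation of the groups. The four-element example is illustrative.

```python
def cycle_crossover(p1, p2):
    """Cycle crossover for two permutations of the same elements."""
    cycle = set()
    i = 0
    while i not in cycle:          # follow the cycle starting at position 0
        cycle.add(i)
        i = p1.index(p2[i])        # position in p1 holding p2's value at i
    c1 = [p1[k] if k in cycle else p2[k] for k in range(len(p1))]
    c2 = [p2[k] if k in cycle else p1[k] for k in range(len(p1))]
    return c1, c2

# Group orderings as in the slide (labels illustrative):
c1, c2 = cycle_crossover(["G1", "G2", "G3", "G4"], ["G2", "G1", "G4", "G3"])
# The cycle here is positions {0, 1}; the remaining positions swap parents.
```
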

49

Structure & Parameter Learning• Structure Learning: The K2 algorithm

• Parameter learning: Parameters are calculated from learning data

50

Dataset

• Dataset (Shin et al., 1996)
  – Surveys were conducted twice, in 1993 and 1995, in Yonchon County, Korea
  – The dataset contains
    • 1135 subjects with no missing values
    • 18 attributes that could influence the prediction of the metabolic syndrome

• Problem
  – Predict the state in 1995 (Normal vs. Metabolic Syndrome) from the 1993 data (BMI, BP, age, sex, ...)

51

Parameter & Setting

• GA
  – Population size: 20
  – Generation limit: 100
  – Selection rate: 0.8
  – Crossover rate: 1
  – Mutation rate: 0.02

• Models for comparison
  – Neural networks: 11 (input) - 20 (hidden) - 2 (output)
  – k-nearest neighbors: k = 3 (by preliminary experiment)

• Data partition
  – For the evolution process: 3:1:1 (learning data : validation data : test data)
  – For the comparison experiment: 5-fold cross validation

52

Data Analysis by Age

[Figure: rates of normal vs. metabolic syndrome (MS) by age group (26-35, 36-45, 46-55, 56-65, 66-75).]

• The rate of metabolic syndrome increases with age
• The decrease in the last age group could be influenced by deaths from complications of MS

53

Comparison by Attribute Selection

[Figure: prediction rate (roughly 0.65 to 0.74) for 8, 11, and 18 attributes.]

Ordering Optimization Process

[Figure: average and highest fitness (roughly 0.67 to 0.74) over 100 generations.]

• Evolves well
• Converges after the 60th generation

55

Comparison Before and After Optimization

56

Comparison with Other Models
