1
Bayesian Networks
2
Bayes’ Rule & Bayesian Inference
• Bayesian inference is statistical inference in which probabilities are interpreted as degrees of belief
• The name comes from the frequent use of Bayes’ rule
P(H|E) = P(H, E) / P(E) = P(E|H) P(H) / P(E)
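The rule above can be sketched as a one-line function; the numeric prior and likelihoods below are hypothetical, chosen only to illustrate the belief update.

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) via Bayes' rule, expanding P(E) by total probability:
    P(E) = P(E|H) P(H) + P(E|~H) P(~H)."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
    return p_e_given_h * p_h / p_e

# Hypothetical numbers: prior degree of belief 1%, likelihoods 90% and 5%
print(posterior(0.01, 0.90, 0.05))
```

Even with a 90% likelihood of the evidence under H, the posterior stays modest because the prior P(H) is small — the essence of interpreting probabilities as degrees of belief.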
3
Bayesian Networks
• Many variables exist, and the joint probability distribution is important
– How can the joint probability distribution be represented effectively?
• Bayesian networks: graphical representation (acyclic directed graph) of a joint probability distribution
– Node: variable
– Edge: probabilistic dependency
• A node can represent any kind of variable, be it an observed measurement, a parameter, a latent variable, or a hypothesis
• Nodes are not restricted to representing random variables
4
Why Use BNs?
• Explicit management of uncertainty
• Modularity (modular specification of a joint distribution) implies maintainability
• Better, flexible and robust decision making – MEU (Maximization of Expected Utility), VOI (Value of Information)
• Can be used to answer arbitrary queries - multiple fault problems (General purpose “inference” algorithm)
• Easy to incorporate prior knowledge and to understand
5
P(A, S, T, L, B, C, D)
= P(A) P(S) P(T|A) P(L|S) P(B|S) P(C|T,L) P(D|T,L,B)

Conditional independencies → efficient representation

T L B | D=0  D=1
0 0 0 | 0.1  0.9
0 0 1 | 0.7  0.3
0 1 0 | 0.8  0.2
0 1 1 | 0.9  0.1
...
An Example of Bayesian Network (Asia Network) [Lauritzen & Spiegelhalter, 95]

[Figure: the Asia network]
– Visit to Asia (A): P(A)
– Smoking (S): P(S)
– Tuberculosis (T), a kind of disease: P(T|A)
– Lung Cancer (L), a kind of disease: P(L|S)
– Bronchitis (B): P(B|S)
– Chest X-ray (C): P(C|T,L)
– Dyspnoea (D), difficult respiration: P(D|T,L,B)
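The factorization on this slide can be sketched directly in code: the joint probability of a full assignment is the product of one local CPT entry per node. The numeric CPT entries below are illustrative placeholders, not the values from the original Asia network.

```python
from itertools import product

def bern(p, x):  # P(X = x) for a binary variable with P(X = 1) = p
    return p if x == 1 else 1.0 - p

def joint(a, s, t, l, b, c, d):
    """P(A,S,T,L,B,C,D) as the product of the seven local factors.
    All numeric CPT entries here are made-up placeholders."""
    return (bern(0.01, a) *                          # P(A)
            bern(0.50, s) *                          # P(S)
            bern(0.05 if a else 0.01, t) *           # P(T|A)
            bern(0.10 if s else 0.01, l) *           # P(L|S)
            bern(0.60 if s else 0.30, b) *           # P(B|S)
            bern(0.98 if (t or l) else 0.05, c) *    # P(C|T,L)
            bern(0.90 if (t or l or b) else 0.10, d))  # P(D|T,L,B)

# Sanity check: a valid factorization still sums to 1 over all 2^7 states
total = sum(joint(*v) for v in product([0, 1], repeat=7))
```

Only 1 + 1 + 2 + 2 + 2 + 4 + 8 = 20 independent parameters are needed here instead of 2^7 − 1 = 127 for the full joint table — the efficiency the slide refers to.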
6
Bayesian Network Knowledge Engineering
• Objective: Construct a model to perform a defined task
• Participants: Collaboration between domain expert and BN modeling expert
• Process: iterate until “done”
– Define task objective
– Construct model
– Evaluate model
7
The Knowledge Acquisition Task
• Variables:
– Collectively exhaustive, mutually exclusive values
– Clarity test: value should be knowable in principle
• Structure
– If data are available, can be learned
– Otherwise constructed by hand (using “expert” knowledge)
– Variable ordering matters: causal knowledge usually simplifies the structure
• Probabilities
– Can be learned from data
– Sensitivity analysis
8
Learning Bayesian Network
• Why learn a Bayesian network?
– Gaining expert knowledge is expensive, but collecting data is cheap!
– Encoding expert knowledge directly into a BN is difficult, while applying a learning method to data is easy!
– Expert knowledge and data can be combined
– Knowledge discovery
[Figure: prior information and data are fed into a learning method, which produces a structure (e.g. a network over nodes S, E, C, D) and a CPT (parameter) table such as P(E|S,C)]

S C | P(E|S,C)
T T | 0.9  0.1
T F | 0.7  0.3
... | ...
9
Compactness and Node Ordering
• A compact BN has order k (number of edges) < N (number of nodes)
– Sparsely connected
• The optimal order in which to add nodes is to add the “root causes” first, then the variables they influence, and so on until the “leaves” are reached
• Example of a poor ordering (which still represents the same joint distribution)
10
Network Construction Algorithm
1) Choose the set of relevant variables Xi that describe the domain
2) Choose an ordering for the variables
3) While there are variables left:
1) Pick a variable Xi and add a node to the network for it
2) Set Parents(Xi ) to some minimal set of nodes already in the net such that the conditional independence property is satisfied
3) Define the CPT (Conditional Probability Table) for Xi
P(Xi | X1, ..., Xi-1) = P(Xi | Parents(Xi))
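The construction loop above can be sketched as follows. The callback `independent_given` is a hypothetical stand-in for the expert's judgment (or a statistical test) that an earlier node carries no extra information about the new node; testing predecessors pairwise like this is a simplification of choosing "some minimal set".

```python
def build_network(variables, independent_given):
    """Skeleton of the network construction algorithm: process the
    variables in a fixed order and keep, as parents, the already-added
    nodes that the new node still depends on."""
    parents, added = {}, []
    for x in variables:                       # steps 2 and 3.1
        parents[x] = [y for y in added        # step 3.2 (simplified)
                      if not independent_given(x, y, added)]
        added.append(x)
        # step 3.3: here a CPT P(x | parents[x]) would be filled in
    return parents
```

For instance, if every variable depends only on 'A', the sketch yields 'A' as the sole parent of every later node.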
11
K2 Algorithm (1)
• Assumes the nodes are given in a fixed order
• Input: a set of n nodes in order, an upper bound u on the number of parents a node may have, and a database D containing m cases
procedure K2;
for i := 1 to n do
    πi := ∅;
    Pold := score(i, πi);
    OKToProceed := true;
    while OKToProceed and |πi| < u do
        let z be the node in Pred(xi) − πi that maximizes score(i, πi ∪ {z});
        Pnew := score(i, πi ∪ {z});
        if Pnew > Pold then
            Pold := Pnew;
            πi := πi ∪ {z};
        else OKToProceed := false;
    end {while};
    write(‘Node: ’, xi, ‘, Parents of this node: ’, πi);
end {for};
end {K2};

Pred(xi) returns the set of nodes that precede xi in the node ordering.
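The pseudocode above can be sketched in Python. As the score, a log-form of the Cooper–Herskovits metric is assumed (using `math.lgamma` for the factorials); the data format (list of tuples of discrete values) is an assumption of this sketch.

```python
import math

def k2_score(i, parents, data, arity):
    """Log Cooper-Herskovits score for node i with the given parent
    set; data is a list of tuples, arity maps node -> #values."""
    ri = arity[i]
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, [0] * ri)[row[i]] += 1
    score = 0.0
    for cnt in counts.values():
        nij = sum(cnt)
        score += math.lgamma(ri) - math.lgamma(nij + ri)  # (ri-1)!/(Nij+ri-1)!
        for nijk in cnt:
            score += math.lgamma(nijk + 1)                # Nijk!
    return score

def k2(order, u, data, arity):
    """K2 as in the pseudocode: for each node in order, greedily add
    the predecessor that most improves the score, up to u parents."""
    parents = {}
    for idx, xi in enumerate(order):
        pa, p_old, ok = [], k2_score(xi, [], data, arity), True
        pred = order[:idx]                        # Pred(x_i)
        while ok and len(pa) < u:
            cands = [z for z in pred if z not in pa]
            if not cands:
                break
            z = max(cands, key=lambda z: k2_score(xi, pa + [z], data, arity))
            p_new = k2_score(xi, pa + [z], data, arity)
            if p_new > p_old:
                p_old, pa = p_new, pa + [z]
            else:
                ok = False
        parents[xi] = pa
    return parents
```

On data where the second variable simply copies the first, the sketch recovers the edge 0 → 1, since the score with the parent dominates the parent-free score.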
12
K2 Algorithm (2)
• Example
– Node ordering: A, B, C, D, E

[Figure: the network is grown one node at a time in the given order — first A, then B, C, D, and E — with each new node’s parents chosen from the nodes already added]
13
Complex Structure
• Theorem: finding the maximal-scoring network structure with at most k parents for each variable (node) is NP-hard for k > 1
• We solve the problem using heuristic search
– Traverse the search space looking for high-scoring structures
• Caching: to update the score after a local change, we only need to re-score the families that were changed in the last move (decomposability)
[Figure: local search moves on a network over S, E, C, D — Add C→D, Delete C→E, Reverse C→E]
14
Algorithm B
• A greedy construction heuristic based on a score metric
15
Decision Networks
• A decision network represents information about
– The agent’s current state
– Its possible actions
– The state that will result from the agent’s action
– The utility of that state
• Also called influence diagrams
• Types of nodes
– Chance nodes (ovals): represent random variables (same as Bayesian networks)
– Decision nodes (rectangles): represent points where the decision maker has a choice of actions
– Utility nodes (diamonds): represent the agent’s utility function (also called value nodes)
16
Example: Studying
• Trying to decide whether to study hard for the “Advanced Artificial Intelligence” exam
• Your mark will depend on
– How hard you study
– How hard the exam is
– How well you did in the “Artificial Intelligence” course (which indicates how well prepared you are)
• Expected utility for the two possible actions (study hard or not), given that you received an HD (High Distinction) for AI but the exam is hard
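The comparison on this slide can be sketched as an MEU computation. The CPT `P_PASS` and the `UTILITY` table below are hypothetical numbers (the lecture's actual figures are not given here); utility depends on the mark and on whether you studied, to model the effort cost.

```python
P_PASS = {  # P(mark = pass | study, exam_hard), given an HD in AI
    (True, True): 0.80, (True, False): 0.95,
    (False, True): 0.40, (False, False): 0.70,
}
UTILITY = {('pass', True): 80, ('pass', False): 100,
           ('fail', True): -20, ('fail', False): 0}

def expected_utility(study, exam_hard):
    p = P_PASS[(study, exam_hard)]
    return p * UTILITY[('pass', study)] + (1 - p) * UTILITY[('fail', study)]

# Evidence: HD in AI, exam is hard -> compare the two actions (MEU)
eu = {action: expected_utility(action, exam_hard=True)
      for action in (True, False)}
best_action = max(eu, key=eu.get)
```

Under these made-up numbers, studying hard wins despite its effort cost, because it sharply raises the probability of passing a hard exam.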
17
Example: Studying
18
Research Overview
[Diagram: Bayesian Networks branching into Structure Design, Learning, and Inference; subtopics include Manual Construction, Ontology, Fuzzy, Hierarchical Structure, Evolutionary Learning, and Dynamic Bayesian Networks]
19
BN+BN: Behavior Network with Bayesian Network for Intelligent Agent
20
Agenda
• Motivation
• Related Works
• Backgrounds
– Behavior network
– Bayesian network
• BN+BN
• Experimental Results
• Conclusions and Future Works
21
Motivation
• ASM (action selection mechanism): combines behaviors to generate high-level behaviors
– Impossible to insert global goals in an explicit or implicit manner
• Behavior network
– Inserts global goals into the ASM
– Difficult to insert a human’s prior knowledge about goal activation, because many goals may need to be active
• Bayesian network
– A computational method for representing human knowledge in a graphical model with inference capability
• BN+BN
– Applies a Bayesian network to represent prior knowledge about goal activation in a behavior network
22
Related Works
• Bayesian network structure learning from data
– Conditional independence test
– Scoring-based optimization
– Hybrid of the two approaches
• Agent architectures
– Reactive control: don’t think, react
– Deliberative control: think hard, then act
  • BDI (belief, desire, intention) model
– Hybrid control: think and act independently, in parallel
– Behavior-based control: think the way you act
– Layered architecture
  • Brian Duffy’s “social robot”
23
Behavior-based AI
• Behavior-based AI
– The controller consists of a collection of “behaviors”
• Coordinating multiple behaviors
– Deciding which behavior to execute at each point in time

[Figure: behavior-based robotics architecture [Brooks, 1986] — sensors feed parallel behavior layers (locomote, avoid hitting things, explore, manipulate the world, build maps) that drive the actuators]
24
Layered Architecture
The social robot
[Brian Duffy, 2000]
25
Representation
• Basic characteristic: competition of behaviors
• Behavior
– Precondition: a set of states that have to be true for the execution of the behavior
– Add list: a set of states that are likely to become true through the execution of the behavior
– Delete list: a set of states that are likely to become false through the execution of the behavior
– Activation level
• External links
– From goals to behaviors
– From environmental states to behaviors
• Internal links
26
An Example
[Figure: a behavior network over behaviors A–H connected by predecessor and successor links]
27
Spreading Activation
State → Behavior
If (state = true) and (state ∈ precondition of behavior)
then activation(behavior) += activation(state)

Goal → Behavior
If (goal = true) and (goal ∈ add list of behavior)
then activation(behavior) += activation(goal)

Behavior → Behavior (internal links)
If predecessor(behavior1, behavior2) = true
then activation(behavior2) += activation(behavior1)
If successor(behavior1, behavior2) = true
then activation(behavior1) += activation(behavior2)
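One round of these spreading rules can be sketched over a dict-based network. The representations are assumptions of this sketch: `states` maps a state name to `(is_true, activation)`, `goals` maps each true goal to its activation, and `links` is a list of `('pred'|'succ', b1, b2)` tuples.

```python
def spread(behaviors, states, goals, links):
    """Apply the three spreading-activation rules once, accumulating
    all increments before updating any activation level."""
    delta = {b: 0.0 for b in behaviors}
    for b, spec in behaviors.items():
        for s in spec['precondition']:        # state -> behavior
            if s in states and states[s][0]:
                delta[b] += states[s][1]
        for g in spec['add_list']:            # goal -> behavior
            if g in goals:
                delta[b] += goals[g]
    for kind, b1, b2 in links:                # behavior -> behavior
        if kind == 'pred':                    # predecessor link
            delta[b2] += behaviors[b1]['activation']
        else:                                 # successor link
            delta[b1] += behaviors[b2]['activation']
    for b in behaviors:                       # apply all updates at once
        behaviors[b]['activation'] += delta[b]
    return behaviors
```

Accumulating into `delta` first keeps the round synchronous, so every rule reads the activation levels from before the update.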
28
Action Selection
/* Select one action among candidates */
WHILE (1) {
    initialization();          // clear candidate list
    spreading activation();    // update activation levels
    normalization();           // normalize activation levels of behaviors
    FOR all behaviors {
        IF (all preconditions are true && activation(behavior) > threshold) {
            candidate(behavior);   // register to candidate list
        }
    }
    /* select the candidate behavior with the highest activation */
    IF (candidate() == NULL) {     /* there is no candidate behavior in the list */
        threshold = 0.9 * threshold;   /* decrease threshold */
    } ELSE {
        select();
        break;
    }
}
29
Overview
• Bayes’ rule
– Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)
• How to encode dependencies among variables?
– Impossible to encode all dependencies
– Bayesian network: approximates the real dependencies among variables with a graphical model
• How to construct a Bayesian network?
– By an expert: manual construction
– By learning from data
  • Structure learning
  • Parameter learning
• Bayesian network = structure (dependencies among variables) + conditional probability tables (parameters)
30
An Example
31
Overview
[Figure: BN+BN architecture — environmental states S1…Sp feed both the behavior network (behaviors B1…Bk) and a Bayesian network over variables V1…Vr; the Bayesian network performs goal coordination by inferring weights w1…wn for the goals G1…Gn]
32
Algorithm
• Spreading activation: the goal → behavior rule is modified to weight each goal’s contribution by a weight w inferred by the Bayesian network

Goal → Behavior
If (goal = true) and (goal ∈ add list of behavior)
then activation(behavior) += activation(goal) × weight(goal)

• Action selection
…
initialization();          // clear candidate list
Bayesian();                // infer weights of goals
spreading activation();    // update activation levels
normalization();           // normalize activation levels of behaviors
…
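The modified goal rule can be sketched as a small helper. `weights` stands in for the output of the Bayesian-network inference step (`Bayesian()` above); in this sketch, goals without an inferred weight default to 1.0, which recovers the plain behavior-network rule.

```python
def goal_activation(add_list, goals, weights):
    """Total goal -> behavior contribution under BN+BN: each true goal
    in the behavior's add list contributes activation(goal) * weight(goal)."""
    return sum(goals[g] * weights.get(g, 1.0)
               for g in add_list if g in goals)
```

A goal the Bayesian network considers irrelevant in the current area can thus be down-weighted without removing it from the behavior network.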
33
Environments
[Figure: two maps, each divided into Area 1–Area 4]
– Area 1: many small obstacles
– Area 2: one light source
– Area 3: two light sources
– Area 4: long obstacles

• How to know the area without a map? → Bayesian network
• Coordinating the goals of the behavior network with the area information
• Goals of the behavior network
– Minimizing bumping in two different obstacle styles
– Going to the light source
34
Behavior Network Design
[Figure: the behavior network — sensors (Obstacle Is Near, Nothing Around Robot, Light Level I, Light Level II, No Light Source), behaviors (Following Light, Going Straight, Avoiding Obstacle), and goals (Going to Light Source, Minimizing Bumping A, Minimizing Bumping B)]
35
Bayesian Network Learning

[Figure: data-generation loop — sensor values are collected, each record is labeled with an area evaluation, and the labeled data are fed to Bayesian network learning]

Distance 1 | Distance 2 | Distance 3 | Light 1 | Light 2 | Area
500.0      | 400.0      | 1022.0     | 499.0   | 200.0   |
498.0      | 300.0      | 1020.0     | 450.0   | 1220.0  | 2
36
Bayesian Network Learning (2)

[Figure: learned network structure over the area nodes (Area1–Area4) and the sensor nodes (Distance1–Distance6, Light2, Light4, Light7, Light8)]
37
Simulation Results
[Figures: robot trajectories with only the behavior network vs. BN+BN]
• Only behavior network: the light source is ignored even when it is near, but minimizing bumping is well satisfied
• BN+BN: the robot does not ignore the light source, but it bumps many times in Area 1
• Controlling the degree of the Bayesian network’s incorporation into the behavior network is needed
38
An Efficient Attribute Ordering Optimization in Bayesian Networks for Prognostic Modeling
of the Metabolic Syndrome
39
Outline
• Motivation
– Bayesian Networks
– Why Is Attribute Ordering Optimization in BN Needed?
• Backgrounds
– Metabolic Syndrome
– Bayesian Networks in Medical Domain
• Proposed Method
– Overall Flow
– Preprocessing & Attribute Selection
– Attribute Ordering Optimization
– Structure & Parameter Learning
• Experiments
– Dataset
– Parameter & Setting
– Results & Analyses
• Conclusions & Future Works
40
Bayesian Networks
• Bayesian networks
– Represented as a directed acyclic graph
  • Nodes → probabilistic variables
  • Arcs → dependencies between variables
– A powerful technique for handling uncertainty
• BN structure
– Learned from the learning data, or
– Designed by a domain expert
• Several learning algorithms exist for BN structure → the K2 algorithm
41
Why Is Attribute Ordering in BN Needed?
• The K2 algorithm
– When a BN is learned, earlier attributes can be the parents of later attributes, but later ones cannot be parents of earlier ones
– The attribute ordering therefore influences the BN structure
• Different BN structures with the same attributes (example)

[Figure: two networks (1 and 2) learned from the same attributes under different orderings]

Attribute ordering matters!
42
Metabolic Syndrome
• Requires the presence of three or more of the following (NCEP-ATP III):
– Abdominal obesity:
  • Waist circumference >= 102 cm in men
  • Waist circumference >= 88 cm in women
– Hypertriglyceridemia:
  • Triglycerides >= 150 mg/dL
– Low HDL cholesterol:
  • HDL cholesterol < 40 mg/dL in men
  • HDL cholesterol < 50 mg/dL in women
– High blood pressure:
  • Blood pressure >= 130/85 mmHg
– High fasting glucose:
  • Fasting glucose >= 110 mg/dL
43
Bayesian Networks in Medical Domain
• Strengths of BNs
– Allow researchers to use domain knowledge
– Interpretable and easily understood
– Superior in capturing interactions among input variables
– Less influenced by small sample size
• BN applications in the medical domain
– Antal et al. used a BN to construct a diagnostic model of ovarian cancer and to classify its samples (2004)
– Aronsky & Haug used a BN to diagnose pneumonia (2000)
– Burnside used a BN to diagnose breast cancer (2000)
– In addition, BNs are utilized for several purposes such as patient care and tuberculosis modeling
44
Overall Flow
[Flow diagram:
Metabolic Syndrome Data
→ Pre-processing (attribute selection)
→ Attribute Ordering (attribute grouping → ordering by attribute group → ordering in groups; repeat while the ordering changes)
→ BN Learning (structure learning → parameter learning; repeat while the structure changes)
→ Prediction (Metabolic Syndrome vs. Normal)]
45
Preprocessing & Attribute Selection
• Attribute selection
– 11 informative attributes are selected with medical knowledge
– The 11 attributes comprise the 8 attributes from the definition plus 3 attributes from the literature (Girod et al., 2003)
• Preprocessing
– BN requires discretized input
– Medical knowledge guided the discretization process (Mykkanen et al., 2004)
46
Attribute Ordering Optimization
47
Representation & Fitness Evaluation
• Chromosome representation

Chromosome: G1 | G2 | G3 | ... | Gm-1 | Gm
Attribute group Gi: GID | GSize | A1 | A2 | A3 | ... | An

• Fitness evaluation: by the prediction rate on the learning data
• Initialization: performed at random
• Selection: rank-based selection
– p(g,j), the probability that individual I(g,j) is selected, is

    p(g,j) = (n − Rank(f(I(g,j))) + 1) / (n(n+1)/2)
48
Genetic Operations
• Cycle crossover and displacement mutation operations are used

[Figure: an example of the cycle crossover operation on two orderings of G1–G7 — cycles of positions are exchanged between the parents]

[Figure: an example of the displacement mutation operation — a segment is cut out of the ordering and re-inserted at another position]
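Both operators can be sketched for permutations of group labels; the exact figures' orderings are not reproduced here, so the examples are generic.

```python
import random

def cycle_crossover(p1, p2):
    """Cycle crossover: trace cycles of positions between the parents
    and fill alternate cycles from p1 and p2, so every group keeps a
    position it occupied in one of the parents."""
    n = len(p1)
    child = [None] * n
    pos_in_p1 = {v: i for i, v in enumerate(p1)}
    take_from_p1 = True
    for start in range(n):
        if child[start] is not None:
            continue
        i = start
        while child[i] is None:               # follow one cycle
            child[i] = p1[i] if take_from_p1 else p2[i]
            i = pos_in_p1[p2[i]]
        take_from_p1 = not take_from_p1       # alternate parents per cycle
    return child

def displacement_mutation(seq, rng=random):
    """Cut a random segment out of the ordering and re-insert it at a
    random position (the 'insertion' step in the figure)."""
    s = list(seq)
    i, j = sorted(rng.sample(range(len(s)), 2))
    segment = s[i:j]
    del s[i:j]
    k = rng.randrange(len(s) + 1)
    return s[:k] + segment + s[k:]
```

Both operators always return a valid permutation, which is why they suit an ordering chromosome where every group must appear exactly once.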
Proposed Method
49
Structure & Parameter Learning• Structure Learning: The K2 algorithm
• Parameter learning: Parameters are calculated from learning data
50
Dataset
• Dataset (Shin et al., 1996)
– Surveys were conducted twice, in 1993 and 1995, in Yonchon County, Korea
– The dataset contains
  • 1135 subjects with no missing values
  • 18 attributes that could influence the prediction of the metabolic syndrome
• Problem
– From the 1993 data (BMI, BP, age, sex, …), predict the state in 1995 (normal vs. metabolic syndrome)
51
Parameter & Setting
• GA
– Population size: 20
– Generation limit: 100
– Selection rate: 0.8
– Crossover rate: 1
– Mutation rate: 0.02
• Models for comparison
– Neural network: 11 (input) – 20 (hidden) – 2 (output)
– k-nearest neighbors: k = 3 (by preliminary experiment)
• Data partition
– For the evolution process: 3:1:1 (learning data : validation data : test data)
– For the comparison experiment: 5-fold cross validation
52
Data Analysis by Age

[Figure: rate of metabolic syndrome vs. normal by age group (26-35, 36-45, 46-55, 56-65, 66-75); y-axis from 0 to 0.6]
• The rate of metabolic syndrome increases with age
• The decrease in the last age group could be influenced by death from complications of MS
53
Comparison by Attribute Selection

[Figure: prediction rate (0.65 to 0.74) for 8, 11, and 18 attributes]
54
Ordering Optimization Process

[Figure: average and highest fitness (about 0.67 to 0.74) over generations 1 to 100]
• Evolves well
• Converges after the 60th generation
55
Comparison Before and After Optimization
56
Comparison with Other Models