
Rafi Bojmel supervised by Dr. Boaz Lerner Automatic Threshold Selection for conditional independence tests in learning a Bayesian network


Overview

Machine Learning (ML) investigates the mechanisms by which knowledge is acquired through experience.

Hard-core ML based applications:

Web search engines, On-line help services

Document processing (text classification, OCR)

Biological data analysis, Military applications

The Bayesian network (BN) has become one of the most studied machine learning models for knowledge representation, probabilistic inference and, recently, also classification.

BN Example (1): Chest Clinic (Asia) Problem

[Figure: the Asia network. Nodes: Recent visit to Asia (A), Tuberculosis (T), Smoker (S), Lung cancer (L), Positive X-ray (X), Either Tuberculosis or Lung cancer (E), Bronchitis (B), Dyspnea (shortness of breath) (D)]

         A=yes   A=no
P(A)     50%     50%

               D=yes   D=no
P(D | B=yes)   90%     10%
P(D | B=no)    5%      95%

BN Example (2): Chest Clinic (Asia) Problem

$$P(L \mid D,S,X,A,B,E,T) = \frac{P(L,D,S,X,A,B,E,T)}{P(D,S,X,A,B,E,T)} = \frac{P(L,D,S,X,A,B,E,T)}{\sum_{L'} P(L',D,S,X,A,B,E,T)}$$

$$P(L \mid D,S,X,A,B,E,T) = \frac{P(A)\,P(S)\,P(T \mid A)\,P(L \mid S)\,P(B \mid S)\,P(E \mid T,L)\,P(X \mid E)\,P(D \mid E,B)}{\sum_{L'} P(A)\,P(S)\,P(T \mid A)\,P(L' \mid S)\,P(B \mid S)\,P(E \mid T,L')\,P(X \mid E)\,P(D \mid E,B)}$$

$$= \frac{P(L \mid S)\,P(E \mid T,L)}{\sum_{L'} P(L' \mid S)\,P(E \mid T,L')}$$

[Figure: the Asia network with the Markov blanket of Lung cancer highlighted]
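The cancellation in this example can be checked numerically. The following is a minimal sketch, not part of the original slides: apart from P(A), every conditional probability in it is a hypothetical value chosen only for illustration, and the helper names (`bern`, `joint`, `posterior_full`, `posterior_blanket`) are likewise assumptions. It computes the posterior of Lung cancer (L) once from the full joint distribution and once from the two factors that mention L, and confirms that both agree.

```python
from itertools import product

def bern(p_one, value):
    """Return P(V = value) for a binary variable with P(V = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

# Hypothetical P(D = 1 | E, B), keyed by (e, b); illustration only.
P_D_GIVEN_EB = {(1, 1): 0.90, (1, 0): 0.70, (0, 1): 0.80, (0, 0): 0.10}

def p_l_given_s(l, s):
    # Hypothetical P(L | S).
    return bern(0.10 if s else 0.01, l)

def p_e_given_tl(e, t, l):
    # Hypothetical, slightly noisy "either" node P(E | T, L).
    return bern(0.95 if (t or l) else 0.02, e)

def joint(a, s, t, l, b, e, x, d):
    """P(A,S,T,L,B,E,X,D) factorised along the Asia network."""
    return (
        bern(0.5, a)                       # P(A)
        * bern(0.5, s)                     # P(S), hypothetical
        * bern(0.05 if a else 0.01, t)     # P(T | A), hypothetical
        * p_l_given_s(l, s)                # P(L | S)
        * bern(0.60 if s else 0.30, b)     # P(B | S), hypothetical
        * p_e_given_tl(e, t, l)            # P(E | T, L)
        * bern(0.98 if e else 0.05, x)     # P(X | E), hypothetical
        * bern(P_D_GIVEN_EB[(e, b)], d)    # P(D | E, B)
    )

def posterior_full(l, a, s, t, b, e, x, d):
    """P(L | all other variables), computed from the full joint."""
    total = sum(joint(a, s, t, l2, b, e, x, d) for l2 in (0, 1))
    return joint(a, s, t, l, b, e, x, d) / total

def posterior_blanket(l, s, t, e):
    """P(L | S, T, E): only the factors that mention L remain."""
    total = sum(p_l_given_s(l2, s) * p_e_given_tl(e, t, l2) for l2 in (0, 1))
    return p_l_given_s(l, s) * p_e_given_tl(e, t, l) / total

# Both computations agree for every configuration of the other variables.
for a, s, t, b, e, x, d in product((0, 1), repeat=7):
    full = posterior_full(1, a, s, t, b, e, x, d)
    assert abs(full - posterior_blanket(1, s, t, e)) < 1e-12
```

Only S, T and E enter the shortened computation, which is why inference about L needs nothing beyond its Markov blanket.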

Bayesian Networks

Learning Bayesian networks

    Structure learning

        Search-and-score

        Constraint-based

    Parameter learning

Inference (e.g., classification)

Bayesian network structure/graph

BN Structure Learning

Database: Training set → Model construction

          Test set → Bayesian inference (classification)

Two main approaches in the area of BN structure learning:

Search-and-score: uses a heuristic search method over candidate structures.

Constraint-based (CB): analyzes dependency relationships among nodes, using conditional independence (CI) tests. The PC algorithm is a constraint-based algorithm.

Example database (one row per case):

#    Asia  Smoker  Tuberculosis  Lung cancer  Bronchitis  Either  X-ray  Dyspnea
1    0     1       0             0            1           0       0      1
2    0     0       0             0            1           0       0      1
3    1     1       0             1            1           1       0      1
4    0     1       1             1            0           1       1      0
5    1     0       1             0            0           1       1      0
6    0     0       0             0            0           0       0      1
…

PC algorithm (1)

Inputs:

V: set of variables (and corresponding database)

I*(Xi,Xj|{S}) < ε: a test of conditional independence; Xi and Xj are judged independent given S when I* falls below the threshold

ε: threshold

Order{V}: ordering of V

Output:

Directed acyclic graph (DAG)

$$I^{*}(X_i, X_j \mid \mathbf{S}) = \sum_{x_i \in X_i,\; x_j \in X_j,\; s \in \mathbf{S}} P(x_i, x_j, s)\,\log \frac{P(x_i, x_j \mid s)}{P(x_i \mid s)\,P(x_j \mid s)}$$

Xi,Xj = any two nodes in the graph

I*(Xi,Xj|{S}) = Normalized Conditional Mutual Information

{S} = subset of variables (other than Xi,Xj)
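As a rough illustration of how such a CI test can be evaluated on a database, the sketch below estimates conditional mutual information from relative frequencies and compares it with a threshold. It is not taken from the slides: the function names and the record layout (a list of tuples) are assumptions, and the normalization that turns the estimate into I* is not specified here, so the plain (unnormalized) value in nats is used.

```python
import math
from collections import Counter

def conditional_mutual_information(rows, i, j, cond=()):
    """Estimate I(X_i, X_j | S) in nats from discrete records.

    rows: list of tuples, one per case
    i, j: column indices of the two variables
    cond: tuple of column indices forming the condition set S
    """
    n = len(rows)
    n_ijs, n_is, n_js, n_s = Counter(), Counter(), Counter(), Counter()
    for row in rows:
        s = tuple(row[k] for k in cond)
        n_ijs[(row[i], row[j], s)] += 1
        n_is[(row[i], s)] += 1
        n_js[(row[j], s)] += 1
        n_s[s] += 1
    cmi = 0.0
    for (xi, xj, s), c in n_ijs.items():
        # P(xi,xj|s) / (P(xi|s) P(xj|s)) expressed with raw counts.
        ratio = c * n_s[s] / (n_is[(xi, s)] * n_js[(xj, s)])
        cmi += (c / n) * math.log(ratio)
    return cmi

def is_independent(rows, i, j, cond, epsilon):
    """CI test: declare independence when the estimate is below epsilon."""
    return conditional_mutual_information(rows, i, j, cond) < epsilon

# Toy data: columns 0 and 1 are both copies of column 2.
toy = [(0, 0, 0), (1, 1, 1)] * 5
print(is_independent(toy, 0, 1, (), 0.01))    # dependent marginally
print(is_independent(toy, 0, 1, (2,), 0.01))  # independent given column 2
```

With an empty condition set the function reduces to ordinary mutual information (order 0); each extra index in `cond` raises the order of the test by one.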

PC algorithm (2)

The algorithm contains three stages:

Stage I: Start from the complete graph and find an undirected graph using conditional independence tests

Stage II: Find some head-to-head links (V-structures): X – Y – Z becomes X → Y ← Z

Stage III: Orient all those links that can be oriented
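The first two stages can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the CI test is passed in as a caller-supplied function `independent(x, y, s)` (a hypothetical interface), variables are plain strings, and Stage III (propagating the remaining orientations) is omitted.

```python
from itertools import combinations

def pc_skeleton(variables, independent):
    """Stage I: prune the complete graph with CI tests of growing order.

    independent(x, y, s) must return True when x and y are judged
    conditionally independent given the tuple of variables s.
    """
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    order = 0
    # Continue while some node has enough neighbours for a condition set.
    while any(len(adj[x]) - 1 >= order for x in variables):
        for x in variables:
            for y in sorted(adj[x]):
                for s in combinations(sorted(adj[x] - {y}), order):
                    if independent(x, y, s):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        order += 1
    return adj, sepset

def orient_v_structures(adj, sepset):
    """Stage II: orient x - y - z as x -> y <- z when x and z are not
    adjacent and y is not in the set that separated them."""
    arrows = set()
    for y in adj:
        for x, z in combinations(sorted(adj[y]), 2):
            separator = sepset.get(frozenset((x, z)), set())
            if z not in adj[x] and y not in separator:
                arrows.add((x, y))
                arrows.add((z, y))
    return arrows

# Oracle for the collider X -> Y <- Z: X and Z are independent
# unless Y is part of the condition set.
def collider_oracle(x, y, s):
    return {x, y} == {"X", "Z"} and "Y" not in s

skeleton, separators = pc_skeleton(["X", "Y", "Z"], collider_oracle)
print(orient_v_structures(skeleton, separators))
```

In practice the oracle would be replaced by a thresholded test such as I*(Xi,Xj|{S}) < ε, which is where the choice of ε affects the learned graph.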

PC Algorithm Simulation

[Figure: simulation of the PC algorithm on the Asia network. Stage I: CI tests such as I(A,S), I(S,D | B) and I(A,D | T,E), run until END. Stage II: V-structures are identified. Stage III: the precise structure is obtained.]

Threshold Selection – existing methods

Arbitrary (trial-and-error) selection

Disadvantages: haphazardness, inaccuracy, time

Likelihood or classifier-accuracy based selection

Disadvantages: exponential run-time

The “risk” in selecting the wrong threshold:

Too small → too many edges → causality, run-time

Too large → lose important edges → inaccuracy

Threshold selection - Novel Technique (1)

Based on the probability density functions (PDFs) of mutual information:

Calculate the MI values, I*(Xi,Xj | {S}), for different sizes (orders) of the condition set, S.

Create histograms (a PDF estimation technique).

Techniques to define the best threshold automatically:

Zero-Crossing-Decision (ZCD)

Best-Candidate (BC)
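The histogram step can be sketched as follows. The ZCD and BC rules themselves are not defined in this material, so the threshold rule shown here (`first_gap_threshold`, a hypothetical name) is only a stand-in: it returns the left edge of the first empty bin to the right of the tallest bin, on the assumption that the MI values of independent pairs cluster near zero. The bin count and value range are also assumptions.

```python
def histogram(values, n_bins=20, lo=0.0, hi=1.0):
    """Count values into n_bins equal-width bins covering [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Values equal to hi are placed in the last bin.
        k = min(int((v - lo) / width), n_bins - 1)
        counts[k] += 1
    return counts, width

def first_gap_threshold(values, n_bins=20, lo=0.0, hi=1.0):
    """Stand-in rule (not ZCD or BC): left edge of the first empty bin
    after the tallest bin, or None if there is no such bin."""
    counts, width = histogram(values, n_bins, lo, hi)
    peak = counts.index(max(counts))
    for k in range(peak + 1, n_bins):
        if counts[k] == 0:
            return lo + k * width
    return None

# Hypothetical MI values: a cluster near zero and a few large values.
mi_values = [0.01, 0.02, 0.03, 0.04, 0.06, 0.62, 0.72, 0.81]
print(first_gap_threshold(mi_values))
```

One histogram would be built per order of the condition set, so that each order can receive its own threshold.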

[Figure: Ideal and bimodal PDF of Mutual Information. x-axis: Mutual Information I(Xi,Xj) = mi, from 0 to 1; y-axis: f_MI(mi); curves: ideal MI PDF, bimodal MI PDF]

Threshold selection - Novel Technique (2)

Zero-Crossing-Decision (ZCD)

[Figure: Histogram of CMI values - Illustration. x-axis: CMI values, from 0 to 0.3; y-axis: CMI counter, from 0 to 300; series: Order 0 (Mutual Information), Order 1 (Conditional Mutual Information, |S|=1); markers: ZCD (order=0), ZCD (order=1)]

Experiment and Results

Classification experiments have been performed with 8 real-world databases (UCI Repository)

Database sizes: 128 - 3,200 cases.

Graph sizes: 5 - 17 nodes.

Dimension of class variable: 2 - 10.

[Figure: Results - Classification Performance. y-axis: classification accuracy (%), from 0 to 100; databases: Australian, Car, Cmc, Corral, Crx, Flare, Iris, Mofn-3-7-10; legend: OTHER CB, PC (Manual), PC (ZCD), PC (BC), PC (AVG), ZCD (AVG)]

Summary

The PC algorithm requires selecting a threshold for structure learning, which is a time-consuming process that also undermines automatic structure learning.

Initial examination of our novel techniques suggests that they have the potential both to automate the process and to improve performance.

Further research is under way to validate and improve the proposed techniques.