Finite mixture model of Bounded Semi-Naïve Bayesian Network Classifiers

Page 1:

Finite mixture model of Bounded Semi-Naïve Bayesian Network Classifiers

Kaizhu Huang, Irwin King, Michael R. Lyu

Multimedia Information Processing Laboratory

The Chinese University of Hong Kong, Shatin, NT, Hong Kong

{kzhuang, king, lyu}@cse.cuhk.edu.hk

ICANN & ICONIP 2003, June 2003, Istanbul, Turkey

Page 2:

Outline

• Abstract
• Background: Classifiers
  – Naïve Bayesian Classifiers
  – Semi-Naïve Bayesian Classifiers
  – Chow-Liu Tree
• Bounded Semi-Naïve Bayesian Classifiers
• Mixture of Bounded Semi-Naïve Bayesian Classifiers
• Experimental Results
• Discussion
• Conclusion

Page 3:

Abstract

• We propose a technique for constructing semi-naïve Bayesian classifiers in which the number of variables that can be combined into one node is bounded.
• It has a lower computational cost than traditional semi-naïve Bayesian networks.
• Experiments show the proposed technique is also more accurate.
• We then upgrade the semi-naïve structure into a mixture structure, which increases its expressive power.
• Experiments show the mixture approach outperforms other types of classifiers.

Page 4:

A Typical Classification Problem

Given a set of symptoms, one wants to find out whether these symptoms give rise to a particular disease.

Page 5:

Background

Probabilistic Classifiers

• The classification mapping function is defined as
  $$c(\mathbf{x}) = \arg\max_{c_l} P(c_l \mid \mathbf{x}) = \arg\max_{c_l} \frac{P(\mathbf{x}, c_l)}{P(\mathbf{x})},$$
  where $P(c_l \mid \mathbf{x})$ is the posterior probability, $P(\mathbf{x}, c_l)$ is the joint probability, and $P(\mathbf{x})$ is a constant for a given $\mathbf{x}$ w.r.t. $c_l$.
• The joint probability is not easily estimated from the dataset; usually, an assumption about the distribution has to be made, e.g., dependent or independent?

Page 6:

Related Work

Naïve Bayesian Classifiers (NB)

• Assumption: given the class label C, the attributes are independent:
  $$P(x_1, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C). \qquad (1)$$
• Classification mapping function:
  $$c(\mathbf{x}) = \arg\max_{c_l} P(c_l) \prod_{i=1}^{n} P(x_i \mid c_l).$$

Page 7:

Related Work

Naïve Bayesian Classifiers (NB)

• NB's performance is comparable with some state-of-the-art classifiers, even though its independency assumption does not hold in most real cases.
• Question: can the performance be better when the conditional independency assumption of NB is relaxed?

Page 8:

Related Work

Semi-Naïve Bayesian Classifiers (SNB)

• A looser assumption than NB: independency holds among the jointed variables, given the class label C.
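As a concrete illustration (an example of mine; the slide's own formula was an image that did not survive extraction): grouping four attributes into the jointed variables $(x_1, x_3)$ and $(x_2, x_4)$, the SNB assumption reads

$$P(x_1, x_2, x_3, x_4 \mid C) = P(x_1, x_3 \mid C)\, P(x_2, x_4 \mid C).$$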

Page 9:

Related Work

Chow-Liu Tree (CLT)

• Another looser assumption than NB: a dependence tree exists among the variables, given the class variable C.

[Figure: a tree dependence structure]
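The tree factorization itself did not survive extraction; for reference, the standard Chow-Liu form (with $\mathrm{pa}(i)$ denoting the parent of $x_i$ in the tree, notation mine) is

$$P(x_1, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid x_{\mathrm{pa}(i)}, C).$$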

Page 10:

Summary of Related Work

CLT:
• A conditional tree dependency assumption among variables.
• Chow & Liu (1968) developed a globally optimal, polynomial-time algorithm.

SNB:
• A conditional independency assumption among jointed variables.
• Traditional SNBs are not as well developed as CLT.

Page 11:

Problems of Traditional SNBs

                      Kononenko91                          Pazzani96
  Efficient?          No: inefficient even in              No: exponential time cost
                      jointing 3 variables
  Accurate?           Local heuristic                      Local heuristic
  Strong assumption?  Yes: semi-dependence does not hold in real cases as well

Page 12:

Our Solution

Bounded Semi-Naïve Bayesian Network (B-SNB)
• Accurate? We use a global combinatorial optimization method.
• Efficient? We find the network based on Linear Programming, which can be solved in polynomial time.

Mixture of B-SNB (MBSNB)
• Strong assumption? The mixture structure is a superclass of B-SNB.

Page 13:

Our Solution

[Diagram, caption: "Improved significantly"]

Page 14:

Bounded Semi-Naïve Bayesian Network Model Definition

• Jointed variables
• Conditional independency
• Completely covering the variable set without overlapping
• Bounded

(The defining conditions are stated formally below.)
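In symbols (my notation, assuming $\mathbf{y}_k$ for the jointed variables and $K$ for the bound; the slide's own equations were lost): a K-bounded SNB factors the class-conditional distribution over disjoint jointed variables that cover the variable set, each of cardinality at most $K$:

$$P(x_1, \dots, x_n \mid C) = \prod_{k=1}^{m} P(\mathbf{y}_k \mid C), \qquad \bigcup_{k} \mathbf{y}_k = \{x_1, \dots, x_n\}, \quad \mathbf{y}_i \cap \mathbf{y}_j = \varnothing \ (i \ne j), \quad |\mathbf{y}_k| \le K.$$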

Page 15:

Constraining the Search Space

• Large search space. Reduced by adding the following constraint: the cardinality of each jointed variable is exactly equal to K.
• Hidden principle: when K is small, one K-cardinality jointed variable is more accurate than separating its variables into several smaller jointed variables. Example: P(a,b)P(c,d) is closer to P(a,b,c,d) than P(a,b)P(c)P(d) is (see the sketch after this slide).
• Search space after reduction:
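A quick numeric check of the hidden principle (a sketch of mine; the deck only gives the verbal example). For any joint distribution, the KL divergence to $P(a,b)P(c,d)$ never exceeds the divergence to $P(a,b)P(c)P(d)$, since $H(c,d) \le H(c) + H(d)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random strictly positive joint distribution over four binary
# variables (a, b, c, d), with the a-b and c-d pairs made dependent.
p = rng.random((2, 2, 2, 2))
p[0, 0] *= 3; p[1, 1] *= 3            # correlate a and b
p[..., 0, 0] *= 3; p[..., 1, 1] *= 3  # correlate c and d
p /= p.sum()

def kl(p, q):
    """KL divergence D(p || q) for strictly positive arrays."""
    return float(np.sum(p * np.log(p / q)))

# Marginals used by the two candidate factorizations.
p_ab = p.sum(axis=(2, 3))
p_cd = p.sum(axis=(0, 1))
p_c  = p.sum(axis=(0, 1, 3))
p_d  = p.sum(axis=(0, 1, 2))

q_pair   = p_ab[:, :, None, None] * p_cd[None, None, :, :]    # P(a,b)P(c,d)
q_single = (p_ab[:, :, None, None] * p_c[None, None, :, None]
            * p_d[None, None, None, :])                       # P(a,b)P(c)P(d)

print("D(P || P(a,b)P(c,d))   =", kl(p, q_pair))    # never larger
print("D(P || P(a,b)P(c)P(d)) =", kl(p, q_single))
```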

Page 16:

Searching the K-Bounded-SNB Model

• How to search for the appropriate model? Find the m = [n/K] K-cardinality subsets (jointed variables) from the variable (feature) set that satisfy the SNB conditions and maximize the log likelihood.
• [x] means rounding x to the nearest integer.
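For intuition (my derivation, using standard maximum-likelihood estimation; the slide's own formula was lost): with each $P(\mathbf{y}_k)$ fit by empirical frequencies $\hat P$, the per-sample log likelihood of a partition $\{\mathbf{y}_k\}$ equals the negative sum of the empirical entropies of the jointed variables, so the search amounts to entropy minimization (consistent with the minimum-entropy selection mentioned in the Discussion slide):

$$\frac{1}{N}\log L = \sum_{k=1}^{m} \sum_{\mathbf{y}_k} \hat P(\mathbf{y}_k) \log \hat P(\mathbf{y}_k) = -\sum_{k=1}^{m} H(\mathbf{y}_k), \qquad \max_{\{\mathbf{y}_k\}} \log L \iff \min_{\{\mathbf{y}_k\}} \sum_{k=1}^{m} H(\mathbf{y}_k).$$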

Page 17:

Global Optimization Procedure

• Constraints: no overlapping among the jointed variables; all the jointed variables together form the variable set.
• Relax the previous constraints into 0 ≤ x ≤ 1: the integer programming (IP) problem is changed into a linear programming (LP) problem.
• Rounding scheme: round the LP solution back into an IP solution (see the sketch after this slide).
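A minimal sketch of this procedure (my own code, not the paper's implementation; the exact rounding scheme may differ). It enumerates the K-subsets, scores each by empirical entropy, solves the relaxed LP under exact-cover constraints, then greedily rounds the fractional solution:

```python
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

def subset_entropy(data, cols):
    """Empirical entropy of the jointed variable formed by `cols`."""
    _, counts = np.unique(data[:, cols], axis=0, return_counts=True)
    freq = counts / counts.sum()
    return -float(np.sum(freq * np.log(freq)))

def bsnb_partition(data, K):
    """Partition n discrete features into n/K jointed variables of
    cardinality K, minimizing summed entropy (i.e. maximizing the
    B-SNB log likelihood) via LP relaxation plus greedy rounding."""
    n = data.shape[1]
    assert n % K == 0, "remainder case handled separately (see Discussion)"
    subsets = list(combinations(range(n), K))
    cost = np.array([subset_entropy(data, s) for s in subsets])

    # Exact-cover constraints: each feature lies in exactly one chosen subset.
    A_eq = np.zeros((n, len(subsets)))
    for j, s in enumerate(subsets):
        A_eq[list(s), j] = 1.0

    # Relax x_j in {0,1} to 0 <= x_j <= 1 and solve the LP.
    res = linprog(cost, A_eq=A_eq, b_eq=np.ones(n), bounds=(0, 1))

    # Greedy rounding: take subsets by descending LP weight, skipping overlaps.
    chosen, covered = [], set()
    for j in np.argsort(-res.x):
        if covered.isdisjoint(subsets[j]):
            chosen.append(subsets[j])
            covered.update(subsets[j])
    return chosen

# Example: 6 binary features where columns 0-1, 2-3, 4-5 are duplicates,
# so the minimum-entropy partition should recover exactly those pairs.
X = np.random.default_rng(1).integers(0, 2, size=(500, 3))
X = np.repeat(X, 2, axis=1)
print(bsnb_partition(X, K=2))   # expect the pairs (0,1), (2,3), (4,5)
```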

Page 18:

Mixture Upgrading (using EM)

• E step
• M step: update each component S_k by the B-SNB method (a standard form of the updates is sketched below).
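The slide's update equations did not survive extraction; a standard mixture EM consistent with its E/M labels looks as follows ($\lambda_k$ for the mixing weights and $P_k$ for the $k$-th B-SNB component are my notation):

$$Q(\mathbf{x}) = \sum_{k=1}^{m} \lambda_k P_k(\mathbf{x}), \qquad \text{E step: } \gamma_{ik} = \frac{\lambda_k P_k(\mathbf{x}_i)}{\sum_{j=1}^{m} \lambda_j P_j(\mathbf{x}_i)},$$

$$\text{M step: } \lambda_k = \frac{1}{N}\sum_{i=1}^{N} \gamma_{ik}, \qquad S_k \leftarrow \text{B-SNB fit on the data weighted by } \gamma_{ik}.$$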

Page 19:

Experimental Setup

• Datasets: 6 benchmark datasets from the UCI machine learning repository, plus 1 synthetically generated dataset named “XOR”.
• Experimental environment: Platform: Windows 2000; Developing tool: Matlab 6.1.

Page 20:

Experimental Results

Overall Prediction Rate (%)

• We set the bound parameter K to 2 and 3.
• 2-BSNB means the B-SNB model with the bound parameter set to 2.

Page 21:

NB vs MBSNB

Page 22:

BSNB vs MBSNB

Page 23:

CLT vs MBSNB

Page 24:

C4.5 vs MBSNB

Page 25:

Average Error Rate Chart

Page 26:

Observations

• Large-K B-SNBs are not good for sparse datasets. Post dataset: 90 samples; with K = 3, the accuracy decreases.
• Which value of K is good depends on the properties of the dataset. For example, Tic-Tac-Toe and Vehicle have a 3-variable bias; with K = 3, the accuracy increases.

Page 27:

Discussion

• When n cannot be divided by K exactly, i.e., (n mod K) = l with l ≠ 0, the assumption that all the jointed variables have the same cardinality K is violated. Solution:
  – Find an l-cardinality jointed variable with the minimum entropy.
  – Do the optimization on the other n − l variables, since ((n − l) mod K) will be 0.
• How to choose K?
  – When the sample number of the dataset is small, a large K may not give good performance.
  – A good K should be related to the nature of the dataset.
  – A natural way is to use cross-validation to find the optimal K (see the sketch after this slide).
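A minimal sketch of that cross-validation loop (my code; `fit` and `predict` are placeholders for any B-SNB training and prediction routines, passed in as callables since the original Matlab implementation is not shown here):

```python
import numpy as np

def choose_K(X, y, fit, predict, candidates=(1, 2, 3), n_folds=5, seed=0):
    """Pick the bound K with the best n-fold cross-validated accuracy.
    `fit(X, y, K)` returns a model; `predict(model, X)` returns labels."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    scores = {}
    for K in candidates:
        accs = []
        for f in range(n_folds):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            model = fit(X[train], y[train], K)
            accs.append(np.mean(predict(model, X[test]) == y[test]))
        scores[K] = float(np.mean(accs))
    return max(scores, key=scores.get)
```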

Page 28:

Conclusion

• A novel Bounded Semi-Naïve Bayesian classifier is proposed.
• A direct combinatorial optimization method enables B-SNB to achieve global optimization.
• The transformation from an IP problem into an LP problem reduces the computational complexity to polynomial time.
• A Mixture of B-SNB is developed, which expands the expressive power of B-SNB.
• Experimental results show the mixture approach outperforms other types of classifiers.

Page 29:

Thank you!