
1

Boosted Augmented Naive Bayes: Efficient discriminative learning of Bayesian network classifiers

Yushi Jing, GVU, College of Computing, Georgia Institute of Technology

Vladimir Pavlović, Department of Computer Science, Rutgers University

James M. Rehg, GVU, College of Computing, Georgia Institute of Technology

2

Contributions

1. Boosting approach to Bayesian network classification

o Additive combination of simple models (e.g. Naïve Bayes)

o Weighted maximum likelihood learning

o Generalizes Boosted Naïve Bayes (Elkan 1997)

o Comprehensive experimental evaluation of BNB

2. Boosted Augmented Naïve Bayes (BAN)

o Efficient training algorithm

o Competitive classification accuracy

o Compared against Naïve Bayes, TAN, BNC (2004), ELR (2001)

3

Bayesian networks

o Modular and intuitive graphical representation

o Explicit probabilistic representation

Bayesian network classifiers: the joint distribution yields a conditional distribution over the class label, which is used for classification.

How can we efficiently train a Bayesian network discriminatively to improve its classification accuracy?

4

Parameter Learning


o Maximum likelihood (ML) parameter learning: an efficient algorithm that maximizes the LL_G score

o There is no analytic solution for the parameters that maximize CLL_G

$$LL_G = \underbrace{\sum_{i=1}^{M} \log P(y^i \mid x^i)}_{CLL_G} \;+\; \sum_{i=1}^{M} \log P(x^i)$$
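To make the LL / CLL split concrete, here is a minimal sketch (ours, not the authors' code), assuming binary features and a discrete Naïve Bayes model: the closed-form frequency-count parameters maximize LL_G, while the CLL_G term has no such closed-form maximizer. All helper names are illustrative.

```python
# Minimal sketch (not from the paper): the LL / CLL decomposition for a
# discrete Naive Bayes model with ML (frequency-count) parameters.
import numpy as np

def fit_nb_ml(X, y, alpha=1e-6):
    """Closed-form ML estimates for binary features (tiny clip avoids log(0))."""
    classes = np.unique(y)
    prior = np.array([(y == c).mean() for c in classes])
    theta = np.array([X[y == c].mean(axis=0) for c in classes])  # P(x_j=1 | y=c)
    return classes, prior, np.clip(theta, alpha, 1 - alpha)

def log_joint(X, classes, prior, theta):
    """log P(x, y=c) for every sample and every class c; shape (M, num_classes)."""
    return np.log(prior) + X @ np.log(theta.T) + (1 - X) @ np.log(1 - theta.T)

def ll_and_cll(X, y, classes, prior, theta):
    lj = log_joint(X, classes, prior, theta)
    log_px = np.logaddexp.reduce(lj, axis=1)           # log P(x^i)
    idx = np.searchsorted(classes, y)
    cll = np.sum(lj[np.arange(len(y)), idx] - log_px)  # sum_i log P(y^i | x^i)
    ll = cll + np.sum(log_px)                          # LL = CLL + sum_i log P(x^i)
    return ll, cll

# The frequency counts above maximize LL; maximizing the CLL term alone
# has no analytic solution.
```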

5

Model selection

o For a given (sparse) structure A, ML parameter learning does not optimize CLL_A; the ELR algorithm optimizes CLL_A directly (Greiner and Zhou, 2002)

o ML does optimize CLL_B when the structure B is optimal; the BNC algorithm searches for such a structure (Grossman and Domingos, 2004)

o Both approaches achieve excellent classification accuracy, but are computationally expensive in training

o Our alternative (C): an ensemble of sparse models in place of B, using ML to train each sparse model

6

Talk outline

o Minimization function for Boosted Bayesian networks

o Empirical evaluation of Boosted Naïve Bayes

o Boosted Augmented Naïve Bayes (BAN)

o Empirical evaluation of BAN

Our Goal:

o Combine parameter and structure optimization

o Avoid over-fitting

o Retain training efficiency

7

Exponential Loss Function (ELF)

A boosted Bayesian network classifier minimizes the ELF.

The ensemble defines the conditional class probability

$$P_F(y \mid x) = \frac{1}{1 + \exp\{-2\, y\, F(x)\}}, \qquad \text{where } F(x) = \sum_{k=1}^{K} f_k(x).$$

The exponential loss of the ensemble is

$$ELF_F = \sum_{i=1}^{M} \exp\{-y^i F(x^i)\} = \sum_{i=1}^{M} \exp\left\{ \frac{1}{2} \log \frac{1 - P_F(y^i \mid x^i)}{P_F(y^i \mid x^i)} \right\} = \sum_{i=1}^{M} \sqrt{\frac{1 - P_F(y^i \mid x^i)}{P_F(y^i \mid x^i)}}.$$

$ELF_F$ is an upper bound of $-CLL_F$.
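A small numeric check (ours, not from the paper) of the identities above, under the stated link between P_F and F; the variable names and sampled values are arbitrary.

```python
# Minimal numeric sketch: exp(-y F(x)) equals sqrt((1-P_F)/P_F) and
# upper-bounds -log P_F(y|x) when P_F(y|x) = 1 / (1 + exp(-2 y F(x))).
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=1000) * 3.0          # ensemble scores F(x^i)
y = rng.choice([-1.0, 1.0], size=1000)   # labels in {-1, +1}

P = 1.0 / (1.0 + np.exp(-2.0 * y * F))   # P_F(y^i | x^i)
elf_terms = np.exp(-y * F)               # per-sample exponential loss
alt_form = np.sqrt((1.0 - P) / P)        # exp{(1/2) log[(1-P)/P]}
neg_cll_terms = -np.log(P)               # per-sample -log P_F(y^i | x^i)

assert np.allclose(elf_terms, alt_form)
assert np.all(elf_terms >= neg_cll_terms - 1e-12)
print("ELF upper-bounds -CLL:", elf_terms.sum(), ">=", neg_cll_terms.sum())
```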

8

Minimizing ELF via an ensemble method

o AdaBoost (population version) constructs F(x) additively to approximately minimize ELF_F

o Discriminatively updates the data weights

o Tractable ML learning is used to train the parameters of each base model

[Diagram: base hypotheses f_1(x), f_2(x), f_3(x), ... combined into F(x) = sum_{k=1}^{K} f_k(x)]
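To illustrate the loop just described, here is a minimal sketch (our own, not the authors' implementation) of Boosted Naïve Bayes: AdaBoost with a weighted-ML Naïve Bayes base learner, assuming binary features and labels in {-1, +1}. All function names are ours.

```python
# Minimal sketch: discrete AdaBoost with a weighted-ML Naive Bayes base
# learner, i.e. Boosted Naive Bayes (BNB).
import numpy as np

def fit_weighted_nb(X, y, w, eps=1e-6):
    """Weighted ML estimates of a two-class Naive Bayes model."""
    params = {}
    for c in (-1, 1):
        wc = w[y == c]
        params[c] = {
            "log_prior": np.log(max(wc.sum(), eps)),
            "theta": np.clip(np.average(X[y == c], axis=0, weights=wc), eps, 1 - eps),
        }
    return params

def nb_predict(params, X):
    """Hard predictions in {-1, +1} from the (weighted) Naive Bayes model."""
    scores = []
    for c in (-1, 1):
        th = params[c]["theta"]
        scores.append(params[c]["log_prior"]
                      + X @ np.log(th) + (1 - X) @ np.log(1 - th))
    return np.where(scores[1] >= scores[0], 1, -1)

def boosted_naive_bayes(X, y, T=10):
    """AdaBoost: reweight the data, fit weighted-ML NB, combine additively."""
    M = len(y)
    w = np.full(M, 1.0 / M)
    ensemble = []                       # list of (alpha_k, params_k)
    for _ in range(T):
        params = fit_weighted_nb(X, y, w)
        pred = nb_predict(params, X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err == 0 or err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, params))
        w *= np.exp(-alpha * y * pred)  # discriminative weight update
        w /= w.sum()
    return ensemble

def ensemble_decision(ensemble, X):
    """F(x) = sum_k alpha_k f_k(x); its sign gives the predicted label."""
    F = sum(a * nb_predict(p, X) for a, p in ensemble)
    return np.sign(F)
```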

9

Results: 25 UCI datasets (BNB)

BNB vs. NB: average error 0.151 vs. 0.173 (BNB better on 10 datasets, NB better on 2, 13 ties)

10

Results: 25 UCI datasets (BNB)

BNB vs. NB: 0.151 vs. 0.173 (BNB better on 10, NB better on 2, 13 ties)

BNB vs. TAN: 0.151 vs. 0.184 (BNB better on 9, TAN better on 2, 14 ties)

BNB vs. ELR-NB: 0.151 vs. 0.161 (BNB better on 5*, ELR-NB better on 4*, 16 ties)

BNB vs. BNC-2P: 0.151 vs. 0.164 (BNB better on 7, BNC-2P better on 3, 15 ties)
11

Evaluation of BNB

o Computationally efficient: training is O(MNT) with T = 5~20 boosting iterations, compared with O(MN) for NB

o Good classification accuracy: outperforms NB and TAN; competitive with ELR and BNC

o Sparse structure + boosting = competitive accuracy

o Potential drawback: strongly correlated features (e.g. Corral)

12

Structure Learning

Challenges:

o Efficiency: finding the optimal structure is NP-hard; even K2 and hill-climbing search still examine a polynomial number of structures

o Resisting overfitting: structure controls classifier capacity

Our proposed solution:

o Combine sparse models to form an ensemble

o Constrain edge selection

13

Step 1: Creating G_tree (Friedman et al. 1999)

o Build a pairwise conditional mutual information table

o Create a maximum spanning tree using conditional mutual information as the edge weight

o Convert the undirected tree into a directed graph G_tree

[Diagram: example G_tree over features 1-4]
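A minimal sketch (ours, not the authors' code) of this construction, assuming discrete features and labels encoded as small integers: estimate pairwise conditional mutual information from counts, extract a maximum-weight spanning tree, and direct it away from an arbitrary root. The helper names are illustrative.

```python
# Build the conditional mutual information table I(X_a; X_b | Y) and a
# maximum-weight spanning tree over the features (G_tree).
import numpy as np
from itertools import combinations

def cond_mutual_info(xa, xb, y, eps=1e-12):
    """I(X_a; X_b | Y) estimated from empirical counts."""
    mi = 0.0
    for c in np.unique(y):
        mask = (y == c)
        p_c = mask.mean()
        for va in np.unique(xa):
            for vb in np.unique(xb):
                p_ab = np.mean(mask & (xa == va) & (xb == vb))  # P(a, b, y)
                p_a = np.mean(mask & (xa == va))                # P(a, y)
                p_b = np.mean(mask & (xb == vb))                # P(b, y)
                if p_ab > eps:
                    mi += p_ab * np.log(p_ab * p_c / (p_a * p_b))
    return mi

def build_g_tree(X, y):
    """Prim-style maximum spanning tree over features, CMI as edge weight.
    Returns directed edges (parent, child) rooted at feature 0."""
    N = X.shape[1]
    cmi = np.zeros((N, N))
    for a, b in combinations(range(N), 2):
        cmi[a, b] = cmi[b, a] = cond_mutual_info(X[:, a], X[:, b], y)
    in_tree, edges = {0}, []
    while len(in_tree) < N:
        _, a, b = max((cmi[a, b], a, b) for a in in_tree
                      for b in range(N) if b not in in_tree)
        edges.append((a, b))            # direct each edge away from the root
        in_tree.add(b)
    return edges
```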

14

Initial structure

[Diagram: the initial structure G_BAN (Naïve Bayes) alongside G_tree over features 1-4]

1. Select Naïve Bayes

2. Create BNB via AdaBoost

3. Evaluate BNB

15

Iteratively adding edges

Candidate edges from G_tree are added to G_BAN one at a time; each candidate structure is boosted and scored by its ensemble CLL, and the edge giving the best improvement is kept (see the sketch below).

[Diagram: G_BAN with candidate edges from G_tree scored by ensemble CLL values of -0.65, -0.75, -0.50, and -0.55]
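A sketch of the greedy edge-selection loop as we read it from these slides (not the authors' code); `train_and_score` is a hypothetical callable that boosts a classifier with the given augmenting edges and returns the ensemble CLL, e.g. built on the BNB sketch above.

```python
# Greedy BAN structure search: add edges from G_tree to the base structure
# as long as the boosted ensemble's CLL improves.

def ban_structure_search(candidate_edges, train_and_score):
    """candidate_edges: directed (parent, child) pairs taken from G_tree."""
    chosen = []                               # start from Naive Bayes (BNB)
    best_cll = train_and_score(chosen)
    remaining = list(candidate_edges)
    improved = True
    while improved and remaining:
        improved = False
        # boost and score each candidate edge added to the current structure
        scores = [(train_and_score(chosen + [e]), e) for e in remaining]
        cll, edge = max(scores)
        if cll > best_cll:                    # keep only improving edges
            best_cll = cll
            chosen.append(edge)
            remaining.remove(edge)
            improved = True
    return chosen, best_cll
```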

16

Final BAN structure

The BAN classifier is the boosted ensemble of the final structure G_BAN produced by the search.

17

Analysis of BAN

o The base structure is sparser than the BNC model

o BAN uses an ensemble of sparser models to approximate a densely connected structure

[Diagrams: example of a BAN model and example of a BNC-2P model]

18

Computational complexity of BAN

Training complexity: O(MN^2 + MNTS), where M is the number of samples and N the number of features

o O(MN^2): building G_tree

o O(MNTS): structure search, with T boosting iterations per structure and S structures examined (S < N)

Empirical training time: T = 5~25 and S = 0~5, approximately 25-100 times the training time of NB

19

Results (simulated datasets)

o 25 datasets generated from different CPTs and numbers of features

o 4000 samples each, 5-fold cross validation

[Diagrams: the true generating structure vs. the Naïve Bayes structure]

20

Results (simulated datasets): BAN vs. NB

BAN better on 19 datasets, NB better on 0, 6 ties.

21

Results (simulated datasets): BAN vs. BNB

o BAN better on 3 datasets, BNB better on 0, 22 ties

o BNB achieved the optimal error on 22 datasets; BAN outperforms BNB on the remaining 3

o The correct edges (matching the true structure) are added under BAN

22

Results: 25 UCI datasets (BAN)

o Standard datasets for Bayesian network classifiers (Friedman et al. 1999; Greiner and Zhou 2002; Grossman and Domingos 2004)

o 5-fold cross validation

o Implemented NB, TAN, BAN, BNB, BNC-2P

o Obtained published results for ELR-NB and ELR-TAN

23

Results: BAN vs. standard methods

BAN vs. NB: average error 0.141 vs. 0.173 (BAN better on 10, NB better on 2, 13 ties)

BAN vs. TAN: average error 0.141 vs. 0.184 (BAN better on 10, TAN better on 2, 13 ties)

24

Results: BAN vs. structure learning

BAN vs. BNC-2P: average error 0.141 vs. 0.164 (BAN better on 7 datasets, BNC-2P better on 1)

o BAN contains 0-5 augmented edges; BNC-2P contains 4-16 augmented edges

25

Results: BAN vs. ELR

BAN vs. ELR-NB: average error 0.141 vs. 0.161

BAN vs. ELR-TAN: average error 0.141 vs. 0.155

[Scatter plots comparing BAN with ELR-NB and ELR-TAN per dataset; * marks comparisons based on published results]

o Error statistics taken directly from published results

o BAN is more efficient to train

26

Evaluation of BAN vs. BNB

Average testing error: 0.141 vs. 0.151

Under a significance test:

o BAN outperforms BNB on 7 datasets (e.g. Corral), by 2%-5%

o BNB outperforms BAN on 2 datasets, by 0.5%-2%

o The remaining datasets show no significant difference; on some (e.g. IRIS, MOFN) BAN chooses BNB itself as the base structure

By average testing error alone, BAN outperforms BNB on 16 datasets and BNB outperforms BAN on 6.

27

Conclusion

o An ensemble of sparse models as an alternative to structure and parameter optimization

o Simple to implement

o Very efficient in training

o Competitive classification accuracy with NB, TAN, HGC, BNC, and ELR

28

Future Work

Extend BAN to handle sequential data

Analyze the class of Bayesian network classifiers that can be approximated with an ensemble of sparse structures.

Can the BAN model parameters be obtained through parameter learning given the final model structure?

Can we use the BAN approach to learn generative models?
