A Novel Structure Learning Algorithm for Optimal Bayesian Network: Best Parents
Andrew Kreimer, Dr. Maya Herman
Dept. of Mathematics and Computer Science, The Open University of Israel, Ra'anana, Israel
Agenda
- Introduction
- Motivation
- Rationale
- Best Parents algorithm
- Experiments
- Conclusions & future research
Introduction: Bayesian Networks
- DAG (Directed Acyclic Graph)
- CPT (Conditional Probability Table) for each feature
- Guiding estimation rule: Bayes' theorem, P(A|B) = P(B|A)P(A)/P(B) (a worked illustration follows below)
- Categorical features
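As a quick numeric illustration of the estimation rule (not from the slides; all probabilities here are hypothetical), Bayes' theorem applied end to end, with P(B) expanded via the law of total probability:

```java
// Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
// Hypothetical numbers, chosen only to show the mechanics.
public class BayesRule {
    public static void main(String[] args) {
        double pA = 0.01;           // prior P(A)
        double pBgivenA = 0.9;      // likelihood P(B|A)
        double pBgivenNotA = 0.05;  // likelihood P(B|~A)
        // Law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
        double pB = pBgivenA * pA + pBgivenNotA * (1 - pA);
        double pAgivenB = pBgivenA * pA / pB;  // posterior P(A|B)
        System.out.printf("P(A|B) = %.4f%n", pAgivenB);  // ~0.1538
    }
}
```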
Introduction: Iris example
Source: UCI ML, WEKA
Introduction: Structure Learning
- Search problem (finding the best structure)
- Optimization problem (minimization/maximization of structure metrics)
- Existing algorithms: K2, TAN (Tree Augmented Naive Bayes), Hill Climbing, Simulated Annealing, Tabu, GA (Genetic Algorithms), etc.
Motivation
- Structure learning is a complex optimization problem
- Avoid feature ordering, DAG validity checks and structure metrics
- Deterministic solution
- Optimal Bayesian Network
Best Parents: Rationale
- Rely on the direct quality of features
- Incorporate attribute relation metrics (Conditional Entropy) to find the best rules
- Greedy construction method
- Structure learning in a deterministic, simple way
- Top-down approach
- No feature ordering, DAG validation or structure metrics
Best Parents: Feature Direction
- Find the optimal structure using only attribute relations
- Bayesian Networks provide an immediate visual view of attribute dependencies relative to each other
- Some attributes may influence several other attributes
- Usually a bounded number of relations is applied to avoid infeasible structures
Best Parents: Feature Relations
- A measure of directional impact is applied (Conditional Entropy); a computation sketch follows below
- Zero conditional entropy reflects complete dependence
- Let us define relations:
  - A → B: the best child of A is B
  - A ← B: B is the best parent of A
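A minimal sketch of the relation metric, assuming the standard definition H(B|A) = -Σ p(a,b) log₂ p(b|a) over two categorical columns (the slides name Conditional Entropy but not this exact form):

```java
import java.util.HashMap;
import java.util.Map;

// Conditional entropy H(B|A) of two categorical columns; zero means
// B is fully determined by A, matching the dependence reading above.
public class ConditionalEntropy {
    static double hGivenA(String[] a, String[] b) {
        int n = a.length;
        Map<String, Integer> countA = new HashMap<>();
        Map<String, Integer> countAB = new HashMap<>();
        for (int i = 0; i < n; i++) {  // count instantiations
            countA.merge(a[i], 1, Integer::sum);
            countAB.merge(a[i] + "\u0000" + b[i], 1, Integer::sum);
        }
        double h = 0.0;
        for (Map.Entry<String, Integer> e : countAB.entrySet()) {
            double pab = e.getValue() / (double) n;  // p(a,b)
            double pa = countA.get(e.getKey().split("\u0000")[0]) / (double) n;
            h -= pab * Math.log(pab / pa) / Math.log(2);  // p(b|a) = p(a,b)/p(a)
        }
        return h;
    }

    public static void main(String[] args) {
        String[] a = {"x", "x", "y", "y"};
        String[] b = {"1", "1", "0", "0"};
        System.out.println(hGivenA(a, b)); // 0.0 - B fully determined by A
    }
}
```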
Best Parents: Pseudo Code
BestParents(dataset):
1. Count instantiations
2. Calculate Conditional Entropy
3. Save expansion rules (sorted)
4. Expand structure (greedy algorithm with a black list)
(a runnable sketch of these steps follows below)
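One possible reading of those four steps as runnable Java (a sketch, not the authors' implementation: the rule format, the single-parent invariant, and the black-list semantics are my assumptions):

```java
import java.util.*;

// Greedy expansion over rules (parent, child, conditional entropy),
// assumed pre-sorted ascending by entropy (best rules first). The black
// list is assumed to stop a node from receiving a second parent, and an
// ancestor walk rejects rules that would close a cycle, so the result
// is a valid DAG without a separate validation pass.
public class BestParentsExpand {
    record Rule(String parent, String child, double h) {}

    static Map<String, String> expand(List<Rule> sortedRules) {
        Map<String, String> parentOf = new LinkedHashMap<>(); // child -> parent
        Set<String> blackList = new HashSet<>();
        for (Rule r : sortedRules) {
            if (blackList.contains(r.child())) continue;  // already expanded
            if (createsCycle(parentOf, r.parent(), r.child())) continue;
            parentOf.put(r.child(), r.parent());
            blackList.add(r.child());
        }
        return parentOf;
    }

    // Walk up the parent chain from `parent`; reaching `child` means a cycle.
    static boolean createsCycle(Map<String, String> parentOf, String parent, String child) {
        for (String cur = parent; cur != null; cur = parentOf.get(cur))
            if (cur.equals(child)) return true;
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("f5", "f4", 0.2), new Rule("f1", "f8", 0.3),
            new Rule("f7", "f2", 0.5), new Rule("f2", "f8", 0.8));
        System.out.println(expand(rules)); // {f4=f5, f8=f1, f2=f7}
    }
}
```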
Best Parents: Construction Example
a. Best child rules: the best child of attribute f5 is f4
   Best child rule notation: we have f4 → f6, f1 → f8
b. Best parent rules: the best parent of attribute f2 is f7
   Best parent rule notation: we have f3 ← f1, f9 ← f8
Source: WEKA BN explorer
Best Parents: Sorted Rules
Best Children:
- f4 → f6, 0.2
- f1 → f8, 0.3
Best Parents:
- f3 ← f1, 0.5
- f9 ← f8, 0.8
How to expand:
- Best Parents (the best)
- Best Children
- Combine them all (full list)
Best Parents: Complexity
- Iterate over all features m and samples n
- Counting joint instantiations for every feature pair is O(n·m²)
- Sorting and scanning the m² direction rules is O(m²·log m)
- Combining the two running times we get O(m²·(n + log m))
- No iterative optimization or closed-space search
Experiments
- Comparison to Random Forest, K2, TAN, Hill Climber, Tabu search and Naïve Bayes
- Implemented in Java using the WEKA environment (a sample evaluation harness follows below)
- Public datasets
- Minor feature selection (by cardinality) and data preprocessing
- No feature engineering
- Two key factors for performance assessment: normalized AUC and running time
- Random 0.7/0.3 train-test split
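A sketch of what one baseline run in such a harness could look like in WEKA (the 0.7/0.3 random split and the Java/WEKA stack are from the slides; the file name, seed, and choice of K2 here are illustrative assumptions):

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.BayesNet;
import weka.classifiers.bayes.net.search.local.K2;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// One baseline measurement: AUC plus training time for a BN learned
// with a local search algorithm (K2 shown; others are swapped in alike).
public class BaselineAuc {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("dataset.arff"); // hypothetical file
        data.setClassIndex(data.numAttributes() - 1);
        data.randomize(new Random(42));                   // random split
        int cut = (int) Math.round(data.numInstances() * 0.7);
        Instances train = new Instances(data, 0, cut);
        Instances test  = new Instances(data, cut, data.numInstances() - cut);

        BayesNet bn = new BayesNet();
        bn.setSearchAlgorithm(new K2());                  // one of the baselines
        long t0 = System.currentTimeMillis();
        bn.buildClassifier(train);
        long elapsed = System.currentTimeMillis() - t0;   // running-time factor

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(bn, test);
        System.out.printf("AUC=%.4f time=%dms%n", eval.areaUnderROC(0), elapsed);
    }
}
```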
Experiment: Kaggle BNP Paribas
Goal:
- Reveal the optimal BN
- Accelerate the claims management process
Data: contains several nominal features with high cardinality (excluded)
Results: Best Parents converged faster, TAN has a higher AUC
Conclusion: Best Parents is better when AUC and running time are combined
[Bar chart: AUC performance of RandomForest, BN: Local TAN, New BN: Best Parents, New BN: Full List, BN: Local K2, BN: Local Hill Climber, BN: Local Tabu and Naive Bayes; AUC axis 0.65-1]
Source: Kaggle BNP Paribas
Experiment: Criteo
Goal:
- Reveal the optimal BN
- Predict clicks (estimate CTR)
Data:
- Widely used as a benchmark for large-scale training
- High cardinality of nominal features
- Originally contains 40 features, 14 of them numerical
- Tested on a small sample dataset with only the 20 features of lowest cardinality
Results: Best Parents converged faster, TAN has a higher AUC
Conclusion: Best Parents is better when AUC and running time are combined
[Bar chart: AUC performance of RandomForest, BN: Local TAN, New BN: Best Parents, New BN: Full List, BN: Local K2, BN: Local Hill Climber, BN: Local Tabu and Naive Bayes; AUC axis 0.68-0.77]
Source: Kaggle Criteo, Criteo
Experiment: Kaggle Homesite
Goal:
- Reveal the optimal BN
- Target potential customers of insurance plans
Data: mostly numeric features
Results: Best Parents converged faster, TAN has a higher AUC
Conclusion: Best Parents is better when AUC and running time are combined
[Bar chart: AUC performance of RandomForest, BN: Local TAN, New BN: Best Parents, New BN: Full List, BN: Local Tabu, BN: Local K2, BN: Local Hill Climber and Naive Bayes; AUC axis 0.8-1]
Source: Kaggle Homesite
Experiment: Poker Hand
Goal:
- Reveal the optimal BN
- Classify poker hands
Data:
- 5 cards and the hand class
- ML is not efficient at solving this task
- A simple algorithm can identify the strength of a given hand deterministically (sketched below)
Results: Best Parents converged faster, TAN has a higher AUC
Conclusion: Best Parents is better when AUC and running time are combined
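To make the deterministic-grading claim concrete (not from the slides), a partial hand grader built from counting alone; straights and straight flushes are omitted, and ranks/suits follow the UCI encoding (ranks 1-13, suits 1-4):

```java
import java.util.HashMap;
import java.util.Map;

// Grades a 5-card hand with a few counting rules - no learning needed.
public class HandStrength {
    static String grade(int[] ranks, int[] suits) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int r : ranks) counts.merge(r, 1, Integer::sum);
        boolean flush = true;
        for (int s : suits) flush &= (s == suits[0]);
        int pairs = 0, maxOfAKind = 1;
        for (int c : counts.values()) {
            if (c == 2) pairs++;
            maxOfAKind = Math.max(maxOfAKind, c);
        }
        if (maxOfAKind == 4) return "four of a kind";
        if (maxOfAKind == 3 && pairs == 1) return "full house";
        if (flush) return "flush";
        if (maxOfAKind == 3) return "three of a kind";
        if (pairs == 2) return "two pair";
        if (pairs == 1) return "one pair";
        return "high card";
    }

    public static void main(String[] args) {
        System.out.println(grade(new int[]{7, 7, 2, 9, 9},
                                 new int[]{1, 2, 3, 4, 1})); // two pair
    }
}
```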
[Bar chart: AUC performance of RandomForest, BN: Local TAN, New BN: Best Parents, New BN: Full List, BN: Local K2, Naive Bayes, BN: Local Tabu and BN: Local Hill Climber; AUC axis 0.48-0.98]
Source: UCI ML
Experiment: NYSE Stocks
Goal:
- Reveal the optimal BN
- Classify trends and reveal trading signals
Data:
- 1.59 million samples and 95 features
- Technical indicators
- Mostly numeric features
- Binary class: trend up or down
Steps (sketched below):
- Repeated random sampling (30 times), 5% sample size, 70%-30% train-test split
- ANOVA between all of the classifier performances
- Two t-tests between Best Parents and the two top classifiers (highest AUC)
Results:
- Significant difference of variances
- Significant difference of means
- Best Parents has the higher mean (combined AUC and running time)
Conclusion: Best Parents is optimal in terms of performance comprising AUC and running time
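A sketch of that significance protocol using Apache Commons Math, which the slides do not name (library choice and all numbers below are assumptions; each array would hold one classifier's 30 combined AUC/running-time scores):

```java
import java.util.Arrays;
import org.apache.commons.math3.stat.inference.OneWayAnova;
import org.apache.commons.math3.stat.inference.TTest;

// ANOVA across all classifiers, then pairwise t-tests of Best Parents
// against the two top classifiers, as the steps above describe.
public class SignificanceCheck {
    public static void main(String[] args) {
        double[] bestParents = {0.91, 0.90, 0.92};  // placeholder samples
        double[] tan         = {0.88, 0.89, 0.87};
        double[] k2          = {0.85, 0.86, 0.84};

        double anovaP = new OneWayAnova()
            .anovaPValue(Arrays.asList(bestParents, tan, k2));
        TTest t = new TTest();
        System.out.printf("ANOVA p=%.4f, t-test vs TAN p=%.4f, vs K2 p=%.4f%n",
            anovaP, t.tTest(bestParents, tan), t.tTest(bestParents, k2));
    }
}
```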
Experiment: Features analysis
[Chart: AUC of RF, BN: Local TAN, New BN: Best Parents, New BN: Full List, BN: Local K2, BN: Local Hill Climber, BN: Local Tabu and Naive Bayes by number of attributes (131, 44, 20, 11); AUC axis 0.65-0.95]
Source: Kaggle, UCI ML
Takeaway: a higher number of features does not improve mining
Experiment: Dataset analysis
[Chart: AUC of RandomForest, BN: Local TAN, New BN: Best Parents, New BN: Full List, BN: Local K2, BN: Local Hill Climber, BN: Local Tabu and Naive Bayes by number of samples (100,000; 114,321; 260,753; 1,000,000); AUC axis 0.65-1]
Source: Kaggle, UCI ML
Takeaway: a higher number of samples improves mining
Conclusion
- Avoided preprocessed metadata:
  - Feature ordering
  - DAG validity checks
  - Structure metrics
- Deterministic solution
- Substantial optimality in the combination of running time and AUC
Future Research
- Parallelized implementation
- Applications at large scale
- Improved attribute relation selection
- Expansion paths
Thank You!
Andrew Kreimer, Dr. Maya Herman
Dept. of Mathematics and Computer Science, The Open University of Israel, Ra'anana, Israel