
A Hierarchy of Independence Assumptions for Multi-Relational Bayes Net Classifiers


DESCRIPTION

School of Computing Science, Simon Fraser University, Vancouver, Canada. A Hierarchy of Independence Assumptions for Multi-Relational Bayes Net Classifiers. Outline: Multi-Relational Classifiers; Multi-Relational Independence Assumptions; Classification Formulas; Bayes Nets; Evaluation.



A Hierarchy of Independence Assumptions for Multi-Relational Bayes Net Classifiers

Oliver Schulte

Bahareh Bina

Branden Crawford

Derek Bingham

Yi Xiong

School of Computing Science, Simon Fraser University, Vancouver, Canada

Outline
- Multi-Relational Classifiers
- Multi-Relational Independence Assumptions
- Classification Formulas
- Bayes Nets
- Evaluation

Database Tables
- Link-based classification. Target table: Student; target entity: Jack; target attribute (class): Intelligence.
- Tables represent entities and relationships, and can be visualized as a network.

Student:
s-id | Intelligence | Ranking
Jack | ???          | 1
Kim  | 2            | 1
Paul | 1            | 2

Professor:
p-id   | Popularity | Teaching-a
Oliver | 3          | 1
Jim    | 2          | 1

Course:
c-id | Rating | Difficulty
101  | 3      | 1
102  | 2      | 2

Registration:
s-id | c-id | Grade | Satisfaction
Jack | 101  | A     | 1
Jack | 102  | B     | 2
Kim  | 102  | A     | 1
Paul | 101  | B     | 1

[Figure: network view; the entity Jack (Ranking = 1) and course 101 (Diff = 1) are joined by a Registration link.]

Extended Database Tables
Joining each Registration row with the attributes of the linked student and course yields:

s-id | c-id | Grade | Satisfaction | Intelligence | Ranking | Rating | Difficulty
Jack | 101  | A     | 1            | ???          | 1       | 3      | 1
Jack | 102  | B     | 2            | ???          | 1       | 2      | 2
Kim  | 102  | A     | 1            | 2            | 1       | 2      | 2
Paul | 101  | B     | 1            | 1            | 2       | 3      | 1

Multi-Relational Classifiers
- Aggregate relational features (propositionalization). Example: use the average grade. Disadvantages: loses information; slow to learn (up to several CPU days).
- Log-linear models with count relational features. Example: use the number of A's and the number of B's, with ln P(class) = sum_i x_i w_i - ln Z. Disadvantage: slow learning.
- Log-linear models with independence assumptions: fast to learn, though the independence assumptions may hold only approximately.

Independence Assumptions

Independence Assumptions: Naive Bayes
- Naive Bayes: non-class attributes are independent of each other, given the target class label.
- Legend (extended table): given the blue information, the yellow columns are independent.

Path Independence
- Path Independence: links/paths are independent of each other, given the attributes of the linked entities.
- Legend (extended table): given the blue information, the yellow rows are independent.
- In the extended table these are single links; more generally, the assumption is independence among paths.

Influence Independence
- Influence Independence: attributes of the target entity are independent of attributes of related entities, given the target class label.

- Path-Class Independence: the existence of a link/path is independent of the class label.
- Legend (extended table): given the blue information, the yellow columns are independent of the orange columns.
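The extended table above is just the natural join of the Registration table with the attributes of the linked Student and Course entities. A minimal sketch of that join in plain Python, using the table contents from the slides (the helper name `extend` is ours, not from the paper):

```python
# Base tables from the slides; "?" marks Jack's unknown Intelligence.
students = {"Jack": {"Intelligence": "?", "Ranking": 1},
            "Kim":  {"Intelligence": 2,   "Ranking": 1},
            "Paul": {"Intelligence": 1,   "Ranking": 2}}
courses = {101: {"Rating": 3, "Difficulty": 1},
           102: {"Rating": 2, "Difficulty": 2}}
registrations = [("Jack", 101, "A", 1), ("Jack", 102, "B", 2),
                 ("Kim", 102, "A", 1), ("Paul", 101, "B", 1)]

def extend(registrations, students, courses):
    """Join each Registration row with the attributes of its linked entities."""
    rows = []
    for s_id, c_id, grade, satisfaction in registrations:
        row = {"s-id": s_id, "c-id": c_id,
               "Grade": grade, "Satisfaction": satisfaction}
        row.update(students[s_id])   # adds Intelligence, Ranking
        row.update(courses[c_id])    # adds Rating, Difficulty
        rows.append(row)
    return rows

extended = extend(registrations, students, courses)
```

Each output row corresponds to one Registration link; these rows are what the path-based independence assumptions quantify over.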

Classification Formulas
- Log-linear prediction formulas can be derived rigorously from the independence assumptions.
- Path Independence: predict the class that maximizes
  log P(class | target attributes) + sum over each table, each row of [ log P(class | information in the row) - log P(class | target attributes) ].
- PI + Influence Independence: predict the class that maximizes
  log P(class | target attributes) + sum over each table, each row of [ log P(class | information in the row) - log P(class) ].
- Other formulas are in the paper.

Relationship to Previous Formulas

Assumption                  | Previous work with classification formula
Path Independence           | None; our new model.
PI + Influence Independence | Heterogeneous Naive Bayes Classifier (Manjunath et al., ICPR 2010).
PI + II + Naive Bayes       | Exists + Naive Bayes, single relation only (Getoor, Segal, Taskar & Koller, 2001; IJCAI workshop paper).
PI + II + NB + Path-Class   | Multi-Relational Bayesian Classifier (Chen, Han et al., Decision Support Systems, 2009).

Evaluation

Data Sets and Base Classifier
- Standard databases from the KDD Cup and UC Irvine: Hepatitis (Biopsy, Patient, Out-Hosp, In-Hosp, Interferon), Mondial (Country, Borders, Continent, Country2, Economy, Government), Financial (Loan, Account, Order, Transaction, Disposition, District, Card, Client). MovieLens results not shown.
- Classifier: any single-table probabilistic base classifier with a classification formula can be plugged in; we use Bayes nets.

What is a Bayes net?
- Qualitative part: a directed acyclic graph (DAG). Nodes are random variables; edges denote direct influence.
- Quantitative part: a set of conditional probability distributions, one per node, such as P(A | E, B) for the Alarm node.
- [Figure: the classic Alarm network; Earthquake and Burglary are parents of Alarm, Alarm is the parent of Call, and Earthquake is the parent of Radio, with a conditional probability table for P(A | E, B).]
- Together they compactly represent a joint probability distribution via conditional independence, defining a unique distribution in factored form.
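The Path Independence prediction rule above can be sketched as a log-linear score in plain Python; the function names and the toy probabilities below are our own illustrations, not values from the paper:

```python
from math import log

def pi_score(label, p_target, p_row, rows):
    """Path Independence score: log P(class | target attributes) plus,
    for each row of each linked table, the shift that row contributes."""
    score = log(p_target[label])
    for row in rows:
        score += log(p_row(label, row)) - log(p_target[label])
    return score

def predict(labels, p_target, p_row, rows):
    """Predict the class with the maximal Path Independence score."""
    return max(labels, key=lambda c: pi_score(c, p_target, p_row, rows))

# Toy example (hypothetical numbers): two intelligence classes,
# and A grades in linked rows favour class 2.
p_target = {1: 0.6, 2: 0.4}            # P(class | target attributes)
def p_row(label, row):                 # P(class | information in the row)
    return {1: 0.3, 2: 0.7}[label] if row["Grade"] == "A" else 0.5

linked_rows = [{"Grade": "A"}, {"Grade": "A"}, {"Grade": "B"}]
```

Note that each linked row only adds a log-odds correction to the target-attribute baseline, which is why learning under this assumption needs nothing beyond single-table conditional probability estimates.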

(Figure from N. Friedman; see www.aispace.org.)
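The factored form can be made concrete with the Alarm example; a small sketch in plain Python, where the conditional probability values are illustrative placeholders of ours, not the numbers in the slide's figure:

```python
from itertools import product

# Illustrative CPTs for the classic Alarm network (placeholder numbers).
P_E = 0.01                                         # P(Earthquake = true)
P_B = 0.02                                         # P(Burglary = true)
P_A = {(True, True): 0.90, (True, False): 0.20,    # P(Alarm = true | E, B)
       (False, True): 0.90, (False, False): 0.01}
P_C = {True: 0.80, False: 0.05}                    # P(Call = true | Alarm)
P_R = {True: 0.30, False: 0.001}                   # P(Radio = true | Earthquake)

def bern(p, value):
    """Probability that an event with P(true) = p takes the given value."""
    return p if value else 1.0 - p

def joint(e, b, a, c, r):
    """The factored joint: P(E) P(B) P(A|E,B) P(C|A) P(R|E)."""
    return (bern(P_E, e) * bern(P_B, b) * bern(P_A[(e, b)], a)
            * bern(P_C[a], c) * bern(P_R[e], r))

# The factorization defines a proper distribution: it sums to 1.
total = sum(joint(*vals) for vals in product([True, False], repeat=5))
```

Because each factor is a proper conditional distribution, the product sums to one over all joint assignments; this is the sense in which the graph plus its tables define a unique distribution in factored form.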

Independence-Based Learning is Fast
[Chart: training time in seconds for each model, ordered from the weakest to the strongest assumption. Tilde is fast because it does not learn much.]

Independence-Based Models are Accurate

[Chart: predictive accuracy for each model, ordered from the weakest to the strongest assumption.]

- Competitive accuracy: Path Independence hits the sweet spot of fast learning and top accuracy.
- The performance improvements are not necessarily statistically significant.
- Similar results hold for F-measure and Area Under the Curve.

Conclusion
- Several plausible independence assumptions and classification formulas have been investigated in previous work; we organize them in a unifying hierarchy.
- New assumption: multi-relational path independence. It is the most general and is implicit in the other models.
- Big advantage: fast, scalable, simple learning; any single-table probabilistic classifier can be plugged in.
- Limitation: no pruning or weighting of the different tables. Logistic regression can be used to learn weights (Bina, Schulte et al. 2013).

Bina, B.; Schulte, O.; Crawford, B.; Qian, Z. & Xiong, Y. Simple decision forests for multi-relational classification. Decision Support Systems, 2013.

Thank you! Any questions?
