
Boosting Markov Logic Networks

Tushar Khot

Joint work with Sriraam Natarajan, Kristian Kersting and Jude Shavlik

Sneak Peek

• Present a method to learn structure and parameters for MLNs simultaneously
• Use functional gradients to learn many weakly predictive models
• Use regression trees/clauses to fit the functional gradients
• Faster and more accurate results than state-of-the-art structure-learning methods

[Figures from the sneak-peek slide: (1) a regression tree over the gradients, testing n[p(X)] > 0 and n[q(X,Y)] > 0 with leaf weights W1, W2, W3; (2) an example learned rule, 1.0 publication(A,P), publication(B,P) → advisedBy(A,B); (3) the boosted model ψ_m, and a bar chart comparing our approach ("Us") against prior methods ("Them")]

Outline

• Background
• Functional Gradient Boosting
• Representations: Regression Trees, Regression Clauses
• Experiments
• Conclusions

Traditional Machine Learning

Data (features):

B E A M J
1 0 1 1 0
0 0 0 0 1
. . .
0 1 1 0 1

[Figure: Bayesian network over Earthquake, Burglary, Alarm, MaryCalls, JohnCalls]

Task: predicting whether a burglary occurred at the home

Structure Learning

[Figure: network structure — Burglary and Earthquake are parents of Alarm; Alarm is the parent of MaryCalls and JohnCalls]

Parameter Learning

P(B) = 0.1      P(E) = 0.1

P(A | B, E):
 B  E : 0.9
 B ¬E : 0.5
¬B  E : 0.4
¬B ¬E : 0.1

P(M | A):        P(J | A):
 A : 0.7          A : 0.9
¬A : 0.2         ¬A : 0.1
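To make the factorization concrete, here is a minimal Python sketch (not from the talk; it simply reuses the toy CPT values above) that computes the joint probability of one complete assignment as the product of the local conditional probabilities.

```python
# Minimal sketch: joint probability in the burglary network using the toy
# CPT values from the slide (illustrative helper code, not from the talk).
P_B = 0.1                                  # P(Burglary)
P_E = 0.1                                  # P(Earthquake)
P_A = {(1, 1): 0.9, (1, 0): 0.5,           # P(Alarm | Burglary, Earthquake)
       (0, 1): 0.4, (0, 0): 0.1}
P_M = {1: 0.7, 0: 0.2}                     # P(MaryCalls | Alarm)
P_J = {1: 0.9, 0: 0.1}                     # P(JohnCalls | Alarm)

def bernoulli(p, value):
    """Return P(X = value) for a binary variable with P(X = 1) = p."""
    return p if value == 1 else 1.0 - p

def joint(b, e, a, m, j):
    """P(B,E,A,M,J) = P(b) P(e) P(a|b,e) P(m|a) P(j|a)."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e) *
            bernoulli(P_A[(b, e)], a) *
            bernoulli(P_M[a], m) * bernoulli(P_J[a], j))

# e.g. a burglary, no earthquake, the alarm rings, both neighbours call:
print(joint(1, 0, 1, 1, 1))   # 0.1 * 0.9 * 0.5 * 0.7 * 0.9
```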

Real-World Datasets

[Figure: relational medical data — patients linked to their previous mammograms, previous blood tests and previous prescriptions (Rx)]

Inductive Logic Programming

• ILP directly learns first-order rules from structured data
• Searches over the space of possible rules
• Key limitation: the rules are evaluated as true or false, i.e. deterministic

Example rule: mass(p, t1) ∧ mass(p, t2) ∧ nextTest(t1, t2) → biopsy(p)

Logic + Probability = Statistical Relational Learning Models

Logic + probabilities, or probabilities + relations → Statistical Relational Learning (SRL)

[Figure: a first-order template and its ground Markov network over Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B)]

Weighted Logic: Markov Logic Networks (Richardson & Domingos, MLJ 2006)

1.5   ∀x Smokes(x) → Cancer(x)
1.1   ∀x,y Friends(x,y) → (Smokes(x) ↔ Smokes(y))

P(worldState) = (1/Z) exp( Σ_i w_i · n_i(worldState) )

where w_i is the weight of formula i and n_i(worldState) is the number of true groundings of formula i in worldState.

Structure: the first-order formulas, e.g. ∀x,y Friends(x,y) → (Smokes(x) ↔ Smokes(y))
Weights: the attached real numbers, e.g. 1.1, 1.5
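As a small worked illustration of this semantics, the following sketch (hypothetical helper names, not Alchemy code) enumerates every world over a two-person domain, counts the true groundings of the two weighted formulas above, and normalizes by Z.

```python
from itertools import product
from math import exp

people = ["A", "B"]

# ground atoms of the two-person domain
atoms = ([("Smokes", x) for x in people] +
         [("Cancer", x) for x in people] +
         [("Friends", x, y) for x in people for y in people])

def n_true_groundings(world):
    """Count true groundings of each weighted formula in a world
    (world maps each ground atom to True/False)."""
    n1 = sum((not world[("Smokes", x)]) or world[("Cancer", x)]       # Smokes(x) -> Cancer(x)
             for x in people)
    n2 = sum((not world[("Friends", x, y)]) or                        # Friends(x,y) ->
             (world[("Smokes", x)] == world[("Smokes", y)])           #   (Smokes(x) <-> Smokes(y))
             for x in people for y in people)
    return n1, n2

def score(world):
    n1, n2 = n_true_groundings(world)
    return exp(1.5 * n1 + 1.1 * n2)            # exp( sum_i w_i * n_i(world) )

worlds = [dict(zip(atoms, values))
          for values in product([False, True], repeat=len(atoms))]
Z = sum(score(w) for w in worlds)              # partition function

def world_prob(world):
    return score(world) / Z

# e.g. the world in which everyone smokes, has cancer, and everyone is friends:
print(world_prob({a: True for a in atoms}))
```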

Learning MLNs – Prior Approaches

Weight learning:
• Requires hand-written MLN rules
• Uses gradient descent
• Needs to ground the Markov network, hence can be very slow

Structure learning:
• Harder problem
• Needs to search the space of possible clauses
• Each new clause requires a weight-learning step

Motivation for Boosting MLNs

• The true model may have a complex structure, hard to capture with a handful of highly accurate rules
• Our approach: use many weakly predictive rules, and learn structure and parameters simultaneously

Problem Statement

Given: training data
• first-order logic facts
• ground target predicates

Learn: weighted rules for the target predicates

Example facts: student(Alice), professor(Bob), publication(Alice, Paper157), advisedBy(Alice, Bob)

Example learned rule: 1.2 publication(A,P), publication(B, P) → advisedBy(A,B) . . .
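One possible way to picture this input, as a rough sketch: ground facts stored as predicate/argument tuples, plus a hypothetical helper (`count_rule_groundings` is illustrative only) that counts the groundings of the example rule whose body is true and whose head also holds.

```python
# Hypothetical representation of the training data:
# ground facts as (predicate, argument-tuple) pairs.
facts = {
    ("student", ("Alice",)),
    ("professor", ("Bob",)),
    ("publication", ("Alice", "Paper157")),
    ("publication", ("Bob", "Paper157")),
    ("advisedBy", ("Alice", "Bob")),
}

def count_rule_groundings(facts):
    """For publication(A,P), publication(B,P) -> advisedBy(A,B): count the
    groundings whose body is true, and how many of those also satisfy the head."""
    pubs = [args for pred, args in facts if pred == "publication"]
    body_true = head_true = 0
    for a, p1 in pubs:
        for b, p2 in pubs:
            if p1 == p2 and a != b:                    # both authored the same paper
                body_true += 1
                if ("advisedBy", (a, b)) in facts:     # head is also true
                    head_true += 1
    return body_true, head_true

print(count_rule_groundings(facts))   # (2, 1): Alice/Bob and Bob/Alice bodies, one true head
```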

Outline

• Background
• Functional Gradient Boosting
• Representations: Regression Trees, Regression Clauses
• Experiments
• Conclusions

Functional Gradient Boosting

Model = weighted combination of a large number of simple functions

[Figure: the boosting loop — compare the data against the current model's predictions to obtain gradients, induce a regression function that fits the gradients, add it to the model, and iterate. Final model ψ_m = initial model + Δ_1 + Δ_2 + … ]

J.H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 2001.
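For intuition, here is a minimal propositional sketch of this loop, assuming a binary label and shallow scikit-learn regression trees fitted to the gradients y − p; it illustrates the generic recipe, not the relational algorithm of this talk.

```python
# Minimal propositional sketch of functional gradient boosting for binary
# classification: each iteration fits a small regression tree to the
# gradients (y - p) and adds it to the model psi.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boost(X, y, n_steps=20, max_depth=2):
    trees = []
    psi = np.zeros(len(y))                   # initial model psi_0 = 0
    for _ in range(n_steps):
        gradients = y - sigmoid(psi)         # functional gradient of the log-likelihood
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, gradients)               # induce a weak regression model
        psi += tree.predict(X)               # add it to the current model
        trees.append(tree)
    return trees

def predict_proba(trees, X):
    psi = sum(tree.predict(X) for tree in trees)
    return sigmoid(psi)

# toy usage
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
model = boost(X, y)
print(predict_proba(model, X))
```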

Function Definition for Boosting MLNs

Probability of an example x_i, conditioned on its Markov blanket MB(x_i):

P(x_i = 1 | MB(x_i)) = exp(ψ(x_i = 1; MB(x_i))) / [ exp(ψ(x_i = 1; MB(x_i))) + exp(ψ(x_i = 0; MB(x_i))) ]

We define the function ψ as a weighted sum of clause counts:

ψ(x_i; MB(x_i)) = Σ_j w_j · nt_j(x_i; MB(x_i))

where nt_j corresponds to the non-trivial groundings of clause C_j. Using non-trivial groundings allows us to avoid unnecessary computation (Shavlik & Natarajan, IJCAI'09).

Functional Gradients in MLNs

Gradient at example x_i:

Δ(x_i) = I(x_i = 1) − P(x_i = 1 | MB(x_i))
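A small sketch of these per-example quantities, assuming the non-trivial grounding counts nt_j have already been computed for the example with the query atom set to true and to false; the function names are hypothetical.

```python
# Hypothetical sketch of the per-example quantities used in MLN boosting.
from math import exp

def psi(weights, nt_counts):
    """psi(x_i; MB(x_i)) = sum_j w_j * nt_j(x_i; MB(x_i))."""
    return sum(w * nt for w, nt in zip(weights, nt_counts))

def prob_true(weights, nt_if_true, nt_if_false):
    """P(x_i = 1 | MB(x_i)), comparing the counts obtained when x_i is set
    to true vs. false while its Markov blanket is held fixed."""
    a = exp(psi(weights, nt_if_true))
    b = exp(psi(weights, nt_if_false))
    return a / (a + b)

def gradient(is_true, weights, nt_if_true, nt_if_false):
    """Functional gradient: I(x_i = 1) - P(x_i = 1 | MB(x_i))."""
    return (1.0 if is_true else 0.0) - prob_true(weights, nt_if_true, nt_if_false)

# e.g. two clauses with weights 1.2 and 0.5; setting x_i true satisfies
# 1 and 3 non-trivial groundings, setting it false satisfies none:
print(gradient(True, [1.2, 0.5], [1, 3], [0, 0]))
```

Examples with a large |gradient| are exactly the ones the next regression function concentrates on.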

Outline

• Background
• Functional Gradient Boosting
• Representations: Regression Trees, Regression Clauses
• Experiments
• Conclusions

Learning Trees for Target(X)

[Regression tree: test n[p(X)] > 0; if false (n[p(X)] = 0), weight W3; if true, test n[q(X,Y)] > 0 → weight W1, else (n[q(X,Y)] = 0) → weight W2]

• Closed-form solution for the weights given the gradients (residuals); see paper
• The false branch sometimes introduces existential variables
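A minimal sketch of how the tree in the figure could be represented and evaluated for a ground example; the leaf weights and data-structure names are illustrative, not learned values.

```python
# Hypothetical representation of the tree in the figure: inner nodes test
# whether a predicate has a true grounding tied to the example; leaves hold
# the regression weights W1, W2, W3.
W1, W2, W3 = 0.8, -0.2, -0.6              # illustrative weights

def n_groundings(facts, predicate, x):
    """Number of true groundings of predicate(...) whose first argument is x."""
    return sum(1 for pred, args in facts if pred == predicate and args[0] == x)

def tree_value(facts, x):
    """Evaluate the tree for example target(x)."""
    if n_groundings(facts, "p", x) > 0:       # n[p(X)] > 0 ?
        if n_groundings(facts, "q", x) > 0:   # n[q(X,Y)] > 0 ?
            return W1
        return W2                             # n[q(X,Y)] = 0
    return W3                                 # n[p(X)] = 0

facts = {("p", ("x1",)), ("q", ("x1", "y1")), ("p", ("x2",))}
print(tree_value(facts, "x1"), tree_value(facts, "x2"))   # W1  W2
```

With the clause representation on the next slide, the false-branch weights W2 and W3 would simply be forced to 0.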

Learning Clauses

• Same squared-error objective as for trees
• Force the weights on the false branches (W3, W2) to be 0
• Hence no existential variables are needed

Jointly Learning Multiple Target Predicates

• Approximate the MLN as a set of conditional models
• Extends our prior work on RDNs (ILP'10, MLJ'11) to MLNs
• Similar approach by Lowd & Davis (ICDM'10) for propositional Markov networks
• Represent each conditional potential of the Markov network with a single tree

[Figure: for each target predicate (targetX, targetY), compare data vs. predictions, compute gradients, and induce a regression function F_i for that predicate]

Boosting MLNs

For each gradient step m = 1 to M:
  For each query predicate P:
    Generate a training set using the previous model F_{m-1}:
      For each example x:
        compute the gradient for x
        add <x, gradient(x)> to the training set
    Learn a regression function T_{m,P} over the training set
      (a regression tree, or Horn clauses with P(X) as head)
    Add T_{m,P} to the model F_m
  Set F_m as the current model
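A high-level, runnable sketch of this loop; compute_gradient and fit_regression_tree are deliberately trivial stand-ins for the gradient computation and the tree/clause learner described earlier, not the authors' implementation.

```python
# Sketch of the boosting loop; only the control flow mirrors the algorithm,
# the "regression tree" here is a trivial constant regressor.
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def compute_gradient(x, is_true, trees, facts):
    """I(x = true) - P(x = true) under the current model for this predicate."""
    psi = sum(t(x, facts) for t in trees)
    return (1.0 if is_true else 0.0) - sigmoid(psi)

def fit_regression_tree(trainset, facts):
    """Trivial stand-in for the tree/clause learner: a constant regressor."""
    mean = sum(g for _, g in trainset) / len(trainset)
    return lambda x, facts: mean

def boost_mln(examples_by_predicate, facts, M=10):
    """examples_by_predicate: {predicate: [(ground_args, is_true), ...]}."""
    model = {p: [] for p in examples_by_predicate}            # F_0: empty model
    for _ in range(M):                                        # each gradient step
        for p, examples in examples_by_predicate.items():     # each query predicate P
            trainset = [(x, compute_gradient(x, t, model[p], facts))
                        for x, t in examples]                 # <x, gradient(x)> pairs
            model[p].append(fit_regression_tree(trainset, facts))  # add T_{m,P} to F_m
    return model

# toy usage with three advisedBy examples and no side facts
examples = {"advisedBy": [(("Alice", "Bob"), True),
                          (("Bob", "Alice"), False),
                          (("Carol", "Bob"), True)]}
model = boost_mln(examples, facts=set(), M=5)
print(sum(t(("Alice", "Bob"), set()) for t in model["advisedBy"]))
```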

Agenda

• Background
• Functional Gradient Boosting
• Representations: Regression Trees, Regression Clauses
• Experiments
• Conclusions

Experiments

Approaches:
• MLN-BT – Boosted Trees (our approach)
• MLN-BC – Boosted Clauses (our approach)
• Alch-D – Discriminative Weight Learning (Singla '05)
• LHL – Learning via Hypergraph Lifting (Kok '09)
• BUSL – Bottom-up Structure Learning (Mihalkova '07)
• Motif – Structural Motifs (Kok '10)

Datasets: UW-CSE, IMDB, Cora, WebKB

Results – UW-CSE

Predicting advisedBy:

          AUC-PR         CLL             Time
MLN-BT    0.94 ± 0.06    -0.52 ± 0.45    18.4 sec
MLN-BC    0.95 ± 0.05    -0.30 ± 0.06    33.3 sec
Alch-D    0.31 ± 0.10    -3.90 ± 0.41    7.1 hrs
Motif     0.43 ± 0.03    -3.23 ± 0.78    1.8 hrs
LHL       0.42 ± 0.10    -2.94 ± 0.31    37.2 sec

• Predict the advisedBy relation
• Given student, professor, courseTA, courseProf, etc. relations
• 5-fold cross-validation
• Exact inference, since there is only a single target predicate

Task: Entity Resolution

• Predict: SameBib, SameVenue, SameTitle, SameAuthor
• Given: HasWordAuthor, HasWordTitle, HasWordVenue
• A joint model is learned over all target predicates

Results – Cora

[Bar chart: AUC-PR on each target predicate (SameBib, SameVenue, SameTitle, SameAuthor) for MLN-BT, MLN-BC, Alch-D, LHL and Motif; y-axis from 0 to 1]

Future Work

• Maximize the log-likelihood instead of the pseudo log-likelihood
• Learn in the presence of missing data
• Improve the human-readability of the learned MLNs

Conclusion

• Presented a method to learn structure and parameters for MLNs simultaneously
• Functional gradient boosting makes it possible to learn many effective short rules
• Used two representations of the gradients: regression trees and regression clauses
• Efficiently learns an order of magnitude more rules
• Superior test-set performance vs. state-of-the-art MLN structure-learning techniques

Thanks

Supported by DARPA, the Fraunhofer ATTRACT fellowship STREAM, and the European Commission.
