Probabilistic Models of Relational Data Daphne Koller Stanford University Joint work with: Lise Getoor Ming-Fai Wong Eran Segal Avi Pfeffer Pieter Abbeel Nir Friedman Ben Taskar


Page 1: Probabilistic Models  of Relational Data

Probabilistic Models of Relational Data

Daphne Koller
Stanford University

Joint work with: Lise Getoor, Ming-Fai Wong, Eran Segal, Avi Pfeffer, Pieter Abbeel, Nir Friedman, Ben Taskar

Page 2: Probabilistic Models  of Relational Data

Why Relational?

The real world is composed of objects that have properties and are related to each other

Natural language is all about objects and how they relate to each other:
  “George got an A in Geography 101”

Page 3: Probabilistic Models  of Relational Data

Attribute-Based Worlds

Smart students get A’s in easy classes:
  Smart_Jane & easy_CS101 ⇒ GetA_Jane_CS101
  Smart_Mike & easy_Geo101 ⇒ GetA_Mike_Geo101
  Smart_Jane & easy_Geo101 ⇒ GetA_Jane_Geo101
  Smart_Rick & easy_CS221 ⇒ GetA_Rick_C

World = assignment of values to attributes / truth values to propositional symbols

Page 4: Probabilistic Models  of Relational Data

Object-Relational Worlds

World = relational interpretation:
  Objects in the domain
  Properties of these objects
  Relations (links) between objects

∀x,y (Smart(x) & Easy(y) & Take(x,y) ⇒ Grade(A,x,y))

Page 5: Probabilistic Models  of Relational Data

Why Probabilities?

All universals are false (almost):
  Smart students get A’s in easy classes
True universals are rarely useful:
  Smart students get either A, B, C, D, or F

[Slide label: C student]

“The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful … Therefore the true logic for this world is the calculus of probabilities …”
James Clerk Maxwell

Page 6: Probabilistic Models  of Relational Data

Probable Worlds Probabilistic semantics:

A set of possible worlds Each world associated with a probability

[Figure: the 12 possible worlds, one for each combination of course difficulty (easy, hard), student intelligence (smart, weak), and grade (A, B, C).]
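The possible-worlds semantics can be sketched in a few lines: enumerate every joint assignment and attach a probability to each. The weighting function below is purely an illustrative assumption; the slides give the structure of the worlds but no numbers.

```python
from itertools import product

DIFFICULTY = ("easy", "hard")
INTELLIGENCE = ("smart", "weak")
GRADE = ("A", "B", "C")

# One world per joint assignment: 2 * 2 * 3 = 12 worlds.
worlds = list(product(DIFFICULTY, INTELLIGENCE, GRADE))

def weight(difficulty, intelligence, grade):
    """Assumed unnormalized weight: smart students and easy courses favour higher grades."""
    score = {"A": 2, "B": 1, "C": 0}[grade]
    bonus = (intelligence == "smart") + (difficulty == "easy")
    return 1.0 + score * bonus

# Normalize the weights so that the worlds' probabilities sum to one.
total = sum(weight(*w) for w in worlds)
prob = {w: weight(*w) / total for w in worlds}

def probability(event):
    """Probability of an event = total probability of the worlds in which it holds."""
    return sum(p for w, p in prob.items() if event(*w))

p_a = probability(lambda d, i, g: g == "A")
```

Any query about the domain reduces to summing the probabilities of the worlds consistent with it, which is why the size of the world table matters so much in the slides that follow.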

Page 7: Probabilistic Models  of Relational Data

Representation: Design Axes

World state (one axis): Attributes, Sequences, Objects
Epistemic state (other axis): Categorical, Probabilistic

Categorical:
  Attributes: Propositional logic, CSPs
  Sequences: Automata, Grammars
  Objects: First-order logic, Relational databases
Probabilistic:
  Attributes: Bayesian nets, Markov nets
  Sequences: n-gram models, HMMs, Prob. CFGs

Page 8: Probabilistic Models  of Relational Data

Outline
  Bayesian Networks
    Representation & Semantics
    Reasoning
  Probabilistic Relational Models
  Collective Classification
  Undirected discriminative models
  Collective Classification Revisited
  PRMs for NLP

Page 9: Probabilistic Models  of Relational Data

Bayesian Networks

nodes = variables
edges = direct influence

Graph structure encodes independence assumptions: Letter conditionally independent of Intelligence given Grade

[Figure: network over Difficulty, Intelligence, Grade, SAT, and Letter, with a bar chart of the CPD P(G | D, I) over grades A, B, C for each (difficulty, intelligence) combination: hard,high; hard,low; easy,high; easy,low.]

Page 10: Probabilistic Models  of Relational Data

BN semantics

Compact & natural representation:
  nodes have at most k parents ⇒ on the order of $2^k n$ vs. $2^n$ params
  parameters natural and easy to elicit

conditional independencies in BN structure + local probability models = full joint distribution over domain

$P(d, i, g, l, s) = P(d)\,P(i)\,P(g \mid d, i)\,P(l \mid g)\,P(s \mid i)$
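A minimal sketch of the factorization P(d, i, g, l, s) = P(d) P(i) P(g | d, i) P(l | g) P(s | i) for the student network follows. All CPD numbers and value names are hypothetical (the slides show the structure only); the point is that the product of local models is a valid joint distribution and needs far fewer free parameters than an explicit joint table.

```python
from itertools import product

# Hypothetical CPDs; every number here is an assumption made for illustration.
P_D = {"easy": 0.6, "hard": 0.4}                      # P(d)
P_I = {"low": 0.7, "high": 0.3}                       # P(i)
P_G = {                                               # P(g | d, i)
    ("easy", "low"): {"A": 0.3, "B": 0.4, "C": 0.3},
    ("easy", "high"): {"A": 0.9, "B": 0.08, "C": 0.02},
    ("hard", "low"): {"A": 0.05, "B": 0.25, "C": 0.7},
    ("hard", "high"): {"A": 0.5, "B": 0.3, "C": 0.2},
}
P_L = {                                               # P(l | g)
    "A": {"strong": 0.9, "weak": 0.1},
    "B": {"strong": 0.6, "weak": 0.4},
    "C": {"strong": 0.1, "weak": 0.9},
}
P_S = {                                               # P(s | i)
    "low": {"high": 0.05, "low": 0.95},
    "high": {"high": 0.8, "low": 0.2},
}

def joint(d, i, g, l, s):
    """Chain-rule product of the local models."""
    return P_D[d] * P_I[i] * P_G[(d, i)][g] * P_L[g][l] * P_S[i][s]

assignments = list(product(P_D, P_I, "ABC", ("strong", "weak"), ("high", "low")))
total = sum(joint(*a) for a in assignments)

# Free parameters: each CPD row with m values contributes m - 1.
bn_free = (
    (len(P_D) - 1)
    + (len(P_I) - 1)
    + sum(len(row) - 1 for row in P_G.values())
    + sum(len(row) - 1 for row in P_L.values())
    + sum(len(row) - 1 for row in P_S.values())
)
full_free = len(assignments) - 1  # an explicit joint table over all assignments
```

With these domains the network needs 15 free parameters where the explicit joint table needs 47, and the gap widens exponentially as variables are added.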

Page 11: Probabilistic Models  of Relational Data

Reasoning using BNs

Full joint distribution specifies answer to any query: P(variable | evidence about others)

[Figure: the network over Difficulty, Intelligence, Grade, SAT, and Letter, with Letter and SAT marked.]

“Probability theory is nothing but common sense reduced to calculation.”
Pierre Simon Laplace
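As a rough illustration of answering P(variable | evidence about others) from the joint distribution, the sketch below uses brute-force enumeration over the student network. The CPD numbers, value names, and the `query` helper are assumptions introduced for this example; enumeration is only feasible for tiny networks and is not the inference method the slides advocate.

```python
from itertools import product

DOMAINS = {
    "D": ("easy", "hard"), "I": ("low", "high"), "G": ("A", "B", "C"),
    "L": ("weak", "strong"), "S": ("low", "high"),
}
# Hypothetical CPDs (assumed numbers, not from the slides).
P_D = {"easy": 0.6, "hard": 0.4}
P_I = {"low": 0.7, "high": 0.3}
P_G = {
    ("easy", "low"): {"A": 0.3, "B": 0.4, "C": 0.3},
    ("easy", "high"): {"A": 0.9, "B": 0.08, "C": 0.02},
    ("hard", "low"): {"A": 0.05, "B": 0.25, "C": 0.7},
    ("hard", "high"): {"A": 0.5, "B": 0.3, "C": 0.2},
}
P_L = {"A": {"strong": 0.9, "weak": 0.1}, "B": {"strong": 0.6, "weak": 0.4},
       "C": {"strong": 0.1, "weak": 0.9}}
P_S = {"low": {"high": 0.05, "low": 0.95}, "high": {"high": 0.8, "low": 0.2}}

def joint(a):
    """Joint probability of a full assignment, given as a dict of variable -> value."""
    return (P_D[a["D"]] * P_I[a["I"]] * P_G[(a["D"], a["I"])][a["G"]]
            * P_L[a["G"]][a["L"]] * P_S[a["I"]][a["S"]])

def query(var, evidence):
    """P(var | evidence): sum the joint over all worlds consistent with the evidence."""
    names = list(DOMAINS)
    dist = {}
    for values in product(*(DOMAINS[n] for n in names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in evidence.items()):
            dist[a[var]] = dist.get(a[var], 0.0) + joint(a)
    z = sum(dist.values())
    return {value: p / z for value, p in dist.items()}

# A high SAT score raises the belief that the student is intelligent.
posterior_i = query("I", {"S": "high"})
# Letter is independent of Intelligence once Grade is known.
letter_given_high = query("L", {"G": "A", "I": "high"})
letter_given_low = query("L", {"G": "A", "I": "low"})
```

The last two queries return the same distribution, which is the independence statement encoded by the graph: Letter depends on Intelligence only through Grade.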

Page 12: Probabilistic Models  of Relational Data

BN Inference

BN inference is NP-hard
Inference can exploit graph structure:
  Graph separation ⇒ conditional independence
  Do separate inference in parts
  Results combined over interface
Complexity: exponential in largest separator

[Figure: example graph over nodes A, B, C, D, E, F.]

Structured BNs allow effective inference
Exact inference in dense BNs is intractable

Page 13: Probabilistic Models  of Relational Data

Approximate BN Inference

Belief propagation is an iterative message passing algorithm for approximate inference in BNs
Each iteration (until “convergence”): nodes pass “beliefs” as messages to neighboring nodes
Cons:
  Limited theoretical guarantees
  Might not converge
Pros:
  Linear time per iteration
  Works very well in practice, even for dense networks
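The message-passing loop can be sketched as follows. This is a simplified sum-product implementation on a pairwise Markov network rather than on a BN directly (a swapped-in setting chosen to keep the code short); the function names, the synchronous update schedule, and the example potentials are all assumptions. On a tree-structured graph such as the three-node chain used here, the beliefs match the exact marginals; on graphs with loops the same procedure is only approximate and may not converge.

```python
from itertools import product
from math import prod

def belief_propagation(node_pot, edge_pot, iters=50):
    """Sum-product message passing with synchronous updates.

    node_pot: {node: [phi(x) for each state x]}
    edge_pot: {(u, v): table with table[xu][xv] = psi(xu, xv)}
    """
    neighbors = {n: [] for n in node_pot}
    for u, v in edge_pot:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def psi(u, v, xu, xv):
        if (u, v) in edge_pot:
            return edge_pot[(u, v)][xu][xv]
        return edge_pot[(v, u)][xv][xu]

    # msgs[(u, v)][xv] is the message from u to v, initialised to uniform.
    msgs = {(u, v): [1.0] * len(node_pot[v]) for u in neighbors for v in neighbors[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in msgs:
            out = []
            for xv in range(len(node_pot[v])):
                total = 0.0
                for xu in range(len(node_pot[u])):
                    # Combine u's local potential with messages from all neighbours except v.
                    incoming = prod(msgs[(w, u)][xu] for w in neighbors[u] if w != v)
                    total += node_pot[u][xu] * psi(u, v, xu, xv) * incoming
                out.append(total)
            z = sum(out)
            new[(u, v)] = [m / z for m in out]
        msgs = new

    beliefs = {}
    for n in node_pot:
        b = [node_pot[n][x] * prod(msgs[(w, n)][x] for w in neighbors[n])
             for x in range(len(node_pot[n]))]
        z = sum(b)
        beliefs[n] = [x / z for x in b]
    return beliefs

def exact_marginals(node_pot, edge_pot):
    """Brute-force marginals, for comparison on small networks only."""
    nodes = list(node_pot)
    marg = {n: [0.0] * len(node_pot[n]) for n in nodes}
    for xs in product(*(range(len(node_pot[n])) for n in nodes)):
        a = dict(zip(nodes, xs))
        w = prod(node_pot[n][a[n]] for n in nodes)
        w *= prod(table[a[u]][a[v]] for (u, v), table in edge_pot.items())
        for n in nodes:
            marg[n][a[n]] += w
    return {n: [m / sum(ms) for m in ms] for n, ms in marg.items()}

# Hypothetical chain A - B - C with binary variables.
node_pot = {"A": [0.7, 0.3], "B": [0.5, 0.5], "C": [0.2, 0.8]}
edge_pot = {("A", "B"): [[2.0, 1.0], [1.0, 2.0]], ("B", "C"): [[3.0, 1.0], [1.0, 3.0]]}
bp = belief_propagation(node_pot, edge_pot)
exact = exact_marginals(node_pot, edge_pot)
```

Each iteration touches every directed edge once, which is the source of the linear cost per iteration noted on the slide.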

Page 14: Probabilistic Models  of Relational Data

Outline
  Bayesian Networks
  Probabilistic Relational Models
    Language & Semantics
    Web of Influence
  Collective Classification
  Undirected discriminative models
  Collective Classification Revisited
  PRMs for NLP

Page 15: Probabilistic Models  of Relational Data

Bayesian Networks: Problem

Bayesian nets use propositional representation
Real world has objects, related to each other

[Figure: the template Intelligence, Difficulty → Grade is copied once per student-course pair:
  Intell_Jane, Diffic_CS101 → Grade_Jane_CS101
  Intell_George, Diffic_Geo101 → Grade_George_Geo101
  Intell_George, Diffic_CS101 → Grade_George_CS101
with grades A and C marked on the diagram.]

These “instances” are not independent

Page 16: Probabilistic Models  of Relational Data

Probabilistic Relational Models

Combine advantages of relational logic & BNs:
  Natural domain modeling: objects, properties, relations
  Generalization over a variety of situations
  Compact, natural probability models
Integrate uncertainty with relational model:
  Properties of domain entities can depend on properties of related entities
  Uncertainty over relational structure of domain

Page 17: Probabilistic Models  of Relational Data

St. Nordaf University

[Figure: example domain. Prof. Smith and Prof. Jones (Teaching-ability) are linked by Teaches to the courses CS101 and Geo101 (Difficulty); the students George and Jane (Intelligence) are Registered, with In-course links to the courses, and each registration has a Grade and a Satisfaction (“Satisfac”).]

Page 18: Probabilistic Models  of Relational Data

Relational Schema

Specifies types of objects in domain, attributes of each type of object & types of relations between objects

Classes and attributes:
  Student: Intelligence
  Registration: Grade, Satisfaction
  Course: Difficulty
  Professor: Teaching-Ability
Relations: Teach, In, Take

Page 19: Probabilistic Models  of Relational Data

Probabilistic Relational Models

Universals: Probabilistic patterns hold for all objects in class
Locality: Represent direct probabilistic dependencies
  Links define potential interactions

[Figure: dependency template over Student (Intelligence), Reg (Grade, Satisfaction), Course (Difficulty), and Professor (Teaching-Ability), with a bar chart of the CPD over grades A, B, C for each combination: hard,high; hard,low; easy,high; easy,low.]

[K. & Pfeffer; Poole; Ngo & Haddawy]

Page 20: Probabilistic Models  of Relational Data

PRM Semantics

Instantiated PRM ⇒ BN
  variables: attributes of all objects
  dependencies: determined by links & PRM

[Figure: ground network for Prof. Smith and Prof. Jones (Teaching-ability), CS101 and Geo101 (Difficulty), George and Jane (Intelligence), and three registrations (Grade, Satisfac).]
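The step from a PRM to its ground Bayesian network can be sketched as an unrolling procedure: create one variable per attribute of every object, and wire parents by following links. The skeleton below is a hypothetical instance of the St. Nordaf example. Which professor teaches which course is an assumption, and so is the dependency of Satisfaction on Grade and Teaching-Ability; the only class-level dependency stated in the slides is Grade on Intelligence and Difficulty.

```python
# Hypothetical relational skeleton (objects and links); assignments are assumptions.
professors = ["Smith", "Jones"]
courses = {"CS101": "Smith", "Geo101": "Jones"}   # course -> teaching professor
students = ["George", "Jane"]
registrations = [("Jane", "CS101"), ("George", "Geo101"), ("George", "CS101")]

def ground_network(professors, courses, students, registrations):
    """Return {ground variable: list of parent variables} for the instantiated PRM."""
    parents = {}
    for p in professors:
        parents[f"TeachingAbility({p})"] = []
    for c in courses:
        parents[f"Difficulty({c})"] = []
    for s in students:
        parents[f"Intelligence({s})"] = []
    for s, c in registrations:
        grade = f"Grade({s},{c})"
        # Class-level rule: Reg.Grade depends on Student.Intelligence and Course.Difficulty.
        parents[grade] = [f"Intelligence({s})", f"Difficulty({c})"]
        # Assumed rule: Reg.Satisfaction depends on Reg.Grade and the teacher's ability.
        parents[f"Satisfaction({s},{c})"] = [grade, f"TeachingAbility({courses[c]})"]
    return parents

bn = ground_network(professors, courses, students, registrations)
```

Because every Grade variable is generated from the same class-level rule, all of them share one CPD, while shared parents such as `Difficulty(CS101)` are what couple the otherwise separate instances.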

Page 21: Probabilistic Models  of Relational Data

The Web of Influence

[Figure: grades (A, C) in CS101 and Geo101 alongside bar charts, each on a 0% to 100% scale, of the beliefs about student intelligence (low / high) and course difficulty (easy / hard).]

Page 22: Probabilistic Models  of Relational Data

Outline
  Bayesian Networks
  Probabilistic Relational Models
  Collective Classification & Clustering
    Learning models from data
    Collective classification of webpages
  Undirected discriminative models
  Collective Classification Revisited
  PRMs for NLP

Page 23: Probabilistic Models  of Relational Data

Learning PRMs

[Figure: a Learner takes data D from a relational database (Course, Student, Reg) together with expert knowledge.]

[Friedman, Getoor, K., Pfeffer]

Page 24: Probabilistic Models  of Relational Data

Learning PRMs

Parameter estimation:
  Probabilistic model with shared parameters
    Grades for all students share same model
  Can use standard techniques for max-likelihood or Bayesian parameter estimation

$\hat{P}(\text{Reg.Grade} = A \mid \text{Student.Intell} = hi, \text{Course.Diff} = lo) = \dfrac{\#(\text{Reg.Grade} = A,\ \text{Student.Intell} = hi,\ \text{Course.Diff} = lo)}{\#(\text{Reg.Grade} = *,\ \text{Student.Intell} = hi,\ \text{Course.Diff} = lo)}$

Structure learning:
  Define scoring function over structures
  Use combinatorial search to find high-scoring structure
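The maximum-likelihood estimate with shared parameters amounts to counting over all registrations at once: the count of (grade, intelligence, difficulty) triples divided by the count of the (intelligence, difficulty) context. The sketch below assumes fully observed data; the records and the `estimate_cpd` name are hypothetical.

```python
from collections import Counter

# Hypothetical registration records: (student intelligence, course difficulty, grade).
records = [
    ("hi", "lo", "A"), ("hi", "lo", "A"), ("hi", "lo", "B"),
    ("lo", "lo", "B"), ("lo", "hi", "C"), ("hi", "hi", "B"),
]

def estimate_cpd(records):
    """Max-likelihood P(grade | intelligence, difficulty), pooled over all registrations."""
    joint = Counter(records)
    context = Counter((intell, diff) for intell, diff, _ in records)
    return {
        (intell, diff, grade): n / context[(intell, diff)]
        for (intell, diff, grade), n in joint.items()
    }

cpd = estimate_cpd(records)
p_a_given_hi_lo = cpd[("hi", "lo", "A")]  # 2 of the 3 (hi, lo) registrations got an A
```

Pooling is what makes the estimate reliable: every registration in the database contributes to the same table, regardless of which student or course it involves.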

Page 25: Probabilistic Models  of Relational Data

WebKB

[Figure: example entities Tom Mitchell (Professor), WebKB (Project), and Sean Slattery (Student), connected by the relations Advisor-of, Project-of, and Member.]

[Craven et al.]

Page 26: Probabilistic Models  of Relational Data

Web Classification Experiments

WebKB dataset
  Four CS department websites
  Bag of words on each page
  Links between pages
  Anchor text for links
Experimental setup
  Trained on three universities
  Tested on fourth
  Repeated for all four combinations

Page 27: Probabilistic Models  of Relational Data

Standard Classification

Categories: faculty, course, project, student, other

[Figure: Naïve Bayes model of a Page, with Category and Word1 ... WordN; example page words: professor, department, extract, information, computer, science, machine, learning. Bar chart (scale 0 to 0.35): words only.]

Page 28: Probabilistic Models  of Relational Data

Exploiting Links

[Figure: the Page model (Category, Word1 ... WordN) extended with link words up to LinkWordN, e.g. “working with Tom Mitchell …”. Bar chart (scale 0 to 0.35): words only, link words.]

Page 29: Probabilistic Models  of Relational Data

Collective Classification

[Figure: two Page objects (From- and To-), each with Category and Word1 ... WordN, connected through a Link with an Exists attribute. Bar chart (scale 0 to 0.35): words only, link words, collective.]

Approx. inference: belief propagation

[Getoor, Segal, Taskar, Koller]

Classify all pages collectively, maximizing the joint label probability
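A toy contrast between classifying pages one at a time and classifying them collectively is sketched below. Every score is a made-up assumption, and the joint maximization is done by exhaustive enumeration, which only works for a handful of pages; the slides use approximate inference (belief propagation) on the real data.

```python
from itertools import product

LABELS = ("faculty", "student")

# Hypothetical word-based evidence for each page.
word_score = {
    "page1": {"faculty": 0.9, "student": 0.1},    # clearly a faculty page
    "page2": {"faculty": 0.55, "student": 0.45},  # ambiguous, leans faculty
}
links = [("page2", "page1")]  # page2 links to page1

def link_score(src_label, dst_label):
    """Assumed compatibility: student pages tend to link to faculty pages."""
    return 0.7 if (src_label, dst_label) == ("student", "faculty") else 0.1

def joint_score(labels):
    """Unnormalized joint score of a full labeling: word evidence times link compatibility."""
    score = 1.0
    for page, label in labels.items():
        score *= word_score[page][label]
    for src, dst in links:
        score *= link_score(labels[src], labels[dst])
    return score

pages = list(word_score)
# Independent classification: each page picks its own best label.
independent = {p: max(LABELS, key=lambda l: word_score[p][l]) for p in pages}
# Collective classification: pick the labeling that maximizes the joint score.
collective = max(
    (dict(zip(pages, combo)) for combo in product(LABELS, repeat=len(pages))),
    key=joint_score,
)
```

In this toy case the link flips the ambiguous page from faculty to student, because a page that links to a confident faculty page is more plausibly a student page under the assumed compatibility.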

Page 30: Probabilistic Models  of Relational Data

Learning w. Missing Data: EM

P(Registration.Grade | Course.Difficulty, Student.Intelligence)

[Figure: a series of bar charts of this CPD over grades A, B, C for each combination (hard,high; hard,low; easy,high; easy,low), shown together with Courses (easy / hard) and Students (low / high).]

[Dempster et al. 77]

Page 31: Probabilistic Models  of Relational Data

Discovering Hidden Types

Internet Movie Database
http://www.imdb.com

Page 32: Probabilistic Models  of Relational Data

Discovering Hidden Types

[Figure: Actor, Director, and Movie classes, each with a hidden Type attribute; Movie attributes: Genres, Rating, Year, #Votes, MPAA Rating.]

[Taskar, Segal, Koller]

Page 33: Probabilistic Models  of Relational Data

Discovering Hidden Types

Directors
  Steven Spielberg, Tim Burton, Tony Scott, James Cameron, John McTiernan, Joel Schumacher
  Alfred Hitchcock, Stanley Kubrick, David Lean, Milos Forman, Terry Gilliam, Francis Coppola

Actors
  Anthony Hopkins, Robert De Niro, Tommy Lee Jones, Harvey Keitel, Morgan Freeman, Gary Oldman
  Sylvester Stallone, Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme, Arnold Schwarzenegger

Movies
  Wizard of Oz, Cinderella, Sound of Music, The Love Bug, Pollyanna, The Parent Trap, Mary Poppins, Swiss Family Robinson
  Terminator 2, Batman, Batman Forever, GoldenEye, Starship Troopers, Mission: Impossible, Hunt for Red October

Page 34: Probabilistic Models  of Relational Data

Outline
  Bayesian Networks
  Probabilistic Relational Models
  Collective Classification & Clustering
  Undirected Discriminative Models
    Markov Networks
    Relational Markov Networks
  Collective Classification Revisited
  PRMs for NLP

Page 35: Probabilistic Models  of Relational Data

Directed Models: Limitations

Limitations:
  Acyclicity constraint limits expressive power:
    Two objects linked to by a student probably not both professors
  Acyclicity forces modeling of all potential links:
    Network size O(N^2)
    Inference is quadratic
  Generative training: train to fit all of data, not to maximize accuracy

Solution: Undirected Models
  Allow arbitrary patterns over sets of objects & links
  Influence flows over existing links, exploiting link graph sparsity
    Network size O(N)
  Allow discriminative training: max P(labels | observations)

[Lafferty, McCallum, Pereira]

Page 36: Probabilistic Models  of Relational Data

Markov Networks

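Graph structure encodes independence assumptions: Chris conditionally independent of Eve given Alice & Dave

[Figure: undirected graph over Alice, Betty, Chris, Dave, and Eve, with a bar chart (scale 0 to 2) of the compatibility potential φ(A,B,C) for each truth assignment TTT, TTF, TFT, TFF, FTT, FTF, FFT, FFF.]

$P(A, B, C, D, E) = \frac{1}{Z}\,\phi(A, B, C)\,\phi(C, D)\,\phi(D, E)\,\phi(E, A)$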
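The Markov network distribution P(A,B,C,D,E) = (1/Z) φ(A,B,C) φ(C,D) φ(D,E) φ(E,A) can be checked numerically on this five-person example. The potential values below are assumptions (the slide shows only a bar chart), and Z is computed by brute-force enumeration, which is feasible only because there are 32 joint states.

```python
from itertools import product

VARS = ("A", "B", "C", "D", "E")  # Alice, Betty, Chris, Dave, Eve

def phi_abc(a, b, c):
    """Assumed clique potential over Alice, Betty, Chris: favours full agreement."""
    return 2.0 if a == b == c else 1.0

def phi_pair(x, y):
    """Assumed pairwise potential: favours agreement."""
    return 1.5 if x == y else 0.5

def unnormalized(a, b, c, d, e):
    return phi_abc(a, b, c) * phi_pair(c, d) * phi_pair(d, e) * phi_pair(e, a)

states = list(product((True, False), repeat=5))
Z = sum(unnormalized(*s) for s in states)  # partition function

def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(
        unnormalized(*s) for s in states
        if all(s[VARS.index(name)] == value for name, value in fixed.items())
    ) / Z

# Chris and Eve are conditionally independent given Alice and Dave:
# P(C, E | A, D) should equal P(C | A, D) * P(E | A, D).
given = prob(A=True, D=True)
lhs = prob(C=True, E=True, A=True, D=True) / given
rhs = (prob(C=True, A=True, D=True) / given) * (prob(E=True, A=True, D=True) / given)
```

The two sides agree because every path between Chris and Eve in the graph passes through Alice or Dave, so fixing those two separates them.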

Page 37: Probabilistic Models  of Relational Data

Relational Markov Networks

Universals: Probabilistic patterns hold for all groups of objects
Locality: Represent local probabilistic dependencies
  Sets of links give us possible interactions

[Figure: template over Student (Intelligence), Reg (Grade), and Course (Difficulty), plus a second Student2 (Intelligence) with Reg2 (Grade), connected through a Study Group; the template potential is a bar chart (scale 0 to 2) over the grade pairs AA, AB, AC, BA, BB, BC, CA, CB, CC.]

[Taskar, Abbeel, Koller ‘02]

Page 38: Probabilistic Models  of Relational Data

RMN Semantics

Instantiated RMN ⇒ MN
  variables: attributes of all objects
  dependencies: determined by links & RMN

[Figure: ground Markov network for the students George, Jane, and Jill (Intelligence), the courses CS101 and Geo101 (Difficulty), four registrations (Grade), and the Geo Study Group and CS Study Group.]

Page 39: Probabilistic Models  of Relational Data

Outline
  Bayesian Networks
  Probabilistic Relational Models
  Collective Classification & Clustering
  Undirected Discriminative Models
  Collective Classification Revisited
    Discriminative training of RMNs
    Webpage classification
    Link prediction
  PRMs for NLP

Page 40: Probabilistic Models  of Relational Data

Learning RMNs

Parameter estimation is not closed form
Convex problem ⇒ unique global maximum

Maximize $L = \log P(\text{Grades}, \text{Intelligence} \mid \text{Difficulty})$

The derivative $\partial L / \partial \phi_{AA}$ for the (A, A) entry of the template potential φ(Reg1.Grade, Reg2.Grade) involves the count $\#(\text{Grade} = A, \text{Grade} = A)$ and the model probability $P(\text{Grade} = A, \text{Grade} = A \mid \text{Diffic})$.

[Figure: ground network of Difficulty (easy / hard), Intelligence (low / high), and Grade (A, B, C) variables, with bar charts (scale 0 to 2) of the template potential over the grade pairs AA, AB, AC, BA, BB, BC, CA, CB, CC.]
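The flavour of this optimization can be sketched with a single log-linear potential over grade pairs. This is a deliberately reduced toy, not the RMN training procedure itself: there is no conditioning on Difficulty, the potential is parameterized by log-weights `w`, and the expected counts are computed exactly, whereas in a full RMN they require inference over the ground network. The counts, step size, and iteration budget are assumptions. The gradient takes the standard log-linear form: empirical count minus expected count.

```python
from itertools import product
from math import exp

GRADES = "ABC"
PAIRS = list(product(GRADES, repeat=2))

# Hypothetical counts of grade pairs for students who share a study group.
counts = {pair: 1 for pair in PAIRS}
counts.update({("A", "A"): 6, ("B", "B"): 5, ("C", "C"): 4})
n = sum(counts.values())

def model(w):
    """P(pair) proportional to exp(w[pair])."""
    z = sum(exp(w[p]) for p in PAIRS)
    return {p: exp(w[p]) / z for p in PAIRS}

def gradient(w):
    """d(log-likelihood)/d(w[pair]) = empirical count - expected count."""
    p = model(w)
    return {pair: counts[pair] - n * p[pair] for pair in PAIRS}

# Plain gradient ascent; the objective is concave in w, so there are no spurious optima.
w = {pair: 0.0 for pair in PAIRS}
for _ in range(2000):
    g = gradient(w)
    w = {pair: w[pair] + 0.05 * g[pair] for pair in PAIRS}

fitted = model(w)
```

At the optimum the expected counts match the empirical counts, which is why no closed-form solution exists in general: the expected counts depend on all parameters jointly through the normalization.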

Page 41: Probabilistic Models  of Relational Data

Flat Models

[Figure: flat model of a Page, with Category, Word1 ... WordN, and link words up to LinkWordN.]

Logistic Regression: P(Category|Words)

[Bar chart (scale 0 to 0.3): Naïve Bayes, Logistic, SVM.]

Page 42: Probabilistic Models  of Relational Data

Exploiting Links

[Figure: two Page objects (From- and To-), each with Category and Word1 ... WordN, connected by a Link. Bar chart (scale 0 to 0.25): PRM, Logistic, RMN-link.]

42.1% relative reduction in error relative to generative approach

Page 43: Probabilistic Models  of Relational Data

More Complex Structure

[Figure: a page with category C and words W1 ... Wn, with sections (S) labeled Faculty, Students, and Courses.]

Page 44: Probabilistic Models  of Relational Data

Collective Classification: Results

[Bar chart (scale 0 to 0.18): Logistic, Links, Section, Link+Section.]

35.4% relative reduction in error relative to strong flat approach

Page 45: Probabilistic Models  of Relational Data

Scalability

WebKB data set size:
  1300 entities
  180K attributes
  5800 links

Network size / school:
  Directed model: 200,000 variables, 360,000 edges
  Undirected model: 40,000 variables, 44,000 edges

Running time:
                      Training      Classification
  Directed models     3 sec         180 sec
  Undirected models   20 minutes    15-20 sec

Difference in training time decreases substantially when:
  some training data is unobserved
  want to model with hidden variables

Page 46: Probabilistic Models  of Relational Data

Predicting Relationships

Even more interesting are the relationships between objects
  e.g., verbs are almost always relationships

[Figure: Tom Mitchell (Professor), WebKB (Project), and Sean Slattery (Student), connected by the relations Advisor-of, Member, and Member.]

Page 47: Probabilistic Models  of Relational Data

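Flat Model

[Figure: the relation (Rel) Type of a link, shown with the words of the From-Page (Word1 ... WordN), the words of the To-Page (Word1 ... WordN), and the link words (LinkWord1 ... LinkWordN).]

Relation types: NONE, advisor, instructor, TA, member, project-of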

Page 48: Probabilistic Models  of Relational Data

Flat Model


Page 49: Probabilistic Models  of Relational Data

Collective Classification: Links

[Figure: the relation (Rel) Type of a link, shown with the From-Page and To-Page, each with Word1 ... WordN and a Category, and with the link words LinkWord1 ... LinkWordN.]

Page 50: Probabilistic Models  of Relational Data

Link Model


Page 51: Probabilistic Models  of Relational Data

Triad Model

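[Figure: triad of Professor, Student, and Group, with the relation Advisor between Professor and Student and a Member relation from each of them to Group.]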

Page 52: Probabilistic Models  of Relational Data

Triad Model

[Figure: triad of Professor, Student, and Course, with the relations Advisor, TA, and Instructor.]

Page 53: Probabilistic Models  of Relational Data

Triad Model

Page 54: Probabilistic Models  of Relational Data

WebKB++

Four new department web sites: Berkeley, CMU, MIT, Stanford
Labeled page type (8 types): faculty, student, research scientist, staff, research group, research project, course, organization
Labeled hyperlinks and virtual links (6 types): advisor, instructor, TA, member, project-of, NONE
Data set size:
  11K pages
  110K links
  2 million words

Page 55: Probabilistic Models  of Relational Data

Link Prediction: Results

Error measured over links predicted to be present
Link presence cutoff is at precision/recall break-even point (30% for all models)

[Bar chart (scale 0 to 30): Flat, Labels, Triad.]

72.9% relative reduction in error relative to strong flat approach

Page 56: Probabilistic Models  of Relational Data

Summary

PRMs inherit key advantages of probabilistic graphical models:
  Coherent probabilistic semantics
  Exploit structure of local interactions
Relational models inherently more expressive
“Web of influence”: use all available information to reach powerful conclusions
Exploit both relational information and power of probabilistic reasoning

Page 57: Probabilistic Models  of Relational Data

Outline
  Bayesian Networks
  Probabilistic Relational Models
  Collective Classification & Clustering
  Undirected Discriminative Models
  Collective Classification Revisited
  PRMs for NLP, or “Why Should I Care?”*
    Word-Sense Disambiguation
    Relation Extraction
    Natural Language Understanding (?)

* An outsider’s perspective

Page 58: Probabilistic Models  of Relational Data

Word Sense Disambiguation

Her advisor gave her feedback about the draft.

Candidate senses:
  advisor: financial, academic
  gave: physical, figurative
  feedback: electrical, criticism
  draft: wind, paper

Neighboring words alone may not provide enough information to disambiguate
We can gain insight by considering compatibility between senses of related words

Page 59: Probabilistic Models  of Relational Data

Collective Disambiguation

Objects: words in text
Attributes: sense, gender, number, pos, …
Links:
  Grammatical relations (subject-object, modifier, …)
  Close semantic relations (is-a, cause-of, …)
  Same word in different sentences (one-sense-per-discourse)
Compatibility parameters:
  Learned from tagged data
  Based on prior knowledge (e.g., WordNet, FrameNet)

Example: Her advisor gave her feedback about the draft.
  (candidate senses: financial / academic; physical / figurative; electrical / criticism; wind / paper)

Can we infer grammatical structure and disambiguate word senses simultaneously rather than sequentially?
Can we integrate inter-word relationships directly into our probabilistic model?

Page 60: Probabilistic Models  of Relational Data

Relation Extraction

[Figure: relation graph extracted from the text below, with the labels Announcement, Miller, Jackson, Made, Candidate, Concerns, Departs, CEO, Of, and a hypothesized relation “Hired??”.]

ACME’s board of directors began a search for a new CEO after the departure of current CEO, James Jackson, following allegations of creative accounting practices at ACME. [6/01] … In an attempt to improve the company’s image, ACME is considering former judge Mary Miller for the job. [7/01] … As her first act in her new position, Miller announced that ACME will be doing a stock buyback. [9/01] …

Page 61: Probabilistic Models  of Relational Data

Understanding Language

Professor Sarah met Jane. She explained the hole in her proof.

[Figure: a proof reading “Theorem: P=NP. Proof: N=1”.]

Most likely interpretation: Student Jane, Professor Sarah

Page 62: Probabilistic Models  of Relational Data

Resolving Ambiguity

Professor Sarah met Jane. She explained the hole in her proof.

Professors often meet with students ⇒ Jane is probably a student
Professors like to explain ⇒ “She” is probably Prof. Sarah

Probabilistic reasoning about objects, their attributes, and the relationships between them:
  Attribute values
  Link types
  Object identity

[Goldman & Charniak, Pasula & Russell]

Page 63: Probabilistic Models  of Relational Data

Acquiring Semantic Models

Statistical NLP reveals patterns:
  Verbs associated with “teacher”: be 24%, train 3%, hire 3%, pay 1.5%, fire 1.4%, serenade 0.3%

Standard models learn patterns at word level
But word-patterns are only implicit surrogates for underlying semantic patterns
  “Teacher” objects tend to participate in certain relationships
  Can use this pattern for objects not explicitly labeled as a teacher

Page 64: Probabilistic Models  of Relational Data

Competing Approaches

[Figure: the Logical and Statistical approaches set against the desiderata Semantic Understanding, Scaling Up (via learning), and Noise & Ambiguity, with PRMs shown alongside.]

Complementary Approaches

Page 65: Probabilistic Models  of Relational Data

Statistics: from Words to Semantics

Represent statistical patterns at semantic level
  What types of objects participate in what types of relationships
Learn statistical models of semantics from text
Reason using the models to obtain global semantic understanding of the text

[Image: Georgia O’Keeffe, “Ladder to the Moon”]