Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Efficient Random Walk Inference with Knowledge Bases
Ni Lao
Carnegie Mellon University
2012-7-11
1
Committee: William W. Cohen (Chair), Teruko Mitamura, Tom Mitchell,
C. Lee Giles (Pennsylvania State University)
Knowledge itself is power. --Francis Bacon
7/13/2012 2 an algorithm which tries to achieve something,
e.g. IR/IE/QA/MT
KB as an edge-labeled graph
Link Prediction -- a generic relational learning task
Given
a directed edge-labeled graph
a relation type r
a source node s (also called a query)
Find
the set of nodes G, so that r(s,t) for each t in G
7/13/2012 3
Infer New Knowledge
7/13/2012 4
Application
Charlotte
BrontëPainter
Writer
Profession?Carpenter
What is the profession of Charlotte Brontë?
Consider Friends/Family
7/13/2012 5
Charlotte
Brontë
Patrick Brontë
HasFather
Writer
Profession
Consider Behaviors
7/13/2012 6
IsA-1 is the inverse of IsA Wrote-1 is the inverse of Wrote
Charlotte
BrontëWriter
Jane
Eyre
Wrote
Novel
A Tale of
Two Cities
IsA-1
IsA
Charles Dickens
Profession
Wrote-1
Consider Literatures/Publications
7/13/2012 7
Mentioned
Charlotte
BrontëWriter
Mentioned-1
Painter
Profession
Reading Recommendation
7/13/2012 8
a paper stream
these are interesting papers
Application
7/13/2012 9
a paper river
7/13/2012 10
a paper river
new development of an interesting topic
citewrite
7/13/2012 11
a paper river
new papers of my favorite author
readwrite
write
my favorite author
7/13/2012 12
a paper river
social recommendation
scientist who have similar interests
read
readreadwrite
read
Relational Learning Goals
7/13/2012 13
robust
scalable
expressive define features expressing
sequences of relations on graph
combine many such features when making decisions
efficiently discover and calculate such features
Relational learning is a subfield of artificial intelligence, that learns with expressive logical or relational representations.
Why is relational learning computationally challenging?
7/13/2012 14
Exponentially many path types
IsAIsA-1 AthletePlaysSport
Exponentially many path instantiations
s
Our solution: feature metrics
Our solution: sampling
Thesis Outline
15
Ch. 2: motivation
Ch. 3: knowledge base inference (Lao+, EMNLP 2011)
Ch. 4: literature recommendation (Lao & Cohen, DILS 2012)
Ch. 6: relation extraction from parsed text (Lao+, EMNLP 2012)
Applications
Ch. 7: coordinate term extraction
Ch. 2: Path Ranking Algorithm (Lao & Cohen, MLJ 2010)
Ch. 5: efficient RW (Lao & Cohen, KDD 2010)
Ch. 6: distributed computing
Algorithms
Ch. 7: more expressive features (submitted)
Ch. 8: future work
Outline
16
previous work
idea
contribution
Motivation
knowledge base inference
literature recommendation
relation extraction from parsed text
Applications
coordinate term extraction
Path Ranking Algorithm (PRA)
efficient RW
distributed computing
Algorithms
more expressive features
the problem
Inductive Logic Programming
e.g.
First Order Inductive Learner--FOIL (Quinlan, ECML’93)
7/13/2012 17
HasFather(a,b) ^ Profession(b,y) Profession(a,y)
not robust
High precision Horn clauses
not scalable
experimental comparison later
expressive
Undirected Graphical Models -- combine logics with GM
e.g. Markov Logic Networks (Kok & Domingos, ICML’05) Relational CRFs (Lao+, NIPS’10)
7/13/2012 18
Horn clauses smokes(A) & Friends(A,B)smokes(B)
as CRFs features smokes(A) & Friends(A,B) & !smokes(B)
robust
not scalable
expressive
Random Walk with Restart -- ignore logic
19
P(Charlotte Writer)
P(Charlotte Painter)
7/13/2012
e.g. Tong+, ICDM’06
experimental comparison later
robust
not expressive
scalable
Mentioned
Charlotte
Brontë
Patrick Brontë
HasFather
Writer
Profession
Mentioned-1
Jane
Eyre
Wrote
Novel
A Tale of
Two Cities
IsA-1
IsA
Charles Dickens
Profession
Wrote-1
PainterProfession
Outline
20
previous work
idea
contribution
Motivation
knowledge base inference
literature recommendation
relation extraction from parsed text
Applications
coordinate term extraction
Path Ranking Algorithm (PRA)
efficient RW
distributed computing
Algorithms
more expressive features
the problem
Relational Classification -- combine logics with RWs
21
e.g. Path Ranking algorithm (Lao & Cohen, MLJ’10)
7/13/2012
P(Charlotte Writer; <HasFather,IsA>)
P(Charlotte Writer; <Mention,Mention-1,IsA>)
…
P(Charlotte Painter; <HasFather,IsA>)
P(Charlotte Painter; <Mention,Mention-1,IsA>)
…
Mentioned
Profession?
Charlotte
Brontë
Patrick Brontë
HasFather
Writer
Profession
Mentioned-1
Profession
Jane
Eyre
Wrote
Novel
A Tale of
Two Cities
IsA-1
IsA
Charles Dickens
Profession
Wrote-1
Painter
Profession
robust
scalable
expressive
Contribution
22
made possible by
a family of easy-to-learn features
fast random walk
distributed computing
Apply relational learning
at scales not possible before
Outline
23
previous work
idea
contribution
Motivation
knowledge base inference
literature recommendation
relation extraction from parsed text
Applications
coordinate term extraction
Path Ranking Algorithm (PRA)
efficient RW
distributed computing
Algorithms
more expressive features
the problem
Path Ranking Algorithm (PRA)
24
(Lao & Cohen, MLJ 2010)
7/13/2012
( , ) ( ; )B
score s t P s t
e.g. π=<Mention,Mention-1,IsA>
robust
expressive
a weight
( , ) ( ; )B
score s t P s t
Random Walk Calculation
25 7/13/2012
e.g.
π’=<Mention,Mention-1>
r=Profession
( ; ) ( ; ') ( ; )z
P s t P s z P z t r
Dynamic Programing
π’
π’
π’
z3
s z2
z1
t2
t1
rr
rr
later about how to do it x100 more efficiently using sampling scalable
( , ) ( ; )B
score s t P s t
Feature Selection with Labeled Data
given training query set {(si, Gi)}
7/13/2012 26
( ) ( , )i
i j
i j G
hits f I f s t h
( , )1
( )( , )
i
i j
j G
i i j
j
f s t
accuracy f aN f s t
I(): the indicator function N: total number of queries
( , ) ( ; )B
score s t P s t
Estimating θ
for a relation r
generate positive and negative node pairs {(si, ti)}
for each (si, ti) generate (xi, yi)
xi is a vector of RW features of different paths π
yi is a binary label r(si, ti)
estimate θ by L1/L2 regularized (elastic-net) logistic regression
27 7/13/2012
Outline
28
previous work
idea
contribution
Motivation
knowledge base inference
literature recommendation
relation extraction from parsed text
Applications
coordinate term extraction
Path Ranking Algorithm (PRA)
efficient RW
distributed computing
Algorithms
more expressive features
the problem
Knowledge Base Inference
NELL (Never Ending Language Learner) v165
353 relations
0.7M nodes (concepts)
1.7M edges
7/13/2012 29
(Lao, Mitchell, Cohen, EMNLP 2010)
Example NELL relations
Application
IsAIsA-1 AthletePlaysSport
7/13/2012 30
PRA Uses Broad Coverage Features
AthletePlaysSport(HinesWard, ?)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
p@10 p@100 p@1000
Pre
cisi
on
FOIL
PRA
PRA Has Much Higher Recall and Is Much Faster
7/13/2012 31
Mechanical Turk evaluate new beliefs of 8 functional relations
PRA trains in an hour vs. FOIL trains in a few days
Outline
32
previous work
idea
contribution
Motivation
knowledge base inference
literature recommendation
relation extraction from parsed text
Applications
coordinate term extraction
Path Ranking Algorithm (PRA)
efficient RW
distributed computing
Algorithms
more expressive features
the problem
Biology Literatures
Databases
Yeast: 0.8M nodes, 3.5M edges
Fly: 0.7M nodes, 16.9M edges
33
Application
7/13/2012
author
71k
paper
50k
gene
5.6k 160k
Relates to
1.6K
Cites
0.3M
year
64 Before
title word
40k 0.5M
journal
1.0k
institute
6k
39k
mesh
descriptors/
qualifiers
29k
1.1M
chemical
14k
0.3M
Recommendation Tasks
Literature Recommendation year, author papers a user is going to read
training data --- 1 user over 20 years
34
(collected from Dr. John Woolford’s computer)
7/13/2012
PRA Combines Dozens of Recommendation Strategies
7/13/2012 35
citewrite
readwrite
write
7/13/2012 36
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
2 3 4 5
MR
R
Maximum Path Length (L)
PRARWRRWR(no training)Community-basedContent-based
Reading Recommendation
Mean Reciprocal
Rank
Outline
37
previous work
idea
contribution
Motivation
knowledge base inference
literature recommendation
relation extraction from parsed text
Applications
coordinate term extraction
Path Ranking Algorithm (PRA)
efficient RW
distributed computing
Algorithms
more expressive features
the problem
Efficient Random Walks
Exact calculation of random walks results in non-zero probabilities for many internal nodes
38
(Lao & Cohen, KDD 2010)
1 billion
nodesquery
node A few nodes that
we care about
Charlotte
Writer
Painter
Zebra
7/13/2012
Idea: a few random walkers (particles) are enough to distinguish good target
nodes from bad ones
39
1 billion
nodesquery
node A few nodes that
we care about
Writer
Painter
Zebra Charlotte
7/13/2012
details
0.17
0.18
0.19
0.20
0.21
0.22
0.23
1 10 100 1000
MR
R
Speedup
200
1k1k
300
Finger PrintingParticle FilteringFixed TruncationBeam Truncation
Compare Speedup Approaches
40 7/13/2012
10x ~ 100x faster with little loss of quality
exact random walks
exact random walks
Gene Recommendation on Fly Data (N=2k)
Mean Reciprocal
Rank
number of particles
Outline
41
previous work
idea
contribution
Motivation
knowledge base inference
literature recommendation
relation extraction from parsed text
Applications
coordinate term extraction
Path Ranking Algorithm (PRA)
efficient RW
distributed computing
Algorithms
more expressive features
the problem
Relation Extraction
7/13/2012 42
wrote
She
Mention
dobj
Charlotte
was
nsubjnsubj
Jane Eyre
Charlotte
Bronte
Mention
Jane Eyre
Mention
Coreference Resolution
Entity
Resolution
Freebase
News Corpus
Dependency Trees
Write
Patrick BrontëHasFather
?
Profession
Writer
(21M concepts, 70M edges)
(60M )
Application
7/13/2012 43
Can PRA scale?
Can PRA learn syntactic-semantic rules?
Distributed Computing
7/13/2012 44
Large number of queries e.g. 0.3M/2M persons have known profession Solution: map/reduce to explore path, generate training samples, calculate gradient, and do predictions for each query
Large text graph e.g. 60M documents Solution: each node keeps the Freebase graph in memory relevant sentences are loaded/unloaded for each query
<M, conj, M-1, Profession>
7/13/2012 45
Combine Syntax with Semantics
“McDougall and Simon Phillips collaborated…”
M
Profession
IanMcDougall
?
M
conj
Performer
SimonPhilips
subj
Freebase
Parsed TextIan McDougall
Simon Phillips
collaborated
<M, WORD, CW-1, profession-1, profession>
7/13/2012 46
Combine Text with Semantics
M
BarackObama
CW
President
“he”
Freebase
Parsed Text
leaderProfession
“leader”
other
persons
TW
tokens
words
host
CW
“host”
“president”
e.g.“The president said …”
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Profession Nationality Parent
MR
R Freebase-only
Text-only
Freebase+Text
Text and KB Work Better Together
7/13/2012 47
Tested by existing knowledge in Freebase
Mean Reciprocal
Rank
with closed world assumption
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Profession Nationality Parents
Pre
cisi
on
p@100
p@1k
p@10k
Highly Accurate New Beliefs
7/13/2012 48
manually evaluated new beliefs
Outline
49
previous work
idea
contribution
Motivation
knowledge base inference
literature recommendation
relation extraction from parsed text
Applications
coordinate term extraction
Path Ranking Algorithm (PRA)
efficient RW
distributed computing
Algorithms
more expressive features
the problem
Coordinate Term Extraction Task
parsed MUC-6 corpus 153k nodes, 748K edges
30 queries given 4 person names as seeds, find other persons
7/13/2012 50
Words/POSs
Tokens
Tokens
(Minkov & Cohen, ECML 2010)
Application
W
BillGates
BillGates
founded
founded
nsubj
W
W
SteveJobs
SteveJobs founded
nsubj
W
vbd
POS
POS
nnp
POS
POS
W: word POS: part of speech nnp: singular proper nouns vbd: verb, past tense nsubj: subject of a verb
Good Paths Are Quite Long
7/13/2012 51
<W-1,nsubj,W,W-1,nsubj-1,W>
find entities with similar behaviors
Words
Tokens
Tokens
W
BillGates
BillGates
founded
founded
nsubj
W
W
SteveJobs
SteveJobs founded
nsubj
W
Good Paths Are Quite Long
7/13/2012 52
0.00
0.05
0.10
0.15
0.20
3 4 5 6
MA
P
Max Path Length
kF
kF+1B
kF+2B
kF+3B
Mean Average
Precision
Forward Search Is Wasteful
7/13/2012 53
s
t
Find paths that connect s and t
Bidirectional Search Is More Efficient
7/13/2012 54
s
t
z
challenge is to calculate P(s→t;π)
Forward vs. Backward RWs
7/13/2012 55
( ; ) ( ; ') ( ; )z
P t s P t z P z s r
( ; ) ( ; ') ( ; )z
P s t P s z P z t r π’
π’
π’
z3
s z2
z1
t2
t1
rr
rr
evaluate P(s→t;π)
for many s
π’
π’
π’
z3
t z2
z1
s2
s1
rr
rr
Forward
Backward
evaluate P(s→t;π)
for many t
Details
Bidirectional Search with RW
7/13/2012 56
1 2 1 2( ; ) ( ; ) ( ; )z
P s t P s z P t z
π2
π2
π2 t
z3
z2
z1π1
π1sπ1
Forward RW Backward RW
0.1
1
10
100
1000
3 4 5 6
Pat
h F
ind
ing
Tim
e (
s)
Max Path Length
2F+1B
3F+1B
3F
4F
2F+2B3F+2B
5F
4F+2B
3F+3B
4F+1B
Bidirectional Search Is Much Faster
7/13/2012 57
Exceed 20Gb memory limit
1000x faster
Need for Lexicalized Paths
7/13/2012 58
P(vbd→t | <POS-1, nsubj-1, W>)
Task: find person entities
W
BillGates
BillGates founded
nsubjW
SteveJobs
SteveJobs founded
nsubj
vbd
POSPOS
Tokens
Words
POSs
Evaluate Lexicalized Paths
7/13/2012 59
Given an example (si, ti)
calculate P(z→ti;π) for many z
π
π
πt
z3
z2
z1
0.0
0.1
0.2
0.3
0.4
L=6
MA
P
RWR(no train)
RWR
FOIL
PRA
PRA+c2
PRA+c3
Person Name Extraction
7/13/2012 60
~1000 correct answers
Mean Average
Precision
Outline
61
previous work
idea
contribution
Motivation
knowledge base inference
literature recommendation
relation extraction from parsed text
Applications
coordinate term extraction
Path Ranking Algorithm (PRA)
efficient RW
distributed computing
Algorithms
more expressive features
the problem
Future Work
Apply knowledge to NLP/IE/IR/CV tasks
7/13/2012 62
arg max ( | , )decision
P decision context KB
wrote
She
Mention
dobj
Charlotte
was
nsubjnsubj
Jane Eyre
Charlotte
Bronte Jane Eyre
Mention
Coreference?
Entity
Resolution
Dependency Trees
Write
IsA FemaleKB
Text
Future Work
Conjunctions of Paths
rules can have tree structures
with source/constant/target nodes as leafs
7/13/2012 63
founded
t3
nsubj
W
dobj
W
Apple
W
SteveJobs
t2 t1
source s constant z target t
Future Work
Conjunctions of Paths
forward PCRW with multiple walkers
7/13/2012 64
founded
t3
nsubj
W
dobj
W
Apple
W
SteveJobs
t2 t1
source s constant z target t
Future Work
Conjunctions of Paths
backward PCRW with multiple walkers
7/13/2012 65
founded
t3
nsubj
W
dobj
W
Apple
W
SteveJobs
t2 t1
source s constant z target t
Contribution
66
Made possible by
a family of easy-to-learn features (3 types)
fast random walk (sampling)
distributed computing
Apply relational learning at scales not possible before.
Leads to new applications!
other work I did at CMU
Relational CRFs (Lao+, NIPS’10)
Question answering (Lao+, NTCIR’08)
Utility based retrieval evaluation (Yang+, SIGIR’07)
7/13/2012 67
7/13/2012 68
Future Work
KB extension new relation types, new concepts
7/13/2012 69
arg max ( | )KB
P corpus KB KB
arg max ( | , )KB
P decisions contexts KB KB
Unsupervised
Supervised
Directed Graphical Models e.g. Probabilistic Relational Models (Getoor+, ICML’01)
7/13/2012 70
model structure restricted to DAG
cannot express features corresponding to chains
Coverage of top k triples
7/13/2012 71
Profession Triples Unique Persons
1k 970
10k 8,726
100k 79,885
Repeatedly Combine Forward and Backward RWs
7/13/2012 72
Forward search
+Backward Search
+Backward Search
W-1,conj_and,W>
<W-1,conj_and,W,
W-1,conj_and,W,
Summary of PRA
7/13/2012 73
Stage Computation
Path Discovery
given {(si, Gi)}, find {f ; acc(f)>=a, hits(f)>=h}
Generate Training Samples
generate {(si, ti)} and {(xi, yi)}
Logistic Regression Training
Prediction
apply model to nodes s in domain(r)
2
1 1 2 2arg max ( ) || || || ||i
i
l
Need for Lexicalized Paths
7/13/2012 74
Bias toward MLB
A prior over the leagues participated by Boston Braves university athletes
Task=AthletePlaysInLeague
( ; )P mlb t
1,;
AthletePlaysForTeamP BostonBraves t
AthletePlaysInLeagure
Need for Lexicalized Paths
7/13/2012 75
Bias toward Google
Companies around Google
Task=CompetesWith
( ; )P google t
; ,P Google t CompetesWith CompetesWith
0.0
0.1
0.2
0.3
0.4
0.5
0.6
L=3
MR
R
RWR(no train)
RWR
PRA
PRA+c1
PRA+c2
Knowledge Base Inference
7/13/2012 76
16 tasks