Multi-Hop Knowledge Graph Reasoning with Reward Shaping
Victoria Lin, Richard Socher, Caiming Xiong
{xilin, rsocher, cxiong}@salesforce.com
EMNLP 2018
Question Answering System
A question answering system can draw on knowledge graphs, text, and images.
Example: "Which directors has Tom Hanks collaborated with?"

Structured Query Answering
Over a knowledge graph, the question is mapped to a structured query (Tom Hanks, collaborator, ?), where Tom Hanks is the topic entity and collaborator is the topic relation.
Structured Query Answering
[Figure: an example knowledge graph with entities Tom Hanks, The Post, Meryl Streep, Steven Spielberg, Phyllida Lloyd, United States, and California, connected by relations cast_in, director, collaborator, born_in, live_in, locate_in, and produced_in.]
The knowledge graph is incomplete: some true facts, such as a collaborator edge from Tom Hanks, are missing, so answering the query (Tom Hanks, collaborator, ?) requires inference beyond simple lookup.
Knowledge Graph Embeddings
DistMult (Yang et al. 2015), ComplEx (Trouillon et al. 2016), ConvE (Dettmers et al. 2018).
ConvE scores a candidate fact by embedding entities and relations in R^n, applying a convolution and a linear layer, and taking a dot product with the candidate entity embedding.

Tab 1. ConvE query answering performance on the UMLS benchmark dataset (Kok and Domingos 2007): MRR 0.957 (max = 1).

Embedding models are highly accurate and efficient, but they lack interpretability: they cannot explain why Steven Spielberg is predicted to be a collaborator of Tom Hanks.
Multi-Hop Reasoning Models
Multi-hop reasoning models answer the query by reasoning over discrete graph structures, framed as sequential decision making: starting from the topic entity, the model walks over the graph edge by edge toward an answer.
[Figure: the example knowledge graph, with candidate paths from Tom Hanks toward answers of the collaborator query.]
Multi-Hop Reasoning Models
Walkthrough for the query (Tom Hanks, collaborator, ?):
1. Start at the topic entity, Tom Hanks.
2. Hop: follow cast_in from Tom Hanks to The Post.
3. Hop: follow the director edge from The Post to Steven Spielberg.
4. Take the <END> action: the current entity, Steven Spielberg, is returned as the predicted answer.
This graph-walking formulation follows MINERVA (Das et al. 2018).
Multi-Hop Reasoning Models
The path search is trained with reinforcement learning, and the resulting predictions are interpretable: each answer comes with an explicit reasoning path. However, there is a significant performance gap compared to embedding models.

Tab 2. ConvE and RL (MINERVA) query answering performance (MRR) on the UMLS benchmark dataset (Kok and Domingos 2007): ConvE 0.957, RL 0.825.

Multi-Hop Reasoning Models: Ideal Case
Ideally, the model would remain interpretable while closing the performance gap to the embedding-based models.
Challenges: Incompleteness
[Figure: the example knowledge graph; the training example rewards only the observed collaborator answer.]
Because the knowledge graph is incomplete, a search that ends at a correct but unobserved answer receives no reward. As a result, the model overfits to the observed answers.
Challenges: Path Diversity
[Figure: the example knowledge graph with several paths ①, ②, ③ leading from Tom Hanks to an observed answer.]
Many different paths can reach a correct answer, but some of them are false-positive (spurious) paths that end at the right entity without expressing the query relation. The model can overfit to these spurious paths.

Proposed Solutions
Within the policy gradient framework:
• Reward shaping addresses incompleteness.
• Action dropout addresses (spurious) path diversity.
Reinforcement Learning Framework
The environment is the knowledge graph.
• State: the query (e_s, r_q) together with the entity e_t the agent currently visits.
• Action: at step t, the action space A_t consists of the outgoing edges of the current entity, {(r_1, e_1), ..., (r_Nt, e_Nt)}.
• Transition: taking action (r_i, e_i) moves the agent to entity e_i.
• The walk continues for at most T steps (the maximum number of hops); the final entity e_T is the predicted answer.
• Reward: the agent receives a binary terminal reward
  R_b(s_T) = 1{(e_s, r_q, e_T) ∈ G}
Policy gradient methods are used to learn which action to choose given a state.
Policy Gradient
The policy function π_Θ(a_t | s_t) gives the probability of choosing each action a_t^i = (r_t^i, e_t^i) in the action space given the current state.
Policy Gradient
Parameterization (following MINERVA, Das et al. 2018): an LSTM encodes the search history (e_s, r_1, e_1, ..., r_t, e_t) into a hidden state h_t. The current state is represented by (h_t, r_q, e_t) and is passed through Linear, ReLU, and Linear layers; a softmax over the action space {a_t^1, ..., a_t^Nt} yields π_Θ(a_t | s_t).
Design choice: include the search history in the state representation.
Our model extensions are applicable to any parameterization of π_Θ.
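A minimal numpy sketch of this kind of parameterization (illustrative, not the released code): the LSTM hidden state is stood in for by a random vector h_t, the weights are random, and the embedding size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8            # embedding size (assumed for illustration)
n_actions = 5

# State features: history encoding h_t, query relation r_q, current entity e_t.
h_t, r_q, e_t = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
state = np.concatenate([h_t, r_q, e_t])

# Action space: each action (r_i, e_i) is the concatenation of its embeddings.
actions = rng.normal(size=(n_actions, 2 * d))

# Linear -> ReLU -> Linear, then a softmax over the action space.
W1 = rng.normal(size=(3 * d, 3 * d))
W2 = rng.normal(size=(3 * d, 2 * d))
hidden = np.maximum(W1 @ state, 0.0)     # ReLU
scores = actions @ (W2.T @ hidden)       # one score per candidate action
pi = np.exp(scores - scores.max())
pi /= pi.sum()                           # pi_Theta(a_t | s_t)
assert np.isclose(pi.sum(), 1.0) and (pi > 0).all()
```

The key property is that the distribution is computed over a state-dependent action set, since each entity has a different set of outgoing edges.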
REINFORCE Training
For the query (Tom Hanks, collaborator, ?), sample paths according to π_Θ(a_t | s_t), e.g.:
• Tom Hanks, cast_in, The Post, director, Steven Spielberg (reward +1)
• Tom Hanks, born_in, California, locate_in, United States (reward 0)
• Tom Hanks, cast_in, The Post, produced_in, United States (reward 0)
• Tom Hanks, cast_in, The Post, cast_in, Meryl Streep (reward 0)
Update Θ by maximizing the expected reward (REINFORCE; Williams, 1992).
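The update above can be sketched with the standard REINFORCE gradient estimator, R · ∇_Θ log π_Θ(a | s). This toy version (an assumption for illustration, not the paper's training loop) uses a 3-action categorical policy with logits as parameters and a one-step "path":

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(3)        # toy policy parameters (logits)
lr = 0.5

def pi(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

reward = [0.0, 1.0, 0.0]   # only action 1 reaches an observed answer

for _ in range(200):
    p = pi(theta)
    a = rng.choice(3, p=p)              # sample an action from pi_Theta
    # Gradient of log pi(a) w.r.t. the logits: one_hot(a) - p.
    grad_log = -p
    grad_log[a] += 1.0
    theta += lr * reward[a] * grad_log  # maximize expected reward

assert pi(theta).argmax() == 1          # policy concentrates on the rewarded action
```

Because updates are weighted by the reward, actions that happened to be rewarded early get reinforced, which is the behavior reward shaping and action dropout are designed to counteract.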
REINFORCE Training
Problem: under the binary reward, false-negative answers (correct but unobserved in the KG) and nearly-correct entities are treated the same as true negatives; all receive reward 0.
Reward Shaping
A pre-trained knowledge graph embedding model f_θ scores the triple formed by the topic entity, the topic relation, and the entity returned by the search, e.g. f_θ(Tom Hanks, collaborator, Steven Spielberg). Because it generalizes to unobserved facts, it provides a soft measure of correctness.
Reward Shaping
The shaped reward combines the binary reward with the pre-trained ConvE score:
R(s_T) = R_b(s_T) + (1 − R_b(s_T)) · f_θ(e_s, r_q, e_T)
If the search hits an observed answer, the reward is 1; otherwise, the reward is the ConvE score f_θ(e_s, r_q, e_T), whose value range is (0, 1).
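A sketch of the shaped reward, where `conve_score` is a stand-in for the pre-trained model f_θ (here a hypothetical constant-score function, not the real ConvE):

```python
# Shaped reward R(s_T) = R_b(s_T) + (1 - R_b(s_T)) * f_theta(e_s, r_q, e_T).
def shaped_reward(graph, conve_score, e_s, r_q, e_T):
    r_b = 1.0 if (e_s, r_q, e_T) in graph else 0.0
    return r_b + (1.0 - r_b) * conve_score(e_s, r_q, e_T)

G = {("Tom Hanks", "collaborator", "Steven Spielberg")}
score = lambda e_s, r, e: 0.8   # stand-in for a ConvE score in (0, 1)

# Observed answer: full reward. Unobserved entity: soft reward from f_theta.
assert shaped_reward(G, score, "Tom Hanks", "collaborator", "Steven Spielberg") == 1.0
assert shaped_reward(G, score, "Tom Hanks", "collaborator", "Phyllida Lloyd") == 0.8
```

The design keeps the hard signal for observed answers intact while no longer punishing plausible unobserved answers with a flat zero.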
Action Dropout
Intuition: avoid sticking to past actions that have received rewards. When sampling from π_Θ(a_t | s_t), an action with high probability (e.g. 0.6) is far more likely to be chosen again than low-probability alternatives (e.g. 0.08 or 0.01), so the same paths keep being reinforced.
Action Dropout
Randomly offset the sampling probabilities with rate α and renormalize:
π̃_Θ(a_t | s_t) ∝ π_Θ(a_t | s_t) · m + ϵ,  m_i ∼ Bernoulli(1 − α), i = 1, …, N
Masked actions drop to near-zero probability, so previously low-probability actions become more likely to be chosen. This forces exploration; in our experiments it increases the number of unique paths traversed by up to 8×.
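The perturbation above can be sketched in a few lines of numpy (illustrative; the epsilon value is an assumption to keep the distribution well-defined when every action is masked):

```python
import numpy as np

# Action dropout: mask each action's probability with Bernoulli(1 - alpha),
# add a small epsilon, and renormalize before sampling.
def action_dropout(pi, alpha, rng, eps=1e-9):
    m = rng.binomial(1, 1.0 - alpha, size=pi.shape)
    pi_tilde = pi * m + eps
    return pi_tilde / pi_tilde.sum()

rng = np.random.default_rng(0)
pi = np.array([0.6, 0.3, 0.08, 0.02])
pi_tilde = action_dropout(pi, alpha=0.5, rng=rng)
assert np.isclose(pi_tilde.sum(), 1.0)

# Degenerate case: with alpha = 1 every action is masked, and epsilon
# makes the sampling distribution uniform (pure random exploration).
uniform = action_dropout(pi, alpha=1.0, rng=rng)
assert np.allclose(uniform, 0.25)
```

Note the perturbed distribution π̃_Θ is used only for sampling paths during training; the gradient is still taken through π_Θ.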
Experiment Setup
KG benchmarks (ordered by decreasing connectivity):

Name        # Ent.   # Rel.   # Fact    Degree Avg   Degree Median
Kinship     104      25       8,544     85.15        82
UMLS        135      46       5,216     38.63        28
FB15k-237   14,505   237      272,115   19.74        14
WN18RR      40,945   11       86,835    2.19         2
NELL-995    75,492   200      154,213   4.07         1

Evaluation protocol: MRR (Mean Reciprocal Rank).
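For concreteness, MRR averages the reciprocal rank of the true answer in each query's ranked prediction list; a toy sketch (the entity names are illustrative):

```python
# Mean reciprocal rank over a set of queries.
def mrr(ranked_predictions, answers):
    total = 0.0
    for preds, ans in zip(ranked_predictions, answers):
        rank = preds.index(ans) + 1   # 1-based rank of the true answer
        total += 1.0 / rank
    return total / len(answers)

preds = [["Spielberg", "Streep", "Lloyd"],   # true answer ranked 1st
         ["Streep", "Spielberg", "Lloyd"]]   # true answer ranked 2nd
assert mrr(preds, ["Spielberg", "Spielberg"]) == 0.75   # (1 + 1/2) / 2
```

MRR = 1 only when every query ranks its answer first, which is why the tables report it scaled by 100.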
Ablation Studies
Fig 2. Dev set MRR (x100) comparison of PG, +RS, +AD, and Ours on Kinship, UMLS, FB15k-237, WN18RR, and NELL-995. [Bar chart omitted.]
Reward shaping and action dropout each improve over the plain policy gradient baseline; they are especially helpful on dense graphs.
Main Results
Fig 3. Test set MRR (x100) compared to SOTA multi-hop reasoning approaches (NeuralLP, NTP-λ, MINERVA) and embedding-based approaches (DistMult, ComplEx, ConvE) on Kinship, UMLS, FB15k-237, WN18RR, and NELL-995. [Bar chart omitted.]
• Improved SOTA multi-hop reasoning performance.
• Consistently matched SOTA knowledge graph embedding performance.
Interpretable Results
Example from FB15k-237 (Toutanova and Chen 2015): the query (Laura Carmichael, profession, ?). [Reasoning-path examples omitted.]
Thank you!
Code: https://github.com/salesforce/MultiHopKG

Future directions
• Learn better reward shaping functions
• Investigate similar techniques for other RL paradigms (e.g. Q-learning)
• Extend to more complicated structured queries (e.g. more than one topic entity)
• Extend to natural language QA
BK I - Error Analysis
Fig 4. Dev set top-1 prediction error overlap of ConvE, Ours, and Ours-RS on UMLS, FB15k-237, and NELL-995. The absolute error rates of Ours are shown (8.9%, 65.9%, 25.6%).
BK II - Challenges: Incompleteness
[Figure: the example knowledge graph with correct but unrewarded answers.]
Fig 1. % of false-negative hits in the first 20 epochs of RL training on the UMLS KG benchmark (Kok and Domingos 2007): ≈ 30% of the negative feedback is in fact false negative.
Questions for Future Research
1. One natural question is why a perceivable performance gap exists between the embedding-based (EB) model and the RL approach that uses the same EB model as its reward shaping module (slide 51), especially on FB15k-237 and NELL-995, the two larger and sparser KGs. Since the RL model has full access to the EB model, why does it still lose information? A possible explanation is that for the examples where the RL model makes mistakes, the topic entity and the target answer are not connected within the specified number of hops. Yet our sanity check disproved this: for all examples where only the RL model makes mistakes, the topic entity and the target answer were connected by at least one path. (We did not check the quality of these paths.) Conjecture: the performance loss may come from the difficulty of RL optimization, as it operates over a more complex model space. The RL model and training procedure have many more hyperparameters than the EB models.
2. In our experiments, very large action dropout rates (0.9 and 0.95) yield good performance on the dense KGs (Kinship and UMLS), but the same strategy does not work for sparser KGs: we observed a significant performance drop on FB15k-237, WN18RR, and NELL-995 when using very large action dropout rates, and for WN18RR and NELL-995 an action dropout rate above 0.1 hurts performance. It is unclear why REINFORCE training on the denser KGs can tolerate a larger shift from the actual policy during path sampling. Conjecture: the shape of the original policy function ought to be preserved to some degree during training. For Kinship and UMLS, the average node degrees are 85 and 39, so at least 2 edges remain on average even when 95% of the edges are randomly turned off. Since the other KGs have much smaller average node degrees, using a large action dropout rate there amounts to doing random exploration most of the time.
3. Do EB models define the performance ceiling in the one-hop KG query answering setup? Could the tasks of path finding and learning KG embeddings be joined together such that they improve each other?
4. Our approach can be viewed as a way to explain pre-trained EB models. Are there better ways to do it?
Acknowledgement upon slides release - I
These slides benefited tremendously from the constructive feedback offered by Salesforce Research team members, including Caiming Xiong, Richard Socher, Yingbo Zhou, Alex Trott, Jin Qu, Lily Hu, Vena Li, Kazuma Hashimoto, Stephan Zheng, Jason Wu (intern), and others.
Acknowledgement upon slides release - II
I am grateful to Prof. Michael Ernst and his co-authors, who released the slides of all of their paper presentations. In particular, the Bradley Hand font and the balloon highlighters were borrowed from slides 1 and slides 2.

Victoria Lin
Nov. 4, 2018