Multi-Hop Knowledge Graph Reasoning with Reward Shaping
Victoria Lin, Richard Socher, Caiming Xiong
{xilin, rsocher, cxiong}@salesforce.com
EMNLP 2018


Page 1

Victoria Lin, Richard Socher, Caiming Xiong
{xilin, rsocher, cxiong}@salesforce.com

EMNLP 2018

Multi-Hop Knowledge Graph Reasoning with Reward Shaping

Page 2

Question Answering System

Knowledge Graph

Text

Images

Which directors has Tom Hanks collaborated with?

Page 3

Question Answering System

Knowledge Graph

(Hanks, collaborator, ?)

Which directors has Tom Hanks collaborated with?

Page 4

Knowledge Graph

Structured Query Answering

(Hanks, collaborator, ?), where Hanks is the topic entity and collaborator is the topic relation

Page 5

Structured Query Answering

(Diagram: an example knowledge graph around the query (Tom Hanks, collaborator, ?) -- entities Tom Hanks, The Post, Meryl Streep, Steven Spielberg, Phyllida Lloyd, United States, California; relations cast_in, director, produced_in, locate_in, live_in, born_in, collaborator. The graph is incomplete: the queried collaborator edge is missing.)

Page 6

Knowledge Graph Embeddings

DistMult (Yang et al. 2015), ComplEx (Trouillon et al. 2016), ConvE (Dettmers et al. 2018).

(Diagram: ConvE maps the embeddings of e_s and r in R^n through convolution and linear layers to score candidate entities e_o.)

MRR: ConvE 0.957 (max = 1)

Tab 1. ConvE query answering performance on the UMLS benchmark dataset (Kok and Domingos 2007)

Highly accurate & efficient

Lacks interpretability

(Tom Hanks, collaborator, Steven Spielberg)

Why is Spielberg a collaborator of Hanks?
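As a concrete reference, here is a minimal sketch of how an embedding-based scorer of this family works, using DistMult (the simplest of the three listed). The dimensions and class layout are illustrative assumptions; this is not the ConvE architecture used in the talk.

```python
import torch
import torch.nn as nn

class DistMult(nn.Module):
    """Minimal DistMult (Yang et al. 2015) scorer -- an illustrative
    sketch, not the ConvE model actually used in this work."""
    def __init__(self, num_entities, num_relations, dim=200):
        super().__init__()
        self.entity = nn.Embedding(num_entities, dim)
        self.relation = nn.Embedding(num_relations, dim)

    def score(self, e_s, r_q, e_o):
        # Bilinear-diagonal score <e_s, r_q, e_o>; higher means the
        # triple is more likely a fact.
        return (self.entity(e_s) * self.relation(r_q) * self.entity(e_o)).sum(-1)
```

A query (e_s, r_q, ?) is answered by scoring every candidate e_o and ranking. The ranking is accurate, but no reasoning path is exposed -- exactly the interpretability drawback noted above.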

Page 7

Multi-Hop Reasoning Models

(Diagram: the same incomplete KG, now viewed as a discrete structure to be traversed.)

Reasoning over discrete structures

Page 8

Multi-Hop Reasoning Models

Sequential decision making

(Diagram: the same KG viewed as a search space for the query (Hanks, collaborator, ?).)

Page 9

Multi-Hop Reasoning Models

(Diagram: the agent starts at the topic entity, Tom Hanks.)

Page 10

Multi-Hop Reasoning Models

(Diagram: Hop 1: Tom Hanks --cast_in--> The Post.)

Page 11

Multi-Hop Reasoning Models

(Diagram: Hop 2: The Post --director--> Steven Spielberg.)

Page 12

Multi-Hop Reasoning Models

(Diagram: the agent takes <END> and returns Steven Spielberg as the answer; full path: Tom Hanks --cast_in--> The Post --director--> Steven Spielberg.)

Page 13

Multi-Hop Reasoning Models

MINERVA (Das et al. 2018)

(Diagram: the same path search, learned with Reinforcement Learning.)

Page 14

Multi-Hop Reasoning Models

Interpretable: the traversed path explains the prediction.

MRR: ConvE 0.957, RL 0.825 (a significant performance gap)

Tab 2. ConvE and RL (MINERVA) query answering performance on the UMLS benchmark dataset (Kok and Domingos 2007)

Page 15

Multi-Hop Reasoning Models: Ideal Case

Interpretable

(Same diagram and table as the previous slide: MRR ConvE 0.957 vs. RL 0.825, a significant performance gap.)

Page 16

Challenges: Incompleteness

(Diagram: the incomplete KG, with the query (Hanks, collaborator, ?) as a training example.)

Page 17

Challenges: Incompleteness

(Diagram: a correct but unobserved answer to the training example receives no reward.)

No reward for unobserved answers -> overfit to the observed answers

Page 18

Challenges: Path Diversity

(Diagram: multiple paths (1), (2), (3) connect the topic entity to candidate answers.)

Page 19

Challenges: Path Diversity

(Diagram: some of the rewarded paths, e.g. (1) and (2), are false positives.)

False-positive (spurious) paths -> overfit to the spurious paths

Page 20

Proposed Solutions

Policy Gradient

Incompleteness -> Reward Shaping
Path Diversity -> Action Dropout

Page 21

Reinforcement Learning Framework

(Diagram: the MDP components -- Environment, State, Action, Transition, Reward -- with the query (e_s, r_q) given to the agent.)

Page 22

Reinforcement Learning Framework

(Diagram: State -- the agent starts at the source entity e_s, conditioned on the query (e_s, r_q).)

Page 23

Reinforcement Learning Framework

(Diagram: Action -- the action space A_t consists of the outgoing edges (r_1, e_1), (r_2, e_2), ..., (r_N, e_N) of the current entity.)

Page 24

Reinforcement Learning Framework

(Same diagram as the previous slide.)

Page 25

Reinforcement Learning Framework

(Diagram: Transition -- taking action (r_1, e_1) moves the agent from e_s to e_1.)

Page 26

Reinforcement Learning Framework

(Diagram: a second step (r_2, e_2) extends the path.)

Page 27

Reinforcement Learning Framework

(Diagram: the rollout continues through (r_t, e_t) up to (r_T, e_T), where T is the maximum number of steps.)

Page 28

Reinforcement Learning Framework

(Diagram: the final entity e_T is the predicted answer.)

Page 29

Reinforcement Learning Framework

(Diagram: the full rollout e_s -> e_1 -> ... -> e_T; e_T is the predicted answer.)

Reward: $R_b(s_T) = \mathbb{1}\{(e_s, r_q, e_T) \in \mathcal{G}\}$

Page 30

Reinforcement Learning Framework

(Same diagram as the previous slide.)

$R_b(s_T) = \mathbb{1}\{(e_s, r_q, e_T) \in \mathcal{G}\}$

Policy gradient: learn which action to choose given a state
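In code, the binary terminal reward on the slide amounts to a set-membership test. A sketch; representing the KG as a set of triples is an assumption:

```python
def binary_reward(e_s, r_q, e_T, graph):
    """R_b(s_T) = 1{(e_s, r_q, e_T) in G}: 1 if the rollout ends on an
    observed answer of the query, else 0. `graph` is assumed to be a
    set of (head, relation, tail) triples."""
    return 1.0 if (e_s, r_q, e_T) in graph else 0.0
```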

Page 31

Policy Gradient

Policy function $\pi_\Theta(a_t \mid s_t)$: the probability of choosing an action given the current state.

(Diagram: at entity $e_t$, the candidate actions are $a_t^i = (r_t^i, e_t^i)$, $i = 1, \dots, N_t$, each assigned probability $\pi_\Theta(a_t^i \mid s_t)$.)

Page 32

Policy Gradient

Policy function $\pi_\Theta(a_t \mid s_t)$

(Diagram: an LSTM encodes the search history $(e_s, r_0), \dots, (e_t, r_t)$ into $h_t$; a Linear-ReLU-Linear head over the current state $[h_t; e_t; r_q]$ produces a score for each action $a_t^1, \dots, a_t^{N_t}$ in the action space.)

Design choice: include search history in the state representation

MINERVA (Das et al. 2018)
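A minimal PyTorch sketch of this MINERVA-style parameterization; the layer sizes and tensor layouts are assumptions, not the exact released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNetwork(nn.Module):
    """LSTM search-history encoder + MLP action scorer (sketch)."""
    def __init__(self, ent_dim, rel_dim, hidden_dim):
        super().__init__()
        # Each step feeds the chosen (entity, relation) embedding pair.
        self.lstm = nn.LSTM(ent_dim + rel_dim, hidden_dim, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim + ent_dim + rel_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, ent_dim + rel_dim))

    def forward(self, history, e_t, r_q, action_embs):
        # history:     (batch, t, ent_dim + rel_dim) -- the path so far
        # e_t, r_q:    (batch, ent_dim), (batch, rel_dim)
        # action_embs: (batch, N_t, ent_dim + rel_dim) -- candidate (r, e) pairs
        _, (h_n, _) = self.lstm(history)
        state = torch.cat([h_n[-1], e_t, r_q], dim=-1)   # [h_t; e_t; r_q]
        scores = torch.einsum('bd,bnd->bn', self.mlp(state), action_embs)
        return F.softmax(scores, dim=-1)                 # pi_Theta(a_t | s_t)
```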

Page 33

Policy Gradient

(Same architecture diagram as the previous slide.)

Design choice: include search history in the state representation

Our model extensions are applicable to any parameterization of $\pi_\Theta$.

MINERVA (Das et al. 2018)

Page 34

REINFORCE Training

(Diagram: for the training query (Tom Hanks, collaborator, ?), several rollouts are sampled; the one reaching Steven Spielberg receives reward +1, the others receive 0.)

Sample according to $\pi_\Theta(a_t \mid s_t)$

Update $\Theta$ by maximizing the expected reward

REINFORCE (Williams, 1992)
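A sketch of one REINFORCE update under this setup. The rollout bookkeeping is simplified, and `policy`, `env.reset`, `env.step` and `env.reward` are assumed interfaces, not names from the released code:

```python
import torch
from torch.distributions import Categorical

def rollout_loss(policy, env, max_steps):
    """Sample one path with pi_Theta and return the REINFORCE loss
    -R * sum_t log pi_Theta(a_t | s_t); minimizing it maximizes the
    expected reward."""
    log_probs = []
    state = env.reset()                      # start at the topic entity
    for _ in range(max_steps):
        dist = Categorical(probs=policy(state))
        action = dist.sample()               # a_t ~ pi_Theta(. | s_t)
        log_probs.append(dist.log_prob(action))
        state = env.step(action)
    reward = env.reward(state)               # terminal reward R(s_T)
    return -reward * torch.stack(log_probs).sum()
```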

Page 35

REINFORCE Training

(Same rollout diagram as the previous slide.)

False-negative & nearly-correct entities are treated as true negatives.

Page 36

Reward Shaping

(Diagram: for an unobserved fact such as (Tom Hanks, collaborator, Steven Spielberg), assign a soft correctness score to the entity returned by the search, conditioned on the topic entity and topic relation.)

Page 37

Reward Shaping

(Same diagram as the previous slide; the soft-correctness formulation is general.)

Page 38

Reward Shaping

(Diagram: the REINFORCE rollouts for (Tom Hanks, collaborator, ?), now scored by a pre-trained ConvE model: rollouts ending on unobserved entities receive a soft reward from ConvE instead of 0.)

Value range: (0, 1)
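Putting the two reward sources together, a sketch of the shaped reward, following the formulation in the paper, R(s_T) = R_b(s_T) + (1 - R_b(s_T)) f(e_s, r_q, e_T); `conve_prob` is an assumed wrapper that maps a triple to the pre-trained ConvE score in (0, 1):

```python
def shaped_reward(e_s, r_q, e_T, graph, conve_prob):
    """Hard reward for observed facts; otherwise fall back to the
    pre-trained embedding model's soft correctness estimate f."""
    if (e_s, r_q, e_T) in graph:
        return 1.0                        # observed fact: full reward
    return conve_prob(e_s, r_q, e_T)      # unobserved: soft reward in (0, 1)
```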

Page 39

Action Dropout

Intuition: avoid sticking to past actions that had received rewards

Page 40

Action Dropout

(Diagram: at state $s_t$, the policy assigns probabilities $\pi_\Theta(a_t^i \mid s_t)$ to the candidate actions, e.g. 0.6, 0.08 and 0.01; the 0.6 action is the most likely to be chosen.)

Intuition: avoid sticking to past actions that had received rewards

Page 41

Action Dropout

$\tilde{\pi}_\Theta(a_t \mid s_t) \propto \pi_\Theta(a_t \mid s_t) \cdot m + \epsilon$

$m_i \sim \mathrm{Bernoulli}(1 - \alpha), \quad i = 1, \dots, N_t$

Randomly mask the sampling probabilities with rate $\alpha$ and renormalize.

(Diagram: after masking and renormalization, the probabilities 0.6, 0.08 and 0.01 become, e.g., 0, 0.9 and 0, so a previously unlikely action becomes the most likely to be chosen.)
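A sketch of the masking step; the tensor layout is an assumption, and the ε term keeps the distribution well-defined when every action happens to be masked:

```python
import torch

def action_dropout(action_probs, alpha, eps=1e-8):
    """Perturb pi_Theta for sampling only: zero out each action with
    probability alpha, add epsilon, and renormalize. Gradients are
    still taken w.r.t. the original pi_Theta."""
    mask = torch.bernoulli(torch.full_like(action_probs, 1.0 - alpha))
    perturbed = action_probs * mask + eps
    return perturbed / perturbed.sum(dim=-1, keepdim=True)
```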

Page 42

Action Dropout

(Same equations and diagram as the previous slide.)

Force exploration

Page 43

Action Dropout

(Same equations and diagram as the previous slide.)

Force exploration: up to 8x the number of paths traversed

Page 44

Experiment Setup

Name      | # Ent. | # Rel. | # Fact  | Avg Degree | Median Degree
Kinship   | 104    | 25     | 8,544   | 85.15      | 82
UMLS      | 135    | 46     | 5,216   | 38.63      | 28
FB15k-237 | 14,505 | 237    | 272,115 | 19.74      | 14
WN18RR    | 40,945 | 11     | 86,835  | 2.19       | 2
NELL-995  | 75,492 | 200    | 154,213 | 4.07       | 1

KG Benchmarks

Decreasing connectivity

Evaluation Protocol: MRR (Mean Reciprocal Rank)
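For reference, a minimal sketch of the metric; the 1-based ranks of the true answers are assumed to be precomputed by the evaluation pipeline:

```python
def mean_reciprocal_rank(ranks):
    """MRR = mean of 1/rank of the correct answer over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# e.g. ranks [1, 2, 4] -> (1 + 0.5 + 0.25) / 3 ~= 0.583
```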

Page 45

Ablation Studies

(Bar chart: dev set MRR (x100), y-axis 20-80, of PG, +RS, +AD and Ours on Kinship, UMLS, FB15k-237, WN18RR and NELL-995.)

Fig 2. Dev set MRR (x100) comparison

Page 46

Ablation Studies

(Same chart as the previous slide.)

Fig 2. Dev set MRR (x100) comparison

Page 47

Ablation Studies

(Same chart as the previous slide.)

Fig 2. Dev set MRR (x100) comparison

Page 48

Ablation Studies

(Same chart as the previous slide.)

Fig 2. Dev set MRR (x100) comparison

Especially helpful on dense graphs

Page 49

Main Results

(Bar chart: test set MRR (x100), y-axis 20-100, of NeuralLP, NTP-λ, MINERVA, Ours, DistMult, ComplEx and ConvE on Kinship, UMLS, FB15k-237, WN18RR and NELL-995.)

Fig 3. Test set MRR (x100) compared to SOTA multi-hop reasoning and embedding-based approaches

Page 50

Main Results

(Same chart as the previous slide.)

Improved SOTA multi-hop reasoning performance

Fig 3. Test set MRR (x100) compared to SOTA multi-hop reasoning and embedding-based approaches

Page 51

Main Results

(Same chart as the previous slide.)

Consistently matched SOTA knowledge graph embedding performance

Fig 3. Test set MRR (x100) compared to SOTA multi-hop reasoning and embedding-based approaches

Page 52

Interpretable Results

FB15k-237 (Toutanova and Chen 2016)

(Example query: (Laura Carmichael, profession, ?).)

Page 53

Interpretable Results

FB15k-237 (Toutanova and Chen 2016)

(Same example as the previous slide.)

Page 54

Interpretable Results

FB15k-237 (Toutanova and Chen 2016)

(Same example as the previous slide.)

Page 55

Code: https://github.com/salesforce/MultiHopKG

Thank you!

Future directions
  • Learn better reward shaping functions
  • Investigate similar techniques for other RL paradigms (e.g. Q-learning)
  • Extend to more complicated structured queries (e.g. more than one topic entity)
  • Extend to natural language QA

Page 56

BK I - Error Analysis

(Venn diagrams: dev set top-1 prediction error overlap of ConvE, Ours and Ours-RS on UMLS, FB15k-237 and NELL-995; the absolute error rates of Ours shown are 8.9%, 65.9% and 25.6%.)

Fig 4. Dev set top-1 prediction error overlap of ConvE, Ours and Ours-RS. The absolute error rate of Ours is shown.

Page 57

BK II - Challenges: Incompleteness

(Diagram: the incomplete KG example from earlier.)

≈ 30% false negative feedback

Fig 1. % of false negative hits in the first 20 epochs of RL training on the UMLS KG benchmark (Kok and Domingos 2007)

Page 58


Questions for Future Research

1. One natural question to ask is why a noticeable performance gap exists between the embedding-based (EB) model and the RL approach that uses the same EB model as its reward shaping module (slide 51), especially on FB15k-237 and NELL-995, the two larger and sparser KGs. Since the RL model has full access to the EB model, why does it still lose information? A possible explanation is that for examples where the RL model makes mistakes, the topic entity and the target answer are not connected within the specified number of hops. Yet our sanity check disproved this: for all examples where only the RL model makes mistakes, the topic entity and the target answer were connected by at least one path. (We did not check the quality of these paths.) Conjecture: the performance loss may come from the difficulty of RL optimization, as it operates over a more complex model space. The RL model and training procedure have many more hyper-parameters than the EB models.

2. In our experiments, very large action dropout rates (0.9 and 0.95) yield good performance on the dense KGs (Kinship and UMLS), but the same strategy does not work for the sparser KGs: we observed a significant performance drop on FB15k-237, WN18RR and NELL-995 when using very large action dropout rates, and for WN18RR and NELL-995 an action dropout rate > 0.1 hurts performance. It is unclear why REINFORCE training on the denser KGs can tolerate a larger shift from the actual policy during path sampling. Conjecture: the shape of the original policy function ought to be preserved to some degree during training. For Kinship and UMLS, the average node degrees are 85 and 39, so on average at least 2 edges remain active when we randomly turn off 95% of the edges. Since the other KGs have much smaller average node degrees, using a large action dropout rate there amounts to doing random exploration most of the time.
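The back-of-the-envelope arithmetic behind this conjecture, using the average degrees from the benchmark table:

```latex
% Expected number of edges surviving action dropout with rate \alpha,
% for a node of average degree d:
\mathbb{E}[\text{active edges}] = d\,(1 - \alpha)
% With \alpha = 0.95:
% Kinship:   85.15 \times 0.05 \approx 4.3
% UMLS:      38.63 \times 0.05 \approx 1.9
% FB15k-237: 19.74 \times 0.05 \approx 1.0
% WN18RR:     2.19 \times 0.05 \approx 0.1
% On the sparse KGs the mask usually zeroes every action, so sampling
% falls back to the \epsilon-uniform distribution, i.e. random exploration.
```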

3. Do EB models define the performance ceiling in the one-hop KG query answering setup? Could the tasks of path finding and learning KG embeddings be joined in a way such that they improve each other?

4. Our approach can be viewed as a way to explain pre-trained EB models. Are there better ways to do it?

Page 59


Acknowledgement upon slides release - I

These slides benefited tremendously from the constructive feedback offered by Salesforce Research team members, including Caiming Xiong, Richard Socher, Yingbo Zhou, Alex Trott, Jin Qu, Lily Hu, Vena Li, Kazuma Hashimoto, Stephan Zheng, Jason Wu (intern) and others.

Page 60


Acknowledgement upon slides release - II

I am grateful to Prof. Michael Ernst and his co-authors who released the slides of all of their paper presentations.

In particular, the Bradley Hand font and the Baloon highlighters were borrowed from slides 1 and slides 2.

Victoria Lin
Nov. 4, 2018