


Published as a conference paper at ICLR 2020

DYNAMICALLY PRUNED MESSAGE PASSING NETWORKS FOR LARGE-SCALE KNOWLEDGE GRAPH REASONING

Xiaoran Xu 1, Wei Feng 1, Yunsheng Jiang 1, Xiaohui Xie 1, Zhiqing Sun 2, Zhi-Hong Deng 3

1 Hulu, {xiaoran.xu, wei.feng, yunsheng.jiang, xiaohui.xie}@hulu.com
2 Carnegie Mellon University, [email protected]
3 Peking University, [email protected]

ABSTRACT

We propose Dynamically Pruned Message Passing Networks (DPMPN) for large-scale knowledge graph reasoning. In contrast to existing models, embedding-based or path-based, we learn an input-dependent subgraph to explicitly model the reasoning process. Subgraphs are dynamically constructed and expanded by applying a graphical attention mechanism conditioned on input queries. In this way, we not only construct graph-structured explanations but also enable the message passing designed in Graph Neural Networks (GNNs) to scale with graph size. We take inspiration from the consciousness prior proposed by Bengio (2017) and develop a two-GNN framework to encode an input-agnostic full-structure representation and learn an input-dependent local one, coordinated by an attention module. Experiments show the reasoning capability of our model to provide clear graphical explanations as well as to predict accurately, outperforming most state-of-the-art methods in knowledge base completion tasks.

1 INTRODUCTION

Modern deep learning systems should bring in explicit reasoning modeling to complement their black-box models, where reasoning takes a step-by-step form: organizing facts to yield new knowledge and finally draw a conclusion. In particular, we rely on graph-structured representations to model reasoning by manipulating nodes and edges, where semantic entities and relations can be explicitly represented (Battaglia et al., 2018). Here, we choose knowledge graph scenarios to study reasoning, since semantics are already defined on nodes and edges. For example, in knowledge base completion tasks, each edge is represented by a triple ⟨head, rel, tail⟩ that contains two entities and their relation. The goal is to predict which entity might be the tail given a query ⟨head, rel, ?⟩.

Existing models can be categorized into embedding-based and path-based model families. The embedding-based family (Bordes et al., 2013; Sun et al., 2018; Lacroix et al., 2018) often achieves high scores by fitting data with various neural network techniques but lacks interpretability. The path-based family (Xiong et al., 2017; Das et al., 2018; Shen et al., 2018; Wang, 2018) attempts to construct an explanatory path, modeling an iterative decision-making process with reinforcement learning and recurrent networks. A natural question is: can we construct structured explanations richer than a path to better explain reasoning in a graph context? To this end, we propose to learn a dynamically induced subgraph which starts at a head node and ends at a predicted tail node, as shown in Figure 1.

Graph reasoning can be powered by Graph Neural Networks. Graph reasoning needs to learn about entities, relations, and their composition rules to manipulate structured knowledge and produce structured explanations. Graph Neural Networks (GNNs) provide such structured computation and also inherit the powerful data-fitting capacity of deep neural networks (Scarselli et al., 2009; Battaglia et al., 2018). Specifically, GNNs follow a neighborhood aggregation scheme to recursively aggregate information from neighbors to update node states. After T iterations, each node can carry structure information from its T-hop neighborhood (Gilmer et al., 2017; Xu et al., 2018a).

GNNs need graphical attention expression to interpret. Neighborhood attention operation is a popular way to implement attention mechanisms on graphs (Velickovic et al., 2018; Hoshen, 2017) by


(a) The AthletePlaysForTeam task. (b) The OrganizationHiredPerson task.

Figure 1: Subgraph visualization on two examples from NELL995's test data. Each task has tens of thousands of nodes and edges. The big yellow node represents the given head and the big red node represents the predicted tail. Color indicates attention gained along T-step reasoning: yellow means more attention during early steps, red means more attention at the end, and grey means less attention.

focusing on specific interactions with neighbors. Here, we propose a new graphical attention mechanism, not only for computation but also for interpretation. We present three considerations when constructing attention-induced subgraphs: (1) given a subgraph, we first attend within it to select a few nodes and then attend over those nodes' neighborhoods for the next expansion; (2) we propagate attention across steps to capture long-term dependency; (3) our attention mechanism models the reasoning process explicitly through a pipeline disentangled from the underlying representation computation.

GNNs need input-dependent pruning to scale. GNNs are notorious for their poor scalability. Consider one message passing iteration on a graph with |V| nodes and |E| edges. Even if the graph is sparse, the complexity of O(|E|) is still problematic on large graphs with millions of nodes and edges. Besides, mini-batch based training with batch size B and hidden dimension D leads to O(BD|E|), making things worse. However, we can avoid this situation by learning input-dependent pruning to run computation on dynamical graphs: an input query often triggers only a small fraction of the entire graph, so it is wasteful to perform computation over the full graph for each input.

Cognitive intuition of the consciousness prior. Bengio (2017) brought the notion of attentive awareness from cognitive science into deep learning in his consciousness prior proposal. He pointed out a process of disentangling high-level factors from a full underlying representation to form a low-dimensional combination through attention mechanisms. He proposed to use two recurrent neural networks (RNNs) to encode two types of state: the unconscious state, represented by a high-dimensional vector before attention, and the conscious state, a derived low-dimensional vector after attention.

We use two GNNs instead to encode such states on nodes. We construct input-dependent subgraphs to run message passing efficiently, and also run full message passing over the entire graph to acquire features beyond the local view constrained by subgraphs. We apply an attention mechanism between the two GNNs, where the bottom one runs before attention, called Inattentive GNN (IGNN), and the one above runs on each attention-induced subgraph, called Attentive GNN (AGNN). IGNN provides representations computed on the full graph for AGNN. AGNN reinforces representations within a cohesive group of nodes to produce sharp semantics. Experimental results show that our model attains very competitive scores on HITS@1,3 and the mean reciprocal rank (MRR) compared to the best embedding-based methods so far. More importantly, we provide explanations while they do not.

2 ADDRESSING THE SCALE-UP PROBLEM

Notation. We denote training data by {(x_i, y_i)}_{i=1}^N. We denote the full graph by G = ⟨V, E⟩ with relations R, and an input-dependent subgraph by G(x) = ⟨V_{G(x)}, E_{G(x)}⟩, which is an induced subgraph of G. We denote the boundary of a graph by ∂G, where V_{∂G} = N(V_G) − V_G and N(V_G) means the neighbors of nodes in V_G. We also denote high-order boundaries, such as ∂²G where V_{∂²G} = N(N(V_G)) ∪ N(V_G) − V_G. Trainable parameters include node embeddings {e_v}_{v∈V}, relation embeddings {e_r}_{r∈R}, and the weights used in the two GNNs and the attention module. When performing full or pruned message passing, node and relation embeddings are indexed according to the operated graph, denoted by θ_G or θ_{G(x)}. For IGNN, we use H^t of size |V| × D to denote node hidden states at step t; for AGNN, we use H^t(x) of size |V_{G(x)}| × D. The objective is written as

∑_{i=1}^N l(x_i, y_i; θ_{G(x_i)}, θ_G), where G(x_i) is dynamically constructed.
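The boundary operators defined above are easy to make concrete. Below is a minimal sketch in plain Python (the toy adjacency dict and the helper names `neighbors`/`boundary` are illustrative assumptions, not the paper's code):

```python
# Toy sketch of the boundary operator: V_∂G = N(V_G) − V_G,
# where N(V_G) is the union of neighbor sets of nodes already in the subgraph.
adj = {  # small undirected toy graph as an adjacency dict (assumption)
    0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3},
}

def neighbors(nodes):
    """N(V_G): union of the neighbor sets of all nodes in V_G."""
    out = set()
    for v in nodes:
        out |= adj[v]
    return out

def boundary(nodes):
    """First-order boundary: V_∂G = N(V_G) − V_G."""
    return neighbors(nodes) - nodes

V_G = {0}
b1 = boundary(V_G)                                # V_∂G
b2 = (neighbors(neighbors(V_G)) | neighbors(V_G)) - V_G  # V_∂²G = N(N(V_G)) ∪ N(V_G) − V_G
```

Starting from the single node 0, the first-order boundary is its direct neighbors, and the second-order boundary additionally pulls in their neighbors.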

The scale-up problem in GNNs. First, we write the full message passing in IGNN as

H^t = f_IGNN(H^{t−1}; θ_G),   (1)

where f_IGNN represents all operations involved in one message passing iteration over G, including: (1) computing messages along each edge, with complexity¹ O(BD|E|); (2) aggregating messages received at each node, with O(BD|E|); and (3) updating node states, with O(BD|V|). For T-step propagation, the per-batch complexity is O(BDT(|E| + |V|)). Since backpropagation requires intermediate computation results to be saved during one pass, this complexity counts for both time and space. However, because IGNN is input-agnostic, node representations can be shared across inputs in one batch, so we can remove B to get O(DT(|E| + |V|)). If we use a sampled edge set Ẽ from E such that |Ẽ| ≈ k|V|, the complexity can be further reduced to O(DT|V|).
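As a rough back-of-the-envelope check of this argument, the per-batch unit-cost estimate can be tabulated directly (using the unit cost of footnote 1; the concrete sizes and the factor k = 5 below are illustrative assumptions):

```python
def ignn_cost(D, T, num_nodes, num_edges, batch=None):
    """Unit-cost estimate of T message passing steps: O(BDT(|E| + |V|))
    in general; B drops out when node states are shared across the batch,
    as in the input-agnostic IGNN."""
    b = 1 if batch is None else batch
    return b * D * T * (num_edges + num_nodes)

# Naive per-batch cost vs. batch-shared IGNN cost on a toy large graph.
full   = ignn_cost(D=100, T=2, num_nodes=10**6, num_edges=10**7, batch=64)
shared = ignn_cost(D=100, T=2, num_nodes=10**6, num_edges=10**7)
# Sampling an edge set with |E~| ≈ k|V| (k = 5 here) shrinks the edge term.
sampled = ignn_cost(D=100, T=2, num_nodes=10**6, num_edges=5 * 10**6)
```

Sharing node states across the batch removes the factor B entirely, and edge sampling then brings the edge term down to the order of the node term.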

The pruned message passing in AGNN can be written as

H^t(x) = f_AGNN(H^{t−1}(x), H^t; θ_{G(x)}).   (2)

Its complexity can be computed similarly to the above; however, we cannot remove B. Fortunately, subgraph G(x) is not G. If we let x be a node v, G(x) grows from a single node, i.e., G^0(x) = {v}, and expands itself at each step, leading to a sequence (G^0(x), G^1(x), ..., G^T(x)). Here, we describe the expansion behavior as consecutive expansion, meaning no jumping across neighborhoods is allowed, so that we can ensure

G^t(x) ⊆ G^{t−1}(x) ∪ ∂G^{t−1}(x) ⊆ G^{t−2}(x) ∪ ∂²G^{t−2}(x).   (3)

Many real-world graphs follow the small-world pattern, and the six degrees of separation implies G^0(x) ∪ ∂⁶G^0(x) ≈ G. The upper bound of G^t(x) can grow exponentially in t, and there is no guarantee that G^t(x) will not explode.

Proposition. Given a graph G (undirected, or directed in both directions), we assume the probability of the degree of an arbitrary node being less than or equal to d is larger than p, i.e., P(deg(v) ≤ d) > p, ∀v ∈ V. Considering a sequence of consecutively expanding subgraphs (G^0, G^1, ..., G^T), starting with G^0 = {v}, for all t ≥ 1 we can ensure

P( |V_{G^t}| ≤ (d(d−1)^t − 2) / (d − 2) ) > p^{(d(d−1)^{t−1} − 2) / (d − 2)}.   (4)

The proposition implies that the guarantee upper-bounding |V_{G^t(x)}| becomes exponentially looser and weaker as t gets larger, even if the given assumption has a small d and a large p (close to 1). We define the graph increment at step t as ∆G^t(x) such that G^t(x) = G^{t−1}(x) ∪ ∆G^t(x). To prevent G^t(x) from exploding, we need to constrain ∆G^t(x).
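A quick numeric check makes this looseness concrete (d = 4 and p = 0.99 are illustrative choices, not values from the paper):

```python
def node_bound(d, t):
    """Upper bound (d(d-1)^t - 2) / (d - 2) on |V_{G^t}| from the proposition."""
    return (d * (d - 1) ** t - 2) // (d - 2)

def bound_prob(p, d, t):
    """Lower bound p ** ((d(d-1)^{t-1} - 2) / (d - 2)) on the probability."""
    return p ** ((d * (d - 1) ** (t - 1) - 2) / (d - 2))

# Even for a modest degree cap d = 4, the size bound grows by a factor
# of about d - 1 = 3 per step, while the probability guarantee decays.
bounds = [node_bound(4, t) for t in range(1, 5)]
probs = [bound_prob(0.99, 4, t) for t in range(1, 5)]
```

With d = 4 the size bound runs 5, 17, 53, 161 over four steps, while the probability guarantee decreases monotonically, which is exactly the "exponentially looser and weaker" behavior described above.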

Sampling strategies. A simple but effective way to handle the large scale is to do sampling.

1. ∆Gt(x) = ∂Gt−1(x), where we sample nodes from the boundary of Gt−1(x).

2. ∆Gt(x) = ∂Gt−1(x), where we take the boundary of sampled nodes from Gt−1(x).

3. ∆Gt(x) = ∂Gt−1(x), where we sample nodes twice from Gt−1(x) and from ∂Gt−1(x).

4. ∆Gt(x) = ∂Gt−1(x), where we sample nodes three times with the last from ∂Gt−1(x).

Obviously, the more aggressively sampled variants of ∂G^{t−1}(x) are contained in the less restricted ones, and the same containments hold after taking the union with G^{t−1}(x). Further, we let N1 and N3 be the maximum number of sampled nodes in G^{t−1}(x) and in the last sampling of the boundary respectively, and let N2 be the per-node maximum of sampled neighbors in the boundary; then we can obtain much tighter guarantees as follows:
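One way to read the tightest strategy (strategy 4) in code. This is a hedged sketch with hypothetical names; the caps n1, n2, n3 play the roles of N1, N2, N3 defined above:

```python
import random

def sample_increment(visited, adj, n1, n2, n3, rng):
    """Sketch of sampling strategy 4: pick at most n1 core nodes from the
    current subgraph, at most n2 unvisited neighbors per core node, then
    keep at most n3 of the candidates, so that |V_∆Gt| ≤ min(n1*n2, n3)
    holds with probability 1."""
    core = rng.sample(sorted(visited), min(n1, len(visited)))
    candidates = set()
    for v in core:
        nbrs = sorted(adj[v] - visited)
        candidates |= set(rng.sample(nbrs, min(n2, len(nbrs))))
    if len(candidates) > n3:
        candidates = set(rng.sample(sorted(candidates), n3))
    return candidates

# Toy star graph: node 0 has four neighbors.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
rng = random.Random(0)
delta = sample_increment({0}, adj, n1=1, n2=3, n3=2, rng=rng)
```

The deterministic cap min(n1·n2, n3) is what the third guarantee below formalizes; the attention strategies that follow replace the random choices here with learned top-K selection.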

1 We assume the per-example, per-edge, per-dimension time cost as a unit time.


[Figure: one batch flows through the Inattentive GNN (bottom) and the Attentive GNN (top), coordinated by the Attention Module; message passing uses aggregation operators (AggrOp), attention uses an attending operator (AttdOp), and the output is a pooling over steps or simply the last step.]

Figure 2: Model architecture used in knowledge graph reasoning.

1. P(|V_{∆G^t(x)}| ≤ N1(d − 1)) > p^{N1}, for the strategy that takes the boundary of N1 sampled nodes.

2. P(|V_{∆G^t(x)}| ≤ N1·N2) = 1 and P(|V_{∆G^t(x)}| ≤ N1·min(d − 1, N2)) > p^{N1}, for the strategy that additionally samples at most N2 neighbors per node.

3. P(|V_{∆G^t(x)}| ≤ min(N1·N2, N3)) = 1, for the strategy whose last sampling keeps at most N3 nodes.

Attention strategies. Although the last strategy guarantees |V_{G^T(x)}| ≤ 1 + T·min(N1·N2, N3), and we can constrain the growth of G^{t−1}(x) by decreasing either N1·N2 or N3, a smaller sampling size means less area explored and less chance of hitting target nodes. To make an efficient selection rather than a random sampling, we apply an attention mechanism to do top-K selection, where K can be small. We change the sampled boundary to an attended version, written with a tilde (∼) that represents the operation of attending over nodes and picking the top K. There are two types of attention operations, one applied to G^{t−1}(x) and the other applied to its boundary. Note that the size of the sampled boundary might be much larger if we intend to sample more nodes with a larger N2 to sufficiently explore the boundary. Nevertheless, we can address this problem by using smaller dimensions to compute attention, since attention on each node is a scalar, requiring a smaller capacity than the node representation vectors computed in message passing.
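The top-K selection itself is a one-liner over scalar scores; a minimal NumPy sketch (the scores are toy values, not learned attention):

```python
import numpy as np

def topk_nodes(attention, k):
    """The ∼ operation: attend over nodes (one scalar score each)
    and keep the indices of the k largest scores."""
    idx = np.argsort(-attention)[:k]
    return set(idx.tolist())

att = np.array([0.05, 0.40, 0.10, 0.30, 0.15])  # toy per-node scores
selected = topk_nodes(att, k=2)
```

Because each node contributes only a scalar, this selection stays cheap even when the sampled boundary is large, which is what justifies the "sample more but attend less" trade-off.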

3 DPMPN MODEL

3.1 ARCHITECTURE DESIGN FOR KNOWLEDGE GRAPH REASONING

Our model architecture, as shown in Figure 2, consists of:

• IGNN module: performs full message passing to compute full-graph node representations.
• AGNN module: performs a batch of pruned message passing to compute input-dependent node representations, which also make use of underlying representations from IGNN.
• Attention module: performs a flow-style attention transition process, conditioned on node representations from both IGNN and AGNN but only affecting AGNN.

IGNN module. We implement it using the standard message passing mechanism (Gilmer et al., 2017). If the full graph has an extremely large number of edges, we randomly sample a subset of edges, E^τ ⊂ E, at each step. For a batch of input queries, node representations from IGNN are shared across queries, containing no batch dimension; thus, its complexity does not scale with batch size, and the saved resources can be allocated to sampling more edges. Each node v has a state H^τ_{v,:} at step τ, where the initial H^0_{v,:} = e_v. Each edge ⟨v′, r, v⟩ produces a message, denoted by M^τ_{⟨v′,r,v⟩,:}, at step τ. The computation components include:


• Message function: M^τ_{⟨v′,r,v⟩,:} = ψ_IGNN(H^τ_{v′,:}, e_r, H^τ_{v,:}), where ⟨v′, r, v⟩ ∈ E^τ.

• Message aggregation: M̄^τ_{v,:} = (1/√(N^τ(v))) ∑_{v′,r} M^τ_{⟨v′,r,v⟩,:}, where ⟨v′, r, v⟩ ∈ E^τ.

• Node state update function: H^{τ+1}_{v,:} = H^τ_{v,:} + δ_IGNN(H^τ_{v,:}, M̄^τ_{v,:}, e_v), where v ∈ V.

We compute messages only for sampled edges, ⟨v′, r, v⟩ ∈ E^τ, at each step. Functions ψ_IGNN and δ_IGNN are each implemented by a two-layer MLP (using leakyReLU for the first layer and tanh for the second) with input arguments concatenated. Messages are aggregated by dividing the sum by the square root of N^τ(v), the number of neighbors that send messages to v, preserving the scale of the variance. We use residual addition to update each node state instead of a GRU or an LSTM. After running for T steps, we output a pooling result or simply the last state, denoted by H = H^T, to feed into downstream modules.
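The three components can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's implementation: the additive `mlp` stand-in replaces the concatenating two-layer MLPs described above, and a single toy relation embedding is used:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
num_nodes = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # toy sampled edge set E^τ

H = rng.normal(size=(num_nodes, D))  # node states H^τ
e_rel = rng.normal(size=D)           # single toy relation embedding

def mlp(x):
    """Stand-in for the two-layer MLPs ψ_IGNN / δ_IGNN (assumption)."""
    return np.tanh(x)

def ignn_step(H):
    msgs = {}
    for (u, v) in edges:  # message per edge: ψ(H_u, e_r, H_v)
        msgs.setdefault(v, []).append(mlp(H[u] + e_rel + H[v]))
    H_new = H.copy()
    for v, ms in msgs.items():
        # aggregate: sum divided by sqrt(N(v)), preserving variance scale
        agg = np.sum(ms, axis=0) / np.sqrt(len(ms))
        # residual update instead of a GRU/LSTM
        H_new[v] = H[v] + mlp(agg)
    return H_new

H1 = ignn_step(H)
```

Running the step twice would give H² and so on; the last state (or a pooling over steps) is what gets passed downstream as H.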

AGNN module. AGNN is input-dependent, which means node states depend on the input query x = ⟨head, rel, ?⟩, denoted by H^t_{v,:}(x). We implement pruned message passing, running on small subgraphs, each conditioned on an input query. We leverage the sparsity and only save H^t_{v,:}(x) for visited nodes v ∈ V_{G^t(x)}. When t = 0, we start from node head with V_{G^0(x)} = {v_head}. When computing messages, denoted by M^t_{⟨v′,r,v⟩,:}(x), we use an attending-sampling-attending procedure, explained in Section 3.2, to constrain the number of computed edges. The computation components include:

• Message function: M^t_{⟨v′,r,v⟩,:}(x) = ψ_AGNN(H^t_{v′,:}(x), c_r(x), H^t_{v,:}(x)), where ⟨v′, r, v⟩ ∈ E_{G^t(x)}², and c_r(x) = [e_r, q_head, q_rel] represents a context vector.

• Message aggregation: M̄^t_{v,:}(x) = (1/√(N^t(v))) ∑_{v′,r} M^t_{⟨v′,r,v⟩,:}(x), where ⟨v′, r, v⟩ ∈ E_{G^t(x)}.

• Node state attending function: H̃^{t+1}_{v,:}(x) = a^{t+1}_v · W H_{v,:}, where a^{t+1}_v is an attention score.

• Node state update function: H^{t+1}_{v,:}(x) = H^t_{v,:}(x) + δ_AGNN(H^t_{v,:}(x), M̄^t_{v,:}(x), c^{t+1}_v(x)), where c^{t+1}_v(x) = [H̃^{t+1}_{v,:}(x), q_head, q_rel] also represents a context vector.

Query context is defined by the head and relation embeddings, i.e., q_head = e_head and q_rel = e_rel. We introduce a node state attending function to pass node representation information from IGNN to AGNN, weighted by a scalar attention score a^{t+1}_v and projected by a learnable matrix W. We initialize H^0_{v,:}(x) = H_{v,:} for nodes v ∈ V_{G^0(x)}, letting unseen nodes hold zero states.
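The sparse per-query bookkeeping can be sketched as follows (a toy `SparseStates` class introduced for illustration; only visited nodes hold states, initialized from the full-graph representation for the start node and zeros elsewhere, as described above):

```python
import numpy as np

D = 4
H_full = np.arange(20, dtype=float).reshape(5, D)  # IGNN outputs H (toy values)

class SparseStates:
    """Per-query AGNN states H^t(x), stored only for visited nodes."""
    def __init__(self, head):
        # H^0_{head,:}(x) = H_{head,:}; all other nodes are unseen.
        self.states = {head: H_full[head].copy()}
    def get(self, v):
        # Unseen nodes hold zero states.
        return self.states.get(v, np.zeros(D))
    def update(self, v, delta):
        # Residual update; visiting a node materializes its state.
        self.states[v] = self.get(v) + delta

s = SparseStates(head=2)
s.update(4, np.ones(D))  # node 4 becomes visited at some later step
```

Memory grows with the number of visited nodes |V_{G^t(x)}| rather than |V|, which is the point of running AGNN on pruned subgraphs.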

Attention module. Attention over T steps is represented by a sequence of node probability distributions, denoted by a^t (t = 1, 2, ..., T). The initial distribution a^0 is a one-hot vector with a^0[v_head] = 1. To spread attention, we need to compute a transition matrix T^t at each step. Since it is conditioned on both IGNN and AGNN, we capture two types of interaction between v′ and v: H^t_{v′,:}(x) ∼ H^t_{v,:}(x), and H^t_{v′,:}(x) ∼ H_{v,:}. The former favors visited nodes, while the latter is used to attend to unseen neighboring nodes.

T^t_{:,v′} = softmax_{v ∈ N^t(v′)} ( ∑_r α1(H^t_{v′,:}(x), c_r(x), H^t_{v,:}(x)) + α2(H^t_{v′,:}(x), c_r(x), H_{v,:}) )

α1(·) = MLP(H^t_{v′,:}(x), c_r(x))ᵀ W1 MLP(H^t_{v,:}(x), c_r(x))

α2(·) = MLP(H^t_{v′,:}(x), c_r(x))ᵀ W2 MLP(H_{v,:}, c_r(x))   (5)

where W1 and W2 are two learnable matrices. Each MLP uses one single layer with the leakyReLU activation. To reduce the complexity of computing T^t, we use the nodes v′ with the k largest attention scores at step t, and use nodes v sampled from v′'s neighbors, to compute the attention transition for the next step. Because the nodes v′ result from top-k pruning, a loss of attention may occur, diminishing the total amount. Therefore, we use a renormalized version, a^{t+1} = T^t a^t / ‖T^t a^t‖, to compute new attention scores. We use the attention scores at the final step as the probability to predict the tail node.
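The transition-plus-renormalization step can be sketched as below (toy transition matrix; the L1 norm is assumed for the renormalization, and top-k pruning is applied to the spread scores rather than to the nodes feeding T^t, a simplification for illustration):

```python
import numpy as np

def step_attention(a, T, k):
    """One attention transition: spread a^t through T^t, zero out all but
    the k largest entries (pruning loses probability mass), then
    renormalize so the scores sum to one again."""
    a_new = T @ a
    keep = np.argsort(-a_new)[:k]
    pruned = np.zeros_like(a_new)
    pruned[keep] = a_new[keep]
    return pruned / pruned.sum()  # a^{t+1} = T^t a^t / ||T^t a^t||_1

T = np.array([[0.1, 0.0, 0.3, 0.2],
              [0.6, 0.5, 0.1, 0.2],
              [0.2, 0.3, 0.4, 0.2],
              [0.1, 0.2, 0.2, 0.4]])  # columns sum to 1 (toy transition)
a0 = np.array([1.0, 0.0, 0.0, 0.0])  # one-hot on the head node
a1 = step_attention(a0, T, k=2)
```

Without the renormalization, the pruned entries would leak mass out of the distribution at every step; dividing by the norm keeps a^{t+1} a proper probability vector for the final tail prediction.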

2 In practice, we can use a smaller set of edges than E_{G^t(x)} to pass messages, as discussed in Section 3.2.


[Figure: nested horizons around the visited nodes — the full neighborhood contains the sampling horizon, which contains the attending-to horizon, which contains the attending-from horizon.]

Figure 3: Iterative attending-sampling-attending procedure balancing coverage and complexity.

3.2 COMPLEXITY REDUCTION BY ITERATIVE ATTENDING, SAMPLING AND ATTENDING

AGNN deals with a subgraph relying on input x and keeps a few selected nodes in V_{G^t(x)}, called visited nodes. Initially, V_{G^0(x)} contains only one node, v_head, and then V_{G^t(x)} is enlarged by adding new nodes at each step. When propagating messages, we can consider just the one-hop neighborhood at each step. However, the expansion proceeds so rapidly that it covers almost all nodes after a few steps. The key to addressing this problem is to constrain the scope of nodes we can expand the boundary from, i.e., the core nodes which determine where we can go next. We call this the attending-from horizon, selected according to the attention scores a^t. Given this horizon, we may still need node sampling over its neighborhood in cases where a hub node of extremely high degree causes an extremely large neighborhood. We introduce an attending-to horizon inside the sampling horizon. The attention module runs within the sampling horizon, with smaller dimensions exchanged for sampling more neighbors for a larger coverage. In a word, we face a trade-off between coverage and complexity, and our strategy is to sample more but attend less, plus using small dimensions to compute attention. We obtain the attending-to horizon according to the newly computed attention scores a^{t+1}. Then, the message passing iteration at step t in AGNN can be further constrained to the edges between the attending-from and attending-to horizons, a smaller set than E_{G^t(x)}. We illustrate this procedure in Figure 3.
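The attending-sampling-attending loop can be sketched end to end. This is a hedged sketch with hypothetical helper names, and it uses node degree as a stand-in for the second attention scoring (an assumption for illustration; the real model uses the learned scores of Equation 5):

```python
import random

def expand_step(visited, att, adj, k_from, n_sample, k_to, rng):
    """One attending-sampling-attending step:
    1) attend-from: keep the top-k_from visited nodes by attention score;
    2) sample: draw at most n_sample unvisited neighbors per core node;
    3) attend-to: keep k_to of the sampled candidates (scored here by
       degree as a toy stand-in for the real attention)."""
    core = sorted(visited, key=lambda v: -att[v])[:k_from]
    candidates = set()
    for v in core:
        nbrs = sorted(adj[v] - visited)
        candidates |= set(rng.sample(nbrs, min(n_sample, len(nbrs))))
    attended = sorted(candidates, key=lambda v: -len(adj[v]))[:k_to]
    return visited | set(attended)

adj = {0: {1, 2, 3}, 1: {0, 4}, 2: {0}, 3: {0, 4}, 4: {1, 3}}
att = {0: 1.0}  # all attention on the head node at t = 0
rng = random.Random(0)
visited = expand_step({0}, att, adj, k_from=1, n_sample=2, k_to=1, rng=rng)
```

The three caps k_from, n_sample, and k_to are the knobs behind the coverage-versus-complexity trade-off: the sampling horizon can stay wide while the sets actually fed to message passing stay small.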

4 EXPERIMENTS

Datasets. We use six large KG datasets: FB15K, FB15K-237, WN18, WN18RR, NELL995, and YAGO3-10. FB15K-237 (Toutanova & Chen, 2015) is sampled from FB15K (Bordes et al., 2013) with redundant relations removed, and WN18RR (Dettmers et al., 2018) is a subset of WN18 (Bordes et al., 2013) with triples that cause test leakage removed; thus, both are considered more challenging. NELL995 (Xiong et al., 2017) has separate datasets for 12 query relations, each corresponding to a single-query-relation KBC task. YAGO3-10 (Mahdisoltani et al., 2014) contains the largest KG, with millions of edges. Their statistics are shown in Table 1. We find some statistical differences between train and validation (or test). In a KG with all training triples as its edges, a triple (head, rel, tail) is considered a multi-edge triple if the KG contains other triples that also connect head and tail, ignoring the direction. We notice that FB15K-237 is a special case compared to the others, as there are no edges in its KG directly linking any pair of head and tail in validation (or test). Therefore, when using training triples as queries to train our model, given a batch: for FB15K-237, we cut off from the KG all triples connecting the head-tail pairs in the given batch, ignoring relation types and edge directions, forcing the model to learn a composite reasoning pattern rather than a single-hop pattern; for the remaining datasets, we only remove the triples of this batch and their inverses from the KG, to avoid information leakage before training on this batch. This can be regarded as a tuned hyperparameter (whether or not to force multi-hop reasoning), leading to a performance boost of about 2% in HITS@1 on FB15K-237.

Experimental settings. We use the same data split protocol as in many papers (Dettmers et al., 2018; Xiong et al., 2017; Das et al., 2018). For each dataset except NELL995, which already includes reciprocal relations, we create a KG, a directed graph, consisting of all train triples and their inverses. Besides, every node in the KGs has a self-loop edge to itself. We also add inverse relations into the validation and test sets to evaluate both directions. For evaluation metrics, we use HITS@1,3,10 and the mean reciprocal rank (MRR) in the filtered setting for FB15K-237, WN18RR,


Table 1: Statistics of the six KG datasets. PME (tr) means the proportion of multi-edge triples in train; PME (va) means the proportion of multi-edge triples in validation; AL (va) means the average length of shortest paths connecting each head-tail pair in validation.

Dataset    | #Entities | #Rels | #Train    | #Valid | #Test  | PME (tr) | PME (va) | AL (va)
FB15K      | 14,951    | 1,345 | 483,142   | 50,000 | 59,071 | 81.2%    | 80.6%    | 1.22
FB15K-237  | 14,541    | 237   | 272,115   | 17,535 | 20,466 | 38.0%    | 0%       | 2.25
WN18       | 40,943    | 18    | 141,442   | 5,000  | 5,000  | 93.1%    | 94.0%    | 1.18
WN18RR     | 40,943    | 11    | 86,835    | 3,034  | 3,134  | 34.5%    | 35.5%    | 2.84
NELL995    | 74,536    | 200   | 149,678   | 543    | 2,818  | 100%     | 31.1%    | 2.00
YAGO3-10   | 123,188   | 37    | 1,079,040 | 5,000  | 5,000  | 56.4%    | 56.0%    | 1.75

Table 2: Comparison results on the FB15K-237 and WN18RR datasets. Results of [♠] are taken from (Nguyen et al., 2018), [♣] from (Dettmers et al., 2018), [♥] from (Shen et al., 2018), [♦] from (Sun et al., 2018), [4] from (Das et al., 2018), and [z] from (Lacroix et al., 2018). Some collected results only have a metric score, while others, including ours, take the form of "mean (std)".

                 |              FB15K-237                  |                WN18RR
Metric (%)       | H@1       H@3       H@10     MRR       | H@1       H@3       H@10     MRR
TransE [♠]       | -         -         46.5     29.4      | -         -         50.1     22.6
DistMult [♣]     | 15.5      26.3      41.9     24.1      | 39        44        49       43
DistMult [♥]     | 20.6 (.4) 31.8 (.2) -        29.0 (.2) | 38.4 (.4) 42.4 (.3) -        41.3 (.3)
ComplEx [♣]      | 15.8      27.5      42.8     24.7      | 41        46        51       44
ComplEx [♥]      | 20.8 (.2) 32.6 (.5) -        29.6 (.2) | 38.5 (.3) 43.9 (.3) -        42.2 (.2)
ConvE [♣]        | 23.7      35.6      50.1     32.5      | 40        44        52       43
ConvE [♥]        | 23.3 (.4) 33.8 (.3) -        30.8 (.2) | 39.6 (.3) 44.7 (.2) -        43.3 (.2)
RotatE [♦]       | 24.1      37.5      53.3     33.8      | 42.8      49.2      57.1     47.6
ComplEx-N3 [z]   | -         -         56       37        | -         -         57       48
NeuralLP [♥]     | 18.2 (.6) 27.2 (.3) -        24.9 (.2) | 37.2 (.1) 43.4 (.1) -        43.5 (.1)
MINERVA [♥]      | 14.1 (.2) 23.2 (.4) -        20.5 (.3) | 35.1 (.1) 44.5 (.4) -        40.9 (.1)
MINERVA [4]      | -         -         45.6     -         | 41.3      45.6      51.3     -
M-Walk [♥]       | 16.5 (.3) 24.3 (.2) -        23.2 (.2) | 41.4 (.1) 44.5 (.2) -        43.7 (.1)
DPMPN            | 28.6 (.1) 40.3 (.1) 53.0 (.3) 36.9 (.1) | 44.4 (.4) 49.7 (.8) 55.8 (.5) 48.2 (.5)

FB15K, WN18, and YAGO3-10, and use the mean average precision (MAP) for NELL995's single-query-relation KBC tasks. For NELL995, we follow the same evaluation procedure as in (Xiong et al., 2017; Das et al., 2018; Shen et al., 2018), ranking the answer entities against the negative examples given in their experiments. We run our experiments on a 12GB-memory GPU, TITAN X (Pascal), with an Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz. Our code is written in Python based on TensorFlow 2.0 and NumPy 1.16 and can be found at the link³ below. We run each hyperparameter setting three times per dataset to report the means and standard deviations. See hyperparameter details in the appendix.

Baselines. We compare our model against embedding-based approaches, including TransE (Bordes et al., 2013), TransR (Lin et al., 2015b), DistMult (Yang et al., 2015), ConvE (Dettmers et al., 2018), ComplEx (Trouillon et al., 2016), HolE (Nickel et al., 2016), RotatE (Sun et al., 2018), and ComplEx-N3 (Lacroix et al., 2018); path-based approaches that use RL methods, including DeepPath (Xiong et al., 2017), MINERVA (Das et al., 2018), and M-Walk (Shen et al., 2018); and one that uses learned neural logic, NeuralLP (Yang et al., 2017).

Comparison results and analysis. We report the comparison on FB15K-237 and WN18RR in Table 2. Our model, DPMPN, significantly outperforms all the baselines in HITS@1,3 and MRR. Compared to the best baseline, we only lose a few points in HITS@10 but gain a lot in HITS@1,3. We speculate that it is the reasoning capability that helps DPMPN make sharp predictions by exploiting graph-structured composition locally and conditionally. When a target becomes too vague to predict, reasoning may lose its advantage against embedding-based models. However, path-based baselines, which have a certain ability to do reasoning, perform worse than we expect. We argue that it might be inappropriate to treat reasoning, a sequential decision process, as equivalent to a sequence of nodes. The average length of the shortest paths between heads and tails, shown in Table 1, suggests very short paths, which makes the motivation for using a path almost useless. The reasoning pattern should instead be modeled as a dynamical, local, graph-structured pattern with nodes densely connected

3 https://github.com/anonymousauthor123/DPMPN


[Figure: six panels (A)-(F) plotting metric scores (H@1, H@3, H@10, MRR) against training epochs and against varying max-sampling-per-node (20-400), max-attending-to-per-step (20-400), max-attending-from-per-step (5-40), and #steps-in-AGNN (2-8).]

Figure 4: Experimental analysis on WN18RR. (A) Convergence analysis: we pick six model snapshots during training and evaluate them on test. (B) IGNN component analysis: w/o IGNN uses zero steps of full message passing, while with IGNN uses two. (C)-(F) Sampling, attending-to, attending-from, and searching horizon analysis. The charts on FB15K-237 can be found in the appendix.

with each other to produce a decision collectively. We also run our model on FB15K, WN18, and YAGO3-10; the comparison results in the appendix show that DPMPN achieves a very competitive position against the best state of the art. We summarize the comparison on NELL995's tasks in the appendix: DPMPN performs the best on five tasks and remains competitive on the rest.
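The HITS@k and MRR metrics used throughout the comparison can be computed from the rank of each true tail among the candidate entities. The following minimal sketch (our own illustration of the standard definitions, not the authors' evaluation code) shows both:

```python
def hits_at_k(ranks, k):
    # Fraction of test queries whose true tail is ranked within the top k.
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    # Average of 1/rank over all test queries.
    return sum(1.0 / r for r in ranks) / len(ranks)
```

For example, with filtered ranks [1, 2, 10], HITS@3 is 2/3 and MRR is (1 + 1/2 + 1/10)/3.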

Convergence analysis. Our model converges very fast during training. We may use only half of the training queries to train a model that generalizes well, as shown in Figure 4(A). Compared to less expensive embedding-based models, our model needs to traverse a number of edges when training on one input, consuming much time per batch, but it does not need a second epoch, thus saving a lot of training time. The reason may be that training queries also belong to the KG's edges, so some of them are exploited to construct subgraphs while training on other queries.

Component analysis. Given the stacked GNN architecture, we want to examine how much each GNN component contributes to the performance. Since IGNN is input-agnostic, we cannot rely on its node representations alone to predict a tail given an input query. However, AGNN is input-dependent, which means it can complete the task without taking the underlying node representations from IGNN. Therefore, we arrange two sets of experiments: (1) AGNN + IGNN, and (2) AGNN-only. In AGNN-only, we do not run message passing in IGNN to compute Hv,: but instead use node embeddings as Hv,:, and then run pruned message passing in AGNN as usual. To verify whether IGNN is actually useful, we compare the first setting, which runs IGNN for two steps, against the second, which shuts IGNN down entirely. The results in Figure 4(B) (and Figure 7(B) in the appendix) show that IGNN brings gains in each metric on WN18RR (and FB15K-237), indicating that representations computed by full-graph message passing indeed help subgraph-based message passing.
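The ablation switch described above can be sketched as follows. This is our own toy illustration, not the authors' implementation: the real IGNN uses learned message functions, while here a simple mean-over-neighbors aggregation stands in for two steps of full-graph message passing.

```python
import numpy as np

def ignn_states(node_embeddings, adj, n_steps=2):
    # Hypothetical stand-in for IGNN full-graph message passing:
    # n_steps rounds of mean-over-neighbors aggregation.
    H = node_embeddings
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    for _ in range(n_steps):
        H = (adj @ H) / deg
    return H

def node_states(node_embeddings, adj, use_ignn):
    # AGNN + IGNN: H_{v,:} comes from full-graph message passing.
    # AGNN-only (ablation): H_{v,:} is just the raw node embedding.
    if use_ignn:
        return ignn_states(node_embeddings, adj, n_steps=2)
    return node_embeddings
```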

Horizon analysis. The sampling, attending-to, attending-from, and searching (i.e., propagation-step) horizons determine how large an area a subgraph can expand over. These factors affect computation complexity as well as prediction performance. Intuitively, enlarging the exploration area by sampling more, attending more, and searching longer may increase the chance of hitting a target and thus gain some performance. However, the experimental results in Figure 4(C)(D) show that this is not always the case. In Figure 4(E), we can see that increasing the maximum number of attending-


Figure 5: Analysis of attention flow on NELL995 tasks. (A) The average entropy of attention distributions changing along steps for each single-query-relation KBC task. (B)(C)(D) The changing proportion of attention concentrated at the top-1, top-3, and top-5 nodes per step for each task.

Figure 6: Analysis of time cost on WN18RR: (A)-(D) measure the one-epoch training time under different horizon settings corresponding to Figure 4(C)-(F); (E) measures different batch sizes using the horizon setting Max-sampling-per-node=20, Max-attending-to-per-step=20, Max-attending-from-per-step=20, and #Steps-in-AGNN=8. The corresponding charts on FB15K-237 can be found in the appendix.

from nodes per step is useful. That also explains why we call nodes in the attending-from horizon the core nodes: they determine where subgraphs can be expanded and how attention will be propagated to affect the final probability distribution for the tail prediction. However, GPUs with limited memory do not allow a very large number of sampled or attended nodes, especially for Max-attending-from-per-step. Detailed explanations can be found in the attention strategies in Section 2, where the upper bound is controlled by N1N2 and N3 (Max-attending-from-per-step corresponding to N1, Max-sampling-per-node to N2, and Max-attending-to-per-step to N3). For N1N2, Section 3.2 suggests that we should sample more with a large N2 but attend less with a small N1. Figure 4(F) suggests that the propagation steps of AGNN should not go below four.
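To make the roles of N1, N2, and N3 concrete, the following sketch (our own illustration, not the authors' code) computes the per-step candidate and surviving-node counts implied by the horizon caps above:

```python
def horizon_caps(n1, n2, n3):
    """Per-step bounds implied by the attention horizons.

    n1: Max-attending-from-per-step (core nodes attended from)
    n2: Max-sampling-per-node (sampled neighbors per core node)
    n3: Max-attending-to-per-step (nodes kept after attention)
    """
    candidate_nodes = n1 * n2                  # at most N1*N2 sampled candidates
    attended_nodes = min(candidate_nodes, n3)  # pruned down to at most N3
    return candidate_nodes, attended_nodes
```

With the WN18RR standard setting (N1=20, N2=200, N3=200), at most 4000 candidates are sampled per step and at most 200 survive pruning, matching the intuition of sampling more but attending less.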

Attention flow analysis. If the flow-style attention really captures the way we reason about the world, its process should follow a diverging-converging thinking pattern. Intuitively, in the diverging phase we search for and collect as many ideas as we can; then, in the converging phase, we try to concentrate our thoughts on one point. To check whether the attention flow exhibits such a pattern, we measure the average entropy of attention distributions changing along steps and also the proportion of attention concentrated at the top-1, top-3, and top-5 nodes. As we expect, attention is more focused at the final step and at the beginning.
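The two measurements above can be computed directly from an attention distribution; a minimal sketch (our own, using the standard Shannon-entropy definition):

```python
import math

def attention_entropy(p):
    # Shannon entropy (in nats) of an attention distribution;
    # low entropy means attention is concentrated on few nodes.
    return -sum(x * math.log(x) for x in p if x > 0)

def topk_proportion(p, k):
    # Proportion of total attention mass held by the top-k nodes.
    return sum(sorted(p, reverse=True)[:k])
```

A uniform distribution over n nodes gives the maximum entropy log(n), while a fully converged distribution (all mass on one node) gives entropy 0 and top-1 proportion 1.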

Time cost analysis. The time cost is affected not only by the scale of a dataset but also by the horizon setting. For each dataset, we list the one-epoch training time under our standard hyperparameter settings in the appendix. Note that there is always a trade-off between complexity and performance. We thus study whether we can reduce the time cost considerably at the price of sacrificing a little performance. We plot the one-epoch training time in Figure 6(A)-(D), using the same settings as in the horizon analysis. We can see that Max-attending-from-per-step and #Steps-in-AGNN affect the training time significantly, while Max-sampling-per-node and Max-attending-to-per-step affect it only slightly. Therefore, we can use smaller Max-sampling-per-node and Max-attending-to-per-step in order to afford a larger batch size, making the computation more efficient as shown in Figure 6(E).

Visualization. To further demonstrate the reasoning capability, we show visualization results of some pruned subgraphs on NELL995's test data for the 12 separate tasks. We avoid using the training data in order to show the generalization of the learned reasoning capability. We show the visualization results in Figure 1. See the appendix for detailed analysis and more visualization results.


Discussion of limitations. Although DPMPN shows a promising way to harness scalability on large-scale graph data, current GPU-based machine learning platforms, such as TensorFlow and PyTorch, seem not ready to fully leverage sparse tensor computation, which acts as the building block supporting dynamical computation graphs that vary from one input to another. Extra overhead caused by extensive sparse operations can neutralize the benefits of exploiting sparsity.

5 RELATED WORK

Knowledge graph reasoning. Early work, including TransE (Bordes et al., 2013) and its analogues (Wang et al., 2014; Lin et al., 2015b; Ji et al., 2015), DistMult (Yang et al., 2015), ConvE (Dettmers et al., 2018), and ComplEx (Trouillon et al., 2016), focuses on learning embeddings of entities and relations. Some recent works of this line (Sun et al., 2018; Lacroix et al., 2018) achieve high accuracy. Another line aims to learn inference paths (Lao et al., 2011; Guu et al., 2015; Lin et al., 2015a; Toutanova et al., 2016; Chen et al., 2018; Lin et al., 2018) for knowledge graph reasoning, especially DeepPath (Xiong et al., 2017), MINERVA (Das et al., 2018), and M-Walk (Shen et al., 2018), which use RL to learn multi-hop relational paths. However, these approaches, based on policy gradients or Monte Carlo tree search, often suffer from low sample efficiency and sparse rewards, requiring a large number of rollouts and sophisticated reward function design. Other efforts include learning soft logical rules (Cohen, 2016; Yang et al., 2017) or compositional programs (Liang et al., 2016).

Relational reasoning in Graph Neural Networks. Relational reasoning is regarded as the key to combinatorial generalization, taking the form of entity- and relation-centric organization to reason about the compositional structure of the world (Craik, 1952; Lake et al., 2017). A multitude of recent implementations (Battaglia et al., 2018) encode relational inductive biases into neural networks to exploit graph-structured representation, including graph convolution networks (GCNs) (Bruna et al., 2014; Henaff et al., 2015; Duvenaud et al., 2015; Kearnes et al., 2016; Defferrard et al., 2016; Niepert et al., 2016; Kipf & Welling, 2017; Bronstein et al., 2017) and graph neural networks (Scarselli et al., 2009; Li et al., 2016; Santoro et al., 2017; Battaglia et al., 2016; Gilmer et al., 2017). Variants of GNN architectures have been developed. Relation networks (Santoro et al., 2017) use a simple but effective neural module to model relational reasoning, and their recurrent versions (Santoro et al., 2018; Palm et al., 2018) perform multi-step relational inference over long periods; interaction networks (Battaglia et al., 2016) provide a general-purpose learnable physics engine, with two variants being visual interaction networks (Watters et al., 2017) and vertex attention interaction networks (Hoshen, 2017); message passing neural networks (Gilmer et al., 2017) unify various GCNs and GNNs into a general message passing formalism by analogy to the one used in graphical models.

Attention mechanism on graphs. Neighborhood attention operations can enhance GNNs' representation power (Velickovic et al., 2018; Hoshen, 2017; Wang et al., 2018; Kool, 2018). These approaches often use multi-head self-attention to focus on specific interactions with neighbors when aggregating messages, inspired by (Bahdanau et al., 2015; Lin et al., 2017; Vaswani et al., 2017). Most graph-based attention mechanisms attend over the neighborhood in a single-hop fashion, and (Hoshen, 2017) claims that a multi-hop architecture does not help to model high-order interaction in experiments. However, a flow-style design of attention in (Xu et al., 2018b) shows a way to model long-range attention, stringing together isolated attention operations via transition matrices.

6 CONCLUSION

We introduce Dynamically Pruned Message Passing Networks (DPMPN) and apply them to large-scale knowledge graph reasoning tasks. We propose to learn an input-dependent local subgraph, progressively and selectively constructed to model a sequential reasoning process in knowledge graphs. We use graphical attention expression, a flow-style attention mechanism, to guide and prune the underlying message passing, making it scalable for large-scale graphs while also providing clear graphical interpretations. We also take inspiration from the consciousness prior to develop a two-GNN framework that boosts experimental performance.


REFERENCES

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2015.

Peter W. Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, and Koray Kavukcuoglu. Interaction networks for learning about objects, relations and physics. In NIPS, 2016.

Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinícius Flores Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew J. Ballard, Justin Gilmer, George E. Dahl, Ashish Vaswani, Kelsey R. Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matthew Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. Relational inductive biases, deep learning, and graph networks. CoRR, abs/1806.01261, 2018.

Yoshua Bengio. The consciousness prior. CoRR, abs/1709.08568, 2017.

Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In NIPS, 2013.

Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34:18–42, 2017.

Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. CoRR, abs/1312.6203, 2014.

Wenhu Chen, Wenhan Xiong, Xifeng Yan, and William Yang Wang. Variational knowledge graph reasoning. In NAACL-HLT, 2018.

William W. Cohen. TensorLog: A differentiable deductive database. CoRR, abs/1605.06523, 2016.

Kenneth H. Craik. The Nature of Explanation. 1952.

Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alexander J. Smola, and Andrew McCallum. Go for a walk and arrive at the answer: Reasoning over paths in knowledge bases using reinforcement learning. CoRR, abs/1711.05851, 2018.

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.

Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2d knowledge graph embeddings. In AAAI, 2018.

David K. Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gomez-Bombarelli, Timothy Hirzel, Alan Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In NIPS, 2015.

Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In ICML, 2017.

Kelvin Guu, John Miller, and Percy S. Liang. Traversing knowledge graphs in vector space. In EMNLP, 2015.

Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. CoRR, abs/1506.05163, 2015.

Yedid Hoshen. VAIN: Attentional multi-agent predictive modeling. In NIPS, 2017.

Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jian Zhao. Knowledge graph embedding via dynamic mapping matrix. In ACL, 2015.


Steven M. Kearnes, Kevin McCloskey, Marc Berndl, Vijay S. Pande, and Patrick Riley. Molecular graph convolutions: moving beyond fingerprints. Journal of Computer-Aided Molecular Design, 30(8):595–608, 2016.

Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2017.

Wouter Kool. Attention solves your TSP, approximately. 2018.

Timothée Lacroix, Nicolas Usunier, and Guillaume Obozinski. Canonical tensor decomposition for knowledge base completion. In ICML, 2018.

Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. Building machines that learn and think like people. The Behavioral and Brain Sciences, 40:e253, 2017.

Ni Lao, Tom Michael Mitchell, and William W. Cohen. Random walk inference and learning in a large scale knowledge base. In EMNLP, 2011.

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. Gated graph sequence neural networks. CoRR, abs/1511.05493, 2016.

Chen Liang, Jonathan Berant, Quoc V. Le, Kenneth D. Forbus, and Ni Lao. Neural symbolic machines: Learning semantic parsers on Freebase with weak supervision. In ACL, 2016.

Xi Victoria Lin, Richard Socher, and Caiming Xiong. Multi-hop knowledge graph reasoning with reward shaping. In EMNLP, 2018.

Yankai Lin, Zhiyuan Liu, and Maosong Sun. Modeling relation paths for representation learning of knowledge bases. In EMNLP, 2015a.

Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI, 2015b.

Zhouhan Lin, Minwei Feng, Cícero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. CoRR, abs/1703.03130, 2017.

Farzaneh Mahdisoltani, Joanna Asia Biega, and Fabian M. Suchanek. YAGO3: A knowledge base from multilingual Wikipedias. In CIDR, 2014.

Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Q. Phung. A novel embedding model for knowledge base completion based on convolutional neural network. In NAACL-HLT, 2018.

Maximilian Nickel, Lorenzo Rosasco, and Tomaso A. Poggio. Holographic embeddings of knowledge graphs. In AAAI, 2016.

Mathias Niepert, Mohammed Hassan Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In ICML, 2016.

Rasmus Berg Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks. In NeurIPS, 2018.

Adam Santoro, David Raposo, David G. T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter W. Battaglia, and Timothy P. Lillicrap. A simple neural network module for relational reasoning. In NIPS, 2017.

Adam Santoro, Ryan Faulkner, David Raposo, Jack W. Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy P. Lillicrap. Relational recurrent neural networks. In NeurIPS, 2018.

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20:61–80, 2009.

Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, and Jianfeng Gao. M-Walk: Learning to walk over graphs using Monte Carlo tree search. In NeurIPS, 2018.


Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. RotatE: Knowledge graph embedding by relational rotation in complex space. CoRR, abs/1902.10197, 2018.

Kristina Toutanova and Danqi Chen. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 2015.

Kristina Toutanova, Victoria Lin, Wen-tau Yih, Hoifung Poon, and Chris Quirk. Compositional learning of embeddings for relation paths in knowledge base and text. In ACL, 2016.

Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex embeddings for simple link prediction. In ICML, 2016.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, 2017.

Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. CoRR, abs/1710.10903, 2018.

William Wang. Knowledge graph reasoning: Recent advances, 2018.

Xiaolong Wang, Ross B. Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7794–7803, 2018.

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In AAAI, 2014.

Nicholas Watters, Daniel Zoran, Theophane Weber, Peter W. Battaglia, Razvan Pascanu, and Andrea Tacchetti. Visual interaction networks: Learning a physics simulator from video. In NIPS, 2017.

Wenhan Xiong, Thien Hoang, and William Yang Wang. DeepPath: A reinforcement learning method for knowledge graph reasoning. In EMNLP, 2017.

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? ArXiv, abs/1810.00826, 2018a.

Xiaoran Xu, Songpeng Zu, Chengliang Gao, Yuan Zhang, and Wei Feng. Modeling attention flow on graphs. CoRR, abs/1811.00497, 2018b.

Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and relations for learning and inference in knowledge bases. CoRR, abs/1412.6575, 2015.

Fan Yang, Zhilin Yang, and William W. Cohen. Differentiable learning of logical rules for knowledge base reasoning. In NIPS, 2017.


Appendix

7 PROOF

Proposition. Given a graph $G$ (undirected, or directed in both directions), we assume the probability of the degree of an arbitrary node being at most $d$ is larger than $p$, i.e., $P(\deg(v) \le d) > p$ for all $v \in V$. Considering a sequence of consecutively expanding subgraphs $(G^0, G^1, \ldots, G^T)$, starting with $G^0 = \{v\}$, for all $t \ge 1$ we can ensure
$$P\Big(|V_{G^t}| \le \frac{d(d-1)^t - 2}{d-2}\Big) > p^{\frac{d(d-1)^{t-1}-2}{d-2}}. \tag{6}$$

Proof. We consider the extreme case of greedy consecutive expansion, where $G^t = G^{t-1} \cup \Delta G^t = G^{t-1} \cup \partial G^{t-1}$, since if this case satisfies the inequality, any case of consecutive expansion also satisfies it. By definition, every subgraph $G^t$ is a connected graph. We write $\Delta V^t$ for $V_{\Delta G^t}$ for short. In the extreme case, the newly added nodes $\Delta V^t$ at step $t$ belong only to the neighborhood of the last added nodes $\Delta V^{t-1}$. Since for $t \ge 2$ each node in $\Delta V^{t-1}$ already has at least one edge within $G^{t-1}$ by the definition of a connected graph, we have
$$P\big(|\Delta V^t| \le |\Delta V^{t-1}|(d-1)\big) > p^{|\Delta V^{t-1}|}. \tag{7}$$
For $t = 1$, we have $P(|\Delta V^1| \le d) > p$ and thus
$$P\big(|V_{G^1}| \le 1 + d\big) > p. \tag{8}$$
For $t \ge 2$, based on $|V_{G^t}| = 1 + |\Delta V^1| + \ldots + |\Delta V^t|$, we obtain
$$P\big(|V_{G^t}| \le 1 + d + d(d-1) + \ldots + d(d-1)^{t-1}\big) > p^{1 + d + d(d-1) + \ldots + d(d-1)^{t-2}}, \tag{9}$$
which is
$$P\Big(|V_{G^t}| \le \frac{d(d-1)^t - 2}{d-2}\Big) > p^{\frac{d(d-1)^{t-1}-2}{d-2}}. \tag{10}$$
We can find that $t = 1$ also satisfies this inequality.
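As a numerical sanity check (our own, not part of the paper), the closed form (d(d-1)^t - 2)/(d-2) indeed equals the geometric series 1 + d + d(d-1) + ... + d(d-1)^(t-1) for d > 2:

```python
def greedy_expansion_count(d, t):
    # 1 + d + d(d-1) + ... + d(d-1)^(t-1): nodes reachable after t
    # greedy expansion steps when every node has degree at most d.
    total, frontier = 1, d
    for _ in range(t):
        total += frontier
        frontier *= d - 1
    return total

def closed_form(d, t):
    # Closed form used in Eq. (6)/(10); valid for d > 2.
    return (d * (d - 1) ** t - 2) // (d - 2)
```

For instance, with d = 3 and t = 2 both expressions give 1 + 3 + 6 = 10.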


8 HYPERPARAMETER SETTINGS

Table 3: The standard hyperparameter settings we use for each dataset, plus their one-epoch training time. For experimental analysis, we adjust only one hyperparameter and keep the rest fixed at the standard setting. For NELL995, the one-epoch training time is the average time cost over the 12 single-query-relation tasks.

Hyperparameter                     FB15K-237  FB15K  WN18RR  WN18   YAGO3-10  NELL995
batch_size                         80         80     100     100    100       10
n_dims_att                         50         50     50      50     50        200
n_dims                             100        100    100     100    100       200
max_sampling_per_step (in IGNN)    10000      10000  10000   10000  10000     10000
max_attending_from_per_step        20         20     20      20     20        100
max_sampling_per_node (in AGNN)    200        200    200     200    200       1000
max_attending_to_per_step          200        200    200     200    200       1000
n_steps_in_IGNN                    2          1      2       1      1         1
n_steps_in_AGNN                    6          6      8       8      6         5
learning_rate                      0.001      0.001  0.001   0.001  0.0001    0.001
optimizer                          Adam       Adam   Adam    Adam   Adam      Adam
grad_clipnorm                      1          1      1       1      1         1
n_epochs                           1          1      1       1      1         3
One-epoch training time (h)        25.7       63.7   4.3     8.5    185.0     0.12

The hyperparameters can be categorized into three groups:

• Normal hyperparameters, including batch_size, n_dims_att, n_dims, learning_rate, grad_clipnorm, and n_epochs. We set smaller dimensions, n_dims_att, for computation in the attention module, since it uses more edges than the message passing in AGNN does; intuitively, it does not need to propagate high-dimensional messages but only to compute scalar scores over a sampled neighborhood, in concert with the key-value mechanism (Bengio, 2017). We set n_epochs = 1 in most cases, indicating that our model can be trained well in a single epoch due to its fast convergence.

• The hyperparameters in charge of the sampling-attending horizon, including max_sampling_per_step, which controls the maximum number of edges sampled per step in IGNN, and max_sampling_per_node, max_attending_from_per_step, and max_attending_to_per_step, which control, per step per input in AGNN, the maximum number of sampled neighbors of each selected node, the maximum number of selected attending-from nodes, and the maximum number of selected attending-to nodes in a sampled neighborhood, respectively.

• The hyperparameters in charge of the searching horizon, including n_steps_in_IGNN, the number of propagation steps of standard message passing in IGNN, and n_steps_in_AGNN, the number of propagation steps of pruned message passing in AGNN.

Note that we tune these hyperparameters according not only to their performance but also to the computation resources available to us. In some cases, to deal with a very large knowledge graph with limited resources, we need to make a trade-off between efficiency and effectiveness. For example, each of NELL995's single-query-relation tasks has a small training set, though still a large graph, so we can reduce the batch size in favor of affording larger dimensions and a larger sampling-attending horizon without waiting too long for one epoch to finish.
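For reference, the FB15K-237 column of Table 3 can be captured as a plain config mapping (a hypothetical representation of the setting, not the authors' actual config format):

```python
# Hypothetical config dict mirroring the FB15K-237 column of Table 3.
FB15K237_CONFIG = {
    "batch_size": 80,
    "n_dims_att": 50,                  # smaller dims for the attention module
    "n_dims": 100,
    "max_sampling_per_step": 10000,    # IGNN horizon
    "max_attending_from_per_step": 20, # AGNN horizons below
    "max_sampling_per_node": 200,
    "max_attending_to_per_step": 200,
    "n_steps_in_IGNN": 2,
    "n_steps_in_AGNN": 6,
    "learning_rate": 1e-3,
    "optimizer": "Adam",
    "grad_clipnorm": 1,
    "n_epochs": 1,
}
```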


9 MORE EXPERIMENTAL RESULTS

Table 4: Comparison results on the FB15K and WN18 datasets. Results of [♠] are taken from (Nickel et al., 2016), [♣] from (Dettmers et al., 2018), [♦] from (Sun et al., 2018), [♥] from (Yang et al., 2017), and [z] from (Lacroix et al., 2018). Our results take the form "mean (std)".

                 FB15K                                         WN18
Metric (%)       H@1        H@3        H@10       MRR          H@1        H@3        H@10       MRR
TransE [♠]       29.7       57.8       74.9       46.3         11.3       88.8       94.3       49.5
HolE [♠]         40.2       61.3       73.9       52.4         93.0       94.5       94.9       93.8
DistMult [♣]     54.6       73.3       82.4       65.4         72.8       91.4       93.6       82.2
ComplEx [♣]      59.9       75.9       84.0       69.2         93.6       93.6       94.7       94.1
ConvE [♣]        55.8       72.3       83.1       65.7         93.5       94.6       95.6       94.3
RotatE [♦]       74.6       83.0       88.4       79.7         94.4       95.2       95.9       94.9
ComplEx-N3 [z]   -          -          91         86           -          -          96         95
NeuralLP [♥]     -          -          83.7       76           -          -          94.5       94
DPMPN            72.6 (.4)  78.4 (.4)  83.4 (.5)  76.4 (.4)    91.6 (.8)  93.6 (.4)  94.9 (.4)  92.8 (.6)

Table 5: Comparison results on the YAGO3-10 dataset. Results of [♠] are taken from (Dettmers et al., 2018), [♣] from (Lacroix et al., 2018), and [z] from (Lacroix et al., 2018).

                 YAGO3-10
Metric (%)       H@1   H@3   H@10  MRR
DistMult [♠]     24    38    54    34
ComplEx [♠]      26    40    55    36
ConvE [♠]        35    49    62    44
ComplEx-N3 [z]   -     -     71    58
DPMPN            48.4  59.5  67.9  55.3

Table 6: Comparison results of MAP scores (%) on NELL995's single-query-relation KBC tasks. We take our baselines' results from (Shen et al., 2018). No results are reported for the last two tasks in that paper.

Tasks                    NeuCFlow     M-Walk      MINERVA     DeepPath    TransE  TransR
AthletePlaysForTeam      83.9 (0.5)   84.7 (1.3)  82.7 (0.8)  72.1 (1.2)  62.7    67.3
AthletePlaysInLeague     97.5 (0.1)   97.8 (0.2)  95.2 (0.8)  92.7 (5.3)  77.3    91.2
AthleteHomeStadium       93.6 (0.1)   91.9 (0.1)  92.8 (0.1)  84.6 (0.8)  71.8    72.2
AthletePlaysSport        98.6 (0.0)   98.3 (0.1)  98.6 (0.1)  91.7 (4.1)  87.6    96.3
TeamPlaysSport           90.4 (0.4)   88.4 (1.8)  87.5 (0.5)  69.6 (6.7)  76.1    81.4
OrgHeadQuarteredInCity   94.7 (0.3)   95.0 (0.7)  94.5 (0.3)  79.0 (0.0)  62.0    65.7
WorksFor                 86.8 (0.0)   84.2 (0.6)  82.7 (0.5)  69.9 (0.3)  67.7    69.2
PersonBornInLocation     84.1 (0.5)   81.2 (0.0)  78.2 (0.0)  75.5 (0.5)  71.2    81.2
PersonLeadsOrg           88.4 (0.1)   88.8 (0.5)  83.0 (2.6)  79.0 (1.0)  75.1    77.2
OrgHiredPerson           84.7 (0.8)   88.8 (0.6)  87.0 (0.3)  73.8 (1.9)  71.9    73.7
AgentBelongsToOrg        89.3 (1.2)   -           -           -           -       -
TeamPlaysInLeague        97.2 (0.3)   -           -           -           -       -


Figure 7: Experimental analysis on FB15K-237. (A) Convergence analysis: we pick six model snapshots at 0.3, 0.5, 0.7, 1, 2, and 3 epochs during training and evaluate them on the test set; (B) IGNN component analysis: w/o IGNN runs zero message-passing steps, while with IGNN runs two; (C)-(F) Sampling, attending-to, attending-from, and searching horizon analysis.

Figure 8: Analysis of time cost on FB15K-237: (A)-(D) measure the one-epoch training time under different horizon settings corresponding to Figure 7(C)-(F); (E) measures different batch sizes using the horizon setting Max-sampling-per-node=20, Max-attending-to-per-step=20, Max-attending-from-per-step=20, and #Steps-in-AGNN=6.

10 MORE VISUALIZATION RESULTS

10.1 CASE STUDY ON THE ATHLETEPLAYSFORTEAM TASK

In the case shown in Figure 9, the query is (concept_personnorthamerica_michael_turner, concept:athleteplaysforteam, ?) and a true answer is concept_sportsteam_falcons. From Figure 9, we can see our model learns that (concept_personnorthamerica_michael_turner, concept:athletehomestadium, concept_stadiumoreventvenue_georgia_dome) and (concept_stadiumoreventvenue_georgia_dome, concept:teamhomestadium_inv, concept_sportsteam_falcons) are two important facts supporting the answer concept_sportsteam_falcons. Besides, other facts, such as (concept_athlete_joey_harrington, concept:athletehomestadium, concept_stadiumoreventvenue_georgia_dome) and (concept_athlete_joey_harrington, concept:athleteplaysforteam, concept_sportsteam_falcons), provide a vivid example that a person or an athlete with concept_stadiumoreventvenue_georgia_dome as his or her home stadium might play for the team concept_sportsteam_falcons. We have more than one such example, e.g., concept_athlete_roddy_white's and concept_athlete_quarterback_matt_ryan's. The entity concept_sportsleague_nfl cannot help us differentiate the true answer from other NFL teams, but it can at least exclude the non-NFL teams. In a word, our subgraph-structured representation captures the relational and compositional reasoning pattern well.
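The two supporting facts in this case study compose into the answer by a simple two-hop traversal: follow concept:athletehomestadium from the head, then concept:teamhomestadium_inv from the intermediate stadium node. A minimal sketch of that composition, using triples copied from the selected key edges (the helper `two_hop_tails` is our own illustrative name, not the paper's code):

```python
# Two supporting facts from the AthletePlaysForTeam case study.
triples = [
    ("concept_personnorthamerica_michael_turner", "concept:athletehomestadium",
     "concept_stadiumoreventvenue_georgia_dome"),
    ("concept_stadiumoreventvenue_georgia_dome", "concept:teamhomestadium_inv",
     "concept_sportsteam_falcons"),
]

def two_hop_tails(head, rel1, rel2, triples):
    """Return all tails reachable from `head` by following rel1 then rel2."""
    mids = {t for h, r, t in triples if h == head and r == rel1}
    return {t for h, r, t in triples if h in mids and r == rel2}

answers = two_hop_tails(
    "concept_personnorthamerica_michael_turner",
    "concept:athletehomestadium",
    "concept:teamhomestadium_inv",
    triples,
)
print(answers)  # {'concept_sportsteam_falcons'}
```

The same lookup over Joey Harrington's or Roddy White's triples yields the analogous home-stadium-to-team pattern described above.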

[Figure 9 rendering: node labels of the full (left) and attention-pruned (right) subgraphs.]

Figure 9: AthletePlaysForTeam. The head is concept_personnorthamerica_michael_turner, the query relation is concept:athleteplaysforteam, and the tail is concept_sportsteam_falcons. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.
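The right-hand subgraphs in Figures 9-14 keep only the nodes that accumulated enough attention. A minimal sketch of such attention-based pruning, under our own assumptions (scoring each node by its peak attention over the T steps and keeping the top-k; the scores, k, and toy graph below are illustrative, not taken from the paper):

```python
def prune_subgraph(edges, attention, k):
    """Keep the top-k nodes by peak attention over T steps, plus edges
    whose endpoints both survive. attention: {node: [score_1, ..., score_T]}."""
    peak = {n: max(scores) for n, scores in attention.items()}
    kept = set(sorted(peak, key=peak.get, reverse=True)[:k])
    return [(h, r, t) for h, r, t in edges if h in kept and t in kept]

# Toy example: node "d" never gains attention, so its edge is pruned.
edges = [("a", "r1", "b"), ("b", "r2", "c"), ("a", "r3", "d")]
attention = {"a": [0.9, 0.2], "b": [0.1, 0.6], "c": [0.0, 0.8], "d": [0.05, 0.01]}
kept_edges = prune_subgraph(edges, attention, k=3)
print(kept_edges)  # [('a', 'r1', 'b'), ('b', 'r2', 'c')]
```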

For the AthletePlaysForTeam task
Query: (concept_personnorthamerica_michael_turner, concept:athleteplaysforteam, concept_sportsteam_falcons)

Selected key edges:
concept_personnorthamerica_michael_turner, concept:agentbelongstoorganization, concept_sportsleague_nfl
concept_personnorthamerica_michael_turner, concept:athletehomestadium, concept_stadiumoreventvenue_georgia_dome
concept_sportsleague_nfl, concept:agentcompeteswithagent, concept_sportsleague_nfl
concept_sportsleague_nfl, concept:agentcompeteswithagent_inv, concept_sportsleague_nfl
concept_sportsleague_nfl, concept:teamplaysinleague_inv, concept_sportsteam_sd_chargers
concept_sportsleague_nfl, concept:leaguestadiums, concept_stadiumoreventvenue_georgia_dome
concept_sportsleague_nfl, concept:teamplaysinleague_inv, concept_sportsteam_falcons
concept_sportsleague_nfl, concept:agentbelongstoorganization_inv, concept_personnorthamerica_michael_turner
concept_stadiumoreventvenue_georgia_dome, concept:leaguestadiums_inv, concept_sportsleague_nfl
concept_stadiumoreventvenue_georgia_dome, concept:teamhomestadium_inv, concept_sportsteam_falcons
concept_stadiumoreventvenue_georgia_dome, concept:athletehomestadium_inv, concept_athlete_joey_harrington
concept_stadiumoreventvenue_georgia_dome, concept:athletehomestadium_inv, concept_athlete_roddy_white
concept_stadiumoreventvenue_georgia_dome, concept:athletehomestadium_inv, concept_coach_deangelo_hall
concept_stadiumoreventvenue_georgia_dome, concept:athletehomestadium_inv, concept_personnorthamerica_michael_turner
concept_sportsleague_nfl, concept:subpartoforganization_inv, concept_sportsteam_oakland_raiders
concept_sportsteam_sd_chargers, concept:teamplaysinleague, concept_sportsleague_nfl
concept_sportsteam_sd_chargers, concept:teamplaysagainstteam, concept_sportsteam_falcons
concept_sportsteam_sd_chargers, concept:teamplaysagainstteam_inv, concept_sportsteam_falcons
concept_sportsteam_sd_chargers, concept:teamplaysagainstteam, concept_sportsteam_oakland_raiders
concept_sportsteam_sd_chargers, concept:teamplaysagainstteam_inv, concept_sportsteam_oakland_raiders
concept_sportsteam_falcons, concept:teamplaysinleague, concept_sportsleague_nfl
concept_sportsteam_falcons, concept:teamplaysagainstteam, concept_sportsteam_sd_chargers
concept_sportsteam_falcons, concept:teamplaysagainstteam_inv, concept_sportsteam_sd_chargers
concept_sportsteam_falcons, concept:teamhomestadium, concept_stadiumoreventvenue_georgia_dome
concept_sportsteam_falcons, concept:teamplaysagainstteam, concept_sportsteam_oakland_raiders
concept_sportsteam_falcons, concept:teamplaysagainstteam_inv, concept_sportsteam_oakland_raiders
concept_sportsteam_falcons, concept:athleteledsportsteam_inv, concept_athlete_joey_harrington
concept_athlete_joey_harrington, concept:athletehomestadium, concept_stadiumoreventvenue_georgia_dome
concept_athlete_joey_harrington, concept:athleteledsportsteam, concept_sportsteam_falcons
concept_athlete_joey_harrington, concept:athleteplaysforteam, concept_sportsteam_falcons
concept_athlete_roddy_white, concept:athletehomestadium, concept_stadiumoreventvenue_georgia_dome
concept_athlete_roddy_white, concept:athleteplaysforteam, concept_sportsteam_falcons
concept_coach_deangelo_hall, concept:athletehomestadium, concept_stadiumoreventvenue_georgia_dome
concept_coach_deangelo_hall, concept:athleteplaysforteam, concept_sportsteam_oakland_raiders
concept_sportsleague_nfl, concept:teamplaysinleague_inv, concept_sportsteam_new_york_giants
concept_sportsteam_sd_chargers, concept:teamplaysagainstteam_inv, concept_sportsteam_new_york_giants
concept_sportsteam_falcons, concept:teamplaysagainstteam, concept_sportsteam_new_york_giants
concept_sportsteam_falcons, concept:teamplaysagainstteam_inv, concept_sportsteam_new_york_giants
concept_sportsteam_oakland_raiders, concept:teamplaysagainstteam_inv, concept_sportsteam_new_york_giants
concept_sportsteam_oakland_raiders, concept:teamplaysagainstteam, concept_sportsteam_sd_chargers
concept_sportsteam_oakland_raiders, concept:teamplaysagainstteam_inv, concept_sportsteam_sd_chargers
concept_sportsteam_oakland_raiders, concept:teamplaysagainstteam, concept_sportsteam_falcons
concept_sportsteam_oakland_raiders, concept:teamplaysagainstteam_inv, concept_sportsteam_falcons
concept_sportsteam_oakland_raiders, concept:agentcompeteswithagent, concept_sportsteam_oakland_raiders
concept_sportsteam_oakland_raiders, concept:agentcompeteswithagent_inv, concept_sportsteam_oakland_raiders
concept_sportsteam_new_york_giants, concept:teamplaysagainstteam, concept_sportsteam_sd_chargers
concept_sportsteam_new_york_giants, concept:teamplaysagainstteam, concept_sportsteam_falcons
concept_sportsteam_new_york_giants, concept:teamplaysagainstteam_inv, concept_sportsteam_falcons
concept_sportsteam_new_york_giants, concept:teamplaysagainstteam, concept_sportsteam_oakland_raiders

10.2 MORE RESULTS

[Figure 10 rendering: node labels of the full (left) and attention-pruned (right) subgraphs.]

Figure 10: AthletePlaysInLeague. The head is concept_personnorthamerica_matt_treanor, the query relation is concept:athleteplaysinleague, and the tail is concept_sportsleague_mlb. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.

For the AthletePlaysInLeague task
Query: (concept_personnorthamerica_matt_treanor, concept:athleteplaysinleague, concept_sportsleague_mlb)

Selected key edges:
concept_personnorthamerica_matt_treanor, concept:athleteflyouttosportsteamposition, concept_sportsteamposition_center
concept_personnorthamerica_matt_treanor, concept:athleteplayssport, concept_sport_baseball
concept_sportsteamposition_center, concept:athleteflyouttosportsteamposition_inv, concept_personus_orlando_hudson
concept_sportsteamposition_center, concept:athleteflyouttosportsteamposition_inv, concept_athlete_ben_hendrickson
concept_sportsteamposition_center, concept:athleteflyouttosportsteamposition_inv, concept_coach_j_j__hardy
concept_sportsteamposition_center, concept:athleteflyouttosportsteamposition_inv, concept_athlete_hunter_pence
concept_sport_baseball, concept:athleteplayssport_inv, concept_personus_orlando_hudson
concept_sport_baseball, concept:athleteplayssport_inv, concept_athlete_ben_hendrickson
concept_sport_baseball, concept:athleteplayssport_inv, concept_coach_j_j__hardy
concept_sport_baseball, concept:athleteplayssport_inv, concept_athlete_hunter_pence
concept_personus_orlando_hudson, concept:athleteplaysinleague, concept_sportsleague_mlb
concept_personus_orlando_hudson, concept:athleteplayssport, concept_sport_baseball
concept_athlete_ben_hendrickson, concept:coachesinleague, concept_sportsleague_mlb
concept_athlete_ben_hendrickson, concept:athleteplayssport, concept_sport_baseball
concept_coach_j_j__hardy, concept:coachesinleague, concept_sportsleague_mlb
concept_coach_j_j__hardy, concept:athleteplaysinleague, concept_sportsleague_mlb
concept_coach_j_j__hardy, concept:athleteplayssport, concept_sport_baseball
concept_athlete_hunter_pence, concept:athleteplaysinleague, concept_sportsleague_mlb
concept_athlete_hunter_pence, concept:athleteplayssport, concept_sport_baseball
concept_sportsleague_mlb, concept:coachesinleague_inv, concept_athlete_ben_hendrickson
concept_sportsleague_mlb, concept:coachesinleague_inv, concept_coach_j_j__hardy

[Figure 11 rendering: node labels of the full (left) and attention-pruned (right) subgraphs.]

Figure 11: AthleteHomeStadium. The head is concept_athlete_eli_manning, the query relation is concept:athletehomestadium, and the tail is concept_stadiumoreventvenue_giants_stadium. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.

For the AthleteHomeStadium task
Query: (concept_athlete_eli_manning, concept:athletehomestadium, concept_stadiumoreventvenue_giants_stadium)

Selected key edges:
concept_athlete_eli_manning, concept:personbelongstoorganization, concept_sportsteam_new_york_giants
concept_athlete_eli_manning, concept:athleteplaysforteam, concept_sportsteam_new_york_giants
concept_athlete_eli_manning, concept:athleteledsportsteam, concept_sportsteam_new_york_giants
concept_athlete_eli_manning, concept:athleteplaysinleague, concept_sportsleague_nfl
concept_athlete_eli_manning, concept:fatherofperson_inv, concept_male_archie_manning
concept_sportsteam_new_york_giants, concept:teamplaysinleague, concept_sportsleague_nfl
concept_sportsteam_new_york_giants, concept:teamhomestadium, concept_stadiumoreventvenue_giants_stadium
concept_sportsteam_new_york_giants, concept:personbelongstoorganization_inv, concept_athlete_eli_manning
concept_sportsteam_new_york_giants, concept:athleteplaysforteam_inv, concept_athlete_eli_manning
concept_sportsteam_new_york_giants, concept:athleteledsportsteam_inv, concept_athlete_eli_manning
concept_sportsleague_nfl, concept:teamplaysinleague_inv, concept_sportsteam_new_york_giants
concept_sportsleague_nfl, concept:agentcompeteswithagent, concept_sportsleague_nfl
concept_sportsleague_nfl, concept:agentcompeteswithagent_inv, concept_sportsleague_nfl
concept_sportsleague_nfl, concept:leaguestadiums, concept_stadiumoreventvenue_giants_stadium
concept_sportsleague_nfl, concept:athleteplaysinleague_inv, concept_athlete_eli_manning
concept_male_archie_manning, concept:fatherofperson, concept_athlete_eli_manning
concept_sportsleague_nfl, concept:leaguestadiums, concept_stadiumoreventvenue_paul_brown_stadium
concept_stadiumoreventvenue_giants_stadium, concept:teamhomestadium_inv, concept_sportsteam_new_york_giants
concept_stadiumoreventvenue_giants_stadium, concept:leaguestadiums_inv, concept_sportsleague_nfl
concept_stadiumoreventvenue_giants_stadium, concept:proxyfor_inv, concept_city_east_rutherford
concept_city_east_rutherford, concept:proxyfor, concept_stadiumoreventvenue_giants_stadium
concept_stadiumoreventvenue_paul_brown_stadium, concept:leaguestadiums_inv, concept_sportsleague_nfl

For the AthletePlaysSport task


[Figure 12 rendering: node labels of the full (left) and attention-pruned (right) subgraphs.]

Figure 12: AthletePlaysSport. The head is concept_athlete_vernon_wells, the query relation is concept:athleteplayssport, and the tail is concept_sport_baseball. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.

Query: (concept_athlete_vernon_wells, concept:athleteplayssport, concept_sport_baseball)

Selected key edges:
concept_athlete_vernon_wells, concept:athleteplaysinleague, concept_sportsleague_mlb
concept_athlete_vernon_wells, concept:coachwontrophy, concept_awardtrophytournament_world_series
concept_athlete_vernon_wells, concept:agentcollaborateswithagent_inv, concept_sportsteam_blue_jays
concept_athlete_vernon_wells, concept:personbelongstoorganization, concept_sportsteam_blue_jays
concept_athlete_vernon_wells, concept:athleteplaysforteam, concept_sportsteam_blue_jays
concept_athlete_vernon_wells, concept:athleteledsportsteam, concept_sportsteam_blue_jays
concept_sportsleague_mlb, concept:teamplaysinleague_inv, concept_sportsteam_dodgers
concept_sportsleague_mlb, concept:teamplaysinleague_inv, concept_sportsteam_yankees
concept_sportsleague_mlb, concept:teamplaysinleague_inv, concept_sportsteam_pittsburgh_pirates
concept_awardtrophytournament_world_series, concept:teamwontrophy_inv, concept_sportsteam_dodgers
concept_awardtrophytournament_world_series, concept:teamwontrophy_inv, concept_sportsteam_yankees
concept_awardtrophytournament_world_series, concept:awardtrophytournamentisthechampionshipgameofthenationalsport, concept_sport_baseball
concept_awardtrophytournament_world_series, concept:teamwontrophy_inv, concept_sportsteam_pittsburgh_pirates
concept_sportsteam_blue_jays, concept:teamplaysinleague, concept_sportsleague_mlb
concept_sportsteam_blue_jays, concept:teamplaysagainstteam, concept_sportsteam_yankees
concept_sportsteam_blue_jays, concept:teamplayssport, concept_sport_baseball
concept_sportsteam_dodgers, concept:teamplaysagainstteam, concept_sportsteam_yankees
concept_sportsteam_dodgers, concept:teamplaysagainstteam_inv, concept_sportsteam_yankees
concept_sportsteam_dodgers, concept:teamwontrophy, concept_awardtrophytournament_world_series
concept_sportsteam_dodgers, concept:teamplayssport, concept_sport_baseball
concept_sportsteam_yankees, concept:teamplaysagainstteam, concept_sportsteam_dodgers
concept_sportsteam_yankees, concept:teamplaysagainstteam_inv, concept_sportsteam_dodgers
concept_sportsteam_yankees, concept:teamwontrophy, concept_awardtrophytournament_world_series
concept_sportsteam_yankees, concept:teamplayssport, concept_sport_baseball
concept_sportsteam_yankees, concept:teamplaysagainstteam, concept_sportsteam_pittsburgh_pirates
concept_sportsteam_yankees, concept:teamplaysagainstteam_inv, concept_sportsteam_pittsburgh_pirates
concept_sport_baseball, concept:teamplayssport_inv, concept_sportsteam_dodgers
concept_sport_baseball, concept:teamplayssport_inv, concept_sportsteam_yankees
concept_sport_baseball, concept:awardtrophytournamentisthechampionshipgameofthenationalsport_inv, concept_awardtrophytournament_world_series
concept_sport_baseball, concept:teamplayssport_inv, concept_sportsteam_pittsburgh_pirates
concept_sportsteam_pittsburgh_pirates, concept:teamplaysagainstteam, concept_sportsteam_yankees
concept_sportsteam_pittsburgh_pirates, concept:teamplaysagainstteam_inv, concept_sportsteam_yankees
concept_sportsteam_pittsburgh_pirates, concept:teamwontrophy, concept_awardtrophytournament_world_series
concept_sportsteam_pittsburgh_pirates, concept:teamplayssport, concept_sport_baseball

[Figure 13 rendering: node labels of the full (left) and attention-pruned (right) subgraphs.]

Figure 13: TeamPlaysSport. The head is concept_sportsteam_red_wings, the query relation is concept:teamplayssport, and the tail is concept_sport_hockey. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.

For the TeamPlaysSport task
Query: (concept_sportsteam_red_wings, concept:teamplayssport, concept_sport_hockey)

Selected key edges:
concept_sportsteam_red_wings, concept:teamplaysagainstteam, concept_sportsteam_montreal_canadiens
concept_sportsteam_red_wings, concept:teamplaysagainstteam_inv, concept_sportsteam_montreal_canadiens
concept_sportsteam_red_wings, concept:teamplaysagainstteam, concept_sportsteam_blue_jackets
concept_sportsteam_red_wings, concept:teamplaysagainstteam_inv, concept_sportsteam_blue_jackets
concept_sportsteam_red_wings, concept:worksfor_inv, concept_athlete_lidstrom
concept_sportsteam_red_wings, concept:organizationhiredperson, concept_athlete_lidstrom
concept_sportsteam_red_wings, concept:athleteplaysforteam_inv, concept_athlete_lidstrom
concept_sportsteam_red_wings, concept:athleteledsportsteam_inv, concept_athlete_lidstrom
concept_sportsteam_montreal_canadiens, concept:teamplaysagainstteam, concept_sportsteam_red_wings
concept_sportsteam_montreal_canadiens, concept:teamplaysagainstteam_inv, concept_sportsteam_red_wings
concept_sportsteam_montreal_canadiens, concept:teamplaysinleague, concept_sportsleague_nhl
concept_sportsteam_montreal_canadiens, concept:teamplaysagainstteam, concept_sportsteam_leafs
concept_sportsteam_montreal_canadiens, concept:teamplaysagainstteam_inv, concept_sportsteam_leafs
concept_sportsteam_blue_jackets, concept:teamplaysagainstteam, concept_sportsteam_red_wings
concept_sportsteam_blue_jackets, concept:teamplaysagainstteam_inv, concept_sportsteam_red_wings
concept_sportsteam_blue_jackets, concept:teamplaysinleague, concept_sportsleague_nhl
concept_athlete_lidstrom, concept:worksfor, concept_sportsteam_red_wings
concept_athlete_lidstrom, concept:organizationhiredperson_inv, concept_sportsteam_red_wings
concept_athlete_lidstrom, concept:athleteplaysforteam, concept_sportsteam_red_wings
concept_athlete_lidstrom, concept:athleteledsportsteam, concept_sportsteam_red_wings
concept_sportsteam_red_wings, concept:teamplaysinleague, concept_sportsleague_nhl
concept_sportsteam_red_wings, concept:teamplaysagainstteam, concept_sportsteam_leafs
concept_sportsteam_red_wings, concept:teamplaysagainstteam_inv, concept_sportsteam_leafs
concept_sportsleague_nhl, concept:agentcompeteswithagent, concept_sportsleague_nhl
concept_sportsleague_nhl, concept:agentcompeteswithagent_inv, concept_sportsleague_nhl
concept_sportsleague_nhl, concept:teamplaysinleague_inv, concept_sportsteam_leafs
concept_sportsteam_leafs, concept:teamplaysinleague, concept_sportsleague_nhl
concept_sportsteam_leafs, concept:teamplayssport, concept_sport_hockey


[Figure 14 rendering: node labels of the full (left) and attention-pruned (right) subgraphs.]

Figure 14: OrganizationHeadQuarteredInCity. The head is concept_company_disney, the query relation is concept:organizationheadquarteredincity, and the tail is concept_city_burbank. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.
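The captions in this appendix describe pruning each full subgraph down to the nodes that gained the most attention over the T reasoning steps, dropping any edge that loses an endpoint. A minimal sketch of that kind of attention-based pruning (the function name, node names, scores, and top-k value below are illustrative, not taken from the paper's implementation):

```python
# Hypothetical sketch of attention-based subgraph pruning.
# All names and numbers are illustrative, not from the paper's code.

def prune_subgraph(edges, attention, keep_top_k):
    """Keep the keep_top_k nodes with the highest peak attention
    (max over reasoning steps); drop edges whose two endpoints
    are not both kept."""
    peak = {node: max(scores) for node, scores in attention.items()}
    kept = set(sorted(peak, key=peak.get, reverse=True)[:keep_top_k])
    return [(h, r, t) for (h, r, t) in edges if h in kept and t in kept]

# A tiny subgraph loosely modeled on the Disney example above.
edges = [
    ("disney", "headquarteredin", "burbank"),
    ("disney", "worksfor_inv", "robert_iger"),
    ("robert_iger", "ceoof", "disney"),
    ("disney", "subpartoforganization_inv", "network"),
]
# attention[node][t] = attention score on node at reasoning step t
attention = {
    "disney":      [1.0, 0.6, 0.3],   # head: strong early attention
    "burbank":     [0.0, 0.4, 0.9],   # tail: attention grows late
    "robert_iger": [0.0, 0.3, 0.1],
    "network":     [0.0, 0.05, 0.02], # low attention -> pruned
}
pruned = prune_subgraph(edges, attention, keep_top_k=3)
# The "network" node and its edge are dropped; the other 3 edges survive.
```

The attention trajectory in the dictionary also mirrors the color scheme in the figures: high early-step scores correspond to yellow (head-side) nodes and high late-step scores to red (tail-side) nodes.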

For the OrganizationHeadQuarteredInCity task
Query: (concept_company_disney, concept:organizationheadquarteredincity, concept_city_burbank)

Selected key edges:
concept_company_disney, concept:headquarteredin, concept_city_burbank
concept_company_disney, concept:subpartoforganization_inv, concept_website_network
concept_company_disney, concept:worksfor_inv, concept_ceo_robert_iger
concept_company_disney, concept:proxyfor_inv, concept_ceo_robert_iger
concept_company_disney, concept:personleadsorganization_inv, concept_ceo_robert_iger
concept_company_disney, concept:ceoof_inv, concept_ceo_robert_iger
concept_company_disney, concept:personleadsorganization_inv, concept_ceo_jeffrey_katzenberg
concept_company_disney, concept:organizationhiredperson, concept_ceo_jeffrey_katzenberg
concept_company_disney, concept:organizationterminatedperson, concept_ceo_jeffrey_katzenberg
concept_city_burbank, concept:headquarteredin_inv, concept_company_disney
concept_city_burbank, concept:headquarteredin_inv, concept_biotechcompany_the_walt_disney_co_
concept_website_network, concept:subpartoforganization, concept_company_disney
concept_ceo_robert_iger, concept:worksfor, concept_company_disney
concept_ceo_robert_iger, concept:proxyfor, concept_company_disney
concept_ceo_robert_iger, concept:personleadsorganization, concept_company_disney
concept_ceo_robert_iger, concept:ceoof, concept_company_disney
concept_ceo_robert_iger, concept:topmemberoforganization, concept_biotechcompany_the_walt_disney_co_
concept_ceo_robert_iger, concept:organizationterminatedperson_inv, concept_biotechcompany_the_walt_disney_co_
concept_ceo_jeffrey_katzenberg, concept:personleadsorganization, concept_company_disney
concept_ceo_jeffrey_katzenberg, concept:organizationhiredperson_inv, concept_company_disney
concept_ceo_jeffrey_katzenberg, concept:organizationterminatedperson_inv, concept_company_disney
concept_ceo_jeffrey_katzenberg, concept:worksfor, concept_recordlabel_dreamworks_skg
concept_ceo_jeffrey_katzenberg, concept:topmemberoforganization, concept_recordlabel_dreamworks_skg
concept_ceo_jeffrey_katzenberg, concept:organizationterminatedperson_inv, concept_recordlabel_dreamworks_skg
concept_ceo_jeffrey_katzenberg, concept:ceoof, concept_recordlabel_dreamworks_skg
concept_biotechcompany_the_walt_disney_co_, concept:headquarteredin, concept_city_burbank
concept_biotechcompany_the_walt_disney_co_, concept:organizationheadquarteredincity, concept_city_burbank
concept_recordlabel_dreamworks_skg, concept:worksfor_inv, concept_ceo_jeffrey_katzenberg
concept_recordlabel_dreamworks_skg, concept:topmemberoforganization_inv, concept_ceo_jeffrey_katzenberg
concept_recordlabel_dreamworks_skg, concept:organizationterminatedperson, concept_ceo_jeffrey_katzenberg
concept_recordlabel_dreamworks_skg, concept:ceoof_inv, concept_ceo_jeffrey_katzenberg
concept_city_burbank, concept:airportincity_inv, concept_transportation_burbank_glendale_pasadena
concept_transportation_burbank_glendale_pasadena, concept:airportincity, concept_city_burbank



Figure 15: WorksFor. The head is concept_scientist_balmer, the query relation is concept:worksfor, and the tail is concept_university_microsoft. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.

For the WorksFor task
Query: (concept_scientist_balmer, concept:worksfor, concept_university_microsoft)

Selected key edges:
concept_scientist_balmer, concept:topmemberoforganization, concept_company_microsoft
concept_scientist_balmer, concept:organizationterminatedperson_inv, concept_university_microsoft
concept_company_microsoft, concept:topmemberoforganization_inv, concept_personus_steve_ballmer
concept_company_microsoft, concept:topmemberoforganization_inv, concept_scientist_balmer
concept_university_microsoft, concept:agentcollaborateswithagent, concept_personus_steve_ballmer
concept_university_microsoft, concept:personleadsorganization_inv, concept_personus_steve_ballmer
concept_university_microsoft, concept:personleadsorganization_inv, concept_person_bill
concept_university_microsoft, concept:organizationterminatedperson, concept_scientist_balmer
concept_university_microsoft, concept:personleadsorganization_inv, concept_person_robbie_bach
concept_personus_steve_ballmer, concept:topmemberoforganization, concept_company_microsoft
concept_personus_steve_ballmer, concept:agentcollaborateswithagent_inv, concept_university_microsoft
concept_personus_steve_ballmer, concept:personleadsorganization, concept_university_microsoft
concept_personus_steve_ballmer, concept:worksfor, concept_university_microsoft
concept_personus_steve_ballmer, concept:proxyfor, concept_retailstore_microsoft
concept_personus_steve_ballmer, concept:subpartof, concept_retailstore_microsoft
concept_personus_steve_ballmer, concept:agentcontrols, concept_retailstore_microsoft
concept_person_bill, concept:personleadsorganization, concept_university_microsoft
concept_person_bill, concept:worksfor, concept_university_microsoft
concept_person_robbie_bach, concept:personleadsorganization, concept_university_microsoft
concept_person_robbie_bach, concept:worksfor, concept_university_microsoft
concept_retailstore_microsoft, concept:proxyfor_inv, concept_personus_steve_ballmer
concept_retailstore_microsoft, concept:subpartof_inv, concept_personus_steve_ballmer
concept_retailstore_microsoft, concept:agentcontrols_inv, concept_personus_steve_ballmer

For the PersonBornInLocation task
Query: (concept_person_mark001, concept:personborninlocation, concept_county_york_city)

Selected key edges:
concept_person_mark001, concept:persongraduatedfromuniversity, concept_university_college
concept_person_mark001, concept:persongraduatedschool, concept_university_college
concept_person_mark001, concept:persongraduatedfromuniversity, concept_university_state_university
concept_person_mark001, concept:persongraduatedschool, concept_university_state_university
concept_person_mark001, concept:personbornincity, concept_city_hampshire



Figure 16: PersonBornInLocation. The head is concept_person_mark001, the query relation is concept:personborninlocation, and the tail is concept_county_york_city. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.

concept_person_mark001, concept:hasspouse, concept_person_diane001
concept_person_mark001, concept:hasspouse_inv, concept_person_diane001
concept_university_college, concept:persongraduatedfromuniversity_inv, concept_person_mark001
concept_university_college, concept:persongraduatedschool_inv, concept_person_mark001
concept_university_college, concept:persongraduatedfromuniversity_inv, concept_person_bill
concept_university_college, concept:persongraduatedschool_inv, concept_person_bill
concept_university_state_university, concept:persongraduatedfromuniversity_inv, concept_person_mark001
concept_university_state_university, concept:persongraduatedschool_inv, concept_person_mark001
concept_university_state_university, concept:persongraduatedfromuniversity_inv, concept_person_bill
concept_university_state_university, concept:persongraduatedschool_inv, concept_person_bill
concept_city_hampshire, concept:personbornincity_inv, concept_person_mark001
concept_person_diane001, concept:persongraduatedfromuniversity, concept_university_state_university
concept_person_diane001, concept:persongraduatedschool, concept_university_state_university
concept_person_diane001, concept:hasspouse, concept_person_mark001
concept_person_diane001, concept:hasspouse_inv, concept_person_mark001
concept_person_diane001, concept:personborninlocation, concept_county_york_city
concept_university_state_university, concept:persongraduatedfromuniversity_inv, concept_person_diane001
concept_university_state_university, concept:persongraduatedschool_inv, concept_person_diane001
concept_person_bill, concept:personbornincity, concept_city_york
concept_person_bill, concept:personborninlocation, concept_city_york
concept_person_bill, concept:persongraduatedfromuniversity, concept_university_college
concept_person_bill, concept:persongraduatedschool, concept_university_college
concept_person_bill, concept:persongraduatedfromuniversity, concept_university_state_university
concept_person_bill, concept:persongraduatedschool, concept_university_state_university
concept_city_york, concept:personbornincity_inv, concept_person_bill
concept_city_york, concept:personbornincity_inv, concept_person_diane001
concept_university_college, concept:persongraduatedfromuniversity_inv, concept_person_diane001
concept_person_diane001, concept:personbornincity, concept_city_york

For the PersonLeadsOrganization task
Query: (concept_journalist_bill_plante, concept:personleadsorganization, concept_company_cnn__pbs)

Selected key edges:
concept_journalist_bill_plante, concept:worksfor, concept_televisionnetwork_cbs
concept_journalist_bill_plante, concept:agentcollaborateswithagent_inv, concept_televisionnetwork_cbs
concept_televisionnetwork_cbs, concept:worksfor_inv, concept_journalist_walter_cronkite



Figure 17: PersonLeadsOrganization. The head is concept_journalist_bill_plante, the query relation is concept:personleadsorganization, and the tail is concept_company_cnn__pbs. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.

concept_televisionnetwork_cbs, concept:agentcollaborateswithagent, concept_journalist_walter_cronkite
concept_televisionnetwork_cbs, concept:worksfor_inv, concept_personus_scott_pelley
concept_televisionnetwork_cbs, concept:worksfor_inv, concept_actor_daniel_schorr
concept_televisionnetwork_cbs, concept:worksfor_inv, concept_person_edward_r__murrow
concept_televisionnetwork_cbs, concept:agentcollaborateswithagent, concept_person_edward_r__murrow
concept_televisionnetwork_cbs, concept:worksfor_inv, concept_journalist_bill_plante
concept_televisionnetwork_cbs, concept:agentcollaborateswithagent, concept_journalist_bill_plante
concept_journalist_walter_cronkite, concept:worksfor, concept_televisionnetwork_cbs
concept_journalist_walter_cronkite, concept:agentcollaborateswithagent_inv, concept_televisionnetwork_cbs
concept_journalist_walter_cronkite, concept:worksfor, concept_nonprofitorganization_cbs_evening
concept_personus_scott_pelley, concept:worksfor, concept_televisionnetwork_cbs
concept_personus_scott_pelley, concept:personleadsorganization, concept_televisionnetwork_cbs
concept_personus_scott_pelley, concept:personleadsorganization, concept_company_cnn__pbs
concept_actor_daniel_schorr, concept:worksfor, concept_televisionnetwork_cbs
concept_actor_daniel_schorr, concept:personleadsorganization, concept_televisionnetwork_cbs
concept_actor_daniel_schorr, concept:personleadsorganization, concept_company_cnn__pbs
concept_person_edward_r__murrow, concept:worksfor, concept_televisionnetwork_cbs
concept_person_edward_r__murrow, concept:agentcollaborateswithagent_inv, concept_televisionnetwork_cbs
concept_person_edward_r__murrow, concept:personleadsorganization, concept_televisionnetwork_cbs
concept_person_edward_r__murrow, concept:personleadsorganization, concept_company_cnn__pbs
concept_televisionnetwork_cbs, concept:organizationheadquarteredincity, concept_city_new_york
concept_televisionnetwork_cbs, concept:headquarteredin, concept_city_new_york
concept_televisionnetwork_cbs, concept:agentcollaborateswithagent, concept_personeurope_william_paley
concept_televisionnetwork_cbs, concept:topmemberoforganization_inv, concept_personeurope_william_paley
concept_company_cnn__pbs, concept:headquarteredin, concept_city_new_york
concept_company_cnn__pbs, concept:personbelongstoorganization_inv, concept_personeurope_william_paley
concept_nonprofitorganization_cbs_evening, concept:worksfor_inv, concept_journalist_walter_cronkite
concept_city_new_york, concept:organizationheadquarteredincity_inv, concept_televisionnetwork_cbs
concept_city_new_york, concept:headquarteredin_inv, concept_televisionnetwork_cbs
concept_city_new_york, concept:headquarteredin_inv, concept_company_cnn__pbs
concept_personeurope_william_paley, concept:agentcollaborateswithagent_inv, concept_televisionnetwork_cbs
concept_personeurope_william_paley, concept:topmemberoforganization, concept_televisionnetwork_cbs
concept_personeurope_william_paley, concept:personbelongstoorganization, concept_company_cnn__pbs
concept_personeurope_william_paley, concept:personleadsorganization, concept_company_cnn__pbs

For the OrganizationHiredPerson task



Figure 18: OrganizationHiredPerson. The head is concept_stateorprovince_afternoon, the query relation is concept:organizationhiredperson, and the tail is concept_personmexico_ryan_whitney. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.

Query: (concept_stateorprovince_afternoon, concept:organizationhiredperson, concept_personmexico_ryan_whitney)

Selected key edges:
concept_stateorprovince_afternoon, concept:atdate, concept_dateliteral_n2007
concept_stateorprovince_afternoon, concept:atdate, concept_date_n2003
concept_stateorprovince_afternoon, concept:atdate, concept_dateliteral_n2006
concept_dateliteral_n2007, concept:atdate_inv, concept_country_united_states
concept_dateliteral_n2007, concept:atdate_inv, concept_city_home
concept_dateliteral_n2007, concept:atdate_inv, concept_city_service
concept_dateliteral_n2007, concept:atdate_inv, concept_country_left_parties
concept_date_n2003, concept:atdate_inv, concept_country_united_states
concept_date_n2003, concept:atdate_inv, concept_city_home
concept_date_n2003, concept:atdate_inv, concept_city_service
concept_date_n2003, concept:atdate_inv, concept_country_left_parties
concept_dateliteral_n2006, concept:atdate_inv, concept_country_united_states
concept_dateliteral_n2006, concept:atdate_inv, concept_city_home
concept_dateliteral_n2006, concept:atdate_inv, concept_city_service
concept_dateliteral_n2006, concept:atdate_inv, concept_country_left_parties
concept_country_united_states, concept:atdate, concept_year_n1992
concept_country_united_states, concept:atdate, concept_year_n1997
concept_country_united_states, concept:organizationhiredperson, concept_personmexico_ryan_whitney
concept_city_home, concept:atdate, concept_year_n1992
concept_city_home, concept:atdate, concept_year_n1997
concept_city_home, concept:organizationhiredperson, concept_personmexico_ryan_whitney
concept_city_service, concept:atdate, concept_year_n1992
concept_city_service, concept:atdate, concept_year_n1997
concept_city_service, concept:organizationhiredperson, concept_personmexico_ryan_whitney
concept_country_left_parties, concept:worksfor_inv, concept_personmexico_ryan_whitney
concept_country_left_parties, concept:organizationhiredperson, concept_personmexico_ryan_whitney
concept_year_n1992, concept:atdate_inv, concept_governmentorganization_house
concept_year_n1992, concept:atdate_inv, concept_country_united_states
concept_year_n1992, concept:atdate_inv, concept_city_home
concept_year_n1992, concept:atdate_inv, concept_tradeunion_congress
concept_year_n1997, concept:atdate_inv, concept_governmentorganization_house
concept_year_n1997, concept:atdate_inv, concept_country_united_states
concept_year_n1997, concept:atdate_inv, concept_city_home
concept_personmexico_ryan_whitney, concept:worksfor, concept_governmentorganization_house


concept_personmexico_ryan_whitney, concept:worksfor, concept_tradeunion_congress
concept_personmexico_ryan_whitney, concept:worksfor, concept_country_left_parties
concept_governmentorganization_house, concept:personbelongstoorganization_inv, concept_personus_party
concept_governmentorganization_house, concept:worksfor_inv, concept_personmexico_ryan_whitney
concept_governmentorganization_house, concept:organizationhiredperson, concept_personmexico_ryan_whitney
concept_tradeunion_congress, concept:organizationhiredperson, concept_personus_party
concept_tradeunion_congress, concept:worksfor_inv, concept_personmexico_ryan_whitney
concept_tradeunion_congress, concept:organizationhiredperson, concept_personmexico_ryan_whitney
concept_country_left_parties, concept:organizationhiredperson, concept_personus_party


Figure 19: AgentBelongsToOrganization. The head is concept_person_mark001, the query relation is concept:agentbelongstoorganization, and the tail is concept_geopoliticallocation_world. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.
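The attention-based pruning that produces the right-hand subgraph in each figure can be sketched as follows. This is a hypothetical illustration, not the paper's code: `prune_by_attention`, its `top_k` parameter, and the example attention scores are all assumptions, standing in for keeping only the highest-attention nodes and the edges between them.

```python
# Hypothetical sketch (not the paper's implementation): prune a reasoning
# subgraph by keeping the top-k nodes ranked by accumulated attention,
# then retaining only edges whose endpoints both survive.

def prune_by_attention(edges, attention, top_k=10):
    """edges: list of (head, rel, tail) triples; attention: dict node -> score."""
    # Rank nodes by attention score and keep the top-k.
    kept = set(sorted(attention, key=attention.get, reverse=True)[:top_k])
    # An edge survives only if both of its endpoints survive the pruning.
    return [(h, r, t) for (h, r, t) in edges if h in kept and t in kept]

edges = [
    ("concept_sportsteam_mavericks", "concept:teamplayssport", "concept_sport_basketball"),
    ("concept_sportsteam_spurs", "concept:teamplaysinleague", "concept_sportsleague_nba"),
]
attention = {
    "concept_sportsteam_mavericks": 0.9,
    "concept_sportsleague_nba": 0.8,
    "concept_sportsteam_spurs": 0.5,
    "concept_sport_basketball": 0.1,
}
print(prune_by_attention(edges, attention, top_k=3))
```

With `top_k=3`, the low-attention node concept_sport_basketball is dropped, so only the spurs-to-nba edge remains, mirroring how the pruned subgraphs on the right keep the high-attention reasoning path.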

For the AgentBelongsToOrganization task
Query: (concept_person_mark001, concept:agentbelongstoorganization, concept_geopoliticallocation_world)

Selected key edges:
concept_person_mark001, concept:personbelongstoorganization, concept_sportsteam_state_university
concept_person_mark001, concept:agentcollaborateswithagent, concept_male_world
concept_person_mark001, concept:agentcollaborateswithagent_inv, concept_male_world
concept_person_mark001, concept:personbelongstoorganization, concept_politicalparty_college
concept_sportsteam_state_university, concept:personbelongstoorganization_inv, concept_politician_jobs
concept_sportsteam_state_university, concept:personbelongstoorganization_inv, concept_person_mark001
concept_sportsteam_state_university, concept:personbelongstoorganization_inv, concept_person_greg001
concept_sportsteam_state_university, concept:personbelongstoorganization_inv, concept_person_michael002
concept_male_world, concept:agentcollaborateswithagent, concept_politician_jobs
concept_male_world, concept:agentcollaborateswithagent_inv, concept_politician_jobs
concept_male_world, concept:agentcollaborateswithagent, concept_person_mark001
concept_male_world, concept:agentcollaborateswithagent_inv, concept_person_mark001
concept_male_world, concept:agentcollaborateswithagent, concept_person_greg001
concept_male_world, concept:agentcollaborateswithagent_inv, concept_person_greg001
concept_male_world, concept:agentcontrols, concept_person_greg001
concept_male_world, concept:agentcollaborateswithagent, concept_person_michael002
concept_male_world, concept:agentcollaborateswithagent_inv, concept_person_michael002
concept_politicalparty_college, concept:personbelongstoorganization_inv, concept_person_mark001
concept_politicalparty_college, concept:personbelongstoorganization_inv, concept_person_greg001
concept_politicalparty_college, concept:personbelongstoorganization_inv, concept_person_michael002
concept_politician_jobs, concept:personbelongstoorganization, concept_sportsteam_state_university
concept_politician_jobs, concept:agentcollaborateswithagent, concept_male_world


concept_politician_jobs, concept:agentcollaborateswithagent_inv, concept_male_world
concept_politician_jobs, concept:worksfor, concept_geopoliticallocation_world
concept_person_greg001, concept:personbelongstoorganization, concept_sportsteam_state_university
concept_person_greg001, concept:agentcollaborateswithagent, concept_male_world
concept_person_greg001, concept:agentcollaborateswithagent_inv, concept_male_world
concept_person_greg001, concept:agentcontrols_inv, concept_male_world
concept_person_greg001, concept:agentbelongstoorganization, concept_geopoliticallocation_world
concept_person_greg001, concept:personbelongstoorganization, concept_politicalparty_college
concept_person_greg001, concept:agentbelongstoorganization, concept_recordlabel_friends
concept_person_michael002, concept:personbelongstoorganization, concept_sportsteam_state_university
concept_person_michael002, concept:agentcollaborateswithagent, concept_male_world
concept_person_michael002, concept:agentcollaborateswithagent_inv, concept_male_world
concept_person_michael002, concept:agentbelongstoorganization, concept_geopoliticallocation_world
concept_person_michael002, concept:personbelongstoorganization, concept_politicalparty_college
concept_geopoliticallocation_world, concept:worksfor_inv, concept_personmexico_ryan_whitney
concept_geopoliticallocation_world, concept:organizationhiredperson, concept_personmexico_ryan_whitney
concept_geopoliticallocation_world, concept:worksfor_inv, concept_politician_jobs
concept_recordlabel_friends, concept:organizationhiredperson, concept_personmexico_ryan_whitney
concept_personmexico_ryan_whitney, concept:worksfor, concept_geopoliticallocation_world
concept_personmexico_ryan_whitney, concept:organizationhiredperson_inv, concept_geopoliticallocation_world
concept_personmexico_ryan_whitney, concept:organizationhiredperson_inv, concept_recordlabel_friends


Figure 20: TeamPlaysInLeague. The head is concept_sportsteam_mavericks, the query relation is concept:teamplaysinleague, and the tail is concept_sportsleague_nba. The left is a full subgraph derived with max_attending_from_per_step=20, and the right is a further pruned subgraph from the left based on attention. The big yellow node represents the head, and the big red node represents the tail. Color on the rest indicates attention scores over a T-step reasoning process, where grey means less attention, yellow means more attention gained during early steps, and red means gaining more attention when getting closer to the final step.

For the TeamPlaysInLeague task
Query: (concept_sportsteam_mavericks, concept:teamplaysinleague, concept_sportsleague_nba)

Selected key edges:
concept_sportsteam_mavericks, concept:teamplayssport, concept_sport_basketball
concept_sportsteam_mavericks, concept:teamplaysagainstteam, concept_sportsteam_boston_celtics
concept_sportsteam_mavericks, concept:teamplaysagainstteam_inv, concept_sportsteam_boston_celtics
concept_sportsteam_mavericks, concept:teamplaysagainstteam, concept_sportsteam_spurs
concept_sportsteam_mavericks, concept:teamplaysagainstteam_inv, concept_sportsteam_spurs
concept_sport_basketball, concept:teamplayssport_inv, concept_sportsteam_college
concept_sport_basketball, concept:teamplayssport_inv, concept_sportsteam_marshall_university
concept_sportsteam_boston_celtics, concept:teamplaysinleague, concept_sportsleague_nba
concept_sportsteam_spurs, concept:teamplaysinleague, concept_sportsleague_nba
concept_sportsleague_nba, concept:agentcompeteswithagent, concept_sportsleague_nba


concept_sportsleague_nba, concept:agentcompeteswithagent_inv, concept_sportsleague_nba
concept_sportsteam_college, concept:teamplaysinleague, concept_sportsleague_international
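The selected key edges above follow a simple textual convention: each line is a comma-separated (head, relation, tail) triple, and relations suffixed with `_inv` denote the inverse direction of the underlying relation. A minimal parsing sketch, where `parse_edges` is a hypothetical helper (not part of the paper's code), is:

```python
# Hypothetical helper (not from the paper): parse "Selected key edges" lines
# into (head, relation, tail) triples, re-orienting "_inv" relations so all
# triples share one canonical direction.

def parse_edges(lines):
    triples = []
    for line in lines:
        head, rel, tail = (part.strip() for part in line.split(","))
        if rel.endswith("_inv"):
            # An "_inv" edge (h, r_inv, t) is the same fact as (t, r, h).
            head, rel, tail = tail, rel[: -len("_inv")], head
        triples.append((head, rel, tail))
    return triples

lines = [
    "concept_sportsteam_spurs , concept:teamplaysinleague , concept_sportsleague_nba",
    "concept_sport_basketball , concept:teamplayssport_inv , concept_sportsteam_college",
]
print(parse_edges(lines))
```

Under this reading, the second line above normalizes to (concept_sportsteam_college, concept:teamplayssport, concept_sport_basketball), which is why the same fact appears in both directions throughout the edge lists.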
