Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Published as a conference paper at ICLR 2020
DYNAMICALLY PRUNED MESSAGE PASSING NET-WORKS FOR LARGE-SCALE KNOWLEDGE GRAPHREASONING
Xiaoran Xu1, Wei Feng1, Yunsheng Jiang1, Xiaohui Xie1, Zhiqing Sun2, Zhi-Hong Deng3
1Hulu, {xiaoran.xu, wei.feng, yunsheng.jiang, xiaohui.xie}@hulu.com2Carnegie Mellon University, [email protected] University, [email protected]
ABSTRACT
We propose Dynamically Pruned Message Passing Networks (DPMPN) for large-scale knowledge graph reasoning. In contrast to existing models, embedding-based or path-based, we learn an input-dependent subgraph to explicitly modelreasoning process. Subgraphs are dynamically constructed and expanded by ap-plying graphical attention mechanism conditioned on input queries. In this way,we not only construct graph-structured explanations but also enable message pass-ing designed in Graph Neural Networks (GNNs) to scale with graph sizes. Wetake the inspiration from the consciousness prior proposed by Bengio (2017) anddevelop a two-GNN framework to encode input-agnostic full structure represen-tation and learn input-dependent local one coordinated by an attention module.Experiments show the reasoning capability of our model to provide clear graph-ical explanations as well as predict results accurately, outperforming most state-of-the-art methods in knowledge base completion tasks.
1 INTRODUCTION
Modern deep learning systems should bring in explicit reasoning modeling to complement theirblack-box models, where reasoning takes a step-by-step form about organizing facts to yield newknowledge and finally draw a conclusion. Particularly, we rely on graph-structured representationto model reasoning by manipulating nodes and edges where semantic entities or relations can beexplicitly represented (Battaglia et al., 2018). Here, we choose knowledge graph scenarios to studyreasoning where semantics have been defined on nodes and edges. For example, in knowledge basecompletion tasks, each edge is represented by a triple 〈head, rel, tail〉 that contains two entities andtheir relation. The goal is to predict which entity might be a tail given query 〈head, rel, ?〉.Existing models can be categorized into embedding-based and path-based model families. Theembedding-based (Bordes et al., 2013; Sun et al., 2018; Lacroix et al., 2018) often achieves a highscore by fitting data using various neural network techniques but lacks interpretability. The path-based (Xiong et al., 2017; Das et al., 2018; Shen et al., 2018; Wang, 2018) attempts to construct anexplanatory path to model an iterative decision-making process using reinforcement learning andrecurrent networks. A question is: can we construct structured explanations other than a path tobetter explain reasoning in graph context. To this end, we propose to learn a dynamically inducedsubgraph which starts with a head node and ends with a predicted tail node as shown in Figure 1.
Graph reasoning can be powered by Graph Neural Networks. Graph reasoning needs to learnabout entities, relations, and their composing rules to manipulate structured knowledge and producestructured explanations. Graph Neural Networks (GNNs) provide such structured computation andalso inherit powerful data-fitting capacity from deep neural networks (Scarselli et al., 2009; Battagliaet al., 2018). Specifically, GNNs follow a neighborhood aggregation scheme to recursively aggregateinformation from neighbors to update node states. After T iterations, each node can carry structureinformation from its T -hop neighborhood (Gilmer et al., 2017; Xu et al., 2018a).
GNNs need graphical attention expression to interpret. Neighborhood attention operation is apopular way to implement attention mechanism on graphs (Velickovic et al., 2018; Hoshen, 2017) by
1
Published as a conference paper at ICLR 2020
(a) The AthletePlaysForTeam task. (b) The OrganizationHiredPerson task.
Figure 1: Subgraph visualization on two examples from NELL995’s test data. Each task has tenthousands of nodes and edges. The big yellow represents a given head and the big red represents apredicted tail. Color indicates attention gained along T -step reasoning. Yellow means more attentionduring early steps while red means more attention at the end. Grey means less attention.
focusing on specific interactions with neighbors. Here, we propose a new graphical attention mech-anism not only for computation but also for interpretation. We present three considerations whenconstructing attention-induced subgraphs: (1) given a subgraph, we first attend within it to select afew nodes and then attend over those nodes’ neighborhood for next expansion; (2) we propagate at-tention across steps to capture long-term dependency; (3) our attention mechanism models reasoningprocess explicitly through pipeline disentangled from underlying representation computing.
GNNs need input-dependent pruning to scale. GNNs are notorious for their poor scalability.Consider one message passing iteration on a graph with |V | nodes and |E| edges. Even if the graphis sparse, the complexity of O(|E|) is still problematic on large graphs with millions of nodes andedges. Besides, mini-batch based training with batch size B and high dimensions D would lead toO(BD|E|) making things worse. However, we can avoid this situation by learning input-dependentpruning to run computation on dynamical graphs, as an input query often triggers a small fractionof the entire graph so that it is wasteful to perform computation over the full graph for each input.
Cognitive intuition of the consciousness prior. Bengio (2017) brought the notion of attentiveawareness from cognitive science into deep learning in his consciousness prior proposal. He pointedout a process of disentangling high-level factors from full underlying representation to form a low-dimensional combination through attention mechanism. He proposed to use two recurrent neuralnetworks (RNNs) to encode two types of state: unconscious state represented by a high-dimensionalvector before attention and conscious state by a derived low-dimensional vector after attention.
We use two GNNs instead to encode such states on nodes. We construct input-dependent subgraphsto run message passing efficiently, and also run full message passing over the entire graph to acquirefeatures beyond a local view constrained by subgraphs. We apply attention mechanism betweenthe two GNNs, where the bottom runs before attention, called Inattentive GNN (IGNN), and theabove runs on each attention-induced subgraph, called Attentive GNN (AGNN). IGNN providesrepresentation computed on the full graph for AGNN. AGNN reinforces representation within acohesive group of nodes to produce sharp semantics. Experimental results show that our modelattains very competitive scores on HITS@1,3 and the mean reciprocal rank (MRR) compared to thebest embedding-based method so far. More importantly, we provide explanations while they do not.
2 ADDRESSING THE SCALE-UP PROBLEM
Notation. We denote training data by {(xi, yi)}Ni=1. We denote a full graph by G = 〈V, E〉with relations R and an input-dependent subgraph by G(x) = 〈VG(x), EG(x)〉 which is an in-duced subgraph of G. We denote boundary of a graph by ∂G where V∂G = N(VG) − VG andN(VG) means neighbors of nodes in VG. We also denote high-order boundaries such as ∂2G whereV∂2G = N(N(VG)) ∪ N(VG) − VG. Trainable parameters include node embeddings {ev}v∈V ,
2
Published as a conference paper at ICLR 2020
relation embeddings {er}r∈R, and weights used in two GNNs and an attention module. When per-forming full or pruned message passing, node and relation embeddings will be indexed according tothe operated graph, denoted by θG or θG(x). For IGNN, we use Ht of size |V| ×D to denote nodehidden states at step t; for AGNN, we use Ht(x) of size |VG(x)| × D to denote. The objective iswritten as
∑Ni=1 l(xi, yi;θG(xi),θG), where G(xi) is dynamically constructed.
The scale-up problem in GNNs. First, we write the full message passing in IGNN as
Ht = fIGNN(Ht−1;θG), (1)
where fIGNN represents all involved operations in one message passing iteration over G, including:(1) computing messages along each edge with the complexity1 of O(BD|E|), (2) aggregating mes-sages received at each node with O(BD|E|), and (3) updating node states with O(BD|V|). ForT -step propagation, the per-batch complexity is O(BDT (|E|+ |V|)). Considering that backpropa-gation requires intermediate computation results to be saved during one pass, this complexity countsfor both time and space. However, since IGNN is input-agnostic, node representations can be sharedacross inputs in one batch so that we can remove B to get O(DT (|E| + |V|)). If we use a samplededge set E from E such that |E | ≈ k|V|, the complexity can be further reduced to O(DT |V|).
The pruned message passing in AGNN can be written as
Ht(x) = fAGNN(Ht−1(x),Ht;θG(x)). (2)
Its complexity can be computed similarly as above. However, we cannot remove B. Fortunately,subgraph G(x) is not G. If we let x be a node v, G(x) grows from a single node, i.e., G0(x) = {v},and expands itself each step, leading to a sequence of (G0(x), G1(x), . . . , GT (x)). Here, we de-scribe the expansion behavior as consecutive expansion, which means no jumping across neighbor-hood allowed, so that we can ensure that
Gt(x) ⊆ Gt−1(x) ∪ ∂Gt−1(x) ⊆ Gt−2(x) ∪ ∂2Gt−2(x). (3)
Many real-world graphs follow the small-world pattern, and the six degrees of separation impliesG0(x) ∪ ∂6G0(x) ≈ G. The upper bound of Gt(x) can grow exponentially in t, and there is noguarantee that Gt(x) will not explode.Proposition. Given a graph G (undirected or directed in both directions), we assume the probabilityof the degree of an arbitrary node being less than or equal to d is larger than p, i.e., P (deg(v) ≤d) > p,∀v ∈ V . Considering a sequence of consecutively expanding subgraphs (G0, G1, . . . , GT ),starting with G0 = {v}, for all t ≥ 1, we can ensure
P(|VGt | ≤
d(d− 1)t − 2
d− 2
)> p
d(d−1)t−1−2d−2 . (4)
The proposition implies the guarantee of upper-bounding |VGt(x)| becomes exponentially looser andweaker as t gets larger even if the given assumption has a small d and a large p (close to 1). Wedefine graph increment at step t as ∆Gt(x) such that Gt(x) = Gt−1(x) ∪ ∆Gt(x). To preventGt(x) from explosion, we need to constrain ∆Gt(x).
Sampling strategies. A simple but effective way to handle the large scale is to do sampling.
1. ∆Gt(x) = ∂Gt−1(x), where we sample nodes from the boundary of Gt−1(x).
2. ∆Gt(x) = ∂Gt−1(x), where we take the boundary of sampled nodes from Gt−1(x).
3. ∆Gt(x) = ∂Gt−1(x), where we sample nodes twice from Gt−1(x) and from ∂Gt−1(x).
4. ∆Gt(x) = ∂Gt−1(x), where we sample nodes three times with the last from ∂Gt−1(x).
Obviously, we have ∂Gt−1(x) ⊆ ∂Gt−1(x) ⊆ ∂Gt−1(x) and Gt−1(x) ∪ ∂Gt−1(x) ⊆ Gt−1(x) ∪∂Gt−1(x). Further, we let N1 and N3 be the maximum number of sampled nodes in ∂Gt−1(x) and
the last sampling of ∂Gt−1(x) respectively and let N2 be per-node maximum sampled neighbors in∂Gt−1(x), and then we can obtain much tighter guarantee as follow:
1We assume per-example per-edge per-dimension time cost as a unit time.
3
Published as a conference paper at ICLR 2020
AggrOp
... ...
... AggrOp
...
AttdOp
...
Inat tent ive GNN
At tent ive GNN
OneBatch
At tent i on Module
...
Pooling or returning the last
Figure 2: Model architecture used in knowledge graph reasoning.
1. P (|V∆Gt(x)| ≤ N1(d− 1)) > pN1 for ∂Gt−1(x).
2. P (|V∆Gt(x)| ≤ N1N2) = 1 and P (|V∆Gt(x)| ≤ N1 ·min(d−1, N2)) > pN1 for ∂Gt−1(x).
3. P (|V∆Gt(x)| ≤ min(N1N2, N3)) = 1 for ∂Gt−1(x).
Attention strategies. Although we guarantee |VGT (x)| ≤ 1 + T min(N1N2, N3) by ∂Gt−1(x) andconstrain the growth of Gt−1(x) by decreasing either N1N2 or N3, smaller sampling size meansless area explored and less chance to hit target nodes. To make efficient selection rather than ran-dom sampling, we apply attention mechanism to do the top-K selection where K can be small. We
change ∂Gt−1(x) to ∂Gt−1(x) where ∼ represents the operation of attending over nodes and pick-ing the top-K. There are two types of attention operations, one applied to Gt−1(x) and the other
applied to ∂Gt−1(x). Note that the size of ∂Gt−1(x) might be much larger if we intend to samplemore nodes with larger N2 to sufficiently explore the boundary. Nevertheless, we can address thisproblem by using smaller dimensions to compute attention, since attention on each node is a scalarrequiring a smaller capacity compared to node representation vectors computed in message passing.
3 DPMPN MODEL
3.1 ARCHITECTURE DESIGN FOR KNOWLEDGE GRAPH REASONING
Our model architecture as shown in Figure 2 consists of:
• IGNN module: performs full message passing to compute full-graph node representations.• AGNN module: performs a batch of pruned message passing to compute input-dependent node
representations which also make use of underlying representations from IGNN.• Attention Module: performs a flow-style attention transition process, conditioned on node repre-
sentations from both IGNN and AGNN but only affecting AGNN.
IGNN module. We implement it using standard message passing mechanism (Gilmer et al., 2017).If the full graph has an extremely large number of edges, we sample a subset of edges, Eτ ⊂ E ,randomly each step. For a batch of input queries, we let node representations from IGNN be sharedacross queries, containing no batch dimension. Thus, its complexity does not scale with batch sizeand the saved resources can be allocated to sampling more edges. Each node v has a state Hτ
v,: atstep τ , where the initial H0
v,: = ev . Each edge 〈v′, r, v〉 produces a message, denoted by Mτ〈v′,r,v〉,:
at step τ . The computation components include:
4
Published as a conference paper at ICLR 2020
• Message function: Mτ〈v′,r,v〉,: = ψIGNN(Hτ
v′,:, er,Hτv,:), where 〈v′, r, v〉 ∈ Eτ .
• Message aggregation: Mτ
v,: = 1√Nτ (v)
∑v′,rM
τ〈v′,r,v〉,:, where 〈v′, r, v〉 ∈ Eτ .
• Node state update function: Hτ+1v,: = Hτ
v,: + δIGNN(Hτv,:,M
τ
v,:, ev), where v ∈ V .
We compute messages only for sampled edges, 〈v′, r, v〉 ∈ Eτ , each step. Functions ψIGNN andδIGNN are implemented by a two-layer MLP (using leakyReLu for the first layer and tanh for thesecond) with input arguments concatenated respectively. Messages are aggregated by dividing thesum by the square root of Nτ (v), the number of neighbors that send messages to v, preserving thescale of variance. We use a residual adding to update each node state instead of a GRU or a LSTM.After running for T steps, we output a pooling result or simply the last, denoted by H = HT , tofeed into downstream modules.
AGNN module. AGNN is input-dependent, which means node states depend on input queryx = 〈head, rel, ?〉, denoted by Ht
v,:(x). We implement pruned message passing, running on smallsubgraphs each conditioned on an input query. We leverage the sparsity and only save Ht
v,:(x) forvisited nodes v ∈ VGt(x). When t = 0, we start from node head with VG0(x) = {vhead}. Whencomputing messages, denoted byM t
〈v′,r,v〉,:(x), we use an attending-sampling-attending procedure,explained in Section 3.2, to constrain the number of computed edges. The computation componentsinclude:
• Message function: M t〈v′,r,v〉,:(x) = ψAGNN(Ht
v′,:(x), cr(x),Htv,:(x)), where 〈v′, r, v〉 ∈
EGt(x)2, and cr(x) = [er, qhead, qrel] represents a context vector.
• Message aggregation: Mt
v,:(x) = 1√Nt(v)
∑v′,rM
t〈v′,r,v〉,:(x), where 〈v′, r, v〉 ∈ EGt(x).
• Node state attending function: Ht+1v,: (x) = at+1
v WHv,:, where at+1v is an attention score.
• Node state update function: Ht+1v,: (x) = Ht
v,:(x) + δAGNN(Htv,:(x),M
t
v,:(x), ct+1v (x)), where
ct+1v (x) = [Ht+1
v,: (x), qhead, qrel] also represents a context vector.
Query context is defined by its head and relation embeddings, i.e., qhead = ehead and qrel = erel.We introduce a node state attending function to pass node representation information from IGNNto AGNN weighted by a scalar attention score at+1
v and projected by a learnable matrix W . WeinitializeH0
v,:(x) = Hv,: for node v ∈ VG0(x), letting unseen nodes hold zero states.
Attention module. Attention over T steps is represented by a sequence of node probability dis-tributions, denoted by at (t = 1, 2 . . . , T ). The initial distribution a0 is a one-hot vector witha0[vhead] = 1. To spread attention, we need to compute transition matrices T t each step. Sinceit is conditioned on both IGNN and AGNN, we capture two types of interaction between v′ and v:Htv′,:(x) ∼Ht
v,:(x), andHtv′,:(x) ∼Hv,:. The former favors visited nodes, while the latter is used
to attend to unseen neighboring nodes.
T t:,v′ = softmaxv∈Nt(v′)(∑
rα1(Ht
v′,:(x), cr(x),Htv,:(x)) + α2(Ht
v′,:(x), cr(x),Hv,:))
α1(·) = MLP(Htv′,:(x), cr(x))TW1MLP(Ht
v,:(x), cr(x))
α2(·) = MLP(Htv′,:(x), cr(x))TW2MLP(Hv,:, cr(x))
(5)
whereW1 andW2 are two learnable matrices. Each MLP uses one single layer with the leakyReLuactivation. To reduce the complexity for computing T t, we use nodes v′ ∈ V
Gt(x), which contains
nodes with the k-largest attention scores at step t, and use nodes v sampled from v′’s neighborsto compute attention transition for the next step. Due to the fact that nodes v′ result from thetop-k pruning, the loss of attention may occur to diminish the total amount. Therefore, we use arenormalized version, at+1 = T tat/‖T tat‖, to compute new attention scores. We use attentionscores at the final step as the probability to predict the tail node.
2In practice, we can use a smaller set of edges than EGt(x) to pass messages as discussed in Section 3.2
5
Published as a conference paper at ICLR 2020
Sampling Horizon
Attending-to Horizon
Attending-from Horizon
Visited Nodes Visited Nodes
Full Neighborhood
Sampling Horizon
Attending-to Horizon
Attending-from Horizon
Full Neighborhood
Figure 3: Iterative attending-sampling-attending procedure balancing coverage and complexity.
3.2 COMPLEXITY REDUCTION BY ITERATIVE ATTENDING, SAMPLING AND ATTENDING
AGNN deals with each subgraph relying on input x and keeps a few selected nodes in VGt(x), calledvisited nodes. Initially, VG0(x) contains only one node vhead, and then VGt(x) is enlarged by addingnew nodes each step. When propagating messages, we can just consider the one-hop neighborhoodeach step. However, the expansion goes so rapidly that it covers almost all nodes after a few steps.The key to address the problem is to constrain the scope of nodes we can expand the boundary from,i.e., the core nodes which determine where we can go next. We call it the attending-from horizon,Gt(x), selected according to attention scores at. Given this horizon, we may still need node sam-pling over the neighborhood N(Gt(x)) in some cases where a hub node of extremely high degreeexists to cause an extremely large neighborhood. We introduce an attending-to horizon, denoted by
N(Gt(x)), inside the sampling horizon, denoted by N(Gt(x)). The attention module runs withinthe sampling horizon with smaller dimensions exchanged for sampling more neighbors for a largercoverage. In one word, we face a trade-off between coverage and complexity, and our strategy isto sample more but attend less plus using small dimensions to compute attention. We obtain theattending-to horizon according to newly computed attention scores at+1. Then, message passing
iteration at step t in AGNN can be further constrained on edges between Gt(x) and N(Gt(x)), asmaller set than EGt(x). We illustrate this procedure in Figure 3.
4 EXPERIMENTS
Datasets. We use six large KG datasets: FB15K, FB15K-237, WN18, WN18RR, NELL995, andYAGO3-10. FB15K-237 (Toutanova & Chen, 2015) is sampled from FB15K (Bordes et al., 2013)with redundant relations removed, and WN18RR (Dettmers et al., 2018) is a subset of WN18 (Bor-des et al., 2013) removing triples that cause test leakage. Thus, they are both considered morechallenging. NELL995 (Xiong et al., 2017) has separate datasets for 12 query relations each corre-sponding to a single-query-relation KBC task. YAGO3-10 (Mahdisoltani et al., 2014) contains thelargest KG with millions of edges. Their statistics are shown in Table 1. We find some statisticaldifferences between train and validation (or test). In a KG with all training triples as its edges, atriple (head, rel, tail) is considered as a multi-edge triple if the KG contains other triples that alsoconnect head and tail ignoring the direction. We notice that FB15K-237 is a special case comparedto the others, as there are no edges in its KG directly linking any pair of head and tail in valida-tion (or test). Therefore, when using training triples as queries to train our model, given a batch,for FB15K-237, we cut off from the KG all triples connecting the head-tail pairs in the given batch,ignoring relation types and edge directions, forcing the model to learn a composite reasoning patternrather than a single-hop pattern, and for the rest datasets, we only remove the triples of this batchand their inverse from the KG to avoid information leakage before training on this batch. This canbe regarded as a hyperparameter tuning whether to force a multi-hop reasoning or not, leading to aperformance boost of about 2% in HITS@1 on FB15-237.
Experimental settings. We use the same data split protocol as in many papers (Dettmers et al.,2018; Xiong et al., 2017; Das et al., 2018). We create a KG, a directed graph, consisting of alltrain triples and their inverse added for each dataset except NELL995, since it already includesreciprocal relations. Besides, every node in KGs has a self-loop edge to itself. We also add inverserelations into the validation and test set to evaluate the two directions. For evaluation metrics, we useHITS@1,3,10 and the mean reciprocal rank (MRR) in the filtered setting for FB15K-237, WN18RR,
6
Published as a conference paper at ICLR 2020
Table 1: Statistics of the six KG datasets. PME (tr) means the proportion of multi-edge triples intrain; PME (va) means the proportion of multi-edge triples in validation; AL (va) means the averagelength of shortest paths connecting each head-tail pair in validation.
Dataset #Entities #Rels #Train #Valid #Test PME (tr) PME (va) AL (va)FB15K 14,951 1,345 483,142 50,000 59,071 81.2% 80.6% 1.22FB15K-237 14,541 237 272,115 17,535 20,466 38.0% 0% 2.25WN18 40,943 18 141,442 5,000 5,000 93.1% 94.0% 1.18WN18RR 40,943 11 86,835 3,034 3,134 34.5% 35.5% 2.84NELL995 74,536 200 149,678 543 2,818 100% 31.1% 2.00YAGO3-10 123,188 37 1,079,040 5,000 5,000 56.4% 56.0% 1.75
Table 2: Comparison results on the FB15K-237 and WN18RR datasets. Results of [♠] are takenfrom (Nguyen et al., 2018), [♣] from (Dettmers et al., 2018), [♥] from (Shen et al., 2018), [♦] from(Sun et al., 2018), [4] from (Das et al., 2018), and [z] from (Lacroix et al., 2018). Some collectedresults only have a metric score while some including ours take the form of “mean (std)”.
FB15K-237 WN18RRMetric (%) H@1 H@3 H@10 MRR H@1 H@3 H@10 MRRTransE [♠] - - 46.5 29.4 - - 50.1 22.6DistMult [♣] 15.5 26.3 41.9 24.1 39 44 49 43DistMult [♥] 20.6 (.4) 31.8 (.2) - 29.0 (.2) 38.4 (.4) 42.4 (.3) - 41.3 (.3)ComplEx [♣] 15.8 27.5 42.8 24.7 41 46 51 44ComplEx [♥] 20.8 (.2) 32.6 (.5) - 29.6 (.2) 38.5 (.3) 43.9 (.3) - 42.2 (.2)ConvE [♣] 23.7 35.6 50.1 32.5 40 44 52 43ConvE [♥] 23.3 (.4) 33.8 (.3) - 30.8 (.2) 39.6 (.3) 44.7 (.2) - 43.3 (.2)RotatE [♦] 24.1 37.5 53.3 33.8 42.8 49.2 57.1 47.6ComplEx-N3[z] - - 56 37 - - 57 48NeuralLP [♥] 18.2 (.6) 27.2 (.3) - 24.9 (.2) 37.2 (.1) 43.4 (.1) - 43.5 (.1)MINERVA [♥] 14.1 (.2) 23.2 (.4) - 20.5 (.3) 35.1 (.1) 44.5 (.4) - 40.9 (.1)MINERVA [4] - - 45.6 - 41.3 45.6 51.3 -M-Walk [♥] 16.5 (.3) 24.3 (.2) - 23.2 (.2) 41.4 (.1) 44.5 (.2) - 43.7 (.1)DPMPN 28.6 (.1) 40.3 (.1) 53.0 (.3) 36.9 (.1) 44.4 (.4) 49.7 (.8) 55.8 (.5) 48.2 (.5)
FB15K, WN18, and YAGO3-10, and use the mean average precision (MAP) for NELL995’s single-query-relation KBC tasks. For NELL995, we follow the same evaluation procedure as in (Xionget al., 2017; Das et al., 2018; Shen et al., 2018), ranking the answer entities against the negativeexamples given in their experiments. We run our experiments using a 12G-memory GPU, TITANX (Pascal), with Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz. Our code is written in Pythonbased on TensorFlow 2.0 and NumPy 1.16 and can be found by the link3 below. We run threetimes for each hyperparameter setting per dataset to report the means and standard deviations. Seehyperparameter details in the appendix.
Baselines. We compare our model against embedding-based approaches, including TransE (Bordeset al., 2013), TransR (Lin et al., 2015b), DistMult (Yang et al., 2015), ConvE (Dettmers et al.,2018), ComplE (Trouillon et al., 2016), HolE (Nickel et al., 2016), RotatE (Sun et al., 2018), andComplEx-N3 (Lacroix et al., 2018), and path-based approaches that use RL methods, includingDeepPath (Xiong et al., 2017), MINERVA (Das et al., 2018), and M-Walk (Shen et al., 2018), andalso that uses learned neural logic, NeuralLP (Yang et al., 2017).
Comparison results and analysis. We report comparison on FB15K-23 and WN18RR in Table 2.Our model DPMPN significantly outperforms all the baselines in HITS@1,3 and MRR. Comparedto the best baseline, we only lose a few points in HITS@10 but gain a lot in HITS@1,3. Wespeculate that it is the reasoning capability that helps DPMPN make a sharp prediction by exploitinggraph-structured composition locally and conditionally. When a target becomes too vague to predict,reasoning may lose its advantage against embedding-based models. However, path-based baselines,with a certain ability to do reasoning, perform worse than we expect. We argue that it might beinappropriate to think of reasoning, a sequential decision process, equivalent to a sequence of nodes.The average lengths of the shortest paths between heads and tails as shown in Table 1 suggests a veryshort path, which makes the motivation of using a path almost useless. The reasoning pattern shouldbe modeled in the form of dynamical local graph-structured pattern with nodes densely connected
3https://github.com/anonymousauthor123/DPMPN
7
Published as a conference paper at ICLR 2020
0.5 1.0 1.5 2.0 2.5 3.0Epoch
40.0
42.5
45.0
47.5
50.0
52.5
55.0
57.5
60.0
Met
ric S
core
(%)
(A) Convergence Analysis (eval on test)H@1H@3H@10MMR
H@1 H@3 H@10 MMR40.0
42.5
45.0
47.5
50.0
52.5
55.0
57.5
60.0
Met
ric S
core
(%)
(B) IGNN Component AnalysisW/o IGNNWith IGNN
H@1 MMR40
42
44
46
48
50
52
54
Met
ric S
core
(%)
(C) Sampling Horizon AnalysisMax-sampling-per-node = 20Max-sampling-per-node = 50Max-sampling-per-node = 100Max-sampling-per-node = 200Max-sampling-per-node = 400
H@1 MMR40
42
44
46
48
50
52
54
Met
ric S
core
(%)
(D) Attending-to Horizon AnalysisMax-atteding-to-per-step = 20Max-atteding-to-per-step = 50Max-atteding-to-per-step = 100Max-atteding-to-per-step = 200Max-atteding-to-per-step = 400
H@1 MMR40
42
44
46
48
50
52
54
Met
ric S
core
(%)
(E) Attending-from Horizon AnalysisMax-atteding-from-per-step = 5Max-atteding-from-per-step = 10Max-atteding-from-per-step = 20Max-atteding-from-per-step = 40
H@1 MMR35.0
37.5
40.0
42.5
45.0
47.5
50.0
52.5
55.0
Met
ric S
core
(%)
(F) Searching Horizon Analysis#Steps-in-AGNN = 2#Steps-in-AGNN = 4#Steps-in-AGNN = 6#Steps-in-AGNN = 8
Figure 4: Experimental analysis on WN18RR. (A) Convergence analysis: we pick six model snap-shots during training and evaluate them on test. (B) IGNN component analysis: w/o IGNN uses zerostep to run message passing, while with IGNN uses two; (C)-(F) Sampling, attending-to, attending-from and searching horizon analysis. The charts on FB15K-237 can be found in the appendix.
with each other to produce a decision collectively. We also run our model on FB15K, WN18,and YAGO3-10, and the comparison results in the appendix show that DPMPN achieves a verycompetitive position against the best state of the art. We summarize the comparison on NELL995’stasks in the appendix. DPMPN performs the best on five tasks, also being competitive on the rest.
Convergence analysis. Our model converges very fast during training. We may use half of train-ing queries to train model to generalize as shown in Figure 4(A). Compared to less expensiveembedding-based models, our model need to traverse a number of edges when training on oneinput, consuming much time per batch, but it does not need to pass a second epoch, thus saving a lotof training time. The reason may be that training queries also belong to the KG’s edges and somemight be exploited to construct subgraphs during training on other queries.
Component analysis. Given the stacked GNN architecture, we want to examine how much eachGNN component contributes to the performance. Since IGNN is input-agnostic, we cannot relyon its node representations only to predict a tail given an input query. However, AGNN is input-dependent, which means it can be carried out to complete the task without taking underlying noderepresentations from IGNN. Therefore, we can arrange two sets of experiments: (1) AGNN + IGNN,and (2) AGNN-only. In AGNN-only, we do not run message passing in IGNN to compute Hv,:
but instead use node embeddings as Hv,:, and then we run pruned message passing in AGNN asusual. We want to be sure whether IGNN is actually useful. In this setting, we compare the firstset which runs IGNN for two steps against the second one which totally shuts IGNN down. Theresults in Figure 4(B) (and Figure 7(B) in Appendix) show that IGNN brings an amount of gains ineach metric on WN18RR (and FB15K-23), indicating that representations computed by full-graphmessage passing indeed help subgraph-based message passing.
Horizon analysis. The sampling, attending-to, attending-from and searching (i.e., propagationsteps) horizons determine how large area a subgraph can expand over. These factors affect com-putation complexity as well as prediction performance. Intuitively, enlarging the exploring area bysampling more, attending more, and searching longer, may increase the chance of hitting a targetto gain some performance. However, the experimental results in Figure 4(C)(D) show that it is notalways the case. In Figure 4(E), we can see that increasing the maximum number of attending-
8
Published as a conference paper at ICLR 2020
0 1 2 3 4Step
0
2
4
6
8
10
Entro
py o
f Atte
ntio
n Di
strib
utio
n(A) Attention Flow Analysis (on entroy)
AthletePlaysForTeamAthletePlaysInLeagueAthleteHomeStadiumAthletePlaysSportTeamPlaysSportOrgHeadQuarteredInCityWorksForPersonBornInLocationPersonLeadsOrgOrgHiredPersonAgentBelongsToOrgTeamPlaysInLeague
0 1 2 3 4Step
1.00
0.75
0.50
0.25
0.00
0.25
0.50
0.75
1.00(B) Attention Flow Analysis (top1's proportion)
AthletePlaysForTeamAthletePlaysInLeagueAthleteHomeStadiumAthletePlaysSportTeamPlaysSportOrgHeadQuarteredInCityWorksForPersonBornInLocationPersonLeadsOrgOrgHiredPersonAgentBelongsToOrgTeamPlaysInLeague
0 1 2 3 4Step
1.00
0.75
0.50
0.25
0.00
0.25
0.50
0.75
1.00(C) Attention Flow Analysis (top3's proportion)
AthletePlaysForTeamAthletePlaysInLeagueAthleteHomeStadiumAthletePlaysSportTeamPlaysSportOrgHeadQuarteredInCityWorksForPersonBornInLocationPersonLeadsOrgOrgHiredPersonAgentBelongsToOrgTeamPlaysInLeague
0 1 2 3 4Step
1.00
0.75
0.50
0.25
0.00
0.25
0.50
0.75
1.00(D) Attention Flow Analysis (top5's proportion)
AthletePlaysForTeamAthletePlaysInLeagueAthleteHomeStadiumAthletePlaysSportTeamPlaysSportOrgHeadQuarteredInCityWorksForPersonBornInLocationPersonLeadsOrgOrgHiredPersonAgentBelongsToOrgTeamPlaysInLeague
Figure 5: Analysis of attention flow on NELL995 tasks. (A) The average entropy of attentiondistributions changing along steps for each single-query-relation KBC task. (B)(C)(D) The changingof the proportion of attention concentrated at the top-1,3,5 nodes per step for each task.
0
2
4
6
8
10
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(A) Time Cost on Sampling HorizonsMax-sampling-per-node = 20Max-sampling-per-node = 50Max-sampling-per-node = 100Max-sampling-per-node = 200Max-sampling-per-node = 400
0
2
4
6
8
10
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(B) Time Cost on Attending-to HorizonsMax-atteding-to-per-step = 20Max-atteding-to-per-step = 50Max-atteding-to-per-step = 100Max-atteding-to-per-step = 200Max-atteding-to-per-step = 400
0
2
4
6
8
10
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(C) Time Cost on Attending-from HorizonsMax-atteding-from-per-step = 5Max-atteding-from-per-step = 10Max-atteding-from-per-step = 20Max-atteding-from-per-step = 40
0
2
4
6
8
10
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(D) Time Cost on Searching Horizons#Steps-in-AGNN = 2#Steps-in-AGNN = 4#Steps-in-AGNN = 6#Steps-in-AGNN = 8
0
2
4
6
8
10
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(E) Time Cost on Batch SizesBatch-size = 50Batch-size = 100Batch-size = 200Batch-size = 300
Figure 6: Analysis of time cost on WN18RR: (A)-(D) measure the one-epoch training time on dif-ferent horizon settings corresponding to Figure 4(C)-(F); (E) measures on different batch sizes us-ing horizon setting Max-sampling-per-node=20, Max-attending-to-per-step=20, Max-attending-from-per-step=20, and #Steps-in-AGNN=8. The charts on FB15K-237 can be found in the appendix.
from nodes per step is useful. That also explains why we call nodes in the attending-from horizonthe core nodes, as they determine where subgraphs can be expanded and how attention will bepropagated to affect the final probability distribution on the tail prediction. However, GPUs witha limited memory do not allow for a too large number of sampled or attended nodes especially forMax-attending-from-per-step. The detailed explanations can be found in attention strategies in Section2 where the upper bound is controlled by N1N2 and N3 (Max-attending-from-per-step correspondingto N1, Max-sampling-per-node to N2, and Max-attending-to-per-step to N3). In N1N2, Section 3.2 sug-gests that we should sample more by a large N2 but attend less by a small N1. Figure 4(F) suggeststhat the propagation steps of AGNN should not go below four.
Attention flow analysis. If the flow-style attention really captures the way we reason about theworld, its process should be conducted in a diverging-converging thinking pattern. Intuitively, first,for the diverging thinking phase, we search and collect ideas as much as we can; then, for theconverging thinking phase, we try to concentrate our thoughts on one point. To check whether theattention flow has such a pattern, we measure the average entropy of attention distributions changingalong steps and also the proportion of attention concentrated at the top-1,3,5 nodes. As we expect,attention is more focused at the final step and the beginning.
Time cost analysis. The time cost is affected not only by the scale of a dataset but also by the horizonsetting. For each dataset, we list the training time for one epoch corresponding to our standardhyperparameter settings in the appendix. Note that there is always a trade-off between complexityand performance. We thus study whether we can reduce time cost a lot at the price of sacrificing alittle performance. We plot the one-epoch training time in Figure 6(A)-(D), using the same settingsas we do in the horizon analysis. We can see that Max-attending-from-per-step and #Steps-in-AGNNaffect the training time significantly while Max-sampling-per-node and Max-attending-to-per-step affectvery slightly. Therefore, we can use smaller Max-sampling-per-node and Max-attending-to-per-step inorder to gain a larger batch size, making the computation more efficiency as shown in Figure 6(E).
Visualization. To further demonstrate the reasoning capability, we show visualization results ofsome pruned subgraphs on NELL995’s test data for 12 separate tasks. We avoid using the trainingdata in order to show generalization of the learned reasoning capability. We show the visualizationresults in Figure 1. See the appendix for detailed analysis and more visualization results.
9
Published as a conference paper at ICLR 2020
Discussion of the limitation. Although DPMPN shows a promising way to harness the scalabilityon large-scale graph data, current GPU-based machine learning platforms, such as TensorFlow andPyTorch, seem not ready to fully leverage sparse tensor computation which acts as building blocksto support dynamical computation graphs which varies from one input to another. Extra overheadcaused by extensive sparse operations will neutralize the benefits of exploiting sparsity.
5 RELATED WORK
Knowledge graph reasoning. Early work, including TransE (Bordes et al., 2013) and its analogues(Wang et al., 2014; Lin et al., 2015b; Ji et al., 2015), DistMult (Yang et al., 2015), ConvE (Dettmerset al., 2018) and ComplEx (Trouillon et al., 2016), focuses on learning embeddings of entities andrelations. Some recent works of this line (Sun et al., 2018; Lacroix et al., 2018) achieve high accu-racy. Another line aims to learn inference paths (Lao et al., 2011; Guu et al., 2015; Lin et al., 2015a;Toutanova et al., 2016; Chen et al., 2018; Lin et al., 2018) for knowledge graph reasoning, especiallyDeepPath (Xiong et al., 2017), MINERVA (Das et al., 2018), and M-Walk (Shen et al., 2018), whichuse RL to learn multi-hop relational paths. However, these approaches, based on policy gradientsor Monte Carlo tree search, often suffer from low sample efficiency and sparse rewards, requiringa large number of rollouts and sophisticated reward function design. Other efforts include learningsoft logical rules (Cohen, 2016; Yang et al., 2017) or compostional programs (Liang et al., 2016).
Relational reasoning in Graph Neural Networks. Relational reasoning is regarded as the key forcombinatorial generalization, taking the form of entity- and relation-centric organization to reasonabout the composition structure of the world (Craik, 1952; Lake et al., 2017). A multitude of recentimplementations (Battaglia et al., 2018) encode relational inductive biases into neural networks toexploit graph-structured representation, including graph convolution networks (GCNs) (Bruna et al.,2014; Henaff et al., 2015; Duvenaud et al., 2015; Kearnes et al., 2016; Defferrard et al., 2016; Niepertet al., 2016; Kipf & Welling, 2017; Bronstein et al., 2017) and graph neural networks (Scarselli et al.,2009; Li et al., 2016; Santoro et al., 2017; Battaglia et al., 2016; Gilmer et al., 2017). Variants ofGNN architectures have been developed. Relation networks (Santoro et al., 2017) use a simplebut effective neural module to model relational reasoning, and its recurrent versions (Santoro et al.,2018; Palm et al., 2018) do multi-step relational inference for long periods; Interaction networks(Battaglia et al., 2016) provide a general-purpose learnable physics engine, and two of its variants arevisual interaction networks (Watters et al., 2017) and vertex attention interaction networks (Hoshen,2017); Message passing neural networks (Gilmer et al., 2017) unify various GCNs and GNNs intoa general message passing formalism by analogy to the one in graphical models.
Attention mechanism on graphs. Neighborhood attention operation can enhance GNNs’ repre-sentation power (Velickovic et al., 2018; Hoshen, 2017; Wang et al., 2018; Kool, 2018). Theseapproaches often use multi-head self-attention to focus on specific interactions with neighbors whenaggregating messages, inspired by (Bahdanau et al., 2015; Lin et al., 2017; Vaswani et al., 2017).Most graph-based attention mechanisms attend over neighborhood in a single-hop fashion, and(Hoshen, 2017) claims that the multi-hop architecture does not help to model high-order interac-tion in experiments. However, a flow-style design of attention in (Xu et al., 2018b) shows a way tomodel long-range attention, stringing isolated attention operations by transition matrices.
6 CONCLUSION
We introduce Dynamically Pruned Message Passing Networks (DPMPN) and apply it to large-scaleknowledge graph reasoning tasks. We propose to learn an input-dependent local subgraph whichis progressively and selectively constructed to model a sequential reasoning process in knowledgegraphs. We use graphical attention expression, a flow-style attention mechanism, to guide and prunethe underlying message passing, making it scalable for large-scale graphs and also providing cleargraphical interpretations. We also take the inspiration from the consciousness prior to develop atwo-GNN framework to boost experimental performances.
10
Published as a conference paper at ICLR 2020
REFERENCES
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointlylearning to align and translate. CoRR, abs/1409.0473, 2015.
Peter W. Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, and KorayKavukcuoglu. Interaction networks for learning about objects, relations and physics. In NIPS,2016.
Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinıcius Flo-res Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, RyanFaulkner, Caglar Gulcehre, Francis Song, Andrew J. Ballard, Justin Gilmer, George E. Dahl,Ashish Vaswani, Kelsey R. Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess,Daan Wierstra, Pushmeet Kohli, Matthew Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pas-canu. Relational inductive biases, deep learning, and graph networks. CoRR, abs/1806.01261,2018.
Yoshua Bengio. The consciousness prior. CoRR, abs/1709.08568, 2017.
Antoine Bordes, Nicolas Usunier, Alberto Garcıa-Duran, Jason Weston, and Oksana Yakhnenko.Translating embeddings for modeling multi-relational data. In NIPS, 2013.
Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geomet-ric deep learning: Going beyond euclidean data. IEEE Signal Processing Magazine, 34:18–42,2017.
Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locallyconnected networks on graphs. CoRR, abs/1312.6203, 2014.
Wenhu Chen, Wenhan Xiong, Xifeng Yan, and William Yang Wang. Variational knowledge graphreasoning. In NAACL-HLT, 2018.
William W. Cohen. Tensorlog: A differentiable deductive database. CoRR, abs/1605.06523, 2016.
Kenneth H. Craik. The nature of explanation. 1952.
Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krish-namurthy, Alexander J. Smola, and Andrew McCallum. Go for a walk and arrive at the answer:Reasoning over paths in knowledge bases using reinforcement learning. CoRR, abs/1711.05851,2018.
Michael Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks ongraphs with fast localized spectral filtering. In NIPS, 2016.
Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2dknowledge graph embeddings. In AAAI, 2018.
David K. Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gomez-Bombarelli,Timothy Hirzel, Alan Aspuru-Guzik, and Ryan P. Adams. Convolutional networks on graphs forlearning molecular fingerprints. In NIPS, 2015.
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neuralmessage passing for quantum chemistry. In ICML, 2017.
Kelvin Guu, John Miller, and Percy S. Liang. Traversing knowledge graphs in vector space. InEMNLP, 2015.
Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structureddata. CoRR, abs/1506.05163, 2015.
Yedid Hoshen. Vain: Attentional multi-agent predictive modeling. In NIPS, 2017.
Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jian Zhao. Knowledge graph embedding viadynamic mapping matrix. In ACL, 2015.
11
Published as a conference paper at ICLR 2020
Steven M. Kearnes, Kevin McCloskey, Marc Berndl, Vijay S. Pande, and Patrick Riley. Moleculargraph convolutions: moving beyond fingerprints. Journal of computer-aided molecular design,30 8:595–608, 2016.
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional net-works. CoRR, abs/1609.02907, 2017.
Wouter Kool. Attention solves your tsp , approximately. 2018.
Timothee Lacroix, Nicolas Usunier, and Guillaume Obozinski. Canonical tensor decomposition forknowledge base completion. In ICML, 2018.
Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J Gershman. Buildingmachines that learn and think like people. The Behavioral and brain sciences, 40:e253, 2017.
Ni Lao, Tom Michael Mitchell, and William W. Cohen. Random walk inference and learning in alarge scale knowledge base. In EMNLP, 2011.
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard S. Zemel. Gated graph sequence neuralnetworks. CoRR, abs/1511.05493, 2016.
Chen Liang, Jonathan Berant, Quoc V. Le, Kenneth D. Forbus, and Ni Lao. Neural symbolic ma-chines: Learning semantic parsers on freebase with weak supervision. In ACL, 2016.
Xi Victoria Lin, Richard Socher, and Caiming Xiong. Multi-hop knowledge graph reasoning withreward shaping. In EMNLP, 2018.
Yankai Lin, Zhiyuan Liu, and Maosong Sun. Modeling relation paths for representation learning ofknowledge bases. In EMNLP, 2015a.
Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relationembeddings for knowledge graph completion. In AAAI, 2015b.
Zhouhan Lin, Minwei Feng, Cıcero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, andYoshua Bengio. A structured self-attentive sentence embedding. CoRR, abs/1703.03130, 2017.
Farzaneh Mahdisoltani, Joanna Asia Biega, and Fabian M. Suchanek. Yago3: A knowledge basefrom multilingual wikipedias. In CIDR, 2014.
Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Q. Phung. A novel embeddingmodel for knowledge base completion based on convolutional neural network. In NAACL-HLT,2018.
Maximilian Nickel, Lorenzo Rosasco, and Tomaso A. Poggio. Holographic embeddings of knowl-edge graphs. In AAAI, 2016.
Mathias Niepert, Mohammed Hassan Ahmed, and Konstantin Kutzkov. Learning convolutionalneural networks for graphs. In ICML, 2016.
Rasmus Berg Palm, Ulrich Paquet, and Ole Winther. Recurrent relational networks. In NeurIPS,2018.
Adam Santoro, David Raposo, David G. T. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter W.Battaglia, and Timothy P. Lillicrap. A simple neural network module for relational reasoning. InNIPS, 2017.
Adam Santoro, Ryan Faulkner, David Raposo, Jack W. Rae, Mike Chrzanowski, Theophane Weber,Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy P. Lillicrap. Relational recurrentneural networks. In NeurIPS, 2018.
Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini.The graph neural network model. IEEE Transactions on Neural Networks, 20:61–80, 2009.
Yelong Shen, Jianshu Chen, Pu Huang, Yuqing Guo, and Jianfeng Gao. M-walk: Learning to walkover graphs using monte carlo tree search. In NeurIPS, 2018.
12
Published as a conference paper at ICLR 2020
Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. Rotate: Knowledge graph embeddingby relational rotation in complex space. CoRR, abs/1902.10197, 2018.
Kristina Toutanova and Danqi Chen. Observed versus latent features for knowledge base and textinference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and theirCompositionality, 2015.
Kristina Toutanova, Victoria Lin, Wen tau Yih, Hoifung Poon, and Chris Quirk. Compositionallearning of embeddings for relation paths in knowledge base and text. In ACL, 2016.
Theo Trouillon, Johannes Welbl, Sebastian Riedel, Eric Gaussier, and Guillaume Bouchard. Com-plex embeddings for simple link prediction. In ICML, 2016.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez,Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, 2017.
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Alejandro Romero, Pietro Lio, and YoshuaBengio. Graph attention networks. CoRR, abs/1710.10903, 2018.
William Wang. Knowledge graph reasoning: Recent advances, 2018.
Xiaolong Wang, Ross B. Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks.2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7794–7803, 2018.
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by trans-lating on hyperplanes. In AAAI, 2014.
Nicholas Watters, Daniel Zoran, Theophane Weber, Peter W. Battaglia, Razvan Pascanu, and AndreaTacchetti. Visual interaction networks: Learning a physics simulator from video. In NIPS, 2017.
Wenhan Xiong, Thien Hoang, and William Yang Wang. Deeppath: A reinforcement learning methodfor knowledge graph reasoning. In EMNLP, 2017.
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neuralnetworks? ArXiv, abs/1810.00826, 2018a.
Xiaoran Xu, Songpeng Zu, Chengliang Gao, Yuan Zhang, and Wei Feng. Modeling attention flowon graphs. CoRR, abs/1811.00497, 2018b.
Bishan Yang, Wen tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities andrelations for learning and inference in knowledge bases. CoRR, abs/1412.6575, 2015.
Fan Yang, Zhilin Yang, and William W. Cohen. Differentiable learning of logical rules for knowl-edge base reasoning. In NIPS, 2017.
13
Published as a conference paper at ICLR 2020
Appendix
7 PROOF
Proposition. Given a graph G (undirected or directed in both directions), we assume the probabilityof the degree of an arbitrary node being less than or equal to d is larger than p, i.e., P (deg(v) ≤d) > p,∀v ∈ V . Considering a sequence of consecutively expanding subgraphs (G0, G1, . . . , GT ),starting with G0 = {v}, for all t ≥ 1, we can ensure
P(|VGt | ≤
d(d− 1)t − 2
d− 2
)> p
d(d−1)t−1−2d−2 . (6)
Proof. We consider the extreme case of greedy consecutive expansion, where Gt = Gt−1 ∪∆Gt =Gt−1 ∪ ∂Gt−1, since if this case satisfies the inequality, any case of consecutive expansion can alsosatisfy it. By definition, all the subgraphs Gt are a connected graph. Here, we use ∆V t to denoteV∆Gt for short. In the extreme case, we can ensure that the newly added nodes ∆V t at step t onlybelong to the neighborhood of the last added nodes ∆V t−1. Since for t ≥ 2 each node in ∆V t−1
already has at least one edge within Gt−1 due to the definition of connected graphs, we can have
P(|∆V t| ≤ |∆V t−1|(d− 1)
)> p|∆V
t−1|. (7)
For t = 1, we have P (|∆V 1| ≤ d) > p and thus
P(|VG1 | ≤ 1 + d
)> p. (8)
For t ≥ 2, based on |VGt | = 1 + |∆V 1|+ . . .+ |∆V t|, we obtain
P(|VGt | ≤ 1 + d+ d(d− 1) + . . .+ d(d− 1)t−1
)> p1+d+d(d−1)+...+d(d−1)t−2
, (9)
which is
P(|VGt | ≤
d(d− 1)t − 2
d− 2
)> p
d(d−1)t−1−2d−2 . (10)
We can find that t = 1 also satisfies this inequality.
14
Published as a conference paper at ICLR 2020
8 HYPERPARAMETER SETTINGS
Table 3: Our standard hyperparameter settings we use for each dataset plus their one-epoch trainingtime. For experimental analysis, we only adjust one hyperparameter and keep the remaining fixedas the standard setting. For NELL995, the one-epoch training time means the average time cost ofthe 12 single-query-relation tasks.
Hyperparameter FB15K-237 FB15K WN18RR WN18 YAGO3-10 NELL995batch size 80 80 100 100 100 10n dims att 50 50 50 50 50 200n dims 100 100 100 100 100 200max sampling per step (in IGNN) 10000 10000 10000 10000 10000 10000max attending from per step 20 20 20 20 20 100max sampling per node (in AGNN) 200 200 200 200 200 1000max attending to per step 200 200 200 200 200 1000n steps in IGNN 2 1 2 1 1 1n steps in AGNN 6 6 8 8 6 5learning rate 0.001 0.001 0.001 0.001 0.0001 0.001optimizer Adam Adam Adam Adam Adam Adamgrad clipnorm 1 1 1 1 1 1n epochs 1 1 1 1 1 3One-epoch training time (h) 25.7 63.7 4.3 8.5 185.0 0.12
The hyperparameters can be categorized into three groups:
• Normal hyperparameters, including batch size, n dims att, n dims, learning rate, grad clipnorm, andn epochs. We set smaller dimensions, n dims att, for computation in the attention module, as ituses more edges than the message passing uses in AGNN, and also intuitively, it does not need topropagate high-dimensional messages but only compute scalar scores over a sampled neighbor-hood, in concert with the idea in the key-value mechanism (Bengio, 2017). We set n epochs = 1in most cases, indicating that our model can be trained well by one epoch only due to its fastconvergence.• The hyperparameters in charge of the sampling-attending horizon, including
max sampling per step that controls the maximum number to sample edges per step in IGNN, andmax sampling per node, max attending from per step and max attending to per step that control themaximum number to sample neighbors of each selected node per step per input, the maximumnumber of selected nodes for attending-from per step per input, and the maximum number ofselected nodes in a sampled neighborhood for attending-to per step per input in AGNN.• The hyperparameters in charge of the searching horizon, including n steps in IGNN representing
the number of propagation steps to run standard message passing in IGNN, and n steps in AGNNrepresenting the number of propagation steps to run pruned message passing in AGNN.
Note that we tune these hyperparameters according to not only their performances but also thecomputation resources available to us. In some cases, to deal with a very large knowledge graph withlimited resources, we need to make a trade-off between efficiency and effectiveness. For example,each of NELL995’s single-query-relation tasks has a small training set, though still with a largegraph, so we can reduce the batch size in favor of affording larger dimensions and a larger sampling-attending horizon without any concern for waiting too long to finish one epoch.
15
Published as a conference paper at ICLR 2020
9 MORE EXPERIMENTAL RESULTS
Table 4: Comparison results on the FB15K and WN18 datasets. Results of [♠] are taken from(Nickel et al., 2016), [♣] from (Dettmers et al., 2018), [♦] from (Sun et al., 2018), [♥] from (Yanget al., 2017), and [z] from (Lacroix et al., 2018). Our results take the form of ”mean (std)”.
FB15K WN18Metric (%) H@1 H@3 H@10 MRR H@1 H@3 H@10 MRRTransE [♠] 29.7 57.8 74.9 46.3 11.3 88.8 94.3 49.5HolE [♠] 40.2 61.3 73.9 52.4 93.0 94.5 94.9 93.8DistMult [♣] 54.6 73.3 82.4 65.4 72.8 91.4 93.6 82.2ComplEx [♣] 59.9 75.9 84.0 69.2 93.6 93.6 94.7 94.1ConvE [♣] 55.8 72.3 83.1 65.7 93.5 94.6 95.6 94.3RotatE [♦] 74.6 83.0 88.4 79.7 94.4 95.2 95.9 94.9ComplEx-N3 [z] - - 91 86 - - 96 95NeuralLP [♥] - - 83.7 76 - - 94.5 94DPMPN 72.6 (.4) 78.4 (.4) 83.4 (.5) 76.4 (.4) 91.6 (.8) 93.6 (.4) 94.9 (.4) 92.8 (.6)
Table 5: Comparison results on the YAGO3-10 dataset. Results of [♠] are taken from (Dettmerset al., 2018), [♣] from (Lacroix et al., 2018), and [z] from (Lacroix et al., 2018).
YAGO3-10Metric (%) H@1 H@3 H@10 MRRDistMult [♠] 24 38 54 34ComplEx [♠] 26 40 55 36ConvE [♠] 35 49 62 44ComplEx-N3 [z] - - 71 58DPMPN 48.4 59.5 67.9 55.3
Table 6: Comparison results of MAP scores (%) on NELL995’s single-query-relation KBC tasks.We take our baselines’ results from (Shen et al., 2018). No reports found on the last two in the paper.
Tasks NeuCFlow M-Walk MINERVA DeepPath TransE TransRAthletePlaysForTeam 83.9 (0.5) 84.7 (1.3) 82.7 (0.8) 72.1 (1.2) 62.7 67.3AthletePlaysInLeague 97.5 (0.1) 97.8 (0.2) 95.2 (0.8) 92.7 (5.3) 77.3 91.2AthleteHomeStadium 93.6 (0.1) 91.9 (0.1) 92.8 (0.1) 84.6 (0.8) 71.8 72.2AthletePlaysSport 98.6 (0.0) 98.3 (0.1) 98.6 (0.1) 91.7 (4.1) 87.6 96.3TeamPlayssport 90.4 (0.4) 88.4 (1.8) 87.5 (0.5) 69.6 (6.7) 76.1 81.4OrgHeadQuarteredInCity 94.7 (0.3) 95.0 (0.7) 94.5 (0.3) 79.0 (0.0) 62.0 65.7WorksFor 86.8 (0.0) 84.2 (0.6) 82.7 (0.5) 69.9 (0.3) 67.7 69.2PersonBornInLocation 84.1 (0.5) 81.2 (0.0) 78.2 (0.0) 75.5 (0.5) 71.2 81.2PersonLeadsOrg 88.4 (0.1) 88.8 (0.5) 83.0 (2.6) 79.0 (1.0) 75.1 77.2OrgHiredPerson 84.7 (0.8) 88.8 (0.6) 87.0 (0.3) 73.8 (1.9) 71.9 73.7AgentBelongsToOrg 89.3 (1.2) - - - - -TeamPlaysInLeague 97.2 (0.3) - - - - -
16
Published as a conference paper at ICLR 2020
0.5 1.0 1.5 2.0 2.5 3.0Epoch
20
25
30
35
40
45
50
55
60
65
Met
ric S
core
(%)
(A) Convergence Analysis (eval on test)H@1H@3H@10MMR
H@1 H@3 H@10 MMR25
30
35
40
45
50
55
Met
ric S
core
(%)
(B) IGNN Component AnalysisW/o IGNNWith IGNN
H@1 MMR25.0
27.5
30.0
32.5
35.0
37.5
40.0
42.5
45.0
Met
ric S
core
(%)
(C) Sampling Horizon AnalysisMax-sampling-per-node = 20Max-sampling-per-node = 50Max-sampling-per-node = 100Max-sampling-per-node = 200
H@1 MMR25.0
27.5
30.0
32.5
35.0
37.5
40.0
42.5
45.0
Met
ric S
core
(%)
(D) Attending-to Horizon AnalysisMax-atteding-to-per-step = 20Max-atteding-to-per-step = 50Max-atteding-to-per-step = 100Max-atteding-to-per-step = 200
H@1 MMR25.0
27.5
30.0
32.5
35.0
37.5
40.0
42.5
45.0
Met
ric S
core
(%)
(E) Attending-from Horizon AnalysisMax-atteding-from-per-step = 5Max-atteding-from-per-step = 10Max-atteding-from-per-step = 20
H@1 MMR15
20
25
30
35
40
45
Met
ric S
core
(%)
(F) Searching Horizon Analysis#Steps-in-AGNN = 2#Steps-in-AGNN = 4#Steps-in-AGNN = 6
Figure 7: Experimental analysis on FB15K-237. (A) Convergence analysis: we pick six modelsnapshots at time points of 0.3, 0.5, 0.7, 1, 2, and 3 epochs during training and evaluate them on test;(B) IGNN component analysis: w/o IGNN uses zero step to run message passing, while with IGNNuses two steps; (C)-(F) Sampling, attending-to, attending-from and searching horizon analysis.
0
5
10
15
20
25
30
35
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(A) Time Cost on Sampling HorizonsMax-sampling-per-node = 20Max-sampling-per-node = 50Max-sampling-per-node = 100Max-sampling-per-node = 200
0
5
10
15
20
25
30
35
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(B) Time Cost on Attending-to HorizonsMax-atteding-to-per-step = 20Max-atteding-to-per-step = 50Max-atteding-to-per-step = 100Max-atteding-to-per-step = 200
0
5
10
15
20
25
30
35
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(C) Time Cost on Attending-from HorizonsMax-atteding-from-per-step = 5Max-atteding-from-per-step = 10Max-atteding-from-per-step = 20
0
5
10
15
20
25
30
35
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(D) Time Cost on Searching Horizons#Steps-in-AGNN = 2#Steps-in-AGNN = 4#Steps-in-AGNN = 6
0
5
10
15
20
25
30
35
Trai
ning
Tim
e fo
r One
Epo
ch (h
)
(E) Time Cost on Batch SizesBatch-size = 50Batch-size = 100Batch-size = 200Batch-size = 300
Figure 8: Analysis of time cost on FB15K-237: (A)-(D) measure the one-epoch training time on dif-ferent horizon settings corresponding to Figure 7(C)-(F); (E) measures on different batch sizes usinghorizon setting Max-sampled-edges-per-node=20, Max-seen-nodes-per-step=20, Max-attended-nodes-per-step=20, and #Steps-of-AGNN=6.
10 MORE VISUALIZATION RESUTLS
10.1 CASE STUDY ON THE ATHLETEPLAYSFORTEAM TASK
In the case shown in Figure 9, the query is (concept personnorthamerica michael turner,concept:athleteplays-forteam, ?) and a true answer is concept sportsteam falcons. From Figure 9, wecan see our model learns that (concept personnorthamerica michael turner, concept:athletehomestadium,concept stadiumoreventvenue georgia dome) and (concept stadiumoreventvenue georgia dome, con-cept:teamhomestadium inv, concept sportsteam falcons) are two important facts to support the an-swer of concept sportsteam falcons. Besides, other facts, such as (concept athlete joey harrington,concept:athletehomestadium, concept stadiumoreventvenue georgia dome) and (concept athlete-joey harrington, concept:athleteplaysforteam, concept sportsteam falcons), provide a vivid example
that a person or an athlete with concept stadiumoreventvenue georgia dome as his or her homestadium might play for the team concept sportsteam falcons. We have such examples more thanone, like concept athlete roddy white’s and concept athlete quarterback matt ryan’s. The entity con-
17
Published as a conference paper at ICLR 2020
cept sportsleague nfl cannot help us differentiate the true answer from other NFL teams, but it canat least exclude those non-NFL teams. In a word, our subgraph-structured representation can wellcapture the relational and compositional reasoning pattern.
concept_personnorthamerica_michael_turner
concept_stadiumoreventvenue_georgia_dome
concept_sportsleague_nfl
concept_coach_deangelo_hall concept_athlete_roddy_white
concept_athlete_joey_harrington
concept_athlete_quarterback_matt_ryan
concept_coach_jerious_norwood
concept_athlete_chris_redman
concept_city_atlanta
concept_sportsteam_falcons
concept_sportsteam_dallas_cowboys
concept_sportsteam_seahawks
concept_sportsteam_sd_chargers
concept_sportsteam_minnesota_vikingsconcept_sportsteam_broncos
concept_sportsteam_cleveland_browns
concept_sportsteam_titans
concept_sportsteam_buffalo_billsconcept_sportsteam_kansas_city_chiefs
concept_sportsteam_oakland_raiders
concept_sport_football
concept_awardtrophytournament_division
concept_sportsteam_packers
concept_sportsteam_colts
concept_sportsteam_eagles
concept_sportsteam_new_york_giants
concept_sportsteam_bears_29_17
concept_sportsteam_bills
concept_sportsteam_steelers
concept_sportsteam_buccaneers concept_sportsteam_rams
concept_sportsteam_saintsconcept_sportsteam_bucs concept_sportsteam_tampa
concept_sportsteam_texansconcept_sportsteam_new_york_jets
concept_sportsteam_ravens
concept_personnorthamerica_michael_turner
concept_stadiumoreventvenue_georgia_dome
concept_sportsleague_nfl
concept_coach_deangelo_hallconcept_athlete_roddy_white
concept_athlete_joey_harrington
concept_athlete_quarterback_matt_ryan
concept_coach_jerious_norwood
concept_sportsteam_falconsconcept_sportsteam_oakland_raiders
concept_sport_football
concept_sportsteam_sd_chargers
concept_sportsteam_minnesota_vikings
concept_sportsteam_dallas_cowboys
concept_sportsteam_new_york_giants
concept_sportsteam_kansas_city_chiefs
concept_sportsteam_tampa
Figure 9: AthletePlaysForTeam. The head is concept personnorthamerica michael turner, the queryrelation is concept:athleteplaysforteam, and the tail is concept sportsteam falcons. The left is a full sub-graph derived with max attending from per step=20, and the right is a further pruned subgraph fromthe left based on attention. The big yellow node represents the head, and the big red node representsthe tail. Color on the rest indicates attention scores over a T -step reasoning process, where greymeans less attention, yellow means more attention gained during early steps, and red means gainingmore attention when getting closer to the final step.
For the AthletePlaysForTeam taskQuery : ( concept personnor thamer ica michae l tu rner , concept : a th le tep lays fo r team , concept spor ts team fa lcons )
Selected key edges :concept personnor thamer ica michae l tu rner , concept : agentbe longstoorgan iza t ion , concep t spo r t s l eague n f lconcept personnor thamer ica michae l tu rner , concept : athletehomestadium , concept stadiumoreventvenue georgia domeconcep t spo r t s league n f l , concept : agentcompeteswithagent , concep t spo r t s l eague n f lconcep t spo r t s league n f l , concept : agentcompeteswithagent inv , concep t spo r t s l eague n f lconcep t spo r t s league n f l , concept : teamplays in league inv , concept spor ts team sd chargersconcep t spo r t s league n f l , concept : leaguestadiums , concept stadiumoreventvenue georgia domeconcep t spo r t s league n f l , concept : teamplays in league inv , concept spor ts team fa lconsconcep t spo r t s league n f l , concept : agen tbe longs toorgan iza t ion inv , concept personnor thamer ica michae l tu rnerconcept stadiumoreventvenue georgia dome , concept : leaguestadiums inv , concep t spo r t s l eague n f lconcept stadiumoreventvenue georgia dome , concept : teamhomestadium inv , concept spor ts team fa lconsconcept stadiumoreventvenue georgia dome , concept : ath letehomestadium inv , c o n c e p t a t h l e t e j o e y h a r r i n g t o nconcept stadiumoreventvenue georgia dome , concept : ath letehomestadium inv , concep t a th le te roddy wh i t econcept stadiumoreventvenue georgia dome , concept : ath letehomestadium inv , concept coach deange lo ha l lconcept stadiumoreventvenue georgia dome , concept : ath letehomestadium inv , concept personnor thamer ica michae l tu rnerconcep t spo r t s league n f l , concept : subpa r t o f o rgan i za t i on i nv , concept spor ts team oak land ra idersconcept spor ts team sd chargers , concept : teamplaysinleague , concep t spo r t s l eague n f lconcept spor ts team sd chargers , concept : teamplaysagainstteam , concept spor ts team fa lconsconcept spor ts team sd chargers , concept : teamplaysagainst team inv , concept spor ts team fa lconsconcept spor ts team sd chargers , concept : teamplaysagainstteam , concep t spor ts team oak land ra idersconcept spor ts team sd chargers , concept : teamplaysagainst team inv , concept spor ts team oak land ra idersconcept spor ts team fa lcons , concept : teamplaysinleague , concep t spo r t s l eague n f lconcept spor ts team fa lcons , concept : teamplaysagainstteam , concept spor ts team sd chargersconcept spor ts team fa lcons , concept : teamplaysagainst team inv , concept spor ts team sd chargersconcept spor ts team fa lcons , concept : teamhomestadium , concept stadiumoreventvenue georgia domeconcept spor ts team fa lcons , concept : teamplaysagainstteam , concep t spor ts team oak land ra idersconcept spor ts team fa lcons , concept : teamplaysagainst team inv , concept spor ts team oak land ra idersconcept spor ts team fa lcons , concept : a th le te ledspo r t s team inv , c o n c e p t a t h l e t e j o e y h a r r i n g t o nc o n c e p t a t h l e t e j o e y h a r r i n g t o n , concept : athletehomestadium , concept stadiumoreventvenue georgia domec o n c e p t a t h l e t e j o e y h a r r i n g t o n , concept : a th le te ledspor ts team , concept spor ts team fa lconsc o n c e p t a t h l e t e j o e y h a r r i n g t o n , concept : a th le tep lays fo r team , concept spor ts team fa lconsconcep t a th le te roddy wh i te , concept : athletehomestadium , concept stadiumoreventvenue georgia domeconcep t a th le te roddy wh i te , concept : a th le tep lays fo r team , concept spor ts team fa lconsconcept coach deangelo hal l , concept : athletehomestadium , concept stadiumoreventvenue georgia dome
18
Published as a conference paper at ICLR 2020
concept coach deangelo hal l , concept : a th le tep lays fo r team , concep t spor ts team oak land ra idersconcep t spo r t s league n f l , concept : teamplays in league inv , concept spor ts team new york g iantsconcept spor ts team sd chargers , concept : teamplaysagainst team inv , concept spor ts team new york g iantsconcept spor ts team fa lcons , concept : teamplaysagainstteam , concept spor ts team new york g iantsconcept spor ts team fa lcons , concept : teamplaysagainst team inv , concept spor ts team new york g iantsconcept spor ts team oak land ra iders , concept : teamplaysagainst team inv , concept spor ts team new york g iantsconcept spor ts team oak land ra iders , concept : teamplaysagainstteam , concept spor ts team sd chargersconcept spor ts team oak land ra iders , concept : teamplaysagainst team inv , concept spor ts team sd chargersconcept spor ts team oak land ra iders , concept : teamplaysagainstteam , concept spor ts team fa lconsconcept spor ts team oak land ra iders , concept : teamplaysagainst team inv , concept spor ts team fa lconsconcept spor ts team oak land ra iders , concept : agentcompeteswithagent , concep t spor ts team oak land ra idersconcept spor ts team oak land ra iders , concept : agentcompeteswithagent inv , concept spor ts team oak land ra idersconcept spor ts team new york g iants , concept : teamplaysagainstteam , concept spor ts team sd chargersconcept spor ts team new york g iants , concept : teamplaysagainstteam , concept spor ts team fa lconsconcept spor ts team new york g iants , concept : teamplaysagainst team inv , concept spor ts team fa lconsconcept spor ts team new york g iants , concept : teamplaysagainstteam , concept spor ts team oak land ra iders
10.2 MORE RESULTS
concept_personnorthamerica_matt_treanor
concept_sportsteamposition_center
concept_sport_baseball
concept_personus_orlando_hudson
concept_athlete_ben_hendrickson
concept_athlete_hunter_pence
concept_athlete_gary_carter
concept_athlete_kevin_cash
concept_athlete_scott_linebrink
concept_athlete_jeff_kentconcept_athlete_freddy_sanchez
concept_athlete_mike_mussina
concept_athlete_matt_clement
concept_coach_ian_snell
concept_athlete_shin_soo_choo
concept_athlete_justin_verlander
concept_athlete_aaron_hill
concept_coach_alex_gordon
concept_athlete_pelfreyconcept_athlete_rajai_davis
concept_athlete_justin_morneau
concept_coach_j_j__hardy
concept_athlete_steve_carlton
concept_sportsleague_mlb
concept_sportsteam_twins
concept_athlete_kei_igawa
concept_coach_adam_laroche
concept_male_mike_stanton
concept_athlete_jake_peavy
concept_coach_seth_mcclung
concept_athlete_mark_hendrickson
concept_personmexico_jason_giambi
concept_athlete_russell_branyan
concept_personnorthamerica_matt_treanorconcept_sportsteamposition_center
concept_sport_baseball
concept_personus_orlando_hudson
concept_athlete_ben_hendrickson
concept_athlete_hunter_pence
concept_athlete_gary_carter
concept_athlete_kevin_cash
concept_sportsleague_mlb
concept_coach_j_j__hardy
Figure 10: AthletePlaysInLeague. The head is concept personnorthamerica matt treanor, the queryrelation is concept:athleteplaysinleague, and the tail is concept sportsleague mlb. The left is a full sub-graph derived with max attending from per step=20, and the right is a further pruned subgraph fromthe left based on attention. The big yellow node represents the head, and the big red node representsthe tail. Color on the rest indicates attention scores over a T -step reasoning process, where greymeans less attention, yellow means more attention gained during early steps, and red means gainingmore attention when getting closer to the final step.
For the AthletePlaysInLeague taskQuery : ( concept personnor thamer ica mat t t reanor , concept : a th le tep lays in league , concept spor ts league mlb )
Selected key edges :concept personnor thamer ica mat t t reanor , concept : a t h l e t e f l y o u t t o s p o r t s t e a m p o s i t i o n , concep t spo r t s teampos i t i on cen te rconcept personnor thamer ica mat t t reanor , concept : a t h l e t ep l aysspo r t , concep t spo r t baseba l lconcep t spor ts teampos i t i on cen te r , concept : a t h l e t e f l y o u t t o s p o r t s t e a m p o s i t i o n i n v , concept personus or lando hudsonconcep t spor ts teampos i t i on cen te r , concept : a t h l e t e f l y o u t t o s p o r t s t e a m p o s i t i o n i n v , concept a th le te ben hendr icksonconcep t spor ts teampos i t i on cen te r , concept : a t h l e t e f l y o u t t o s p o r t s t e a m p o s i t i o n i n v , co nc ep t co ac h j j ha rd yconcep t spor ts teampos i t i on cen te r , concept : a t h l e t e f l y o u t t o s p o r t s t e a m p o s i t i o n i n v , concep t a th le te hun te r penceconcep t spor t baseba l l , concept : a t h l e t e p l a y s s p o r t i n v , concept personus or lando hudsonconcep t spor t baseba l l , concept : a t h l e t e p l a y s s p o r t i n v , concept a th le te ben hendr icksonconcep t spor t baseba l l , concept : a t h l e t e p l a y s s p o r t i n v , c onc ep t co ac h j j ha rd yconcep t spor t baseba l l , concept : a t h l e t e p l a y s s p o r t i n v , concep t a th le te hun te r penceconcept personus or lando hudson , concept : a th le tep lays in league , concept spor ts league mlbconcept personus or lando hudson , concept : a t h l e t ep l aysspo r t , concep t spo r t baseba l lconcept a th le te ben hendr ickson , concept : coachesinleague , concept spor ts league mlbconcept a th le te ben hendr ickson , concept : a t h l e t ep l aysspo r t , concep t spo r t baseba l l
19
Published as a conference paper at ICLR 2020
concep t coach j j ha rdy , concept : coachesinleague , concept spor ts league mlbconcep t coach j j ha rdy , concept : a th le tep lays in league , concept spor ts league mlbconcep t coach j j ha rdy , concept : a t h l e t ep l ayssp o r t , concep t spo r t baseba l lconcept a th le te hunter pence , concept : a th le tep lays in league , concept spor ts league mlbconcept a th le te hunter pence , concept : a t h l e t ep l aysspo r t , concep t spo r t baseba l lconcept spor ts league mlb , concept : coachesin league inv , concep t a th le te ben hendr icksonconcept spor ts league mlb , concept : coachesin league inv , con ce p t co ac h j j ha rd y
concept_athlete_eli_manning
concept_sportsteam_new_york_giants
concept_sportsleague_nfl
concept_male_archie_manning
concept_athlete_joe_bradley
concept_sport_football
concept_awardtrophytournament_super_bowl
concept_stadiumoreventvenue_giants_stadium
concept_sportsteam_patsconcept_city_philadelphia
concept_athlete_barry_bonds
concept_sportsteamposition_center
concept_bone_knee
concept_athlete_blair_betts
concept_personus_jeremy_shockey
concept_athlete_phil_simms
concept_sport_basketball
concept_athlete_rich_auriliaconcept_athlete_justin_tuck
concept_coach_john_mcgraw
concept_athlete_lawrence_taylor
concept_athlete_steve_smith
concept_lake_new
concept_city_east_rutherford
concept_sportsleague_nhl
concept_sportsteam_new_york_jets
concept_city_new_york
concept_person_belichick
concept_athlete_john_cassell
concept_stadiumoreventvenue_paul_brown_stadiumconcept_sportsteam_colts
concept_stadiumoreventvenue_mcafee_coliseum
concept_awardtrophytournament_super_bowl_xlii
concept_stadiumoreventvenue_lucas_oil_stadium
concept_stadiumoreventvenue_meadowlands_stadium
concept_stadiumoreventvenue_gillette_stadium
concept_stadiumoreventvenue_izod_center
concept_stadiumoreventvenue_metrodome
concept_geopoliticalorganization_new_england
concept_geopoliticallocation_national
concept_athlete_ensberg
concept_stateorprovince_states
concept_sport_hockey
concept_stadiumoreventvenue_united_center
concept_athlete_eli_manning
concept_sportsteam_new_york_giants
concept_sportsleague_nfl
concept_male_archie_manning
concept_athlete_joe_bradley
concept_sport_football
concept_stadiumoreventvenue_giants_stadium
concept_lake_new
concept_city_east_rutherford
concept_sportsleague_nhl
concept_stadiumoreventvenue_paul_brown_stadium
concept_stadiumoreventvenue_mcafee_coliseum
concept_stadiumoreventvenue_lucas_oil_stadium
Figure 11: AthleteHomeStadium. The head is concept athlete eli manning, the query relation isconcept:athletehomestadium, and the tail is concept stadiumoreventvenue giants stadium. The left is a fullsubgraph derived with max attending from per step=20, and the right is a further pruned subgraphfrom the left based on attention. The big yellow node represents the head, and the big red noderepresents the tail. Color on the rest indicates attention scores over a T -step reasoning process,where grey means less attention, yellow means more attention gained during early steps, and redmeans gaining more attention when getting closer to the final step.
For the AthleteHomeStadium taskQuery : ( concep t a th le te e l i mann ing , concept : athletehomestadium , concept s tad iumoreventvenue giants s tad ium )
Selected key edges :concep t a th le te e l i mann ing , concept : personbe longstoorgan izat ion , concept spor ts team new york g iantsconcep t a th le te e l i mann ing , concept : a th le tep lays fo r team , concept spor ts team new york g iantsconcep t a th le te e l i mann ing , concept : a th le te ledspor ts team , concept spor ts team new york g iantsconcep t a th le te e l i mann ing , concept : a th le tep lays in league , concep t spo r t s l eague n f lconcep t a th le te e l i mann ing , concept : f a the ro fpe rson inv , concept male archie manningconcept spor ts team new york g iants , concept : teamplaysinleague , concep t spo r t s l eague n f lconcept spor ts team new york g iants , concept : teamhomestadium , concept s tad iumoreventvenue giants s tad iumconcept spor ts team new york g iants , concept : personbe longs toorgan iza t ion inv , concep t a th le te e l i mann ingconcept spor ts team new york g iants , concept : a th l e t ep lays fo r t eam inv , concep t a th le te e l i mann ingconcept spor ts team new york g iants , concept : a th le te ledspo r t s team inv , concep t a th le te e l i mann ingconcep t spo r t s league n f l , concept : teamplays in league inv , concept spor ts team new york g iantsconcep t spo r t s league n f l , concept : agentcompeteswithagent , concep t spo r t s l eague n f lconcep t spo r t s league n f l , concept : agentcompeteswithagent inv , concep t spo r t s l eague n f lconcep t spo r t s league n f l , concept : leaguestadiums , concept s tad iumoreventvenue giants s tad iumconcep t spo r t s league n f l , concept : a t h l e t e p l a y s i n l e a g u e i n v , concep t a th le te e l i mann ingconcept male archie manning , concept : fa thero fperson , concep t a th le te e l i mann ingconcep t spo r t s league n f l , concept : leaguestadiums , concept stadiumoreventvenue paul brown stadiumconcept s tad iumoreventvenue giants stad ium , concept : teamhomestadium inv , concept spor ts team new york g iantsconcept s tad iumoreventvenue giants stad ium , concept : leaguestadiums inv , concep t spo r t s l eague n f lconcept s tad iumoreventvenue giants stad ium , concept : p rox y fo r i n v , c o n c e p t c i t y e a s t r u t h e r f o r dc o n c e p t c i t y e a s t r u t h e r f o r d , concept : p roxy for , concept s tad iumoreventvenue giants s tad iumconcept stadiumoreventvenue paul brown stadium , concept : leaguestadiums inv , concep t spo r t s l eague n f l
For the AthletePlaysSport task
20
Published as a conference paper at ICLR 2020
concept_athlete_vernon_wells
concept_sportsleague_mlb
concept_sportsteam_blue_jays
concept_awardtrophytournament_world_series
concept_sportsteamposition_center
concept_sportsteam_yankees
concept_sportsteam_pittsburgh_pirates
concept_sportsteam_dodgers
concept_sportsteam_twins
concept_sport_baseball
concept_sportsteam_white_sox
concept_sportsteam_new_york_metsconcept_sportsteam_red_sox
concept_sportsteam_phillies
concept_sportsteam_cleveland_indians_organizationconcept_sportsteam_los_angeles_dodgers
concept_sportsteam_red_sox_this_seasonconcept_sportsteam_orioles
concept_sportsteam_detroit_tigersconcept_sportsteam_philadelphia_athletics
concept_sportsteam_washington_nationals
concept_sportsteam_mariners
concept_sportsteam_louisville_cardinals
concept_sportsteam_st___louis_cardinals
concept_country_netherlandsconcept_country_china
concept_country___america
concept_sportsgame_n1951_world_series
concept_sportsgame_aldsconcept_stadiumoreventvenue_us_cellular_field
concept_sportsteam_eagles
concept_country_canada_canada
concept_awardtrophytournament_championships
concept_sport_hockey
concept_sport_golf
concept_sport_football
concept_sport_ski
concept_sport_basketballconcept_geopoliticallocation_national
concept_sport_skiingconcept_country_countries
concept_publication_people_
concept_beverage_teaconcept_sport_cricket
concept_beverage_beer
concept_person_william001
concept_sport_soccerconcept_bank_national
concept_bank_china_construction_bank
concept_stateorprovince_states
concept_country_new_zealandconcept_country_u_s_
concept_hobby_hobbies
concept_country_the_united_kingdom
concept_country_thailand
concept_country_france_france
concept_country_switzerland
concept_awardtrophytournament_prizes
concept_country_spain
concept_country_britain
concept_country_wales
concept_country_portugal
concept_country_malaysiaconcept_country_bulgaria
concept_athlete_vernon_wells
concept_sportsleague_mlb
concept_sportsteam_blue_jaysconcept_awardtrophytournament_world_series
concept_sportsteamposition_center
concept_sportsteam_yankeesconcept_sportsteam_pittsburgh_pirates
concept_sportsteam_dodgers
concept_sportsteam_twins
concept_sport_baseball
concept_sport_hockey
concept_sport_golf
concept_sport_football
concept_sport_ski
concept_country_new_zealand
Figure 12: AthletePlaysSport. The head is concept athlete vernon wells, the query relation is con-cept:athleteplayssport, and the tail is concept sport baseball. The left is a full subgraph derived withmax attending from per step=20, and the right is a further pruned subgraph from the left based on at-tention. The big yellow node represents the head, and the big red node represents the tail. Color onthe rest indicates attention scores over a T -step reasoning process, where grey means less attention,yellow means more attention gained during early steps, and red means gaining more attention whengetting closer to the final step.
Query : ( concep t a th le te ve rnon we l l s , concept : a t h l e t ep l aysspo r t , concep t spo r t baseba l l )
Selected key edges :concep t a th le te ve rnon we l l s , concept : a th le tep lays in league , concept spor ts league mlbconcep t a th le te ve rnon we l l s , concept : coachwontrophy , concept awardt rophytournament wor ld ser iesconcep t a th le te ve rnon we l l s , concept : agen tco l l abo ra tesw i thagen t inv , concep t spor ts team blue jaysconcep t a th le te ve rnon we l l s , concept : personbelongstoorgan izat ion , concep t spor ts team blue jaysconcep t a th le te ve rnon we l l s , concept : a th le tep lays fo r team , concep t spor ts team blue jaysconcep t a th le te ve rnon we l l s , concept : a th le te ledspor ts team , concep t spor ts team blue jaysconcept spor ts league mlb , concept : teamplays in league inv , concept sportsteam dodgersconcept spor ts league mlb , concept : teamplays in league inv , concept sportsteam yankeesconcept spor ts league mlb , concept : teamplays in league inv , concep t spo r t s t eam p i t t sbu rgh p i r a t esconcept awardt rophytournament wor ld ser ies , concept : teamwontrophy inv , concept sportsteam dodgersconcept awardt rophytournament wor ld ser ies , concept : teamwontrophy inv , concept sportsteam yankeesconcept awardt rophytournament wor ld ser ies , concept : awardtrophytournament is thechampionshipgameof thenat ionalspor t ,
concep t spo r t baseba l lconcept awardt rophytournament wor ld ser ies , concept : teamwontrophy inv , concep t spo r t s t eam p i t t sbu rgh p i r a t esconcept spor ts team blue jays , concept : teamplaysinleague , concept spor ts league mlbconcept spor ts team blue jays , concept : teamplaysagainstteam , concept sportsteam yankeesconcept spor ts team blue jays , concept : teamplayssport , concep t spo r t baseba l lconcept sportsteam dodgers , concept : teamplaysagainstteam , concept sportsteam yankeesconcept sportsteam dodgers , concept : teamplaysagainst team inv , concept sportsteam yankeesconcept sportsteam dodgers , concept : teamwontrophy , concept awardt rophytournament wor ld ser iesconcept sportsteam dodgers , concept : teamplayssport , concep t spo r t baseba l lconcept sportsteam yankees , concept : teamplaysagainstteam , concept sportsteam dodgersconcept sportsteam yankees , concept : teamplaysagainst team inv , concept sportsteam dodgersconcept sportsteam yankees , concept : teamwontrophy , concept awardt rophytournament wor ld ser iesconcept sportsteam yankees , concept : teamplayssport , concep t spo r t baseba l lconcept sportsteam yankees , concept : teamplaysagainstteam , concep t spo r t s t eam p i t t sbu rgh p i r a t esconcept sportsteam yankees , concept : teamplaysagainst team inv , concep t spo r t s t eam p i t t sbu rgh p i r a t esconcep t spor t baseba l l , concept : teamplaysspor t inv , concept sportsteam dodgersconcep t spor t baseba l l , concept : teamplaysspor t inv , concept sportsteam yankeesconcep t spor t baseba l l , concept : awardt rophytournament is thechampionsh ipgameof thenat iona lspor t inv ,
concept awardt rophytournament wor ld ser iesconcep t spor t baseba l l , concept : teamplaysspor t inv , concep t spo r t s t eam p i t t sbu rgh p i r a t esconcep t spo r t s team p i t t sbu rgh p i ra tes , concept : teamplaysagainstteam , concept sportsteam yankeesconcep t spo r t s team p i t t sbu rgh p i ra tes , concept : teamplaysagainst team inv , concept sportsteam yankees
21
Published as a conference paper at ICLR 2020
concep t spo r t s team p i t t sbu rgh p i ra tes , concept : teamwontrophy , concept awardt rophytournament wor ld ser iesconcep t spo r t s team p i t t sbu rgh p i ra tes , concept : teamplayssport , concep t spo r t baseba l l
concept_sportsteam_red_wings
concept_athlete_lidstrom
concept_sportsteam_blue_jackets
concept_sportsteam_montreal_canadiens
concept_sportsteam_anaheim_ducks
concept_sportsteam_columbus_blue_jackets
concept_athlete_chelios
concept_sportsteam_hawks
concept_sportsteam_edmonton_oilers
concept_sportsteam_chicago_blackhawks
concept_sportsteam_flyers_playoff_ticketsconcept_sportsteam_pittsburgh_penguins
concept_sportsteam_l_a__kings
concept_sportsteam_colorado_avalanche
concept_sportsteam_dallas_stars
concept_stateorprovince_new_york
concept_sportsteam_devils
concept_sportsteam_buffalo_sabres
concept_sportsgame_series
concept_sportsteam_bruins
concept_sportsteam_kings_college
concept_sport_hockey
concept_sportsleague_nhl
concept_sport_basketball
concept_sportsteam_leafsconcept_stadiumoreventvenue_joe_louis_arena
concept_sportsteam_blackhawks
concept_sportsteam_rangers
concept_sportsteam_capitals
concept_sportsteam_new_york_islanders
concept_sportsteam_oilers
concept_sportsteam_ottawa_senators
concept_sportsteam_spurs
concept_sportsteam_tampa_bay_lightning
concept_country___america
concept_country_republic_of_india
concept_country_canada_canada
concept_sportsequipment_ball
concept_awardtrophytournament_championshipsconcept_sportsequipment_clubs
concept_sportsequipment_hockey_sticks
concept_stadiumoreventvenue_td_banknorth_garden
concept_tool_accessories
concept_stadiumoreventvenue_pete_times_forum
concept_stadiumoreventvenue_staples_center
concept_sportsteamposition_forward
concept_hobby_hobbies
concept_sport_football
concept_sport_baseballconcept_sport_golf
concept_company_national
concept_sport_soccer
concept_country_usaconcept_sport_wrestling
concept_sport_team concept_sport_fitness
concept_sportsteam_packers
concept_hobby_hiking
concept_sport_sports
concept_hobby_fishing
concept_country_england
concept_country_iraq
concept_country_russia
concept_coach_billy_butler
concept_sportsteamposition_quarterback
concept_sportsteam_red_wings
concept_athlete_lidstrom
concept_sportsteam_blue_jackets
concept_sportsteam_montreal_canadiens
concept_sportsteam_anaheim_ducksconcept_sportsteam_columbus_blue_jackets
concept_sport_hockeyconcept_sportsleague_nhl
concept_sport_basketball
concept_sportsteam_leafs
concept_country___america
concept_sport_football
concept_sport_baseball
concept_sport_golf
Figure 13: TeamPlaysSport. The head is concept sportsteam red wings, the query relation is con-cept:teamplayssport, and the tail is concept sport hockey. The left is a full subgraph derived withmax attending from per step=20, and the right is a further pruned subgraph from the left based onattention. The big yellow node represents the head, and the big red node represents the tail. Coloron the rest indicates attention scores over a T -step reasoning process, where grey means less atten-tion, yellow means more attention gained during early steps, and red means gaining more attentionwhen getting closer to the final step.
For the TeamPlaysSport taskQuery : ( concept spor ts team red wings , concept : teamplayssport , concept spor t hockey )
Selected key edges :concept spor ts team red wings , concept : teamplaysagainstteam , concept spor ts team montrea l canadiensconcept spor ts team red wings , concept : teamplaysagainst team inv , concept spor ts team montrea l canadiensconcept spor ts team red wings , concept : teamplaysagainstteam , concep t spor ts team b lue jacke tsconcept spor ts team red wings , concept : teamplaysagainst team inv , concep t spor ts team b lue jacke tsconcept spor ts team red wings , concept : works fo r inv , c o n c e p t a t h l e t e l i d s t r o mconcept spor ts team red wings , concept : o rgan iza t ionh i redperson , c o n c e p t a t h l e t e l i d s t r o mconcept spor ts team red wings , concept : a th l e t ep lays fo r t eam inv , c o n c e p t a t h l e t e l i d s t r o mconcept spor ts team red wings , concept : a th le te ledspo r t s team inv , c o n c e p t a t h l e t e l i d s t r o mconcept spor ts team montreal canadiens , concept : teamplaysagainstteam , concept spor ts team red wingsconcept spor ts team montreal canadiens , concept : teamplaysagainst team inv , concept spor ts team red wingsconcept spor ts team montreal canadiens , concept : teamplaysinleague , concep t spor ts league nh lconcept spor ts team montreal canadiens , concept : teamplaysagainstteam , concep t spor ts team lea fsconcept spor ts team montreal canadiens , concept : teamplaysagainst team inv , concep t spor ts team lea fsconcept spor ts team blue jacke ts , concept : teamplaysagainstteam , concept spor ts team red wingsconcept spor ts team blue jacke ts , concept : teamplaysagainst team inv , concept spor ts team red wingsconcept spor ts team blue jacke ts , concept : teamplaysinleague , concep t spor ts league nh lc o n c e p t a t h l e t e l i d s t r o m , concept : worksfor , concept spor ts team red wingsc o n c e p t a t h l e t e l i d s t r o m , concept : o rgan i za t i onh i redpe rson inv , concept spor ts team red wingsc o n c e p t a t h l e t e l i d s t r o m , concept : a th le tep lays fo r team , concept spor ts team red wingsc o n c e p t a t h l e t e l i d s t r o m , concept : a th le te ledspor ts team , concept spor ts team red wingsconcept spor ts team red wings , concept : teamplaysinleague , concep t spor ts league nh lconcept spor ts team red wings , concept : teamplaysagainstteam , concep t spor ts team lea fsconcept spor ts team red wings , concept : teamplaysagainst team inv , concep t spor ts team lea fsconcept spor ts league nh l , concept : agentcompeteswithagent , concep t spor ts league nh lconcept spor ts league nh l , concept : agentcompeteswithagent inv , concep t spor ts league nh lconcept spor ts league nh l , concept : teamplays in league inv , concep t spor ts team lea fsconcept spor ts team leafs , concept : teamplaysinleague , concep t spor ts league nh lconcept spor ts team leafs , concept : teamplayssport , concept spor t hockey
22
Published as a conference paper at ICLR 2020
concept_company_disney
concept_city_burbank
concept_ceo_robert_iger
concept_ceo_jeffrey_katzenberg
concept_city_abc
concept_ceo_michael_eisnerconcept_politicsblog_rights
concept_blog_espn_the_magazine
concept_website_network
concept_academicfield_media
concept_politicsissue_entertainment
concept_publication_espn
concept_company_abc_television_network
concept_personaustralia_jobs
concept_company_club_penguin
concept_televisionnetwork_abc
concept_website_infoseek
concept_company_pixar
concept_person_disney
concept_biotechcompany_the_walt_disney_co_
concept_city_new_york
concept_recordlabel_dreamworks_skg
concept_company_disney_feature_animation
concept_ceo_george_bodenheimerconcept_company_walt_disney
concept_city_emeryville
concept_personeurope_disney
concept_university_search
concept_personus_david_geffen
concept_sportsleague_espn
concept_personus_steven_spielberg
concept_musicartist_toy
concept_person_steven_spielberg
concept_city_lego
concept_personaustralia_david_geffen
concept_transportation_burbank_glendale_pasadena
concept_company_asylum
concept_politicianus_rudy_giuliani
concept_company_walt_disney_world_resort
concept_stateorprovince_illinois
concept_company_walt_disney_world
concept_company_disney
concept_city_burbank
concept_ceo_robert_iger
concept_ceo_jeffrey_katzenberg
concept_city_abc
concept_ceo_michael_eisner
concept_biotechcompany_the_walt_disney_co_
concept_city_new_york
concept_recordlabel_dreamworks_skg
concept_website_network
concept_ceo_george_bodenheimer
concept_transportation_burbank_glendale_pasadena
Figure 14: OrganizationHeadQuarteredInCity. The head is concept company disney, the queryrelation is concept:organizationheadquarteredincity, and the tail is concept city burbank. The left is afull subgraph derived with max attending from per step=20, and the right is a further pruned subgraphfrom the left based on attention. The big yellow node represents the head, and the big red noderepresents the tail. Color on the rest indicates attention scores over a T -step reasoning process,where grey means less attention, yellow means more attention gained during early steps, and redmeans gaining more attention when getting closer to the final step.
For the OrganizationHeadQuarteredInCity taskQuery : ( concept company disney , concept : o rgan i za t i onheadquar te red inc i t y , concep t c i t y burbank )
Selected key edges :concept company disney , concept : headquarteredin , concep t c i t y burbankconcept company disney , concept : subpa r t o f o rgan i za t i on i nv , concept websi te networkconcept company disney , concept : works fo r inv , concep t ceo robe r t i ge rconcept company disney , concept : p ro xy f o r i n v , concep t ceo robe r t i ge rconcept company disney , concept : pe rson leadsorgan iza t ion inv , concep t ceo robe r t i ge rconcept company disney , concept : ceoof inv , concep t ceo robe r t i ge rconcept company disney , concept : pe rson leadsorgan iza t ion inv , concep t ceo je f f r ey ka tzenbe rgconcept company disney , concept : o rgan iza t ionh i redperson , concep t ceo je f f r ey ka tzenbe rgconcept company disney , concept : o rgan iza t ion terminatedperson , concep t ceo je f f r ey ka tzenbe rgconcept c i ty burbank , concept : headquar tered in inv , concept company disneyconcept c i ty burbank , concept : headquar tered in inv , concept b io techcompany the wal t d isney coconcept websi te network , concept : subpar to fo rgan iza t i on , concept company disneyconcep t ceo robe r t i ge r , concept : worksfor , concept company disneyconcep t ceo robe r t i ge r , concept : p roxy for , concept company disneyconcep t ceo robe r t i ge r , concept : person leadsorgan iza t ion , concept company disneyconcep t ceo robe r t i ge r , concept : ceoof , concept company disneyconcep t ceo robe r t i ge r , concept : topmemberoforganizat ion , concept b io techcompany the wal t d isney coconcep t ceo robe r t i ge r , concept : o rgan iza t ion te rm ina tedperson inv , concept b io techcompany the wal t d isney coconcep t ceo je f f rey ka tzenberg , concept : person leadsorgan izat ion , concept company disneyconcep t ceo je f f rey ka tzenberg , concept : o rgan i za t i onh i redpe rson inv , concept company disneyconcep t ceo je f f rey ka tzenberg , concept : o rgan iza t ion te rm ina tedperson inv , concept company disneyconcep t ceo je f f rey ka tzenberg , concept : worksfor , concept record label dreamworks skgconcep t ceo je f f rey ka tzenberg , concept : topmemberoforganizat ion , concept record label dreamworks skgconcep t ceo je f f rey ka tzenberg , concept : o rgan iza t ion te rm ina tedperson inv , concept record label dreamworks skgconcep t ceo je f f rey ka tzenberg , concept : ceoof , concept record label dreamworks skgconcept b io techcompany the wal t d isney co , concept : headquarteredin , concep t c i t y burbankconcept b io techcompany the wal t d isney co , concept : o rgan i za t i onheadquar te red inc i t y , concep t c i t y burbankconcept record label dreamworks skg , concept : works fo r inv , concep t ceo je f f r ey ka tzenbe rgconcept record label dreamworks skg , concept : topmemberoforganizat ion inv , concep t ceo je f f r ey ka tzenbe rgconcept record label dreamworks skg , concept : o rgan iza t ion terminatedperson , concep t ceo je f f r ey ka tzenbe rgconcept record label dreamworks skg , concept : ceoof inv , concep t ceo je f f r ey ka tzenbe rgconcept c i ty burbank , concept : a i r p o r t i n c i t y i n v , concept t ranspor ta t ion burbank g lenda le pasadenaconcept t ranspor ta t ion burbank g lenda le pasadena , concept : a i r p o r t i n c i t y , concep t c i t y burbank
23
Published as a conference paper at ICLR 2020
concept_scientist_balmer
concept_university_microsoft
concept_company_microsoft
concept_personus_steve_ballmer
concept_person_robbie_bach
concept_politician_jobs
concept_person_bill
concept_personaustralia_paul_allen
concept_ceo_steve_ballmerconcept_company_gates
concept_buildingfeature_windows
concept_date_bill
concept_website_download
concept_product_powerpoint
concept_charactertrait_vista
concept_product_word_documents
concept_mlalgorithm_microsoft_word
concept_museum_steve
concept_city_outlook
concept_emotion_word
concept_consumerelectronicitem_ms_word
concept_hallwayitem_access
concept_personmexico_ryan_whitney
concept_retailstore_microsoft
concept_company_adobe
concept_beverage_new
concept_sportsteam_harvard_divinity_school
concept_coach_vulcan_inc
concept_sportsteam_state_universityconcept_biotechcompany_microsoft_corp
concept_person_edwards
concept_company_clinton
concept_politicianus_rodham_clinton
concept_geopoliticallocation_kerry
concept_person_mccain
concept_governmentorganization_representativesconcept_politicalparty_senate
concept_politicalparty_college
concept_governmentorganization_house
concept_personus_paul_allen
concept_coach_tim_murphy
concept_software_microsoft_excel
concept_ceo_paul_allen
concept_sportsteam_new_york_jets
concept_company_sunconcept_software_office_2003
concept_company_yahoo001
concept_female_hillary
concept_personafrica_george_bush
concept_stateorprovince_last_year
concept_university_yahoo
concept_geopoliticallocation_world
concept_university_google
concept_company_microsoft_corporation
concept_automobilemaker_jeff_bezos
concept_scientist_balmer
concept_university_microsoft
concept_company_microsoft
concept_personus_steve_ballmer
concept_person_robbie_bachconcept_politician_jobs
concept_person_bill
concept_personmexico_ryan_whitney
concept_retailstore_microsoft
concept_company_adobe
concept_personaustralia_paul_allen
concept_sportsteam_harvard_divinity_school
concept_sportsteam_state_university
Figure 15: WorksFor. The head is concept scientist balmer, the query relation is con-cept:worksfor, and the tail is concept university microsoft. The left is a full subgraph derived withmax attending from per step=20, and the right is a further pruned subgraph from the left based on at-tention. The big yellow node represents the head, and the big red node represents the tail. Color onthe rest indicates attention scores over a T -step reasoning process, where grey means less attention,yellow means more attention gained during early steps, and red means gaining more attention whengetting closer to the final step.
For the WorksFor taskQuery : ( concep t sc ien t i s t ba lmer , concept : worksfor , c o n c e p t u n i v e r s i t y m i c r o s o f t )
Selected key edges :concep t sc ien t i s t ba lmer , concept : topmemberoforganizat ion , concept company microsof tconcep t sc ien t i s t ba lmer , concept : o rgan iza t ion te rm ina tedperson inv , c o n c e p t u n i v e r s i t y m i c r o s o f tconcept company microsoft , concept : topmemberoforganizat ion inv , concept personus steve ba l lmerconcept company microsoft , concept : topmemberoforganizat ion inv , c o n c e p t s c i e n t i s t b a l m e rc o n c e p t u n i v e r s i t y m i c r o s o f t , concept : agentco l labora teswi thagent , concept personus steve ba l lmerc o n c e p t u n i v e r s i t y m i c r o s o f t , concept : pe rson leadsorgan iza t ion inv , concept personus steve ba l lmerc o n c e p t u n i v e r s i t y m i c r o s o f t , concept : pe rson leadsorgan iza t ion inv , c o n c e p t p e r s o n b i l lc o n c e p t u n i v e r s i t y m i c r o s o f t , concept : o rgan iza t ion terminatedperson , c o n c e p t s c i e n t i s t b a l m e rc o n c e p t u n i v e r s i t y m i c r o s o f t , concept : pe rson leadsorgan iza t ion inv , concept person robbie bachconcept personus steve bal lmer , concept : topmemberoforganizat ion , concept company microsof tconcept personus steve bal lmer , concept : agen tco l l abo ra tesw i thagen t inv , c o n c e p t u n i v e r s i t y m i c r o s o f tconcept personus steve bal lmer , concept : person leadsorgan izat ion , c o n c e p t u n i v e r s i t y m i c r o s o f tconcept personus steve bal lmer , concept : worksfor , c o n c e p t u n i v e r s i t y m i c r o s o f tconcept personus steve bal lmer , concept : p roxy for , c o n c e p t r e t a i l s t o r e m i c r o s o f tconcept personus steve bal lmer , concept : subpar to f , c o n c e p t r e t a i l s t o r e m i c r o s o f tconcept personus steve bal lmer , concept : agentcont ro ls , c o n c e p t r e t a i l s t o r e m i c r o s o f tconcep t pe rson b i l l , concept : person leadsorgan izat ion , c o n c e p t u n i v e r s i t y m i c r o s o f tconcep t pe rson b i l l , concept : worksfor , c o n c e p t u n i v e r s i t y m i c r o s o f tconcept person robbie bach , concept : person leadsorgan izat ion , c o n c e p t u n i v e r s i t y m i c r o s o f tconcept person robbie bach , concept : worksfor , c o n c e p t u n i v e r s i t y m i c r o s o f tc o n c e p t r e t a i l s t o r e m i c r o s o f t , concept : p ro xy f o r i n v , concept personus steve ba l lmerc o n c e p t r e t a i l s t o r e m i c r o s o f t , concept : subpa r to f i nv , concept personus steve ba l lmerc o n c e p t r e t a i l s t o r e m i c r o s o f t , concept : agen tcon t ro l s i nv , concept personus steve ba l lmer
For the PersonBornInLocation taskQuery : ( concept person mark001 , concept : personborn in loca t ion , con cep t co un t y yo r k c i t y )
Selected key edges :concept person mark001 , concept : persongraduatedf romunivers i ty , c o n c e p t u n i v e r s i t y c o l l e g econcept person mark001 , concept : persongraduatedschool , c o n c e p t u n i v e r s i t y c o l l e g econcept person mark001 , concept : persongraduatedf romunivers i ty , c o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t yconcept person mark001 , concept : persongraduatedschool , c o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t yconcept person mark001 , concept : pe rsonborn inc i t y , concept c i t y hampsh i re
24
Published as a conference paper at ICLR 2020
concept_person_mark001
concept_university_collegeconcept_university_state_university
concept_person_diane001
concept_male_world
concept_sportsteam_state_university
concept_politicalparty_college
10
concept_university_syracuse_university
concept_stateorprovince_california
concept_university_high_school
concept_sportsgame_series
concept_city_hampshire
concept_stateorprovince_georgia
concept_geopoliticallocation_world
concept_stateorprovince_massachusettsconcept_stateorprovince_maine
concept_stateorprovince_illinoisconcept_jobposition_king
concept_charactertrait_world
concept_county_york_city
concept_person_bill
concept_person_david
concept_person_greg001
concept_person_james001concept_politician_jobs
concept_language_english
concept_person_michael002
concept_person_joe002concept_person_aaron_brooks
concept_journalist_dan
concept_person_john003
concept_person_andrew001
concept_person_kevin
concept_person_jim
concept_person_adam001
concept_academicfield_science
concept_city_york
concept_lake_new
concept_buildingfeature_american
concept_city_new_yorkconcept_building_the_metropolitan
concept_stateorprovince_new_york
concept_building_metropolitan
concept_room_contemporary
concept_river_arts
concept_country_orleans
concept_governmentorganization_federal
concept_geopoliticallocation_state
concept_personeurope_whitney
concept_person_sean002
concept_person_charles001
concept_person_princess
concept_river_state
concept_person_robert003
concept_female_mary
concept_person_prince
concept_sportsleague_newconcept_book_new
concept_country_monaco
concept_country_luxembourg
concept_writer_new
concept_musicinstrument_guitar
concept_country_brazil
concept_country_sweden
concept_geopoliticalorganization_wurttemberg
concept_country_mecklenburg
concept_country_romania
concept_person_mark001
concept_university_college
concept_university_state_university
concept_person_diane001
concept_male_world
concept_sportsteam_state_university
concept_county_york_city
concept_person_bill
concept_university_syracuse_university
concept_person_david
concept_city_york
concept_lake_new
concept_city_hampshire
concept_person_adam001
concept_country_orleans
Figure 16: PersonBornInLocation. The head is concept person mark001, the query relation is con-cept:personborninlocation, and the tail is concept county york city. The left is a full subgraph derivedwith max attending from per step=20, and the right is a further pruned subgraph from the left based onattention. The big yellow node represents the head, and the big red node represents the tail. Color onthe rest indicates attention scores over a T -step reasoning process, where grey means less attention,yellow means more attention gained during early steps, and red means gaining more attention whengetting closer to the final step.
concept person mark001 , concept : hasspouse , concept person diane001concept person mark001 , concept : hasspouse inv , concept person diane001c o n c e p t u n i v e r s i t y c o l l e g e , concept : persongradua ted f romun ivers i t y inv , concept person mark001c o n c e p t u n i v e r s i t y c o l l e g e , concept : persongraduatedschool inv , concept person mark001c o n c e p t u n i v e r s i t y c o l l e g e , concept : persongradua ted f romun ivers i t y inv , c o n c e p t p e r s o n b i l lc o n c e p t u n i v e r s i t y c o l l e g e , concept : persongraduatedschool inv , c o n c e p t p e r s o n b i l lc o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t y , concept : persongradua ted f romun ivers i t y inv , concept person mark001c o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t y , concept : persongraduatedschool inv , concept person mark001c o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t y , concept : persongradua ted f romun ivers i t y inv , c o n c e p t p e r s o n b i l lc o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t y , concept : persongraduatedschool inv , c o n c e p t p e r s o n b i l lconcept c i ty hampsh i re , concept : pe rsonbo rn i nc i t y i nv , concept person mark001concept person diane001 , concept : persongraduatedf romunivers i ty , c o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t yconcept person diane001 , concept : persongraduatedschool , c o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t yconcept person diane001 , concept : hasspouse , concept person mark001concept person diane001 , concept : hasspouse inv , concept person mark001concept person diane001 , concept : personborn in loca t ion , c onc ep t co un t y yo r k c i t yc o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t y , concept : persongradua ted f romun ivers i t y inv , concept person diane001c o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t y , concept : persongraduatedschool inv , concept person diane001concep t pe rson b i l l , concept : pe rsonborn inc i t y , c o n c e p t c i t y y o r kconcep t pe rson b i l l , concept : personborn in loca t ion , c o n c e p t c i t y y o r kconcep t pe rson b i l l , concept : persongraduatedf romunivers i ty , c o n c e p t u n i v e r s i t y c o l l e g econcep t pe rson b i l l , concept : persongraduatedschool , c o n c e p t u n i v e r s i t y c o l l e g econcep t pe rson b i l l , concept : persongraduatedf romunivers i ty , c o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t yconcep t pe rson b i l l , concept : persongraduatedschool , c o n c e p t u n i v e r s i t y s t a t e u n i v e r s i t yconcep t c i t y yo rk , concept : pe r sonbo rn inc i t y i nv , c o n c e p t p e r s o n b i l lconcep t c i t y yo rk , concept : pe r sonbo rn inc i t y i nv , concept person diane001c o n c e p t u n i v e r s i t y c o l l e g e , concept : persongradua ted f romun ivers i t y inv , concept person diane001concept person diane001 , concept : pe rsonborn inc i t y , c o n c e p t c i t y y o r k
For the PersonLeadsOrganization taskQuery : ( c o n c e p t j o u r n a l i s t b i l l p l a n t e , concept : person leadsorgan izat ion , concept company cnn pbs )
Selected key edges :c o n c e p t j o u r n a l i s t b i l l p l a n t e , concept : worksfor , concep t te lev i s i onne twork cbsc o n c e p t j o u r n a l i s t b i l l p l a n t e , concept : agen tco l l abo ra tesw i thagen t inv , concep t te lev i s i onne twork cbsconcep t te lev is ionne twork cbs , concept : works fo r inv , c o n c e p t j o u r n a l i s t w a l t e r c r o n k i t e
25
Published as a conference paper at ICLR 2020
concept_journalist_bill_plante
concept_televisionnetwork_cbs
concept_person_edward_r__murrow
concept_personus_scott_pelley
concept_actor_daniel_schorr
concept_journalist_wyatt_andrews
concept_athlete_anthony_mason
concept_journalist_connie_chung
concept_journalist_andy_rooney
concept_journalist_ed_bradley
concept_writer_bill_geist
concept_person_byron_pitts
concept_personaustralia_harry_smith
concept_journalist_morley_safer
concept_journalist_gloria_borger
concept_writer_bernard_goldberg
concept_person_eric_sevareid
concept_journalist_lesley_stahl
concept_personmexico_david_martin
concept_journalist_walter_cronkite
concept_personaustralia_phil_jones concept_company_cnn__pbs
concept_nonprofitorganization_cbs_evening
concept_musicartist_television
concept_crustacean_tv
concept_website_cbs_evening_news
concept_city_new_york
concept_personeurope_william_paley
concept_coach_douglas_edwards
concept_academicfield_media
concept_person_sean_mcmanus
concept_televisionstation_wcscconcept_comedian_leslie_moonves
concept_person_nina_tasslerconcept_televisionstation_kion_tv
concept_televisionstation_wowk_tv
concept_televisionstation_kbci_tv
concept_televisionstation_wjtv
concept_politicsissue_entertainment
concept_person_les_moonves
concept_televisionstation_kfda
concept_televisionstation_wdtv
concept_televisionstation_khqa_tv
concept_televisionstation_wrbl
concept_televisionstation_wltx_tv
concept_website_cbs
concept_company_cbs_corp_
concept_recordlabel_universal
concept_blog_mtv
concept_journalist_bill_plante
concept_televisionnetwork_cbs
concept_person_edward_r__murrow
concept_personus_scott_pelley
concept_actor_daniel_schorr
concept_journalist_wyatt_andrews
concept_athlete_anthony_mason
concept_company_cnn__pbs
concept_nonprofitorganization_cbs_evening
concept_musicartist_television
concept_crustacean_tv
concept_journalist_walter_cronkite
concept_city_new_york
concept_personeurope_william_paley
concept_coach_douglas_edwards
concept_academicfield_media
concept_website_cbs
concept_televisionstation_wcsc
Figure 17: PersonLeadsOrganization. The head is concept journalist bill plante, the query relationis concept:organizationheadquarteredincity, and the tail is concept company cnn pbs. The left is a fullsubgraph derived with max attending from per step=20, and the right is a further pruned subgraphfrom the left based on attention. The big yellow node represents the head, and the big red noderepresents the tail. Color on the rest indicates attention scores over a T -step reasoning process,where grey means less attention, yellow means more attention gained during early steps, and redmeans gaining more attention when getting closer to the final step.
concep t te lev is ionne twork cbs , concept : agentco l labora teswi thagent , c o n c e p t j o u r n a l i s t w a l t e r c r o n k i t econcep t te lev is ionne twork cbs , concept : works fo r inv , concep t pe rsonus sco t t pe l l eyconcep t te lev is ionne twork cbs , concept : works fo r inv , c o n c e p t a c t o r d a n i e l s c h o r rconcep t te lev is ionne twork cbs , concept : works fo r inv , concept person edward r murrowconcep t te lev is ionne twork cbs , concept : agentco l labora teswi thagent , concept person edward r murrowconcep t te lev is ionne twork cbs , concept : works fo r inv , c o n c e p t j o u r n a l i s t b i l l p l a n t econcep t te lev is ionne twork cbs , concept : agentco l labora teswi thagent , c o n c e p t j o u r n a l i s t b i l l p l a n t ec o n c e p t j o u r n a l i s t w a l t e r c r o n k i t e , concept : worksfor , concep t te lev i s i onne twork cbsc o n c e p t j o u r n a l i s t w a l t e r c r o n k i t e , concept : agen tco l l abo ra tesw i thagen t inv , concep t te lev i s i onne twork cbsc o n c e p t j o u r n a l i s t w a l t e r c r o n k i t e , concept : worksfor , concep t nonp ro f i t o rgan i za t i on cbs even ingconcep t pe rsonus sco t t pe l l ey , concept : worksfor , concep t te lev i s i onne twork cbsconcep t pe rsonus sco t t pe l l ey , concept : person leadsorgan izat ion , concep t te lev i s i onne twork cbsconcep t pe rsonus sco t t pe l l ey , concept : person leadsorgan izat ion , concept company cnn pbsconcep t ac to r dan ie l scho r r , concept : worksfor , concep t te lev i s i onne twork cbsconcep t ac to r dan ie l scho r r , concept : person leadsorgan izat ion , concep t te lev i s ionne twork cbsconcep t ac to r dan ie l scho r r , concept : person leadsorgan izat ion , concept company cnn pbsconcept person edward r murrow , concept : worksfor , concep t te lev i s i onne twork cbsconcept person edward r murrow , concept : agen tco l l abo ra tesw i thagen t inv , concep t te lev i s i onne twork cbsconcept person edward r murrow , concept : person leadsorgan izat ion , concep t te lev i s i onne twork cbsconcept person edward r murrow , concept : person leadsorgan izat ion , concept company cnn pbsconcep t te lev is ionne twork cbs , concept : o rgan i za t i onheadquar te red inc i t y , concep t c i t y new yorkconcep t te lev is ionne twork cbs , concept : headquarteredin , concep t c i t y new yorkconcep t te lev is ionne twork cbs , concept : agentco l labora teswi thagent , concept personeurope wi l l i am pa leyconcep t te lev is ionne twork cbs , concept : topmemberoforganizat ion inv , concept personeurope wi l l i am pa leyconcept company cnn pbs , concept : headquarteredin , concep t c i t y new yorkconcept company cnn pbs , concept : personbe longs toorgan iza t ion inv , concept personeurope wi l l i am pa leyconcep t nonpro f i t o rgan iza t i on cbs even ing , concept : works fo r inv , c o n c e p t j o u r n a l i s t w a l t e r c r o n k i t econcept c i t y new york , concept : o rgan i za t i onheadqua r t e red i nc i t y i nv , concep t te lev i s i onne twork cbsconcept c i t y new york , concept : headquar tered in inv , concep t te lev i s i onne twork cbsconcept c i t y new york , concept : headquar tered in inv , concept company cnn pbsconcept personeurope wi l l iam paley , concept : agen tco l l abo ra tesw i thagen t inv , concep t te lev i s i onne twork cbsconcept personeurope wi l l iam paley , concept : topmemberoforganizat ion , concep t te lev i s i onne twork cbsconcept personeurope wi l l iam paley , concept : personbe longstoorgan izat ion , concept company cnn pbsconcept personeurope wi l l iam paley , concept : person leadsorgan izat ion , concept company cnn pbs
For the OrganizationHiredPerson task
26
Published as a conference paper at ICLR 2020
concept_stateorprovince_afternoon
concept_dateliteral_n2007
concept_dateliteral_n2006concept_date_n2003
concept_dateliteral_n2002
concept_dateliteral_n2008
concept_dateliteral_n2005concept_date_n2004concept_date_n2001
concept_date_n1996
concept_date_n1999concept_year_n1991
concept_year_n1998concept_date_n2000
concept_book_new
concept_lake_new
concept_university_national
concept_city_home
concept_country_united_states
concept_city_service
concept_governmentorganization_law
concept_governmentorganization_program
concept_book_morning
concept_programminglanguage_project
concept_academicfield_directors
concept_musicsong_nightconcept_website_tour
concept_country_israel
concept_musicsong_end
concept_book_years
concept_country_left_parties
concept_website_visit
concept_governmentorganization_fire
concept_academicfield_trial
concept_website_trip
concept_governmentorganization_launch
concept_company_case
concept_personmexico_ryan_whitney
concept_year_n1997
concept_year_n1992
concept_year_n1994concept_year_n1995
concept_date_n1993
concept_date_n2009
concept_year_n1986concept_date_n1968
concept_year_n1989
concept_year_n1975
concept_year_n1982
concept_director_committee
concept_dayofweek_wednesday
concept_date_n1944
concept_dateliteral_n1990
concept_year_n1978
concept_year_n1967concept_dateliteral_n1987
concept_tradeunion_congress
concept_governmentorganization_house
concept_governmentorganization_epa
concept_governmentorganization_commission
concept_nongovorganization_council
concept_politicsblog_white_house
concept_country_party
concept_geopoliticallocation_iraqconcept_county_records
concept_city_capital
concept_city_team
concept_visualizablething_use
concept_biotechcompany_china
concept_governmentorganization_action
concept_governmentorganization_representatives
concept_city_members
concept_personus_party
concept_person_state
concept_buildingfeature_window
concept_politicianus_president_george_w__bush
concept_person_president
concept_person_mugabe
concept_personasia_number
concept_buildingfeature_windows
concept_astronaut_herbert_hoover
concept_stateorprovince_afternoon
concept_dateliteral_n2007concept_dateliteral_n2006
concept_date_n2003
concept_dateliteral_n2002
concept_dateliteral_n2008
concept_city_home
concept_country_united_states
concept_city_service
concept_governmentorganization_lawconcept_governmentorganization_program
concept_personmexico_ryan_whitneyconcept_year_n1997
concept_year_n1992
concept_year_n1994
concept_tradeunion_congressconcept_governmentorganization_house
concept_governmentorganization_epa
concept_governmentorganization_commission
concept_personus_party
concept_country_left_parties
concept_person_state
Figure 18: OrganizationHiredPerson. The head is concept stateorprovince afternoon, the query rela-tion is concept:organizationhiredperson, and the tail is concept personmexico ryan whitney. The left is afull subgraph derived with max attending from per step=20, and the right is a further pruned subgraphfrom the left based on attention. The big yellow node represents the head, and the big red node rep-resents the tail. Color on the rest indicates attention scores over a T -step reasoning process, wheregrey means less attention, yellow means more attention gained during early steps, and red meansgaining more attention when getting closer to the final step.
Query : ( concep t s ta teo rp rov ince a f te rnoon , concept : o rgan iza t ionh i redperson , concept personmexico ryan whi tney )
Selected key edges :concep t s ta teo rp rov ince a f te rnoon , concept : atdate , c o n c e p t d a t e l i t e r a l n 2 0 0 7concep t s ta teo rp rov ince a f te rnoon , concept : atdate , concept date n2003concep t s ta teo rp rov ince a f te rnoon , concept : atdate , c o n c e p t d a t e l i t e r a l n 2 0 0 6concep t da te l i t e ra l n2007 , concept : a tda te inv , concep t coun t r y un i t ed s ta tesconcep t da te l i t e ra l n2007 , concept : a tda te inv , concept c i ty homeconcep t da te l i t e ra l n2007 , concept : a tda te inv , c o n c e p t c i t y s e r v i c econcep t da te l i t e ra l n2007 , concept : a tda te inv , c o n c e p t c o u n t r y l e f t p a r t i e sconcept date n2003 , concept : a tda te inv , concep t coun t r y un i t ed s ta tesconcept date n2003 , concept : a tda te inv , concept c i ty homeconcept date n2003 , concept : a tda te inv , c o n c e p t c i t y s e r v i c econcept date n2003 , concept : a tda te inv , c o n c e p t c o u n t r y l e f t p a r t i e sconcep t da te l i t e ra l n2006 , concept : a tda te inv , concep t coun t r y un i t ed s ta tesconcep t da te l i t e ra l n2006 , concept : a tda te inv , concept c i ty homeconcep t da te l i t e ra l n2006 , concept : a tda te inv , c o n c e p t c i t y s e r v i c econcep t da te l i t e ra l n2006 , concept : a tda te inv , c o n c e p t c o u n t r y l e f t p a r t i e sconcep t coun t r y un i t ed s ta tes , concept : atdate , concept year n1992concep t coun t r y un i t ed s ta tes , concept : atdate , concept year n1997concep t coun t r y un i t ed s ta tes , concept : o rgan iza t ionh i redperson , concept personmexico ryan whi tneyconcept c i ty home , concept : atdate , concept year n1992concept c i ty home , concept : atdate , concept year n1997concept c i ty home , concept : o rgan iza t ionh i redperson , concept personmexico ryan whi tneyc o n c e p t c i t y s e r v i c e , concept : atdate , concept year n1992c o n c e p t c i t y s e r v i c e , concept : atdate , concept year n1997c o n c e p t c i t y s e r v i c e , concept : o rgan iza t ionh i redperson , concept personmexico ryan whi tneyc o n c e p t c o u n t r y l e f t p a r t i e s , concept : works fo r inv , concept personmexico ryan whi tneyc o n c e p t c o u n t r y l e f t p a r t i e s , concept : o rgan iza t ionh i redperson , concept personmexico ryan whi tneyconcept year n1992 , concept : a tda te inv , concept governmentorganizat ion houseconcept year n1992 , concept : a tda te inv , concep t coun t r y un i t ed s ta tesconcept year n1992 , concept : a tda te inv , concept c i ty homeconcept year n1992 , concept : a tda te inv , concept t radeunion congressconcept year n1997 , concept : a tda te inv , concept governmentorganizat ion houseconcept year n1997 , concept : a tda te inv , concep t coun t r y un i t ed s ta tesconcept year n1997 , concept : a tda te inv , concept c i ty homeconcept personmexico ryan whi tney , concept : worksfor , concept governmentorganizat ion house
27
Published as a conference paper at ICLR 2020
concept personmexico ryan whi tney , concept : worksfor , concept t radeunion congressconcept personmexico ryan whi tney , concept : worksfor , c o n c e p t c o u n t r y l e f t p a r t i e sconcept governmentorganizat ion house , concept : personbe longs toorgan iza t ion inv , concept personus par tyconcept governmentorganizat ion house , concept : works fo r inv , concept personmexico ryan whi tneyconcept governmentorganizat ion house , concept : o rgan iza t ionh i redperson , concept personmexico ryan whi tneyconcept t radeunion congress , concept : o rgan iza t ionh i redperson , concept personus par tyconcept t radeunion congress , concept : works fo r inv , concept personmexico ryan whi tneyconcept t radeunion congress , concept : o rgan iza t ionh i redperson , concept personmexico ryan whi tneyc o n c e p t c o u n t r y l e f t p a r t i e s , concept : o rgan iza t ionh i redperson , concept personus par ty
concept_person_mark001
concept_sportsteam_state_university
concept_male_world
concept_politicalparty_college
concept_sportsgame_series
concept_university_state_university
concept_charactertrait_world
concept_university_college
concept_person_diane001
concept_university_high_school
concept_televisionshow_passionconcept_musicsong_gospel
10
concept_university_syracuse_university
concept_city_hampshireconcept_person_louise
concept_jobposition_king
concept_county_york_city
concept_stateorprovince_california
concept_stateorprovince_maine
concept_stateorprovince_georgia
concept_person_greg001concept_person_michael002
concept_stateorprovince_author
concept_person_bill
concept_person_kevin
concept_person_john003
concept_politician_jobs
concept_person_mike
concept_person_stephen
concept_person_fred
concept_person_karen
concept_person_william001
concept_person_tom001
concept_journalist_dan
concept_person_joe002
concept_person_adam001
concept_politician_james
concept_eventoutcome_result
concept_geopoliticallocation_world
concept_recordlabel_friends
concept_politicalparty_house
concept_company_apple001
concept_televisionnetwork_cbs
concept_company_apple002
3
concept_automobilemaker_announcement
concept_personmexico_ryan_whitney
concept_museum_steve
concept_city_downtown_manhattanconcept_attraction_nycconcept_city_nyc_
concept_personasia_number
concept_scientist_no_
concept_country_united_statesconcept_geopoliticallocation_agencies
concept_company_nbc
concept_city_team
concept_terroristorganization_state
concept_automobilemaker_jeff_bezos
concept_ceo_richard
concept_lake_new
concept_city_service
concept_governmentorganization_u_s__department
concept_person_mark001
concept_sportsteam_state_university
concept_male_world
concept_politicalparty_college
concept_sportsgame_series concept_university_state_university
concept_person_greg001
concept_person_michael002
concept_stateorprovince_author
concept_geopoliticallocation_world
concept_recordlabel_friends
concept_personmexico_ryan_whitney
concept_politician_jobs
concept_eventoutcome_result
concept_museum_steve
Figure 19: AgentBelongsToOrganization. The head is concept person mark001, the query relationis concept:agentbelongstoorganization, and the tail is concept geopoliticallocation world. The left is afull subgraph derived with max attending from per step=20, and the right is a further pruned subgraphfrom the left based on attention. The big yellow node represents the head, and the big red noderepresents the tail. Color on the rest indicates attention scores over a T -step reasoning process,where grey means less attention, yellow means more attention gained during early steps, and redmeans gaining more attention when getting closer to the final step.
For the AgentBelongsToOrganization taskQuery : ( concept person mark001 , concept : agentbe longstoorgan iza t ion , c o n c e p t g e o p o l i t i c a l l o c a t i o n w o r l d )
Selected key edges :concept person mark001 , concept : personbe longstoorgan izat ion , c o n c e p t s p o r t s t e a m s t a t e u n i v e r s i t yconcept person mark001 , concept : agentco l labora teswi thagent , concept male wor ldconcept person mark001 , concept : agen tco l l abo ra tesw i thagen t inv , concept male wor ldconcept person mark001 , concept : personbe longstoorgan izat ion , c o n c e p t p o l i t i c a l p a r t y c o l l e g econcep t spo r t s t eam s ta te un i ve rs i t y , concept : personbe longs toorgan iza t ion inv , c o n c e p t p o l i t i c i a n j o b sconcep t spo r t s t eam s ta te un i ve rs i t y , concept : personbe longs toorgan iza t ion inv , concept person mark001concep t spo r t s t eam s ta te un i ve rs i t y , concept : personbe longs toorgan iza t ion inv , concept person greg001concep t spo r t s t eam s ta te un i ve rs i t y , concept : personbe longs toorgan iza t ion inv , concept person michael002concept male world , concept : agentco l labora teswi thagent , c o n c e p t p o l i t i c i a n j o b sconcept male world , concept : agen tco l l abo ra tesw i thagen t inv , c o n c e p t p o l i t i c i a n j o b sconcept male world , concept : agentco l labora teswi thagent , concept person mark001concept male world , concept : agen tco l l abo ra tesw i thagen t inv , concept person mark001concept male world , concept : agentco l labora teswi thagent , concept person greg001concept male world , concept : agen tco l l abo ra tesw i thagen t inv , concept person greg001concept male world , concept : agentcont ro ls , concept person greg001concept male world , concept : agentco l labora teswi thagent , concept person michael002concept male world , concept : agen tco l l abo ra tesw i thagen t inv , concept person michael002c o n c e p t p o l i t i c a l p a r t y c o l l e g e , concept : personbe longs toorgan iza t ion inv , concept person mark001c o n c e p t p o l i t i c a l p a r t y c o l l e g e , concept : personbe longs toorgan iza t ion inv , concept person greg001c o n c e p t p o l i t i c a l p a r t y c o l l e g e , concept : personbe longs toorgan iza t ion inv , concept person michael002c o n c e p t p o l i t i c i a n j o b s , concept : personbe longstoorgan izat ion , c o n c e p t s p o r t s t e a m s t a t e u n i v e r s i t yc o n c e p t p o l i t i c i a n j o b s , concept : agentco l labora teswi thagent , concept male wor ld
28
Published as a conference paper at ICLR 2020
c o n c e p t p o l i t i c i a n j o b s , concept : agen tco l l abo ra tesw i thagen t inv , concept male wor ldc o n c e p t p o l i t i c i a n j o b s , concept : worksfor , c o n c e p t g e o p o l i t i c a l l o c a t i o n w o r l dconcept person greg001 , concept : personbe longstoorgan izat ion , c o n c e p t s p o r t s t e a m s t a t e u n i v e r s i t yconcept person greg001 , concept : agentco l labora teswi thagent , concept male wor ldconcept person greg001 , concept : agen tco l l abo ra tesw i thagen t inv , concept male wor ldconcept person greg001 , concept : agen tcon t ro l s i nv , concept male wor ldconcept person greg001 , concept : agentbe longstoorgan iza t ion , c o n c e p t g e o p o l i t i c a l l o c a t i o n w o r l dconcept person greg001 , concept : personbe longstoorgan izat ion , c o n c e p t p o l i t i c a l p a r t y c o l l e g econcept person greg001 , concept : agentbe longstoorgan iza t ion , c o n c e p t r e c o r d l a b e l f r i e n d sconcept person michael002 , concept : personbe longstoorgan izat ion , c o n c e p t s p o r t s t e a m s t a t e u n i v e r s i t yconcept person michael002 , concept : agentco l labora teswi thagent , concept male wor ldconcept person michael002 , concept : agen tco l l abo ra tesw i thagen t inv , concept male wor ldconcept person michael002 , concept : agentbe longstoorgan iza t ion , c o n c e p t g e o p o l i t i c a l l o c a t i o n w o r l dconcept person michael002 , concept : personbe longstoorgan izat ion , c o n c e p t p o l i t i c a l p a r t y c o l l e g ec o n c e p t g e o p o l i t i c a l l o c a t i o n w o r l d , concept : works fo r inv , concept personmexico ryan whi tneyc o n c e p t g e o p o l i t i c a l l o c a t i o n w o r l d , concept : o rgan iza t ionh i redperson , concept personmexico ryan whi tneyc o n c e p t g e o p o l i t i c a l l o c a t i o n w o r l d , concept : works fo r inv , c o n c e p t p o l i t i c i a n j o b sconcep t reco rd l abe l f r i ends , concept : o rgan iza t ionh i redperson , concept personmexico ryan whi tneyconcept personmexico ryan whi tney , concept : worksfor , c o n c e p t g e o p o l i t i c a l l o c a t i o n w o r l dconcept personmexico ryan whi tney , concept : o rgan i za t i onh i redpe rson inv , c o n c e p t g e o p o l i t i c a l l o c a t i o n w o r l dconcept personmexico ryan whi tney , concept : o rgan i za t i onh i redpe rson inv , c o n c e p t r e c o r d l a b e l f r i e n d s
concept_sportsteam_mavericks
concept_sport_basketball
concept_sportsteam_boston_celtics
concept_sportsteam_spurs
concept_sportsteam_suns
concept_sportsteam_esu_hornets
concept_sportsteam_rockets
concept_sportsteam_knicks
concept_sportsteam_memphis_grizzlies
concept_sportsteam_kings_college
concept_sportsteam_san_antonio
concept_sportsgame_series
concept_sportsteam_chicago_bulls
concept_sportsteam_golden_state_warriors
concept_sportsteam_utah_jazz
concept_sportsgame_championship
concept_sportsteam_la_clippers
concept_convention_games
concept_athlete_josh_howard
concept_awardtrophytournament_nba_finals
concept_sportsteam_marshall_university
concept_sportsleague_nba
concept_sportsteam_college
concept_sportsteam_pacers
concept_athlete_o__j__simpson
concept_sportsteam_devils
concept_athlete_dikembe_mutombo
concept_sportsleague_nascar
concept_sportsteam_pistons
concept_sportsteam_los_angeles_lakers
concept_athlete_shane_battier
concept_sportsteam_hawks
concept_sportsteam_washington_wizards
concept_sportsteam_michigan_state_university
concept_sportsteam_new_jersey_nets
concept_sportsteam_astros
concept_sportsteam_trail_blazers
concept_athlete_cuttino_mobley
concept_sportsteam_george_mason_university
concept_athlete_colin_long
concept_sportsleague_international
concept_city_huntington
concept_sportsleague_nhl
concept_sportsleague_mlb
concept_stadiumoreventvenue_toyota_center
concept_sportsteam_mavericks
concept_sport_basketball
concept_sportsteam_boston_celtics
concept_sportsteam_spurs
concept_sportsteam_suns
concept_sportsteam_esu_hornets
concept_sportsteam_marshall_university
concept_sportsleague_nba
concept_sportsteam_college
concept_sportsteam_pacers
concept_athlete_o__j__simpson
concept_sportsleague_international
concept_city_huntington
concept_sportsleague_nhl
concept_sportsleague_nascar
Figure 20: TeamPlaysInLeague. The head is concept sportsteam mavericks, the query relation isconcept:teamplaysinleague, and the tail is concept sportsleague nba. The left is a full subgraph derivedwith max attending from per step=20, and the right is a further pruned subgraph from the left based onattention. The big yellow node represents the head, and the big red node represents the tail. Color onthe rest indicates attention scores over a T -step reasoning process, where grey means less attention,yellow means more attention gained during early steps, and red means gaining more attention whengetting closer to the final step.
For the TeamPlaysInLeague taskQuery : ( concept sportsteam maver icks , concept : teamplaysinleague , concept spor ts league nba )
Selected key edges :concept sportsteam maver icks , concept : teamplayssport , concep t spo r t baske tba l lconcept sportsteam maver icks , concept : teamplaysagainstteam , concep t spo r t s team bos ton ce l t i csconcept sportsteam maver icks , concept : teamplaysagainst team inv , concep t spo r t s team bos ton ce l t i csconcept sportsteam maver icks , concept : teamplaysagainstteam , concept spor ts team spursconcept sportsteam maver icks , concept : teamplaysagainst team inv , concept spor ts team spursconcep t spo r t baske tba l l , concept : teamplaysspor t inv , concept spor ts team co l legeconcep t spo r t baske tba l l , concept : teamplaysspor t inv , concep t spo r t s team marsha l l un i ve rs i t yconcep t spor t s team bos ton ce l t i cs , concept : teamplaysinleague , concept spor ts league nbaconcept spor ts team spurs , concept : teamplaysinleague , concept spor ts league nbaconcept spor ts league nba , concept : agentcompeteswithagent , concept spor ts league nba
29
Published as a conference paper at ICLR 2020
concept spor ts league nba , concept : agentcompeteswithagent inv , concept spor ts league nbaconcept spor ts team col lege , concept : teamplaysinleague , c o n c e p t s p o r t s l e a g u e i n t e r n a t i o n a l
30