

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 229–240, Florence, Italy, July 28 - August 2, 2019. ©2019 Association for Computational Linguistics


Neural Relation Extraction for Knowledge Base Enrichment

Bayu Distiawan Trisedya1, Gerhard Weikum2, Jianzhong Qi1, Rui Zhang1∗

1 The University of Melbourne, Australia
2 Max Planck Institute for Informatics, Saarland Informatics Campus, Germany

{btrisedya@student, jianzhong.qi@, rui.zhang@}unimelb.edu.au
[email protected]

Abstract

We study relation extraction for knowledge base (KB) enrichment. Specifically, we aim to extract entities and their relationships from sentences in the form of triples and map the elements of the extracted triples to an existing KB in an end-to-end manner. Previous studies focus on the extraction itself and rely on Named Entity Disambiguation (NED) to map triples into the KB space. This way, NED errors may cause extraction errors that affect the overall precision and recall. To address this problem, we propose an end-to-end relation extraction model for KB enrichment based on a neural encoder-decoder model. We collect high-quality training data by distant supervision with co-reference resolution and paraphrase detection. We propose an n-gram based attention model that captures multi-word entity names in a sentence. Our model employs jointly learned word and entity embeddings to support named entity disambiguation. Finally, our model uses a modified beam search and a triple classifier to help generate high-quality triples. Our model outperforms state-of-the-art baselines by 15.51% and 8.38% in terms of F1 score on two real-world datasets.

1 Introduction

Knowledge bases (KBs), often in the form of knowledge graphs (KGs), have become essential resources in many tasks including Q&A systems, recommender systems, and natural language generation. Large KBs such as DBpedia (Auer et al., 2007), Wikidata (Vrandečić and Krötzsch, 2014) and Yago (Suchanek et al., 2007) contain millions of facts about entities, which are represented in the form of subject-predicate-object triples. However, these KBs are far from complete and mandate continuous enrichment and curation.

∗Rui Zhang is the corresponding author.

Input sentence:
"New York University is a private university in Manhattan."

Unsupervised approach output:
〈NYU, is, private university〉
〈NYU, is private university in, Manhattan〉

Supervised approach output:
〈NYU, instance of, Private University〉
〈NYU, located in, Manhattan〉

Canonicalized output:
〈Q49210, P31, Q902104〉
〈Q49210, P131, Q11299〉

Table 1: Relation extraction example.

Previous studies work on embedding-based models (Nguyen et al., 2018; Wang et al., 2015) and entity alignment models (Chen et al., 2017; Sun et al., 2017; Trisedya et al., 2019) to enrich a knowledge base. Following the success of the sequence-to-sequence architecture (Bahdanau et al., 2015) for generating sentences from structured data (Marcheggiani and Perez-Beltrachini, 2018; Trisedya et al., 2018), we employ this architecture to do the opposite, which is extracting triples from a sentence.

In this paper, we study how to enrich a KB by relation extraction from textual sources. Specifically, we aim to extract triples in the form of 〈h, r, t〉, where h is a head entity, t is a tail entity, and r is a relationship between the entities. Importantly, as KBs typically have much better coverage on entities than on relationships, we assume that h and t are existing entities in a KB, r is a predicate that falls in a predefined set of predicates we are interested in, but the relationship 〈h, r, t〉 does not exist in the KB yet. We aim to find more relationships between h and t and add them to the KB. For example, from the first extracted triples in Table 1 we may recognize two entities "NYU" (an abbreviation of New York University) and "Private University", which already exist in the KB; also, the predicate "instance of" is in the set of predefined predicates we are interested in, but the relationship 〈NYU, instance of, Private University〉 does not exist in the KB. We aim to add this relationship to our KB. This is the typical situation for KB enrichment (as opposed to constructing a KB from scratch or performing relation extraction for other purposes, such as Q&A or summarization).

KB enrichment mandates that the entities and relationships of the extracted triples are canonicalized by mapping them to their proper entity and predicate IDs in a KB. Table 1 illustrates an example of triples extracted from a sentence. The entities and predicate of the first extracted triple, including NYU, instance of, and Private University, are mapped to their unique IDs Q49210, P31, and Q902104, respectively, to comply with the semantic space of the KB.
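To make this canonicalization target concrete, the following minimal Python sketch maps the surface-form triples of Table 1 to their Wikidata IDs. The two lookup dictionaries are illustrative fragments written for this example only; the paper's point is that such a static lookup cannot resolve ambiguous names, which is why the mapping is learned end-to-end instead.

```python
# Hypothetical lookup fragments for the Table 1 example (demonstration only);
# the proposed model learns this mapping end-to-end rather than using a table.
ENTITY_IDS = {
    "NYU": "Q49210",
    "Private University": "Q902104",
    "Manhattan": "Q11299",
}
PREDICATE_IDS = {
    "instance of": "P31",
    "located in": "P131",
}

def canonicalize(triple):
    """Map a surface-form (head, predicate, tail) triple to KB IDs."""
    h, r, t = triple
    return (ENTITY_IDS[h], PREDICATE_IDS[r], ENTITY_IDS[t])

print(canonicalize(("NYU", "instance of", "Private University")))
# ('Q49210', 'P31', 'Q902104')
```

A dictionary lookup like this only works when a surface form has exactly one referent; resolving ambiguous mentions in context is precisely what the end-to-end model is for.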

Previous studies on relation extraction have employed both unsupervised and supervised approaches. Unsupervised approaches typically start with a small set of manually defined extraction patterns to detect entity names and phrases about relationships in an input text. This paradigm is known as Open Information Extraction (Open IE) (Banko et al., 2007; Corro and Gemulla, 2013; Gashteovski et al., 2017). In this line of approaches, both entities and predicates are captured in their surface forms without canonicalization. Supervised approaches train statistical and neural models for inferring the relationship between two known entities in a sentence (Mintz et al., 2009; Riedel et al., 2010, 2013; Zeng et al., 2015; Lin et al., 2016). Most of these studies employ a preprocessing step to recognize the entities. Only a few studies have fully integrated the mapping of extracted triples onto uniquely identified KB entities by using logical reasoning on the existing KB to disambiguate the extracted entities (e.g., (Suchanek et al., 2009; Sa et al., 2017)).

Most existing methods thus entail the need for Named Entity Disambiguation (NED) (cf. the survey by Shen et al. (2015)) as a separate processing step. In addition, the mapping of relationship phrases onto KB predicates necessitates another mapping step, typically aided by paraphrase dictionaries. This two-stage architecture is inherently prone to error propagation across its two stages: NED errors may cause extraction errors (and vice versa) that lead to inaccurate relationships being added to the KB.

We aim to integrate the extraction and the canonicalization tasks by proposing an end-to-end neural learning model to jointly extract triples from sentences and map them into an existing KB. Our method is based on the encoder-decoder framework (Cho et al., 2014) by treating the task as a translation of a sentence into a sequence of elements of triples. For the example in Table 1, our model aims to translate "New York University is a private university in Manhattan" into a sequence of IDs "Q49210 P31 Q902104 Q49210 P131 Q11299", from which we can derive two triples to be added to the KB.

A standard encoder-decoder model with attention (Bahdanau et al., 2015) is, however, unable to capture the multi-word entity names and verbal or noun phrases that denote predicates. To address this problem, we propose a novel form of n-gram based attention that computes the n-gram combination of attention weights to capture the verbal or noun phrase context that complements the word-level attention of the standard attention model. Our model thus can better capture the multi-word context of entities and relationships. Our model harnesses pre-trained word and entity embeddings that are jointly learned with skip-gram (Mikolov et al., 2013) and TransE (Bordes et al., 2013). The advantages of our jointly learned embeddings are twofold. First, the embeddings capture the relationship between words and entities, which is essential for named entity disambiguation. Second, the entity embeddings preserve the relationships between entities, which help to build a highly accurate classifier to filter the invalid extracted triples. To cope with the lack of fully labeled training data, we adapt distant supervision to generate aligned pairs of sentence and triple as the training data. We augment the process with co-reference resolution (Clark and Manning, 2016) and dictionary-based paraphrase detection (Ganitkevitch et al., 2013; Grycner and Weikum, 2016). The co-reference resolution helps extract sentences with implicit entity names, which enlarges the set of candidate sentences to be aligned with existing triples in a KB. The paraphrase detection helps filter sentences that do not express any relationships between entities.

The main contributions of this paper are:

• We propose an end-to-end model for extracting and canonicalizing triples to enrich a KB. The model reduces error propagation between relation extraction and NED, which existing approaches are prone to.

• We propose an n-gram based attention model to effectively map the multi-word mentions of entities and their relationships into uniquely identified entities and predicates. We propose joint learning of word and entity embeddings to capture the relationship between words and entities for named entity disambiguation. We further propose a modified beam search and a triple classifier to generate high-quality triples.

• We evaluate the proposed model over two real-world datasets. We adapt distant supervision with co-reference resolution and paraphrase detection to obtain high-quality training data. The experimental results show that our model consistently outperforms a strong baseline for neural relation extraction (Lin et al., 2016) coupled with state-of-the-art NED models (Hoffart et al., 2011; Kolitsas et al., 2018).

2 Related Work

2.1 Open Information Extraction

Banko et al. (2007) introduced the paradigm of Open Information Extraction (Open IE) and proposed a pipeline that consists of three stages: learner, extractor, and assessor. The learner uses dependency-parsing information to learn patterns for extraction, in an unsupervised way. The extractor generates candidate triples by identifying noun phrases as arguments and connecting phrases as predicates. The assessor assigns a probability to each candidate triple based on statistical evidence. This approach was prone to extracting incorrect, verbose and uninformative triples. Various follow-up studies (Fader et al., 2011; Mausam et al., 2012; Angeli et al., 2015; Mausam, 2016) improved the accuracy of Open IE by adding handcrafted patterns or by using distant supervision. Corro and Gemulla (2013) developed ClausIE, a method that analyzes the clauses in a sentence and derives triples from this structure. Gashteovski et al. (2017) developed MinIE to advance ClausIE by making the resulting triples more concise.

Stanovsky et al. (2018) proposed a supervised learner for Open IE by casting relation extraction into sequence tagging. A bi-LSTM model is trained to predict the label (entity, predicate, or other) of each token of the input. The work most related to ours is Neural Open IE (Cui et al., 2018), which proposed an encoder-decoder with attention model to extract triples. However, this work is not geared for extracting relations of canonicalized entities. Another line of studies uses neural learning for semantic role labeling (He et al., 2018), but the goal here is to recognize the predicate-argument structure of a single input sentence, as opposed to extracting relations from a corpus.

All of these methods generate triples where the head and tail entities and the predicate stay in their surface forms. Therefore, different names and phrases for the same entities result in multiple triples, which would pollute the KG if added this way. The only means to map triples to uniquely identified entities in a KG is by post-processing via entity linking (NED) methods (Shen et al., 2015) or by clustering with subsequent mapping (Galárraga et al., 2014).

2.2 Entity-aware Relation Extraction

Inspired by the work of Brin (1998), state-of-the-art methods employ distant supervision by leveraging seed facts from an existing KG (Mintz et al., 2009; Suchanek et al., 2009; Carlson et al., 2010). These methods learn extraction patterns from seed facts, apply the patterns to extract new fact candidates, iterate this principle, and finally use statistical inference (e.g., a classifier) for reducing the false positive rate. Some of these methods hinge on the assumption that the co-occurrence of a seed fact's entities in the same sentence is an indicator of expressing a semantic relationship between the entities. This is a potential source of wrong labeling. Follow-up studies (Hoffmann et al., 2010; Riedel et al., 2010, 2013; Surdeanu et al., 2012) overcome this limitation by various means, including the use of relation-specific lexicons and latent factor models. Still, these methods treat entities by their surface forms and disregard their mapping to existing entities in the KG.

Suchanek et al. (2009) and Sa et al. (2017) used probabilistic-logical inference to eliminate false positives, based on constraint solving or Monte Carlo sampling over probabilistic graphical models, respectively. These methods integrate entity linking (i.e., NED) into their models. However, both have high computational complexity and rely on modeling constraints and appropriate priors.

Recent studies employ neural networks to learn the extraction of triples. Nguyen and Grishman (2015) proposed Convolution Networks with multi-sized window kernel. Zeng et al. (2015) proposed Piecewise Convolution Neural Networks (PCNN). Lin et al. (2016, 2017) improved this approach by proposing PCNN with sentence-level attention. This method performed best in experimental studies; hence we choose it as the main baseline against which we compare our approach. Follow-up studies considered further variations: Zhou et al. (2018) proposed hierarchical attention, Ji et al. (2017) incorporated entity descriptions, Miwa and Bansal (2016) incorporated syntactic features, and Sorokin and Gurevych (2017) used background knowledge for contextualization.

[Figure 1: Overview of our proposed solution. The figure shows three components: a Dataset Collection Module that applies distant supervision over Wikipedia articles and Wikidata to produce sentence-triple pairs (e.g., sentence input "New York University is a private university in Manhattan." with expected output "Q49210 P31 Q902104 Q49210 P131 Q11299", and sentence input "New York Times Building is a skyscraper in Manhattan" with expected output "Q192680 P131 Q11299"); an Embedding Module that jointly learns skip-gram word embeddings and TransE entity embeddings; and a Neural Relation Extraction Module consisting of an encoder, n-gram-based attention, a decoder, and a triple classifier.]

None of these neural models is geared for KG enrichment, as the canonicalization of entities is out of their scope.

3 Proposed Model

We start with the problem definition. Let G = (E, R) be an existing KG, where E and R are the sets of entities and relationships (predicates) in G, respectively. We consider a sentence S = 〈w1, w2, ..., wi〉 as the input, where wi is a token at position i in the sentence. We aim to extract a set of triples O = {o1, o2, ..., oj} from the sentence, where oj = 〈hj, rj, tj〉, hj, tj ∈ E, and rj ∈ R. Table 1 illustrates the input and target output of our problem.

3.1 Solution Framework

Figure 1 illustrates the overall solution framework. Our framework consists of three components: the data collection module, the embedding module, and the neural relation extraction module.

In the data collection module (detailed in Section 3.2), we align known triples in an existing KB with sentences that contain such triples from a text corpus. The aligned pairs of sentences and triples will later be used as the training data in our neural relation extraction module. This alignment is done by distant supervision. To obtain a large number of high-quality alignments, we augment the process with co-reference resolution to extract sentences with implicit entity names, which enlarges the set of candidate sentences to be aligned. We further use dictionary-based paraphrase detection to filter sentences that do not express any relationships between entities.

In the embedding module (detailed in Section 3.3), we propose a joint learning of word and entity embeddings by combining skip-gram (Mikolov et al., 2013) to compute the word embeddings and TransE (Bordes et al., 2013) to compute the entity embeddings. The objective of the joint learning is to capture the similarity of words and entities that helps map the entity names into the related entity IDs. Moreover, the resulting entity embeddings are used to train a triple classifier that helps filter invalid triples generated by our neural relation extraction model.

In the neural relation extraction module (detailed in Section 3.4), we propose an n-gram based attention model by expanding the attention mechanism to the n-gram tokens of a sentence. The n-gram attention computes the n-gram combination of attention weights to capture the verbal or noun phrase context that complements the word-level attention of the standard attention model. This expansion helps our model to better capture the multi-word context of entities and relationships.

The output of the encoder-decoder model is a sequence of entity and predicate IDs where every three IDs indicate a triple. To generate high-quality triples, we propose two strategies. The first strategy uses a modified beam search that computes the lexical similarity of the extracted entities with the surface forms of entity names in the input sentence to ensure the correct entity prediction. The second strategy uses a triple classifier that is trained using the entity embeddings from the joint learning to filter the invalid triples. The triple generation process is detailed in Section 3.5.

3.2 Dataset Collection

We aim to extract triples from a sentence for KB enrichment by proposing a supervised relation extraction model. To train such a model, we need a large volume of fully labeled training data in the form of sentence-triple pairs. Following Sorokin and Gurevych (2017), we use distant supervision (Mintz et al., 2009) to align sentences in Wikipedia¹ with triples in Wikidata² (Vrandečić and Krötzsch, 2014).

¹ https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

² https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.gz

We map an entity mention in a sentence to the corresponding entity entry (i.e., Wikidata ID) in Wikidata via the hyperlink associated with the entity mention, which is recorded in Wikidata as the url property of the entity entry. Each pair may contain one sentence and multiple triples. We sort the order of the triples based on the order of the predicate paraphrases that indicate the relationships between entities in the sentence. We collect sentence-triple pairs by extracting sentences that contain both head and tail entities of Wikidata triples. To generate high-quality sentence-triple pairs, we propose two additional steps: (1) extracting sentences that contain implicit entity names using co-reference resolution, and (2) filtering sentences that do not express any relationships using paraphrase detection. We detail these steps below.

Prior to aligning the sentences with triples, in Step (1), we find the implicit entity names to increase the number of candidate sentences to be aligned. We apply co-reference resolution (Clark and Manning, 2016) to each paragraph in a Wikipedia article and replace the extracted co-references with the proper entity name. We observe that the first sentence of a paragraph in a Wikipedia article may contain a pronoun that refers to the main entity. For example, there is a paragraph in the Barack Obama article that starts with the sentence "He was reelected to the Illinois Senate in 1998". This may cause the standard co-reference resolution to miss the implicit entity names for the rest of the paragraph. To address this problem, we heuristically replace the pronouns in the first sentence of a paragraph if the main entity name of the Wikipedia page is not mentioned. For the sentence in the previous example, we replace "He" with "Barack Obama". The intuition is that a Wikipedia article contains content about a single entity of interest, and that the pronouns mentioned in the first sentence of a paragraph mostly relate to the main entity.
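The pronoun-replacement heuristic above might be sketched as follows; the function name and the naive first-sentence split are our own simplifications for illustration, not the paper's implementation:

```python
PRONOUNS = {"he", "she", "it", "they"}

def patch_first_sentence(paragraph, main_entity):
    """If the first sentence of a paragraph does not mention the article's
    main entity, replace a leading pronoun with the entity name."""
    first, sep, rest = paragraph.partition(". ")
    if main_entity.lower() in first.lower():
        return paragraph  # main entity already mentioned; nothing to do
    words = first.split()
    if words and words[0].lower() in PRONOUNS:
        words[0] = main_entity
        first = " ".join(words)
    return first + sep + rest

print(patch_first_sentence(
    "He was reelected to the Illinois Senate in 1998. He chaired a committee.",
    "Barack Obama"))
# Barack Obama was reelected to the Illinois Senate in 1998. He chaired a committee.
```

Any remaining pronouns in the paragraph are then left to the standard co-reference resolver, which can now anchor them to an explicit entity mention.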

             #pairs    #triples  #entities  #predicates
All (WIKI)   255,654   330,005   279,888    158
Train+val    225,869   291,352   249,272    157
Test (WIKI)   29,785    38,653    38,690    109
Test (GEO)     1,000     1,095       124     11

Table 2: Statistics of the dataset.

In Step (2), we use a dictionary-based paraphrase detection to capture relationships between entities in a sentence. First, we create a dictionary by populating predicate paraphrases from three sources, including PATTY (Nakashole et al., 2012), POLY (Grycner and Weikum, 2016), and PPDB (Ganitkevitch et al., 2013), which yield 540 predicates and 24,013 unique paraphrases. For example, the predicate paraphrases for the relationship "place of birth" are {born in, was born in, ...}. Then, we use this dictionary to filter sentences that do not express any relationships between entities. We use exact string matching to find verbal or noun phrases in a sentence which are paraphrases of a predicate of a triple. For example, for the triple 〈Barack Obama, place of birth, Honolulu〉, the sentence "Barack Obama was born in 1961 in Honolulu, Hawaii" will be retained while the sentence "Barack Obama visited Honolulu in 2010" will be removed (the latter may be retained if there is another valid triple 〈Barack Obama, visited, Honolulu〉). This helps filter noise for the sentence-triple alignment.
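This filtering step amounts to exact substring matching against the paraphrase dictionary; a minimal sketch, where the tiny `PARAPHRASES` fragment stands in for the real dictionary of 540 predicates and 24,013 paraphrases:

```python
# Hypothetical fragment of the predicate-paraphrase dictionary built from
# PATTY, POLY, and PPDB.
PARAPHRASES = {
    "place of birth": ["was born in", "born in"],
}

def expresses_relation(sentence, predicate):
    """Exact string matching: does the sentence contain any paraphrase
    of the given predicate?"""
    return any(p in sentence for p in PARAPHRASES.get(predicate, []))

keep = expresses_relation(
    "Barack Obama was born in 1961 in Honolulu, Hawaii", "place of birth")
drop = expresses_relation(
    "Barack Obama visited Honolulu in 2010", "place of birth")
print(keep, drop)
# True False
```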

The collected dataset contains 255,654 sentence-triple pairs. For each pair, the maximum number of triples is four (i.e., a sentence can produce at most four triples). We split the dataset into a training set (80%), a dev set (10%), and a test set (10%) (we call the latter the WIKI test dataset). For stress testing (to test the proposed model on a different style of text than the training data), we also collect another test dataset outside Wikipedia. We apply the same procedure to the user reviews of a travel website. First, we collect user reviews on 100 popular landmarks in Australia. Then, we apply the adapted distant supervision to the reviews and collect 1,000 sentence-triple pairs (we call it the GEO test dataset). Table 2 summarizes the statistics of our datasets.
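The alignment at the heart of this distant supervision, keeping a sentence for a triple only if both of the triple's entities occur in it, can be sketched as follows. Matching entities to sentences through a `names` surface-form lookup is a simplification invented for this sketch; the actual pipeline uses the hyperlink-based mapping described above.

```python
def align(sentences, kb_triples, names):
    """Pair each sentence with every KB triple whose head and tail
    entity names both appear in the sentence (surface matching)."""
    pairs = []
    for sentence in sentences:
        matched = [t for t in kb_triples
                   if names[t[0]] in sentence and names[t[2]] in sentence]
        if matched:
            pairs.append((sentence, matched))
    return pairs

names = {"Q49210": "New York University",
         "Q902104": "private university",
         "Q11299": "Manhattan"}
triples = [("Q49210", "P31", "Q902104"), ("Q49210", "P131", "Q11299")]
pairs = align(
    ["New York University is a private university in Manhattan."],
    triples, names)
print(pairs[0][1])
# [('Q49210', 'P31', 'Q902104'), ('Q49210', 'P131', 'Q11299')]
```

This also illustrates why one pair may contain multiple triples: a single sentence can match several KB triples at once.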

3.3 Joint Learning of Word and Entity Embeddings

Our relation extraction model is based on the encoder-decoder framework, which has been widely used in Neural Machine Translation to translate text from one language to another. In our setup, we aim to translate a sentence into triples, and hence the vocabulary of the source input is a set of English words while the vocabulary of the target output is a set of entity and predicate IDs in an existing KG. To compute the embeddings of the source and target vocabularies, we propose a joint learning of word and entity embeddings that is effective in capturing the similarity between words and entities for named entity disambiguation (Yamada et al., 2016). Note that our method differs from that of Yamada et al. (2016). We use joint learning by combining skip-gram (Mikolov et al., 2013) to compute the word embeddings and TransE (Bordes et al., 2013) to compute the entity embeddings (including the relationship embeddings), while Yamada et al. (2016) use the Wikipedia Link-based Measure (WLM) (Milne and Witten, 2008), which does not consider the relationship embeddings.

Our model learns the entity embeddings by minimizing a margin-based objective function $J_E$:

$$J_E = \sum_{t_r \in T_r} \sum_{t'_r \in T'_r} \max\!\left(0,\ \gamma + f(t_r) - f(t'_r)\right) \quad (1)$$

$$T_r = \left\{ \langle h, r, t \rangle \mid \langle h, r, t \rangle \in G \right\} \quad (2)$$

$$T'_r = \left\{ \langle h', r, t \rangle \mid h' \in E \right\} \cup \left\{ \langle h, r, t' \rangle \mid t' \in E \right\} \quad (3)$$

$$f(t_r) = \| h + r - t \| \quad (4)$$

Here, ‖x‖ is the L1-norm of vector x, γ is a margin hyperparameter, Tr is the set of valid relationship triples from a KG G, and T′r is the set of corrupted relationship triples (recall that E is the set of entities in G). The corrupted triples are used as negative samples, which are created by replacing the head or tail entity of a valid triple in Tr with a random entity. We use all triples in Wikidata except those which belong to the testing data to compute the entity embeddings.
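For illustration, one term of the margin-based objective of Eqs. (1)-(4), for a single valid triple and one corrupted counterpart, can be sketched with NumPy; the embedding dimension (50) and γ = 1 are arbitrary choices for the example:

```python
import numpy as np

def transe_loss(h, r, t, h_neg, t_neg, gamma=1.0):
    """One term of Eq. (1): hinge on gamma + f(valid) - f(corrupted),
    where f(h, r, t) = ||h + r - t|| is the L1 score of Eq. (4)."""
    def f(a, b, c):
        return np.abs(a + b - c).sum()
    return max(0.0, gamma + f(h, r, t) - f(h_neg, r, t_neg))

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=50), rng.normal(size=50), rng.normal(size=50)
h_neg = rng.normal(size=50)          # corrupted head: a random entity
loss = transe_loss(h, r, t, h_neg, t)
```

Minimizing this quantity over all triples and corruptions pushes each valid triple to score at least γ lower than its corrupted variants.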

To establish the interaction between the entity and word embeddings, we follow the Anchor Context Model proposed by Yamada et al. (2016). First, we generate a text corpus by combining the original text and the modified anchor text of Wikipedia. This is done by replacing the entity names in a sentence with the related entity or predicate IDs. For example, the sentence "New York University is a private university in Manhattan" is modified into "Q49210 is a Q902104 in Q11299". Then, we use the skip-gram method to compute the word embeddings from the generated corpus (the entity IDs in the modified anchor text are treated as words in the skip-gram model). Given a sequence of T words [w1, w2, ..., wT], the model learns the word embeddings by minimizing the following objective function JW:

$$J_W = \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log P(w_{t+j} \mid w_t) \quad (5)$$

$$P(w_{t+j} \mid w_t) = \frac{\exp\!\left({v'_{w_{t+j}}}^{\top} v_{w_t}\right)}{\sum_{i=1}^{W} \exp\!\left({v'_i}^{\top} v_{w_t}\right)} \quad (6)$$

where c is the size of the context window, wt denotes the target word, and wt+j is the context word; vw and v′w are the input and output vector representations of word w, and W is the vocabulary size. The overall objective function of the joint learning of word and entity embeddings is:

$$J = J_E + J_W \quad (7)$$
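As an illustration, the conditional probability of Eq. (6) is a plain softmax over dot products of output vectors with the center word's input vector; a small NumPy sketch, with arbitrary vocabulary size and dimension:

```python
import numpy as np

def skipgram_logprob(v_center, v_out_all, target_idx):
    """log P(w_target | w_center), Eq. (6): softmax over the dot products
    of every output vector with the center word's input vector."""
    scores = v_out_all @ v_center
    scores = scores - scores.max()        # subtract max for numerical stability
    return scores[target_idx] - np.log(np.exp(scores).sum())

rng = np.random.default_rng(1)
v_center = rng.normal(size=8)             # input vector of the center word
v_out_all = rng.normal(size=(5, 8))       # output vectors, vocabulary W = 5
logps = [skipgram_logprob(v_center, v_out_all, i) for i in range(5)]
```

The probabilities across the vocabulary sum to one, as required of the softmax in Eq. (6).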

3.4 N-gram Based Attention Model

Our proposed relation extraction model integrates the extraction and canonicalization tasks for KB enrichment in an end-to-end manner. To build such a model, we employ an encoder-decoder model (Cho et al., 2014) to translate a sentence into a sequence of triples. The encoder encodes a sentence into a vector that is used by the decoder as a context to generate a sequence of triples. Because we treat the input and output as sequences, we use LSTM networks (Hochreiter and Schmidhuber, 1997) in the encoder and the decoder.

The encoder-decoder with attention model (Bahdanau et al., 2015) has been used in machine translation. However, in the relation extraction task, the attention model cannot capture multi-word entity names. In our preliminary investigation, we found that the attention model yields misalignment between the word and the entity.

The above problem is caused by the same word appearing in the names of different entities (e.g., the word University in different university names such as New York University, Washington University, etc.). During training, the model pays more attention to the word University to differentiate entities of different types with similar names, e.g., New York University, New York Times Building, or New York Life Building, but not entities of the same type with different names (e.g., New York University and Washington University). This may cause errors in entity alignment, especially when predicting the ID of an entity that is not in the training data. Even though we add 〈Entity-name, Entity-ID〉 pairs as training data (see the Training section), the misalignments still take place.

We address the above problem by proposing an n-gram based attention model. This model computes the attention over all possible n-grams of the input sentence. The attention weights are computed over the n-gram combinations of the word embeddings, and hence the context vector for the decoder is computed as follows.

c^d_t = \left[ h^e \; ; \; \sum_{n=1}^{N} W^n \sum_{i=1}^{|X^n|} \alpha^n_i x^n_i \right]    (8)

\alpha^n_i = \frac{\exp({h^e}^{\top} V^n x^n_i)}{\sum_{j=1}^{|X^n|} \exp({h^e}^{\top} V^n x^n_j)}    (9)

Here, c^d_t is the context vector of the decoder at timestep t, h^e is the last hidden state of the encoder, the superscript n indicates the n-gram combination, x is the word embedding of the input sentence, |X^n| is the total number of n-gram token combinations, N indicates the maximum value of n used in the n-gram combinations (N = 3 in our experiments), W^n and V^n are learned parameter matrices, and \alpha is the attention weight.
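A minimal NumPy sketch of Equations 8-9 with toy dimensions is shown below. The composition of an n-gram embedding x^n_i is approximated here by averaging the word embeddings inside each window, which is one plausible reading but an assumption on our part; the parameter matrices W^n and V^n are random stand-ins for learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                               # embedding / hidden size (toy)
words = rng.normal(size=(5, d))     # word embeddings of a 5-word sentence
h_e = rng.normal(size=d)            # last encoder hidden state
N = 3                               # maximum n-gram length (as in the paper)

def softmax(z):
    z = z - z.max()                 # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

attended = np.zeros(d)              # the summed term of Eq. 8
for n in range(1, N + 1):
    # x^n_i: here, the mean of the word embeddings in each n-gram window
    # (an assumption; the composition function is not pinned down above).
    X_n = np.stack([words[i:i + n].mean(axis=0)
                    for i in range(len(words) - n + 1)])
    V_n = rng.normal(size=(d, d))   # learned in practice; random here
    W_n = rng.normal(size=(d, d))
    alpha = softmax(X_n @ V_n.T @ h_e)       # Eq. 9: attention over n-grams
    attended += W_n @ (alpha @ X_n)          # weighted sum, projected by W^n
c_d = np.concatenate([h_e, attended])        # Eq. 8: concatenate with h^e
print(c_d.shape)  # -> (16,)
```

The decoder would consume c^d_t at each timestep alongside its own hidden state.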

Training

In the training phase, in addition to the sentence-triple pairs collected using distant supervision (see Section 3.2), we also add pairs of 〈Entity-name, Entity-ID〉 for all entities in the KB to the training data, e.g., 〈New York University, Q49210〉. This allows the model to learn the mapping between entity names and entity IDs, especially for unseen entities.

3.5 Triple Generation

The output of the encoder-decoder model is a sequence of entity and predicate IDs where every three tokens indicate a triple. Therefore, to extract a triple, we simply group every three tokens of the generated output. However, the greedy approach (i.e., picking the entity with the highest probability in the last softmax layer of the decoder) may lead the model to extract incorrect entities due to the similarity between entity embeddings (e.g., the embeddings of New York City and Chicago may be similar because both are cities in the USA). To address this problem, we propose two strategies: re-ranking the predicted entities using a modified beam search and filtering invalid triples using a triple classifier.
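The grouping step above is mechanical and can be sketched in a few lines; the ID sequence is the paper's running example, reused here illustratively.

```python
def group_triples(tokens):
    """Group a flat decoder output into <head, relation, tail> triples;
    a trailing incomplete group (fewer than 3 tokens) is dropped."""
    return [tuple(tokens[i:i + 3]) for i in range(0, len(tokens) - 2, 3)]

decoder_output = ["Q49210", "P31", "Q902104", "Q49210", "P131", "Q11299"]
print(group_triples(decoder_output))
# -> [('Q49210', 'P31', 'Q902104'), ('Q49210', 'P131', 'Q11299')]
```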

The modified beam search re-ranks the top-k (k = 10 in our experiments) entity IDs that are predicted


| Model                                  | WIKI Precision | WIKI Recall | WIKI F1 | GEO Precision | GEO Recall | GEO F1 |
|----------------------------------------|----------------|-------------|---------|---------------|------------|--------|
| Existing Models                        |                |             |         |               |            |        |
| MinIE (+AIDA)                          | 0.3672         | 0.4856      | 0.4182  | 0.3574        | 0.3901     | 0.3730 |
| MinIE (+NeuralEL)                      | 0.3511         | 0.3967      | 0.3725  | 0.3644        | 0.3811     | 0.3726 |
| ClausIE (+AIDA)                        | 0.3617         | 0.4728      | 0.4099  | 0.3531        | 0.3951     | 0.3729 |
| ClausIE (+NeuralEL)                    | 0.3445         | 0.3786      | 0.3607  | 0.3563        | 0.3791     | 0.3673 |
| CNN (+AIDA)                            | 0.4035         | 0.3503      | 0.3750  | 0.3715        | 0.3165     | 0.3418 |
| CNN (+NeuralEL)                        | 0.3689         | 0.3521      | 0.3603  | 0.3781        | 0.3005     | 0.3349 |
| Encoder-Decoder Models                 |                |             |         |               |            |        |
| Single Attention                       | 0.4591         | 0.3836      | 0.4180  | 0.4010        | 0.3912     | 0.3960 |
| Single Attention (+pre-trained)        | 0.4725         | 0.4053      | 0.4363  | 0.4314        | 0.4311     | 0.4312 |
| Single Attention (+beam)               | 0.6056         | 0.5231      | 0.5613  | 0.5869        | 0.4851     | 0.5312 |
| Single Attention (+triple classifier)  | 0.7378         | 0.5013      | 0.5970  | 0.6704        | 0.5301     | 0.5921 |
| Transformer                            | 0.4628         | 0.3897      | 0.4231  | 0.4575        | 0.4620     | 0.4597 |
| Transformer (+pre-trained)             | 0.4748         | 0.4091      | 0.4395  | 0.4841        | 0.4831     | 0.4836 |
| Transformer (+beam)                    | 0.5829         | 0.5025      | 0.5397  | 0.6181        | 0.6161     | 0.6171 |
| Transformer (+triple classifier)       | 0.7307         | 0.4866      | 0.5842  | 0.7124        | 0.5761     | 0.6370 |
| Proposed                               |                |             |         |               |            |        |
| N-gram Attention                       | 0.7014         | 0.6432      | 0.6710  | 0.6029        | 0.6033     | 0.6031 |
| N-gram Attention (+pre-trained)        | 0.7157         | 0.6634      | 0.6886  | 0.6581        | 0.6631     | 0.6606 |
| N-gram Attention (+beam)               | 0.7424         | 0.6845      | 0.7123  | 0.6816        | 0.6861     | 0.6838 |
| N-gram Attention (+triple classifier)  | 0.8471         | 0.6762      | 0.7521  | 0.7705        | 0.6771     | 0.7208 |

Table 3: Experiment results.

by the decoder by computing the edit distance between the entity names (obtained from the KB) and every n-gram token of the input sentence. The intuition is that the entity name should be mentioned in the sentence, so the entity with the highest similarity is chosen as the output.
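The re-ranking idea can be sketched as follows. The candidate list is hypothetical, and `SequenceMatcher` is used as a stand-in string-similarity measure; the paper specifies edit distance, so this is an illustrative substitute, not the exact implementation.

```python
from difflib import SequenceMatcher  # stand-in for an edit-distance measure

# Hypothetical top-k candidates (entity_id, entity_name) from the KB,
# in decoder probability order; names and IDs are illustrative.
candidates = [("Q60", "New York City"), ("Q49210", "New York University")]
sentence = "New York University is a private university in Manhattan"

def ngrams(words, max_n):
    """All n-grams of the sentence up to length max_n, as strings."""
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]

def rerank(candidates, sentence, max_n=4):
    """Pick the candidate whose KB name best matches some n-gram of the
    input sentence (the entity name should be mentioned in the sentence)."""
    grams = ngrams(sentence.split(), max_n)
    def best_sim(name):
        return max(SequenceMatcher(None, name, g).ratio() for g in grams)
    return max(candidates, key=lambda c: best_sim(c[1]))

print(rerank(candidates, sentence))
# -> ('Q49210', 'New York University')
```

Here the decoder's most probable candidate (Q60) is overruled because its name never appears in the sentence.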

Our triple classifier is trained with entity embeddings from the joint learning (see Section 3.3). Triple classification is one of the metrics to evaluate the quality of entity embeddings (Socher et al., 2013). We build a classifier to determine the validity of a triple 〈h, r, t〉. We train a binary classifier based on the plausibility score (h + r − t) (the score used to compute the entity embeddings). We create negative samples by corrupting valid triples (i.e., replacing the head or tail entity with a random entity). The triple classifier is effective in filtering invalid triples such as 〈New York University, capital of, Manhattan〉.
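A minimal sketch of the plausibility-based check is below. The embeddings are toy vectors constructed so that the valid triple satisfies the translation assumption h + r ≈ t, and the threshold is a hand-picked toy value; in the paper, a binary classifier is trained on valid versus corrupted triples instead.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
# Toy embeddings for entities/predicates (IDs from the paper's examples).
emb = {k: rng.normal(size=d) for k in ["Q49210", "P131", "P36", "Q11299"]}
# Construct one triple to be nearly valid under h + r ≈ t ...
emb["Q11299"] = emb["Q49210"] + emb["P131"] + 0.01 * rng.normal(size=d)
# ... and offset P36 so the corrupted triple scores badly (toy construction).
emb["P36"] = emb["P131"] + 2.0

def plausibility(h, r, t):
    """TransE-style score ||h + r - t||: small means plausible."""
    return np.linalg.norm(emb[h] + emb[r] - emb[t])

def is_valid(h, r, t, threshold=1.0):
    # A trained classifier replaces this fixed threshold in practice.
    return plausibility(h, r, t) < threshold

print(is_valid("Q49210", "P131", "Q11299"))  # -> True  (located in)
print(is_valid("Q49210", "P36", "Q11299"))   # -> False (capital of)
```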

4 Experiments

We evaluate our model on two real-world datasets, the WIKI and GEO test datasets (see Section 3.2). We use precision, recall, and F1 score as the evaluation metrics.

4.1 Hyperparameters

We use grid search to find the best hyperparameters for the networks. We use 512 hidden units for both the encoder and the decoder. We use 64-dimensional pre-trained word and entity embeddings (see Section 3.3). We use a 0.5 dropout rate for regularization on both the encoder and the decoder. We use Adam (Kingma and Ba, 2015) with a learning rate of 0.0002.
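For reference, the settings above can be collected into a single configuration; the dictionary keys are our own naming, not the paper's.

```python
# Hyperparameters reported in Section 4.1 (key names are our own).
HPARAMS = {
    "hidden_units": 512,    # encoder and decoder LSTM size
    "embedding_dim": 64,    # pre-trained word/entity embeddings
    "dropout": 0.5,         # applied to both encoder and decoder
    "optimizer": "adam",
    "learning_rate": 2e-4,  # 0.0002
}
print(HPARAMS["hidden_units"], HPARAMS["learning_rate"])
```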

4.2 Models

We compare our proposed model3 with three existing models: CNN (the state-of-the-art supervised approach by Lin et al. (2016)), MinIE (the state-of-the-art unsupervised approach by Gashteovski et al. (2017)), and ClausIE by Del Corro and Gemulla (2013). To map the entities extracted by these models, we use two state-of-the-art NED systems: AIDA (Hoffart et al., 2011) and NeuralEL (Kolitsas et al., 2018). The precision (tested on our test dataset) of AIDA and NeuralEL is 70% and 61%, respectively. To map the extracted predicates (relationships) of the unsupervised approaches' output, we use dictionary-based paraphrase detection. We use the same dictionary that is used to collect the dataset (i.e., the combination of three paraphrase dictionaries: PATTY (Nakashole et al., 2012), POLY (Grycner and Weikum, 2016), and PPDB (Ganitkevitch et al., 2013)). We replace the extracted predicate with the correct predicate ID if one of the paraphrases of the correct predicate (i.e., the gold standard) appears in the extracted predicate. Otherwise, we replace the extracted predicate with "NA" to indicate an unrecognized predicate. We also compare our N-gram Attention model with two encoder-decoder based models: the Single Attention model (Bahdanau et al., 2015) and the Transformer model (Vaswani et al., 2017).

3 The code and the dataset are made available at http://www.ruizhang.info/GKB/gkb.htm


4.3 Results

Table 3 shows that the end-to-end models outperform the existing models. In particular, our proposed n-gram attention model achieves the best results in terms of precision, recall, and F1 score. Our proposed model outperforms the best existing model (MinIE) by 33.39% and 34.78% in terms of F1 score on the WIKI and GEO test datasets, respectively. These results are expected since the existing models are affected by error propagation from the NED. As expected, the combination of the existing models with AIDA achieves higher F1 scores than the combination with NeuralEL, as AIDA achieves a higher precision than NeuralEL.

To further show the effect of error propagation, we set up an experiment without the canonicalization task (i.e., the objective is to predict a relationship between known entities). We remove the NED pre-processing step by allowing the CNN model to access the correct entities. Meanwhile, we provide the correct entities to the decoder of our proposed model. In this setup, our proposed model achieves 86.34% and 79.11% precision, while CNN achieves 81.92% and 75.82%, on the WIKI and GEO test datasets, respectively.

Our proposed n-gram attention model outperforms the other end-to-end models by 15.51% and 8.38% in terms of F1 score on the WIKI and GEO test datasets, respectively. The Transformer model only yields performance similar to that of the Single Attention model, which is worse than ours. These results indicate that our model captures multi-word entity names (in both datasets, 82.9% of the entities have multi-word names) in the input sentence better than the other models.

Table 3 also shows that the pre-trained embeddings improve the performance of the model on all measures. Moreover, the pre-trained embeddings help the model converge faster. In our experiments, the models that use the pre-trained embeddings converge in 20 epochs on average, while the models that do not converge in 30-40 epochs. Our triple classifier combined with the modified beam search boosts the performance of the model. The modified beam search provides a high recall by extracting the correct entities based on their surface form in the input sentence, while the triple classifier provides a high precision by filtering the invalid triples.

Discussion

We further performed a manual error analysis. We found that the incorrect output of our model is caused by two different entities sharing the same name (e.g., the name Michael Jordan, which may refer to the American basketball player or the English footballer). The modified beam search cannot disambiguate those entities as it only considers the lexical similarity. We consider using context-based similarity as future work.

5 Conclusions

We proposed an end-to-end relation extraction model for KB enrichment that integrates the extraction and canonicalization tasks. Our model thus reduces the error propagation between relation extraction and NED that existing approaches are prone to. To obtain high-quality training data, we adapt distant supervision and augment it with co-reference resolution and paraphrase detection. We propose an n-gram based attention model that better captures the multi-word entity names in a sentence. Moreover, we propose a modified beam search and a triple classifier that help the model generate high-quality triples.

Experimental results show that our proposed model outperforms the existing models by 33.39% and 34.78% in terms of F1 score on the WIKI and GEO test datasets, respectively. These results confirm that our model reduces the error propagation between NED and relation extraction. Our proposed n-gram attention model outperforms the other encoder-decoder models by 15.51% and 8.38% in terms of F1 score on the two real-world datasets. These results confirm that our model better captures the multi-word entity names in a sentence. In the future, we plan to explore context-based similarity to complement the lexical similarity and improve the overall performance.

Acknowledgments

Bayu Distiawan Trisedya is supported by the Indonesian Endowment Fund for Education (LPDP). This work was done while Bayu Distiawan Trisedya was visiting the Max Planck Institute for Informatics. This work is supported by Australian Research Council (ARC) Discovery Project DP180102050, a Google Faculty Research Award, and the National Science Foundation of China (Project No. 61872070 and No. 61402155).


References

Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. 2015. Leveraging linguistic structure for open domain information extraction. In Proceedings of Association for Computational Linguistics, pages 344-354.

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. 2007. DBpedia: A nucleus for a web of open data. In Proceedings of International Semantic Web Conference, pages 722-735.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations.

Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of International Joint Conference on Artificial Intelligence, pages 2670-2676.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Proceedings of International Conference on Neural Information Processing Systems, pages 2787-2795.

Sergey Brin. 1998. Extracting patterns and relations from the world wide web. In Proceedings of The World Wide Web and Databases International Workshop, pages 172-183.

Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of AAAI Conference on Artificial Intelligence, pages 1306-1313.

Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. 2017. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In Proceedings of International Joint Conference on Artificial Intelligence, pages 1511-1517.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of Empirical Methods in Natural Language Processing, pages 1724-1734.

Kevin Clark and Christopher D. Manning. 2016. Deep reinforcement learning for mention-ranking coreference models. In Proceedings of Empirical Methods in Natural Language Processing, pages 2256-2262.

Luciano Del Corro and Rainer Gemulla. 2013. ClausIE: clause-based open information extraction. In Proceedings of International Conference on World Wide Web, pages 355-366.

Lei Cui, Furu Wei, and Ming Zhou. 2018. Neural open information extraction. In Proceedings of Association for Computational Linguistics, pages 407-413.

Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of Empirical Methods in Natural Language Processing, pages 1535-1545.

Luis Galarraga, Geremy Heitz, Kevin Murphy, and Fabian M. Suchanek. 2014. Canonicalizing open knowledge bases. In Proceedings of International Conference on Information and Knowledge Management, pages 1679-1688.

Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The paraphrase database. In Proceedings of North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 758-764.

Kiril Gashteovski, Rainer Gemulla, and Luciano Del Corro. 2017. MinIE: Minimizing facts in open information extraction. In Proceedings of Empirical Methods in Natural Language Processing, pages 2620-2630.

Adam Grycner and Gerhard Weikum. 2016. POLY: Mining relational paraphrases from multilingual sentences. In Proceedings of Empirical Methods in Natural Language Processing, pages 2183-2192.

Luheng He, Kenton Lee, Omer Levy, and Luke Zettlemoyer. 2018. Jointly predicting predicates and arguments in neural semantic role labeling. In Proceedings of Association for Computational Linguistics, pages 364-369.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735-1780.

Johannes Hoffart et al. 2011. Robust disambiguation of named entities in text. In Proceedings of Empirical Methods in Natural Language Processing, pages 782-792.

Raphael Hoffmann, Congle Zhang, and Daniel S. Weld. 2010. Learning 5000 relational extractors. In Proceedings of Association for Computational Linguistics, pages 286-295.

Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. 2017. Distant supervision for relation extraction with sentence-level attention and entity descriptions. In Proceedings of AAAI Conference on Artificial Intelligence, pages 3060-3066.

Diederik P. Kingma and Jimmy Lei Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of International Conference on Learning Representations.


Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In Proceedings of Conference on Computational Natural Language Learning, pages 519-529.

Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2017. Neural relation extraction with multi-lingual attention. In Proceedings of Association for Computational Linguistics, volume 1, pages 34-43.

Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural relation extraction with selective attention over instances. In Proceedings of Association for Computational Linguistics, volume 1, pages 2124-2133.

Diego Marcheggiani and Laura Perez-Beltrachini. 2018. Deep graph convolutional encoders for structured data to text generation. In Proceedings of International Conference on Natural Language Generation, pages 1-9.

Mausam. 2016. Open information extraction systems and downstream applications. In Proceedings of International Joint Conference on Artificial Intelligence, pages 4074-4077.

Mausam, Michael Schmitz, Stephen Soderland, Robert Bart, and Oren Etzioni. 2012. Open language learning for information extraction. In Proceedings of Empirical Methods in Natural Language Processing, pages 523-534.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of International Conference on Neural Information Processing Systems, pages 3111-3119.

David Milne and Ian H. Witten. 2008. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of AAAI Workshop on Wikipedia and Artificial Intelligence, pages 25-30.

Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of Association for Computational Linguistics and International Joint Conference on Natural Language Processing of the AFNLP, pages 1003-1011.

Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. In Proceedings of Association for Computational Linguistics.

Ndapandula Nakashole, Gerhard Weikum, and Fabian M. Suchanek. 2012. PATTY: A taxonomy of relational patterns with semantic types. In Proceedings of Empirical Methods in Natural Language Processing, pages 1135-1145.

Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Q. Phung. 2018. A novel embedding model for knowledge base completion based on convolutional neural network. In Proceedings of North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 2, pages 327-333.

Thien Huu Nguyen and Ralph Grishman. 2015. Relation extraction: Perspective from convolutional neural networks. In Proceedings of Workshop on Vector Space Modeling for Natural Language Processing, pages 39-48.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of European Conference on Machine Learning and Knowledge Discovery in Databases, pages 148-163.

Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In Proceedings of North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 74-84.

Christopher De Sa, Alexander Ratner, Christopher Ré, Jaeho Shin, Feiran Wang, Sen Wu, and Ce Zhang. 2017. Incremental knowledge base construction using DeepDive. Very Large Data Bases Journal, 26(1):81-105.

Wei Shen, Jianyong Wang, and Jiawei Han. 2015. Entity linking with a knowledge base: Issues, techniques, and solutions. IEEE Trans. Knowl. Data Eng., 27(2):443-460.

Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Y. Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Proceedings of International Conference on Neural Information Processing Systems, pages 926-934.

Daniil Sorokin and Iryna Gurevych. 2017. Context-aware representations for knowledge base relation extraction. In Proceedings of Empirical Methods in Natural Language Processing, pages 1784-1789.

Gabriel Stanovsky, Julian Michael, Ido Dagan, and Luke Zettlemoyer. 2018. Supervised open information extraction. In Proceedings of North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 885-895.

Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowledge. In Proceedings of International Conference on World Wide Web, pages 697-706.

Fabian M. Suchanek, Mauro Sozio, and Gerhard Weikum. 2009. SOFIE: a self-organizing framework for information extraction. In Proceedings of International Conference on World Wide Web, pages 631-640.


Zequn Sun, Wei Hu, and Chengkai Li. 2017. Cross-lingual entity alignment via joint attribute-preserving embedding. In Proceedings of International Semantic Web Conference, pages 628-644.

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of Empirical Methods in Natural Language Processing, pages 455-465.

Bayu Distiawan Trisedya, Jianzhong Qi, and Rui Zhang. 2019. Entity alignment between knowledge graphs using attribute embeddings. In Proceedings of AAAI Conference on Artificial Intelligence.

Bayu Distiawan Trisedya, Jianzhong Qi, Rui Zhang, and Wei Wang. 2018. GTR-LSTM: A triple encoder for sentence generation from RDF data. In Proceedings of Association for Computational Linguistics, pages 1627-1637.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of Neural Information Processing Systems, pages 5998-6008.

Denny Vrandecic and Markus Krötzsch. 2014. Wikidata: a free collaborative knowledgebase. Commun. ACM, 57(10):78-85.

Quan Wang, Bin Wang, and Li Guo. 2015. Knowledge base completion using embeddings and rules. In Proceedings of International Joint Conference on Artificial Intelligence, pages 1859-1865.

Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji. 2016. Joint learning of the embedding of words and entities for named entity disambiguation. In Proceedings of Conference on Computational Natural Language Learning, pages 250-259.

Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of Empirical Methods in Natural Language Processing, pages 1753-1762.

Peng Zhou, Jiaming Xu, Zhenyu Qi, Hongyun Bao, Zhineng Chen, and Bo Xu. 2018. Distant supervision for relation extraction with hierarchical selective attention. Neural Networks, 108:240-247.