
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2098–2108, Brussels, Belgium, October 31 - November 4, 2018. ©2018 Association for Computational Linguistics


A State-transition Framework to Answer Complex Questions over Knowledge Base

Sen Hu¹, Lei Zou¹,², Xinbo Zhang¹

¹Peking University, China; ²Beijing Institute of Big Data Research, China
{husen,zoulei,zhangxinbo}@pku.edu.cn

Abstract

Although natural language question answering over knowledge graphs has been studied in the literature, existing methods have some limitations in answering complex questions. To address this, we propose a state-transition-based approach to translate a complex natural language question N into a semantic query graph (SQG) QS, which is then matched against the underlying knowledge graph to find the answers to N. To generate QS, we propose four primitive operations (expand, fold, connect, and merge) and a learning-based state-transition approach. Extensive experiments on several benchmarks (QALD, WebQuestions, and ComplexQuestions) with two knowledge bases (DBpedia and Freebase) confirm the superiority of our approach over the state of the art.

1 Introduction

Complex Question Answering, in which the question corresponds to multiple triples in the knowledge base, has attracted researchers' attention recently (Bao et al., 2014; Xu et al., 2016; Berant and Liang, 2014; Bast and Haussmann, 2015; Yih et al., 2015). However, most existing solutions employ predefined patterns or templates to understand complex questions. For example, (Bao et al., 2014; Xu et al., 2016) use dependency tree patterns to decompose a complex question into several simple questions, while (Berant and Liang, 2014; Bast and Haussmann, 2015) rely on templates to translate natural language sentences into predefined logical forms. Obviously, it is impossible to cover all real-world cases using such handcrafted templates. The recent work STAGG (Yih et al., 2015) proposes a state-transition strategy to tackle complex questions. However, the query graph structure generated by STAGG is limited, as shown in Figure 1. Generally, the structure is formed by linking the topic entity e and the answer node t through a core relation. Since it only allows entity constraints (such as c) and one variable (the answer node t), its expressivity is quite limited. It cannot cover all types of complex questions, such as the question in Figure 2, whose query graph contains a variable constraint (v4) and multi-edges (between v1 and v4). Therefore, in this paper, we propose a more flexible strategy to answer complex questions without handcrafted templates or restrictions on the query graph's structure.

1.1 Properties and Challenges of Complex Questions

We first analyze the inherent challenges of complex question answering, which have not been well studied in the existing literature.

Multi or implicit relations. A complex question has multiple relations, and some relations may be multi-hop or implicit. Consider the question N1 = "Which Russian astronauts died in the same place they were born in?". Obviously, there are two explicit relations represented by the phrases "died in" and "born in", but many traditional methods extract only a single relation between the topic entity and the answer node. On the other hand, there is an implicit relation between "Russian" and "astronauts" (i.e., 〈Nationality〉) that lacks a corresponding relation phrase.

Multi or no entities. Different from simple (one-hop) questions, which contain only a single topic entity in the question sentence N, complex questions may have multiple entities. For example, in "Which Russian astronauts died in Moscow and were born in the Soviet Union?", there are three entities: "Russian", "Moscow", and "Soviet Union". In some cases, the question sentence does not include any entity, such as "Who died in the same place they were born in?". Therefore, existing methods that


Figure 1: The query graph structure of STAGG. Node e is the "topic entity", path e-t is the "core inferential chain", and c represents "augmenting constraints". Notice that we omit the CVT node, as it is a virtual node that exists only in Freebase.

Figure 2: An example of a complex question with its semantic query graph.

only care about a single topic entity cannot work for such types of complex questions.

Variables and co-reference. Existing solutions assume that there is a single variable in the question N, i.e., the wh-word, which is the focus of the question (representing the answer node). However, complex questions may have other variables. Consider the question in Figure 2: four variables are recognized, "Which", "cosmonauts", "place", and "they". The three highlighted in yellow ("Which", "cosmonauts", "they") refer to the same thing, which is the co-reference phenomenon.

Composition. In simple questions, the query graph's structure is fixed (a one-hop edge). For complex questions, even when all entities, variables, and relations are recognized, how to assemble them into logical forms or executable queries (query graphs) is still a challenge. As mentioned earlier, existing solutions that depend on predefined templates (such as (Xu et al., 2016; Bast and Haussmann, 2015)) or query graph structure patterns (such as (Yih et al., 2015)) may not be suitable due to the diversity of complex questions.

1.2 Our Solution

Considering the above challenges, we propose a state-transition-based framework to translate a user's question N into a semantic query graph (SQG) QS; the answers to N can then be found by matching QS over the knowledge graph G. Specifically, we first recognize nodes (such as entities and variables) in N as the initial state (Section 2.1). Then, to enable and speed up the state-transition process, we propose four primitive operations with corresponding conditions (Section 2.2). The entity/relation candidates are extracted using existing entity linking algorithms and our proposed MCCNN model (Section 2.3). To guide the state transition, we learn a reward function using an SVM ranking algorithm (Section 3) to greedily select a subsequent state.

Compared with existing solutions, our framework has stronger expressivity to solve more types of complex questions (considering the challenges introduced in Section 1.1) and does not rely on handcrafted templates.

2 Semantic Query Graph Generation

This section outlines the whole framework of our system. Given a natural language question N, we aim to translate N into a semantic query graph (SQG) (see Definition 1).

Definition 1 (Semantic Query Graph). A semantic query graph (denoted QS) is a connected graph in which each vertex vi corresponds to a phrase in the question sentence N and is associated with a list of entity/type/literal candidates, and each edge vivj is associated with a list of relation candidates, 1 ≤ i, j ≤ |V(QS)|.
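To make Definition 1 concrete, the sketch below encodes an SQG in plain Python. The class and field names are our own illustration, not the authors' implementation, and the confidence values are invented; for simplicity, all relation candidates of a vertex pair are kept on a single edge entry, so the multi-edge case of Figure 2 appears as two candidates on one entry.

```python
from dataclasses import dataclass, field

@dataclass
class SQGNode:
    # A vertex v_i: a phrase from the question N plus its
    # entity/type/literal candidates with confidence scores.
    phrase: str
    candidates: list = field(default_factory=list)  # (label, confidence)

@dataclass
class SQG:
    nodes: dict = field(default_factory=dict)  # node id -> SQGNode
    edges: dict = field(default_factory=dict)  # frozenset({i, j}) -> [(relation, confidence)]

    def add_edge(self, i, j, relation_candidates):
        self.edges[frozenset((i, j))] = relation_candidates

# A fragment of the SQG of Figure 2 (confidences invented for illustration).
q = SQG()
q.nodes["v1"] = SQGNode("cosmonauts")
q.nodes["v4"] = SQGNode("place")
q.add_edge("v1", "v4", [("deathPlace", 0.9), ("birthPlace", 0.8)])
```

Keying edges by an unordered vertex pair is merely a convenient simplification here; an implementation that must preserve true parallel edges would key them differently.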

Given a question sentence N1 = "Which cosmonauts died in the same place they were born in?", the corresponding semantic query graph QS1 is given in Figure 2. In QS1, the subgraph {v1, v2, v3} corresponds to "cosmonauts" and v4 corresponds to "place". The relation phrases "born in" and "died in" denote two relations between v1 and v4. Each edge in QS has a list of relation (predicate) candidates with confidence probabilities. In this paper, we draw only one relation candidate per edge for simplicity. Notice that the edges v1v3 and v1v2 have no relation phrases; however, they still have the corresponding relation candidates 〈nationality〉 and 〈type〉.

After obtaining the semantic query graph QS, we find matches of QS over the underlying knowledge graph G to find answers to the original question N. This stage is identical to gAnswer (Zou et al., 2014) and NFF (Hu et al., 2018). Thus, in this paper, we concentrate on how to generate the semantic query graph QS, which differs from NFF, which relies on paraphrasing dictionaries and heuristic rules to build QS.


2.1 Node Recognition

The first step is to detect the "nodes" (for building the SQG QS) in the user's question sentence. According to Definition 1, each node in the semantic query graph is a phrase corresponding to a subject or object in the SPARQL query. Generally, constant nodes (e.g., Titanic) can be divided into three categories according to their roles (entity, type, literal) in the knowledge graph. Besides, a variable node (e.g., ?place) can potentially map to any vertex in the knowledge graph.

Generally, the node recognition task is regarded as a sequence labeling problem, and we adopt a BLSTM-CRF model (Huang et al., 2015) to realize it. However, this model does not work very well for recognizing entity/type nodes, as the entity phrases may be too long or too complex. Therefore, we first detect entity and type nodes by utilizing existing entity linking algorithms for the specific knowledge bases (see Section 2.3), and then detect variable/literal nodes with the BLSTM-CRF model. A phrase wij, which contains the words from position i to j in the question, is recognized as an entity/type node if we can find entity/type mappings of wij through the entity linking algorithm. We check all possible phrases and select the longer one if two candidate nodes share any words. The detected entity/type nodes are replaced by the special tokens 〈E〉 or 〈T〉 before calling the BLSTM-CRF model.
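The span enumeration, longest-match overlap resolution, and token masking just described can be sketched as follows. This is our own minimal reading of the procedure, not the authors' code; the toy `linker` stands in for a real entity linking system.

```python
def link_phrases(tokens, linker):
    # Enumerate every candidate phrase w_ij and keep the spans the entity
    # linker can map; `linker` returns "E" (entity), "T" (type), or None.
    spans = []
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            kind = linker(" ".join(tokens[i:j]))
            if kind is not None:
                spans.append((i, j, kind))
    # Resolve overlaps by preferring the longer span, as in Section 2.1.
    spans.sort(key=lambda s: s[1] - s[0], reverse=True)
    chosen, used = [], set()
    for i, j, kind in spans:
        if used.isdisjoint(range(i, j)):
            chosen.append((i, j, kind))
            used.update(range(i, j))
    return sorted(chosen)

def mask_nodes(tokens, spans):
    # Replace detected entity/type spans with the special tokens <E>/<T>
    # before the sentence is passed to the BLSTM-CRF tagger.
    starts = {i: (j, kind) for i, j, kind in spans}
    out, k = [], 0
    while k < len(tokens):
        if k in starts:
            j, kind = starts[k]
            out.append("<E>" if kind == "E" else "<T>")
            k = j
        else:
            out.append(tokens[k])
            k += 1
    return out

# Toy linker that recognizes two phrases (purely illustrative).
linker = {"Soviet Union": "E", "Moscow": "E"}.get
tokens = "born in Soviet Union".split()
masked = mask_nodes(tokens, link_phrases(tokens, linker))
```

Note how "Soviet Union" wins over any shorter overlapping candidate because longer spans are claimed first.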

In practice, we find that some nodes have hidden information, which can be expanded into a subgraph (i.e., one or multiple triples). For example, the hidden information of "?cosmonauts" in question N1 in Figure 2 can be expanded into two triples: 〈?cosmonauts rdf:type dbo:Astronaut〉 and 〈?cosmonauts dbo:nationality dbr:Russia〉. Such information helps us reduce the matching space and make the answers more accurate.

We mine those nodes with hidden information from several QA benchmarks¹ and build a translation dictionary DT offline. Given a question N with a gold query graph QG, we try to match QG to the SQG QS generated by our approach. Each unmatched connected subgraph in QG is regarded as hidden information of the nearest matched node v. For benchmarks that only provide gold question-answer pairs, we first utilize the method in (Zhang and Zou, 2017) to mine the gold query graph.

¹ QALD, WebQuestions and ComplexQuestions; only the training sets are used to build the translation dictionary DT.

Figure 3: Samples of Operations. (a) Connect: "Which presidents died in the same place they were born in?"; (b) Merge: "Is Trump a president?"; (c) Expand: "Give me all cosmonauts."; (d) Fold: "Who composed the soundtrack for Cameron's Titanic?".

2.2 SQG’s Structure Construction

As mentioned earlier, our SQG's structure is generated in a state-transition paradigm. Initially, the state contains only the several isolated nodes recognized in the first step. To enable the state transition, we propose four primitive operations as follows.

1. Connect. Given two operate nodes u1 and u2, the connect operation introduces an edge u1u2 between them. Note that the relation mention and candidate relations (predicates) of the edge u1u2 can be found through the relation extraction model (see Section 2.3).

2. Merge. Given an SQG QS = {V, E} and two operate nodes u, v, this operation merges node u into v. The new SQG is Q′S = {V \ {u}, (E \ Ed) ∪ Ea}, where Ed = {uw ∈ E} and Ea = {vw | uw ∈ E ∧ w ≠ v}. Notice that v also inherits the properties of u. The merge operation is designed to support co-reference resolution. Figure 3(b) gives an example. For the question "Which presidents died in the same place they were born in?", the node v2 ("which") is redundant and can be merged into v1 ("presidents"). The nodes v4 ("they") and v1 ("presidents") are co-references, so we can merge v4 into v1. The edge v3v4 (birthPlace) is replaced by the edge v′3v′1 (birthPlace).

3. Expand. Given an SQG QS = {V, E} and an operate node u ∈ V, this operation expands u into a subgraph Qu = {Vu, Eu}. The new SQG is Q′S = {V ∪ Vu, E ∪ Eu}. The expand operation is designed for those


Figure 4: Semantic Query Graph Generation. The figure shows the transition sequence Connect(v1,v2), Connect(v1,v3), Connect(v3,v4), Merge(v1,v2), Merge(v1,v4), Expand(v1) for question N1.

nodes that have hidden information. Let us see Figure 3(c). The word "cosmonauts" (v1) means "Russian astronauts", but it cannot be mapped directly to a certain entity or type in the knowledge base. To match "cosmonauts", we can expand it to an equivalent subgraph {v′1, v′2, v′3} (according to the translation dictionary DT) that can match the underlying knowledge base.

4. Fold. Given an SQG QS = {V, E} and an operate node u ∈ V, this operation deletes u and merges the associated relations. Formally, the new SQG is Q′S = {V \ {u}, (E \ Ed) ∪ Ea}, in which Ed = {uvi ∈ E} and Ea = {vivj | uvi ∈ E ∧ uvj ∈ E ∧ i ≠ j}. The fold operation is designed to eliminate nodes that are useless or mis-recognized. Figure 3(d) gives an example. For the question "Who composed the soundtrack for Cameron's Titanic?", the SQG {v1, v2, v3} cannot find matches in the knowledge base because there are no correct relations for v1v2 and v1v3. By applying a fold operation, these two edges and the redundant node v1 are removed, while the new edge v′1v′2 can be matched to the correct relation 〈musicComposer〉 (according to the relation extraction model).
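Treating an SQG state as a node set plus an edge dict, the four operations' formal definitions can be sketched in a few lines of Python. This is our own simplification, not the paper's code: edges are keyed by unordered vertex pairs (so parallel edges collapse), and the edges created by fold start with empty relation lists for the relation extraction model to fill.

```python
def connect(nodes, edges, u1, u2, rel_candidates):
    # Connect: introduce the edge u1u2, labelled with its candidate relations.
    new_edges = dict(edges)
    new_edges[frozenset((u1, u2))] = rel_candidates
    return nodes, new_edges

def merge(nodes, edges, u, v):
    # Merge u into v: drop u and re-attach every edge uw as vw (w != v).
    new_edges = {}
    for e, rels in edges.items():
        if u in e:
            w = next(iter(e - {u}))
            if w != v:
                new_edges[frozenset((v, w))] = rels
        else:
            new_edges[e] = rels
    return nodes - {u}, new_edges

def expand(nodes, edges, u, sub_nodes, sub_edges):
    # Expand u into the subgraph (Vu, Eu) from the translation dictionary;
    # u itself remains in the graph and anchors the subgraph.
    return nodes | sub_nodes, {**edges, **sub_edges}

def fold(nodes, edges, u):
    # Fold: delete u and bridge its neighbours pairwise.
    nbrs = [next(iter(e - {u})) for e in edges if u in e]
    new_edges = {e: r for e, r in edges.items() if u not in e}
    for a in nbrs:
        for b in nbrs:
            if a != b:
                new_edges.setdefault(frozenset((a, b)), [])
    return nodes - {u}, new_edges

# Figure 3(d): folding the redundant node v1 bridges its neighbours v2 and v3.
nodes, edges = fold({"v1", "v2", "v3"},
                    {frozenset(("v1", "v2")): [], frozenset(("v1", "v3")): []},
                    "v1")
```

A usage note: merge directly mirrors the set definition above, so the edge v3v4 of Figure 3(b) is re-keyed onto v1 automatically when v4 is merged away.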

We focus on generating the correct SQG with the above operations. Obviously, it is impossible to enumerate all possible transitions at each step due to the exponential search space. Therefore, we propose a greedy search algorithm² inspired by (Yih et al., 2015). The intuition is that a better intermediate state should have more opportunities to be visited and to produce subsequent states. As each state can be seen as a partial SQG, we propose a reward function using a log-linear model (see Section 3) to estimate the likelihood of each state. A priority queue is utilized to maintain the order of unvisited states. In each turn, we choose the state with the maximum correctness likelihood and try all possible transitions for each operation. Specifically, the connect and merge operations try every pair of nodes in the current state as the two operate nodes, while the expand and fold operations try each node as the operate node. The newly generated states are scored by the learning model and pushed into the priority queue. Note that duplicated states are not added. The algorithm ends when a complete SQG QS is found and QS cannot generate any better subsequent states.

² The formal algorithm can be found in Appendix A.
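The greedy search just described can be sketched generically. In this sketch (ours, not the formal algorithm of Appendix A), `successors`, `reward`, and `is_complete` are placeholders for the operation enumeration, the learned reward function, and the SQG completeness check.

```python
import heapq
import itertools

def greedy_sqg_search(initial_state, successors, reward, is_complete):
    # Best-first search over SQG states: a priority queue ordered by the
    # learned reward; duplicated states are never re-pushed.
    counter = itertools.count()  # tie-breaker so states themselves never compare
    heap = [(-reward(initial_state), next(counter), initial_state)]
    seen = {initial_state}
    while heap:
        neg_r, _, state = heapq.heappop(heap)
        # Terminate when a complete SQG has no better subsequent state.
        if is_complete(state) and not any(reward(s) > -neg_r
                                          for s in successors(state)):
            return state
        for s in successors(state):
            if s not in seen:
                seen.add(s)
                heapq.heappush(heap, (-reward(s), next(counter), s))
    return None

# Toy instance: states 0..3 form a chain and larger states score higher,
# so the search keeps transitioning until no better successor exists.
final = greedy_sqg_search(0,
                          lambda n: [n + 1] if n < 3 else [],
                          lambda n: n,
                          lambda n: n >= 2)
```

Negating the reward turns Python's min-heap into the required max-priority queue; the counter avoids comparing states when two rewards tie.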

To reduce the search space of the state-transition process, we also propose several conditions for each operation. Only when the conditions are satisfied can the corresponding operation be executed. Experiments show that these conditions not only speed up the QA process but also improve the precision (see Section 4.4). The conditions are listed as follows.

Condition 1 (Condition for Connect). Two nodes v1 and v2 can be connected if and only if no other node v∗ occurs on the shortest path between v1 and v2 in the dependency parse tree³ of the question sentence N.

Condition 2 (Condition for Merge). For the merge operation, at least one of the two operate nodes should be a variable v, and v should be a wh-word or pronoun, which may co-refer³ with another node.

Condition 3 (Condition for Expand). For the expand operation, the operate node u should be a variable, and we must be able to find u and its hidden information in the translation dictionary DT.

Condition 4 (Condition for Fold). For the fold operation, we require that at least one of the edges connected to the operate node has no relation candidates, or that the confidence probabilities of the corresponding relations are less than a threshold τ.

³ We utilize the Stanford CoreNLP dependency parser and coreference annotator (Manning et al., 2014).
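Condition 1 amounts to a shortest-path test on the dependency parse tree. The sketch below (our illustration; the adjacency dict stands in for a real CoreNLP parse) checks that no other recognized node lies strictly between the two candidates.

```python
from collections import deque

def dep_path(tree, a, b):
    # Shortest path between two tokens in the dependency parse tree,
    # given as an undirected adjacency dict (BFS; not CoreNLP itself).
    prev = {a: None}
    queue = deque([a])
    while queue:
        x = queue.popleft()
        if x == b:
            path = []
            while x is not None:
                path.append(x)
                x = prev[x]
            return path[::-1]
        for y in tree.get(x, ()):
            if y not in prev:
                prev[y] = x
                queue.append(y)
    return None

def can_connect(tree, v1, v2, node_tokens):
    # Condition 1: v1 and v2 may be connected only if no other recognized
    # node occurs strictly between them on the dependency path.
    path = dep_path(tree, v1, v2)
    return path is not None and not any(t in node_tokens for t in path[1:-1])

# Hypothetical parse fragment of "cosmonauts died in place".
tree = {"cosmonauts": ["died"],
        "died": ["cosmonauts", "place"],
        "place": ["died"]}
```

With only "cosmonauts" and "place" recognized as nodes, the pair is connectable; if "died" were also a recognized node, it would block the connection.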


The following example illustrates the process of SQG generation. Note that other transition sequences can also reach the final state.

Example 1. Consider Figure 4. Given the question N1 = "Which cosmonauts died in the same place they were born in?", we first extract four nodes v1, v2, v3, v4, which are all variable nodes. According to Condition 1, we connect (v1, v2), (v1, v3), and (v3, v4). The simple path between v1 and v3 in the dependency parse tree can be regarded as the relation phrase of v1v3, which is "died in". Similarly, the relation phrase of v3v4 is "born in". As v1 and v2 are neighbors in the dependency parse tree, the corresponding relation phrase is empty. Then we merge (v1, v2), as they are recognized as co-references. Similarly, v1 and v4 are merged. Notice that the associated edge v3v4 has been preserved, so there are two edges between v′1 and v′3 in the new SQG. In the end, we expand v′1 into the subgraph {v′1, v′4, v′2} and obtain the final SQG.

2.3 Finding entity/relation candidates and matches of SQG

During SQG construction, there are two sub-tasks: entity extraction and relation extraction. We briefly discuss the two steps as follows.

Given a node phrase in QS, we find the candidate entities/types with confidence probabilities by using existing entity linking approaches. Specifically, we use S-MART (Yang and Chang, 2015) for the Freebase dataset and DBpedia Lookup⁴ for the DBpedia dataset.

Given two connected nodes vi and vj in QS, we need to find the relation candidates between them. Inspired by the recent success of neural relation extraction models in KBQA (Yih et al., 2015; Dong et al., 2015; Xu et al., 2016), we propose a Multi-Channel Convolutional Neural Network (MCCNN) to extract all relations in the question N, exploiting both syntactic and sentential information. We use the simple path between vi and vj in the dependency tree Y(N) as input to the first channel. The input to the second channel is the whole question sentence with all nodes excluded, which represents the context of the relation we want to predict. If vi and vj are neighbors in Y(N), we need to extract an implicit relation. To tackle this, we use vi, vj, and their types as input to the third channel. For vi = "Chinese" and vj = "actor", the input is "Chinese-Country-actor-Actor".

⁴ http://wiki.dbpedia.org/projects/dbpedia-lookup
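The construction of the three channel inputs can be sketched as below. Only the input assembly is shown, not the convolutional network itself; the function name, the span format, and the toy dependency path are our own assumptions.

```python
def mccnn_channels(question_tokens, dep_path_tokens, node_spans, vi, vj, types):
    # Channel 1: the simple dependency-tree path between the two nodes.
    ch1 = list(dep_path_tokens)
    # Channel 2: the question with every node phrase removed, i.e. the
    # sentential context of the relation to predict.
    node_positions = {k for i, j in node_spans for k in range(i, j)}
    ch2 = [t for k, t in enumerate(question_tokens) if k not in node_positions]
    # Channel 3: for dependency-neighbour pairs (implicit relations), the
    # two node phrases interleaved with their linked types.
    ch3 = [vi, types[vi], vj, types[vj]]
    return ch1, ch2, ch3

# The paper's example: vi = "Chinese", vj = "actor" yields
# "Chinese-Country-actor-Actor" as the third channel.
ch1, ch2, ch3 = mccnn_channels(
    ["which", "Chinese", "actor", "won"],
    ["actor", "amod", "Chinese"],          # hypothetical dependency path
    [(1, 2), (2, 3)],                      # token spans of the two nodes
    "Chinese", "actor",
    {"Chinese": "Country", "actor": "Actor"})
```

Each channel would then be embedded and convolved separately before the channel representations are combined to score candidate predicates.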

Different from (Xu et al., 2016), which can only extract one explicit relation, our model can not only extract multiple relations from N by taking different node pairs, but also extract implicit relations by considering the information embedded in the two nodes vi and vj.

After generating the SQG QS, we employ the subgraph matching algorithm of (Hu et al., 2018) to find matches of QS over the underlying RDF knowledge graph G. Once the matches are found, it is straightforward to derive the answers.

3 Learning the reward function

Given an intermediate state s during the SQG generation process, we may transit to multiple subsequent states s′ by applying different operations. Thus, we greedily select the subsequent state s′ with the maximum γ(s′), where γ() is a reward function that takes the features as input and outputs the reward of the corresponding state. In this work, the reward function γ() is a linear model trained with the SVM ranking loss. Below we describe the features, the learning process, and how the training data is generated.

3.1 Features

Note that for any intermediate state s, we also regard it as an SQG QS(s), even though it may be a disconnected graph. For a given SQG QS, we extract the following features, which are passed as an input vector to the SVM ranker.

Nodes. For each entity/type node v, we utilize entity linking systems to find the candidate entities or types and their confidence probabilities. We use the largest confidence probability as the score of v, and use the average score over all entity nodes as a feature. In addition, the numbers of constant nodes and variable nodes in the current QS are used as features.

Edges. For each edge vivj, the relation extraction model returns a list of candidate predicates with their confidence probabilities. We use the largest confidence probability as the score of vivj, and use the average score over all edges as a feature. We also consider the total number of edges in QS, the number of edges that have a relation phrase, and the number that have none.

Matches. We also consider the matching between QS and the knowledge graph G. A new state s gets a low score if its corresponding SQG QS cannot find any matches in G. We do not discard s directly from the search queue, as we still allow the fold operation on it. To speed up the matching process, we build an index of all entities and relations offline. For a given SQG QS, we use the offline index to check whether it is valid.

Given a valid state s, if all of its possible subsequent states s′ have a smaller reward function value than that of s, we terminate the state-transition process and return s as the final state. The corresponding SQG QS(s) is used to match the knowledge base and retrieve the answers.
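The node and edge features of Section 3.1 can be collected as a flat vector, as sketched below (the match-based feature is omitted here). The dict schema is our own: the paper lists the features but not a concrete layout.

```python
def state_features(nodes, edges):
    # `nodes`: [{"kind": "constant"|"variable", "scores": [...]}, ...]
    # `edges`: [{"scores": [...], "has_phrase": bool}, ...]
    def avg(xs):
        return sum(xs) / len(xs) if xs else 0.0
    # Node features: average of per-node best linking confidence,
    # plus counts of constant and variable nodes.
    node_score = avg([max(n["scores"]) for n in nodes if n["scores"]])
    n_const = sum(n["kind"] == "constant" for n in nodes)
    n_var = sum(n["kind"] == "variable" for n in nodes)
    # Edge features: average of per-edge best relation confidence, the
    # edge count, and how many edges do / do not carry a relation phrase.
    edge_score = avg([max(e["scores"]) for e in edges if e["scores"]])
    n_edges = len(edges)
    n_with_phrase = sum(e["has_phrase"] for e in edges)
    return [node_score, n_const, n_var,
            edge_score, n_edges, n_with_phrase, n_edges - n_with_phrase]

features = state_features(
    [{"kind": "constant", "scores": [0.9, 0.5]},
     {"kind": "variable", "scores": []}],
    [{"scores": [0.8], "has_phrase": True}])
```

The resulting vector is what the SVM ranker would consume for each candidate state.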

3.2 Learning

The reward function task can be viewed as a ranking problem, in which the appropriate SQGs should be ranked at the front. In this work, we rank them using an SVM rank classifier (Joachims, 2006).

Given an SQG QS = {V, E}, the ranking score used to train our reward function γ() is calculated by the following function:

R(QS) = |P(V)| / |V′| + |P(E)| / |E′| + max_i F(Ai, A′)    (1)

V′ and E′ are the gold node set and relation set extracted from the gold SPARQL query. P(V) contains the nodes existing in both V and V′, and P(E) contains the relations existing in both E and E′. F(Ai, A′) is the F-1 measure, where A′ is the gold answer set and Ai is one of the subgraph matches between QS and the knowledge graph G obtained by the execution method of (Hu et al., 2018). Notice that for benchmarks without gold SPARQL queries, R(QS) = max_i F(Ai, A′).
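Equation 1 is simple set arithmetic over the gold query graph plus a best-match F-1 term; a minimal sketch (our own, with sets standing in for the node/edge/answer collections):

```python
def f1(pred, gold):
    # F-1 measure between a predicted answer set and the gold answer set.
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

def ranking_score(V, E, gold_V, gold_E, match_answer_sets, gold_A):
    # Equation 1: node overlap ratio + edge overlap ratio + the best F-1
    # over all subgraph matches A_i of the SQG against the knowledge graph.
    best_f1 = max((f1(A, gold_A) for A in match_answer_sets), default=0.0)
    if not gold_V or not gold_E:
        # Benchmark without gold SPARQL queries: R(QS) = max F(A_i, A').
        return best_f1
    return (len(V & gold_V) / len(gold_V)
            + len(E & gold_E) / len(gold_E)
            + best_f1)
```

For a state whose nodes fully match, whose edges half match, and whose best match returns the exact gold answers, the score is 1 + 0.5 + 1 = 2.5.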

3.3 Generating Training Data

To generate the training data for the reward function, given a question N = {w1, w2, ..., wn}, we first check all possible phrases wij with an existing entity linking system and get the candidate entity list Cvi = {e1, e2, ..., em} for each entity node vi. If two entity nodes vi and vj conflict, i.e., they share one or more words, we keep the longer one. We replace each entity node vi with a special token denoting the type of entity ej, where ej ∈ Cvi has the largest confidence probability. Then the whole node set V can be predicted by the well-trained node recognition model. QS = {V, ∅} is the initial state. We then perform state transitions by applying the predefined operations using a BFS algorithm. Meanwhile, when the condition mechanism is enabled, only the states that satisfy the corresponding conditions are considered. Once we obtain a new QS, we call the well-trained relation extraction model to get the relation candidate list for each new edge vivj ∈ E, collect the features according to Section 3.1, and calculate the ranking score using Equation 1.

4 Evaluation

We evaluate our system on several benchmarks with two knowledge bases, DBpedia and Freebase. For DBpedia, we use the QALD-6 benchmark. We compare our method STF (State Transition Framework) with all systems in the QALD-6 competition, as well as Aqqu (Bast and Haussmann, 2015), gAnswer (Zou et al., 2014), and NFF (Hu et al., 2018). For Freebase, we use WebQuestions (Berant et al., 2013) and ComplexQuestions (Abujabal et al., 2017) as benchmarks and compare our method with the corresponding state-of-the-art systems.

4.1 Setup

4.1.1 Knowledge Bases

DBpedia is an RDF knowledge base derived from Wikipedia (Bizer et al., 2009). We use the DBpedia 2014 version; the statistics are given in Table 1.

Freebase is a collaboratively edited knowledge base. We use the Freebase 2013 version, the same as (Berant and Liang, 2014). The statistics are given in Table 1.

4.1.2 Benchmarks

QALD is a series of open-domain question answering campaigns with several different tasks. We only consider the task of English question answering over DBpedia in QALD-6 (Unger et al., 2016), which has 350 training questions and 100 test questions with gold SPARQL queries and answer sets.

WebQuestions (Berant et al., 2013) consists of 3778 training questions and 2032 test questions, collected using the Google Suggest API and crowdsourcing. It only provides pairs of questions and answer sets.

ComplexQuestions (Abujabal et al., 2017) is composed of 150 test questions that exhibit compositionality through multiple clauses, constructed from a crawl of WikiAnswers (http://wiki.answers.com) with human annotation. It has no training questions or SPARQL queries.


Table 1: Statistics of Knowledge Bases

                        DBpedia      Freebase
Number of Entities      5.4 million  41 million
Number of Triples       110 million  596 million
Number of Predicates    9708         19456
Size of KBs (in GB)     8.7          56.9

Table 2: The average F1 scores on the WebQuestions and ComplexQuestions benchmarks

                                     WQ      CQ
STF (Our approach)                   53.6%   54.3%
STAGG (Yih et al., 2015)             52.5%   -
QUINT (Abujabal et al., 2017)        51.0%   49.2%
NFF (Hu et al., 2018)                49.6%   -
Aqqu (Bast and Haussmann, 2015)      49.4%   27.8%
Aqqu++ (Bast and Haussmann, 2015)    49.4%   46.7%

4.2 Results on WebQuestions and ComplexQuestions

Table 2 shows the results on the test sets of WebQuestions (WQ) and ComplexQuestions (CQ), which are based on Freebase. We compare our system with STAGG (Yih et al., 2015), QUINT (Abujabal et al., 2017), NFF (Hu et al., 2018), and Aqqu (Bast and Haussmann, 2015). QUINT is the model used in the original ComplexQuestions paper, and Aqqu is the best publicly available QA system on WebQuestions. The average F1 of our system (53.6% for WQ and 54.3% for CQ) is higher than that of the other systems.

Aqqu defines three query templates for WebQuestions and tries to match test questions to the predefined templates. It performs poorly (27.8%) on ComplexQuestions when answering the test questions directly. Aqqu++ shows the result (46.7%) of taking manually decomposed subquestions as input and intersecting the subquestions' answer sets. QUINT is a system that automatically learns utterance-query templates from pairs of questions and answer sets. NFF builds a relation paraphrase dictionary and leverages it to extract relations from the questions. STAGG proposes a state-transition-based query graph generation method; however, its query graph structure is limited as in Figure 1, which is very similar to the templates of Aqqu.

To generate the training data for the relation extraction model on WebQuestions, for each pair of nodes u and v in a training question, we explore all simple relations between u and v in Freebase. Note that if u or v is the answer node, we anchor the answer entity first and select the relation that covers the most answer entities as the gold relation. Each entity is replaced in the text by a special token representing its type before

Table 3: Evaluating QALD-6 Testing Questions

              Processed  Right  Recall  Precision  F-1
Our approach  100        70     0.72    0.89       0.80
CANaLI        100        83     0.89    0.89       0.89
UTQA          100        63     0.69    0.82       0.75
KWGAnswer     100        52     0.59    0.85       0.70
SemGraphQA    100        20     0.25    0.70       0.37
UIQA1         44         21     0.63    0.54       0.25
UIQA2         36         14     0.53    0.43       0.17
NFF           100        68     0.70    0.89       0.78
gAnswer       100        40     0.43    0.77       0.55
Aqqu          100        36     0.37    0.39       0.38

training. As there are no training data in Com-plexQuestions, we use the models trained on We-bQuestions directly.
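The training-data generation procedure described above can be sketched as follows. The toy triple store, relation names, and helper functions here are illustrative assumptions, not the paper's actual Freebase interface; the final step (replacing each entity with a type token before training) is omitted for brevity.

```python
# Sketch of the gold-relation selection described above. The KB is a toy
# list of (subject, relation, object) triples; freebase_relations() is a
# hypothetical stand-in for a real Freebase lookup.
from collections import Counter

def freebase_relations(u, v, kb):
    """All simple (one-hop) relations between nodes u and v in the KB."""
    return [r for (s, r, o) in kb if s == u and o == v]

def gold_relation(u, answer_entities, kb):
    """When the second node is the answer node: anchor the answer entities
    first and select the relation that covers the most of them."""
    votes = Counter()
    for ans in answer_entities:
        for r in freebase_relations(u, ans, kb):
            votes[r] += 1
    return votes.most_common(1)[0][0] if votes else None

kb = [
    ("Russia", "nationality_of", "astronaut_1"),
    ("Russia", "nationality_of", "astronaut_2"),
    ("Russia", "place_of_birth_of", "astronaut_2"),
]
# "nationality_of" covers both answer entities, so it is chosen as gold
print(gold_relation("Russia", ["astronaut_1", "astronaut_2"], kb))
```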

Generally, the WebQuestions benchmark has low diversity and contains some incorrectly or incompletely labeled answers. Most questions in WebQuestions are simple questions, which can be translated to a "one-triple" query. The results on the WebQuestions benchmark show that our approach obtains good performance on simple questions. The ComplexQuestions benchmark is more complex than WebQuestions. The experimental results show that the two template-based methods, QUINT and Aqqu, can solve a part of the complex questions. However, they still lack the capacity to tackle all questions, due to dependency parse errors or template mismatches. On the other hand, our system outperforms them because we do not rely on templates and have better generalization ability for complex questions.

4.3 Results on QALD

Table 3 shows the evaluation results on the test set of QALD-6. We compare our system with all participants of QALD-6 as well as gAnswer (Zou et al., 2014), NFF (Hu et al., 2018) and Aqqu (Bast and Haussmann, 2015). To enable the comparison, we show the experimental results in the format of the QALD competition report. "Processed" denotes the number of questions that can be processed and "Right" refers to the number of questions that were answered correctly. Our approach answers 70 questions correctly and achieves an F-1 score of 0.80.
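As a sanity check on the report format: for systems that process all 100 questions, the reported F-1 is the harmonic mean of the macro-averaged recall and precision (for partial-coverage systems such as UIQA, the report penalizes unprocessed questions, so the identity does not hold). A minimal computation:

```python
def f1(recall, precision):
    """Harmonic mean of recall and precision, as used in the QALD report
    for systems that process every question."""
    return 2 * recall * precision / (recall + precision)

# Our approach: recall 0.72, precision 0.89
print(round(f1(0.72, 0.89), 2))  # 0.8
# gAnswer: recall 0.43, precision 0.77
print(round(f1(0.43, 0.77), 2))  # 0.55
```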

The experimental results show that we beat all systems in the QALD-6 campaign in F-1 except for CANaLI (Mazzeo and Zaniolo, 2016). Note that CANaLI is not an automatic QA system, as it needs users to specify the precise entities and predicates in the question sentences. gAnswer (Zou et al., 2014) proposes a relation-first framework to generate the semantic query graph, but it cannot detect implicit relations. NFF (Hu et al., 2018) proposes a node-first framework which performs well on answering complex questions (0.78 F1 score on QALD-6). However, it relies on predefined paraphrasing dictionaries to extract relations and cannot handle redundant nodes.

Table 4: The performance comparison of state transition with conditions or without conditions on the QALD-6 test set

                Right  Recall  Precision  F-1
with condition  70     0.72    0.89       0.80
no condition    62     0.64    0.84       0.72

Figure 5: The elapsed time (SQG construction time in ms, log scale, over questions Q3 to Q91 and AVG) comparison of state transition with conditions or without conditions on the QALD-6 test set

As QALD provides the gold SPARQL queries, we can obtain the training data for the node recognition and relation extraction models directly. Different from WebQuestions and ComplexQuestions, which contain only special questions, the QALD benchmark also has general questions and imperative sentences. Besides, there are some questions that have no entities or multiple entities.

Generally, the QALD benchmark has both high diversity and high quality. The performance of Aqqu on QALD is worse than its performance on WebQuestions and ComplexQuestions, as its templates were designed only for WebQuestions. In contrast, our approach does not rely on a specific benchmark or knowledge base. The evaluation results show that our approach performs well on various types of complex questions.

4.4 Comparison experiments of the four conditions

We run experiments on the QALD-6 test set to verify the effectiveness of the four conditions proposed in Section 2.2. We first compare the elapsed time of state transition with and without the conditions. Figure 5 shows the results of 10 questions selected randomly, and AVG denotes the average time consumption over all 100 test questions. The average elapsed time with the conditions is 700 ms, while it rises to 1200 ms without them. The results show that using the conditions in the search process avoids unnecessary searches and saves SQG construction time. For some questions (such as Q66 and Q81), the gap in elapsed time is very small because the number of recognized nodes is less than three. In other words, such simple questions do not need the conditions to speed up the SQG construction process.

Table 5: Failure analysis on QALD-6

Reason               # (Ratio)  Sample question
Structure Failure    6 (20%)    Q55: In which countries do people speak Japanese?
Entity Mapping       5 (17%)    Q88: What color expresses loyalty?
Relation Mapping     10 (33%)   Q44: What is the full name of Prince Charles?
Complex Aggregation  9 (30%)    Q80: How short is the shortest active NBA player?

On the other hand, using the conditions also improves the evaluation performance (see Table 4). That is because the conditions help the system reduce the search space and avoid local optima, especially the condition of the connect operation. When a question has multiple nodes, without the condition of the connect operation the system tries to connect every two nodes and introduces too many mistaken states.
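The blow-up without the connect condition can be illustrated with a simple count: an unconstrained connect step may try every unordered pair of recognized nodes, so the number of candidate connections grows quadratically with the node count. The counting below is an illustration, not the paper's exact state space:

```python
from math import comb

def connect_candidates(n_nodes):
    """Unordered node pairs an unconstrained connect step may try."""
    return comb(n_nodes, 2)

for n in (2, 3, 4, 5):
    print(n, connect_candidates(n))
# With only two nodes there is a single pair, so pruning saves nothing --
# consistent with the small elapsed-time gap observed on questions with
# fewer than three recognized nodes.
```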

4.5 Error Analysis

We provide an error analysis of our approach on the QALD-6 test set (100 questions), which contains gold SPARQL queries and more diversified questions. The ratio of each error type and corresponding examples are given in Table 5.

There are four main reasons for the failure of some questions in our approach. The first is structure failure, which means we generate a wrong SQG. This usually occurs when the correct SQG needs a fold operation. For example, the correct SQG of the question "In which countries do people speak Japanese?" has only two nodes, "countries" and "Japanese". However, we mistakenly recognized the phrase "people" as a variable node and connected it to both "countries" and "Japanese", generating a wrong SQG Q_S'. It should be eliminated by the fold operation to get the correct SQG Q_S. However, the score of Q_S' from the reward function is higher than that of Q_S. The underlying reason is the lack of training data.


The second reason is the failure of entity linking; for example, we could not find the correct entity for the node "loyalty" in the question "What color expresses loyalty?". This error type has the smallest proportion, as existing entity linking algorithms already achieve high accuracy on the QA task.

The third reason is the failure of relation extraction. For example, the correct relation between "What" and "Prince Charles" in the question "What is the full name of Prince Charles?" is 〈alias〉, but we extracted the wrong relation 〈name〉. Relation extraction is very important to the QA task and can be improved by designing better models and combining more useful information.

The last reason is that our method cannot answer some complex aggregation questions, which require detecting the aggregation constraint first and then inferring the related node and corresponding relation. Using predefined templates or textual evidence may work, but not well enough.

5 Related Work

Generally, the solutions to KBQA can be divided into information retrieval-based and semantic parsing-based methods. The general process of information retrieval-based methods is to select candidate answers first and then rank them by various methods (Veyseh, 2016; Dong et al., 2015; Xu et al., 2016; Bordes et al., 2014; Yao and Durme, 2014). The main difference among information retrieval-based methods is how they rank the candidate answers. (Bordes et al., 2014) utilizes subgraph embeddings to predict the confidence of candidate answers. (Dong et al., 2015) maximizes the similarity between the distributed representations of a question and its answer candidates using Multi-Column Convolutional Neural Networks (MCCNN), while (Xu et al., 2016) aims to predict the correct relations between the topic entity and answer candidates with textual evidence.
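The retrieve-then-rank scheme common to these methods can be sketched as follows; the vectors and scoring function are illustrative placeholders, not the learned embeddings of the cited systems:

```python
def dot(u, v):
    """Inner product standing in for a learned similarity function."""
    return sum(a * b for a, b in zip(u, v))

def rank_candidates(question_vec, candidates):
    """Score each (name, vector) candidate answer against the question
    representation and return the candidates best-first."""
    return sorted(candidates, key=lambda c: dot(question_vec, c[1]),
                  reverse=True)

# Toy vectors standing in for question/answer embeddings
q = [0.9, 0.1]
cands = [("Moscow", [0.8, 0.2]), ("Paris", [0.1, 0.9])]
print(rank_candidates(q, cands)[0][0])  # Moscow scores higher
```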

The semantic parsing-based methods try to translate the natural language question into semantically equivalent logical forms, such as simple λ-DCS (Berant et al., 2013; Berant and Liang, 2014), query graphs (Yih et al., 2015; Zou et al., 2014), or executable queries such as SPARQL (Bast and Haussmann, 2015; Unger et al., 2012; Yahya et al., 2012). The logical forms are then executed by the corresponding technique to get the answers from the knowledge base. For answering complex questions, the key is how to construct the logical form. Different from existing systems relying on predefined templates, we build the query graph by state transition, which has stronger representation power and more robustness.

(Dong and Lapata, 2016) trains a sequence-to-tree model to translate natural language into logical forms directly, which needs a lot of training data. (Yu et al., 2017) uses deep bidirectional LSTMs to learn both relation-level and word-level question/relation representations and match the question to candidate relations. The pipeline of (Yu et al., 2017) is similar to (Yih et al., 2015), which detects the topic entity and relation first and then recognizes constraints by enumerating all connected entities of the answer node or CVT node in the knowledge base. The issue is that they assume the question N has only one relation, which goes from the topic entity to the answer. Different from (Yu et al., 2017), we consider that the question can have many nodes (including entities and variables) and recognize them with a BLSTM model. We also allow multiple relations between these nodes and extract them with an MCCNN model.

6 Conclusions

In this paper, we propose a state transition framework that utilizes neural networks to answer complex questions, generating a semantic query graph based on four primitive operations. We train a BLSTM-CRF model to recognize nodes, including entities and variables, from the question sentence. To extract the relations between those nodes, we propose an MCCNN model which can tackle both explicit and implicit relations. Compared with existing solutions, our framework does not rely on handcrafted templates and has the potential to generate all kinds of SQGs given enough training data. Extensive experiments on multiple benchmark datasets confirm that our approach achieves state-of-the-art performance.

Acknowledgments

This work was supported by the National Key Research and Development Program of China under grant 2016YFB1000603 and NSFC under grants 61622201 and 61532010. Lei Zou is the corresponding author of this work. This work is also supported by the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).


References

Abdalghani Abujabal, Mohamed Yahya, Mirek Riedewald, and Gerhard Weikum. 2017. Automated template generation for question answering over knowledge graphs. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017, pages 1191–1200.

Jun-Wei Bao, Nan Duan, Ming Zhou, and Tiejun Zhao. 2014. Knowledge-based question answering as machine translation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume 1: Long Papers, pages 967–976.

Hannah Bast and Elmar Haussmann. 2015. More accurate question answering on Freebase. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM 2015, Melbourne, VIC, Australia, October 19-23, 2015, pages 1431–1440.

Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In EMNLP, pages 1533–1544.

Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume 1: Long Papers, pages 1415–1425.

Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. 2009. DBpedia - a crystallization point for the web of data. J. Web Sem., 7(3):154–165.

Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question answering with subgraph embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, pages 615–620.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers.

Li Dong, Furu Wei, Ming Zhou, and Ke Xu. 2015. Question answering over Freebase with multi-column convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 260–269.

Sen Hu, Lei Zou, Jeffrey Xu Yu, Haixun Wang, and Dongyan Zhao. 2018. Answering natural language questions by subgraph matching over knowledge graphs. IEEE Trans. Knowl. Data Eng., 30(5):824–837.

Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. CoRR, abs/1508.01991.

Thorsten Joachims. 2006. Training linear SVMs in linear time. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, pages 217–226.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, System Demonstrations.

Giuseppe M. Mazzeo and Carlo Zaniolo. 2016. Answering controlled natural language questions on RDF knowledge bases. In Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France, March 15-16, 2016, pages 608–611.

Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, and Philipp Cimiano. 2012. Template-based question answering over RDF data. In WWW, pages 639–648.

Christina Unger, Axel-Cyrille Ngonga Ngomo, and Elena Cabrio. 2016. 6th open challenge on question answering over linked data (QALD-6). In Semantic Web Challenges - Third SemWebEval Challenge at ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers, pages 171–177.

Amir Pouran Ben Veyseh. 2016. Cross-lingual question answering using common semantic space. In Proceedings of TextGraphs@NAACL-HLT 2016: the 10th Workshop on Graph-based Methods for Natural Language Processing, June 17, 2016, San Diego, California, USA, pages 15–19.

Kun Xu, Siva Reddy, Yansong Feng, Songfang Huang, and Dongyan Zhao. 2016. Question answering on Freebase via relation extraction and textual evidence. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers.

Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Maya Ramanath, Volker Tresp, and Gerhard Weikum. 2012. Natural language questions for the web of data. In EMNLP-CoNLL, pages 379–390.

Yi Yang and Ming-Wei Chang. 2015. S-MART: novel tree-based structured learning algorithms applied to tweet entity linking. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 504–513.

Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: question answering with Freebase. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume 1: Long Papers, pages 956–966.

Wen-tau Yih, Ming-Wei Chang, Xiaodong He, and Jianfeng Gao. 2015. Semantic parsing via staged query graph generation: question answering with knowledge base. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 1321–1331.

Mo Yu, Wenpeng Yin, Kazi Saidul Hasan, Cícero Nogueira dos Santos, Bing Xiang, and Bowen Zhou. 2017. Improved neural relation detection for knowledge base question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 571–581.

Xinbo Zhang and Lei Zou. 2017. Improving the precision of RDF question/answering systems: a why not approach. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, April 3-7, 2017, pages 877–878.

Lei Zou, Ruizhe Huang, Haixun Wang, Jeffrey Xu Yu, Wenqiang He, and Dongyan Zhao. 2014. Natural language question answering over RDF: a graph data driven approach. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 313–324.