
Learning to reason by reading text and answering questions

Minjoon Seo
Natural Language Processing Group, University of Washington
May 26, 2017, @ Kakao Brain

What is reasoning?

Simple Question Answering Model

What is “Hello” in French? → Bonjour.

Examples

• Most neural machine translation systems (Cho et al., 2014; Bahdanau et al., 2014)
  • Need a very high hidden state size (~1000)
  • No need to query the database (context) → very fast
• Most dependency and constituency parsers (Chen et al., 2014; Klein et al., 2003)
• Sentiment classification (Socher et al., 2013)
  • Classifying whether a sentence is positive or negative
• Most neural image classification systems
  • The question is always “What is in the image?”
• Most classification systems

Simple Question Answering Model

What is “Hello” in French? → Bonjour.

Problem: a parametric model has finite capacity.

“You can’t even fit a sentence into a single vector” – Dan Roth

QA Model with Context

Context (Knowledge Base):
  English     French
  Hello       Bonjour
  Thank you   Merci

What is “Hello” in French? → Bonjour.

Examples

• WikiQA (Yang et al., 2015)
• QASent (Wang et al., 2007)
• WebQuestions (Berant et al., 2013)
• WikiAnswers (Wikia)
• Free917 (Cai and Yates, 2013)

• Many deep learning models with external memory (e.g. Memory Networks)

QA Model with Context

Context (Knowledge Base):
  Eats: (Amphibian, insect), (insect, flower)
  IsA:  (Frog, amphibian), (Fly, insect)

What does a frog eat? → Fly

Something is missing…

QA Model with Reasoning Capability

Context (Knowledge Base):
  Eats: (Amphibian, insect), (insect, flower)
  IsA:  (Frog, amphibian), (Fly, insect)

First-Order Logic: IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)

What does a frog eat? → Fly
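As a worked example, here is a minimal sketch (hypothetical Python, not from the talk) of forward-chaining the rule above over this knowledge base:

    # Facts from the slide's knowledge base.
    is_a = {("Frog", "amphibian"), ("Fly", "insect")}
    eats = {("amphibian", "insect"), ("insect", "flower")}

    # Rule: IsA(A, B) ^ IsA(C, D) ^ Eats(B, D) -> Eats(A, C)
    derived = {(a, c)
               for (a, b) in is_a
               for (c, d) in is_a
               if (b, d) in eats}

    print(derived)  # {('Frog', 'Fly')} -- answers "What does a frog eat?"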

Examples

• Semantic parsing
  • GeoQuery (Krishnamurthy et al., 2013; Artzi et al., 2015)
• Science questions
  • Aristo Challenge (Clark et al., 2015)
  • ProcessBank (Berant et al., 2014)
• Machine comprehension
  • MCTest (Richardson et al., 2013)

“Vague” line between non-reasoning QA and reasoning QA

• Non-reasoning:
  • The required information is explicit in the context
  • The model often needs to handle lexical/syntactic variations
• Reasoning:
  • The required information may not be explicit in the context
  • The model needs to combine multiple facts to derive the answer

• There is no clear line between the two!

If our objective is to “answer” difficult questions…

• We can try to make the machine more capable of reasoning (better model)

OR

• We can try to make more information explicit in the context (more data)

QA Model with Reasoning Capability (same frog example as above)

First-Order Logic: IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)

Who makes this? Tell me it’s not me…

Reasoning QA Model with Unstructured Data

Context in natural language: “Frog is an example of amphibian. Flies are one of the most common insects around us. Insects are good sources of protein for amphibians. …”

What does a frog eat? → Fly

I am interested in…

• Natural language understanding
  • Natural language has diverse surface forms (lexically, syntactically)
• Learning to read text and reason by question answering (dialog)
  • Text is unstructured data
  • Deriving new knowledge from existing knowledge
• End-to-end training
  • Minimizing human effort

[Diagram: three axes – reasoning capability, NLU capability, end-to-end – with the speaker’s papers placed along them: AAAI 2014, EMNLP 2015; ECCV 2016, CVPR 2017; ICLR 2017, ACL 2017; ICLR 2017.]

[Axes diagram, highlighting Geometry QA.]

Geometry QA

In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD. What is the length of BD?

a) 2  b) 4  c) 6  d) 8  e) 10

[Diagram: circle O with diameter AC perpendicular to chord BD at E; radius 5, CE = 2; points A, B, C, D, E labeled.]

Geometry QA Model

What is the length of BD? → 8

Local context: “In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD.”

[Diagram: the question, the local context, and the global context are all mapped to First-Order Logic.]

Method

• Learn to map the question to a logical form
• Learn to map the local context to a logical form
  • Text → logical form
  • Diagram → logical form
• The global context is already formal!
  • Manually defined
  • “If AB = BC, then ∠CAB = ∠ACB”
• Run a solver on all logical forms
  • We created a reasonable numerical solver

Mapping question/text to logical form

Text input: In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17

[Diagram: triangle ABC with segment DE parallel to AC.]

Logical form:
IsTriangle(ABC) ∧ Parallel(AC, DE) ∧ Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))

It is difficult to directly map text to a long logical form!

Mapping question/text to logical form: our method

Text input: In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17

[Diagram: triangle ABC with segment DE parallel to AC.]

Over-generate candidate literals, score each against the text and the diagram, then select a subset:

  Over-generated literal      Text score   Diagram score
  IsTriangle(ABC)             0.96         1.00
  Parallel(AC, DE)            0.91         0.99
  Parallel(AC, DB)            0.74         0.02
  Equals(LengthOf(DB), 4)     0.97         n/a
  Equals(LengthOf(AD), 8)     0.94         n/a
  Equals(LengthOf(DE), 5)     0.94         n/a
  Equals(4, LengthOf(AD))     0.31         n/a
  …                           …            …

Selected subset → logical form:

IsTriangle(ABC) ∧ Parallel(AC, DE) ∧

Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))

Numerical solver

• Translate literals to numeric equations (see the sketch below):

  Literal                     Equation
  Equals(LengthOf(AB), d)     (Ax − Bx)² + (Ay − By)² − d² = 0
  Parallel(AB, CD)            (Ax − Bx)(Cy − Dy) − (Ay − By)(Cx − Dx) = 0
  PointLiesOnLine(B, AC)      (Ax − Bx)(By − Cy) − (Ay − By)(Bx − Cx) = 0
  Perpendicular(AB, CD)       (Ax − Bx)(Cx − Dx) + (Ay − By)(Cy − Dy) = 0

• Find the solution to the equation system
  • Use off-the-shelf numerical minimizers (Wales and Doye, 1997; Kraft, 1988)
• The numerical solver can choose not to answer a question
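A minimal sketch of this pipeline in Python (the choice of literals is illustrative; scipy's SLSQP method implements Kraft, 1988, one of the cited minimizers):

    import numpy as np
    from scipy.optimize import minimize

    def residuals(v):
        # Unpack the coordinates of points A, B, C, D (packed into one vector).
        ax, ay, bx, by, cx, cy, dx, dy = v
        return [
            (ax - bx) ** 2 + (ay - by) ** 2 - 4 ** 2,       # Equals(LengthOf(AB), 4)
            (ax - bx) * (cx - dx) + (ay - by) * (cy - dy),  # Perpendicular(AB, CD)
            (ax - bx) * (by - cy) - (ay - by) * (bx - cx),  # PointLiesOnLine(B, AC)
        ]

    def objective(v):
        # Sum of squared residuals: zero iff all literals are satisfied.
        return sum(r ** 2 for r in residuals(v))

    sol = minimize(objective, np.random.rand(8), method="SLSQP")
    # A large final objective signals an unsatisfiable system, so the
    # solver can choose not to answer the question.
    print(sol.x, sol.fun)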

Dataset

• Training questions (67 questions, 121 sentences)
  • Seo et al., 2014
  • High school geometry questions
• Test questions (119 questions, 215 sentences)
  • We collected them
  • SAT (US college entrance exam) geometry questions
• We manually annotated the text parse of all questions

Results (EMNLP 2015)

[Bar chart: SAT score (%), axis 0–60, comparing Text only, Diagram only, Rule-based, GeoS, and the student average. A 0.25 penalty is applied for each incorrect answer.]

Demo: geometry.allenai.org/demo

Limitations

• The dataset is small
• The required level of reasoning is very high
• A lot of manual effort (annotations, rule definitions, etc.)
• An end-to-end system is simply hopeless

• Collect more data?
• Change the task?
• Curriculum learning? (Do more hopeful tasks first?)

[Axes diagram, highlighting Diagram QA.]

Diagram QA

Q: The process of water being heated by sun and becoming gas is called
A: Evaporation

Is DQA a subset of VQA?

• Diagrams and real images are very different
  • Diagram components are simpler than real images
  • A diagram contains a lot of information in a single image
  • Diagrams are few (whereas real images are almost infinitely many)

Problem

What comes before second feed? → 8

It is difficult to latently learn such relationships.

Strategy

[Pipeline diagram: diagram parsing turns the diagram into a diagram graph; question answering runs over the graph, with attention visualization. Example: What does a frog eat? → Fly]

Results (ECCV 2016)

  Method             Training data   Accuracy
  Random (expected)  –               25.00
  LSTM+CNN           VQA             29.06
  LSTM+CNN           AI2D            32.90
  Ours               AI2D            38.47

Limitations

• You can’t really call this reasoning…
  • Rather a matching algorithm
  • No complex inference involved

• You need a lot of prior knowledge to answer some questions!
  • E.g. “Fly is an insect”, “Frog is an amphibian”

Textbook QA: textbookqa.org (CVPR 2017)

[Axes diagram, highlighting Machine Comprehension.]

Machine Comprehension

Question Answering Task (Stanford Question Answering Dataset, 2016)

Q: Which NFL team represented the AFC at Super Bowl 50?
A: Denver Broncos

Why Neural Attention?

Q: Which NFL team represented the AFC at Super Bowl 50?

Attention allows a deep learning architecture to focus on the phrase of the context most relevant to the query, in a differentiable manner.

Our Model: Bi-directional Attention Flow (BiDAF)

[Diagram: context “Barack Obama is the president of the U.S.” and query “Who leads the United States?” pass through an attention layer and a modeling layer; an MLP + softmax predicts the answer span, here start index 0 and end index 1, i.e. “Barack Obama”.]

(Bidirectional) Attention Flow

[Architecture diagram, bottom-up: context words x1 … xT and query words q1 … qJ enter a Character Embed Layer (Char-CNN) and a Word Embed Layer (GloVe); a Phrase Embed Layer (LSTMs) produces h1 … hT for the context and u1 … uJ for the query; the Attention Flow Layer computes Context2Query attention (softmax) and Query2Context attention (max, then softmax) to produce g1 … gT; a Modeling Layer (LSTMs) produces m1 … mT; the Output Layer predicts the answer span with Dense + softmax (start) and LSTM + softmax (end).]

Char/Word Embedding Layers

[Architecture diagram repeated; char/word embedding layers highlighted.]

Character and Word Embedding

• Word embedding is fragile against unseen words
• Char embedding can’t easily learn the semantics of words
• Use both! (see the sketch below)
  • Char embedding as proposed by Kim (2015)

[Diagram: the characters of “Seattle” pass through a CNN with max pooling; the result is concatenated with the word embedding of “Seattle” to give the final embedding vector.]
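A minimal sketch of the char-CNN plus concatenation (PyTorch; all dimensions and the kernel size are illustrative assumptions):

    import torch
    import torch.nn as nn

    class CharWordEmbedding(nn.Module):
        def __init__(self, n_chars, char_dim=16, n_filters=100):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim)
            self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=5, padding=2)

        def forward(self, char_ids, word_vecs):
            # char_ids: (n_words, max_word_len) character indices;
            # word_vecs: (n_words, word_dim) pre-trained GloVe vectors.
            c = self.char_emb(char_ids).transpose(1, 2)      # (n_words, char_dim, L)
            c = torch.relu(self.conv(c)).max(dim=2).values   # max-pool over characters
            return torch.cat([word_vecs, c], dim=-1)         # concat char and word views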

Phrase Embedding Layer

[Architecture diagram repeated; phrase embedding layer highlighted.]

• Inputs: the char/word embeddings of the query and context words
• Outputs: word representations aware of their neighbors (phrase-aware words)

• Apply a bidirectional RNN (LSTM) to both the query and the context (see the sketch below)
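A minimal sketch of this layer (PyTorch; the sizes are illustrative, and sharing one LSTM between context and query is an assumption here):

    import torch.nn as nn

    d_in, d = 200, 100                     # illustrative dimensions
    phrase_lstm = nn.LSTM(d_in, d, bidirectional=True, batch_first=True)

    def phrase_embed(x):
        # x: (batch, seq_len, d_in) char/word embeddings;
        # returns phrase-aware vectors of shape (batch, seq_len, 2*d).
        out, _ = phrase_lstm(x)
        return out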

[Architecture diagram repeated; attention flow layer highlighted.]

Attention Layer


• Inputs: phrase-aware context and query words
• Outputs: query-aware representations of the context words

• Context-to-query attention: for each (phrase-aware) context word, choose the most relevant word among the (phrase-aware) query words
• Query-to-context attention: choose the context word that is most relevant to any of the query words (see the sketch below)
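A minimal sketch of both attention directions (PyTorch; the similarity function S_tj = w·[h_t; u_j; h_t ∘ u_j] follows the paper, everything else here is illustrative):

    import torch
    import torch.nn.functional as F

    def bidaf_attention(h, u, w):
        # h: (T, d) phrase-aware context; u: (J, d) phrase-aware query;
        # w: (3d,) trainable weight of the similarity function.
        T, d = h.shape
        J = u.shape[0]
        h_e = h.unsqueeze(1).expand(T, J, d)
        u_e = u.unsqueeze(0).expand(T, J, d)
        S = torch.cat([h_e, u_e, h_e * u_e], dim=-1) @ w   # (T, J) similarities
        # Context-to-query: each context word attends over the query words.
        u_tilde = F.softmax(S, dim=1) @ u                  # (T, d)
        # Query-to-context: softmax over the per-context-word max similarity.
        b = F.softmax(S.max(dim=1).values, dim=0)          # (T,)
        h_tilde = (b.unsqueeze(0) @ h).expand(T, d)        # (T, d), tiled
        # Query-aware context representation: g_t = [h; u~; h * u~; h * h~]
        return torch.cat([h, u_tilde, h * u_tilde, h * h_tilde], dim=-1)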


Context-to-Query Attention (C2Q)

Q: Who leads the United States?

C: Barack Obama is the president of the USA.

For each context word, find the most relevant query word.

Query-to-Context Attention (Q2C)

C: While Seattle’s weather is very nice in summer, its weather is very rainy in winter, making it one of the most gloomy cities in the U.S. LA is …

Q: Which city is gloomy in winter?

Modeling Layer


• Attention layer: models interactions between the query and the context
• Modeling layer: models interactions among the (query-aware) context words via an RNN (LSTM)

• Division of labor: let the attention and modeling layers focus solely on their own tasks
  • We experimentally show that this leads to a better result than intermixing attention and modeling

Output Layer

[Architecture diagram repeated; output layer highlighted: Dense + softmax predicts the start index, LSTM + softmax the end index.]

Training

• Minimize the sum of the negative log probabilities of the true start index and the true end index:

  $L = -\frac{1}{N}\sum_{i=1}^{N}\left[\log \mathbf{p}^{s}_{y_i^{s}} + \log \mathbf{p}^{e}_{y_i^{e}}\right]$

  where $y_i^{s}$ and $y_i^{e}$ are the true start and end indices of example $i$, and $\mathbf{p}^{s}$ and $\mathbf{p}^{e}$ are the predicted probability distributions over the start and end indices.
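In code this is just two cross-entropies; a sketch (PyTorch, names illustrative):

    import torch.nn.functional as F

    def span_loss(start_logits, end_logits, y_start, y_end):
        # logits: (batch, T) scores over context positions;
        # y_start, y_end: (batch,) true span indices.
        # cross_entropy applies log-softmax and returns the mean
        # negative log-probability of the true index.
        return (F.cross_entropy(start_logits, y_start) +
                F.cross_entropy(end_logits, y_end))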

Previous work

• Using neural attention as a controller (Xiong et al., 2016)
• Using neural attention within an RNN (Wang & Jiang, 2016)
• Most of these attention mechanisms are uni-directional

• BiDAF (our model):
  • uses neural attention as a layer,
  • is separated from the modeling part (RNN),
  • is bidirectional

Image Classifier and BiDAF

[Side-by-side diagram: the VGG-16 layer stack next to the BiDAF architecture, drawing an analogy between an image classifier’s stacked layers and BiDAF’s.]

Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)

• Most popular articles from Wikipedia
• Questions and answers from Turkers
• 90k train, 10k dev, ? test (hidden)
• The answer must lie in the context
• Two metrics: Exact Match (EM) and F1 (see the sketch below)
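A minimal sketch of the two metrics (the official SQuAD script additionally strips articles and punctuation before comparing; this simplified version does not):

    from collections import Counter

    def exact_match(pred, gold):
        # EM: 1 if the normalized strings are identical, else 0.
        return float(pred.strip().lower() == gold.strip().lower())

    def token_f1(pred, gold):
        # F1: harmonic mean of token precision and recall.
        p, g = pred.lower().split(), gold.lower().split()
        overlap = sum((Counter(p) & Counter(g)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(p), overlap / len(g)
        return 2 * precision * recall / (precision + recall)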

SQuAD Results (http://stanford-qa.com), as of Dec 2 (ICLR 2017)

Now…

Ablations on dev data

[Bar chart: EM and F1 (axis 50–80) for No Char Embedding, No Word Embedding, No C2Q Attention, No Q2C Attention, Dynamic Attention, and the Full Model.]

Interactive Demo

http://allenai.github.io/bi-att-flow/demo

Attention Visualizations

Context: There are 13 natural reserves in Warsaw – among others, Bielany Forest, Kabaty Woods, Czerniaków Lake. About 15 kilometres (9 miles) from Warsaw, the Vistula river's environment changes strikingly and features a perfectly preserved ecosystem, with a habitat of animals that includes the otter, beaver and hundreds of bird species. There are also several lakes in Warsaw – mainly the oxbow lakes, like Czerniaków Lake, the lakes in the Łazienki or Wilanów Parks, Kamionek Lake. There are a lot of small lakes in the parks, but only a few are permanent – the majority are emptied before winter to clean them of plants and sediments.

Q: How many natural reserves are there in Warsaw?

[Attention map: “How many” attends to the number words (hundreds, few, among, 15, several, only, 13, 9); “natural reserves” to “natural … reserves”; “are there” to the occurrences of “are” and “includes”; “Warsaw” to the mentions of “Warsaw”.]

Q: Where did Super Bowl 50 take place?

Context: Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the “golden anniversary” with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as “Super Bowl L”), so that the logo could prominently feature the Arabic numerals 50.

[Attention map: “Where” attends to location words (at, Stadium, Levi, in, Santa, Ana); “Super” and “Bowl” attend to the mentions of “Super Bowl”; “50” attends to “50” and “initiatives”.]

Embedding Visualization at Word vs Phrase Layers

[Visualization: at the word layer, the month names (January, September, August, July, May) and the modal verb “may” cluster together; at the phrase layer, modal contexts (“effect and may result in”, “the state may not aid”, “of these may be more”) separate from month contexts (“Opening in May 1852 at”, “debut on May 5,”, “from 28 January to 25”, “but by September had been”).]

How does it compare with feature-based models?

CNN/DailyMail Cloze Test (Hermann et al., 2015)

• Cloze test (predicting missing words)
• Articles from CNN/DailyMail
• Human-written summaries
• Missing words are always entities
• CNN – 300k article-query pairs
• DailyMail – 1M article-query pairs

CNN/DailyMail Cloze Test Results

Transfer Learning (ACL 2017)

Some limitations of SQuAD

[Axes diagram, highlighting bAbI QA & Dialog.]

bAbI QA & Dialog

Reasoning Question Answering

Dialog System

U: Can you book a table in Rome in Italian Cuisine
S: How many people in your party?
U: For four people please.
S: What price range are you looking for?

Dialog task vs QA

• A dialog system can be considered a QA system:
  • The last user utterance is the query
  • All previous conversation is context for the query
  • The system’s next response is the answer to the query

• Dialog poses a few unique challenges:
  • A dialog system requires tracking states
  • A dialog system needs to look at multiple sentences in the conversation
  • Building an end-to-end dialog system is more challenging

Our approach: Query-Reduction

<START> Sandra got the apple there. Sandra dropped the apple. Daniel took the apple there. Sandra went to the hallway. Daniel journeyed to the garden.

Q: Where is the apple?

Reduced query after each sentence:
Where is the apple? → Where is Sandra? → Where is Sandra? → Where is Daniel? → Where is Daniel? → Where is Daniel? → garden

A: garden

Query-Reduction Networks

• Reduce the query into an easier-to-answer query over the sequence of state-changing triggers (sentences), in vector space

[Diagram: QRN unrolled over the five sentences. Each cell reads one sentence and the current query and emits a reduced query: Where is the apple? → Where is Sandra? → Where is Sandra? → Where is Daniel? → Where is Daniel? → Where is Daniel?, whose final state decodes to “garden”.]

QRN Cell

Given the sentence vector $\mathbf{x}_t$ and the query vector $\mathbf{q}_t$:

  update gate:                  $z_t = \alpha(\mathbf{x}_t, \mathbf{q}_t)$
  candidate reduced query:      $\tilde{\mathbf{h}}_t = \rho(\mathbf{x}_t, \mathbf{q}_t)$
  reduced query (hidden state): $\mathbf{h}_t = z_t \, \tilde{\mathbf{h}}_t + (1 - z_t) \, \mathbf{h}_{t-1}$

where $\alpha$ is the update function and $\rho$ is the reduction function.
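A minimal sketch of the cell (PyTorch; the exact parameterization of the update and reduction functions is an assumption here):

    import torch
    import torch.nn as nn

    class QRNCell(nn.Module):
        def __init__(self, d):
            super().__init__()
            self.alpha = nn.Linear(d, 1)     # update function -> scalar gate
            self.rho = nn.Linear(2 * d, d)   # reduction function

        def forward(self, x_t, q_t, h_prev):
            # Update gate: how much this sentence should rewrite the query.
            z_t = torch.sigmoid(self.alpha(x_t * q_t))
            # Candidate reduced query, computed from the inputs only.
            h_cand = torch.tanh(self.rho(torch.cat([x_t, q_t], dim=-1)))
            # Keep the old reduced query vs. adopt the candidate.
            return z_t * h_cand + (1.0 - z_t) * h_prev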

Characteristics of QRN

• The update gate can be considered as local attention
  • QRN chooses to consider or ignore each candidate reduced query
  • The decision is made locally (as opposed to global softmax attention)
• Subclass of Recurrent Neural Networks (RNN)
  • Two inputs, a hidden state, a gating mechanism
  • Able to handle sequential dependency (attention cannot)
• The simpler recurrent update enables parallelization over time
  • The candidate hidden state (reduced query) is computed from the inputs only
  • The hidden state can be explicitly computed as a function of the inputs

Parallelization

• The candidate reduced queries are computed from the inputs only, so they can be trivially parallelized
• The hidden state can be explicitly expressed as a (geometric) sum of the previous candidate hidden states
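Unrolling the recurrence (with $\mathbf{h}_0 = 0$, an assumption) makes this explicit; since every $z_i$ and $\tilde{\mathbf{h}}_i$ depends only on the inputs at step $i$, all terms can be computed in parallel:

$\mathbf{h}_t = \sum_{i=1}^{t} \Big( \prod_{j=i+1}^{t} (1 - z_j) \Big) \, z_i \, \tilde{\mathbf{h}}_i$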

Characteristics of QRN

• The update gate can be considered as local attention
• Subclass of Recurrent Neural Networks (RNN)
• The simpler recurrent update enables parallelization over time

QRN sits between the neural attention mechanism and recurrent neural networks, taking advantage of both paradigms.

bAbI QA Dataset

• 20 different tasks
• 1k story-question pairs for each task (10k also available)
• Synthetically generated
• Many questions require looking at multiple sentences
• For end-to-end systems supervised by answers only

What’s different from SQuAD?

• Synthetic
• More than lexical/syntactic understanding
• Different kinds of inference
  • induction, deduction, counting, path-finding, etc.
• Reasoning over multiple sentences
• An interesting testbed towards developing complex QA systems (and dialog systems)

bAbI QA Results (1k) (ICLR 2017)

[Bar chart: average error (%), axis 0–60, for LSTM, DMN+, MemN2N, GMemN2N, and QRN (Ours).]

bAbI QA Results (10k)

[Bar chart: average error (%), axis 0–4.5, for MemN2N, DNC, GMemN2N, DMN+, and QRN (Ours).]

Dialog Datasets

• bAbI Dialog Dataset
  • Synthetic
  • 5 different tasks
  • 1k dialogs for each task
• DSTC2* Dataset
  • Real dataset
  • The evaluation metric differs from the original DSTC2: response generation instead of “state tracking”
  • Each dialog is 800+ utterances
  • 2407 possible responses

bAbI Dialog Results (OOV)

[Bar chart: average error (%), axis 0–35, for MemN2N, GMemN2N, and QRN (Ours).]

DSTC2* Dialog Results

[Bar chart: average error (%), axis 0–70, for MemN2N, GMemN2N, and QRN (Ours).]

bAbI QA Visualization

[Heatmap: $z^{l}$ = local attention (update gate) at layer $l$.]

DSTC2 (Dialog) Visualization

[Heatmap: $z^{l}$ = local attention (update gate) at layer $l$.]

So…

[Axes diagram: reasoning capability, NLU capability, end-to-end.] Is this possible?

[Axes diagram, a different point on the same axes.] Or this?

So… What should we do?

• Disclaimer: completely subjective!

• Logic (reasoning) is discrete
  • Modeling logic with a differentiable model is hard
  • Relaxation: either hard to optimize, or converges to a bad optimum (poor generalization)
  • Estimation: low-bias or low-variance methods have been proposed (Williams, 1992; Jang et al., 2017), but the improvements are not substantial
  • Big data: how much do we need? Exponentially many examples?
  • Perhaps a new paradigm is needed…

“If you got a billion dollars to spend on a huge research project, what would you like to do?”

“I’d use the billion dollars to build a NASA-size program focusing on natural language processing (NLP), in all of its glory (semantics, pragmatics, etc.).”

– Michael Jordan, Professor of Computer Science, UC Berkeley

Towards Artificial General Intelligence…

Natural language is the best tool to describe and communicate “thoughts”.

Asking and answering questions is an effective way to develop deeper “thoughts”.

Thank you!

• minjoon@cs.uw.edu
• http://seominjoon.github.io
