Question Answering/Semantic Parsing
CS 287
(Slides from Yoav Artzi, Cornell NY)
Sequence-to-Sequence
[Figure: encoder-decoder over "the red dog" — source states s1, s2, s3 read inputs x1, x2, x3; target states t1, t2, t3 emit outputs y1, y2, y3, with decoding started from the <s> symbol.]
Question
It has become a popular strategy to handle sequential prediction
problems using a seq2seq setup such as attention-based translation.
How might you handle the following problems:
1. Segmentation (HW4)
2. Rare Word Replacement (HW3)
3. Part-of-Speech Tagging (HW2-HW5)
Semantics
Branch of linguistics focused on meaning
I Lexical Semantics
I Meaning of individual words
I e.g. BoW models, word2vec
I Compositional Semantics
I Meaning of utterances
I Concerned with the relations of meaning
I Often expressed with logical relations
Today’s Lecture
Survey of Question Answering
I Semantic Parsing: Linguistic model of questions and answers
I Knowledge Bases and Datasets
I IR-style approaches: Watson and Jeopardy!
Contents
Lambda Calculus
Data and Problems
Semantic Parsing
Jeopardy!
Lambda Calculus
• Formal system to express computation
• Allows higher-order functions
λa.move(a) ∧ dir(a, LEFT) ∧ to(a, ιy.chair(y)) ∧ pass(a, Ay.sofa(y) ∧ intersect(Az.intersection(z), y))
[Church 1932]
Lambda Calculus Base Cases
• Logical constant
• Variable
• Literal
• Lambda term
Lambda Calculus Logical Constants
NYC, CA, RAINIER, LEFT, ...
located_in, depart_date, ...
• Represent objects in the world
Lambda Calculus Variables
• Abstract over objects in the world
• Exact value not pre-determined
x, y, z, . . .
Lambda Calculus Literals
• Represent function application
located_in(AUSTIN, TEXAS)
city(AUSTIN)
(A literal applies a predicate, itself a logical expression, to a list of logical expressions as arguments.)
Lambda Calculus Lambda Terms
• Bind/scope a variable
• Repeat to bind multiple variables
λx.λy.located_in(x, y)
λx.city(x)
(Each lambda term consists of the lambda operator, a variable, and a body.)
Lambda Calculus Quantifiers?
• Higher-order constants
• No need for any special mechanics
• Can represent all of first-order logic
∀(λx.big(x) ∧ apple(x))
¬(∃(λx.lovely(x)))
ι(λx.beautiful(x) ∧ grammar(x))
Lambda Calculus Syntactic Sugar
∧(A, ∧(B, C)) ⇔ A ∧ B ∧ C
∨(A, ∨(B, C)) ⇔ A ∨ B ∨ C
¬(A) ⇔ ¬A
Q(λx.f(x)) ⇔ Qx.f(x)
for Q ∈ {ι, A, ∃, ∀}
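Since quantifiers are just higher-order constants, the whole system can be sketched directly in Python (an illustrative toy, not the lecture's code): logical constants become Python values and functions, lambda terms become Python lambdas, and evaluation is execution against a small world. All entity and predicate names below are made up.

```python
# Toy world: entities and relations (illustrative only).
ENTITIES = {"f1", "f2", "b1", "NYC", "BOS"}
FLIGHTS = {"f1", "f2"}
TO = {("f1", "NYC"), ("f2", "BOS")}

# Logical constants as Python callables.
flight = lambda x: x in FLIGHTS
to = lambda x, y: (x, y) in TO

# λx.flight(x) ∧ to(x, NYC): a function from entities to truth values.
lf = lambda x: flight(x) and to(x, "NYC")

# A "quantifier" is just a higher-order constant: ∃ maps an
# entity-to-truth-value function to a truth value over the domain.
exists = lambda f: any(f(e) for e in ENTITIES)

matches = {e for e in ENTITIES if lf(e)}
print(matches)       # {'f1'}
print(exists(lf))    # True
```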
Simply Typed Lambda Calculus
• Like lambda calculus
• But, typed
[Church 1940]
λx.flight(x) ∧ to(x, move)
λx.flight(x) ∧ to(x, NYC)
λx.NYC(x) ∧ x(to, move)
Lambda Calculus Typing
• Simple types: e (entity), t (truth value)
• Complex types <domain, range>, built with the <·,·> type constructor:
< e, t >
<< e, t >, e >
• Hierarchical typing system: domain-specific types (e.g., locations, cities, flights) are subtypes of e
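Typing rules like these can be sketched in Python (an illustrative toy, not part of the lecture): a type is either simple or a (domain, range) pair, and function application type-checks only when the argument type matches the function's domain.

```python
def apply_type(func_type, arg_type):
    """Return the result type of applying func_type to arg_type, or None."""
    if isinstance(func_type, tuple) and func_type[0] == arg_type:
        return func_type[1]
    return None  # ill-typed application

ET = ("e", "t")      # <e, t>: entity -> truth value, e.g. λx.flight(x)
EET = ("e", ET)      # <e, <e, t>>: a curried binary relation like to

# to(x, NYC): apply <e, <e, t>> to an entity, then to another entity -> t.
assert apply_type(apply_type(EET, "e"), "e") == "t"

# An expression like λx.NYC(x) ∧ x(to, move) is ill-typed: x is an
# entity ('e'), so it cannot itself be applied as a function.
assert apply_type("e", "e") is None
```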
Simply Typed Lambda Calculus
λa.move(a) ∧ dir(a, LEFT) ∧ to(a, ιy.chair(y)) ∧ pass(a, Ay.sofa(y) ∧ intersect(Az.intersection(z), y))
Type information usually omitted
Capturing Meaning with Lambda Calculus

Mountains:
Name      State
Bianca    CO
Antero    CO
Rainier   WA
Shasta    CA
Wrangel   AK
Sill      CA
Bona      AK
Elbert    CO

States:
Abbr.  Capital      Pop.
AL     Montgomery   3.9
AK     Juneau       0.4
AZ     Phoenix      2.7
WA     Olympia      4.1
NY     Albany       17.5
IL     Springfield  11.4

Border:
State1  State2
WA      OR
WA      ID
CA      OR
CA      NV
CA      AZ

Show me mountains in states bordering Texas
[Zettlemoyer and Collins 2005]
Parsing as Structure Prediction

show me : S/N : λf.f
flights : N : λx.flight(x)
to : PP/NP : λy.λx.to(x, y)
Boston : NP : BOSTON

to Boston ⇒ PP : λx.to(x, BOSTON)   (>)
          ⇒ N\N : λf.λx.f(x) ∧ to(x, BOSTON)
flights to Boston ⇒ N : λx.flight(x) ∧ to(x, BOSTON)   (<)
show me flights to Boston ⇒ S : λx.flight(x) ∧ to(x, BOSTON)
Contents
Lambda Calculus
Data and Problems
Semantic Parsing
Jeopardy!
Dataset: Geoquery
I Maps natural language questions to lambda calculus.
I Assume a database with numerical values as well
(population(TEXAS))
I Queries contain rich nesting
size(stateid(X), S) :-
area(stateid(X), S).
size(cityid(X,St), S) :-
population(cityid(X,St), S).
size(riverid(X), S) :-
len(riverid(X),S).
size(placeid(X), S) :-
elevation(placeid(X),S).
size(X,X) :-
number(X).
next_to(stateid(X),stateid(Y)) :-
border(X,_,Ys),
member(Y,Ys).
Quiz: Geoquery
I What states border Texas?
I What is the largest state?
I What states border the state that borders the most states?
(Syntactic Sugar: argmax(f, g) returns the element satisfying
predicate f that maximizes g.)
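Geoquery-style logical forms can be executed directly against toy relations in Python. The sketch below is illustrative (not the lecture's code): the Border rows for New Mexico and Oklahoma and all sizes are made up for the example.

```python
# Toy relations (illustrative rows and made-up sizes).
BORDER = {("WA", "OR"), ("WA", "ID"), ("CA", "OR"), ("CA", "NV"),
          ("CA", "AZ"), ("NM", "TX"), ("OK", "TX")}
SIZE = {"WA": 4.1, "CA": 11.4, "NM": 17.5, "OK": 3.9, "TX": 26.0}
STATES = set(SIZE)

# λx.state(x) ∧ border(x, TX): "What states border Texas?"
borders_tx = [s for s in STATES if (s, "TX") in BORDER]

# argmax(f, g): the element satisfying predicate f that maximizes g.
def argmax(f, g, domain):
    return max((x for x in domain if f(x)), key=g)

# argmax(λx.state(x), λy.size(y)): "What is the largest state?"
largest = argmax(lambda x: x in STATES, lambda x: SIZE[x], STATES)

print(sorted(borders_tx))   # ['NM', 'OK']
print(largest)              # TX
```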
Geoquery (Zelle and Mooney, 1996)
WebQuestions (Berant, 2013)
Questions:
I what high school did president bill clinton attend?
I what form of government does russia have today?
I what movies does taylor lautner play in?
Answers:
I Hot Springs High School
http://www.freebase.com/view/en/bill_clinton
I Constitutional republic
http://www.freebase.com/view/en/russia
I Eclipse, Valentine’s Day, The Twilight Saga: Breaking Dawn - Part
1, New Moon
http://www.freebase.com/view/en/taylor_lautner
WebQuestions
To collect this dataset, we used the Google Suggest API to
obtain questions that begin with a wh-word and contain
exactly one entity. We started with the question Where was
Barack Obama born? and performed a breadth-first search
over questions (nodes), with the Google Suggest API
supplying the edges of the graph. Specifically, we queried the
question excluding the entity, the phrase before the entity, or
the phrase after it; each query generates 5 candidate
questions, which are added to the queue. We iterated until
1M questions were visited; a random 100K were submitted to
Amazon Mechanical Turk.
WebQuestions
The AMT task requested that workers answer the question
using only the Freebase page of the question's entity, or
otherwise mark it as unanswerable by Freebase. The answer
was restricted to be one of the possible entities, values, or list
of entities on the page. As this list was long, we allowed the
user to filter the list by typing. We paid the workers $0.03 per
question. Out of 100K questions, 6,642 were annotated
identically by at least two AMT workers.
Contents
Lambda Calculus
Data and Problems
Semantic Parsing
Jeopardy!
Language to Meaning
More informative
Semantic Parsing
Recover complete meaning
representation
Example Task: Database Query
What states border Texas?
Oklahoma, New Mexico, Arkansas, Louisiana
Language to Meaning
More informative
Semantic Parsing
Recover complete meaning
representation
Example Task: Instructing a Robot
at the chair, turn right
Language to Meaning
More informative
Semantic Parsing
Recover complete meaning representation
at the chair, move forward three steps past the sofa
λa.pre(a, ιx.chair(x)) ∧ move(a) ∧ len(a, 3) ∧ dir(a, forward) ∧ past(a, ιy.sofa(y))
Language to Meaning
at the chair, move forward three steps past the sofa
Learn f : sentence → logical form
λa.pre(a, ιx.chair(x)) ∧ move(a) ∧ len(a, 3) ∧ dir(a, forward) ∧ past(a, ιy.sofa(y))
Supervised Data

show me flights to Boston
S/N : λf.f   N : λx.flight(x)   PP/NP : λy.λx.to(x, y)   NP : BOSTON
PP : λx.to(x, BOSTON)   (>)
N\N : λf.λx.f(x) ∧ to(x, BOSTON)
N : λx.flight(x) ∧ to(x, BOSTON)   (<)
S : λx.flight(x) ∧ to(x, BOSTON)

Supervised data: the sentence and the final logical form. Latent: the derivation in between.
Supervised Data
Supervised learning is done from pairs of sentences and logical forms:

Show me flights to Boston
λx.flight(x) ∧ to(x, BOSTON)

I need a flight from baltimore to seattle
λx.flight(x) ∧ from(x, BALTIMORE) ∧ to(x, SEATTLE)

what ground transportation is available in san francisco
λx.ground_transport(x) ∧ to_city(x, SF)

[Zettlemoyer and Collins 2005; 2007]
Weak Supervision
• Logical form is latent
• “Labeling” requires less expertise
• Labels don’t uniquely determine correct logical forms
• Learning requires executing logical forms within a system and evaluating the result
Weak Supervision: Learning from Query Answers

What is the largest state that borders Texas? → New Mexico

Candidate logical forms, executed against the database:
argmax(λx.state(x) ∧ border(x, TX), λy.size(y)) → New Mexico   (matches the labeled answer)
argmax(λx.river(x) ∧ in(x, TX), λy.size(y)) → Rio Grande   (wrong answer)

[Clarke et al. 2010; Liang et al. 2011]
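The learning loop this implies can be sketched as follows (a toy, not any paper's actual code; `parse_candidates`, `execute`, and `update_toward` are hypothetical stand-ins for a real parser, database executor, and parameter update):

```python
def weakly_supervised_update(question, gold_answer, parse_candidates,
                             execute, update_toward):
    """Split candidate logical forms by whether executing them yields the
    labeled answer; treat the matching ones as (latent) positives."""
    good, bad = [], []
    for lf in parse_candidates(question):
        (good if execute(lf) == gold_answer else bad).append(lf)
    if good:
        update_toward(good, bad)   # e.g., a perceptron-style parameter update
    return good, bad

# Toy illustration with the two candidate parses from the slide:
candidates = ["argmax(λx.state(x) ∧ border(x,TX), λy.size(y))",
              "argmax(λx.river(x) ∧ in(x,TX), λy.size(y))"]
answers = {candidates[0]: "New Mexico", candidates[1]: "Rio Grande"}
good, bad = weakly_supervised_update(
    "What is the largest state that borders Texas?", "New Mexico",
    lambda q: candidates, lambda lf: answers[lf], lambda g, b: None)
print(len(good), len(bad))   # 1 1
```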
Weak Supervision: Learning from Demonstrations
at the chair, move forward three steps past the sofa
[Chen and Mooney 2011; Kim and Mooney 2012; Artzi and Zettlemoyer 2013b]

Some examples from other domains:
• Sentences and labeled game states [Goldwasser and Roth 2011]
• Sentences and sets of physical objects [Matuszek et al. 2012]
Weak Supervision: Learning from Conversation Logs

SYSTEM: how can I help you? (OPEN_TASK)
USER: i'd like to fly to new york
SYSTEM: flying to new york. (CONFIRM: from(fl, ATL)) leaving what city? (ASK: λx.from(fl, x))
USER: from boston on june seven with american airlines
SYSTEM: flying to new york. (CONFIRM: to(fl, NYC)) what date would you like to depart boston? (ASK: λx.date(fl, x) ∧ to(fl, BOS))
USER: june seventh
[CONVERSATION CONTINUES]

[Artzi and Zettlemoyer 2011]
Learning
• Structured perceptron
• A unified learning algorithm
• Supervised learning
• Weak supervision
Structured Perceptron
• Simple additive updates
- Only requires efficient decoding (argmax)
- Closely related to MaxEnt and other feature-rich models
- Provably finds a linear separator in a finite number of updates, if one exists
• Challenge: learning with hidden variables
Structured Perceptron
Data: {(x_i, y_i) : i = 1...n}
For t = 1...T:                               [iterate epochs]
  For i = 1...n:                             [iterate examples]
    y* ← argmax_y ⟨θ, φ(x_i, y)⟩             [predict]
    If y* ≠ y_i:                             [check]
      θ ← θ + φ(x_i, y_i) − φ(x_i, y*)       [update]
[Collins 2002]
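The pseudocode above can be sketched as a runnable toy (not the lecture's code): features are dicts, and a small candidate set stands in for a real decoder's argmax over structures.

```python
def dot(theta, feats):
    return sum(theta.get(k, 0.0) * v for k, v in feats.items())

def train_structured_perceptron(data, candidates, phi, epochs=5):
    theta = {}
    for _ in range(epochs):                                 # [iterate epochs]
        for x, y in data:                                   # [iterate examples]
            y_hat = max(candidates(x),
                        key=lambda y2: dot(theta, phi(x, y2)))  # [predict]
            if y_hat != y:                                  # [check]
                for k, v in phi(x, y).items():              # [update]
                    theta[k] = theta.get(k, 0.0) + v
                for k, v in phi(x, y_hat).items():
                    theta[k] = theta.get(k, 0.0) - v
    return theta

# Toy task: tag a word as N or V from a suffix feature (made-up data).
phi = lambda x, y: {(x[-3:], y): 1.0}
data = [("running", "V"), ("jumping", "V"), ("table", "N")]
theta = train_structured_perceptron(data, lambda x: ["N", "V"], phi)
predict = lambda x: max(["N", "V"], key=lambda y: dot(theta, phi(x, y)))
print(predict("walking"))   # V, via the shared '-ing' suffix feature
```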
One Derivation of the Perceptron
Log-linear model:
p(y|x) = exp(w · f(x, y)) / Σ_{y'} exp(w · f(x, y'))
Step 1: Differentiate, to maximize data log-likelihood:
update = Σ_i f(x_i, y_i) − E_{p(y|x_i)} f(x_i, y)
Step 2: Use online, stochastic gradient updates, for example i:
update_i = f(x_i, y_i) − E_{p(y|x_i)} f(x_i, y)
Step 3: Replace expectations with maxes (Viterbi approx.):
update_i = f(x_i, y_i) − f(x_i, y*)   where   y* = argmax_y w · f(x_i, y)
The Perceptron with Hidden Variables
Log-linear model:
p(y, h|x) = exp(w · f(x, h, y)) / Σ_{y', h'} exp(w · f(x, h', y'))
p(y|x) = Σ_h p(y, h|x)
Step 1: Differentiate the marginal, to maximize data log-likelihood:
update = Σ_i E_{p(h|y_i, x_i)}[f(x_i, h, y_i)] − E_{p(y, h|x_i)}[f(x_i, h, y)]
Step 2: Use online, stochastic gradient updates, for example i:
update_i = E_{p(h|y_i, x_i)}[f(x_i, h, y_i)] − E_{p(y, h|x_i)}[f(x_i, h, y)]
Step 3: Replace expectations with maxes (Viterbi approx.):
update_i = f(x_i, h', y_i) − f(x_i, h*, y*)
where y*, h* = argmax_{y,h} w · f(x_i, h, y) and h' = argmax_h w · f(x_i, h, y_i)
Hidden Variable Perceptron
Data: {(x_i, y_i) : i = 1...n}
For t = 1...T:                                       [iterate epochs]
  For i = 1...n:                                     [iterate examples]
    y*, h* ← argmax_{y,h} ⟨θ, φ(x_i, h, y)⟩          [predict]
    If y* ≠ y_i:                                     [check]
      h' ← argmax_h ⟨θ, φ(x_i, h, y_i)⟩              [predict hidden]
      θ ← θ + φ(x_i, h', y_i) − φ(x_i, h*, y*)       [update]
[Liang et al. 2006; Zettlemoyer and Collins 2007]
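The hidden-variable version can likewise be sketched as a toy (not any paper's code): the latent h is chosen by argmax both when predicting and when computing the "gold" features, and small enumerable candidate sets stand in for real search.

```python
def dot(theta, feats):
    return sum(theta.get(k, 0.0) * v for k, v in feats.items())

def hv_perceptron(data, ys, hs, phi, epochs=5):
    theta = {}
    for _ in range(epochs):                                   # [iterate epochs]
        for x, y in data:                                     # [iterate examples]
            y_star, h_star = max(((y2, h) for y2 in ys for h in hs),
                                 key=lambda p: dot(theta, phi(x, p[1], p[0])))  # [predict]
            if y_star != y:                                   # [check]
                h0 = max(hs, key=lambda h: dot(theta, phi(x, h, y)))  # [predict hidden]
                for k, v in phi(x, h0, y).items():            # [update]
                    theta[k] = theta.get(k, 0.0) + v
                for k, v in phi(x, h_star, y_star).items():
                    theta[k] = theta.get(k, 0.0) - v
    return theta

# Toy check: the latent h picks which character position carries the signal.
phi = lambda x, h, y: {(x[h], y): 1.0}
data = [("ax", "A"), ("xa", "A"), ("xx", "B")]
theta = hv_perceptron(data, ["A", "B"], [0, 1], phi)
predict = lambda x: max((("A", 0), ("A", 1), ("B", 0), ("B", 1)),
                        key=lambda p: dot(theta, phi(x, p[1], p[0])))[0]
print(predict("qa"), predict("xx"))   # A B
```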
Contents
Lambda Calculus
Data and Problems
Semantic Parsing
Jeopardy!
DeepQA
I Information Retrieval based factoid QA
I Combines search, rule-based extraction, and ranking
I Kitchen-sink type of approach
I (Nothing to do with deep learning)
DeepQA (Ferrucci, 2010)
I Massive parallelism: Exploit massive parallelism in the
consideration of multiple interpretations and hypotheses.
I Many experts: Facilitate the integration, application, and
contextual evaluation of a wide range of loosely coupled
probabilistic question and content analytics.
I Pervasive confidence estimation: No component commits to an
answer; all components produce features and associated
confidences, scoring different question and content interpretations.
An underlying confidence-processing substrate learns how to stack
and combine the scores.
I Integrate shallow and deep knowledge: Balance the use of strict
semantics and shallow semantics, leveraging many loosely formed
ontologies
This motivates an approach that merges answer scores before
ranking and confidence estimation. Using an ensemble of
matching, normalization, and coreference resolution
algorithms, Watson identifies equivalent and related
hypotheses (for example, Abraham Lincoln and Honest Abe)
and then enables custom merging per feature to combine
scores.
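The merge-then-rank idea might be sketched as follows (a toy: the alias table and score summation below stand in for Watson's ensemble of matching, normalization, and coreference algorithms and its custom per-feature merging).

```python
# Illustrative alias table standing in for learned equivalence detection.
ALIASES = {"honest abe": "abraham lincoln"}

def canonical(ans):
    return ALIASES.get(ans.lower(), ans.lower())

def merge_hypotheses(scored):
    """scored: list of (answer_string, score) pairs.
    Group equivalent answers and combine their scores."""
    merged = {}
    for ans, score in scored:
        key = canonical(ans)
        merged[key] = merged.get(key, 0.0) + score  # summation as a simple stand-in
    return merged

hyps = [("Abraham Lincoln", 0.5), ("Honest Abe", 0.3), ("U.S. Grant", 0.4)]
merged = merge_hypotheses(hyps)
best = max(merged, key=merged.get)
print(best)   # abraham lincoln (0.8 after merging, vs. 0.4 unmerged for Grant)
```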
Given the kinds of questions and broad domain of the
Jeopardy Challenge, the sources for Watson include a wide
range of encyclopedias, dictionaries, thesauri, newswire
articles, literary works, and so on.
A variety of search techniques are used, including the use of
multiple text search engines with different underlying
approaches (for example, Indri and Lucene), document search
as well as passage search, knowledge base search using
SPARQL on triple stores, the generation of multiple search
queries for a single question, and backfilling hit lists to satisfy
key constraints identified in the question.
Hypothesis Ranking
Other Aspects
I Parsing algorithm
I Inducing rules and parameters
I Relationship between syntax and semantics
CCG Categories
ADJ : λx.fun(x)
(Syntax : Semantics)
• Basic building block
• Capture syntactic and semantic information jointly
CCG Categories
• Syntax: primitive symbols (N, S, NP, ADJ, PP) combined with the slash operators (/, \); slashes specify argument order and direction
• Semantics: a λ-calculus expression; the syntactic type maps to the semantic type
ADJ : λx.fun(x)
NP : CCG
(S\NP)/ADJ : λf.λx.f(x)
CCG Lexical Entries
fun ⊢ ADJ : λx.fun(x)
(Natural Language ⊢ CCG Category)
• Pair words and phrases with meaning
• Meaning captured by a CCG category
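Forward and backward application over such (syntax, semantics) pairs can be sketched in Python (an illustrative toy, not the lecture's code; complex categories are (slash, result, argument) triples and semantics are Python lambdas).

```python
# Categories: a string primitive, or a triple (slash, result, argument).
ADJ, NP, S = "ADJ", "NP", "S"
SNP = ("\\", S, NP)            # S\NP
VP_CAT = ("/", SNP, ADJ)       # (S\NP)/ADJ

def apply_cats(left, right):
    """Forward application X/Y Y => X (>), backward Y X\\Y => X (<)."""
    syn_l, sem_l = left
    syn_r, sem_r = right
    if isinstance(syn_l, tuple) and syn_l[0] == "/" and syn_l[2] == syn_r:
        return (syn_l[1], sem_l(sem_r))   # >
    if isinstance(syn_r, tuple) and syn_r[0] == "\\" and syn_r[2] == syn_l:
        return (syn_r[1], sem_r(sem_l))   # <
    return None

# Lexical entries from the slides:
#   fun ⊢ ADJ : λx.fun(x)    CCG ⊢ NP : CCG    is ⊢ (S\NP)/ADJ : λf.λx.f(x)
fun_entry = (ADJ, lambda x: ("fun", x))
ccg_entry = (NP, "CCG")
is_entry = (VP_CAT, lambda f: lambda x: f(x))

vp = apply_cats(is_entry, fun_entry)   # S\NP : λx.fun(x)
s = apply_cats(ccg_entry, vp)          # S : fun(CCG)
print(s[0], s[1])   # S ('fun', 'CCG')
```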