85
Question Answering/Semantic Parsing CS 287 (Slides from Yoav Artzi, Cornell NY)

CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Question Answering/Semantic Parsing

CS 287

(Slides from Yoav Artzi, Cornell NY)

Page 2: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Sequence-to-Sequence

the red dog

y1 y2 y3

ss1 ss2 ss3 st1 st2 st3

x1 x2 x3 x1 x2 x3

the red dog <s>

Page 3: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional
Page 4: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Question

It has become a popular strategy to handle sequential prediction

problems using a seq2seq setup such as attention-based translation.

How might you handle the following problems:

1. Segmentation (HW4)

2. Rare Word Replacement (HW3)

3. Part-of-Speech Tagging (HW2-HW5)

Page 5: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Semantics

Branch of linguistic focused on meaning

I Lexical Semantics

I Meaning of individual words

I e.g. BoW models, word2vec

I Compositional Semantics

I Meaning of utterances

I Concerned with the relations of meaning

I Often expressed with logical relations

Page 6: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Today’s Lecture

Survey of Question Answering

I Semantic Parsing: Linguistic model of questions and answers

I Knowledge Bases and Datasets

I IR-style approaches: Watson and Jeopardy!

Page 7: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Contents

Lambda Calculus

Data and Problems

Semantic Parsing

Jeopardy!

Page 8: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus

• Formal system to express computation!

• Allows high-order functions

�a.move(a) ^ dir(a, LEFT ) ^ to(a, ◆y.chair(y))^pass(a, Ay.sofa(y) ^ intersect(Az.intersection(z), y))

[Church 1932]

Page 9: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Base Cases

• Logical constant!

• Variable!

• Literal!

• Lambda term

Page 10: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Logical Constants

NY C, CA, RAINIER, LEFT, . . .

located in, depart date, . . .

• Represent objects in the world

Page 11: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Variables

• Abstract over objects in the world!

• Exact value not pre-determined

x, y, z, . . .

Page 12: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Literals

• Represent function application

located in(AUSTIN, TEXAS)

city(AUSTIN)

Page 13: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

ArgumentsPredicate

Lambda Calculus Literals

• Represent function application

located in(AUSTIN, TEXAS)

city(AUSTIN)

Logical expression List of logical expressions

Page 14: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Lambda Terms

• Bind/scope a variable!

• Repeat to bind multiple variables

�x.�y.located in(x, y)

�x.city(x)

Page 15: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Body

Lambda operator

Variable

Lambda Calculus Lambda Terms

• Bind/scope a variable!

• Repeat to bind multiple variables

�x.�y.located in(x, y)

�x.city(x)

Page 16: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Quantifiers?

• Higher order constants!

• No need for any special mechanics!

• Can represent all of first order logic

8(�x.big(x) ^ apple(x))

¬(9(�x.lovely(x))

◆(�x.beautiful(x) ^ grammar(x))

Page 17: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Syntactic Sugar

^ (A,^(B, C)) , A ^ B ^ C

_ (A,_(B, C)) , A _ B _ C

¬(A) , ¬A

Q(�x.f(x)) , Qx.f(x)

for Q 2 {◆, A, 9, 8}

Page 18: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

�x.flight(x) ^ to(x, move)

�x.flight(x) ^ to(x, NY C)

�x.NY C(x) ^ x(to, move)

Page 19: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

�x.flight(x) ^ to(x, move)

�x.flight(x) ^ to(x, NY C)

�x.NY C(x) ^ x(to, move)

Page 20: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Simply Typed Lambda Calculus

• Like lambda calculus!

• But, typed

[Church 1940]

�x.flight(x) ^ to(x, move)

�x.flight(x) ^ to(x, NY C)

�x.NY C(x) ^ x(to, move)

Page 21: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Typing

et Truth-

value

Entity

• Simple types!

• Complex types

< e, t >

<< e, t >, e >

Page 22: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Typing

et Truth-

value

Entity

• Simple types!

• Complex types

RangeDomain

< e, t >

<< e, t >, e >

Type constructor

Page 23: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Typinge

tr

t

loc

• Hierarchical typing system

• Simple types!

• Complex types

RangeDomain

< e, t >

<< e, t >, e >

Type constructor

Page 24: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Lambda Calculus Typinge

flafl

tr

gt

t

locap

ci

iti

• Hierarchical typing system

RangeDomain

• Simple types!

• Complex typesType

constructor

< e, t >

<< e, t >, e >

Page 25: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Simply Typed Lambda Calculus

�a.move(a) ^ dir(a, LEFT ) ^ to(a, ◆y.chair(y))^pass(a, Ay.sofa(y) ^ intersect(Az.intersection(z), y))

Type information usually omitted

Page 26: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

MountainsName State

Bianca CO

Antero CO

Rainier WA

Shasta CA

Wrangel AK

Sill CA

Bona AK

Elbert CO

BorderState

Capturing Meaning with Lambda Calculus

Abbr. Capital Pop.

AL Montgomery 3.9

AK Juneau 0.4

AZ Phoenix 2.7

WA Olympia 4.1

NY Albany 17.5

IL Springfield 11.4

State1 State2WA OR

WA ID

CA OR

CA NV

CA AZ

Show me mountains in states bordering Texas

[Zettlemoyer and Collins 2005]

Page 27: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional
Page 28: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Parsing as Structure Prediction

show me flights to Boston

S/N N PP/NP NP�f.f �x.flight(x) �y.�x.to(x, y) BOSTON

>PP

�x.to(x, BOSTON)

N\N�f.�x.f(x) ^ to(x, BOSTON)

<N

�x.flight(x) ^ to(x, BOSTON)>

S�x.flight(x) ^ to(x, BOSTON)

Page 29: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Contents

Lambda Calculus

Data and Problems

Semantic Parsing

Jeopardy!

Page 30: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Dataset: Geoquery

I Maps natural language questions to lambda calculus.

I Assume a database with numerical values as well

(population(TEXAS))

I Queries contain rich nesting

Page 31: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

size(stateid(X), S) :-

area(stateid(X), S).

size(cityid(X,St), S) :-

population(cityid(X,St), S).

size(riverid(X), S) :-

len(riverid(X),S).

size(placeid(X), S) :-

elevation(placeid(X),S).

size(X,X) :-

number(X).

next_to(stateid(X),stateid(Y)) :-

border(X,_,Ys),

member(Y,Ys).

Page 32: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Quiz: Geoquery

I What states border Texas?

I What is the largest state?

I What states border the state that borders the most state?

(Syntactic Sugar: arg max(f , g) returns max of element of g that

satisfies predicate f .)

Page 33: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Geoquery (Zelle and Mooney, 1996)

Page 34: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

WebQuestions (Berant, 2013)

Questions:

I what high school did president bill clinton attend?

I what form of government does russia have today?

I what movies does taylor lautner play in?

Answers:

I Hot Springs High School

http://www.freebase.com/view/en/bill_clinton

I Constitutional republic

http://www.freebase.com/view/en/russia

I Eclipse, Valentine’s Day, The Twilight Saga: Breaking Dawn - Part

1, New Moon

http://www.freebase.com/view/en/taylor_lautner

Page 35: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional
Page 36: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional
Page 37: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional
Page 38: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

WebQuestions

To collect this dataset, we used the Google Suggest API to

obtain questions that begin with a whword and contain

exactly one entity. We started with the question Where was

Barack Obama born? and performed a breadth-first search

over questions (nodes), using the Google Suggest API

supplying the edges of the graph. Specifically, we queried the

question excluding the entity, the phrase before the entity, or

the phrase after it; each query generates 5 candidate

questions, which are added to the queue. We iterated until

1M questions were visited; a random 100K were submitted to

Amazon Mechanical Turk.

Page 39: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

WebQuestions

The AMT task requested that workers answer the question

using only the Freebase page of the questions entity, or

otherwise mark it as unanswerable by Freebase. The answer

was restricted to be one of the possible entities, values, or list

of entities on the page. As this list was long, we allowed the

user to filter the list by typing. We paid the workers $0.03 per

question. Out of 100K questions, 6,642 were annotated

identically by at least two AMT workers.

Page 40: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional
Page 41: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Contents

Lambda Calculus

Data and Problems

Semantic Parsing

Jeopardy!

Page 42: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Language to Meaning

More informative

Semantic Parsing

Recover complete meaning

representation

Database QueryExample Task

What states border Texas?

Oklahoma!New Mexico!

Arkansas!Louisiana

Page 43: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Language to Meaning

More informative

Semantic Parsing

Recover complete meaning

representation

Instructing a RobotExample Task

at the chair, turn right

Page 44: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Language to Meaning

More informative

Semantic Parsing

Recover complete meaning

representation

at the chair, move forward three steps past the sofa

�a.pre(a, ◆x.chair(x)) ^ move(a) ^ len(a, 3)^dir(a, forward) ^ past(a, ◆y.sofa(y))

Page 45: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Language to Meaning

More informative

Semantic Parsing

Recover complete meaning

representation

at the chair, move forward three steps past the sofa

�a.pre(a, ◆x.chair(x)) ^ move(a) ^ len(a, 3)^dir(a, forward) ^ past(a, ◆y.sofa(y))

Page 46: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Language to Meaning

at the chair, move forward three steps past the sofa

Learn

f : sentence ! logical form

�a.pre(a, ◆x.chair(x)) ^ move(a) ^ len(a, 3)^dir(a, forward) ^ past(a, ◆y.sofa(y))

Page 47: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Language to Meaning

at the chair, move forward three steps past the sofa

Learn

f : sentence ! logical form

Page 48: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Supervised Datashow me flights to Boston

S/N N PP/NP NP�f.f �x.flight(x) �y.�x.to(x, y) BOSTON

>PP

�x.to(x, BOSTON)

N\N�f.�x.f(x) ^ to(x, BOSTON)

<N

�x.flight(x) ^ to(x, BOSTON)>

S�x.flight(x) ^ to(x, BOSTON)

Page 49: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

show me flights to Boston

S/N N PP/NP NP�f.f �x.flight(x) �y.�x.to(x, y) BOSTON

>PP

�x.to(x, BOSTON)

N\N�f.�x.f(x) ^ to(x, BOSTON)

<N

�x.flight(x) ^ to(x, BOSTON)>

S�x.flight(x) ^ to(x, BOSTON)

Supervised Data

Latent

Page 50: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Supervised DataSupervised learning is done from pairs

of sentences and logical forms

Show me flights to Boston

I need a flight from baltimore to seattle�x.flight(x) ^ from(x, BALTIMORE) ^ to(x, SEATTLE)

�x.flight(x) ^ to(x, BOSTON)

what ground transportation is available in san francisco�x.ground transport(x) ^ to city(x, SF )

[Zettlemoyer and Collins 2005; 2007]

Page 51: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision

• Logical form is latent!

• “Labeling” requires less expertise!

• Labels don’t uniquely determine correct logical forms!

• Learning requires executing logical forms within a system and evaluating the result

Page 52: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Query Answers

What is the largest state that borders Texas?

New Mexico

[Clarke et al. 2010; Liang et al. 2011]

Page 53: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Query Answers

What is the largest state that borders Texas?

New Mexicoargmax(�x.state(x)

^ border(x, TX),�y.size(y))

argmax(�x.river(x)

^ in(x, TX),�y.size(y))

[Clarke et al. 2010; Liang et al. 2011]

Page 54: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Query Answers

What is the largest state that borders Texas?

New Mexicoargmax(�x.state(x)

^ border(x, TX),�y.size(y))

argmax(�x.river(x)

^ in(x, TX),�y.size(y))

New Mexico

Rio Grande

[Clarke et al. 2010; Liang et al. 2011]

Page 55: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Query Answers

What is the largest state that borders Texas?

New Mexicoargmax(�x.state(x)

^ border(x, TX),�y.size(y))

argmax(�x.river(x)

^ in(x, TX),�y.size(y))

New Mexico

Rio Grande

[Clarke et al. 2010; Liang et al. 2011]

Page 56: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision

• Logical form is latent!

• “Labeling” requires less expertise!

• Labels don’t uniquely determine correct logical forms!

• Learning requires executing logical forms within a system and evaluating the result

Page 57: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Query Answers

What is the largest state that borders Texas?

New Mexico

[Clarke et al. 2010; Liang et al. 2011]

Page 58: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Query Answers

What is the largest state that borders Texas?

New Mexicoargmax(�x.state(x)

^ border(x, TX),�y.size(y))

argmax(�x.river(x)

^ in(x, TX),�y.size(y))

[Clarke et al. 2010; Liang et al. 2011]

Page 59: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Query Answers

What is the largest state that borders Texas?

New Mexicoargmax(�x.state(x)

^ border(x, TX),�y.size(y))

argmax(�x.river(x)

^ in(x, TX),�y.size(y))

New Mexico

Rio Grande

[Clarke et al. 2010; Liang et al. 2011]

Page 60: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Query Answers

What is the largest state that borders Texas?

New Mexicoargmax(�x.state(x)

^ border(x, TX),�y.size(y))

argmax(�x.river(x)

^ in(x, TX),�y.size(y))

New Mexico

Rio Grande

[Clarke et al. 2010; Liang et al. 2011]

Page 61: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Demonstrations

[Chen and Mooney 2011; Kim and Mooney 2012; Artzi and Zettlemoyer 2013b]

at the chair, move forward three steps past the sofa

Page 62: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Demonstrations

[Chen and Mooney 2011; Kim and Mooney 2012; Artzi and Zettlemoyer 2013b]

at the chair, move forward three steps past the sofa

Some examples from other domains:!

• Sentences and labeled game states [Goldwasser and Roth 2011]!

• Sentences and sets of physical objects [Matuszek et al. 2012]

Page 63: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Weak Supervision Learning from Conversation Logs

SYSTEM how can I help you ? (OPEN_TASK)USER i ‘ d like to fly to new yorkSYSTEM flying to new york . (CONFIRM: from(fl, ATL)) leaving what city ?

(ASK: λx.from(fl,x))USER from boston on june seven with american airlinesSYSTEM flying to new york . (CONFIRM: to(fl, NYC)) what date would you

like to depart boston ? (ASK: λx.date(fl,x)∧to(fl, BOS))USER june seventh[CONVERSATION CONTINUES]

[Artzi and Zettlemoyer 2011]

Page 64: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

ModelingParsing

!

• Structured perceptron!

• A unified learning algorithm!

• Supervised learning!

• Weak supervision

Learning

Page 65: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Structured Perceptron

• Simple additive updates !

- Only requires efficient decoding (argmax)!

- Closely related to MaxEnt and other feature rich models!

- Provably finds linear separator in finite updates, if one exists!

• Challenge: learning with hidden variables

Page 66: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Structured Perceptron

Data: {(xi, yi) : i = 1 . . . n}

For t = 1 . . . T :

For i = 1 . . . n:y⇤ arg maxyh✓,�(xi, y)iIf y⇤ 6= yi:✓ ✓ + �(xi, yi)� �(xi, y

⇤)

[iterate epochs]!

[iterate examples]

[predict]![check]!

[update]

[Collins 2002]

Page 67: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Log-linear model:

Step 1: Differentiate, to maximize data log-likelihood

Step 2: Use online, stochastic gradient updates, for example i:

Step 3: Replace expectations with maxes (Viterbi approx.)

p(y|x) = ew·f(x,y)

Py0 ew·f(x,y0)

update =X

i

f(xi, yi)� Ep(y|xi)f(xi, y)

updatei = f(xi, yi)� Ep(y|xi)f(xi, y)

y⇤ = argmaxy

w · f(xi, y)whereupdatei = f(xi, yi)� f(xi, y⇤)

One Derivation of the Perceptron

Page 68: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

The Perceptron with Hidden Variables

Log-linear model:

Step 1: Differentiate marginal, to maximize data log-likelihood

Step 2: Use online, stochastic gradient updates, for example i:

Step 3: Replace expectations with maxes (Viterbi approx.)

where

p(y, h|x) = ew·f(x,h,y)

Py0,h0 ew·f(x,h0,y0)

update =X

i

Ep(h|yi,xi)[f(xi, h, yi)]� Ep(y,h|xi)[f(xi, h, y)]

updatei = f(xi, h0, yi)� f(xi, h

⇤, y⇤)

y⇤, h⇤ = argmaxy,h

w · f(xi, h, y) and h0 = argmaxh

w · f(xi, h, yi)

p(y|x) =X

h

p(y, h|x)

updatei = Ep(yi,h|xi)[f(xi, h, yi)]� Ep(y,h|xi)[f(xi, h, y)]

Page 69: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Hidden Variable Perceptron

[iterate epochs]!

[iterate examples]

[predict]!

[check]!

[predict hidden]!

[update]

Data: {(xi, yi) : i = 1 . . . n}

For t = 1 . . . T :

For i = 1 . . . n:y⇤, h⇤ arg maxy,hh✓,�(xi, h, y)iIf y⇤ 6= yi:

h0 arg maxhh✓,�(xi, h, yi)

✓ ✓ + �(xi, h0, yi)� �(xi, h

⇤, y⇤)

[Liang et al. 2006; Zettlemoyer and Collins 2007]

Page 70: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Contents

Lambda Calculus

Data and Problems

Semantic Parsing

Jeopardy!

Page 71: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional
Page 72: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

DeepQA

I Information Retrieval based factoid QA

I Combines search, rule-based extraction, and ranking

I Kitchen-sink type of approach

I (Nothing to do with deep learning)

Page 73: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

DeepQA (Ferrucci, 2010)

I Massive parallelism: Exploit massive parallelism in the

consideration of multiple interpretations and hypotheses.

I Many experts: Facilitate the integration, application, and

contextual evaluation of a wide range of loosely coupled

probabilistic question and content analytics.

I Pervasive confidence estimation: No component commits to an

answer; all components produce features and associated

confidences, scoring different question and content interpretations.

An underlying confidence-processing substrate learns how to stack

and combine the scores.

I Integrate shallow and deep knowledge: Balance the use of strict

semantics and shallow semantics, leveraging many loosely formed

ontologies

Page 74: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional
Page 75: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

This motivates an approach that merges answer scores before

ranking and confidence estimation. Using an ensemble of

matching, normalization, and coreference resolution

algorithms, Watson identifies equivalent and related

hypotheses (for example, Abraham Lincoln and Honest Abe)

and then enables custom merging per feature to combine

scores.

Page 76: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Given the kinds of questions and broad domain of the

Jeopardy Challenge, the sources for Watson include a wide

range of encyclopedias, dictionaries, thesauri, newswire

articles, literary works, and so on.

Page 77: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

A variety of search techniques are used, including the use of

multiple text search engines with different underlying

approaches (for example, Indri and Lucene), document search

as well as passage search, knowledge base search using

SPARQL on triple stores, the generation of multiple search

queries for a single question, and backfilling hit lists to satisfy

key constraints identified in the question.

Page 78: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Hypothesis Ranking

Page 79: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Other Aspects

I Parsing algorithm

I Inducing rules and parameters

I Relationship between syntax and semantics

Page 80: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

CCG Categories

ADJ : �x.fun(x)

• Basic building block!

• Capture syntactic and semantic information jointly

Page 81: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

CCG Categories

Syntax SemanticsADJ : �x.fun(x)

• Basic building block!

• Capture syntactic and semantic information jointly

Page 82: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

CCG Categories

• Primitive symbols: N, S, NP, ADJ and PP!

• Syntactic combination operator (/,\)!

• Slashes specify argument order and direction

Syntax ADJ : �x.fun(x)

NP : CCG

(S\NP )/ADJ : �f.�x.f(x)

Page 83: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

Semantics

CCG Categories

• λ-calculus expression!

• Syntactic type maps to semantic type

ADJ : �x.fun(x)

NP : CCG

(S\NP )/ADJ : �f.�x.f(x)

Page 84: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

CCG Lexical Entries

fun ` ADJ : �x.fun(x)

• Pair words and phrases with meaning!

• Meaning captured by a CCG category

Page 85: CS 287 (Slides from Yoav Artzi, Cornell NY)Semantics Branch of linguistic focused on meaning I Lexical Semantics I Meaning of individual words I e.g. BoW models, word2vec I Compositional

CCG CategoryNatural!Language

CCG Lexical Entries

fun ` ADJ : �x.fun(x)

• Pair words and phrases with meaning!

• Meaning captured by a CCG category