Outline Introduction review methods results discussion conclusion

Time flies like an arrow and fruit flies like a banana
Parsing multiword constructions with DepVis

Seongmin Mun (1, 2), Ilaine Wang (2), Guillaume Desagulier (3), Gyeongcheol Choi (1), Kyungwon Lee (1)

1 Ajou University, Suwon, South Korea
2 MoDyCo (UMR 7114), CNRS, Paris Nanterre
3 MoDyCo (UMR 7114), Paris 8, CNRS, Paris Nanterre & Institut Universitaire de France

ICCG 10, Université Sorbonne Nouvelle – Paris 3
July 17th, 2018

Time flies like an arrow... Mun, Wang & Desagulier


outline

1 Introduction

2 review

3 methods

4 results

5 discussion

6 conclusion


multiword expressions (MWEs)

minimal working definition
• a string of 2+ lexemes
• idiomatic in some respect

MWEs are frequent

reference                           share of MWEs     corpus
Sag et al. (2002)                   41%               WordNet 1.7
Graça Krieger and Finatto (2004)    70%               specialized corpus
Ramisch (2009)                      50%-80%           scientific biomedical abstracts
Ramisch et al. (2013)               51.4% (nouns)     English WordNet
                                    25.5% (verbs)


multiword expressions (MWEs)
A vast inventory: Sag et al. (2002)’s pain-in-the-neck typology

institutionalized phrases and clichés

(1) love conquers all

idioms

(2) sweep under the rug

fixed phrases

(3) by and large

compounds

(4) frequent-flyer program

verb-particle constructions

(5) eat/look/write up

light verbs

(6) a. have a drink/?an eat
    b. make/*do a mistake

named entities

(7) Oakland A’s, Oakland, the A’s

lexical collocations

(8) a. telephone box/booth/*cabin
    b. emotional baggage/*luggage

etc.


multiword expressions (MWEs)
NLP vs linguistics

As a term, ‘MWE’ has had a strong NLP flavor since Sag et al. (2002), under the influence of the Stanford MWE project (http://mwe.stanford.edu/)

• NLP → capture, recognition, and comprehension
  • task-oriented (computer-mediated lexicography, OCR, morphological and syntactic tagging, information retrieval, computer-mediated translation, L2 teaching, word-sense disambiguation, etc.)
• linguistics → acquisition and usage
  • linguistic competence & performance

Looks like we don’t have the same priorities


MWEs in linguistics
corpus linguistics

working definitions for MWEs
Firth (1957)
• you shall know a word by the company it keeps
• mutual expectations

collocations & phraseological units (Sinclair 1991)
• the open-choice principle
• the idiom principle


MWEs in linguistics
Meaning-Text Theory (Mel’čuk)

3 kinds of entries in the Explanatory Combinatorial Dictionary (Mel’čuk and Polguère 1987; Mel’čuk and Polguère 1995)

• full phrasemes (≈ non-compositional collocations)
  e.g. long time no see
• semi-phrasemes (partially compositional collocations)
  e.g. Magn(rain) = {heavy, torrential}
• quasi-phrasemes (semantically constrained, partially compositional collocations)
  e.g. bus stop is the place where the bus stops and passengers get in, not the moment when it stops (at a traffic light)
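
The semi-phraseme example above can be read as a lookup: a lexical function such as Magn takes a keyword and returns the collocates that intensify it. The following is a minimal sketch of that reading, not the dictionary's actual data model. The names `LEXICAL_FUNCTIONS`, `apply_lf` and `is_semi_phraseme` are hypothetical, and the only entry is the one given on the slide.

```python
# Hypothetical toy lexicon: (lexical function, keyword) -> set of collocates.
# The single entry comes from the slide; a real dictionary would hold many more.
LEXICAL_FUNCTIONS = {
    ("Magn", "rain"): {"heavy", "torrential"},
}

def apply_lf(function, keyword):
    """Return the collocates licensed by a lexical function, or an empty set."""
    return LEXICAL_FUNCTIONS.get((function, keyword), set())

def is_semi_phraseme(collocate, keyword, function="Magn"):
    """True if 'collocate + keyword' is a stored value of the lexical function."""
    return collocate in apply_lf(function, keyword)

print(apply_lf("Magn", "rain"))
print(is_semi_phraseme("strong", "rain"))
```

Under this toy lexicon, "heavy rain" is recognized while "strong rain" is not, which is the kind of partially compositional restriction the semi-phraseme category is meant to capture.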


MWEs in linguistics
generative linguistics & CxG

• the “rules vs. the lexicon” debate (Langacker 1987; Pinker 1999; Pinker and Prince 1988; Rumelhart and McClelland 1986)
  Because rules capture all the regularities in language, MWEs should have no place in the grammar proper because they are lexical. Because the lexicon consists of words or morphemes, it does not include MWEs because they are phrasal.

• Fillmore et al. (1988, p. 504)
  the descriptive linguist needs to append to this maximally general machinery [. . . ] knowledge that will account for speakers’ ability to construct and understand phrases and expressions in their language which are not covered by the grammar, the lexicon and the principles of compositional semantics, as these are familiarly conceived. Such a list of exceptional phenomena contains things which are larger than words, which are like words in that they have to be learned separately as individual whole facts about pieces of the language, [. . . ]


MWEs in linguistics
generative linguistics & CxG

3 classes (Fillmore et al. 1988)
• unfamiliar pieces unfamiliarly combined
  e.g. the X-er, the Y-er
• familiar pieces unfamiliarly combined
  e.g. all of a sudden
• familiar pieces familiarly combined
  e.g. once every blue moon


MWEs in linguistics
generative linguistics & CxG

• “phrasal lexical items” (i.e. “lexical items larger than X⁰”) should be part of the lexicon (Jackendoff 1997, chapter 7)

• MWEs are part of the ‘constructicon’ (Goldberg 2006, p. 64)


MWEs in linguistics
language acquisition

computational simulations of acquisition models
• Joyce and Srdanović (2008)
• Rapp (2008)

studies on specific MWEs
• verb-particle constructions (Villavicencio et al. 2012)
• nominal compounds (Devereux and Costello 2012)
• light-verb constructions (Nematzadeh et al. 2013)
• multiword terms (Lavagnino and Park 2010)


MWEs in NLP
some landmark studies

• Choueka (1988): collocation extraction based on n-gram statistics
• Smadja (1993): Xtract, a tool for collocation extraction based on simple POS filters + µ and σ of word distance
• Dagan and Church (1994): Termight, a semi-automatic tool to help translators and terminologists identify technical terms and their translations (candidate terms selected with a variant of PMI, as found in Church and Hanks 1990)
• Lin (1998a,b): use of syntactic dependencies to extract candidate collocations restricted to a specific category
• the Stanford MWE project (early 2000s) (see Sag et al. 2002)
• Ramisch (2014)
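
Several of the studies above rank candidate collocations with an association measure. As an illustration, here is a minimal sketch of plain pointwise mutual information (PMI) over adjacent word pairs, PMI(x, y) = log2(P(x, y) / (P(x) P(y))). It is not the specific variant used in Termight, and the function name `pmi_scores` and the toy sentence are assumptions made for the example.

```python
import math
from collections import Counter

def pmi_scores(tokens):
    """Score each adjacent word pair with pointwise mutual information (base 2)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)        # number of word tokens
    n_bi = len(tokens) - 1     # number of adjacent pairs
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Toy input (hypothetical): far too small for reliable estimates.
tokens = "by and large the plan worked by and large".split()
scores = pmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

On such a small sample, pairs seen only once (for instance "the plan") outrank the repeated "by and", a known bias of PMI toward rare events, which is one reason extraction tools usually combine it with frequency thresholds or POS filters.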


MWEs in NLP
scientific events

MWEs are featured at major conferences
• COLING
• ACL (SIGLEX, EACL, NAACL)
• LREC

and courses/tutorials
• ACL
• SemEval
• PARSEME

but again, task-oriented


MWEs in CxG
the constructicon

Tenet 7. The totality of our knowledge of language is captured by a network of constructions: a ‘construct-i-con’. (Goldberg 2003)

[Figure: schematic network of nodes labeled Constr 1 to Constr 10]


MWEs in CxG
the constructicon

An undirected graph based on corpus data (after Bresnan et al. 2007) and the languageR dataset

[Figure: the dative alternation. Graph with verb nodes (accord, afford, allocate, ..., wish, write) and the two nodes NP and PP]
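
A graph of this kind can be sketched from (verb, realization) observations. The example below is a minimal illustration under stated assumptions: the observations are invented toy rows rather than the languageR data, the reading of NP and PP as the two realizations of the recipient is an assumption about the figure, and `build_undirected_graph` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical toy observations: (verb, realization of the recipient).
observations = [
    ("give", "NP"), ("give", "PP"), ("send", "PP"),
    ("cost", "NP"), ("send", "NP"), ("give", "NP"),
]

def build_undirected_graph(pairs):
    """Adjacency sets for an undirected graph: each edge is stored in both directions."""
    graph = defaultdict(set)
    for verb, realization in pairs:
        graph[verb].add(realization)
        graph[realization].add(verb)
    return graph

graph = build_undirected_graph(observations)

# Verbs connected to both NP and PP take part in the alternation in this toy sample.
alternating = sorted(v for v in graph if graph[v] >= {"NP", "PP"})
print(alternating)
```

Storing each edge in both directions keeps the graph undirected; edge weights (for example raw frequencies) could be added by replacing the sets with counters.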


MWEs in CxG
the constructicon

A directed graph based on corpus data and association measures (Desagulier 2015)

[Figure: A as NP. Directed graph whose node labels include adjectives (e.g. blind, cool, dead, drunk) and nouns (e.g. bat, cucumber, doornail, lord)]


MWEs in CxG
the constructicon

our long-term objective
• simulate the construction network from a very large corpus

how
• collect a large database of existing MW-constructions
• train an algorithm so that it can detect them
• vectorize the constructions by means of artificial neural networks
• plot the results by means of a graph
• further train the algorithm so that it can detect new constructions
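
The first two steps above (a stored inventory, then detection) can be illustrated with a simple dictionary lookup. This is a minimal sketch assuming contiguous, exact-match constructions and a greedy longest-match scan. It is not the trained algorithm the slides refer to, and `KNOWN_MWCS` and `detect_mwcs` are hypothetical names.

```python
# Hypothetical mini-database of multiword constructions, stored as token tuples.
KNOWN_MWCS = {
    ("by", "and", "large"),
    ("sweep", "under", "the", "rug"),
    ("all", "of", "a", "sudden"),
}
MAX_LEN = max(len(m) for m in KNOWN_MWCS)

def detect_mwcs(tokens):
    """Greedy left-to-right scan; at each position prefer the longest stored match."""
    found = []
    i = 0
    while i < len(tokens):
        for size in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + size])
            if candidate in KNOWN_MWCS:
                found.append((i, candidate))
                i += size          # skip past the matched construction
                break
        else:
            i += 1                 # no construction starts here
    return found

sentence = "all of a sudden the plan worked by and large".split()
matches = detect_mwcs(sentence)
print(matches)
```

A lookup like this only finds contiguous strings: "sweep the problem under the rug" would be missed, which is why discontinuous constructions call for syntactic information such as dependencies.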


MWEs in CxG
the constructicon

problems
• ambiguity
• polysemy
• homonymy
• long-distance dependencies
• etc.

Most, if not all, of the issues listed in Sag et al. (2002) are still unresolved today
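
To make the ambiguity problem concrete, the sketch below enumerates the part-of-speech sequences that a small lexicon allows for the title sentence. The lexicon and its tag labels are hypothetical and chosen for illustration; a real parser would also have to weigh these candidates against each other.

```python
from itertools import product

# Hypothetical toy lexicon: each word form with its possible parts of speech.
LEXICON = {
    "time": ["NOUN", "VERB"],
    "flies": ["NOUN", "VERB"],
    "like": ["ADP", "VERB"],
    "an": ["DET"],
    "arrow": ["NOUN"],
}

def candidate_taggings(tokens):
    """Enumerate every tag sequence the lexicon allows for the sentence."""
    options = [LEXICON[token] for token in tokens]
    return [tuple(tags) for tags in product(*options)]

taggings = candidate_taggings("time flies like an arrow".split())
print(len(taggings))
for tags in taggings:
    print(tags)
```

Among the candidates are the usual reading (time = noun, flies = verb, like = preposition) and the "fruit flies" style reading (time flies = noun compound, like = verb). The number of candidates is the product of the per-word options, so it grows quickly with sentence length.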


When things go surprisingly wrong
AI

displaCy (https://demos.explosion.ai/displacy/)


previous work
mwetoolkit (Ramisch 2014)

Framework for MWE extraction with mwetoolkit


data processing
overview
step 1
step 2
step 3
step 4
step 5
step 6


MWCs parser

Video link (https://www.youtube.com/embed/meTWC5Nk9F4)


an ambiguous sentence

DepVis link (http://stat34.github.io/DepVis/)


conclusion

• results show that our parsing algorithm recognizes MWCs accurately, including in ambiguous sentences.
• storing more sentences will improve the speed of the algorithm.
• storing more MWEs will allow the algorithm to recognize more MWEs.
