Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Outline Introduction review methods results discussion conclusion
Time flies like an arrow and fruit flies like a bananaParsing multiword constructions with DepVis
Seongmin Mun 12 Ilaine Wang 2 Guillaume Desagulier 3
Gyeongcheol Choi 1 Kyungwon Lee 1
1Ajou University, Suwon, South Korea
2MoDyCo (UMR 7114), CNRS, Paris Nanterre
3MoDyCo (UMR 7114), Paris 8, CNRS, Paris Nanterre & Institut Universitaire de France
ICCG 10Université Sorbonne Nouvelle – Paris 3
July 17th, 2018
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
outline
1 Introduction
2 review
3 methods
4 results
5 discussion
6 conclusion
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
multiword expressions (MWEs)
minimal working definition• a string of 2+ lexemes• idiomatic in some respect
MWEs are frequent
reference share of MWEs corpusSag et al. (2002) 41% WordNet 1.7Graça Krieger and Finatto (2004) 70% specialized corpusRamisch (2009) 50%-80% scientific biomedical abstracts
Ramisch et al. (2013) 51.4% (nouns) English WordNet25.5% (verbs)
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
multiword expressions (MWEs)A vast inventory Sag et al. (2002)’s pain-in-the-neck typology
institutionalized phrases and clichés
(1) love conquers all
idioms
(2) sweep under the rug
fixed phrases
(3) by and large
compounds
(4) frequent-flyer program
verb-particle constructions
(5) eat/look/write up
light verbs
(6) a. have a drink/?an eatb. make/*do a mistake
named entities
(7) Oakland A’s, Oakland, theA’s
lexical collocations
(8) a. telephonebox/booth/*cabin
b. emotionalbaggage/*luggage
etc.
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
multiword expressions (MWEs)NLP vs linguistics
As a term, ’MWE’ has a strong NLP flavor since Sag et al. (2002) andunder the influence of the the Stanford MWE project(http://mwe.stanford.edu/)
• NLP → capture, recognition, and comprehension• task-oriented (computer-mediated lexicography, OCR, morphological
and syntactic tagging, information retrieval, computer-mediatedtranslation, L2 teaching, word-sense disambiguation, etc.)
• linguistics → acquisition and usage• linguistic competence & performance
Looks like we don’t have the same priorities
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in linguisticscorpus linguistics
working definitions for MWEsFirth (1957)
• you shall know a word by the company it keeps• mutual expectations
collocations & phraseological units(Sinclair 1991)
• the open-choice principle• the idiom principle
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in linguisticsMeaning-Text Theory (Mel’čuk)
3 kinds of entries in the Explanatory Combinatorial Dictionary (Mel’čukand Polguere 1987; Mel’čuk and Polguère 1995)
• full phrasemes (≈ non-compositional collocations)e.g. long time no see
• semi-phrasemes (partially compositional collocations)e.g. Magn(rain) = {heavy, torrential}
• quasi-phrasemes (semantically-constrained, partially-compositionalcollocations)e.g. bus stop is the place where the bus stops and passengers get in,not the moment when it stops (at a traffic light)
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in linguisticsgenerative linguistics & CxG
• the “rules vs. the lexicon” debate (Langacker 1987; Pinker 1999;Pinker and Prince 1988; Rumelhart and McClelland 1986)
Because rules capture all the regularities in language, MWEs shouldhave no place in the grammar proper because they are lexical.Because the lexicon consists of words or morphemes, it does notinclude MWEs because they are phrasal
• Fillmore et al. (1988, p. 504)the descriptive linguist needs to append to this maximally generalmachinery [. . . ] knowledge that will account for speakers’ ability toconstruct and understand phrases and expressions in their languagewhich are not covered by the grammar, the lexicon and the principlesof compositional semantics, as these are familiarly conceived. Such alist of exceptional phenomena contains things which are larger thanwords, which are like words in that they have to be learned separatelyas individual whole facts about pieces of the language, [. . . ]
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in linguisticsgenerative linguistics & CxG
3 classes (Fillmore et al. 1988)• unfamiliar pieces unfamiliarly combinede.g. the X-er, the Y-er
• familiar pieces unfamiliarly combinede.g. all of a sudden
• familiar pieces familiarly combinede.g. once every blue moon
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in linguisticsgenerative linguistics & CxG
• “phrasal lexical items” (i.e. “lexical items larger than X 0”) should bepart of the lexicon (Jackendoff 1997, chapter 7)
• MWEs are part of the ‘constructicon’ (Goldberg 2006, p. 64)
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in linguisticslanguage acquisition
computational simulations of acquisition models• Joyce and Srdanović (2008)• Rapp (2008)
studies on specific MWEs• verb-particle constructions (Villavicencio et al. 2012)• nominal compounds (Devereux and Costello 2012)• light-verb constructions (Nematzadeh et al. 2013)• multiword terms (Lavagnino and Park 2010)
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in NLPsome landmark studies
• Choueka (1988): collocation extraction based on n-gram statistics• Smadja (1993): Xtract, a tool for collocation extraction based onsimple POS filters + µ and σ of word distance
• Dagan and Ken Church (1994): Termight, a semi-automatic tool tohelp translators and terminologists identify technical terms and theirtranslations (candidate terms selected with a variant of PMI, asfound in Kenneth Church and Hanks 1990)
• Lin (1998a,b) use of syntactic dependencies to extract candidatecollocations restricted to a specific category
• the Stanford MWE project (early 2000s), (see Sag et al. 2002)• Ramisch (2014)
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in NLPscientific events
MWEs are featured at major conferences• COLING• ACL (SIGLEX, EACL, NAACL)• LREC
and courses/tutorials• ACL• SemEval• PARSEME
but again, task-oriented
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in GxCthe constructicon
Tenet 7. The totality of ourknowledge of language is capturedby a network of constructions a‘construct-i-con’. (Goldberg 2003)
Constr 1
Constr 2
Constr 3
Constr 4
Constr 5
Constr 6
Constr 7
Constr 8
Constr 9
Constr 10
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in GxCthe constructicon
An undirected graph based oncorpus data (after Bresnan et al.2007) and the languageR dataset
accord
afford
allocate
allot
allow
assess
assign
assure
award
bequeath
bet
bring
carry
cause
cede
charge
cost
deal
deliver
deny
do
extend
feed
fine
flipfloat
funnel
get
givegrant
guarantee
hand
hand_over
issue
lease
leave
lend
loan
make
net
offer
owe
pay
pay_backpermit
prepay
present
promise
quote
read
refuse
reimburse
repay
resell
run
sell
sell_back
sell_off
send
serve
show
slipsubmitsupply
swap
take
teachtell
tender
trade
vote
will
wish
write
NP
PP
the dative alternation
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in GxCthe constructicon
A directed graph based on corpusdata and association measures(Desagulier 2015)
air
angel
angry
anthracite
arrow
baby_s_skin
bald
bat
beautiful
bell
berry
bigbird
birds
bitter
black
blade
bleak
blind
blood
blueboard
bold
bone
bowstring
brass
brightbrittle
brown
brush
bug
butter
buttoncat
chalk
chalk_and_cheese
child
church
clear
clock
clockwork
coal
cold
cool
coot
cream
crystal
cucumber
daft
daisies
daisy
dark
day
day_is_long
dead
deathdie
different
dodo
dog
doornail
drunk
dry
dust
easy
eel
eggs
empty
excited
feather
fiddle
fit
flash
flat
folk
football
free
fresh
gall
gauche
gentle
ghost
glass
gold
good
granite
grave
guilty
happy
hard
hatter
heather
heavy
hell
high
hills
history
holly
honest
horse
hot
house
houses ice
innocent
iron
jealous
jumpy
keen
kite
kitten
lamb
large
lead
life
light
lion_s_heart
lord
love
mad
man
marble
march_hare
meek
mice
midnight
miserable
money_in_the_bank
monkey
muck
mud
mule
mustard
mutton
nails
nervous
nice
night
night_follows_day
nose
nut
old
ox
pale
pancake
paper
parchment
parrot
picture
pie
pig
pig_in_mud
pikestaff
pitch
plain
pleased
poker
poor
pretty
punch
putty
queer
quick
quiet
rain
rake
ramrod
razor
red
reed
regular
rich
right
rock
rusty
safe
sandboy
schoolgirl
sharp
sheet
shit
shite
shy
sick
silent
silk
sin
sky
slim
slippery
smoothsmug
snow
snug
soft
solid
spring
steady
steel
stiff
stonestraight
strong
stubborn
sugar
sun
sure
sweet
syrup
tack
tall
taut
thick
thieves
thin
things
thistledown
thunder
toast
tough
ugly
uncomfortable
velvet
warm
water
weak
white
wind
wool
world
yellow
A as NP
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in GxCthe constructicon
our long-term objective• simulate the construction network from a very large corpus
how• collect a large database of existing MW-constructions• train an algorithm so that it can detect them• vectorize the constructions by means of artificial neural networks• plot the results by means of a graph• further train the algorithm so that it can detect new constructions
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWEs in GxCthe constructicon
problems• ambiguity• polysemy• homonymy• long-distance dependencies• etc.
Most, if not all the issues listed in Sag et al. (2002) are still unresolvedtoday
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
When things go surprisingly wrongAI
displaCy (https://demos.explosion.ai/displacy/)
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
previous workmwetoolkit (Ramisch 2014)
Framework for MWE extraction with mwetoolkit
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
data processingoverview
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
data processingstep 1
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
data processingstep 2
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
data processingstep 3
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
data processingstep 4
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
data processingstep 4
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
data processingstep 4
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
data processingstep 5
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
data processingstep 6
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
MWCs parser
Video link (https://www.youtube.com/embed/meTWC5Nk9F4)
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
an ambiguous sentence
DepVis link (http://stat34.github.io/DepVis/)
Time flies like an arrow... Mun, Wang & Desagulier
Outline Introduction review methods results discussion conclusion
conclusion
• results show that our parsing algorithm recognize MWCs accurately,including in ambiguous sentences.
• storing more sentences will improve the speed of the algorithm.• storing more MWEs will allow the algorithm to recognize moreMWEs.
Time flies like an arrow... Mun, Wang & Desagulier