ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin


Turkish Morphology – Beads on a String

One Turkish Word: götürülmüyorsun, ‘You are not being taken’

götür + ül + m + üyor + sun
take + passive + negative + present progressive + 2nd person singular

Computational Morphology Improves:

Machine Translation: Turkish-English (Oflazer, 2007); Czech-English (Goldwater and McClosky, 2005)

Information Retrieval: English, German, Finnish (Kurimo et al., 2008)

Speech Recognition: Finnish (Creutz, 2006)

Grapheme-to-Phoneme Conversion: German (Demberg, 2007)


Morphology is Complex – Operations

Prefixation

Reduplication

Infixation

Suffixation

Morphology is Complex – Morphophonology

götür + ül + m + üyor + sun: ‘You are not being taken’
take + passive + negative + present progressive + 2nd person singular

götür + ül + me + yecek + sin: ‘You will not be taken’
take + passive + negative + future + 2nd person singular

Before the future suffix, the negative surfaces as me rather than m, and the 2nd person singular surfaces as sin rather than sun.

Morphology is Complex – Ambiguity

Hungarian: mentek

men + tek: go + Present.2nd.Plural, ‘yinz go’

men + t + ek: go + PastParticiple + Plural, ‘those who have gone’

In Morphology Systems for New Languages

Complexity -> Time + Expertise

Kemal Oflazer: expert on Turkish and on computational morphology

Time: 3-4 months to manually build a basic Turkish analyzer, plus lexicon development and maintenance

The Solution

Raw Text -> Unsupervised Morphology Induction -> Language Structure

Techniques for Unsupervised Morphology Induction

Transition Likelihood
  Harris (1955) – Finite State Automata
  Bernhard (2007)

Minimum Description Length
  Goldsmith (2001, 2006)
  Creutz’s Morfessor (2006)

Contextual Similarity
  Wicentowski (2002)
  Schone (2002)

The Paradigm
  Snover (2002)
  ParaMor (2007)

What is a Paradigm?

götür + ül + m + üyor + sun
take + passive + negative + present progressive + 2nd person singular

Paradigms Structure Inflectional Morphology

Paradigm: Mutually substitutable morphological operations

Voice: ül
Polarity: m
Tense & Aspect: üyor, yecek
Person & Number: sun (2nd person singular), um (1st person singular), Ø (3rd person singular), uz

The ParaMor Algorithm

Paradigm: Mutually substitutable strings

Each paradigm pairs its mutually substitutable suffixes with Candidate Stems and models 1 Morpheme Boundary.

The ParaMor Algorithm

Simplifying Assumptions

Suffixes only: 70% of the World’s Languages are Suffixing (Dryer, 2005)

Strict Concatenation

Only a High-Level Overview

The ParaMor Algorithm

Identify Paradigms in 3 Steps
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm
3. Filter least likely candidates

Segment Words Using the discovered paradigms

Today: ParaMor (Identify: Search, Cluster, Filter; Segment), Evaluation, Results

Search for Candidate Paradigms

Propose a morpheme boundary at every character boundary in every word

Consolidate identical candidate suffixes into paradigm seeds
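The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ParaMor's implementation: the function names are hypothetical, the empty suffix is written "Ø" as on the slides, and the sketch assumes Ø is never used as a seed (the seed list on the slides starts at s).

```python
from collections import defaultdict

def candidate_suffix_stems(words):
    """Map each candidate suffix to the set of candidate stems it follows.

    A boundary is proposed at every character boundary of every word,
    including the end of the word (the empty suffix, written "Ø").
    """
    stems_by_suffix = defaultdict(set)
    for word in set(words):
        for cut in range(1, len(word) + 1):
            stem, suffix = word[:cut], word[cut:]
            stems_by_suffix[suffix or "Ø"].add(stem)
    return stems_by_suffix

def paradigm_seeds(words):
    """Return (suffix, stem count) pairs, most frequent first.

    Assumption: the empty suffix Ø is skipped, since every word would
    count as one of its stems.
    """
    table = candidate_suffix_stems(words)
    seeds = [(suffix, len(stems)) for suffix, stems in table.items() if suffix != "Ø"]
    # Sort by descending stem count; break ties alphabetically for determinism.
    return sorted(seeds, key=lambda pair: (-pair[1], pair[0]))

# Toy word list (hypothetical, far smaller than the 50,000 types on the slide).
toy_words = ["costa", "costas", "costar", "valla", "vallas"]
toy_seeds = paradigm_seeds(toy_words)
```

On a real word list the first entries of the seed list would correspond to seeds such as s with its stem count.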

Spanish Example

Word List: 50,000 Types (autorizaciones, buscabamos, costas, importadoras, vallas, …)

Paradigm seed: s (10697 stems)

Identify the most frequent mutually replaceable candidate suffix. Stems that occur with one suffix in a paradigm will likely occur with other suffixes in that paradigm.

Ø s (5513 stems): costaØ costas, importadoraØ importadoras, vallaØ vallas, …

Ø r s (281 stems): costar costaØ costas, …

A parameter halts the introduction of suffixes when the most frequent mutually replaceable candidate suffix severely decreases the stem count.

Parameters were set to produce high-recall Spanish paradigms and then frozen.

Move on to the next most frequent paradigm seed.

Search paths (candidate suffixes, then stem count):

s 10697 -> Ø s 5513 -> Ø r s 281
a 9020 -> a o 2325 -> a o os 1418 -> a as o os 899
n 6039 -> Ø n 1863 -> Ø n r 512 -> Ø do n r 357 -> Ø da das do dos n ndo r ron 115
es 2750 -> Ø es 845
an 1784 -> a an 1045 -> a an ar 417 -> a an ar ó 355 -> a ada adas ado ados an ar aron ó 148
...
rado 167 -> rada rado 89 -> rada rado rados 67 -> rada radas rado rados 53 -> ra rada radas rado rados ran rar raron ró 23
strado 15 -> strada strado 12 -> strada strado stró 9 -> strada strado strar stró 8 -> strada stradas strado strar stró 7

Size of Search Space

Huge: 2^|candidate suffixes|

Most candidate suffixes have no common stems

Still Exponential

Greedily searched space: O(|candidate suffixes|)

This example is just 0.1% of the searched space
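A minimal sketch of the greedy upward search, assuming a suffix-to-stems table like the one the search step builds. The names and the `min_keep` threshold are hypothetical: the slides only say that a parameter halts the search when the next suffix severely decreases the stem count, not how that parameter is defined. Each path adds one suffix at a time, which is why a path visits far fewer nodes than the 2^|candidate suffixes| possible suffix sets.

```python
from collections import defaultdict

def build_table(words):
    """Map each candidate suffix ("Ø" = empty) to the stems it follows."""
    table = defaultdict(set)
    for word in set(words):
        for cut in range(1, len(word) + 1):
            table[word[cut:] or "Ø"].add(word[:cut])
    return table

def grow_paradigm(seed, table, min_keep=0.25):
    """Greedily add the suffix that keeps the most stems.

    min_keep is a hypothetical stand-in for the halting parameter:
    stop when the best next suffix keeps fewer than this fraction
    of the current stems.
    """
    paradigm = {seed}
    stems = set(table[seed])
    while True:
        best, best_stems = None, set()
        for suffix in sorted(table):  # sorted for deterministic tie-breaking
            if suffix in paradigm:
                continue
            shared = stems & table[suffix]
            if len(shared) > len(best_stems):
                best, best_stems = suffix, shared
        if best is None or len(best_stems) < min_keep * len(stems):
            return paradigm, stems
        paradigm.add(best)
        stems = best_stems

# Toy word list (hypothetical); the slides use 50,000 Spanish types.
words = ["costa", "costas", "costar", "valla", "vallas",
         "importadora", "importadoras"]
table = build_table(words)
suffixes, stems = grow_paradigm("s", table, min_keep=0.5)
```

With the stricter threshold the toy search stops at the paradigm Ø s; a looser threshold lets it continue to Ø r s at the cost of most of its stems, mirroring the drop from 5513 to 281 stems in the Spanish example.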

Step 2: Clustering

Cluster candidates modeling the same paradigm

Bottom-up Agglomerative Clustering
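As a rough illustration of bottom-up agglomerative clustering over candidate paradigms, the sketch below repeatedly merges the two most similar clusters. The similarity measure (Jaccard overlap of the word forms each candidate covers), the `threshold` value, and all names are assumptions made for this example; the slides do not say which measure or stopping rule ParaMor uses.

```python
from itertools import combinations

def covered_words(suffixes, stems):
    """Word forms generated by attaching every suffix to every stem."""
    return {stem + ("" if suffix == "Ø" else suffix)
            for stem in stems for suffix in suffixes}

def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(candidates, threshold=0.3):
    """Merge the most similar pair of clusters until none is similar enough.

    candidates: list of (suffix set, stem set) pairs.
    threshold: hypothetical minimum similarity required for a merge.
    """
    clusters = [(set(sfx), set(stems), covered_words(sfx, stems))
                for sfx, stems in candidates]
    while len(clusters) > 1:
        (i, j), score = max(
            (((i, j), jaccard(clusters[i][2], clusters[j][2]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if score < threshold:
            break
        a, b = clusters[i], clusters[j]
        merged = (a[0] | b[0], a[1] | b[1], a[2] | b[2])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [(sfx, stems) for sfx, stems, _ in clusters]

# Hypothetical candidates: two overlapping adjective paradigms and one noun paradigm.
candidates = [
    ({"a", "as", "o", "os"}, {"buen", "mal"}),
    ({"a", "o", "os"}, {"buen", "mal", "alt"}),
    ({"Ø", "es"}, {"papel"}),
]
clusters = cluster(candidates)
```

In this toy input the two adjective candidates cover mostly the same word forms and are merged, while the noun candidate stays separate.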

Step 3: Filtering

Filter least likely candidates

Adapted from Harris (1955) and Goldsmith (2006)

Improved over 2007 Challenge
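The slides do not describe the filters beyond their lineage. As background only, the sketch below computes letter successor variety, the kind of transition statistic associated with Harris (1955): a word-initial string followed by many different characters is a more plausible stem than one followed by a single character. How ParaMor adapts this idea is not shown here, so the function names and the way the score might be used are assumptions.

```python
from collections import defaultdict

def successor_variety(words):
    """Map each word-initial string to the set of characters that follow it.

    "#" is used here as a hypothetical end-of-word marker.
    """
    successors = defaultdict(set)
    for word in set(words):
        for cut in range(1, len(word) + 1):
            following = word[cut] if cut < len(word) else "#"
            successors[word[:cut]].add(following)
    return successors

def boundary_score(prefix, successors):
    """Number of distinct continuations seen after the given prefix."""
    return len(successors.get(prefix, ()))

# Toy word list (hypothetical).
toy = ["costa", "costas", "costar", "costo", "costos"]
variety = successor_variety(toy)
score_after_costa = boundary_score("costa", variety)
score_after_cos = boundary_score("cos", variety)
```

A filter in this spirit could discard candidate paradigms whose proposed stem boundary has a low score compared with neighbouring positions.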

A Few of the 42 Final Paradigms

4 Suffixes: Ø menente mente s

11 Suffixes: a amente as illa illas o or ora oras ores os

41 Suffixes: a aba aban acion aciones ación ada adas ado ador adora adoras adores ados amos an ando ante antes ar ara aran aremos arla arlas arlo arlos arme aron arse ará arán aré aría arían ase e en ándose é ó

29 Suffixes: e edor edora edoras edores en er erlo erlos erse erá erán ería erían ida idas ido idos iendo iera ieran ieron imiento imientos iéndose ió í ía ían

20 Suffixes: ida idas ido idor idores idos imos ir iremos irle irlo irlos irse irá irán iré iría irían ía ían

29 Suffixes: ce cedores cemos cen cer cerlo cerlos cerse cerá cerán cería cida cidas cido cidos ciendo ciera cieran cieron cimiento cimientos cimos ció cí cía cían zca zcan zco

6 Suffixes: Ø es idad idades mente ísima

These paradigms include Number on Nouns, Number & Gender on Adjectives, and Verbal Suffixes.

Segment Words Using the Paradigms

Improved over 2007 Challenge

administradas: ‘Feminine gender nouns under administration’

administr + ad + a + s
ad: Past Participle; a: Feminine; s: Plural

ParaMor checks the word against each discovered paradigm. For example, administradas ends in s, a suffix of the paradigm Ø s, and administrada (administrada + Ø) is also in the corpus, so ParaMor places a morpheme boundary before the s.

Recovers multiple morpheme boundaries from candidate paradigms which each propose single morpheme boundaries
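The substitution test from the administradas example can be sketched as follows. This is a simplified illustration, not ParaMor's segmentation code: the function name is hypothetical, paradigms are reduced to plain suffix sets, and the small corpus is made up for the example.

```python
def segment(word, paradigms, corpus):
    """Split a word at every boundary supported by some paradigm.

    A boundary before suffix f is accepted when f belongs to a paradigm,
    the word ends in f, and replacing f with another suffix of the same
    paradigm yields a word that is also in the corpus.
    """
    boundaries = set()
    for suffixes in paradigms:
        for suffix in suffixes:
            ending = "" if suffix == "Ø" else suffix
            if not ending or not word.endswith(ending):
                continue
            stem = word[:len(word) - len(ending)]
            if not stem:
                continue
            for other in suffixes:
                if other == suffix:
                    continue
                alternative = stem + ("" if other == "Ø" else other)
                if alternative in corpus:
                    boundaries.add(len(stem))
                    break
    pieces, start = [], 0
    for cut in sorted(boundaries):
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces

# Suffix sets abbreviated from the final paradigms; the corpus is hypothetical.
paradigms = [
    {"Ø", "s"},
    {"a", "as", "o", "os"},
    {"ada", "adas", "ado", "ados", "ar"},
]
corpus = {"administradas", "administrada", "administrado", "administrar"}
pieces = segment("administradas", paradigms, corpus)
```

Each paradigm contributes at most one boundary per matching suffix, and the boundaries from different paradigms combine into the full segmentation.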

Linguistic Evaluation

Morfessor: Baseline system for Challenge; Freely available; Minimum Description Length

Join ParaMor and Morfessor: for each word, submit 2 analyses, a ParaMor analysis and a Morfessor analysis

The Effect:

Oracle Recall

Averaged Precision
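Joining the two systems amounts to listing both analyses for every word. The sketch below assumes each system's output is a dictionary from word to a tuple of morphemes, and it assumes an output line of the form word, tab, then alternative analyses separated by commas with morphemes separated by spaces. Both the data layout and the function names are assumptions for illustration, and the Morfessor segmentation in the example is made up.

```python
def join_analyses(paramor, morfessor):
    """Return word -> list of analyses, ParaMor's first, then Morfessor's.

    Both inputs map a word to a tuple of morphemes. A word missing from
    one system simply gets a single analysis.
    """
    joined = {}
    for word in sorted(set(paramor) | set(morfessor)):
        analyses = []
        for source in (paramor, morfessor):
            analysis = source.get(word)
            if analysis:
                analyses.append(analysis)
        joined[word] = analyses
    return joined

def format_line(word, analyses):
    """Render one word with its alternative analyses on a single line."""
    return word + "\t" + ", ".join(" ".join(morphemes) for morphemes in analyses)

# Hypothetical outputs of the two systems for one word.
paramor_out = {"administradas": ("administr", "ad", "a", "s")}
morfessor_out = {"administradas": ("administra", "das")}
joined = join_analyses(paramor_out, morfessor_out)
line = format_line("administradas", joined["administradas"])
```

Because each word now carries two alternative analyses, a correct morpheme found by either system can be credited, while spurious morphemes from either system still count against precision.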

Linguistic EvaluationParaMor

IdentifySearchClusterFilter

SegmentEvaluationResults

F1

50

30

10English German Finnish Turkish Arabic

47.2

52.8

Mor

fess

or

Par

aMor

Page 75: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

75

Linguistic EvaluationParaMor

IdentifySearchClusterFilter

SegmentEvaluationResults

F1

50

30

10English German Finnish Turkish Arabic

47.2

52.8

56.3

Mor

fess

or

Par

aMor

Par

aMor

& M

orfe

ssor

Page 76: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

76

Linguistic EvaluationParaMor

IdentifySearchClusterFilter

SegmentEvaluationResults

F1

50

30

10English German Finnish Turkish Arabic

60.8

47.2

52.8

56.3

Mor

fess

or

Par

aMor

Par

aMor

& M

orfe

ssor

Ber

nhar

d

Page 77: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

77

Linguistic EvaluationParaMor

IdentifySearchClusterFilter

SegmentEvaluationResults

F1

50

30

10English German Finnish Turkish Arabic

60.8

52.9

47.2 47.8

52.8

44.5

56.3

Mor

fess

or

Mor

fess

or

Par

aMor

Par

aMor

Par

aMor

& M

orfe

ssor

Ber

nhar

d

Ber

nhar

d

Page 78: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

78

Linguistic EvaluationParaMor

IdentifySearchClusterFilter

SegmentEvaluationResults

F1

50

30

10English German Finnish Turkish Arabic

60.8

52.9

47.2 47.8

52.8

44.5

56.354.1

Mor

fess

or

Mor

fess

or

Par

aMor

Par

aMor

Par

aMor

& M

orfe

ssor

Par

aMor

& M

orfe

ssor

Ber

nhar

d

Ber

nhar

d

Page 79: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language; this build adds Finnish: Morfessor 40.6, ParaMor 39.5, ParaMor & Morfessor 48.5, Bernhard 48.2]

Page 80: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language; this build adds Turkish: Bernhard 24.7]

Page 81: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language; this build adds Turkish: Morfessor 37.1]

Page 82: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language; this build adds Turkish: ParaMor 46.5]

Page 83: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language; this build adds Turkish: ParaMor & Morfessor 52.0]

Page 84: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language; this build adds Arabic: Morfessor 34.0, ParaMor 15.4, Zeman 21.9]

Page 85: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language and system; this build adds Arabic: ParaMor & Morfessor 40.9]

English: Morfessor 47.2, ParaMor 52.8, ParaMor & Morfessor 56.3, Bernhard 60.8

German: Morfessor 47.8, ParaMor 44.5, ParaMor & Morfessor 54.1, Bernhard 52.9

Finnish: Morfessor 40.6, ParaMor 39.5, ParaMor & Morfessor 48.5, Bernhard 48.2

Turkish: Morfessor 37.1, ParaMor 46.5, ParaMor & Morfessor 52.0, Bernhard 24.7

Arabic: Morfessor 34.0, ParaMor 15.4, ParaMor & Morfessor 40.9, Zeman 21.9

Page 86: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language and system, unchanged from the previous slide]

Sometimes Morfessor wins

Page 87: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language and system, unchanged from the previous slide]

Sometimes ParaMor wins

Page 88: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

Linguistic Evaluation

[Bar chart: F1 by language and system, unchanged from the previous slide]

ParaMor and Morfessor are Complementary
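The charts above report F1 scores. As a minimal sketch, assuming the usual definition of F1 as the harmonic mean of precision and recall, the score can be computed as below. The function name `f1_score` and the precision/recall pairs are hypothetical and are not taken from the slides.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision/recall pairs (in percent), chosen only to show
# that a recall gain can outweigh a small precision loss in F1.
single_system = f1_score(60.0, 40.0)
combined_systems = f1_score(55.0, 55.0)
print(round(single_system, 1), round(combined_systems, 1))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a combination that raises recall at a modest cost in precision can score higher than either input system alone.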

Page 89: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

IR Evaluation

[Bar chart: IR evaluation scores by language, vertical axis labeled F1 (ticks 25 to 45). Systems shown: Morfessor, ParaMor, ParaMor & Morfessor, Bernhard. English bar labels, in extraction order: 39.4, 36.4, 39.3, 39.9]

Page 90: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

IR Evaluation

[Bar chart: IR evaluation scores by language; this build adds German bars for Morfessor, ParaMor, ParaMor & Morfessor, and Bernhard. German bar labels, in extraction order: 47.3, 46.7, 36.3, 47.3]

Page 91: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

IR Evaluation

[Bar chart: IR evaluation scores by language; this build adds Finnish bars for Morfessor, ParaMor, ParaMor & Morfessor, and Bernhard. Finnish bar labels, in extraction order: 49.2, 46.8, 39.7, 46.7]

Page 92: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

IR Evaluation

[Bar chart: IR evaluation scores by language, unchanged from the previous slide]

Page 93: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

ParaMor: State-of-the-Art Unsupervised Morphology Induction System

ParaMor

Identifies paradigms: the organizing structure of inflectional morphology

Segments words as the discovered paradigms suggest

Combined with Morfessor

Among the best in Morpho Challenge

Consistent across languages
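To make "segments words as discovered paradigms suggest" concrete, here is a toy sketch. It assumes a paradigm is simply a set of suffixes and that a split is accepted only when the resulting stem is attested with another suffix of the same paradigm. The function name `segment`, the Spanish toy vocabulary, and the attestation rule are illustrative assumptions, not ParaMor's actual procedure.

```python
def segment(word, paradigm_suffixes, vocabulary):
    """Return (stem, suffix) splits of `word` licensed by one candidate paradigm."""
    splits = []
    # Try longer suffixes first so the order of results is deterministic.
    for suffix in sorted(paradigm_suffixes, key=lambda s: (-len(s), s)):
        if not suffix or len(word) <= len(suffix) or not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        # Accept the split only if the stem also occurs in the vocabulary
        # with at least one other suffix from the same paradigm.
        others = paradigm_suffixes - {suffix}
        if any(stem + other in vocabulary for other in others):
            splits.append((stem, suffix))
    return splits


# Hypothetical toy data: a few Spanish verb forms and one candidate paradigm.
vocabulary = {"hablar", "hablo", "hablamos", "bailar", "bailo", "gato"}
paradigm = {"ar", "o", "amos"}

for word in ("hablamos", "bailo", "gato"):
    print(word, segment(word, paradigm, vocabulary))
```

In this sketch "gato" receives no split, because the would-be stem "gat" never appears with another suffix of the paradigm; that is the sense in which the paradigm, rather than the suffix string alone, licenses a segmentation.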

Page 94: ParaMor & Morpho Challenge 2008 Christian Monson Jaime Carbonell, Alon Lavie, Lori Levin

The Next Steps for ParaMor

Beyond suffixes

Straightforward extension to ParaMor for prefixes

More challenging: reduplication, infixation, etc.

Morphophonology

Incorporate contextual information when clustering

Improve system combination

True merging of analyses

Combine more systems
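For contrast with a true merging of analyses, the sketch below shows one simple baseline: pooling each system's analyses of a word as alternatives, without reconciling them. This is an assumption made for illustration; the slides do not specify how the combination is implemented. The function name `pool_analyses` and the sample segmentations are hypothetical.

```python
def pool_analyses(*system_outputs):
    """Pool per-word analyses from several systems, dropping exact duplicates.

    Each system output maps a word to a list of analyses; an analysis is a
    tuple of morpheme strings. No analysis is altered or merged with another.
    """
    combined = {}
    for output in system_outputs:
        for word, analyses in output.items():
            bucket = combined.setdefault(word, [])
            for analysis in analyses:
                if analysis not in bucket:
                    bucket.append(analysis)
    return combined


# Hypothetical outputs from two systems for the same small word list.
system_a = {"walked": [("walk", "ed")], "cats": [("cat", "s")]}
system_b = {"walked": [("walk", "ed")], "cats": [("cats",)], "unhappy": [("un", "happy")]}

pooled = pool_analyses(system_a, system_b)
print(pooled)
```

Pooling keeps every alternative, so conflicting segmentations of a word sit side by side; a true merge would instead have to decide which morpheme boundaries to keep within a single analysis.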
